baldart 4.24.3 → 4.26.0

package/CHANGELOG.md CHANGED
@@ -5,6 +5,27 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.26.0] - 2026-06-11
9
+
10
+ **`new2`: Phase 1 decomposed into specialist agents — the owner implements, it no longer explores.** The old per-card pipeline gave the owner agent one mega-prompt ("you ARE claim + architect + plan-auditor + owner"), so the owner absorbed the whole codebase exploration into its own context and reached the actual coding with a degraded window. Phase 1 now runs as dedicated specialists with file handoff (`/tmp`), per the "ognuno fa una cosa" ("each one does one thing") principle. Verified premise: nested subagent spawning does NOT exist in Claude Code (official docs: "Subagents cannot spawn other subagents" + 2 empirical probes), so the decomposition lives at the WORKFLOW level — the JS is the orchestrator, exactly what dynamic workflows are for. **MINOR** (pipeline capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
11
+
12
+ ### Changed
13
+
14
+ - **`framework/.claude/workflows/new2.js` — per-card `codebase-architect` specialist (B7).** Context retrieval is now a dedicated `codebase-architect` spawn that writes the COMPLETE untruncated baseline to `/tmp/arch-baseline-<CARD>.md` (the owner and reviewers Read the file; the structured return stays minimal). Bonus over the old pre-flight snapshot: the per-card architect sees prior in-batch commits — card N's baseline reflects what cards 1..N-1 changed. The pre-flight no longer writes baselines (generalist relieved of specialist work; `archBaselinePaths` removed from its contract). **Fail-safe**: architect crash/not-ok degrades to the old inline behavior (the owner explores itself) — never blocks the card; transient outage re-queues it.
15
+ - **`new2.js` — `plan-auditor` demoted to a deterministic DRIFT gate.** `/prd` already validates every card at creation time, so an unconditional re-audit per card was duplicate work. plan-auditor now runs ONLY when execution-time drift is plausible: (a) a prior in-batch commit touched the card's declared surface (`filesLikelyTouched` ∩ prior `filesChanged`, computed in JS), (b) the architect found declared paths missing (stale card — factual `missingPaths` check, no judgement), or (c) `review_profile: deep` (Rule C escalation). Its corrections amend the implementation BRIEFING (never the backlog YAML). A fresh single-card batch with no drift evidence skips it at zero cost. Crash → non-blocking skip, ledgered.
16
+ - **`new2.js` — epic guard moved to JS (zero spawns for trackers).** The pre-flight returns `cardGraph[].isEpic` (implement.md §6b rule); an epic card now short-circuits in JS before ANY spawn — the old path burned a full owner-agent spawn just to learn the card was a tracker. The impl agent's own epic flag stays as backstop.
17
+ - **`new2.js` telemetry — `per_card[].phase1`** records `{architect: done|inline-fallback, audit: pass|fixes|skipped-no-drift|skipped-error|skipped-no-baseline}` plus `phase1-architect`/`phase1-audit` ledger rows, so the A/B accounting can price the decomposition.
18
+
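The drift gate described in the 4.26.0 entry can be read as a pure decision over three inputs. The following is a minimal sketch under stated assumptions: `needsDriftAudit` and its parameter names are hypothetical, and it compares paths by exact equality, whereas the workflow code in this diff matches by substring.

```javascript
// Hypothetical helper (not part of new2.js): decides whether plan-auditor should run.
// Assumption: paths are compared by exact equality for simplicity.
function needsDriftAudit({ reviewProfile, filesLikelyTouched, priorFilesChanged, missingPaths }) {
  const declared = new Set(filesLikelyTouched || [])
  // (a) a prior in-batch commit touched the card's declared surface
  const inBatchOverlap = (priorFilesChanged || []).some((f) => declared.has(f))
  // (b) the architect reported declared paths that do not exist (stale card)
  const stale = (missingPaths || []).length > 0
  // (c) review_profile 'deep' always audits (Rule C escalation)
  const deep = reviewProfile === 'deep'
  return { run: deep || inBatchOverlap || stale, inBatchOverlap, stale, deep }
}

// A fresh single-card batch: no prior commits, nothing missing, so the audit is skipped.
const fresh = needsDriftAudit({
  reviewProfile: 'balanced', filesLikelyTouched: ['src/a.js'], priorFilesChanged: [], missingPaths: [],
})
// A later card whose declared file was changed by an earlier card in the same batch.
const drifted = needsDriftAudit({
  reviewProfile: 'balanced', filesLikelyTouched: ['src/a.js'], priorFilesChanged: ['src/a.js', 'README.md'], missingPaths: [],
})
console.log(fresh.run, drifted.run) // false true
```

Returning the individual signals alongside `run` mirrors how the ledger row names which condition fired.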
19
+ ## [4.25.0] - 2026-06-11
20
+
21
+ **`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
22
+
23
+ ### Changed
24
+
25
+ - **`framework/.claude/workflows/new2.js` — relevance-gated `balanced` fan-out.** `touchesCode` → code-reviewer + qa-sentinel; `touchesDocs` (`.md`/`.mdx` or under `paths.docs_dir|references_dir|wiki_dir|prd_dir|design_system`) → doc-reviewer (on a doc-ONLY card it IS the core finder — code-reviewer and qa-sentinel are skipped); `touchesApiData` (superset of the final review's `hasApiDataFiles` regex) → api-perf-cost-auditor; the v4.24.2 `securityRelevant` gate → security-reviewer. Surface = `scopeFiles ∪ MAY-EDIT`, so a DoD-mandated doc the implementer FORGOT to edit still triggers doc-reviewer (it's in MAY-EDIT). **Fail-safes**: `deep` keeps the unconditional full fan-out (Rule C escalation is respected); an empty surface (no evidence) falls back to the full fan-out — absence of evidence is not evidence of absence. `light`/`skip` unchanged.
26
+ - **`new2.js` Phase Final — the F-041 `singleCard`/slim skip is now coverage-gated.** The final review's doc pass is skipped for a single-card batch ONLY when doc-reviewer ACTUALLY ran per-card (`reviewersRun`); otherwise it runs as the missing-doc-update safety net — so gating a per-card specialist off never leaves its domain unreviewed at batch level. This also closes a **latent F-041 hole**: under `light`, doc-reviewer never ran per-card, yet the slim skip still suppressed the final doc pass for single-card batches.
27
+ - **`new2.js` telemetry — `per_card[].reviewers`** records the gated matrix per card, so the A/B spawn accounting can measure exactly what the gating saves.
28
+
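To illustrate why the surface is `scopeFiles ∪ MAY-EDIT` rather than `scopeFiles` alone, here is a small sketch of the classification step. The directory names in `docDirs`, the simplified API/data pattern, and the function name `classifySurface` are assumptions made for the example; the real values come from the project's `paths` config and the regex in `new2.js`.

```javascript
// Hypothetical doc directories (assumption); the workflow reads them from `paths.*`.
const docDirs = ['docs/', 'wiki/']
const isDocFile = (f) => /\.(md|mdx)$/i.test(f) || docDirs.some((d) => f.includes(d))
// Simplified API/data pattern (assumption), narrower than the one in new2.js.
const isApiDataFile = (f) => /api\/|\.sql$|migrations?\/|schema/i.test(f)

function classifySurface(scopeFiles, mayEdit) {
  // Union of what the implementer touched and what the card was allowed to touch.
  const surface = [...new Set([...(scopeFiles || []), ...(mayEdit || [])])]
  return {
    surface,
    touchesDocs: surface.some(isDocFile),
    touchesCode: surface.some((f) => !isDocFile(f)),
    touchesApiData: surface.some(isApiDataFile),
    noEvidence: surface.length === 0, // empty surface falls back to the full fan-out
  }
}

// The implementer edited only code, but the DoD names an ADR that is in MAY-EDIT:
const flags = classifySurface(['src/api/users.ts'], ['src/api/users.ts', 'docs/adr/0007.md'])
console.log(flags.touchesDocs, flags.touchesCode, flags.touchesApiData) // true true true
```

Because the forgotten ADR is still part of the surface, `touchesDocs` is true and doc-reviewer would run.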
8
29
  ## [4.24.3] - 2026-06-10
9
30
 
10
31
  **`new2`: two follow-ups from the v4.24.2 logic review — epic closure no longer misses deferred cards, and the owner-gated classification reaches the final review.** Both are "parallel location" gaps of earlier fixes (the v4.17.2 meta-lesson): v4.22.1's epic-closure and v4.24.1's owner-gated deferral each missed one site. **PATCH** (bug-fix to the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.24.3
1
+ 4.26.0
package/framework/.claude/workflows/new2.js CHANGED
@@ -148,6 +148,10 @@ const PREFLIGHT_SCHEMA = {
148
148
  // spawn per card in runCard (always false on a fresh run).
149
149
  alreadyCommitted: { type: 'boolean', description: 'commit referencing the card exists in trunk..HEAD of the worktree AND validation re-runs green AND no open follow-up' },
150
150
  alreadyCommittedSha: { type: 'string' },
151
+ // B7 (v4.26.0) — epic guard moved to JS (an epic never reaches any spawn) + the
152
+ // card's declared surface, used by the deterministic plan-audit drift gate.
153
+ isEpic: { type: 'boolean', description: "epic/tracker card per implement.md §6b: id ends '-00' OR filename ends '-epic.yml' OR group.is_epic:true OR review_profile 'skip' with no requirements" },
154
+ filesLikelyTouched: { type: 'array', items: { type: 'string' }, description: "the card's files_likely_touched, verbatim" },
151
155
  // F-016 — ACs whose only implementation file is outside the card MAY-EDIT,
152
156
  // pre-classified deferred-by-policy (never routed to resolve).
153
157
  policyDeferredACs: { type: 'array', items: { type: 'object', additionalProperties: true } },
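The `isEpic` description above lists four alternative conditions. In the workflow the pre-flight agent evaluates them while reading the card YAML; as a rough illustration only, the same rule can be written as a predicate. The function name and the card object shape (`id`, `filename`, `group`, `review_profile`, `requirements`) are assumptions of this sketch.

```javascript
// Hypothetical predicate mirroring the four epic conditions in the schema description.
function isEpicCard(card) {
  const id = String(card.id || '')
  const filename = String(card.filename || '')
  const noRequirements = !Array.isArray(card.requirements) || card.requirements.length === 0
  return id.endsWith('-00')                                    // tracker id convention
    || filename.endsWith('-epic.yml')                          // tracker filename convention
    || Boolean(card.group && card.group.is_epic === true)      // explicit flag
    || (card.review_profile === 'skip' && noRequirements)      // nothing to implement
}

console.log(isEpicCard({ id: 'AUTH-00', filename: 'auth-00.yml' }))                      // true
console.log(isEpicCard({ id: 'AUTH-03', filename: 'auth-03.yml', requirements: ['r'] })) // false
```

Note that `review_profile: 'skip'` alone does not make a card an epic; it must also have no requirements.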
@@ -156,6 +160,9 @@ const PREFLIGHT_SCHEMA = {
156
160
  },
157
161
  excluded: { type: 'array', items: { type: 'object', additionalProperties: true } },
158
162
  ownershipMapPath: { type: 'string' },
163
+ // B7 — archBaselinePaths removed: the per-card baseline is now written by the
164
+ // codebase-architect SPECIALIST in runCard (it also sees prior in-batch commits,
165
+ // which a pre-flight snapshot cannot), not by this generalist agent.
159
166
  crossCard: { type: 'string' },
160
167
  // G2 — deterministic Codex availability (glob-resolved, NOT a self-reported judgement),
161
168
  // so a false negative on the cross-card check is visible in telemetry. true=companion found.
@@ -164,7 +171,6 @@ const PREFLIGHT_SCHEMA = {
164
171
  // `light` card can run Codex as its finder. Empty string when codexResolved is false.
165
172
  codexScriptPath: { type: 'string' },
166
173
  workspaceNote: { type: 'string' },
167
- archBaselinePaths: { type: 'array', items: { type: 'string' } },
168
174
  },
169
175
  }
170
176
 
@@ -220,11 +226,11 @@ try {
220
226
  g3Bullet +
221
227
  `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
222
228
  `• G5 depends-on: a card whose depends_on names a non-DONE card NOT in this batch → EXCLUDE it AND every in-batch card that transitively depends on it.\n` +
223
- `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs, alreadyCommitted, alreadyCommittedSha }.\n` +
229
+ `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs, alreadyCommitted, alreadyCommittedSha, isEpic (implement.md §6b epic guard: id ends '-00' OR filename ends '-epic.yml' OR group.is_epic:true OR review_profile 'skip' with no requirements), filesLikelyTouched (verbatim from the YAML) }.\n` +
224
230
  `• B1/F-026 idempotency (per card, AFTER the worktree exists): set alreadyCommitted:true (+ alreadyCommittedSha) IFF ALL hold: (a) a commit referencing the card id exists in ${TRUNK}..HEAD of the worktree; (b) the card's validation_commands re-run GREEN right now; (c) NO open follow-up card for it exists in ${paths.backlog_dir || 'backlog'}. On a FRESH worktree ${TRUNK}..HEAD is empty → all false, zero extra work.\n` +
225
231
  `• F-016 AC↔ownership consistency: for each acceptance_criterion, derive the file(s) it requires editing. If those files are NOT a subset of the card's MAY-EDIT/files_likely_touched → add the AC to policyDeferredACs:[{n,text,owningCard|owningFile,reason}] (it will become ONE follow-up, never a resolve). Do the same for any AC whose remedy is an owner-gated infra action (remote db push / deploy / secret / DNS).\n` +
226
232
  `• Ownership (setup.md 3c): build the file-ownership map → /tmp; return ownershipMapPath. F-040: each card's MAY-EDIT = files_likely_touched ∪ every path NAMED EXPLICITLY in that card's acceptance_criteria/definition_of_done (an ADR the DoD says to update, the data-model / ER doc for a schema-change, etc.) — so editing a DoD-mandated doc is NOT a file-diff violation. Do NOT add another card's files this way.\n` +
227
- `• Persist per-card architecture baselines to /tmp/arch-baseline-<CARD>.md; return archBaselinePaths.\n\n` +
233
+ `• Do NOT write architecture baselines: the per-card codebase-architect specialist does that during the card pipeline (B7).\n\n` +
228
234
  `Return the structured PREFLIGHT object. ok:false ONLY if the workspace is unworkable.`,
229
235
  { label: 'preflight', phase: 'Pre-flight', agentType: 'general-purpose', schema: PREFLIGHT_SCHEMA }
230
236
  )
@@ -267,7 +273,6 @@ const sharedCtx = {
267
273
  worktreePath: preflight.worktreePath,
268
274
  branch: preflight.branch,
269
275
  ownershipMapPath: preflight.ownershipMapPath,
270
- archBaselinePaths: preflight.archBaselinePaths || [],
271
276
  // v4.18.0 — per-card Codex-light finder needs the resolved companion path in runCard scope.
272
277
  codexResolved: !!preflight.codexResolved,
273
278
  codexScriptPath: preflight.codexScriptPath || '',
@@ -390,6 +395,11 @@ async function runCard(cardId, cardPath) {
390
395
  const deferredClasses = new Set(deferredOpen ? ['policy-deferred-ac'] : [])
391
396
  function g(name, decision, detail) { gates.push({ gate: name, decision, detail: detail || '' }); ledger(cardId, name, decision, detail) }
392
397
 
398
+ // B7 — epic guard in JS (pre-flight reads the YAML): an epic is a tracker, not implementable
399
+ // work — skip it BEFORE any spawn (the old path burned a full owner-agent spawn to learn it).
400
+ // The impl agent's own epic flag stays as backstop for a pre-flight false negative.
401
+ if (node.isEpic) { g('router', 'EPIC-SKIPPED', 'epic card (pre-flight, zero spawns)'); return { card: cardId, status: 'epic-skipped', gates, commit: '-' } }
402
+
393
403
  // F-026/B1 — skip-completed from the PRE-FLIGHT's git-authoritative probe (cardGraph[].
394
404
  // alreadyCommitted), not a per-card agent spawn: on a fresh run the old Haiku probe was N
395
405
  // guaranteed-false spawns, and on resume the journal cache already covers it. Keyed on the
@@ -399,14 +409,70 @@ async function runCard(cardId, cardPath) {
399
409
  return { card: cardId, status: 'committed', commit: node.alreadyCommittedSha || '-', filesChanged: [], scopeFiles: [], archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates }
400
410
  }
401
411
 
402
- const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\nArch baseline (write to /tmp/arch-baseline-${cardId}.md): reuse if present.\nNOTE: ACs already pre-classified as policy-deferred MUST NOT be implemented or routed — they are tracked as follow-ups.`
412
+ const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\nNOTE: ACs already pre-classified as policy-deferred MUST NOT be implemented or routed — they are tracked as follow-ups.`
403
413
 
404
- // --- Phase 1+2: dispatch the card's OWNER_AGENT (F-024), not general-purpose. ---
414
+ // --- Phase 1 (B7, v4.26.0) SPECIALIST decomposition: each Phase-1 duty runs as its own
415
+ // agent ("ognuno fa una cosa", i.e. each one does one thing), with handoff via /tmp files so the owner's context stays
416
+ // clean of exploration noise. The old single mega-prompt ("you ARE claim+architect+
417
+ // plan-auditor+owner") made the owner absorb the whole exploration into its context.
418
+ const baselinePath = `/tmp/arch-baseline-${cardId}.md`
419
+ const phase1 = { architect: 'inline-fallback', audit: 'skipped-no-drift' }
420
+ let arch = null
421
+ try {
422
+ arch = await agentSafe(
423
+ `You are the Phase-1 context retriever for card ${cardId} (per ${REF}/implement.md Phase 1 step 3 / 5b). cd into the worktree ${sharedCtx.worktreePath}.\n\n${cardBrief}\n\n` +
424
+ `Explore the codebase exactly as your system prompt mandates for this card's scope (requirements + files_likely_touched: ${JSON.stringify(node.filesLikelyTouched || [])}). Write your COMPLETE untruncated findings (file paths, type signatures, patterns, high-risk paths) to ${baselinePath} — refresh it if a stale copy exists. The owner agent and the per-card reviewers will Read that file; keep your structured return MINIMAL.\n\n` +
425
+ `Also report missingPaths: every path in files_likely_touched that does NOT exist in the worktree (factual ls/test check — no judgement). Return: { ok, missingPaths:[...], note }`,
426
+ { label: `architect:${cardId}`, phase: 'Implement', agentType: 'codebase-architect',
427
+ schema: { type: 'object', required: ['ok'], additionalProperties: true, properties: { ok: { type: 'boolean' }, missingPaths: { type: 'array', items: { type: 'string' } }, note: { type: 'string' } } } }
428
+ )
429
+ if (arch && arch.ok) { phase1.architect = 'done'; g('phase1-architect', 'DONE', `baseline → ${baselinePath}${(arch.missingPaths || []).length ? ' · missing: ' + arch.missingPaths.join(', ') : ''}`) }
430
+ else g('phase1-architect', 'FALLBACK-INLINE', (arch && arch.note) || 'architect returned not-ok — owner explores inline')
431
+ } catch (e) {
432
+ if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates } }
433
+ g('phase1-architect', 'FALLBACK-INLINE', `architect crashed (${String(e && e.message)}) — owner explores inline`)
434
+ }
435
+ const architectOk = phase1.architect === 'done'
436
+
437
+ // B7 — plan-auditor as a DETERMINISTIC DRIFT GATE, not an unconditional duplicate of the
438
+ // validation /prd already ran at card-creation time. It runs ONLY when execution-time drift
439
+ // is plausible: (a) a prior in-batch commit touched this card's declared surface (the
440
+ // strongest signal — the codebase moved under the card), (b) the architect found declared
441
+ // paths missing (stale card), or (c) review_profile 'deep' (Rule C escalation). Fresh
442
+ // single-card batches with no drift evidence skip it — /prd's validation stands.
443
+ const flt = (node.filesLikelyTouched || []).map(String)
444
+ const priorTouched = perCardResults.filter((r) => r.status === 'committed').flatMap((r) => r.filesChanged || [])
445
+ const inBatchOverlap = flt.length && priorTouched.some((f) => flt.some((c) => String(f).includes(c) || c.includes(String(f))))
446
+ const needAudit = reviewProfile === 'deep' || inBatchOverlap || (architectOk && (arch.missingPaths || []).length > 0)
447
+ let auditCorrections = []
448
+ if (needAudit && architectOk) {
449
+ try {
450
+ const audit = await agentSafe(
451
+ `Audit card ${cardId} for EXECUTION-TIME DRIFT per ${REF}/implement.md Phase 1 step 4 (you are plan-auditor; the card was already validated by /prd at creation time — your job is what changed SINCE). cd into ${sharedCtx.worktreePath}.\n\n${cardBrief}\nArchitecture baseline (Read it): ${baselinePath}\nDrift signals: ${inBatchOverlap ? 'prior in-batch commits touched this card surface; ' : ''}${(arch.missingPaths || []).length ? 'missing declared paths: ' + arch.missingPaths.join(', ') : ''}\n\n` +
452
+ `Check ONLY: (1) paths in files_likely_touched still exist; (2) type/field references in the requirements still correct per the baseline; (3) [ASSUMED] items now answerable from the code. Return PASS or the exact corrections — corrections amend the IMPLEMENTATION BRIEFING, never the backlog YAML.\n\nReturn: { verdict, corrections:[strings] }`,
453
+ { label: `plan-audit:${cardId}`, phase: 'Implement', agentType: 'plan-auditor',
454
+ schema: { type: 'object', required: ['verdict'], additionalProperties: true, properties: { verdict: { enum: ['PASS', 'FIXES_NEEDED'] }, corrections: { type: 'array', items: { type: 'string' } } } } }
455
+ )
456
+ auditCorrections = (audit && audit.corrections) || []
457
+ phase1.audit = audit && audit.verdict === 'FIXES_NEEDED' ? 'fixes' : 'pass'
458
+ g('phase1-audit', phase1.audit === 'fixes' ? 'FIXES-APPLIED-TO-BRIEF' : 'PASS', `drift gate (${reviewProfile === 'deep' ? 'deep' : inBatchOverlap ? 'in-batch overlap' : 'missing paths'})${auditCorrections.length ? ' · ' + auditCorrections.length + ' corrections' : ''}`)
459
+ } catch (e) {
460
+ if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates } }
461
+ phase1.audit = 'skipped-error'
462
+ g('phase1-audit', 'SKIPPED', `auditor crashed (${String(e && e.message)}) — proceeding without audit (non-blocking)`)
463
+ }
464
+ } else if (needAudit && !architectOk) { phase1.audit = 'skipped-no-baseline'; g('phase1-audit', 'SKIPPED', 'drift signaled but no baseline — owner re-derives context inline') }
465
+ else g('phase1-audit', 'SKIPPED', 'no drift evidence — /prd creation-time validation stands')
466
+
467
+ // --- Phase 2: dispatch the card's OWNER_AGENT (F-024), not general-purpose. ---
468
+ const phase1Brief = architectOk
469
+ ? `Phase 1 already ran as specialist agents. READ the architecture baseline at ${baselinePath} BEFORE writing any code — it is your codebase context; do NOT redo the exploration.${auditCorrections.length ? `\nPlan-audit corrections (apply them as amendments to this briefing — do NOT modify the backlog YAML):\n${auditCorrections.map((c, i) => ` ${i + 1}. ${c}`).join('\n')}` : ''}`
470
+ : `Phase 1 fallback: the architect specialist was unavailable — do the Phase 1 claim+architect exploration yourself per ${REF}/implement.md and persist the baseline to ${baselinePath} before coding.`
405
471
  let impl
406
472
  try {
407
473
  impl = await agentSafe(
408
- `Implement card ${cardId} per ${REF}/implement.md (Phase 1 claim+architect+plan-auditor, Phase 2 you ARE the owner_agent '${ownerAgent}') and ${REF}/completeness.md (Phase 2.5 + 2.5b AC-closure ledger). Run all gates/bash yourself.\n\n${cardBrief}\n\n` +
409
- `POLICIES: G26 Phase-2 lint/tsc/test/build failing after the module's retry cap → buildBlocked:true + blockedGate. Build the AC Closure Ledger (one row per AC: implemented|unmet|policy-deferred). DO NOT silently defer; report unmet rows (excluding policy-deferred). Persist arch baseline to /tmp/arch-baseline-${cardId}.md and the diff to /tmp/diff-${cardId}.txt.\n\n` +
474
+ `Implement card ${cardId} per ${REF}/implement.md (Phase 2: you ARE the owner_agent '${ownerAgent}') and ${REF}/completeness.md (Phase 2.5 + 2.5b AC-closure ledger). Run all gates/bash yourself.\n\n${phase1Brief}\n\n${cardBrief}\n\n` +
475
+ `POLICIES: G26 Phase-2 lint/tsc/test/build failing after the module's retry cap → buildBlocked:true + blockedGate. Build the AC Closure Ledger (one row per AC: implemented|unmet|policy-deferred). DO NOT silently defer; report unmet rows (excluding policy-deferred). Persist the diff to /tmp/diff-${cardId}.txt.\n\n` +
410
476
  `E4 OWNERSHIP RECONCILE (implement.md §11b — do this BEFORE returning): the card's MAY-EDIT includes files_likely_touched ∪ paths NAMED EXPLICITLY in this card's acceptance_criteria/definition_of_done (e.g. an ADR the DoD says to update, the data-model / ER doc for a schema change). Editing THOSE is in-scope. For any OTHER dirty file outside MAY-EDIT (another card's file, or unrelated): \`git checkout -- <file>\` to revert it (NEVER leave it orphaned), list it in revertedOutOfOwnership. Set fileDiffViolation:true ONLY if such an edit genuinely could not be reverted (then say why in note) — it is no longer a silent label.\n\n` +
411
477
  `Return: { epic, buildBlocked, blockedGate, unmetACs:[{n,text}], scopeFiles, mayEditPaths, revertedOutOfOwnership:[paths], fileDiffViolation, note }`,
412
478
  { label: `implement:${cardId}`, phase: 'Implement', agentType: ownerAgent,
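The architect step in this hunk has three possible outcomes: success, a soft failure that degrades to inline exploration, and a transient outage that re-queues the card. A compact way to see that decision table is the sketch below. The function name, the `outcome` wrapper shape, and the `next` values are hypothetical; only `ok`, `transientExhausted`, `done`, and `inline-fallback` come from the diff.

```javascript
// Hypothetical decision table for the Phase-1 architect spawn.
// `outcome` is { value } when the spawn resolved, or { error } when it threw.
function classifyArchitectOutcome(outcome) {
  if (outcome.error) {
    // Transient outage: re-queue the card instead of proceeding degraded.
    if (outcome.error.transientExhausted) return { next: 'requeue', architect: 'inline-fallback' }
    // Any other crash: never block the card, the owner explores inline.
    return { next: 'continue', architect: 'inline-fallback' }
  }
  const ok = Boolean(outcome.value && outcome.value.ok)
  return { next: 'continue', architect: ok ? 'done' : 'inline-fallback' }
}

console.log(classifyArchitectOutcome({ value: { ok: true, missingPaths: [] } }).architect) // done
console.log(classifyArchitectOutcome({ error: new Error('boom') }).architect)             // inline-fallback
```

Only the transient case stops the card; every other failure continues with the owner doing its own exploration.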
@@ -460,12 +526,40 @@ async function runCard(cardId, cardPath) {
460
526
  // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
461
527
  // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
462
528
  // block routes through resolve(), whose mandatory adversarial judge (new2-resolve F-015, code domain
463
- // → code-reviewer) cross-checks the Codex finding before a fix/followup. `balanced`/`deep` unchanged.
529
+ // → code-reviewer) cross-checks the Codex finding before a fix/followup.
464
530
  const codexAvail = !!sharedCtx.codexResolved && !!sharedCtx.codexScriptPath
465
- const reviewers = reviewProfile === 'skip' ? []
466
- : reviewProfile === 'light' ? (codexAvail ? ['codex'] : ['code-reviewer'])
467
- : ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(
468
- securityRelevant ? ['security-reviewer'] : [])
531
+ // B6 (v4.25.0): deterministic per-card review matrix. `balanced` no longer means "every
532
+ // specialist on every card": each one runs IFF its domain is evidenced by the card's actual
533
+ // surface (scopeFiles ∪ MAY-EDIT), computed deterministically HERE in JS and audited via a
534
+ // `review-matrix` ledger row. `deep` keeps the unconditional full fan-out (Rule C assigns it
535
+ // to high-risk cards — respect the escalation). Coverage holds at batch level: the final
536
+ // review's doc pass stays the missing-doc-update safety net (its singleCard/slim skip is now
537
+ // gated on doc-reviewer having ACTUALLY run per-card — see Phase Final), and its api-perf
538
+ // pass keys on hasApiDataFiles with a regex this matrix supersets — so gating OFF a per-card
539
+ // specialist never leaves its domain unreviewed.
540
+ const surface = dedupe((scopeFiles || []).concat(mayEdit || []))
541
+ const docDirs = [paths.docs_dir, paths.references_dir, paths.wiki_dir, paths.prd_dir, paths.design_system].filter(Boolean)
542
+ const isDocFile = (f) => /\.(md|mdx)$/i.test(String(f)) || docDirs.some((d) => String(f).includes(String(d)))
543
+ const touchesDocs = surface.some(isDocFile)
544
+ const touchesCode = surface.some((f) => !isDocFile(f))
545
+ // superset of new-final-review's hasApiDataFiles regex — per-card coverage may exceed the
546
+ // final pass, never lag it.
547
+ const touchesApiData = surface.some((f) => /api\/|data-model|\.sql$|migrations?\/|server|route|edge|middleware|cron|queue|worker|prisma|drizzle|supabase|schema/i.test(String(f)))
548
+ const noEvidence = surface.length === 0 // fail-safe: absence of evidence ≠ evidence of absence
549
+ const FULL_FANOUT = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor'].concat(securityRelevant ? ['security-reviewer'] : [])
550
+ let reviewers
551
+ if (reviewProfile === 'skip') reviewers = []
552
+ else if (reviewProfile === 'light') reviewers = codexAvail ? ['codex'] : ['code-reviewer']
553
+ else if (reviewProfile === 'deep' || noEvidence) reviewers = FULL_FANOUT
554
+ else { // balanced — relevance-gated
555
+ reviewers = []
556
+ if (touchesCode) reviewers.push('code-reviewer')
557
+ if (touchesDocs) reviewers.push('doc-reviewer') // doc-ONLY card: doc-reviewer IS the core finder
558
+ if (touchesCode) reviewers.push('qa-sentinel') // skip for doc-only cards — no behavior to QA
559
+ if (touchesApiData) reviewers.push('api-perf-cost-auditor')
560
+ if (securityRelevant) reviewers.push('security-reviewer')
561
+ }
562
+ if (reviewProfile !== 'skip') g('review-matrix', 'PLANNED', `[${reviewProfile}${noEvidence && reviewProfile === 'balanced' ? '→conservative-full (no surface evidence)' : ''}] ${reviewers.join('+') || '(none)'} · docs:${touchesDocs} code:${touchesCode} api/data:${touchesApiData} sec:${securityRelevant}`)
469
563
  const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
470
564
  properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
471
565
  let reviewResults = []
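For readers who want to try the matrix in isolation, the selection logic of this hunk can be restated as a pure function over precomputed flags. This is a sketch, not the workflow's code: the function name `selectReviewers` and the single-object parameter are assumptions, while the profile names and reviewer names are taken from the diff.

```javascript
// Hypothetical pure restatement of the review matrix shown in this hunk.
function selectReviewers({ reviewProfile, codexAvail, securityRelevant,
                           touchesCode, touchesDocs, touchesApiData, noEvidence }) {
  const fullFanout = ['code-reviewer', 'doc-reviewer', 'qa-sentinel', 'api-perf-cost-auditor']
    .concat(securityRelevant ? ['security-reviewer'] : [])
  if (reviewProfile === 'skip') return []
  if (reviewProfile === 'light') return codexAvail ? ['codex'] : ['code-reviewer']
  // deep keeps the full fan-out; an empty surface is treated conservatively.
  if (reviewProfile === 'deep' || noEvidence) return fullFanout
  // balanced: each specialist runs only if its domain is evidenced.
  const reviewers = []
  if (touchesCode) reviewers.push('code-reviewer')
  if (touchesDocs) reviewers.push('doc-reviewer')
  if (touchesCode) reviewers.push('qa-sentinel')
  if (touchesApiData) reviewers.push('api-perf-cost-auditor')
  if (securityRelevant) reviewers.push('security-reviewer')
  return reviewers
}

// A doc-only card at `balanced` gets a single reviewer instead of four.
const docOnlyReviewers = selectReviewers({
  reviewProfile: 'balanced', touchesDocs: true, touchesCode: false, touchesApiData: false, noEvidence: false,
})
console.log(docOnlyReviewers) // [ 'doc-reviewer' ]
```

The same card at `deep`, or with an empty surface, would still receive the full fan-out.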
@@ -535,7 +629,7 @@ async function runCard(cardId, cardPath) {
535
629
  for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
536
630
  }
537
631
 
538
- if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
632
+ if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, reviewersRun: reviewers, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
539
633
 
540
634
  // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
541
635
  // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
@@ -578,6 +672,11 @@ async function runCard(cardId, cardPath) {
578
672
  // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
579
673
  deferred: deferredOpen,
580
674
  deferredClasses: Array.from(deferredClasses),
675
+ // B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
676
+ // decision on this, and per_card telemetry records it for the A/B spawn accounting.
677
+ reviewersRun: reviewers,
678
+ // B7 — Phase-1 specialist record (architect: done|inline-fallback; audit: pass|fixes|skipped-*).
679
+ phase1,
581
680
  scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
582
681
  }
583
682
  }
@@ -675,9 +774,15 @@ if (committed.length && !degraded) {
675
774
  firstCardId: firstCard, worktreePath: sharedCtx.worktreePath, baseBranch: TRUNK,
676
775
  cardPaths: committed.map((r) => pathById[r.card]).filter(Boolean),
677
776
  reviewScopeFiles, archBaselinePaths: allArch, hasApiDataFiles, config: cfg,
678
- // F-041 — single-card batch: doc-reviewer + api-perf already ran per-card and there is
679
- // NO cross-card conflict to find. Keep only the cross-model Codex pass + qa gates.
680
- singleCard: committed.length === 1,
777
+ // F-041 + B6: slim the final pass (skip its doc-reviewer) ONLY when doc-reviewer
778
+ // ACTUALLY ran per-card for this single card. The old unconditional `length === 1` was
779
+ // built on the premise "doc + api-perf already ran per-card" — false under the gated
780
+ // review matrix (and it was ALREADY false under `light`, a latent F-041 coverage hole
781
+ // this closes). When per-card doc review was gated off, the final doc pass is the
782
+ // missing-doc-update safety net and must run even for a single card. api-perf at the
783
+ // final pass is independently gated by hasApiDataFiles (regex the per-card matrix
784
+ // supersets), so its coverage is consistent either way.
785
+ singleCard: committed.length === 1 && ((committed[0].reviewersRun || []).includes('doc-reviewer')),
681
786
  })
682
787
  } catch (e) { if (e && isTransient(e)) noteDegraded('outage'); final = null }
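The coverage gate on `singleCard` is easiest to see with concrete batches. The sketch below isolates the expression from this hunk into a hypothetical helper, `shouldSlimFinalPass`; the card ids are invented for the example.

```javascript
// Hypothetical helper isolating the `singleCard` expression from this hunk.
function shouldSlimFinalPass(committed) {
  return committed.length === 1
    && (committed[0].reviewersRun || []).includes('doc-reviewer')
}

// Single card, doc-reviewer ran per-card: the final doc pass may be skipped.
console.log(shouldSlimFinalPass([{ card: 'CARD-01', reviewersRun: ['code-reviewer', 'doc-reviewer'] }])) // true
// Single `light` card reviewed by Codex only: the final doc pass must still run.
console.log(shouldSlimFinalPass([{ card: 'CARD-01', reviewersRun: ['codex'] }]))                         // false
// Multi-card batch: never slimmed, cross-card conflicts are possible.
console.log(shouldSlimFinalPass([{ card: 'CARD-01', reviewersRun: ['doc-reviewer'] }, { card: 'CARD-02', reviewersRun: ['doc-reviewer'] }])) // false
```

The second case is the latent hole the comment describes: under the old `length === 1` check it would have returned true.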
683
788
 
@@ -840,7 +945,7 @@ function buildTelemetry() {
840
945
  // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
841
946
  total_tokens: totalTokens,
842
947
  agent_count: agentCount,
843
- per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
948
+ per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], phase1: r.phase1 || null, gates: (r.gates || []).length })),
844
949
  stats_requested: !!FLAGS.stats,
845
950
  }
846
951
  }
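The new `reviewers` and `phase1` fields make it possible to count spawns after a run. As one possible consumer, and purely as an assumption about how such telemetry might be read, the sketch below totals reviewer spawns and Phase-1 outcomes from a `per_card` array. The helper name, the sample rows, and the card ids are hypothetical.

```javascript
// Hypothetical consumer of the per_card telemetry rows emitted by buildTelemetry().
function summarizePerCard(perCard) {
  const summary = { reviewerSpawns: 0, auditsRun: 0, architectFallbacks: 0 }
  for (const row of perCard) {
    summary.reviewerSpawns += (row.reviewers || []).length
    const phase1 = row.phase1 || {}
    // Only 'pass' and 'fixes' mean the plan-auditor actually spawned.
    if (phase1.audit === 'pass' || phase1.audit === 'fixes') summary.auditsRun += 1
    if (phase1.architect === 'inline-fallback') summary.architectFallbacks += 1
  }
  return summary
}

// Invented sample rows using the field values listed in the changelog.
const sample = [
  { card: 'CARD-01', reviewers: ['code-reviewer', 'qa-sentinel'], phase1: { architect: 'done', audit: 'skipped-no-drift' } },
  { card: 'CARD-02', reviewers: ['doc-reviewer'], phase1: { architect: 'done', audit: 'pass' } },
]
console.log(summarizePerCard(sample)) // { reviewerSpawns: 3, auditsRun: 1, architectFallbacks: 0 }
```

Rows where `phase1` is `null` (for example epic-skipped cards) contribute nothing to the Phase-1 counts.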
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "baldart",
3
- "version": "4.24.3",
3
+ "version": "4.26.0",
4
4
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
5
5
  "bin": {
6
6
  "baldart": "./bin/baldart.js"