baldart 4.29.0 → 4.29.1

package/CHANGELOG.md CHANGED
@@ -5,6 +5,15 @@ All notable changes to BALDART will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [4.29.1] - 2026-06-11
+
+ **`new2`: `deferral_breakdown` telemetry — see WHY residuals became follow-ups, per class.** Follow-up cards in `new2` are not "useless deferred work": an in-scope fixable anomaly is fixed directly by `resolve()`'s domain fixer in-batch; a follow-up is created ONLY for a residual that is structurally undeferrable (`out-of-ownership` would edit another card's files · `owner-gated`/`not-a-code-defect`/`baseline-not-reached` aren't code a coder can apply · `unresolved` already failed fixer+judge+tier-2 · `scope-expansion` with a new AC needs a PRD decision · `outage`). To diagnose a run with *many* follow-ups, the telemetry now counts residuals per class so a skewed breakdown points at the real root cause (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` → fixes genuinely hard · many `scope-expansion` → cards under-specified upstream) — data first, before any change to the deferral logic. **PATCH** (observability only on the EXPERIMENTAL `new2` surface; diagnosis-only — nothing is auto-implemented from a residual; no behavior change, no config key).
+
+ ### Added
+
+ - **`framework/.claude/workflows/new2.js`** — `telemetry.deferral_breakdown` (per-class residual counts, derived from the `deferralClass`/`kind` already carried on each residual) + a "Ripartizione per classe" line in the human report's Residui section.
+ - **`framework/.claude/skills/new2/SKILL.md`** — the A/B record step now keeps `deferral_breakdown` and names it the data to consult BEFORE proposing any deferral-logic change.
+
  ## [4.29.0] - 2026-06-11
 
  **Final review (F.4): each domain specialist owns its lane — no cross-domain `code-reviewer` re-judge, no self-judge.** The verification step classified findings by spawning `code-reviewer` over EVERY low-confidence finding. That was wrong on two counts: (a) a `doc` finding from `doc-reviewer` (or an `api`/`perf` finding from `api-perf-cost-auditor`) was re-validated by `code-reviewer` — the WRONG specialist judging another domain (a `code-reviewer` judging prose); (b) when Codex is unavailable the `code-reviewer` *fallback* produces the findings and `code-reviewer` then re-judged its OWN findings — self-judging with no model diversity (the same waste removed from the resolve pass in v4.27.1–2). Now every domain specialist FP-checks its OWN findings in the finding pass (doc-reviewer, api-perf-cost-auditor, and the Codex/`code-reviewer`-fallback code engine), so surviving findings arrive already validated; the residual `confidence < 80` path is routed to the finding's DOMAIN specialist (doc→doc-reviewer, api/perf→api-perf-cost-auditor, security/migration→security-reviewer, test→qa-sentinel, else code-reviewer), and when that specialist is the originating finder the finding is surfaced as `NEEDS_MANUAL_CONFIRMATION` rather than re-judged. Applies to BOTH `/new` (inline F.4 prose, the SSOT) and `new2` (the `new-final-review` workflow). **MINOR** (behavioral refinement of the final-review classification rule; the returned `{findings, classification, summary}` contract is unchanged; no `baldart.config.yml` key, so the schema-change propagation rule does not apply).
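The routing rule in the 4.29.0 entry can be read as a lookup plus one guard. The following is a minimal sketch under stated assumptions, not the shipped `new-final-review` code: the domain-to-specialist pairs and the `confidence < 80` threshold come from the changelog text, while the function name `routeLowConfidenceFinding`, the finding shape `{ domain, confidence, finder }`, and the `ALREADY_VALIDATED` and `RE_JUDGE` labels are hypothetical names chosen for the example.

```javascript
// Domain -> specialist pairs as listed in the 4.29.0 changelog entry.
// Anything not listed falls back to code-reviewer.
const SPECIALIST_BY_DOMAIN = new Map([
  ['doc', 'doc-reviewer'],
  ['api', 'api-perf-cost-auditor'],
  ['perf', 'api-perf-cost-auditor'],
  ['security', 'security-reviewer'],
  ['migration', 'security-reviewer'],
  ['test', 'qa-sentinel'],
])

// Hypothetical helper: the finding shape and the action labels
// (other than NEEDS_MANUAL_CONFIRMATION) are assumptions.
function routeLowConfidenceFinding(finding) {
  // Findings at or above the threshold already survived their finder's own FP check.
  if (finding.confidence >= 80) return { action: 'ALREADY_VALIDATED' }

  const specialist = SPECIALIST_BY_DOMAIN.get(finding.domain) || 'code-reviewer'

  // Never let the originating finder re-judge its own finding.
  if (specialist === finding.finder) return { action: 'NEEDS_MANUAL_CONFIRMATION' }

  return { action: 'RE_JUDGE', specialist }
}
```

Under these assumptions, a low-confidence `doc` finding that came from `code-reviewer` is routed to `doc-reviewer`, while a low-confidence code finding produced by the `code-reviewer` fallback is surfaced for manual confirmation instead of being judged by the agent that raised it.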
package/VERSION CHANGED
@@ -1 +1 @@
- 4.29.0
+ 4.29.1
package/framework/.claude/skills/new2/SKILL.md CHANGED
@@ -226,5 +226,9 @@ returns when the batch is done. It returns:
  the A/B comparison stays honest. Also record `migration_gate: <migration.status>`
  (`none`|`applied`|`skipped`|`degraded`) — the Step-3.5 gate is a pre-launch interaction, NOT a
  mid-batch question, so it does not break the zero-ask-during-batch invariant; logging it keeps the
- A/B honest about when a migration was front-loaded. Do NOT re-summarise the cards — the workflow
- already did.
+ A/B honest about when a migration was front-loaded. Keep `deferral_breakdown` (per-class counts
+ of WHY residuals became follow-ups instead of in-batch fixes) in the record — a class dominating
+ it is a root-cause signal (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` →
+ fixes genuinely hard · many `scope-expansion` → cards under-specified upstream), and it is the
+ data to consult BEFORE proposing any change to the deferral logic. Do NOT re-summarise the cards —
+ the workflow already did.
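Reading a recorded `deferral_breakdown` for a dominating class could look like the sketch below. It is an illustration only: the three hint strings are taken from the SKILL.md text above, but the function name `dominantDeferralClass` and the 50% share threshold are assumptions, since the source does not define what counts as "dominating".

```javascript
// Root-cause hints quoted from the SKILL.md text; classes without a hint map to null.
const ROOT_CAUSE_HINTS = new Map([
  ['out-of-ownership', 'PRD MAY-EDIT too narrow'],
  ['unresolved', 'fixes genuinely hard'],
  ['scope-expansion', 'cards under-specified upstream'],
])

// Hypothetical helper: returns the largest class if it holds at least
// `minShare` of all residuals, otherwise null. The 0.5 default is an assumption.
function dominantDeferralClass(breakdown, minShare = 0.5) {
  const entries = Object.entries(breakdown)
  const total = entries.reduce((sum, [, n]) => sum + n, 0)
  if (total === 0) return null

  // Pick the entry with the highest count (first one wins on ties).
  const [cls, count] = entries.reduce((best, e) => (e[1] > best[1] ? e : best))
  if (count / total < minShare) return null

  return { cls, share: count / total, hint: ROOT_CAUSE_HINTS.get(cls) || null }
}
```

With a record such as `{ 'out-of-ownership': 6, unresolved: 2, outage: 1 }`, this returns the `out-of-ownership` class with its "PRD MAY-EDIT too narrow" hint; an evenly spread breakdown returns `null`, which matches the intent of looking at the data before touching the deferral logic.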
package/framework/.claude/workflows/new2.js CHANGED
@@ -987,6 +987,13 @@ function buildTelemetry() {
  // satisfied up-front instead of deferred owner-gated.
  migration_gate: (migration && migration.status) || 'none',
  residuals_total: residuals.length,
+ // Why each residual became a follow-up instead of an in-batch fix, counted per class
+ // (out-of-ownership | owner-gated | not-a-code-defect | baseline-not-reached | unresolved |
+ // outage | scope-expansion | policy-deferred-ac | out-of-scope | file-diff-violation). A
+ // skewed breakdown is a root-cause signal: many `out-of-ownership` → MAY-EDIT too narrow (PRD
+ // ownership), many `unresolved` → fixes genuinely hard, many `scope-expansion` → cards
+ // under-specified upstream. This is diagnosis-only; nothing is auto-implemented from a residual.
+ deferral_breakdown: residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {}),
  // followups_on_disk is filled by the SKILL after it materialises pending residuals.
  followups_materialized_in_workflow: residuals.filter((x) => x.materialized).length,
  resolve_invocations: resolvedSignatures.size,
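Pulled out of the hunk, the counting step is a plain group-and-count. The sketch below mirrors the `deferralClass || kind || 'unknown'` fallback from the added line; the function name `countByDeferralClass` and the sample residuals are hypothetical and exist only to show which key each residual lands under.

```javascript
// Hypothetical standalone version of the inline reduce in buildTelemetry().
// Key precedence follows the diff: deferralClass, then kind, then 'unknown'.
function countByDeferralClass(residuals) {
  const breakdown = {}
  for (const r of residuals) {
    const key = r.deferralClass || r.kind || 'unknown'
    breakdown[key] = (breakdown[key] || 0) + 1
  }
  return breakdown
}

// Made-up residuals for illustration only.
const sampleResiduals = [
  { card: 'CARD-1', deferralClass: 'out-of-ownership', kind: 'followup' }, // deferralClass wins
  { card: 'CARD-2', deferralClass: 'out-of-ownership' },
  { card: 'CARD-3', kind: 'outage' },                                      // falls back to kind
  { card: 'CARD-4' },                                                      // neither field set
]

console.log(countByDeferralClass(sampleResiduals))
// { 'out-of-ownership': 2, outage: 1, unknown: 1 }
```

The same reduce is repeated in `buildReport`, so a shared helper of this shape is one way the two call sites could stay in sync; the diff itself keeps both inline.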
@@ -1029,6 +1036,8 @@ function buildReport(o) {
  }
  if (residuals.length) {
  L.push(``, `## ⚠️ Residui (il skill materializza le follow-up mancanti — nulla perso)`)
+ const bd = residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {})
+ L.push(`Ripartizione per classe: ${Object.entries(bd).map(([k, n]) => `${k}=${n}`).join(' · ')} — una classe dominante è il segnale di causa (out-of-ownership → MAY-EDIT troppo strette · unresolved → fix difficili · scope-expansion → card sotto-specificate).`)
  for (const f of residuals) L.push(`- ${f.card} (${f.kind})${f.materialized ? ' ✓' : ' — DA MATERIALIZZARE'}: ${f.evidence}`)
  }
  const excluded = gateLedger.filter((x) => x.decision === 'EXCLUDED')
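The "Ripartizione per classe" line (Italian for "Breakdown by class") joins `class=count` pairs with a middle dot. A small sketch of that formatting step follows. The `map` and `join(' · ')` calls match the diff; the function name `formatBreakdownLine` and the descending sort are additions made for this example and are not in the shipped code, which keeps insertion order.

```javascript
// Hypothetical formatter for the per-class breakdown shown in the human report.
// Sorting by count (largest first) is an assumption added here so the
// dominant class is read first; the diff does not sort.
function formatBreakdownLine(breakdown) {
  return Object.entries(breakdown)
    .sort(([, a], [, b]) => b - a)
    .map(([k, n]) => `${k}=${n}`)
    .join(' · ')
}

console.log(formatBreakdownLine({ outage: 1, 'out-of-ownership': 4, unresolved: 2 }))
// out-of-ownership=4 · unresolved=2 · outage=1
```

Dropping the `.sort(...)` call reproduces the ordering the diff would give, which follows the order in which classes first appear among the residuals.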
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "baldart",
- "version": "4.29.0",
+ "version": "4.29.1",
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
  "bin": {
  "baldart": "./bin/baldart.js"