baldart 4.29.0 → 4.29.1

Files changed (5)

package/CHANGELOG.md +9 -0
package/VERSION +1 -1
package/framework/.claude/skills/new2/SKILL.md +6 -2
package/framework/.claude/workflows/new2.js +9 -0
package/package.json +1 -1

package/CHANGELOG.md

@@ -5,6 +5,15 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.29.1] - 2026-06-11
+**`new2`: `deferral_breakdown` telemetry — see WHY residuals became follow-ups, per class.** Follow-up cards in `new2` are not "useless deferred work": an in-scope fixable anomaly is fixed directly by `resolve()`'s domain fixer in-batch; a follow-up is created ONLY for a residual that is structurally undeferrable (`out-of-ownership` would edit another card's files · `owner-gated`/`not-a-code-defect`/`baseline-not-reached` aren't code a coder can apply · `unresolved` already failed fixer+judge+tier-2 · `scope-expansion` with a new AC needs a PRD decision · `outage`). To diagnose a run with *many* follow-ups, the telemetry now counts residuals per class so a skewed breakdown points at the real root cause (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` → fixes genuinely hard · many `scope-expansion` → cards under-specified upstream) — data first, before any change to the deferral logic. **PATCH** (observability only on the EXPERIMENTAL `new2` surface; diagnosis-only — nothing is auto-implemented from a residual; no behavior change, no config key).
+### Added
+- **`framework/.claude/workflows/new2.js`** — `telemetry.deferral_breakdown` (per-class residual counts, derived from the `deferralClass`/`kind` already carried on each residual) + a "Ripartizione per classe" line in the human report's Residui section.
+- **`framework/.claude/skills/new2/SKILL.md`** — the A/B record step now keeps `deferral_breakdown` and names it the data to consult BEFORE proposing any deferral-logic change.
 ## [4.29.0] - 2026-06-11
 **Final review (F.4): each domain specialist owns its lane — no cross-domain `code-reviewer` re-judge, no self-judge.** The verification step classified findings by spawning `code-reviewer` over EVERY low-confidence finding. That was wrong on two counts: (a) a `doc` finding from `doc-reviewer` (or an `api`/`perf` finding from `api-perf-cost-auditor`) was re-validated by `code-reviewer` — the WRONG specialist judging another domain (a `code-reviewer` judging prose); (b) when Codex is unavailable the `code-reviewer` *fallback* produces the findings and `code-reviewer` then re-judged its OWN findings — self-judging with no model diversity (the same waste removed from the resolve pass in v4.27.1–2). Now every domain specialist FP-checks its OWN findings in the finding pass (doc-reviewer, api-perf-cost-auditor, and the Codex/`code-reviewer`-fallback code engine), so surviving findings arrive already validated; the residual `confidence < 80` path is routed to the finding's DOMAIN specialist (doc→doc-reviewer, api/perf→api-perf-cost-auditor, security/migration→security-reviewer, test→qa-sentinel, else code-reviewer), and when that specialist is the originating finder the finding is surfaced as `NEEDS_MANUAL_CONFIRMATION` rather than re-judged. Applies to BOTH `/new` (inline F.4 prose, the SSOT) and `new2` (the `new-final-review` workflow). **MINOR** (behavioral refinement of the final-review classification rule; the returned `{findings, classification, summary}` contract is unchanged; no `baldart.config.yml` key, so the schema-change propagation rule does not apply).

package/VERSION

@@ -1 +1 @@
-4.29.0
+4.29.1

package/framework/.claude/skills/new2/SKILL.md

@@ -226,5 +226,9 @@ returns when the batch is done. It returns:
    the A/B comparison stays honest. Also record `migration_gate: <migration.status>`
    (`none`|`applied`|`skipped`|`degraded`) — the Step-3.5 gate is a pre-launch interaction, NOT a
    mid-batch question, so it does not break the zero-ask-during-batch invariant; logging it keeps the
-   A/B honest about when a migration was front-loaded. Do NOT re-summarise the cards — the workflow
-   already did.
+   A/B honest about when a migration was front-loaded. Keep `deferral_breakdown` (per-class counts
+   of WHY residuals became follow-ups instead of in-batch fixes) in the record — a class dominating
+   it is a root-cause signal (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` →
+   fixes genuinely hard · many `scope-expansion` → cards under-specified upstream), and it is the
+   data to consult BEFORE proposing any change to the deferral logic. Do NOT re-summarise the cards —
+   the workflow already did.
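
The SKILL.md text above treats a class that dominates `deferral_breakdown` as a root-cause signal. The following is a minimal sketch of how a reader of the A/B record might check for that. It is not part of the package: `dominantClass`, the `HINTS` table, and the 0.5 share threshold are hypothetical, since the source does not define how large a share counts as "dominating".

```javascript
// Hypothetical hint table; the three entries mirror the signals named in SKILL.md.
const HINTS = {
  'out-of-ownership': 'PRD MAY-EDIT too narrow',
  'unresolved': 'fixes genuinely hard',
  'scope-expansion': 'cards under-specified upstream',
};

// Returns the class holding at least `threshold` of all residuals, or null.
// The 0.5 default is an assumption made for this sketch only.
function dominantClass(breakdown, threshold = 0.5) {
  const entries = Object.entries(breakdown);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return null; // no residuals, nothing to diagnose

  // Pick the entry with the highest count (the first one wins on ties).
  const [cls, count] = entries.reduce((best, e) => (e[1] > best[1] ? e : best));
  const share = count / total;
  if (share < threshold) return null; // no single class dominates

  return { cls, share, hint: HINTS[cls] || null };
}

// Example record shape: class name mapped to residual count.
const result = dominantClass({ 'out-of-ownership': 6, unresolved: 1, outage: 1 });
console.log(result);
```

A `null` result means the breakdown is spread out and gives no single root-cause signal. In both cases the output is only diagnostic, in line with the changelog's note that nothing is auto-implemented from a residual.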

package/framework/.claude/workflows/new2.js

@@ -987,6 +987,13 @@ function buildTelemetry() {
     // satisfied up-front instead of deferred owner-gated.
     migration_gate: (migration && migration.status) || 'none',
     residuals_total: residuals.length,
+    // Why each residual became a follow-up instead of an in-batch fix, counted per class
+    // (out-of-ownership | owner-gated | not-a-code-defect | baseline-not-reached | unresolved |
+    // outage | scope-expansion | policy-deferred-ac | out-of-scope | file-diff-violation). A
+    // skewed breakdown is a root-cause signal: many `out-of-ownership` → MAY-EDIT too narrow (PRD
+    // ownership), many `unresolved` → fixes genuinely hard, many `scope-expansion` → cards
+    // under-specified upstream. This is diagnosis-only; nothing is auto-implemented from a residual.
+    deferral_breakdown: residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {}),
     // followups_on_disk is filled by the SKILL after it materialises pending residuals.
     followups_materialized_in_workflow: residuals.filter((x) => x.materialized).length,
     resolve_invocations: resolvedSignatures.size,
@@ -1029,6 +1036,8 @@ function buildReport(o) {
   }
   if (residuals.length) {
     L.push(``, `## ⚠️ Residui (il skill materializza le follow-up mancanti — nulla perso)`)
+    const bd = residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {})
+    L.push(`Ripartizione per classe: ${Object.entries(bd).map(([k, n]) => `${k}=${n}`).join(' · ')} — una classe dominante è il segnale di causa (out-of-ownership → MAY-EDIT troppo strette · unresolved → fix difficili · scope-expansion → card sotto-specificate).`)
     for (const f of residuals) L.push(`- ${f.card} (${f.kind})${f.materialized ? ' ✓' : ' — DA MATERIALIZZARE'}: ${f.evidence}`)
   }
   const excluded = gateLedger.filter((x) => x.decision === 'EXCLUDED')
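
The hunks above compute the same per-class count twice, once for `telemetry.deferral_breakdown` and once for the report line "Ripartizione per classe" (Italian for "Breakdown by class"). The following standalone sketch reproduces that counting logic on sample data so the fallback order (`deferralClass`, then `kind`, then `'unknown'`) is easy to see. The helper name `countByClass` and the sample residuals are hypothetical; only the `deferralClass` and `kind` fields and the reduce expression come from the diff.

```javascript
// Hypothetical residuals; real ones carry more fields (card, evidence, materialized).
const residuals = [
  { card: 'CARD-1', deferralClass: 'out-of-ownership', kind: 'followup' },
  { card: 'CARD-2', deferralClass: 'out-of-ownership' },
  { card: 'CARD-3', kind: 'unresolved' }, // no deferralClass, so kind is used
  { card: 'CARD-4' },                     // neither field, so it counts as 'unknown'
];

// Same reduce as in the diff, factored into a helper for illustration only.
function countByClass(items) {
  return items.reduce((b, x) => {
    const k = x.deferralClass || x.kind || 'unknown';
    b[k] = (b[k] || 0) + 1;
    return b;
  }, {});
}

const breakdown = countByClass(residuals);

// Same formatting as the report line: class=count pairs joined by ' · '.
const line = Object.entries(breakdown).map(([k, n]) => `${k}=${n}`).join(' · ');
console.log(line); // prints "out-of-ownership=2 · unresolved=1 · unknown=1"
```

Sharing one helper between `buildTelemetry` and `buildReport` would keep the two counts in sync by construction. The released code inlines the expression in both places instead.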

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.29.0",
+  "version": "4.29.1",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"