baldart 4.29.0 → 4.29.1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,15 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.29.1] - 2026-06-11
+
+**`new2`: `deferral_breakdown` telemetry — see WHY residuals became follow-ups, per class.** Follow-up cards in `new2` are not "useless deferred work": an in-scope fixable anomaly is fixed directly by `resolve()`'s domain fixer in-batch; a follow-up is created ONLY for a residual that is structurally undeferrable (`out-of-ownership` would edit another card's files · `owner-gated`/`not-a-code-defect`/`baseline-not-reached` aren't code a coder can apply · `unresolved` already failed fixer+judge+tier-2 · `scope-expansion` with a new AC needs a PRD decision · `outage`). To diagnose a run with *many* follow-ups, the telemetry now counts residuals per class so a skewed breakdown points at the real root cause (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` → fixes genuinely hard · many `scope-expansion` → cards under-specified upstream) — data first, before any change to the deferral logic. **PATCH** (observability only on the EXPERIMENTAL `new2` surface; diagnosis-only — nothing is auto-implemented from a residual; no behavior change, no config key).
+
+### Added
+
+- **`framework/.claude/workflows/new2.js`** — `telemetry.deferral_breakdown` (per-class residual counts, derived from the `deferralClass`/`kind` already carried on each residual) + a "Ripartizione per classe" line in the human report's Residui section.
+- **`framework/.claude/skills/new2/SKILL.md`** — the A/B record step now keeps `deferral_breakdown` and names it the data to consult BEFORE proposing any deferral-logic change.
+
 ## [4.29.0] - 2026-06-11
 
 **Final review (F.4): each domain specialist owns its lane — no cross-domain `code-reviewer` re-judge, no self-judge.** The verification step classified findings by spawning `code-reviewer` over EVERY low-confidence finding. That was wrong on two counts: (a) a `doc` finding from `doc-reviewer` (or an `api`/`perf` finding from `api-perf-cost-auditor`) was re-validated by `code-reviewer` — the WRONG specialist judging another domain (a `code-reviewer` judging prose); (b) when Codex is unavailable the `code-reviewer` *fallback* produces the findings and `code-reviewer` then re-judged its OWN findings — self-judging with no model diversity (the same waste removed from the resolve pass in v4.27.1–2). Now every domain specialist FP-checks its OWN findings in the finding pass (doc-reviewer, api-perf-cost-auditor, and the Codex/`code-reviewer`-fallback code engine), so surviving findings arrive already validated; the residual `confidence < 80` path is routed to the finding's DOMAIN specialist (doc→doc-reviewer, api/perf→api-perf-cost-auditor, security/migration→security-reviewer, test→qa-sentinel, else code-reviewer), and when that specialist is the originating finder the finding is surfaced as `NEEDS_MANUAL_CONFIRMATION` rather than re-judged. Applies to BOTH `/new` (inline F.4 prose, the SSOT) and `new2` (the `new-final-review` workflow). **MINOR** (behavioral refinement of the final-review classification rule; the returned `{findings, classification, summary}` contract is unchanged; no `baldart.config.yml` key, so the schema-change propagation rule does not apply).
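The routing rule described in the 4.29.0 entry can be read as a small decision function. The following is a minimal sketch under stated assumptions, not the package's code: the function name `routeLowConfidenceFinding`, the finding shape `{ domain, confidence, finder }`, and the `keep`/`recheck` action labels are hypothetical. Only the domain-to-specialist mapping, the `confidence < 80` threshold, and the `NEEDS_MANUAL_CONFIRMATION` label come from the changelog text.

```javascript
// Hypothetical sketch of the F.4 routing rule; the real workflow is not shown in this diff.
// Domain -> specialist pairs are taken from the 4.29.0 changelog entry.
const SPECIALIST_BY_DOMAIN = new Map([
  ['doc', 'doc-reviewer'],
  ['api', 'api-perf-cost-auditor'],
  ['perf', 'api-perf-cost-auditor'],
  ['security', 'security-reviewer'],
  ['migration', 'security-reviewer'],
  ['test', 'qa-sentinel'],
])

// `finding` is assumed to look like { domain, confidence, finder }.
function routeLowConfidenceFinding(finding) {
  // At or above the threshold, the finding was already FP-checked by its own specialist.
  if (finding.confidence >= 80) return { action: 'keep' }
  // Any domain without a dedicated specialist falls back to code-reviewer.
  const specialist = SPECIALIST_BY_DOMAIN.get(finding.domain) ?? 'code-reviewer'
  // The originating finder never re-judges its own finding.
  if (specialist === finding.finder) return { action: 'NEEDS_MANUAL_CONFIRMATION' }
  return { action: 'recheck', specialist }
}
```

Under these assumptions, a low-confidence `doc` finding produced by `doc-reviewer` is surfaced for manual confirmation, while the same finding produced by another agent would be rechecked by `doc-reviewer`.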
package/VERSION
CHANGED
@@ -1 +1 @@
-4.29.0
+4.29.1
@@ -226,5 +226,9 @@ returns when the batch is done. It returns:
 the A/B comparison stays honest. Also record `migration_gate: <migration.status>`
 (`none`|`applied`|`skipped`|`degraded`) — the Step-3.5 gate is a pre-launch interaction, NOT a
 mid-batch question, so it does not break the zero-ask-during-batch invariant; logging it keeps the
-A/B honest about when a migration was front-loaded.
-
+A/B honest about when a migration was front-loaded. Keep `deferral_breakdown` (per-class counts
+of WHY residuals became follow-ups instead of in-batch fixes) in the record — a class dominating
+it is a root-cause signal (many `out-of-ownership` → PRD MAY-EDIT too narrow · many `unresolved` →
+fixes genuinely hard · many `scope-expansion` → cards under-specified upstream), and it is the
+data to consult BEFORE proposing any change to the deferral logic. Do NOT re-summarise the cards —
+the workflow already did.
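The "class dominating it is a root-cause signal" reading above can be sketched as a lookup over a recorded breakdown. This is an illustration only: the helper name `dominantDeferralClass` and the `minShare` cut-off are assumptions, since the source does not define when a class counts as dominating. The three hints are copied from the text above.

```javascript
// Hints quoted from the SKILL.md text; classes without a documented hint map to null.
const ROOT_CAUSE_HINTS = new Map([
  ['out-of-ownership', 'PRD MAY-EDIT too narrow'],
  ['unresolved', 'fixes genuinely hard'],
  ['scope-expansion', 'cards under-specified upstream'],
])

// `minShare` is an assumed threshold for "dominating"; adjust or drop it as needed.
function dominantDeferralClass(breakdown, minShare = 0.5) {
  const entries = Object.entries(breakdown)
  const total = entries.reduce((sum, [, n]) => sum + n, 0)
  if (total === 0) return null
  // Pick the class with the highest count; ties keep the first one seen.
  const [cls, count] = entries.reduce((best, e) => (e[1] > best[1] ? e : best))
  if (count / total < minShare) return null
  return { class: cls, share: count / total, hint: ROOT_CAUSE_HINTS.get(cls) ?? null }
}
```

For a recorded breakdown such as `{ 'out-of-ownership': 6, unresolved: 2 }`, this sketch returns the `out-of-ownership` class with a share of 0.75 and the "PRD MAY-EDIT too narrow" hint. An evenly spread breakdown returns `null`, meaning no single class stands out.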
@@ -987,6 +987,13 @@ function buildTelemetry() {
 // satisfied up-front instead of deferred owner-gated.
 migration_gate: (migration && migration.status) || 'none',
 residuals_total: residuals.length,
+// Why each residual became a follow-up instead of an in-batch fix, counted per class
+// (out-of-ownership | owner-gated | not-a-code-defect | baseline-not-reached | unresolved |
+// outage | scope-expansion | policy-deferred-ac | out-of-scope | file-diff-violation). A
+// skewed breakdown is a root-cause signal: many `out-of-ownership` → MAY-EDIT too narrow (PRD
+// ownership), many `unresolved` → fixes genuinely hard, many `scope-expansion` → cards
+// under-specified upstream. This is diagnosis-only; nothing is auto-implemented from a residual.
+deferral_breakdown: residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {}),
 // followups_on_disk is filled by the SKILL after it materialises pending residuals.
 followups_materialized_in_workflow: residuals.filter((x) => x.materialized).length,
 resolve_invocations: resolvedSignatures.size,
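To make the fallback order in the `deferral_breakdown` reduce concrete, here is the same expression wrapped in a function and run on a small sample. The wrapper name `tallyDeferrals`, the card ids, and the sample residuals are invented for illustration; only the reduce body mirrors the added line.

```javascript
// Same reduce as the added `deferral_breakdown` line, wrapped so it can be called directly.
function tallyDeferrals(residuals) {
  return residuals.reduce((b, x) => {
    // Prefer deferralClass, then kind, then the literal 'unknown'.
    const k = x.deferralClass || x.kind || 'unknown'
    b[k] = (b[k] || 0) + 1
    return b
  }, {})
}

// Invented sample data, not taken from a real run.
const sampleResiduals = [
  { card: 'CARD-1', deferralClass: 'out-of-ownership', kind: 'followup' },
  { card: 'CARD-2', deferralClass: 'out-of-ownership' },
  { card: 'CARD-3', kind: 'outage' }, // no deferralClass: falls back to kind
  { card: 'CARD-4' },                 // neither field: counted as 'unknown'
]

const sampleBreakdown = tallyDeferrals(sampleResiduals)
// sampleBreakdown is { 'out-of-ownership': 2, outage: 1, unknown: 1 }
```

Note that `deferralClass` wins over `kind` when both are present, so `CARD-1` is counted under `out-of-ownership` rather than `followup`.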
@@ -1029,6 +1036,8 @@ function buildReport(o) {
 }
 if (residuals.length) {
 L.push(``, `## ⚠️ Residui (il skill materializza le follow-up mancanti — nulla perso)`)
+const bd = residuals.reduce((b, x) => { const k = x.deferralClass || x.kind || 'unknown'; b[k] = (b[k] || 0) + 1; return b }, {})
+L.push(`Ripartizione per classe: ${Object.entries(bd).map(([k, n]) => `${k}=${n}`).join(' · ')} — una classe dominante è il segnale di causa (out-of-ownership → MAY-EDIT troppo strette · unresolved → fix difficili · scope-expansion → card sotto-specificate).`)
 for (const f of residuals) L.push(`- ${f.card} (${f.kind})${f.materialized ? ' ✓' : ' — DA MATERIALIZZARE'}: ${f.evidence}`)
 }
 const excluded = gateLedger.filter((x) => x.decision === 'EXCLUDED')
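The report line above lists classes in the insertion order of `bd`, which follows the order residuals were encountered. As one possible variation, and purely an assumption rather than anything the package does, the `k=n` pairs could be sorted by descending count so a dominant class appears first. The helper name `formatBreakdown` is hypothetical.

```javascript
// Hypothetical formatter: renders a breakdown object as "class=count" pairs,
// highest count first. The package's own line keeps insertion order instead.
function formatBreakdown(breakdown) {
  return Object.entries(breakdown)
    .sort((a, b) => b[1] - a[1]) // stable sort: ties keep their original order
    .map(([k, n]) => `${k}=${n}`)
    .join(' · ')
}

const renderedBreakdown = formatBreakdown({ unresolved: 1, 'out-of-ownership': 3, outage: 1 })
// renderedBreakdown is "out-of-ownership=3 · unresolved=1 · outage=1"
```

Because the telemetry hunk and the report hunk compute the same tally independently, a formatter like this could also take the already-built `deferral_breakdown` object as input, which would keep the two outputs from drifting apart.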