baldart 4.30.1 → 4.31.0

package/CHANGELOG.md CHANGED
@@ -5,6 +5,19 @@ All notable changes to BALDART will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [4.31.0] - 2026-06-11
+
+ **`new2`: the residual ledger self-corrects before returning — no more duplicate-of-done follow-ups, no more N defers for one external action.** A real `new2` run (FEAT-0022 epic, 3 cards) surfaced two over-report classes in the offline-safe residual ledger that the skill was absorbing **by hand** every run: (1) **4 of 8 follow-ups were false-open** — scope-expansion residuals deferred early in the batch but satisfied LATER by another card's commit / a final-review fix, which only `integrateCrossCard()` retracts; a residual closed by any *other* in-batch path stayed falsely-open (and left a best-effort, uncommitted follow-up YAML in the worktree). (2) **3 follow-ups for ONE physical action** — one migration's remote `db:push` re-raised per-card AND batch-wide by the final review → three near-identical owner-gated cards. Both were caught only by the skill's manual per-residual disk grep + consolidation (load-bearing, repeated every run). This release moves that work into the workflow, once, deterministically:
+ - **Ledger self-correction (`reconcileLedgerAgainstHead`)** — before returning, re-verify every still-pending code/doc residual (`scope-expansion`/`out-of-scope`/`unresolved`/`file-diff-violation`/`merge-artifact-skipped`/`out-of-ownership`, `materialized` true OR false) against the worktree HEAD via one read-only agent, and retract only the ones it can back with a `file:line` proof. **Conservative by design** — default KEEP: a false-open residual is recoverable downstream (the skill re-checks disk), but a wrong retract silently drops real work (F-029). Owner-gated / not-a-code-defect / policy-deferred are EXTERNAL actions a commit cannot close → never auto-retracted here. Telemetry `ledger_reconciled`.
+ - **Owner-gated dedup (`dedupOwnerGatedResiduals`)** — collapse owner-gated / not-a-code-defect residuals that share one action key (migration filename · `db:push`/`db:check-sync` · deploy · secret · DNS), keeping **one per distinct real card** (the skill marks each card DONE only after ITS follow-up exists) and dropping only batch-level duplicates (residual `card` is a finding id → no DONE-linkage); a batch-level residual with no matching per-card entry is a genuinely-new action and is kept. Telemetry `owner_gated_deduped`.
+
+ The skill's Step 5.1 disk reconciliation **still runs** — it is now a safety net over a pre-cleaned ledger (the self-correction is conservative; F-040 worktree-not-merged still applies), not the sole defence. Same recurring shape as prior `new2` fixes: the splice existed in ONE location (`integrateCrossCard`), the resolution-detection was needed in ALL paths that close a residual. **MINOR** (additive capability + observability on the EXPERIMENTAL `new2` surface; no behavior regression — the guards/policies are unchanged and the retract is proof-gated + conservative; no `baldart.config.yml` key, so the schema-change propagation rule does not apply; no change to `/new`).
+
+ ### Added
+
+ - **`framework/.claude/workflows/new2.js`** — `reconcileLedgerAgainstHead()` (new `Reconcile` phase, conservative proof-gated retract of already-satisfied residuals) + `dedupOwnerGatedResiduals()` (collapse duplicate external actions), both run right before `finalReturn`. New telemetry fields `ledger_reconciled` + `owner_gated_deduped`; new counters wired through `buildTelemetry`.
+ - **`framework/.claude/skills/new2/SKILL.md`** — documents that `residuals[]` now arrives pre-cleaned (Step 5.1 reframed as a safety net) and records the two new telemetry fields in the A/B step.
+
  ## [4.30.1] - 2026-06-11
 
  **`new2`: stop spending opus on mechanical ops steps — explicit per-step model overrides.** Three `general-purpose` agents had no `model:` override, so they inherited the session's main-loop model (opus) for work that needs none. The Merge step is a deterministic OPS/GIT executor (git merge + YAML status reconciliation + grep-based epic closure + leave-and-report hygiene gates) whose correctness-critical checks (F-029 forcedDone guard, F-040 deferred guard) are enforced in JS AFTER it returns — independent of the agent's reasoning → **sonnet**. The per-card Codex review agent is a pure DRIVER (runs the companion, strips `[codex]` traces, maps findings) — the review intelligence is Codex, run externally → **haiku**. The post-merge Production Readiness checklist is non-blocking report-not-execute → **sonnet**. The Pre-flight agent (DAG + ownership map + idempotency — it grounds the whole batch) intentionally **stays opus**. **PATCH** (cost optimization on the EXPERIMENTAL `new2` surface; no behavior change — the deterministic guards/policies are unchanged; no config key).
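The proof-gated retract described in the 4.31.0 entry can be sketched in isolation. This is a minimal sketch under stated assumptions, not the shipped `reconcileLedgerAgainstHead()`: the agent call is replaced by a hypothetical pre-computed `verdict` object, `retractSatisfied` and the sample card ids are illustrative names, and the sketch additionally requires a non-empty `proof` string, which is stricter than the index-only filter in the diffed code.

```javascript
// Hypothetical stand-alone sketch of the conservative retract.
// `verdict` stands in for the read-only agent's answer; the agent call itself
// is not reproduced here.
const VERIFIABLE_RETRACT = new Set([
  'scope-expansion', 'out-of-scope', 'unresolved',
  'file-diff-violation', 'merge-artifact-skipped', 'out-of-ownership',
])

function retractSatisfied(residuals, verdict) {
  // Only classes a later commit could close are candidates; the rest are always kept.
  const candidates = residuals.filter(
    (r) => !r.integrated && VERIFIABLE_RETRACT.has(r.deferralClass || r.kind)
  )
  // Default KEEP: ignore malformed or out-of-range indices and (assumption of
  // this sketch) any entry that carries no proof string.
  const satisfied = ((verdict && verdict.resolved) || [])
    .filter((x) => typeof x.proof === 'string' && x.proof.trim() !== '')
    .map((x) => x.index)
    .filter((i) => Number.isInteger(i) && i >= 0 && i < candidates.length)
  const retracted = new Set(satisfied.map((i) => candidates[i]))
  return {
    kept: residuals.filter((r) => !retracted.has(r)),
    retractedCount: retracted.size,
  }
}

const sample = [
  { card: 'CARD-A', kind: 'scope-expansion', evidence: 'missing test' },
  { card: 'CARD-B', deferralClass: 'owner-gated', evidence: 'remote db:push' },
  { card: 'CARD-C', kind: 'unresolved', evidence: 'doc gap' },
]
const result = retractSatisfied(sample, {
  resolved: [
    { index: 0, proof: 'src/a.test.js:12' }, // valid: CARD-A is retracted
    { index: 1 },                            // no proof: CARD-C is kept
    { index: 7, proof: 'x:1' },              // out of range: ignored
  ],
})
console.log(result.retractedCount, result.kept.map((r) => r.card))
```

Returning a new array instead of splicing in place keeps the sketch side-effect free; the diffed workflow mutates `residuals` and `residualFollowups` directly.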
package/VERSION CHANGED
@@ -1 +1 @@
- 4.30.1
+ 4.31.0
package/framework/.claude/skills/new2/SKILL.md CHANGED
@@ -159,6 +159,14 @@ returns when the batch is done. It returns:
  flag is **advisory only** — `true` means the workflow *attempted* a write (possibly
  into a worktree that never merged), not that a card exists on disk in the main repo.
  **You (the skill) must reconcile EVERY residual against the main-repo disk** (Step 5.1).
+ **v4.31.0 — the workflow now self-corrects this ledger before returning**: it re-verifies
+ every still-pending code/doc residual against the worktree HEAD and **retracts the ones a
+ later in-batch commit already satisfied** (telemetry `ledger_reconciled`), and **collapses
+ duplicate owner-gated residuals that map to one external action** (e.g. one migration's
+ `db:push` re-raised per-card + batch-wide → telemetry `owner_gated_deduped`). So `residuals[]`
+ arrives pre-cleaned of false-open and duplicate entries; your Step 5.1 disk reconciliation is
+ now a **safety net** over a clean ledger (it still runs — the self-correction is conservative
+ and F-040 worktree-not-merged still applies), not the sole defence.
  - `degraded` / `degradationReasons` — the batch stopped early under a sustained
  outage (or another degradation). The batch is NOT complete; it must be resumed.
  - `telemetry` — the Phase-8 record (`variant:"new2"`).
@@ -234,4 +242,9 @@ returns when the batch is done. It returns:
  too (count of residuals the pre-final-review Cross-Card Integration Pass implemented in-batch —
  out-of-ownership-within-batch + outage retries — instead of leaving as follow-ups); with
  `deferral_breakdown` it shows how many deferrals were genuinely undeferrable vs absorbed in-batch.
+ Keep `ledger_reconciled` + `owner_gated_deduped` (v4.31.0) too — they quantify the ledger
+ self-correction: `ledger_reconciled` > 0 means the workflow retracted residuals a later commit had
+ already satisfied (work the skill used to suppress by hand; a persistently high value signals
+ deferrals resolving too late — order the dependent card earlier), and `owner_gated_deduped` > 0
+ means N defers were collapsed to one external action.
  Do NOT re-summarise the cards — the workflow already did.
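One way to read the two v4.31.0 counters from the returned `telemetry` record is sketched below. `summariseLedgerTelemetry` and its `highWatermark` threshold are hypothetical: the text above only says that a persistently high `ledger_reconciled` suggests ordering the dependent card earlier and gives no number, and "persistently" really means across several runs, which this single-record sketch does not track.

```javascript
// Hypothetical helper: turns the two v4.31.0 counters into human-readable notes.
// The default threshold of 3 is an assumption made for illustration only.
function summariseLedgerTelemetry(telemetry, highWatermark = 3) {
  const reconciled = telemetry.ledger_reconciled || 0
  const deduped = telemetry.owner_gated_deduped || 0
  const notes = []
  if (reconciled > 0) {
    notes.push(`${reconciled} residual(s) retracted as already satisfied in HEAD`)
  }
  if (reconciled >= highWatermark) {
    notes.push('deferrals may be resolving late; consider ordering the dependent card earlier')
  }
  if (deduped > 0) {
    notes.push(`${deduped} duplicate owner-gated residual(s) collapsed to one external action`)
  }
  return notes
}

const notes = summariseLedgerTelemetry({
  variant: 'new2',
  ledger_reconciled: 4,
  owner_gated_deduped: 2,
})
console.log(notes.join('\n'))
```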
package/framework/.claude/workflows/new2.js CHANGED
@@ -9,6 +9,7 @@ export const meta = {
  { title: 'Final', detail: 'cross-batch final review (delegates to new-final-review)' },
  { title: 'Merge', detail: 'integrity-gated auto-merge to trunk via git.merge_strategy + cleanup' },
  { title: 'Production', detail: 'post-merge production-readiness checklist (Phase 7, non-blocking)' },
+ { title: 'Reconcile', detail: 'ledger self-correction: retract residuals already satisfied in HEAD + dedup owner-gated external actions (v4.31.0)' },
  ],
  }
 
@@ -71,6 +72,8 @@ const cardMayEdit = {} // v4.30.0 — per-card MAY-EDIT, for the cross-car
  const perCardResults = []
  let prodReadiness = null
  let integratedCount = 0 // v4.30.0 — residuals resolved by the Cross-Card Integration Pass
+ let ledgerReconciled = 0 // v4.31.0 — residuals retracted because already-satisfied in HEAD (resolved by a later commit, not by integrateCrossCard)
+ let ownerGatedDeduped = 0 // v4.31.0 — duplicate owner-gated residuals collapsed (N defers → 1 external action)
  let degraded = false
  const degradationReasons = []
 
@@ -1062,6 +1065,111 @@ if (mergeResult && mergeResult.merged) {
  ledger(firstCard, 'phase7-production', 'SKIPPED', 'not merged')
  }
 
+ // ───────────────────────────────────────────────────────────────────────────
+ // Ledger self-correction (v4.31.0) — a residual deferred earlier in the batch can be satisfied
+ // LATER by another card's commit, the Cross-Card Integration Pass, or the final-review resolve.
+ // Only integrateCrossCard() splices ITS OWN resolutions; a residual closed by any OTHER in-batch
+ // path stays falsely-open in the offline-safe ledger (and materialiseFollowup left a best-effort,
+ // uncommitted follow-up YAML in the worktree). The skill then has to grep main-repo disk per
+ // residual to suppress the false-open ones — load-bearing, manual, repeated every run. Move that
+ // verification HERE, once, deterministically: re-check every still-pending code/doc residual
+ // against the worktree HEAD and retract only the ones an agent can PROVE are already satisfied.
+ // CONSERVATIVE: default KEEP — a false-open residual is recoverable downstream (the skill re-checks
+ // disk), but a wrong retract silently drops real work (F-029). Owner-gated / not-a-code-defect /
+ // policy-deferred are EXTERNAL actions a commit cannot close → never auto-retracted here.
+ // ───────────────────────────────────────────────────────────────────────────
+ const VERIFIABLE_RETRACT = new Set(['scope-expansion', 'out-of-scope', 'unresolved', 'file-diff-violation', 'merge-artifact-skipped', 'out-of-ownership'])
+ async function reconcileLedgerAgainstHead() {
+   if (degraded || !sharedCtx || !sharedCtx.worktreePath) return
+   // materialized:true is NOT proof a residual is correctly-open — its best-effort worktree YAML may
+   // be for work since closed, which would mint a duplicate-of-done card in the main repo. So verify
+   // BOTH materialized true and false, restricted to classes a later commit could actually close.
+   const candidates = residuals.filter((r) => !r.integrated && VERIFIABLE_RETRACT.has(r.deferralClass || r.kind))
+   if (!candidates.length) return
+   phase('Reconcile')
+   let verdict = null
+   try {
+     verdict = await agentSafe(
+       `Read-only verification (you are general-purpose ops; ROLE BOUNDARY: git read commands only — you NEVER edit, write, or delete any file). cd into the worktree ${sharedCtx.worktreePath}; its HEAD holds every committed card of the batch.\n\n` +
+       `For each residual below, decide whether the gap it describes is ALREADY SATISFIED in the worktree HEAD — a later card's commit, a cross-card fix, or a final-review fix may have closed it after it was deferred. Inspect the actual code/docs/tests in HEAD; do NOT trust the residual text.\n` +
+       `CONSERVATIVE CONTRACT: set resolved only for indices you can back with the exact file:line in HEAD that satisfies the gap. If you are not certain, OMIT the index (a still-open residual is safely re-checked downstream; a wrong 'resolved' silently drops real work).\n\n` +
+       `Residuals (index · card · class · evidence):\n` +
+       candidates.map((r, i) => ` [${i}] ${r.card} · ${r.deferralClass || r.kind} · ${r.evidence}`).join('\n') +
+       `\n\nReturn: { resolved: [ { index, proof } ] } — only indices already satisfied in HEAD, each with a file:line proof.`,
+       { label: 'ledger-reconcile', phase: 'Reconcile', agentType: 'general-purpose', model: 'sonnet',
+         schema: { type: 'object', required: ['resolved'], additionalProperties: true, properties: { resolved: { type: 'array', items: { type: 'object', required: ['index'], additionalProperties: true, properties: { index: { type: 'number' }, proof: { type: 'string' } } } } } } }
+     )
+   } catch (e) { if (e && e.transientExhausted) noteDegraded('outage'); return }
+   const satisfied = ((verdict && verdict.resolved) || []).map((x) => x.index).filter((i) => Number.isInteger(i) && i >= 0 && i < candidates.length)
+   if (!satisfied.length) { ledger(firstCard, 'ledger-reconcile', 'CLEAN', `${candidates.length} pending residual(s); none verifiably already-satisfied in HEAD`); return }
+   const retracted = []
+   for (const i of satisfied) {
+     const r = candidates[i]
+     const idx = residuals.indexOf(r)
+     if (idx < 0) continue // already spliced (duplicate index)
+     residuals.splice(idx, 1)
+     for (let j = residualFollowups.length - 1; j >= 0; j--) {
+       if (residualFollowups[j].card === r.card && residualFollowups[j].kind === r.kind) residualFollowups.splice(j, 1)
+     }
+     retracted.push(`${r.card}/${r.deferralClass || r.kind}`)
+     ledgerReconciled++
+   }
+   ledger(firstCard, 'ledger-reconcile', 'RETRACTED', `${retracted.length} residual(s) already-satisfied in HEAD → retracted (no duplicate-of-done card): ${retracted.join(', ')}`)
+ }
+
+ // ───────────────────────────────────────────────────────────────────────────
+ // Owner-gated dedup (v4.31.0) — several deferred ACs across the batch can map to ONE physical
+ // external action (e.g. one migration's remote `db:push` re-raised per-card AND batch-wide by the
+ // final review). Minting N follow-ups for one action is redundant. Collapse owner-gated /
+ // not-a-code-defect residuals that share an action key — but KEEP one per distinct REAL card (the
+ // skill marks each card DONE only after ITS follow-up exists) and drop only batch-level duplicates
+ // (residual.card is a finding id, not a backlog card → no DONE-linkage to preserve). A batch-level
+ // owner-gated residual with NO matching real-card entry is a genuinely-new action → kept untouched.
+ // ───────────────────────────────────────────────────────────────────────────
+ function ownerGatedActionKey(r) {
+   const hay = `${r.evidence || ''} ${(r.remedyFiles || []).join(' ')}`
+   const mig = hay.match(/(\d{14}_[a-z0-9_]+\.sql)/i)
+   if (mig) return 'migration:' + mig[1].toLowerCase()
+   if (/\bdb:push\b|\bdb:check-sync\b|remote db push|migration.*(deploy|remote|push)/i.test(hay)) return 'db-migration-deploy'
+   if (/\bdeploy(ment)?\b/i.test(hay)) return 'deploy'
+   if (/\bsecret\b/i.test(hay)) return 'secret'
+   if (/\bDNS\b|\bdomain\b/i.test(hay)) return 'dns'
+   return null // unknown action → never dedup (avoid collapsing two genuinely-distinct externals)
+ }
+ function dedupOwnerGatedResiduals() {
+   const realCard = new Set(cardIds)
+   const groups = {}
+   for (const r of residuals) {
+     if (r.deferralClass !== 'owner-gated' && r.deferralClass !== 'not-a-code-defect') continue
+     const k = ownerGatedActionKey(r)
+     if (!k) continue
+     ;(groups[k] = groups[k] || []).push(r)
+   }
+   for (const k of Object.keys(groups)) {
+     const g = groups[k]
+     if (g.length < 2) continue
+     if (!g.some((r) => realCard.has(r.card))) continue // all batch-level (new action) — keep them
+     const seenCard = new Set()
+     for (const r of g) {
+       const isReal = realCard.has(r.card)
+       const isDup = isReal ? seenCard.has(r.card) : true // keep one per real card; every batch-level entry is a dup of the per-card tracking
+       if (isReal) seenCard.add(r.card)
+       if (!isDup) continue
+       const idx = residuals.indexOf(r)
+       if (idx < 0) continue
+       residuals.splice(idx, 1)
+       for (let j = residualFollowups.length - 1; j >= 0; j--) {
+         if (residualFollowups[j].card === r.card && residualFollowups[j].kind === r.kind) residualFollowups.splice(j, 1)
+       }
+       ownerGatedDeduped++
+     }
+   }
+   if (ownerGatedDeduped) ledger(firstCard, 'owner-gated-dedup', 'COLLAPSED', `${ownerGatedDeduped} duplicate owner-gated residual(s) collapsed (multiple defers → one external action)`)
+ }
+
+ await reconcileLedgerAgainstHead()
+ dedupOwnerGatedResiduals()
+
  return finalReturn({ fatal: false })
 
  // ───────────────────────────────────────────────────────────────────────────
@@ -1105,6 +1213,13 @@ function buildTelemetry() {
  // v4.30.0 — residuals the Cross-Card Integration Pass implemented in-batch (out-of-ownership
  // within the batch union + outage retries) instead of leaving as follow-ups to manage later.
  cross_card_integrated: integratedCount,
+ // v4.31.0 — residuals retracted because a later in-batch commit already satisfied them (verified
+ // against the worktree HEAD, conservative). A non-zero value means the ledger self-corrected what
+ // the skill used to suppress by hand; a persistently high value signals deferrals that resolve too
+ // late (consider ordering the dependent card earlier).
+ ledger_reconciled: ledgerReconciled,
+ // v4.31.0 — duplicate owner-gated residuals collapsed to one external action (e.g. one db:push).
+ owner_gated_deduped: ownerGatedDeduped,
  // followups_on_disk is filled by the SKILL after it materialises pending residuals.
  followups_materialized_in_workflow: residuals.filter((x) => x.materialized).length,
  resolve_invocations: resolvedSignatures.size,
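The keep-one-per-real-card rule of the owner-gated dedup in the hunks above is easier to follow on a small ledger. The following is a simplified sketch, not the shipped `dedupOwnerGatedResiduals()`: `actionKey` keeps only the migration-filename and `db:push` rules, `dedupOwnerGated` returns a new array instead of splicing `residuals` and `residualFollowups` in place, and every card id and evidence string is made up for illustration.

```javascript
// Simplified action key (assumption: only two of the rules from the diff are kept).
function actionKey(r) {
  const hay = `${r.evidence || ''} ${(r.remedyFiles || []).join(' ')}`
  const mig = hay.match(/(\d{14}_[a-z0-9_]+\.sql)/i)
  if (mig) return 'migration:' + mig[1].toLowerCase()
  if (/\bdb:push\b/i.test(hay)) return 'db-migration-deploy'
  return null // unknown action: never dedup
}

// Pure variant of the dedup: keeps one residual per real card and drops
// batch-level duplicates that share an action key with a real-card entry.
function dedupOwnerGated(residuals, cardIds) {
  const realCard = new Set(cardIds)
  const groups = new Map()
  for (const r of residuals) {
    if (r.deferralClass !== 'owner-gated' && r.deferralClass !== 'not-a-code-defect') continue
    const k = actionKey(r)
    if (!k) continue
    if (!groups.has(k)) groups.set(k, [])
    groups.get(k).push(r)
  }
  const dropped = new Set()
  for (const g of groups.values()) {
    // A group of one, or a group with no real card, is a genuinely new action: keep it.
    if (g.length < 2 || !g.some((r) => realCard.has(r.card))) continue
    const seenCard = new Set()
    for (const r of g) {
      if (!realCard.has(r.card)) { dropped.add(r); continue } // batch-level duplicate
      if (seenCard.has(r.card)) dropped.add(r) // repeat entry for the same real card
      seenCard.add(r.card)
    }
  }
  return residuals.filter((r) => !dropped.has(r))
}

const sampleLedger = [
  { card: 'CARD-1', deferralClass: 'owner-gated', evidence: 'run db:push for 20260611120000_add_index.sql' },
  { card: 'CARD-2', deferralClass: 'owner-gated', evidence: 'remote push of 20260611120000_add_index.sql' },
  { card: 'F-REVIEW-7', deferralClass: 'owner-gated', evidence: '20260611120000_add_index.sql not applied remotely' },
  { card: 'F-REVIEW-8', deferralClass: 'owner-gated', evidence: 'rotate the API secret' },
]
const deduped = dedupOwnerGated(sampleLedger, ['CARD-1', 'CARD-2'])
console.log(deduped.map((r) => r.card))
```

In this sample the three migration residuals share one key: the two real-card entries survive and the batch-level `F-REVIEW-7` is dropped, while `F-REVIEW-8` has no key in the simplified `actionKey` and is left untouched.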
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "baldart",
- "version": "4.30.1",
+ "version": "4.31.0",
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
  "bin": {
  "baldart": "./bin/baldart.js"