baldart 4.30.1 → 4.31.0
- package/CHANGELOG.md +13 -0
- package/VERSION +1 -1
- package/framework/.claude/skills/new2/SKILL.md +13 -0
- package/framework/.claude/workflows/new2.js +115 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,19 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.31.0] - 2026-06-11
+
+**`new2`: the residual ledger self-corrects before returning — no more duplicate-of-done follow-ups, no more N defers for one external action.** A real `new2` run (FEAT-0022 epic, 3 cards) surfaced two over-report classes in the offline-safe residual ledger that the skill was absorbing **by hand** every run: (1) **4 of 8 follow-ups were false-open** — scope-expansion residuals deferred early in the batch but satisfied LATER by another card's commit / a final-review fix, which only `integrateCrossCard()` retracts; a residual closed by any *other* in-batch path stayed falsely-open (and left a best-effort, uncommitted follow-up YAML in the worktree). (2) **3 follow-ups for ONE physical action** — one migration's remote `db:push` re-raised per-card AND batch-wide by the final review → three near-identical owner-gated cards. Both were caught only by the skill's manual per-residual disk grep + consolidation (load-bearing, repeated every run). This release moves that work into the workflow, once, deterministically:
+- **Ledger self-correction (`reconcileLedgerAgainstHead`)** — before returning, re-verify every still-pending code/doc residual (`scope-expansion`/`out-of-scope`/`unresolved`/`file-diff-violation`/`merge-artifact-skipped`/`out-of-ownership`, `materialized` true OR false) against the worktree HEAD via one read-only agent, and retract only the ones it can back with a `file:line` proof. **Conservative by design** — default KEEP: a false-open residual is recoverable downstream (the skill re-checks disk), but a wrong retract silently drops real work (F-029). Owner-gated / not-a-code-defect / policy-deferred are EXTERNAL actions a commit cannot close → never auto-retracted here. Telemetry `ledger_reconciled`.
+- **Owner-gated dedup (`dedupOwnerGatedResiduals`)** — collapse owner-gated / not-a-code-defect residuals that share one action key (migration filename · `db:push`/`db:check-sync` · deploy · secret · DNS), keeping **one per distinct real card** (the skill marks each card DONE only after ITS follow-up exists) and dropping only batch-level duplicates (residual `card` is a finding id → no DONE-linkage); a batch-level residual with no matching per-card entry is a genuinely-new action and is kept. Telemetry `owner_gated_deduped`.
+
+The skill's Step 5.1 disk reconciliation **still runs** — it is now a safety net over a pre-cleaned ledger (the self-correction is conservative; F-040 worktree-not-merged still applies), not the sole defence. Same recurring shape as prior `new2` fixes: the splice existed in ONE location (`integrateCrossCard`), the resolution-detection was needed in ALL paths that close a residual. **MINOR** (additive capability + observability on the EXPERIMENTAL `new2` surface; no behavior regression — the guards/policies are unchanged and the retract is proof-gated + conservative; no `baldart.config.yml` key, so the schema-change propagation rule does not apply; no change to `/new`).
+
+### Added
+
+- **`framework/.claude/workflows/new2.js`** — `reconcileLedgerAgainstHead()` (new `Reconcile` phase, conservative proof-gated retract of already-satisfied residuals) + `dedupOwnerGatedResiduals()` (collapse duplicate external actions), both run right before `finalReturn`. New telemetry fields `ledger_reconciled` + `owner_gated_deduped`; new counters wired through `buildTelemetry`.
+- **`framework/.claude/skills/new2/SKILL.md`** — documents that `residuals[]` now arrives pre-cleaned (Step 5.1 reframed as a safety net) and records the two new telemetry fields in the A/B step.
+
 ## [4.30.1] - 2026-06-11
 
 **`new2`: stop spending opus on mechanical ops steps — explicit per-step model overrides.** Three `general-purpose` agents had no `model:` override, so they inherited the session's main-loop model (opus) for work that needs none. The Merge step is a deterministic OPS/GIT executor (git merge + YAML status reconciliation + grep-based epic closure + leave-and-report hygiene gates) whose correctness-critical checks (F-029 forcedDone guard, F-040 deferred guard) are enforced in JS AFTER it returns — independent of the agent's reasoning → **sonnet**. The per-card Codex review agent is a pure DRIVER (runs the companion, strips `[codex]` traces, maps findings) — the review intelligence is Codex, run externally → **haiku**. The post-merge Production Readiness checklist is non-blocking report-not-execute → **sonnet**. The Pre-flight agent (DAG + ownership map + idempotency — it grounds the whole batch) intentionally **stays opus**. **PATCH** (cost optimization on the EXPERIMENTAL `new2` surface; no behavior change — the deterministic guards/policies are unchanged; no config key).
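The conservative, proof-gated retract described in the 4.31.0 entry can be sketched as two pure functions. This is an illustration only, not the released code: `selectCandidates` and `applyVerdict` are hypothetical names, the card ids are invented, and the `path:line` proof check is an added assumption (the JSON schema in the release marks only `index` as required).

```javascript
// Residual classes a later commit could plausibly close (same list as the release).
const VERIFIABLE_RETRACT = new Set([
  'scope-expansion', 'out-of-scope', 'unresolved',
  'file-diff-violation', 'merge-artifact-skipped', 'out-of-ownership',
])

// Hypothetical helper: pick the residuals worth re-verifying against HEAD.
function selectCandidates(residuals) {
  return residuals.filter((r) => !r.integrated && VERIFIABLE_RETRACT.has(r.deferralClass || r.kind))
}

// Hypothetical helper: apply an agent verdict conservatively (default KEEP).
// ASSUMPTION (not in the release): an entry must also carry a "path:line" proof string.
function applyVerdict(residuals, candidates, verdict) {
  const hasProof = (p) => typeof p === 'string' && /\S+:\d+/.test(p)
  const retract = new Set(
    ((verdict && verdict.resolved) || [])
      .filter((x) => Number.isInteger(x.index) && x.index >= 0 && x.index < candidates.length)
      .filter((x) => hasProof(x.proof))
      .map((x) => candidates[x.index]),
  )
  return residuals.filter((r) => !retract.has(r))
}

// Invented sample ledger: two code residuals and one external (owner-gated) action.
const residuals = [
  { card: 'CARD-1', kind: 'scope-expansion', evidence: 'helper not exported' },
  { card: 'CARD-2', deferralClass: 'owner-gated', evidence: 'run db:push remotely' },
  { card: 'CARD-3', kind: 'unresolved', evidence: 'missing null check' },
]
const candidates = selectCandidates(residuals) // CARD-1 and CARD-3 only
const verdict = {
  resolved: [
    { index: 0, proof: 'src/helpers.js:42' }, // backed by file:line, so retracted
    { index: 1 },                             // no proof, so kept
    { index: 7, proof: 'src/x.js:1' },        // out of range, so ignored
  ],
}
const remaining = applyVerdict(residuals, candidates, verdict)
console.log(remaining.map((r) => r.card)) // [ 'CARD-2', 'CARD-3' ]
```

The owner-gated residual never becomes a candidate, which mirrors the rule that external actions are not auto-retracted; anything the verdict cannot back stays in the ledger.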
package/VERSION
CHANGED
@@ -1 +1 @@
-4.30.1
+4.31.0
package/framework/.claude/skills/new2/SKILL.md
CHANGED
@@ -159,6 +159,14 @@ returns when the batch is done. It returns:
 flag is **advisory only** — `true` means the workflow *attempted* a write (possibly
 into a worktree that never merged), not that a card exists on disk in the main repo.
 **You (the skill) must reconcile EVERY residual against the main-repo disk** (Step 5.1).
+**v4.31.0 — the workflow now self-corrects this ledger before returning**: it re-verifies
+every still-pending code/doc residual against the worktree HEAD and **retracts the ones a
+later in-batch commit already satisfied** (telemetry `ledger_reconciled`), and **collapses
+duplicate owner-gated residuals that map to one external action** (e.g. one migration's
+`db:push` re-raised per-card + batch-wide → telemetry `owner_gated_deduped`). So `residuals[]`
+arrives pre-cleaned of false-open and duplicate entries; your Step 5.1 disk reconciliation is
+now a **safety net** over a clean ledger (it still runs — the self-correction is conservative
+and F-040 worktree-not-merged still applies), not the sole defence.
 - `degraded` / `degradationReasons` — the batch stopped early under a sustained
 outage (or another degradation). The batch is NOT complete; it must be resumed.
 - `telemetry` — the Phase-8 record (`variant:"new2"`).
@@ -234,4 +242,9 @@ returns when the batch is done. It returns:
 too (count of residuals the pre-final-review Cross-Card Integration Pass implemented in-batch —
 out-of-ownership-within-batch + outage retries — instead of leaving as follow-ups); with
 `deferral_breakdown` it shows how many deferrals were genuinely undeferrable vs absorbed in-batch.
+Keep `ledger_reconciled` + `owner_gated_deduped` (v4.31.0) too — they quantify the ledger
+self-correction: `ledger_reconciled` > 0 means the workflow retracted residuals a later commit had
+already satisfied (work the skill used to suppress by hand; a persistently high value signals
+deferrals resolving too late — order the dependent card earlier), and `owner_gated_deduped` > 0
+means N defers were collapsed to one external action.
 Do NOT re-summarise the cards — the workflow already did.
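As a rough illustration of how the two v4.31.0 telemetry counters documented above might be read, here is a small sketch. The helper name `describeLedgerTelemetry` and the `HIGH_RECONCILED` threshold are assumptions made for this example; the skill text speaks only of a "persistently high" value and defines no number.

```javascript
// ASSUMPTION: a single-run threshold standing in for "persistently high".
const HIGH_RECONCILED = 3

// Hypothetical helper: turn the two v4.31.0 counters into human-readable notes.
function describeLedgerTelemetry(telemetry) {
  const notes = []
  const reconciled = telemetry.ledger_reconciled || 0
  const deduped = telemetry.owner_gated_deduped || 0
  if (reconciled > 0) {
    notes.push(`${reconciled} residual(s) retracted: a later commit already satisfied them`)
  }
  if (reconciled >= HIGH_RECONCILED) {
    notes.push('deferrals resolve late: consider ordering the dependent card earlier')
  }
  if (deduped > 0) {
    notes.push(`${deduped} duplicate owner-gated residual(s) collapsed`)
  }
  return notes
}

// Invented telemetry record; only the two counter names come from the release.
const notes = describeLedgerTelemetry({ variant: 'new2', ledger_reconciled: 4, owner_gated_deduped: 2 })
console.log(notes.join('\n'))
```

In practice the "persistently high" signal would be judged across several runs rather than from one record, so a real check would likely aggregate telemetry first.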
package/framework/.claude/workflows/new2.js
CHANGED
@@ -9,6 +9,7 @@ export const meta = {
     { title: 'Final', detail: 'cross-batch final review (delegates to new-final-review)' },
     { title: 'Merge', detail: 'integrity-gated auto-merge to trunk via git.merge_strategy + cleanup' },
     { title: 'Production', detail: 'post-merge production-readiness checklist (Phase 7, non-blocking)' },
+    { title: 'Reconcile', detail: 'ledger self-correction: retract residuals already satisfied in HEAD + dedup owner-gated external actions (v4.31.0)' },
   ],
 }
 
@@ -71,6 +72,8 @@ const cardMayEdit = {} // v4.30.0 — per-card MAY-EDIT, for the cross-car
 const perCardResults = []
 let prodReadiness = null
 let integratedCount = 0 // v4.30.0 — residuals resolved by the Cross-Card Integration Pass
+let ledgerReconciled = 0 // v4.31.0 — residuals retracted because already-satisfied in HEAD (resolved by a later commit, not by integrateCrossCard)
+let ownerGatedDeduped = 0 // v4.31.0 — duplicate owner-gated residuals collapsed (N defers → 1 external action)
 let degraded = false
 const degradationReasons = []
 
@@ -1062,6 +1065,111 @@ if (mergeResult && mergeResult.merged) {
   ledger(firstCard, 'phase7-production', 'SKIPPED', 'not merged')
 }
 
+// ───────────────────────────────────────────────────────────────────────────
+// Ledger self-correction (v4.31.0) — a residual deferred earlier in the batch can be satisfied
+// LATER by another card's commit, the Cross-Card Integration Pass, or the final-review resolve.
+// Only integrateCrossCard() splices ITS OWN resolutions; a residual closed by any OTHER in-batch
+// path stays falsely-open in the offline-safe ledger (and materialiseFollowup left a best-effort,
+// uncommitted follow-up YAML in the worktree). The skill then has to grep main-repo disk per
+// residual to suppress the false-open ones — load-bearing, manual, repeated every run. Move that
+// verification HERE, once, deterministically: re-check every still-pending code/doc residual
+// against the worktree HEAD and retract only the ones an agent can PROVE are already satisfied.
+// CONSERVATIVE: default KEEP — a false-open residual is recoverable downstream (the skill re-checks
+// disk), but a wrong retract silently drops real work (F-029). Owner-gated / not-a-code-defect /
+// policy-deferred are EXTERNAL actions a commit cannot close → never auto-retracted here.
+// ───────────────────────────────────────────────────────────────────────────
+const VERIFIABLE_RETRACT = new Set(['scope-expansion', 'out-of-scope', 'unresolved', 'file-diff-violation', 'merge-artifact-skipped', 'out-of-ownership'])
+async function reconcileLedgerAgainstHead() {
+  if (degraded || !sharedCtx || !sharedCtx.worktreePath) return
+  // materialized:true is NOT proof a residual is correctly-open — its best-effort worktree YAML may
+  // be for work since closed, which would mint a duplicate-of-done card in the main repo. So verify
+  // BOTH materialized true and false, restricted to classes a later commit could actually close.
+  const candidates = residuals.filter((r) => !r.integrated && VERIFIABLE_RETRACT.has(r.deferralClass || r.kind))
+  if (!candidates.length) return
+  phase('Reconcile')
+  let verdict = null
+  try {
+    verdict = await agentSafe(
+      `Read-only verification (you are general-purpose ops; ROLE BOUNDARY: git read commands only — you NEVER edit, write, or delete any file). cd into the worktree ${sharedCtx.worktreePath}; its HEAD holds every committed card of the batch.\n\n` +
+      `For each residual below, decide whether the gap it describes is ALREADY SATISFIED in the worktree HEAD — a later card's commit, a cross-card fix, or a final-review fix may have closed it after it was deferred. Inspect the actual code/docs/tests in HEAD; do NOT trust the residual text.\n` +
+      `CONSERVATIVE CONTRACT: set resolved only for indices you can back with the exact file:line in HEAD that satisfies the gap. If you are not certain, OMIT the index (a still-open residual is safely re-checked downstream; a wrong 'resolved' silently drops real work).\n\n` +
+      `Residuals (index · card · class · evidence):\n` +
+      candidates.map((r, i) => ` [${i}] ${r.card} · ${r.deferralClass || r.kind} · ${r.evidence}`).join('\n') +
+      `\n\nReturn: { resolved: [ { index, proof } ] } — only indices already satisfied in HEAD, each with a file:line proof.`,
+      { label: 'ledger-reconcile', phase: 'Reconcile', agentType: 'general-purpose', model: 'sonnet',
+        schema: { type: 'object', required: ['resolved'], additionalProperties: true, properties: { resolved: { type: 'array', items: { type: 'object', required: ['index'], additionalProperties: true, properties: { index: { type: 'number' }, proof: { type: 'string' } } } } } } }
+    )
+  } catch (e) { if (e && e.transientExhausted) noteDegraded('outage'); return }
+  const satisfied = ((verdict && verdict.resolved) || []).map((x) => x.index).filter((i) => Number.isInteger(i) && i >= 0 && i < candidates.length)
+  if (!satisfied.length) { ledger(firstCard, 'ledger-reconcile', 'CLEAN', `${candidates.length} pending residual(s); none verifiably already-satisfied in HEAD`); return }
+  const retracted = []
+  for (const i of satisfied) {
+    const r = candidates[i]
+    const idx = residuals.indexOf(r)
+    if (idx < 0) continue // already spliced (duplicate index)
+    residuals.splice(idx, 1)
+    for (let j = residualFollowups.length - 1; j >= 0; j--) {
+      if (residualFollowups[j].card === r.card && residualFollowups[j].kind === r.kind) residualFollowups.splice(j, 1)
+    }
+    retracted.push(`${r.card}/${r.deferralClass || r.kind}`)
+    ledgerReconciled++
+  }
+  ledger(firstCard, 'ledger-reconcile', 'RETRACTED', `${retracted.length} residual(s) already-satisfied in HEAD → retracted (no duplicate-of-done card): ${retracted.join(', ')}`)
+}
+
+// ───────────────────────────────────────────────────────────────────────────
+// Owner-gated dedup (v4.31.0) — several deferred ACs across the batch can map to ONE physical
+// external action (e.g. one migration's remote `db:push` re-raised per-card AND batch-wide by the
+// final review). Minting N follow-ups for one action is redundant. Collapse owner-gated /
+// not-a-code-defect residuals that share an action key — but KEEP one per distinct REAL card (the
+// skill marks each card DONE only after ITS follow-up exists) and drop only batch-level duplicates
+// (residual.card is a finding id, not a backlog card → no DONE-linkage to preserve). A batch-level
+// owner-gated residual with NO matching real-card entry is a genuinely-new action → kept untouched.
+// ───────────────────────────────────────────────────────────────────────────
+function ownerGatedActionKey(r) {
+  const hay = `${r.evidence || ''} ${(r.remedyFiles || []).join(' ')}`
+  const mig = hay.match(/(\d{14}_[a-z0-9_]+\.sql)/i)
+  if (mig) return 'migration:' + mig[1].toLowerCase()
+  if (/\bdb:push\b|\bdb:check-sync\b|remote db push|migration.*(deploy|remote|push)/i.test(hay)) return 'db-migration-deploy'
+  if (/\bdeploy(ment)?\b/i.test(hay)) return 'deploy'
+  if (/\bsecret\b/i.test(hay)) return 'secret'
+  if (/\bDNS\b|\bdomain\b/i.test(hay)) return 'dns'
+  return null // unknown action → never dedup (avoid collapsing two genuinely-distinct externals)
+}
+function dedupOwnerGatedResiduals() {
+  const realCard = new Set(cardIds)
+  const groups = {}
+  for (const r of residuals) {
+    if (r.deferralClass !== 'owner-gated' && r.deferralClass !== 'not-a-code-defect') continue
+    const k = ownerGatedActionKey(r)
+    if (!k) continue
+    ;(groups[k] = groups[k] || []).push(r)
+  }
+  for (const k of Object.keys(groups)) {
+    const g = groups[k]
+    if (g.length < 2) continue
+    if (!g.some((r) => realCard.has(r.card))) continue // all batch-level (new action) — keep them
+    const seenCard = new Set()
+    for (const r of g) {
+      const isReal = realCard.has(r.card)
+      const isDup = isReal ? seenCard.has(r.card) : true // keep one per real card; every batch-level entry is a dup of the per-card tracking
+      if (isReal) seenCard.add(r.card)
+      if (!isDup) continue
+      const idx = residuals.indexOf(r)
+      if (idx < 0) continue
+      residuals.splice(idx, 1)
+      for (let j = residualFollowups.length - 1; j >= 0; j--) {
+        if (residualFollowups[j].card === r.card && residualFollowups[j].kind === r.kind) residualFollowups.splice(j, 1)
+      }
+      ownerGatedDeduped++
+    }
+  }
+  if (ownerGatedDeduped) ledger(firstCard, 'owner-gated-dedup', 'COLLAPSED', `${ownerGatedDeduped} duplicate owner-gated residual(s) collapsed (multiple defers → one external action)`)
+}
+
+await reconcileLedgerAgainstHead()
+dedupOwnerGatedResiduals()
+
 return finalReturn({ fatal: false })
 
 // ───────────────────────────────────────────────────────────────────────────
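To see the owner-gated dedup rule from the hunk above on concrete data, the following is a self-contained restatement that returns a new list instead of splicing workflow globals. The regexes are copied from `ownerGatedActionKey()`; the function names `actionKey` and `dedupOwnerGated`, the card ids, the `REVIEW-*` finding ids, and the migration filename are hypothetical, chosen only for the demonstration.

```javascript
// Action-key heuristics, same regexes and order as ownerGatedActionKey() in the diff.
function actionKey(r) {
  const hay = `${r.evidence || ''} ${(r.remedyFiles || []).join(' ')}`
  const mig = hay.match(/(\d{14}_[a-z0-9_]+\.sql)/i)
  if (mig) return 'migration:' + mig[1].toLowerCase()
  if (/\bdb:push\b|\bdb:check-sync\b|remote db push|migration.*(deploy|remote|push)/i.test(hay)) return 'db-migration-deploy'
  if (/\bdeploy(ment)?\b/i.test(hay)) return 'deploy'
  if (/\bsecret\b/i.test(hay)) return 'secret'
  if (/\bDNS\b|\bdomain\b/i.test(hay)) return 'dns'
  return null // unknown action: never grouped
}

// Pure restatement of the dedup rule (hypothetical signature).
function dedupOwnerGated(residuals, cardIds) {
  const realCard = new Set(cardIds)
  const groups = new Map()
  for (const r of residuals) {
    if (r.deferralClass !== 'owner-gated' && r.deferralClass !== 'not-a-code-defect') continue
    const k = actionKey(r)
    if (!k) continue
    if (!groups.has(k)) groups.set(k, [])
    groups.get(k).push(r)
  }
  const dropped = new Set()
  for (const g of groups.values()) {
    // Singletons and all-batch-level groups are left untouched.
    if (g.length < 2 || !g.some((r) => realCard.has(r.card))) continue
    const seenCard = new Set()
    for (const r of g) {
      const isReal = realCard.has(r.card)
      // Keep the first entry per real card; drop repeats and every batch-level entry.
      if (!isReal || seenCard.has(r.card)) dropped.add(r)
      if (isReal) seenCard.add(r.card)
    }
  }
  return { kept: residuals.filter((r) => !dropped.has(r)), dropped: dropped.size }
}

// Invented ledger: one migration raised twice by CARD-1 and once batch-wide.
const evidence = 'apply 20260611093000_add_orders.sql via remote db:push'
const sample = [
  { card: 'CARD-1', deferralClass: 'owner-gated', evidence },
  { card: 'CARD-1', deferralClass: 'owner-gated', evidence },   // same card raised twice
  { card: 'REVIEW-1', deferralClass: 'owner-gated', evidence }, // batch-level finding id
  { card: 'REVIEW-2', deferralClass: 'owner-gated', evidence: 'rotate the API secret' },
]
const { kept, dropped } = dedupOwnerGated(sample, ['CARD-1', 'CARD-2'])
console.log(kept.length, dropped) // 2 2
```

One consequence of the regex order is worth noting: evidence that names a migration file yields a `migration:<file>` key, while evidence that mentions only `db:push` yields `db-migration-deploy`, so two residuals for the same push are grouped only if both texts name the file or neither does.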
@@ -1105,6 +1213,13 @@ function buildTelemetry() {
     // v4.30.0 — residuals the Cross-Card Integration Pass implemented in-batch (out-of-ownership
     // within the batch union + outage retries) instead of leaving as follow-ups to manage later.
     cross_card_integrated: integratedCount,
+    // v4.31.0 — residuals retracted because a later in-batch commit already satisfied them (verified
+    // against the worktree HEAD, conservative). A non-zero value means the ledger self-corrected what
+    // the skill used to suppress by hand; a persistently high value signals deferrals that resolve too
+    // late (consider ordering the dependent card earlier).
+    ledger_reconciled: ledgerReconciled,
+    // v4.31.0 — duplicate owner-gated residuals collapsed to one external action (e.g. one db:push).
+    owner_gated_deduped: ownerGatedDeduped,
     // followups_on_disk is filled by the SKILL after it materialises pending residuals.
     followups_materialized_in_workflow: residuals.filter((x) => x.materialized).length,
     resolve_invocations: resolvedSignatures.size,