baldart 4.24.0 → 4.24.2

package/CHANGELOG.md CHANGED
@@ -5,6 +5,40 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.24.2] - 2026-06-10
9
+
10
+ **`new2`: holistic logic review — deterministic owner_agent routing, a merge gate that no longer strands the batch, `deferralClass` end-to-end, and −N agent spawns per run.** A full logic review of `new2.js`/`new2-resolve.js`/`SKILL.md` against `/new`'s reference modules found four correctness defects and five sources of wasted spawns. The headline routing bug: the pre-flight's `ownerAgent` was passed RAW as `agentType` — the G25 "unknown→coder" rule lived only in the prompt, so any freeform value (`claude`, `backend`, a typo) was a PERMANENT spawn error → card `failed` → (combined with the merge-gate bug) the whole batch unmerged. **PATCH** (bug-fix/hardening of the EXPERIMENTAL `new2` surface only; **no `baldart.config.yml` key**, **no change to `/new`** — the schema-change propagation rule does not apply).
11
+
12
+ ### Fixed
13
+
14
+ - **`new2.js` (A1) — deterministic owner_agent clamp in JS, not prompt.** After the pre-flight, `cardGraph[].ownerAgent` is clamped through the exact `/new` router table (`implement.md` §6b): `coder`/`ui-expert` pass through; `plan`/`visual-designer`/`motion-expert`/unknown/missing degrade to `coder`, each with a `[ROUTER]` ledger row (audit trail). The RAW value is kept for the security-relevance heuristic. Fixes the "wrong agent for the card" failure class at the code level.
15
+ - **`new2.js` (A2) — the merge integrity gate matches its own comment.** `followup`/`blocked` cards (rolled back + tracked in the offline-safe residual ledger) no longer count as "incomplete": one failed card used to strand every committed card in an orphaned worktree with no resume path (the run wasn't `degraded`, so the skill never resumed). `failed` (crash) counts as complete ONLY when its cleanup verified the worktree clean.
16
+ - **`new2.js` + `new2/SKILL.md` (A3) — `deferralClass` end-to-end (v4.24.1 covered only the review-blocks loop).** The ac-unmet loop now records WHY each deferral happened; `residuals[]` carry `deferralClass` and committed cards carry `deferredClasses[]`. The skill marks a deferred card DONE post-run ONLY when every class is `owner-gated`/`not-a-code-defect`/`policy-deferred-ac`; an `unresolved` class (an AC the workflow tried and failed to implement) keeps the card IN_PROGRESS and is surfaced explicitly — a genuinely unmet DoD can no longer be auto-DONE'd with a follow-up as a fig leaf (F-029).
17
+ - **`new2.js` (A4) — crashed cards clean up after themselves.** `crashResult` now runs the rollback (whole-worktree scope — safe: all committed work lives in HEAD), so a crashed card's dirty files no longer poison the next card's E4 reconcile with a misleading `out-of-ownership` revert.
18
+ - **`new2.js` (A5) — `${paths.backlog_dir}` instead of a hardcoded `backlog/*.yml`** in the epic-closure merge prompt (consumer portability).
19
+
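The A1 clamp described above could be sketched roughly as follows. This is an illustration, not the code in `new2.js`: the helper name `clampOwnerAgent`, the `rawOwnerAgent` field, the ledger-row shape and the card ids are all assumptions made for the example. Only the routing rule itself (`coder`/`ui-expert` pass through, everything else degrades to `coder` with a `[ROUTER]` row) comes from the changelog entry.

```javascript
// Hypothetical sketch of the A1 clamp; helper name and row shape are assumptions.
const PASS_THROUGH = new Set(['coder', 'ui-expert'])

function clampOwnerAgent(node, ledgerRows) {
  const raw = typeof node.ownerAgent === 'string' ? node.ownerAgent.trim() : ''
  if (PASS_THROUGH.has(raw)) {
    return { ...node, ownerAgent: raw, rawOwnerAgent: raw }
  }
  // plan / visual-designer / motion-expert / unknown / missing all degrade to coder,
  // and every degradation leaves an audit row behind.
  ledgerRows.push({
    card: node.id,
    gate: 'ROUTER',
    decision: 'DEGRADED',
    detail: `${raw || '(missing)'} -> coder`,
  })
  // The raw value is kept so a later heuristic can still inspect it.
  return { ...node, ownerAgent: 'coder', rawOwnerAgent: raw }
}

const rows = []
const clamped = [
  { id: 'FEAT-0001', ownerAgent: 'ui-expert' },
  { id: 'FEAT-0002', ownerAgent: 'backend' },
  { id: 'FEAT-0003' },
].map((n) => clampOwnerAgent(n, rows))

console.log(clamped.map((n) => `${n.id}:${n.ownerAgent}`), rows.length)
```

Doing the clamp in code rather than in the prompt means a freeform value can never reach the spawn call as `agentType`, which is the failure the entry describes.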
20
+ ### Changed
21
+
22
+ - **`new2.js` (B1) — per-card idempotency probe removed (−N Haiku spawns/run).** `cardGraph[].alreadyCommitted` is now probed once by the pre-flight (git-authoritative: commit in `trunk..HEAD` + validations green + no open follow-up); on a fresh worktree it costs nothing, and resume stays covered by the journal cache.
23
+ - **`new2.js` (B2) — `security-reviewer` fan-out only when the card's files intersect `high_risk_modules`** (the bare `highRisk.length` fired it on every card of every project that configures the list); the owner-agent and brief-token triggers are unchanged.
24
+ - **`new2.js` (B3) — review blocks and scope-expansion findings batched per kind+domain → ONE `resolve()` per group** via the existing `findings[]` contract (the same F-007 batching the final review already used). One-by-one routing was the dominant per-card cost driver (each resolve = fixer + judge + possible Tier-2 fan-out).
25
+ - **`new2.js` (B4) — `executionMode`/`groups` removed from the pre-flight contract**: the DAG scheduler is strictly sequential (single worktree) and nothing ever read them; telemetry now reports the honest constant. Real parallelism is a future release, after A/B data.
26
+ - **`new2.js` (B5/C) — dead code removed**: the never-populated `lessons` mechanism (every cardBrief promised batch lessons that were always `(none)`), the unreachable `resolve()` `'fatal'` branch (`new2-resolve` never returns it — v4.17.2 G1 rule), the always-empty per-card `telemetry` stub (per_card now reports `deferred`/`deferredClasses`/gate count), and the now-unused `batchFatal` flag.
27
+
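The B3 batching (one `resolve()` per kind+domain group instead of one per finding) can be illustrated with a small grouping helper. The finding fields (`kind`, `domain`, `evidence`), the sample values and the helper name `groupFindings` are assumptions for the example; the real `findings[]` contract may carry different fields.

```javascript
// Hypothetical sketch of B3 batching; the finding shape is an assumption.
function groupFindings(findings) {
  const groups = new Map()
  for (const f of findings) {
    const key = `${f.kind}::${f.domain}`
    if (!groups.has(key)) groups.set(key, { kind: f.kind, domain: f.domain, findings: [] })
    groups.get(key).findings.push(f)
  }
  // One entry per group: each would become a single resolve() call.
  return [...groups.values()]
}

const sample = [
  { kind: 'review-block', domain: 'api', evidence: 'missing null check' },
  { kind: 'review-block', domain: 'api', evidence: 'unhandled rejection' },
  { kind: 'scope-expansion', domain: 'ui', evidence: 'new prop needed' },
]
const batched = groupFindings(sample)
console.log(`${sample.length} findings -> ${batched.length} resolve() calls`)
```

Since each resolve is described as a fixer plus a judge plus a possible Tier-2 fan-out, collapsing findings into groups reduces the number of those sequences per card.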
28
+ ## [4.24.1] - 2026-06-10
29
+
30
+ **`new2`: an owner-gated gate no longer destroys a completed card (silent work-loss → commit + defer).** A real `/new2` run on a schema-change card produced **zero output** despite 52 min of work: the card's only obstacle was an *owner-gated* step (`db:check-sync` needs an approved remote `db:push`). That step was correctly `policy-deferred` up front — but the **same** condition was *also* re-raised by a reviewer as a fresh `MIGRATION_NOT_DEPLOYED` blocker, which (via the review-block branch's `s !== 'resolved' → cardBlocked`) triggered `rollbackCard`'s `git clean -fd` and **erased the completed migration**. Compounding it: the `E4-file-diff` gate logged `AUTO-REVERTED` while reverting *nothing* (leaving the card's DoD-mandated ADR/ER-doc edits orphaned in the worktree — its MAY-EDIT map was narrower than the DoD), and the residual follow-up was written *inside the worktree* and marked `materialized:true` without disk proof, so it vanished when the batch didn't merge. Root cause confirmed via the gate ledger + on-disk state + two rounds of adversarial review (the obvious "E4 reverted it" diagnosis was **wrong** — E4 was a no-op; `rollbackCard` was the eraser). **PATCH** (bug-fix to the EXPERIMENTAL `new2` skill + its workflows; **no `baldart.config.yml` key** and **no change to shared `/new` prose** — the DONE-deferral is handled entirely inside `new2`'s own merge prompt + skill, so `/new` interactive is provably unaffected; the schema-change propagation rule does not apply).
31
+
32
+ ### Fixed
33
+
34
+ - **`framework/.claude/workflows/new2.js` + `new2-resolve.js` — classification-based card-block (the primary fix, F-040).** `resolve()` now propagates a structured `deferralClass` from `new2-resolve`'s terminal-judge. A review-block that resolves to an **owner-gated / not-a-code-defect** deferral no longer sets `cardBlocked` → the card's *complete* code is committed (it is **not** rolled back); only a genuine unresolved **code** defect (or `out-of-ownership`/`baseline`/`outage`) still blocks + rolls back. A deliberately-broken migration (`db:reset` failing) is still classified code-defect → blocks, so this is not an over-match escape hatch.
35
+ - **`new2.js` — `E4-file-diff` is honest.** The old `AUTO-REVERTED` log reverted nothing. The owner agent now reconciles out-of-ownership edits itself (per `implement.md §11b`) and reports `revertedOutOfOwnership`; the gate logs `REVERTED` / `FLAGGED` to match reality, and an unresolved violation becomes a tracked residual (never silent, never orphaned).
36
+ - **`new2.js` pre-flight — ownership map ⊇ DoD (root cause of the E4 false positive).** A card's MAY-EDIT now = `files_likely_touched` ∪ paths **named explicitly** in its `acceptance_criteria`/`definition_of_done` (the ADR/data-model/ER doc a schema-change must touch), so editing a DoD-mandated doc is no longer a file-diff violation.
37
+ - **`new2.js` + `new2/SKILL.md` — DONE deferred to the skill, gated on the follow-up existing on disk (closes the F-029 false-DONE the review surfaced).** A card carrying an open owner-gated/policy-deferred AC commits its code but stays **NON-DONE**; the merge agent is told to leave those cards non-DONE; the **skill** marks them DONE post-run **only after** verifying the deferral's follow-up exists on disk in the main repo (fail-loud otherwise — never DONE with a dropped requirement).
38
+ - **`new2-resolve.js` + `new2/SKILL.md` — follow-ups are reliable.** Follow-up materialisation is best-effort inside the workflow (it rides the merge if the batch merges); the **skill is the SSOT**, verifying every residual against the **main-repo** disk and creating any missing follow-up there — so a non-merged batch never loses one. `materialized` is now advisory only.
39
+ - **`new2.js` (Fix G) — AC-deferral dedup is text-drift-proof.** The policy-deferred-AC key is now scoped to the AC *number* (`acSig`), so a deferred AC is no longer re-routed to `resolve()` a second time when the pre-flight and implement agents word the AC text slightly differently.
40
+ - **`new2/SKILL.md` — telemetry reconciled against disk.** Before recording, the skill verifies each `committed` card actually has a commit on trunk and never presents progress the disk does not show; adds `cards_deferred_done_pending` so the A/B record distinguishes "code landed" from "DONE".
41
+
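The text-drift problem behind Fix G can be reproduced with the two key functions as they appear in the `new2.js` hunk of this diff. The card id and the two AC wordings below are invented for illustration.

```javascript
// sig() and acSig() are copied from the new2.js hunk; the sample data is made up.
function sig(card, gate, evidence) {
  const e = String(evidence || '').toLowerCase().replace(/\s+/g, ' ').replace(/[0-9a-f]{7,40}/g, '#').trim().slice(0, 160)
  return `${card}::${String(gate || '').toLowerCase()}::${e}`
}
function acSig(card, n) { return `${card}::ac-defer::ac-${String(n)}` }

// The same AC (number 3), worded differently by two agents.
const preflightAC = { n: 3, text: 'Run db:check-sync against the remote database' }
const implementAC = { n: 3, text: 'db:check-sync must pass on the remote DB' }

const textKeysMatch =
  sig('FEAT-0002', 'ac-defer', preflightAC.text) === sig('FEAT-0002', 'ac-defer', implementAC.text)
const numberKeysMatch =
  acSig('FEAT-0002', preflightAC.n) === acSig('FEAT-0002', implementAC.n)

console.log({ textKeysMatch, numberKeysMatch })
```

The full-text key differs between the two wordings, so a dedup check based on it misses the match; the number-scoped key is identical for both.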
8
42
  ## [4.24.0] - 2026-06-10
9
43
 
10
44
  **Atomic backlog-ID allocator — no FEAT/BUG collisions across parallel worktrees.** When several `/prd` (or `/new`/`new2` follow-up) sessions run in parallel on sibling worktrees, each branched from the same trunk, the old `max(^id: FEAT-) + 1` scan made them all land on the **same next integer**: the other session's card was in flight on an unmerged sibling branch, invisible to both the local backlog and the trunk merge-base — so two epics both became `FEAT-0024` and conflicted at rebase/merge. The `git fetch` + merge-base scan only ever covered *already-merged* IDs, never in-flight ones. A new allocator anchors a lock + per-prefix high-water mark in `$MAIN/.worktrees/` (the shared coordination point every worktree already reaches via `git rev-parse --git-common-dir`, gitignored like `registry.json`), so a reservation is atomic across every worktree **on the same machine**. The high-water mark bumped under the lock is the correctness anchor; `max()` against the real backlog + sibling-worktree backlogs + reservations log + trunk merge-base makes it **self-healing**. **MINOR** (additive capability on the `worktree-manager` skill; opt-in — callers fall back to the inline merge-base scan + `[ID-RACE-RISK]` note when the script is absent, so older installs and cross-machine cloud agents are unaffected. **No `baldart.config.yml` key** — the allocator reuses `paths.backlog_dir` + the gitignored `.worktrees/` convention, so the schema-change propagation rule does not apply).
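The self-healing `max()` step of the allocator can be sketched as a pure function. This is only an illustration: the lock in `$MAIN/.worktrees/` is not modelled, the helper names `highestNumber` and `nextId` are hypothetical, and the four-digit zero padding is inferred from the `FEAT-0024` example rather than confirmed.

```javascript
// Hypothetical sketch: only the max() computation; the real allocator also holds a lock.
function highestNumber(ids, prefix) {
  const re = new RegExp(`^${prefix}-(\\d+)$`)
  let max = 0
  for (const id of ids) {
    const m = re.exec(id)
    if (m) max = Math.max(max, Number(m[1]))
  }
  return max
}

function nextId(prefix, highWaterMark, ...idSources) {
  // Take the highest number seen anywhere, then bump past it.
  const observed = idSources.map((ids) => highestNumber(ids, prefix))
  const next = Math.max(highWaterMark, ...observed) + 1
  return { id: `${prefix}-${String(next).padStart(4, '0')}`, highWaterMark: next }
}

const localBacklog = ['FEAT-0022', 'BUG-0007']
const siblingWorktrees = ['FEAT-0023']
const reservations = ['FEAT-0024']

const first = nextId('FEAT', 23, localBacklog, siblingWorktrees, reservations)
const second = nextId('FEAT', first.highWaterMark, localBacklog, siblingWorktrees, reservations)
console.log(first.id, second.id)
```

A scan of the local backlog alone would have proposed `FEAT-0023` here, colliding with the sibling worktree. Taking the maximum over every source, and carrying the bumped high-water mark forward, is what keeps consecutive reservations distinct in this sketch.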
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.24.0
1
+ 4.24.2
@@ -117,42 +117,64 @@ returns when the batch is done. It returns:
117
117
 
118
118
  - `report` — ready-to-show markdown batch summary.
119
119
  - `residuals` — the **OFFLINE-SAFE ledger of record**: every residual the workflow
120
- could not finish, each `{ card, kind, evidence, materialized }`. A residual with
121
- `materialized:false` has NO follow-up card on disk yet (e.g. the workflow hit an
122
- outage where no agent could write the file). **You (the skill) must reconcile it.**
120
+ could not finish, each `{ card, kind, evidence, materialized }`. The `materialized`
121
+ flag is **advisory only**: `true` means the workflow *attempted* a write (possibly
122
+ into a worktree that never merged), not that a card exists on disk in the main repo.
123
+ **You (the skill) must reconcile EVERY residual against the main-repo disk** (Step 5.1).
123
124
  - `degraded` / `degradationReasons` — the batch stopped early under a sustained
124
125
  outage (or another degradation). The batch is NOT complete; it must be resumed.
125
126
  - `telemetry` — the Phase-8 record (`variant:"new2"`).
126
127
 
127
128
  ### Step 5 — Reconcile, resume, present, record
128
129
 
129
- 1. **Materialise missing follow-ups (offline-safe: you have filesystem access, the
130
- workflow does not).** For every `residuals[]` entry with `materialized:false`,
131
- create a follow-up card `${paths.backlog_dir}/<card>-followup-<kind>.yml` (status:
132
- TODO). **Delegate the write to the `prd-card-writer` agent**, the same owner the
133
- workflow uses (card-template, Rule C `review_profile`, `owner_agent` routed to the
134
- residual's domain, traceability), derived from the residual (≥1 requirement;
135
- `acceptance_criteria` = the verbatim residual; `files_likely_touched` from the card's
136
- ownership). Do NOT hand-write a minimal stub: the offline path must match the
137
- agent-path quality (F-039). It MUST pass the `/new` pre-flight field check. If
138
- `prd-card-writer` itself is unavailable (total outage), fall back to a minimal valid
139
- stub so the card still exists. This is the layer that guarantees **nothing is ever
140
- dropped, even when every agent was dead** during the run.
141
- 2. **Resume if degraded.** If `degraded` is true, re-invoke the workflow with
130
+ 1. **Materialise follow-ups in the MAIN repo: verify on disk, do NOT trust `materialized`
131
+ (F-040).** The workflow's agents run cd'd into the *worktree*, so any follow-up they wrote
132
+ may live in a worktree that was NOT merged (and is now gone) — `materialized:true` only means
133
+ "the workflow attempted a write", never proof on disk. So for **every** `residuals[]` entry
134
+ (regardless of the `materialized` flag), check whether a matching follow-up card actually exists
135
+ on disk under `${paths.backlog_dir}` in the **MAIN repo** (`<card>-followup-*.yml`). If it is
136
+ absent, create it by **delegating to the `prd-card-writer` agent**, the same owner the workflow
137
+ uses (card-template, Rule C `review_profile`, `owner_agent` routed to the residual's domain,
138
+ traceability), derived from the residual (≥1 requirement; `acceptance_criteria` = the verbatim
139
+ residual; `files_likely_touched` from the card's ownership). Do NOT hand-write a minimal stub —
140
+ the offline path must match agent-path quality (F-039); it MUST pass the `/new` pre-flight field
141
+ check. If `prd-card-writer` is unavailable (total outage), fall back to a minimal valid stub. This
142
+ main-repo, **disk-verified** write is the SSOT — nothing is dropped even on a non-merged batch.
143
+ 2. **Mark deferred cards DONE — only after their follow-up exists AND every deferral class allows
144
+ it (F-040/H + A3).** Some committed cards were intentionally left **NON-DONE** because they carry
145
+ an open deferred AC: they are `perCardResults[]` entries with `deferred:true` plus a
146
+ `deferredClasses[]` array (also `cards_deferred_done_pending` in telemetry + the `F040-deferred`
147
+ ledger row). For each such card, **check the classes first**:
148
+ - **Every class ∈ {`owner-gated`, `not-a-code-defect`, `policy-deferred-ac`}** → the card's own
149
+ code is complete; the residual is an external/infra step. Now that step 1 guaranteed its
150
+ deferral's follow-up exists on disk in the main repo, set the card `status: DONE` +
151
+ `completed_date` + an implementation_note (`"DONE post-run (new2) — AC deferred to follow-up
152
+ <id>"`) in `${paths.backlog_dir}/<card>.yml`, and fold all of them into ONE reconciliation
153
+ commit in the MAIN repo.
154
+ - **ANY class ∈ {`unresolved`, `out-of-ownership`, `outage`} (or missing)** → the card's DoD is
155
+ genuinely NOT met (an AC the workflow tried and failed to implement). Leave it **IN_PROGRESS**
156
+ and surface it explicitly in the presentation: `"commit landed, DoD NON soddisfatta —
157
+ follow-up <id>"`. NEVER auto-DONE it — a follow-up tracks the gap, but DONE would lie (F-029).
158
+ **If a card's follow-up could NOT be created in step 1, leave it NON-DONE and surface it** —
159
+ fail-loud; NEVER mark a card DONE with a silently-dropped requirement (F-029).
160
+ 3. **Resume if degraded.** If `degraded` is true, re-invoke the workflow with
142
161
  `Workflow({ scriptPath, resumeFromRunId })` (same `args` + the new `ts`). The
143
162
  per-card **skip-completed** guard makes the resume idempotent — already-committed
144
163
  cards are skipped, only the incomplete/blocked ones run. Repeat until `degraded`
145
164
  is false (or the same cards stall twice → surface to the user).
146
- 3. **Present.** Print `report` verbatim. Surface `residuals` prominently
165
+ 4. **Present.** Print `report` verbatim. Surface `residuals` prominently
147
166
  ("questi residui sono tracciati come follow-up: …") — the post-run review that
148
167
  replaced the ~25 mid-run questions. If `degraded`, say so plainly (the run was
149
168
  incomplete and resumed).
150
- 4. **Record truthful telemetry.** Before appending `telemetry` to
151
- `${metricsDir}/skill-runs.jsonl`, fill the fields the workflow could not compute:
152
- `wall_clock_s` (now - kickoff `ts`) and `followups_on_disk` (count the actual
153
- follow-up files on disk, NOT `residualFollowups.length` which double-counts).
154
- `total_tokens`/`agent_count` come from the workflow (`budget.spent()` delta +
155
- spawn counter); if `total_tokens` is null, run the `/new` Phase-8 `-stats` script
156
- to backfill real `usage`. Keep `degraded`/`degradation_reasons` in the record so
157
- the A/B comparison can exclude or weight degraded runs. Do NOT re-summarise the
158
- cards; the workflow already did.
169
+ 5. **Record truthful telemetry — reconciled against disk (F-040).** Before appending `telemetry`
170
+ to `${metricsDir}/skill-runs.jsonl`, fill the fields the workflow could not compute and
171
+ **reconcile the report against the real disk state** (agent `reason` strings can over-claim — a
172
+ residual may say "AC PASS / migration created" about a change a rollback later erased). Verify:
173
+ every `perCardResults` entry marked `committed` actually has a commit in `${trunk}`
174
+ (`git -C $MAIN log --oneline ${trunk} | grep <card>`); annotate any divergence and never present
175
+ progress the disk does not show. Then fill `wall_clock_s` (now - kickoff `ts`) and
176
+ `followups_on_disk` (count the actual follow-up files on disk in the main repo, NOT
177
+ `residualFollowups.length`, which double-counts). `total_tokens`/`agent_count` come from the
178
+ workflow; if `total_tokens` is null, run the `/new` Phase-8 `-stats` script to backfill real
179
+ `usage`. Keep `degraded`/`degradation_reasons` + `cards_deferred_done_pending` in the record so
180
+ the A/B comparison stays honest. Do NOT re-summarise the cards — the workflow already did.
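The decision in step 2 of the list above could be sketched as a small function. The class names and the `deferred`/`deferredClasses` fields come from the text; the helper name `postRunDecision`, its return labels and the sample cards are hypothetical.

```javascript
// Hypothetical sketch of the Step 5.2 decision; helper name and labels are assumptions.
const DONE_ALLOWED = new Set(['owner-gated', 'not-a-code-defect', 'policy-deferred-ac'])

function postRunDecision(card, followupOnDisk) {
  if (!card.deferred) return 'no-action'
  const classes = Array.isArray(card.deferredClasses) ? card.deferredClasses : []
  // A missing or empty class list, or any class outside the allow-list,
  // is treated as "DoD not met".
  const allAllowed = classes.length > 0 && classes.every((c) => DONE_ALLOWED.has(c))
  if (!allAllowed) return 'keep-in-progress'
  // Fail loud: never DONE unless the follow-up was verified on disk in the main repo.
  return followupOnDisk ? 'mark-done' : 'leave-non-done'
}

const decisions = [
  postRunDecision({ id: 'FEAT-0001', deferred: true, deferredClasses: ['owner-gated'] }, true),
  postRunDecision({ id: 'FEAT-0002', deferred: true, deferredClasses: ['owner-gated', 'unresolved'] }, true),
  postRunDecision({ id: 'FEAT-0003', deferred: true, deferredClasses: ['policy-deferred-ac'] }, false),
  postRunDecision({ id: 'FEAT-0004', deferred: false }, true),
]
console.log(decisions)
```

Checking the classes before the follow-up keeps the two failure modes separate: an `unresolved` class is a genuinely unmet DoD, while a missing follow-up is a tracking gap that must be surfaced.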
@@ -153,12 +153,12 @@ if (kind === 'scope-expansion') {
153
153
  `If INTEGRATE: apply (you are ${fixerAgent}), re-run lint+tsc, return applied:true verified:true. If FOLLOW-UP: applied:false verified:false note:'needs-followup: <why>'.`,
154
154
  { label: `resolve:scope:${card}`, phase: 'Repair', agentType: fixerAgent, schema: FIX_SCHEMA }
155
155
  )
156
- } catch (e) { if (e && e.transientExhausted) return { status: 'followup', reason: 'outage during scope-expansion', outOfScopeFindings: [] }; throw e }
156
+ } catch (e) { if (e && e.transientExhausted) return { status: 'followup', reason: 'outage during scope-expansion', deferralClass: 'outage', outOfScopeFindings: [] }; throw e }
157
157
  if (decide && decide.verified) {
158
158
  const ok = await judgeVerify([{ i: 1, r: decide }])
159
159
  if (ok.ok) { log('scope-expansion integrated within ownership.'); return { status: 'resolved', outOfScopeFindings: collectOOS(decide) } }
160
160
  }
161
- return await materialiseFollowup('scope-expansion', (decide && decide.note) || 'outside ownership / new AC / protected', collectOOS(decide))
161
+ return await materialiseFollowup('scope-expansion', (decide && decide.note) || 'outside ownership / new AC / protected', collectOOS(decide), 'scope-expansion')
162
162
  }
163
163
 
164
164
  // ───────────────────────────────────────────────────────────────────────────
@@ -179,7 +179,7 @@ try {
179
179
  `Tier-1 targeted repair for card ${card} (${kind}).\n\n${brief}\n\n${gateHint}\n\nApply the minimal correct fix within MAY-EDIT only. Re-run the originating gate and report verified honestly (never claim verified without re-running it).`,
180
180
  { label: `resolve:${kind}:${card}`, phase: 'Repair', agentType: fixerAgent, schema: FIX_SCHEMA }
181
181
  )
182
- } catch (e) { if (e && e.transientExhausted) return { status: 'followup', reason: 'outage during tier-1', outOfScopeFindings: [] }; throw e }
182
+ } catch (e) { if (e && e.transientExhausted) return { status: 'followup', reason: 'outage during tier-1', deferralClass: 'outage', outOfScopeFindings: [] }; throw e }
183
183
 
184
184
  // F-008 — terminal short-circuit, verified not trusted.
185
185
  if (attempt && attempt.terminal) {
@@ -197,7 +197,7 @@ if (attempt && attempt.terminal) {
197
197
  confirmed = !!(tj && tj.confirmed)
198
198
  } catch (_) { confirmed = false }
199
199
  }
200
- if (confirmed) { log(`${kind} terminal (${tr}) — short-circuit to follow-up.`); return await materialiseFollowup(kind, `terminal: ${tr} — ${attempt.note || ''}`, collectOOS(attempt)) }
200
+ if (confirmed) { log(`${kind} terminal (${tr}) — short-circuit to follow-up.`); return await materialiseFollowup(kind, `terminal: ${tr} — ${attempt.note || ''}`, collectOOS(attempt), tr || 'unresolved') }
201
201
  log(`terminal verdict (${tr}) rejected — proceeding to multi-attempt.`)
202
202
  }
203
203
 
@@ -242,7 +242,7 @@ if (canFanOut && !protectedDomain) {
242
242
  log('budget near target — skipping tier-2 fan-out.')
243
243
  }
244
244
 
245
- return await materialiseFollowup(kind, (attempt && attempt.note) || 'unresolved after repair tiers', collectOOS(attempt).concat(tier2OOS))
245
+ return await materialiseFollowup(kind, (attempt && attempt.note) || 'unresolved after repair tiers', collectOOS(attempt).concat(tier2OOS), 'unresolved')
246
246
 
247
247
  // ───────────────────────────────────────────────────────────────────────────
248
248
  // F-015/F-033 — mandatory adversarial judge + deterministic JS cross-check.
@@ -266,11 +266,20 @@ async function judgeVerify(verifiedAttempts) {
266
266
  return { ok: true, best: judge.best }
267
267
  }
268
268
 
269
- async function materialiseFollowup(k, reason, oos) {
269
+ // F-040 — `deferralClass` (4th arg) classifies WHY this became a follow-up so new2.js can
270
+ // decide whether the CARD should still commit. owner-gated / not-a-code-defect → the card's
271
+ // own code is complete; the residual is an external step (do NOT roll the card back). Anything
272
+ // else (unresolved code defect, out-of-ownership, baseline) → genuine block → rollback as before.
273
+ // The classifier flows back through resolve() in new2.js; never write it into the worktree.
274
+ async function materialiseFollowup(k, reason, oos, deferralClass) {
275
+ const cls = deferralClass || 'unresolved'
270
276
  let r = null
271
277
  try {
272
278
  // F-039 — backlog cards are owned by prd-card-writer (card-template + Rule C
273
279
  // review_profile + owner_agent + traceability), NOT a hand-written Haiku stub.
280
+ // F-040 — the workflow agent runs cd'd into the worktree, so this write is BEST-EFFORT
281
+ // (it rides the merge if the batch merges). The SKILL is the SSOT: it verifies/creates the
282
+ // card in the MAIN repo post-run, so a non-merged batch never loses the follow-up.
274
283
  r = await agentSafe(
275
284
  `Create ONE follow-up backlog card so this residual is TRACKED, not dropped (per ${REF}/completeness.md Phase 2.5b option 3). You are prd-card-writer: apply your card-template, Rule C (review_profile), owner_agent routing, and traceability rules — do NOT emit a minimal stub.\n\n${brief}\nKind: ${k}\nResidual domain: ${domain}\nReason unresolved: ${reason}\n\n` +
276
285
  `Write ${backlogDir}/${card}-followup-<gate>.yml with status: TODO, derived from the residual: requirements + acceptance_criteria (the verbatim residual as ≥1 AC), owner_agent routed to the residual domain (${domain}), review_profile per Rule C, files_likely_touched ≥1 from the card ownership / remedy files. It MUST pass the /new pre-flight field check. Return the created card id.`,
@@ -280,9 +289,9 @@ async function materialiseFollowup(k, reason, oos) {
280
289
  // F-020 — could not materialise (e.g. outage): return WITHOUT a followupCard so the
281
290
  // SKILL writes it from the offline-safe residual ledger. Never claim it was created.
282
291
  log(`follow-up materialisation failed (${String(e && e.message)}) — skill will reconcile.`)
283
- return { status: 'followup', followupCard: null, reason, outOfScopeFindings: oos || [] }
292
+ return { status: 'followup', followupCard: null, reason, deferralClass: cls, outOfScopeFindings: oos || [] }
284
293
  }
285
294
  const followupCard = (r && r.created && r.followupCard) ? r.followupCard : null
286
295
  log(`${k} → follow-up ${followupCard || '(deferred to skill)'} (nothing dropped).`)
287
- return { status: 'followup', followupCard, reason, outOfScopeFindings: oos || [] }
296
+ return { status: 'followup', followupCard, reason, deferralClass: cls, outOfScopeFindings: oos || [] }
288
297
  }
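On the consuming side, the comment on `materialiseFollowup` implies a check along these lines in `new2.js`. This is a sketch under assumptions: the helper name `blocksCard` is hypothetical, and only the rule itself (owner-gated / not-a-code-defect do not block, anything else does, a missing class defaults to `unresolved`) is taken from the hunk above.

```javascript
// Hypothetical sketch of how a resolve() result could map to "block + roll back".
const NON_BLOCKING = new Set(['owner-gated', 'not-a-code-defect'])

function blocksCard(result) {
  if (result.status === 'resolved') return false
  // Same default as materialiseFollowup: no class means 'unresolved'.
  const cls = result.deferralClass || 'unresolved'
  return !NON_BLOCKING.has(cls)
}

const outcomes = {
  resolved: blocksCard({ status: 'resolved' }),
  ownerGated: blocksCard({ status: 'followup', deferralClass: 'owner-gated' }),
  outage: blocksCard({ status: 'followup', deferralClass: 'outage' }),
  unclassified: blocksCard({ status: 'followup' }),
}
console.log(outcomes)
```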
@@ -27,6 +27,10 @@ export const meta = {
27
27
  // { report, perCardResults, gateLedger, residualFollowups, telemetry, degraded, degradationReasons, residuals }
28
28
  // `residuals` (materialized:false) is the OFFLINE-SAFE ledger of record — the skill writes
29
29
  // any missing follow-up YAML and, if `degraded`, resumes via Workflow({scriptPath,resumeFromRunId}).
30
+ // A3 — residuals[] entries carry `deferralClass` and committed perCardResults[] carry
31
+ // `deferred` + `deferredClasses[]`: the skill marks a deferred card DONE post-run ONLY when
32
+ // every class is owner-gated / not-a-code-defect / policy-deferred-ac (an 'unresolved' class
33
+ // = DoD genuinely unmet → the card stays IN_PROGRESS and is surfaced).
30
34
  // ───────────────────────────────────────────────────────────────────────────
31
35
 
32
36
  // F-001/F-004 — tolerate args delivered as a JSON string (parse-or-default).
@@ -53,7 +57,6 @@ const gateLedger = [] // { card, gate, decision, detail }
53
57
  const residualFollowups = [] // { card, kind, followupCard, reason }
54
58
  const residuals = [] // F-020 OFFLINE-SAFE ledger: { card, kind, evidence, materialized }
55
59
  const perCardResults = []
56
- let batchFatal = false
57
60
  let prodReadiness = null
58
61
  let degraded = false
59
62
  const degradationReasons = []
@@ -70,6 +73,11 @@ function sig(card, gate, evidence) {
70
73
  const e = String(evidence || '').toLowerCase().replace(/\s+/g, ' ').replace(/[0-9a-f]{7,40}/g, '#').trim().slice(0, 160)
71
74
  return `${card}::${String(gate || '').toLowerCase()}::${e}`
72
75
  }
76
+ // F-040 (Fix G) — AC-deferral key on AC NUMBER ONLY. The full-text sig() drifted between the
77
+ // pre-flight policyDeferredACs[].text and the implement agent's unmetACs[].text, so a policy-
78
+ // deferred AC got re-routed to resolve() a second time (the migration-card double-routing). This
79
+ // coarse key is scoped to ac-defer so it never collides with a freeform 'blocker' finding.
80
+ function acSig(card, n) { return `${card}::ac-defer::ac-${String(n)}` }
73
81
 
74
82
  function ledger(card, gate, decision, detail) {
75
83
  gateLedger.push({ card, gate, decision, detail: detail || '' })
@@ -112,7 +120,7 @@ if (!cardIds.length) {
112
120
  // ───────────────────────────────────────────────────────────────────────────
113
121
  const PREFLIGHT_SCHEMA = {
114
122
  type: 'object',
115
- required: ['ok', 'worktreePath', 'branch', 'baseline', 'executionMode', 'cards', 'cardGraph'],
123
+ required: ['ok', 'worktreePath', 'branch', 'baseline', 'cards', 'cardGraph'],
116
124
  additionalProperties: false,
117
125
  properties: {
118
126
  ok: { type: 'boolean' },
@@ -121,8 +129,9 @@ const PREFLIGHT_SCHEMA = {
121
129
  port: { type: ['number', 'string'] },
122
130
  baseline: { enum: ['pass', 'fail'] },
123
131
  baselineLog: { type: 'string' },
124
- executionMode: { enum: ['sequential', 'team'] },
125
- groups: { type: 'array', items: { type: 'object', additionalProperties: true } },
132
+ // B4 — executionMode/groups removed: the DAG scheduler is strictly sequential (single
133
+ // worktree); computing team-mode groups was pre-flight work nobody read. Real parallelism
134
+ // is a future release, after A/B data.
126
135
  cards: { type: 'array', items: { type: 'string' }, description: 'card ids cleared to run' },
127
136
  // F-021/F-024/F-025/F-016 — the dependency graph + per-card routing facts (the
128
137
  // script cannot read YAML; the pre-flight agent supplies these).
@@ -135,6 +144,10 @@ const PREFLIGHT_SCHEMA = {
135
144
  dependsOn: { type: 'array', items: { type: 'string' }, description: 'IN-BATCH deps only' },
136
145
  ownerAgent: { type: 'string', description: 'coder|ui-expert|visual-designer|motion-expert (G25: unknown→coder)' },
137
146
  reviewProfile: { enum: ['skip', 'light', 'balanced', 'deep'] },
147
+ // B1/F-026 — git-authoritative idempotency, probed ONCE here instead of one Haiku
148
+ // spawn per card in runCard (always false on a fresh run).
149
+ alreadyCommitted: { type: 'boolean', description: 'commit referencing the card exists in trunk..HEAD of the worktree AND validation re-runs green AND no open follow-up' },
150
+ alreadyCommittedSha: { type: 'string' },
138
151
  // F-016 — ACs whose only implementation file is outside the card MAY-EDIT,
139
152
  // pre-classified deferred-by-policy (never routed to resolve).
140
153
  policyDeferredACs: { type: 'array', items: { type: 'object', additionalProperties: true } },
@@ -163,6 +176,7 @@ const MERGE_SCHEMA = {
163
176
  mergeTs: { type: 'string' },
164
177
  reconciliation: { type: 'string' },
165
178
  forcedDone: { type: 'array', items: { type: 'string' }, description: 'MUST be empty — false-DONE is forbidden (F-029)' },
179
+ deferredLeftOpen: { type: 'array', items: { type: 'string' }, description: 'F-040 — committed cards left NON-DONE (open owner-gated AC); the skill marks them DONE post-run' },
166
180
  epicsClosed: { type: 'array', items: { type: 'string' }, description: 'Epic/parent cards marked DONE by Phase 6b step 5e (all children DONE) — NOT a forcedDone violation' },
167
181
  uncommittedLeft: { type: 'boolean', description: 'true if dirty code was left (NOT committed) + reported (F-030)' },
168
182
  note: { type: 'string' },
@@ -206,9 +220,10 @@ try {
206
220
  g3Bullet +
207
221
  `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
208
222
  `• G5 depends-on: a card whose depends_on names a non-DONE card NOT in this batch → EXCLUDE it AND every in-batch card that transitively depends on it.\n` +
209
- `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs }.\n` +
223
+ `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs, alreadyCommitted, alreadyCommittedSha }.\n` +
224
+ `• B1/F-026 idempotency (per card, AFTER the worktree exists): set alreadyCommitted:true (+ alreadyCommittedSha) IFF ALL hold: (a) a commit referencing the card id exists in ${TRUNK}..HEAD of the worktree; (b) the card's validation_commands re-run GREEN right now; (c) NO open follow-up card for it exists in ${paths.backlog_dir || 'backlog'}. On a FRESH worktree ${TRUNK}..HEAD is empty → all false, zero extra work.\n` +
210
225
  `• F-016 AC↔ownership consistency: for each acceptance_criterion, derive the file(s) it requires editing. If those files are NOT a subset of the card's MAY-EDIT/files_likely_touched → add the AC to policyDeferredACs:[{n,text,owningCard|owningFile,reason}] (it will become ONE follow-up, never a resolve). Do the same for any AC whose remedy is an owner-gated infra action (remote db push / deploy / secret / DNS).\n` +
211
- `• Complexity (setup.md 3c): decide executionMode sequential|team (+ groups for team). Build the file-ownership map /tmp; return ownershipMapPath.\n` +
226
+ `• Ownership (setup.md 3c): build the file-ownership map → /tmp; return ownershipMapPath. F-040: each card's MAY-EDIT = files_likely_touched ∪ every path NAMED EXPLICITLY in that card's acceptance_criteria/definition_of_done (an ADR the DoD says to update, the data-model / ER doc for a schema-change, etc.) so editing a DoD-mandated doc is NOT a file-diff violation. Do NOT add another card's files this way.\n` +
212
227
  `• Persist per-card architecture baselines to /tmp/arch-baseline-<CARD>.md; return archBaselinePaths.\n\n` +
213
228
  `Return the structured PREFLIGHT object. ok:false ONLY if the workspace is unworkable.`,
214
229
  { label: 'preflight', phase: 'Pre-flight', agentType: 'general-purpose', schema: PREFLIGHT_SCHEMA }
@@ -234,6 +249,20 @@ const runnableCards = preflight.cards || []
234
249
  const cardGraph = preflight.cardGraph || []
235
250
  const graphById = {}
236
251
  for (const n of cardGraph) graphById[n.id] = n
252
+
253
+ // A1/G25 — deterministic owner_agent clamp, in JS not prompt. An invalid agentType is a
254
+ // PERMANENT spawn error (→ crashResult → card 'failed'), so the /new router table
255
+ // (implement.md §6b) must be a code-level guarantee: plan/visual-designer/motion-expert
256
+ // degrade to coder (their briefing variant is not implemented — same as /new), anything
257
+ // unknown/missing → coder. The RAW value is kept for the security-relevance heuristic.
258
+ const OWNER_SPAWN = { coder: 'coder', 'ui-expert': 'ui-expert' }
259
+ for (const n of cardGraph) {
260
+ const raw = String(n.ownerAgent || '').trim()
261
+ const spawn = OWNER_SPAWN[raw.toLowerCase()] || 'coder'
262
+ n.ownerAgentRaw = raw
263
+ n.ownerAgent = spawn
264
+ ledger(n.id, 'router', spawn === raw ? 'OK' : 'CLAMPED', `owner_agent='${raw || '(missing)'}' → spawn=${spawn}`)
265
+ }
237
266
  const sharedCtx = {
238
267
  worktreePath: preflight.worktreePath,
239
268
  branch: preflight.branch,
@@ -247,8 +276,10 @@ const sharedCtx = {
247
276
  // F-016/F-010 — materialise ONE follow-up per policy-deferred AC up front; never resolve.
248
277
  for (const n of cardGraph) {
249
278
  for (const ac of (n.policyDeferredACs || [])) {
250
- residuals.push({ card: n.id, kind: 'policy-deferred-ac', evidence: `AC-${ac.n}: ${ac.text} (${ac.reason || 'out-of-ownership / owner-gated'})`, materialized: false })
279
+ residuals.push({ card: n.id, kind: 'policy-deferred-ac', evidence: `AC-${ac.n}: ${ac.text} (${ac.reason || 'out-of-ownership / owner-gated'})`, materialized: false, deferralClass: 'policy-deferred-ac' })
251
280
  acceptedDeferrals.add(sig(n.id, 'ac-unmet', `AC-${ac.n}: ${ac.text}`))
281
+ acceptedDeferrals.add(acSig(n.id, ac.n)) // F-040 (Fix G) — text-drift-proof AC key
282
+
252
283
  ledger(n.id, 'F016-policy-defer', 'DEFERRED-BY-POLICY', `AC-${ac.n} → follow-up (owner: ${ac.owningCard || ac.owningFile || '?'})`)
253
284
  }
254
285
  }
@@ -274,11 +305,15 @@ function domainMayEdit(dom, codeScope) {
274
305
  return docPaths.length ? docPaths : codeScope // doc-only ownership; fall back to code scope if no doc paths configured
275
306
  }
276
307
 
308
+ // F-040 — returns { status:'resolved'|'followup'|'fatal', deferralClass }. deferralClass tells
309
+ // the caller WHY a followup happened: 'owner-gated'/'not-a-code-defect' → the card's own code is
310
+ // complete (external/infra step remains) → caller must NOT roll the card back; anything else
311
+ // ('unresolved'/'out-of-ownership'/'baseline-not-reached'/'outage') → genuine block → rollback.
277
312
  async function resolve(kind, card, evidence, extra) {
278
313
  const s = sig(card, kind, evidence)
279
314
  if (resolvedSignatures.has(s) || acceptedDeferrals.has(s)) {
280
315
  ledger(card, 'resolve:' + kind, 'DEDUP-SKIP', 'already resolved/deferred this run')
281
- return 'resolved'
316
+ return { status: 'resolved', deferralClass: null }
282
317
  }
283
318
  resolvedSignatures.add(s)
284
319
  const dom = (extra && extra.domain) || 'code'
@@ -295,22 +330,25 @@ async function resolve(kind, card, evidence, extra) {
295
330
  })
296
331
  } catch (e) {
297
332
  if (e && (e.transientExhausted || isTransient(e))) noteDegraded('outage')
298
- res = { status: 'followup', reason: 'resolve workflow error: ' + String(e && e.message) }
333
+ res = { status: 'followup', reason: 'resolve workflow error: ' + String(e && e.message), deferralClass: 'outage' }
299
334
  }
335
+ // C — the 'fatal' branch was removed: new2-resolve never returns it (unreachable dead code,
336
+ // same rule as v4.17.2 G1).
300
337
  const status = (res && res.status) || 'followup'
301
- if (status === 'fatal') { batchFatal = true; ledger(card, 'resolve:' + kind, 'FATAL', (res && res.reason) || ''); return status }
338
+ const deferralClass = (res && res.deferralClass) || null
302
339
  if (status === 'followup') {
303
340
  acceptedDeferrals.add(s) // F-028 — a deferred residual must not be re-routed by a later gate.
304
341
  const fc = (res && res.followupCard) || null
305
342
  residualFollowups.push({ card, kind, followupCard: fc || '(pending)', reason: (res && res.reason) || '' })
306
- residuals.push({ card, kind, evidence, materialized: !!fc })
343
+ // A3: deferralClass rides the residual so the skill can gate DONE-reconciliation on it.
344
+ residuals.push({ card, kind, evidence, materialized: !!fc, deferralClass })
307
345
  }
308
346
  // F-022 — route out-of-scope findings the resolve surfaced.
309
347
  for (const osf of (res && res.outOfScopeFindings) || []) {
310
348
  residuals.push({ card, kind: 'out-of-scope', evidence: `${osf.file || ''}:${osf.line || ''} ${osf.evidence || ''}`, materialized: false })
311
349
  }
312
350
  ledger(card, 'resolve:' + kind, status, (res && (res.followupCard || res.reason)) || '')
313
- return status
351
+ return { status, deferralClass }
314
352
  }
315
353
 
316
354
  // ───────────────────────────────────────────────────────────────────────────
@@ -321,39 +359,47 @@ async function resolve(kind, card, evidence, extra) {
321
359
  async function rollbackCard(cardId, mayEdit) {
322
360
  // F-018 — restore the card's files to HEAD so a failed card never poisons the next.
323
361
  // Safe at file granularity because the DAG guarantees all deps are already committed
324
- // (HEAD contains their work); this removes only THIS card's uncommitted changes.
362
+ // (HEAD contains their work); this removes only THIS card's uncommitted changes. With an
363
+ // empty mayEdit (A4 crash path: ownership unknown) the scope is the whole worktree — still
364
+ // safe for the same reason: everything good is in HEAD, only the crashed card is dirty.
365
+ // Returns true when the cleanup VERIFIED clean (A2 uses this to un-strand the merge gate).
366
+ const scope = (mayEdit || []).map((p) => `'${p}'`).join(' ') || '.'
325
367
  try {
326
- await agentSafe(
327
- `In the worktree ${sharedCtx.worktreePath}, restore the working tree to a CLEAN state for a FAILED card: \`git restore --source=HEAD --worktree --staged -- ${(mayEdit || []).map((p) => `'${p}'`).join(' ') || '.'}\` then \`git clean -fd\` ONLY within the card MAY-EDIT paths. Do NOT touch other cards' committed work. Confirm \`git status --porcelain\` is empty for those paths.`,
368
+ const r = await agentSafe(
369
+ `In the worktree ${sharedCtx.worktreePath}, restore the working tree to a CLEAN state for a FAILED card: \`git restore --source=HEAD --worktree --staged -- ${scope}\` then \`git clean -fd ${scope === '.' ? '' : '-- ' + scope}\` (with scope '.' this cleans the whole worktree — safe: all committed work lives in HEAD). Do NOT touch other cards' committed work. Confirm \`git status --porcelain\` is empty for the scope.`,
328
370
  { label: `rollback:${cardId}`, phase: 'Implement', agentType: 'general-purpose', model: 'haiku',
329
371
  schema: { type: 'object', required: ['clean'], additionalProperties: true, properties: { clean: { type: 'boolean' }, note: { type: 'string' } } } }
330
372
  )
331
- } catch (_) { /* best-effort; OUTAGE path already flagged by caller */ }
373
+ return !!(r && r.clean)
374
+ } catch (_) { return false /* best-effort; OUTAGE path already flagged by caller */ }
332
375
  }
333
376
 
334
- async function runCard(cardId, cardPath, lessons) {
377
+ async function runCard(cardId, cardPath) {
335
378
  const gates = []
336
- const tele = {}
337
379
  const node = graphById[cardId] || {}
338
380
  const ownerAgent = node.ownerAgent || 'coder'
339
381
  const reviewProfile = node.reviewProfile || 'balanced'
382
+ // F-040/H — a card carrying an open owner-gated/policy-deferred AC commits its code but stays
383
+ // NON-DONE; the SKILL marks it DONE post-run only after the deferral's follow-up exists on disk
384
+ // in the main repo. Seeded from the pre-flight policy-deferred ACs; set by any owner-gated review
385
+ // deferral or unmet-AC follow-up below.
386
+ // A3 — deferredClasses records WHY each deferral happened. The skill marks the card DONE
387
+ // post-run ONLY if every class is owner-gated/not-a-code-defect/policy-deferred-ac; a class
388
+ // like 'unresolved' (a genuinely unimplemented AC) keeps the card IN_PROGRESS — never auto-DONE.
389
+ let deferredOpen = ((node.policyDeferredACs) || []).length > 0
390
+ const deferredClasses = new Set(deferredOpen ? ['policy-deferred-ac'] : [])
340
391
  function g(name, decision, detail) { gates.push({ gate: name, decision, detail: detail || '' }); ledger(cardId, name, decision, detail) }
341
392
 
342
- // F-026 — skip-completed: only if committed AND gates green for that sha AND no open follow-up.
343
- // Keyed on the receipt, NOT the (unreliable) DONE flag.
344
- try {
345
- const probe = await agentSafe(
346
- `Idempotency probe for card ${cardId} in worktree ${sharedCtx.worktreePath}. Return done:true ONLY if ALL hold: (a) a commit referencing ${cardId} exists in ${TRUNK}..HEAD; (b) the card's validation_commands re-run GREEN right now (tsc/lint/card greps); (c) NO open follow-up card for ${cardId} exists in ${paths.backlog_dir || 'backlog'}. Otherwise done:false. Do not edit anything.`,
347
- { label: `probe:${cardId}`, phase: 'Implement', agentType: 'general-purpose', model: 'haiku',
348
- schema: { type: 'object', required: ['done'], additionalProperties: true, properties: { done: { type: 'boolean' }, commit: { type: 'string' }, note: { type: 'string' } } } }
349
- )
350
- if (probe && probe.done) {
351
- g('skip-completed', 'CACHED', probe.commit || 'already committed + green')
352
- return { card: cardId, status: 'committed', commit: probe.commit || '-', filesChanged: [], scopeFiles: [], archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates, telemetry: tele }
353
- }
354
- } catch (e) { if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates, telemetry: tele } } }
393
+ // F-026/B1 — skip-completed from the PRE-FLIGHT's git-authoritative probe (cardGraph[].
394
+ // alreadyCommitted), not a per-card agent spawn: on a fresh run the old Haiku probe was N
395
+ // guaranteed-false spawns, and on resume the journal cache already covers it. Keyed on the
396
+ // receipt (commit in TRUNK..HEAD + green + no open follow-up), NOT the unreliable DONE flag.
397
+ if (node.alreadyCommitted) {
398
+ g('skip-completed', 'CACHED', node.alreadyCommittedSha || 'pre-flight: commit in trunk..HEAD + green + no open follow-up')
399
+ return { card: cardId, status: 'committed', commit: node.alreadyCommittedSha || '-', filesChanged: [], scopeFiles: [], archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates }
400
+ }
355
401
 
356
- const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\nBatch lessons so far: ${lessons.length ? lessons.join(' | ') : '(none)'}\nArch baseline (write to /tmp/arch-baseline-${cardId}.md): reuse if present.\nNOTE: ACs already pre-classified as policy-deferred MUST NOT be implemented or routed — they are tracked as follow-ups.`
402
+ const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\nArch baseline (write to /tmp/arch-baseline-${cardId}.md): reuse if present.\nNOTE: ACs already pre-classified as policy-deferred MUST NOT be implemented or routed — they are tracked as follow-ups.`
357
403
 
358
404
  // --- Phase 1+2: dispatch the card's OWNER_AGENT (F-024), not general-purpose. ---
359
405
  let impl
@@ -361,40 +407,55 @@ async function runCard(cardId, cardPath, lessons) {
361
407
  impl = await agentSafe(
362
408
  `Implement card ${cardId} per ${REF}/implement.md (Phase 1 claim+architect+plan-auditor, Phase 2 you ARE the owner_agent '${ownerAgent}') and ${REF}/completeness.md (Phase 2.5 + 2.5b AC-closure ledger). Run all gates/bash yourself.\n\n${cardBrief}\n\n` +
363
409
  `POLICIES: G26 Phase-2 lint/tsc/test/build failing after the module's retry cap → buildBlocked:true + blockedGate. Build the AC Closure Ledger (one row per AC: implemented|unmet|policy-deferred). DO NOT silently defer; report unmet rows (excluding policy-deferred). Persist arch baseline to /tmp/arch-baseline-${cardId}.md and the diff to /tmp/diff-${cardId}.txt.\n\n` +
364
- `Return: { epic, buildBlocked, blockedGate, unmetACs:[{n,text}], scopeFiles, mayEditPaths, fileDiffViolation, note }`,
410
+ `E4 OWNERSHIP RECONCILE (implement.md §11b — do this BEFORE returning): the card's MAY-EDIT includes files_likely_touched ∪ paths NAMED EXPLICITLY in this card's acceptance_criteria/definition_of_done (e.g. an ADR the DoD says to update, the data-model / ER doc for a schema change). Editing THOSE is in-scope. For any OTHER dirty file outside MAY-EDIT (another card's file, or unrelated): \`git checkout -- <file>\` to revert it (NEVER leave it orphaned), list it in revertedOutOfOwnership. Set fileDiffViolation:true ONLY if such an edit genuinely could not be reverted (then say why in note) — it is no longer a silent label.\n\n` +
411
+ `Return: { epic, buildBlocked, blockedGate, unmetACs:[{n,text}], scopeFiles, mayEditPaths, revertedOutOfOwnership:[paths], fileDiffViolation, note }`,
365
412
  { label: `implement:${cardId}`, phase: 'Implement', agentType: ownerAgent,
366
413
  schema: { type: 'object', required: ['epic', 'buildBlocked', 'unmetACs', 'scopeFiles'], additionalProperties: true,
367
- properties: { epic: { type: 'boolean' }, buildBlocked: { type: 'boolean' }, blockedGate: { type: 'string' }, unmetACs: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeFiles: { type: 'array', items: { type: 'string' } }, mayEditPaths: { type: 'array', items: { type: 'string' } }, fileDiffViolation: { type: 'boolean' }, note: { type: 'string' } } } }
414
+ properties: { epic: { type: 'boolean' }, buildBlocked: { type: 'boolean' }, blockedGate: { type: 'string' }, unmetACs: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeFiles: { type: 'array', items: { type: 'string' } }, mayEditPaths: { type: 'array', items: { type: 'string' } }, revertedOutOfOwnership: { type: 'array', items: { type: 'string' } }, fileDiffViolation: { type: 'boolean' }, note: { type: 'string' } } } }
368
415
  )
369
416
  } catch (e) {
370
- if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates, telemetry: tele } }
417
+ if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates } }
371
418
  throw e
372
419
  }
373
420
 
374
- if (impl && impl.epic) { g('router', 'EPIC-SKIPPED', 'epic card'); return { card: cardId, status: 'epic-skipped', gates, commit: '-', telemetry: tele } }
421
+ if (impl && impl.epic) { g('router', 'EPIC-SKIPPED', 'epic card'); return { card: cardId, status: 'epic-skipped', gates, commit: '-' } }
375
422
 
376
423
  const mayEdit = (impl && impl.mayEditPaths) || []
377
424
  const scopeFiles = (impl && impl.scopeFiles) || []
378
- if (impl && impl.fileDiffViolation) g('E4-file-diff', 'AUTO-REVERTED', 'coder touched files outside ownership')
425
+ // F-040 E4 honest label: 'AUTO-REVERTED' used to be a no-op log (files were left orphaned).
426
+ // Now the owner agent reconciles out-of-ownership edits itself (implement.md §11b); we report
427
+ // what it actually did. A genuine unresolved violation becomes a tracked residual, never silent.
428
+ const reverted = (impl && impl.revertedOutOfOwnership) || []
429
+ if (reverted.length) g('E4-file-diff', 'REVERTED', `out-of-ownership reverted: ${reverted.join(', ')}`)
430
+ if (impl && impl.fileDiffViolation) {
431
+ g('E4-file-diff', 'FLAGGED', 'unresolved out-of-ownership edit — tracked as residual')
432
+ residuals.push({ card: cardId, kind: 'file-diff-violation', evidence: `unresolved out-of-ownership edit: ${(impl && impl.note) || ''}`, materialized: false })
433
+ }
379
434
 
380
435
  if (impl && impl.buildBlocked) {
381
- const s = await resolve('blocker', cardId, `Phase-2 gate failing: ${impl.blockedGate}`, { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })
436
+ const s = (await resolve('blocker', cardId, `Phase-2 gate failing: ${impl.blockedGate}`, { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })).status
382
437
  g('G26-build', s === 'resolved' ? 'RESOLVED' : 'FOLLOWUP', impl.blockedGate)
383
- if (s !== 'resolved') { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, telemetry: tele } }
438
+ if (s !== 'resolved') { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles } }
384
439
  }
385
440
 
386
441
  // F-010/F-016 — unmet ACs that are policy-deferred are skipped (already tracked).
387
442
  for (const ac of (impl && impl.unmetACs) || []) {
388
- if (acceptedDeferrals.has(sig(cardId, 'ac-unmet', `AC-${ac.n}: ${ac.text}`))) { g('G7-ac-closure', 'DEFERRED-BY-POLICY', `AC-${ac.n}`); continue }
389
- const s = await resolve('ac-unmet', cardId, `AC-${ac.n}: ${ac.text}`, { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })
390
- g('G7-ac-closure', s === 'resolved' ? 'RESOLVED' : 'FOLLOWUP', `AC-${ac.n}`)
443
+ if (acceptedDeferrals.has(acSig(cardId, ac.n)) || acceptedDeferrals.has(sig(cardId, 'ac-unmet', `AC-${ac.n}: ${ac.text}`))) { g('G7-ac-closure', 'DEFERRED-BY-POLICY', `AC-${ac.n}`); deferredOpen = true; deferredClasses.add('policy-deferred-ac'); continue }
444
+ const r = await resolve('ac-unmet', cardId, `AC-${ac.n}: ${ac.text}`, { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })
445
+ g('G7-ac-closure', r.status === 'resolved' ? 'RESOLVED' : 'FOLLOWUP', `AC-${ac.n}`)
446
+ // A3 — record the deferral CLASS (v4.24.1 did this only for the blocks loop). An
447
+ // 'unresolved' AC still commits (never destroy completed work) but the class keeps the
448
+ // card from being auto-DONE by the skill — its DoD is genuinely not met.
449
+ if (r.status !== 'resolved') { deferredOpen = true; deferredClasses.add(r.deferralClass || 'unresolved') }
391
450
  }
392
451
 
393
452
  // --- Review fan-out (F-024/F-025): specialized agents, trimmed by review_profile. ---
394
453
  // G5 — scopeFiles tokens alone miss a security card whose files don't carry them. Also
395
- // trigger security-reviewer on the card's owner_agent and its brief (title/requirements).
396
- const securityRelevant = highRisk.length
397
- || ownerAgent === 'security-reviewer'
454
+ // trigger security-reviewer on the card's RAW owner_agent and its brief (title/requirements).
455
+ // B2: high_risk_modules triggers only when THIS card's files intersect them (`highRisk.length`
456
+ // alone fired security-reviewer on every card of every project that configures the list).
457
+ const securityRelevant = highRisk.some((m) => scopeFiles.some((f) => String(f).includes(String(m))))
458
+ || node.ownerAgentRaw === 'security-reviewer'
398
459
  || /auth|security|secret|migration|rls/i.test(`${scopeFiles.join(' ')} ${cardBrief}`)
399
460
  // v4.18.0 — at `light`, Codex is the SOLE finder (cost-shift off Claude); `code-reviewer` is the
400
461
  // fallback when the companion is unavailable. The FP-gate equivalent is preserved downstream: any
@@ -435,48 +496,89 @@ async function runCard(cardId, cardPath, lessons) {
435
496
  const blocks = reviewResults.flatMap((r) => (r.blocks || [])).filter((b) => b && b.gate && b.evidence)
436
497
  const scopeExp = reviewResults.flatMap((r) => (r.scopeExpansion || []))
437
498
  let cardBlocked = false
499
+ // B3 — group blocks by kind+domain → ONE resolve per group via findings[] (the same F-007
500
+ // batching the final review already does). One-by-one routing was the dominant per-card cost
501
+ // driver (each resolve = fixer + judge + possible Tier-2 fan-out). A group is homogeneous by
502
+ // kind+domain, so its single status/deferralClass is coherent for every block in it.
503
+ const blockGroups = {}
438
504
  for (const b of blocks) {
439
505
  const kind = /e2e/i.test(b.gate) ? 'e2e-blocked' : /qa/i.test(b.gate) ? 'qa-fail' : 'blocker'
440
- const s = await resolve(kind, cardId, `${b.gate}: ${b.evidence}`, { mayEditPaths: mayEdit, scopeFiles, domain: b.domain || 'code' })
441
- g(b.gate, s === 'resolved' ? 'RESOLVED' : 'FOLLOWUP', b.evidence)
442
- if (s !== 'resolved') cardBlocked = true
506
+ const key = `${kind}::${b.domain || 'code'}`
507
+ ;(blockGroups[key] = blockGroups[key] || []).push(Object.assign({ kindResolved: kind }, b))
443
508
  }
444
- for (const sx of scopeExp) {
445
- const s = await resolve('scope-expansion', cardId, sx.evidence || '', { mayEditPaths: mayEdit, scopeFiles, domain: sx.domain || 'code' })
446
- g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
509
+ for (const key of Object.keys(blockGroups)) {
510
+ const grp = blockGroups[key]
511
+ const kind = grp[0].kindResolved
512
+ const dom = grp[0].domain || 'code'
513
+ const r = await resolve(kind, cardId, grp.map((b) => `${b.gate}: ${b.evidence}`).join(' || '),
514
+ { mayEditPaths: mayEdit, scopeFiles, domain: dom,
515
+ findings: grp.map((b) => ({ kind, evidence: `${b.gate}: ${b.evidence}`, domain: b.domain || 'code' })) })
516
+ // F-040 — THE primary fix. An owner-gated / not-a-code-defect deferral means the card's OWN
517
+ // code is complete and correct; the residual is an external/infra step (e.g. a remote db push)
518
+ // already tracked as a follow-up. Do NOT roll the card back — it proceeds to commit, NON-DONE
519
+ // (the skill marks it DONE post-run once the follow-up exists on disk). This replaces the old
520
+ // `s !== 'resolved' → cardBlocked` which destroyed a completed migration card's work over a db:push gate.
521
+ // A genuine unresolved CODE defect (or out-of-ownership/baseline/outage) still blocks + rolls back.
522
+ const ownerGated = r.status === 'followup' && (r.deferralClass === 'owner-gated' || r.deferralClass === 'not-a-code-defect')
523
+ for (const b of grp) g(b.gate, r.status === 'resolved' ? 'RESOLVED' : ownerGated ? 'DEFERRED-OWNER-GATED' : 'FOLLOWUP', b.evidence)
524
+ if (ownerGated) { deferredOpen = true; deferredClasses.add(r.deferralClass) }
525
+ else if (r.status !== 'resolved') cardBlocked = true
526
+ }
527
+ // B3 — scope-expansion findings batched per domain (same rationale).
528
+ const sxGroups = {}
529
+ for (const sx of scopeExp) { const d = sx.domain || 'code'; (sxGroups[d] = sxGroups[d] || []).push(sx) }
530
+ for (const dom of Object.keys(sxGroups)) {
531
+ const grp = sxGroups[dom]
532
+ const s = (await resolve('scope-expansion', cardId, grp.map((x) => x.evidence || '').join(' || '),
533
+ { mayEditPaths: mayEdit, scopeFiles, domain: dom,
534
+ findings: grp.map((x) => ({ kind: 'scope-expansion', evidence: x.evidence || '', domain: dom })) })).status
535
+ for (const sx of grp) g('scope-expansion', s === 'resolved' ? 'INTEGRATED' : 'FOLLOWUP', sx.evidence || '')
447
536
  }
448
537
 
449
- if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, telemetry: tele } }
538
+ if (cardBlocked) { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
450
539
 
451
540
  // --- Phase 4 — commit (F-023: Haiku + git-status reconcile, never git add -A). ---
541
+ // F-040/H — DONE policy. A card with an OPEN owner-gated/policy-deferred AC commits its code but
542
+ // must NOT be marked DONE here (its own DoD isn't met yet — e.g. the remote db:push is pending).
543
+ // The new2 SKILL marks it DONE post-run, ONLY after the deferral's follow-up exists on disk in the
544
+ // main repo (so a card is never DONE with a silently-dropped requirement — F-029).
545
+ const doneStep = deferredOpen
546
+ ? `(4) DO NOT mark the card DONE: it has an OPEN owner-gated/policy-deferred AC. Keep status IN_PROGRESS and add an implementation_note "deferred — DONE pending follow-up (new2 skill reconciles post-run)". STILL add the ssot-registry row for the committed code.`
547
+ : `(4) mark the card DONE in its YAML + add the ssot-registry row.`
452
548
  let commitRes
453
549
  try {
454
550
  commitRes = await agentSafe(
455
551
  `Commit card ${cardId} in worktree ${sharedCtx.worktreePath}. MECHANICAL — do NOT re-read reference modules.\n` +
456
- `Steps: (1) \`git status --porcelain\`; (2) stage = MAY-EDIT (${JSON.stringify(mayEdit)}) ∩ dirty — NEVER \`git add -A\`, NEVER \`git stash\`; if dirty has files OUTSIDE MAY-EDIT, do NOT stage them and set reconcileNote; (3) commit message \`[${cardId}] <concise>\`; (4) mark the card DONE in its YAML + add the ssot-registry row; (5) 'nothing to commit' = already committed (record HEAD).\n` +
552
+ `Steps: (1) \`git status --porcelain\`; (2) stage = MAY-EDIT (${JSON.stringify(mayEdit)}) ∩ dirty — NEVER \`git add -A\`, NEVER \`git stash\`; if dirty has files OUTSIDE MAY-EDIT, do NOT stage them and set reconcileNote; (3) commit message \`[${cardId}] <concise>\`; ${doneStep} (5) 'nothing to commit' = already committed (record HEAD).\n` +
457
553
  `On COMMIT_LOCK: clear stale lock + retry once. Still locked → committed:false.\n\n` +
458
554
  `Return: { committed, commit, filesChanged, reconcileNote }`,
459
555
  { label: `commit:${cardId}`, phase: 'Implement', agentType: 'general-purpose', model: 'haiku',
460
556
  schema: { type: 'object', required: ['committed'], additionalProperties: true, properties: { committed: { type: 'boolean' }, commit: { type: 'string' }, filesChanged: { type: 'array', items: { type: 'string' } }, reconcileNote: { type: 'string' } } } }
461
557
  )
462
558
  } catch (e) {
463
- if (e && e.transientExhausted) { noteDegraded('outage'); await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'pending', gates, telemetry: tele } }
559
+ if (e && e.transientExhausted) { noteDegraded('outage'); await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'pending', gates } }
464
560
  throw e
465
561
  }
466
562
 
467
563
  if (!commitRes || !commitRes.committed) {
468
- const s = await resolve('blocker', cardId, 'commit blocked after retries', { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })
564
+ const s = (await resolve('blocker', cardId, 'commit blocked after retries', { mayEditPaths: mayEdit, scopeFiles, domain: 'code' })).status
469
565
  g('G16-commit', s === 'resolved' ? 'RESOLVED' : 'FOLLOWUP')
470
- if (s !== 'resolved') { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, telemetry: tele } }
566
+ if (s !== 'resolved') { await rollbackCard(cardId, mayEdit); return { card: cardId, status: 'followup', gates, commit: '-', scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md` } }
471
567
  }
472
568
  if (commitRes && commitRes.reconcileNote) g('commit-reconcile', 'NOTE', commitRes.reconcileNote)
473
569
 
474
- g('commit', 'COMMITTED', (commitRes && commitRes.commit) || '')
570
+ g('commit', 'COMMITTED', `${(commitRes && commitRes.commit) || ''}${deferredOpen ? ' (NON-DONE — deferred, skill reconciles)' : ''}`)
475
571
  return {
476
572
  card: cardId, status: 'committed',
477
573
  commit: (commitRes && commitRes.commit) || '-',
478
574
  filesChanged: (commitRes && commitRes.filesChanged) || [],
479
- scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates, telemetry: tele,
575
+ // F-040/H — true when this committed card is intentionally left NON-DONE (open deferral). The
576
+ // merge agent leaves it non-DONE and the SKILL marks it DONE after its follow-up materialises —
577
+ // but ONLY if every class in deferredClasses is owner-gated/not-a-code-defect/policy-deferred-ac
578
+ // (A3): an 'unresolved' class means the DoD is genuinely unmet → the card stays IN_PROGRESS.
579
+ deferred: deferredOpen,
580
+ deferredClasses: Array.from(deferredClasses),
581
+ scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
480
582
  }
481
583
  }
482
584
 
@@ -487,7 +589,7 @@ async function runCard(cardId, cardPath, lessons) {
487
589
  // re-queue with a cap; sustained outage → stop cleanly + degraded return.
488
590
  // ───────────────────────────────────────────────────────────────────────────
489
591
  phase('Implement')
490
- const lessons = []
592
+ const failedCleaned = new Set() // A4 — failed cards whose crash cleanup VERIFIED clean (un-strands the merge gate)
491
593
  const state = {} // cardId → 'pending'|'committed'|'followup'|'epic-skipped'|'blocked'|'failed'
492
594
  const attempts = {} // cardId → retry count (transient)
493
595
  const RETRY_CAP = 2
@@ -517,7 +619,7 @@ while (guard-- > 0) {
517
619
  const next = runnableCards.find((id) => state[id] === 'pending' && depsSatisfied(id))
518
620
  if (!next) break // nothing runnable (all done/blocked, or a cycle/stall)
519
621
 
520
- const r = await runCard(next, pathById[next], lessons).catch((e) => crashResult(next, e))
622
+ const r = await runCard(next, pathById[next]).catch((e) => crashResult(next, e))
521
623
  if (r.status === 'pending') {
522
624
  attempts[next]++
523
625
  consecutiveOutage++
@@ -532,7 +634,6 @@ while (guard-- > 0) {
532
634
  consecutiveOutage = 0
533
635
  state[next] = r.status
534
636
  perCardResults.push(r)
535
- if (r.note) lessons.push(`${next}: ${r.note}`)
536
637
  }
537
638
 
538
639
  // Any still-pending card after the loop (outage) is recorded as a residual.
@@ -540,11 +641,17 @@ for (const id of runnableCards) {
540
641
  if (state[id] === 'pending') residuals.push({ card: id, kind: 'not-reached', evidence: 'batch paused before this card ran', materialized: false })
541
642
  }
542
643
 
543
- function crashResult(id, e) {
544
- if (e && (e.transientExhausted || isTransient(e))) { return { card: id, status: 'pending', gates: [], telemetry: {} } }
644
+ async function crashResult(id, e) {
645
+ if (e && (e.transientExhausted || isTransient(e))) { return { card: id, status: 'pending', gates: [] } }
545
646
  residuals.push({ card: id, kind: 'agent-crash', evidence: String(e && e.message), materialized: false })
  ledger(id, 'runCard', 'ERROR', String(e && e.message))
- return { card: id, status: 'failed', gates: [{ gate: 'runCard', decision: 'ERROR', detail: String(e && e.message) }], commit: '-', telemetry: {} }
+ // A4: a crashed card used to leave its dirty files in the worktree (the NEXT card's owner
+ // agent then reverted them via E4 with a misleading 'out-of-ownership' label). Clean up here;
+ // a VERIFIED-clean failure also stops stranding the merge gate (A2).
+ const clean = await rollbackCard(id, [])
+ if (clean) failedCleaned.add(id)
+ ledger(id, 'crash-cleanup', clean ? 'CLEAN' : 'DIRTY', clean ? 'worktree restored to HEAD' : 'cleanup unverified — merge stays blocked')
+ return { card: id, status: 'failed', gates: [{ gate: 'runCard', decision: 'ERROR', detail: String(e && e.message) }], commit: '-' }
  }
 
  const committed = perCardResults.filter((r) => r.status === 'committed')
@@ -555,8 +662,8 @@ const committed = perCardResults.filter((r) => r.status === 'committed')
  // ───────────────────────────────────────────────────────────────────────────
  phase('Final')
  let finalSummary = null
- let mergeBlocked = batchFatal || degraded
- if (committed.length && !batchFatal && !degraded) {
+ let mergeBlocked = degraded
+ if (committed.length && !degraded) {
  const reviewScopeFiles = dedupe(committed.flatMap((r) => r.scopeFiles || []))
  const archPaths = committed.map((r) => r.archBaselinePath).filter(Boolean)
  const allArch = archPaths.length === committed.length ? archPaths : null
@@ -594,14 +701,14 @@ if (committed.length && !batchFatal && !degraded) {
  }
  for (const area of Object.keys(byArea)) {
  const group = byArea[area]
- const s = await resolve('merge-blocker', group[0].finding_id || firstCard,
+ const s = (await resolve('merge-blocker', group[0].finding_id || firstCard,
  group.map((f) => `${f.severity} ${f.title}: ${f.evidence}`).join(' || '),
  { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: group[0].domain || 'code',
- findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })
+ findings: group.map((f) => ({ kind: 'merge-blocker', evidence: `${f.title}: ${f.evidence}`, domain: f.domain || 'code' })) })).status
  if (s !== 'resolved') mergeBlocked = true
  }
  if (finalSummary && finalSummary.failingGates && finalSummary.failingGates.length) {
- const s = await resolve('qa-fail', firstCard, `final gates failing: ${finalSummary.failingGates.join(', ')}`, { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: 'code' })
+ const s = (await resolve('qa-fail', firstCard, `final gates failing: ${finalSummary.failingGates.join(', ')}`, { mayEditPaths: reviewScopeFiles, scopeFiles: reviewScopeFiles, domain: 'code' })).status
  if (s !== 'resolved') mergeBlocked = true
  }
  } else {
@@ -609,7 +716,7 @@ if (committed.length && !batchFatal && !degraded) {
  mergeBlocked = true
  }
  } else {
- ledger(firstCard, 'final-review', 'SKIPPED', batchFatal ? 'batch fatal' : degraded ? 'degraded (outage)' : 'no committed cards')
+ ledger(firstCard, 'final-review', 'SKIPPED', degraded ? 'degraded (outage)' : 'no committed cards')
  }
 
  // ───────────────────────────────────────────────────────────────────────────
@@ -620,8 +727,19 @@ if (committed.length && !batchFatal && !degraded) {
  // ───────────────────────────────────────────────────────────────────────────
  phase('Merge')
  let mergeResult = null
- const incomplete = runnableCards.filter((id) => state[id] !== 'committed' && state[id] !== 'epic-skipped')
- const integrityOK = committed.length > 0 && !mergeBlocked && !batchFatal && !degraded && incomplete.length === 0
+ // A2: 'followup'/'blocked' cards were rolled back and their residuals live in the offline-safe
+ // ledger: the worktree holds ONLY gate-passing committed work, so they must not strand the batch
+ // (the old filter contradicted the gate's own comment and orphaned the worktree with no resume
+ // path — not degraded, so the skill never resumed). 'failed' counts as complete ONLY when its
+ // crash cleanup VERIFIED the worktree clean (A4); an unverified crash still blocks the merge.
+ const incomplete = runnableCards.filter((id) =>
+ !(state[id] === 'committed' || state[id] === 'epic-skipped' || state[id] === 'followup'
+ || state[id] === 'blocked' || (state[id] === 'failed' && failedCleaned.has(id))))
+ // F-040/H — committed cards intentionally left NON-DONE (open owner-gated/policy-deferred AC). They
+ // ARE merged (their code is complete), but Phase 6b must NOT force them to DONE; the SKILL does that
+ // post-run once the deferral's follow-up exists on disk. They count as complete for the merge gate.
+ const deferredCards = committed.filter((r) => r.deferred).map((r) => r.card)
+ const integrityOK = committed.length > 0 && !mergeBlocked && !degraded && incomplete.length === 0
  if (!committed.length) {
  ledger(firstCard, 'merge', 'SKIPPED', 'no committed cards')
  } else if (!integrityOK) {
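
Reviewer's note on the hunk above: the reworked `incomplete` filter can be read as a single completeness predicate per card. The following is a minimal sketch of that reading, not code from `new2.js`; the helper names `isCompleteForMerge` and `incompleteCards` and the card IDs are hypothetical, while the state strings and the `failedCleaned` set mirror the diff.

```javascript
// Hypothetical helper names; state strings and failedCleaned mirror the hunk above.
const TERMINAL_OK = new Set(['committed', 'epic-skipped', 'followup', 'blocked'])

function isCompleteForMerge(id, state, failedCleaned) {
  if (TERMINAL_OK.has(state[id])) return true
  // A 'failed' card counts as complete only when its crash cleanup verified a clean worktree.
  return state[id] === 'failed' && failedCleaned.has(id)
}

function incompleteCards(runnableCards, state, failedCleaned) {
  return runnableCards.filter((id) => !isCompleteForMerge(id, state, failedCleaned))
}

// Illustrative batch: C-3 crashed and was cleaned, C-4 crashed without a verified cleanup,
// C-5 was never reached.
const state = { 'C-1': 'committed', 'C-2': 'followup', 'C-3': 'failed', 'C-4': 'failed', 'C-5': 'pending' }
const failedCleaned = new Set(['C-3'])
const incomplete = incompleteCards(Object.keys(state), state, failedCleaned)
console.log(incomplete) // the cards that would still block the merge gate
```

Under this reading only an unverified crash or a card that never ran keeps `integrityOK` false; rolled-back `followup`/`blocked` cards no longer do.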
@@ -635,15 +753,23 @@ if (!committed.length) {
  `• G24 → auto-merge via merge_strategy.\n` +
  `• F-030 HARD RULE: NEVER \`git add\`/commit code that did not pass the per-card gates. If the worktree is dirty with uncommitted code → DO NOT commit it; leave it, set uncommittedLeft:true, and report. NO "safety commit". Security/migration code is NEVER swept in.\n` +
  `• F-029 HARD RULE: Phase 6b reconciliation marks a card DONE ONLY if it has a real commit in ${TRUNK}..HEAD AND its gates are green. NEVER force a non-implemented card to DONE. Return forcedDone:[] (must be empty).\n` +
- `• EPIC CLOSURE (Phase 6b step 5e): the epic/parent card (group.is_epic:true) is NOT in the batch and stays TODO unless closed here. For each distinct group.parent of the batch cards (and any epic card in the batch itself): if EVERY child of that epic \`grep -l "parent: <EPIC-ID>" backlog/*.yml | xargs grep -L "status: DONE"\` prints nothing set the epic card status:DONE + completed_date + note "epic-closure gate all children DONE" and fold into the reconciliation commit. If any child is still open → leave the epic untouched. This is NOT a forcedDone violation (the epic is a tracker, gated on all-children-DONE, not on its own commit). Return epicsClosed:[<EPIC-IDs marked DONE>].\n` +
+ `• F-040 DEFERRED CARDS leave NON-DONE (do NOT force to DONE in Phase 6b): ${deferredCards.length ? deferredCards.join(' ') : '(none)'}. These committed their code but carry an OPEN owner-gated/policy-deferred AC (e.g. a pending remote db:push). Their YAML is INTENTIONALLY IN_PROGRESS; the new2 skill marks them DONE post-run after materialising the deferral's follow-up. They ARE part of the merge just skip them in the DONE-reconciliation. Return deferredLeftOpen:[the ones you left non-DONE].\n` +
+ `• EPIC CLOSURE (Phase 6b step 5e): the epic/parent card (group.is_epic:true) is NOT in the batch and stays TODO unless closed here. For each distinct group.parent of the batch cards (and any epic card in the batch itself): if EVERY child of that epic — \`grep -l "parent: <EPIC-ID>" ${paths.backlog_dir || 'backlog'}/*.yml | xargs grep -L "status: DONE"\` prints nothing — set the epic card status:DONE + completed_date + note "epic-closure gate — all children DONE" and fold into the reconciliation commit. If any child is still open → leave the epic untouched. This is NOT a forcedDone violation (the epic is a tracker, gated on all-children-DONE, not on its own commit). Return epicsClosed:[<EPIC-IDs marked DONE>].\n` +
  `• G19 sync-deferred → HEAD==${TRUNK} ff-pull, else leave+report. G20 → leave+report. G21 post-batch dirty → partition-ignore framework artifacts; leave the rest + report (do NOT commit). G22 divergence → behind: ff-pull; ahead/both: leave+report; NEVER reset --hard/force-push. G23 stash restore conflict → leave intact + report.\n\n` +
- `Return: { merged, mergeCommit, mergeTs, reconciliation, forcedDone:[], uncommittedLeft, note }`,
+ `Return: { merged, mergeCommit, mergeTs, reconciliation, forcedDone:[], deferredLeftOpen:[], uncommittedLeft, note }`,
  { label: 'merge', phase: 'Merge', agentType: 'general-purpose', schema: MERGE_SCHEMA }
  )
  } catch (e) { if (e && e.transientExhausted) noteDegraded('outage'); mergeResult = null }
  if (mergeResult && (mergeResult.forcedDone || []).length) { noteDegraded('false_done'); ledger(firstCard, 'F029-guard', 'VIOLATION', `forcedDone: ${mergeResult.forcedDone.join(' ')}`) }
  if (mergeResult && mergeResult.uncommittedLeft) ledger(firstCard, 'F030-guard', 'LEFT-UNCOMMITTED', 'dirty code left (not swept) + reported')
  if (mergeResult && (mergeResult.epicsClosed || []).length) ledger(firstCard, 'epic-closure', 'CLOSED', `epics marked DONE (all children DONE): ${mergeResult.epicsClosed.join(' ')}`)
+ if (deferredCards.length) {
+ ledger(firstCard, 'F040-deferred', 'LEFT-NON-DONE', `${deferredCards.join(' ')} — skill marks DONE post-run after follow-up materialises`)
+ // F-040 guard — catch a merge agent that ignored the instruction and force-DONE'd a deferred card.
+ const leftOpen = (mergeResult && mergeResult.deferredLeftOpen) || []
+ const wronglyDone = deferredCards.filter((c) => !leftOpen.includes(c))
+ if (mergeResult && wronglyDone.length) { noteDegraded('false_done'); ledger(firstCard, 'F040-guard', 'VIOLATION', `deferred cards force-DONE by merge: ${wronglyDone.join(' ')}`) }
+ }
  ledger(firstCard, 'G24-merge', (mergeResult && mergeResult.merged) ? 'MERGED' : 'INCOMPLETE', (mergeResult && (mergeResult.mergeCommit || mergeResult.note)) || '')
  if (mergeResult && mergeResult.reconciliation) ledger(firstCard, 'G19-23-reconcile', 'AUTO', mergeResult.reconciliation)
  }
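
Reviewer's note on the F-040 guard in the hunk above: the check amounts to a set difference between the deferred cards and the `deferredLeftOpen` list the merge agent reports back. A minimal sketch under that assumption follows; `findWronglyDone` and the card IDs are hypothetical, and only the `deferredLeftOpen` field name comes from the diff.

```javascript
// Hypothetical function name; deferredLeftOpen follows the Return contract shown in the hunk.
function findWronglyDone(deferredCards, mergeResult) {
  // With no merge result (for example after an outage) nothing can be asserted, so report none.
  if (!mergeResult) return []
  const leftOpen = mergeResult.deferredLeftOpen || []
  // Any deferred card the merge agent did not report as left open is treated as force-DONE.
  return deferredCards.filter((card) => !leftOpen.includes(card))
}

const deferred = ['C-7', 'C-9']
const violations = findWronglyDone(deferred, { merged: true, deferredLeftOpen: ['C-7'] })
const none = findWronglyDone(deferred, null)
console.log(violations, none)
```

Note that a merge agent that simply omits `deferredLeftOpen` is flagged for every deferred card, which matches the `|| []` default in the diff.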
@@ -686,6 +812,9 @@ function buildTelemetry() {
  ts: TS || null,
  cards_total: cardIds.length,
  cards_real_done: perCardResults.filter((r) => r.status === 'committed').length,
+ // F-040/H — committed cards left NON-DONE pending their owner-gated follow-up (the skill marks
+ // them DONE post-run). Surfaced so the A/B telemetry distinguishes "code landed" from "DONE".
+ cards_deferred_done_pending: perCardResults.filter((r) => r.deferred).length,
  cards_force_done: 0, // F-029 — force-DONE forbidden; always 0.
  cards_followup: perCardResults.filter((r) => r.status === 'followup').length,
  cards_blocked: runnableCards.filter((id) => state[id] === 'blocked').length,
@@ -698,12 +827,12 @@ function buildTelemetry() {
  merged: !!(mergeResult && mergeResult.merged),
  degraded,
  degradation_reasons: degradationReasons,
- execution_mode: preflight ? preflight.executionMode : 'sequential',
+ execution_mode: 'sequential', // B4: the scheduler is strictly sequential by design
  codex_resolved: preflight ? !!preflight.codexResolved : null, // v4.18.0 — probed for EVERY batch (drives per-card Codex-light + multi-card cross-card)
  // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
  total_tokens: totalTokens,
  agent_count: agentCount,
- per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, telemetry: r.telemetry || {} })),
+ per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], gates: (r.gates || []).length })),
  stats_requested: !!FLAGS.stats,
  }
  }
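
Reviewer's note on the telemetry hunk above: the new `per_card` row replaces the free-form `telemetry` object with explicit deferral fields, so a consumer can separate "code landed" from "DONE". A small sketch of that row shape and one way to read it; `toPerCardRow`, the sample cards, and the `'owner-gated'` class value are hypothetical illustrations, not values confirmed by the diff.

```javascript
// Hypothetical helper mirroring the per_card mapping in the hunk above.
function toPerCardRow(r) {
  return {
    card: r.card,
    status: r.status,
    deferred: !!r.deferred,
    deferredClasses: r.deferredClasses || [],
    gates: (r.gates || []).length, // only the gate count is surfaced, not the gate details
  }
}

// Sample results; 'owner-gated' is an assumed deferral class used only for illustration.
const rows = [
  { card: 'C-1', status: 'committed', gates: [{ gate: 'qa', decision: 'PASS' }] },
  { card: 'C-2', status: 'committed', deferred: true, deferredClasses: ['owner-gated'], gates: [] },
].map(toPerCardRow)

// Cards whose code is committed but which are intentionally left NON-DONE.
const landedButNotDone = rows.filter((r) => r.status === 'committed' && r.deferred).map((r) => r.card)
console.log(landedButNotDone)
```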
@@ -711,12 +840,12 @@ function buildTelemetry() {
  function buildReport(o) {
  const L = []
  L.push(`# new2 batch — ${cardIds.join(' ')}`)
- L.push(`Variant: **new2** · Mode: ${preflight ? preflight.executionMode : '?'} · Trunk: ${TRUNK}${degraded ? ' · ⚠️ DEGRADED (' + degradationReasons.join(',') + ')' : ''}`)
+ L.push(`Variant: **new2** · Mode: sequential · Trunk: ${TRUNK}${degraded ? ' · ⚠️ DEGRADED (' + degradationReasons.join(',') + ')' : ''}`)
  if (o.fatal) { L.push(``, `## ⛔ BATCH FATAL`, o.reason || 'workspace unworkable'); return L.join('\n') }
  L.push(``, `## Esito card`)
  L.push(`| Card | Status | Commit | File |`)
  L.push(`|------|--------|--------|------|`)
- for (const r of perCardResults) L.push(`| ${r.card} | ${r.status} | ${r.commit || '-'} | ${(r.filesChanged || []).length} |`)
+ for (const r of perCardResults) L.push(`| ${r.card} | ${r.status}${r.deferred ? ' (NON-DONE: deferred)' : ''} | ${r.commit || '-'} | ${(r.filesChanged || []).length} |`)
  const blockedIds = runnableCards.filter((id) => state[id] === 'blocked' || state[id] === 'pending')
  for (const id of blockedIds) L.push(`| ${id} | ${state[id]} | - | 0 |`)
  if (finalSummary) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "baldart",
- "version": "4.24.0",
+ "version": "4.24.2",
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
  "bin": {
  "baldart": "./bin/baldart.js"