npm - baldart - Versions diffs - 4.27.1 → 4.27.2 - Mend

baldart 4.27.1 → 4.27.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

package/CHANGELOG.md +8 -0
package/VERSION +1 -1
package/framework/.claude/skills/new/references/implement.md +8 -0
package/framework/.claude/skills/new/references/setup.md +13 -0
package/framework/.claude/skills/new2/SKILL.md +38 -0
package/framework/.claude/workflows/new2-resolve.js +20 -10
package/package.json +1 -1

package/CHANGELOG.md

@@ -5,6 +5,14 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.27.2] - 2026-06-11
+**`new2` resolve: kill the second self-judging adversarial pass for `security` too.** v4.27.1 removed the wasteful adversarial doc review (doc-reviewer judging doc-reviewer). An audit of every fixer→judge pair in the resolution pass found `security` is the exact same structural case: the fixer is `security-reviewer` (protected domain) and so is the judge, so the mandatory cross-check was `security-reviewer` judging `security-reviewer` — no cross-model diversity, the same waste. The skip is now driven by a structural guard, `selfJudges = (fixerAgent === judgeAgent)`, instead of a per-domain allowlist, so it covers doc + security today and can't silently regress if the fixer/judge map changes. Every domain where fixer ≠ judge (ui/perf/test/code/migration) keeps the mandatory adversarial cross-check unchanged. **PATCH** (cost/latency optimization on the EXPERIMENTAL `new2` surface; no config key, no change to `/new`).
+### Changed
+- **`framework/.claude/workflows/new2-resolve.js`** — introduced `selfJudges`; both `judgeVerify()` and the terminal-judge ratification branch short-circuit when `selfJudges` is true (was `domain === 'doc'`). `meta.description` updated to describe the structural fixer===judge rule.
 ## [4.27.1] - 2026-06-11
 **`new2` resolve: never run an adversarial doc review.** When a resolution pass fixed a `doc`-domain finding, the fixer was `doc-reviewer` (which applies *and* self-verifies the fix in one pass) — but `judgeVerify()` then spawned `doc-reviewer` *again* as the mandatory adversarial judge, plus the terminal-judge ratification used it a third time. That is `doc-reviewer` judging `doc-reviewer`: zero cross-model diversity, pure waste of tokens and time. The doc domain now trusts the single reviewer-writer pass — the adversarial judge and the terminal-judge ratification are both skipped for `doc` only (every other domain keeps the mandatory cross-check unchanged, since their fixer and judge are genuinely independent specialists). **PATCH** (cost/latency optimization on the EXPERIMENTAL `new2` surface; no config key, no change to `/new`, no behavior change for code/ui/security/perf/test resolutions).
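The difference between the 4.27.1 allowlist and the 4.27.2 structural guard can be sketched as follows. This is a minimal illustration, not the package's code: the `FIXER` and `JUDGE` tables and both helper names are hypothetical, and only the doc, security and ui pairings are taken from the changelog text (the `code` row is an assumption).

```javascript
// Hypothetical fixer/judge tables. Only doc, security and ui are stated in the changelog;
// the `code` row is an assumption added for illustration.
const FIXER = { doc: 'doc-reviewer', security: 'security-reviewer', ui: 'ui-expert', code: 'coder' }
const JUDGE = { doc: 'doc-reviewer', security: 'security-reviewer', ui: 'code-reviewer', code: 'code-reviewer' }

// 4.27.1 style: a per-domain allowlist. Any self-judging pair not named here is missed.
const skipsJudgeByAllowlist = (domain) => domain === 'doc'

// 4.27.2 style: a structural guard. It follows the tables wherever they change.
const skipsJudgeStructurally = (domain) => FIXER[domain] === JUDGE[domain]

for (const domain of Object.keys(FIXER)) {
  console.log(domain, skipsJudgeByAllowlist(domain), skipsJudgeStructurally(domain))
}
```

Under these assumed tables the two predicates disagree only for `security`, which is the case the 4.27.2 entry describes.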

package/VERSION

@@ -1 +1 @@
-4.27.1
+4.27.2

package/framework/.claude/skills/new/references/implement.md

@@ -93,6 +93,14 @@
    You MUST actively verify and produce one of the two outcomes (implementation OR TODO comment).
    See `coder.md § Conditional Requirements — Binary-Outcome Items`.
+   ### Migration context (include ONLY when this card appears in the tracker `## Migration` manifest's `affects_cards`)
+   The required DB migration `<summary>` was ALREADY APPLIED to the active DB before this batch
+   started (Phase 0 step 1b, modality `<id>`). The schema is LIVE — run REAL validation against it
+   (`validation_commands`, queries, DB-generated types). Do NOT defer the schema-apply AC and do NOT
+   build against a stubbed/absent schema: the table/column exists. If `## Migration` shows
+   `status: degraded` instead (artifact missing / not applied), treat the schema as NOT yet live and
+   follow the card's normal owner-gated deferral path.
    ### Business Context (WHY this feature exists)
    [paste business_rationale field from card YAML]
    [if field is empty/missing, read PRD Section 1b from card's links.prd path]
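The conditional inclusion rule in the added "Migration context" block can be sketched as a small selector. This is an illustrative sketch only: `migrationContextFor` and the returned shape are hypothetical, and the manifest is assumed to be an already-parsed object carrying the `status`, `modality`, `summary` and `affects_cards` fields named in the text. The source does not say how `skipped` should be briefed, so the sketch leaves that status out.

```javascript
// Hypothetical helper: decide which migration note, if any, a card briefing should carry.
function migrationContextFor(cardId, manifest) {
  // No manifest, or the card is not listed in affects_cards: omit the section entirely.
  if (!manifest || !Array.isArray(manifest.affects_cards)) return null
  if (!manifest.affects_cards.includes(cardId)) return null

  if (manifest.status === 'applied') {
    // Schema is live: run real validation and do not defer the schema-apply AC.
    return { schemaLive: true, note: `migration ${manifest.summary} already applied (modality ${manifest.modality})` }
  }
  if (manifest.status === 'degraded') {
    // Artifact missing or not applied: follow the normal owner-gated deferral path.
    return { schemaLive: false, note: 'schema not yet live; use the owner-gated deferral path' }
  }
  return null // other statuses are not covered by this sketch
}

const manifest = { status: 'applied', modality: 'cli', summary: 'add-orders-table', affects_cards: ['CARD-2'] }
console.log(migrationContextFor('CARD-2', manifest))
console.log(migrationContextFor('CARD-9', manifest)) // null
```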

package/framework/.claude/skills/new/references/setup.md

@@ -17,6 +17,19 @@
 1. **Resolve `$MAIN`** — the absolute path of the main repo (not a worktree). If `/new` was invoked from inside a worktree, walk up to the parent repo via `git rev-parse --show-superproject-working-tree` or `git worktree list` until you find the non-worktree root. Persist as `Main repo:` in the tracker `## Worktree` section. **Write `$MAIN` to the tracker the moment it is computed** — every later consumer (Phase 6c, Phase 6b) MUST re-read it from the tracker and HALT with "`$MAIN` absent from tracker" if the field is missing or empty, never silently use an undefined `$MAIN` (it does not survive context compaction).
+1b. **Migration Gate (BLOCKING only when a migration is *declared* — else a silent no-op)** — resolve DB migrations interactively **before** the worktree exists, so the schema is live before any card builds against it. *Why this exists*: a migration applied to a shared/remote DB is owner-gated, so without this gate it is deferred to the END of the batch — and every downstream card in the batch is then built and verified against a schema that is not yet live (`validation_commands` / QA / E2E / DB-generated `tsc` types fail falsely → those cards cascade into deferral/blocked). Front-loading the migration removes that root cause. **The declaration lives in the EPIC card** (`migration_plan` block — project-specific, authored by the user, typically via the `.baldart/overlays/new.md` overlay). Steps:
+   1. **Find the batch's epic card(s).** Collect the distinct `group.parent` of the batch cards (or, for a `-full` invocation, the parent already known). For each, locate the epic YAML in `${paths.backlog_dir}` (filename `<PARENT>-*-epic.yml`, or the card whose `id == <PARENT>` / `group.is_epic: true`) and Read it.
+   2. **Look for `migration_plan` with `required: true`.** If **no** epic in the batch declares one → write `## Migration\nnone (no migration_plan declared)` to the tracker and **proceed to step 2** (behaviour is identical to today — zero extra prompts). This is the common case; do not surface anything.
+   3. **(declared) Verify the artifacts exist on disk.** For each path in `migration_plan.artifacts`, check it exists under `$MAIN`. If **any is missing** → do **NOT** apply or prompt; write `## Migration\ndeclared but artifact(s) missing: <paths> — degraded to deferred (author the migration first)` to the tracker and **proceed to step 2** (the migration falls back to the current end-of-batch owner-gated deferral — no regression). *(Auto-generating a missing artifact is out of scope; the migration must already be authored to be applied up-front.)*
+   4. **(declared + artifacts present) Assemble the apply modalities**, in this source order, de-duplicating by `id`: (a) `migration_plan.apply_modalities` from the epic block; (b) a `## Migration modalities` section in `.baldart/overlays/new.md` if present; (c) any project-memory note on how this project applies migrations; (d) the built-in tail `["Già applicata — prosegui", "Abort"]`. Each modality is `{ id, label, command? }`.
+   5. **`AskUserQuestion`** — `"La epic dichiara una migrazione DB (<summary>). Va applicata PRIMA del batch così le card downstream verificano contro lo schema reale. Come procedo?"` with up to 4 of the assembled modalities (always include "Già applicata — prosegui" and "Abort" as the last options). This is a legitimate Phase 0 question (Auto Mode does not override it).
+   6. **Execute the choice in `$MAIN`** (project env):
+      - a **command** modality → run it with output to disk (`<cmd> > /tmp/migration-<FIRST-CARD-ID>.log 2>&1`), surface only the exit code; on exit 0 run the optional `migration_plan.verify` probe and require it green too. On **failure** → surface a bounded extract (`tail -n 30`) and re-ask (re-offer the modalities + Abort); never silently proceed against a non-live schema. **Never run a command without the user having selected it.**
+      - **"Già applicata — prosegui"** → run the optional `verify` probe if present; otherwise trust the user.
+      - **"Abort"** → HALT the batch cleanly (leave the tracker in place); do not create a worktree.
+   7. **Record the manifest** in the tracker `## Migration` section: `status: applied|skipped|degraded`, `modality: <id>`, `artifacts: [...]`, `affects_cards: [...]`, `applied_at: <timestamp>`. Phase 1/2 briefings (implement.md) MUST surface, for any card in `affects_cards`, the note *"migration `<summary>` already applied to the active DB — run REAL validation (do not defer the schema AC)"*, so the live schema is actually exercised and the card's migration-apply AC is treated as satisfied, not deferred.
 2. **Fetch remote state**:
    ```bash
    git -C "$MAIN" fetch origin --quiet

package/framework/.claude/skills/new2/SKILL.md

@@ -86,6 +86,43 @@ to its backlog YAML path under `${paths.backlog_dir}`.
   Read them for the review SEMANTICS, so `new2` stays a thin orchestration host and
   the A/B test isolates the host, not the protocol).
+### Step 3.5 — Migration Gate (BLOCKING only when a migration is *declared* — else a silent no-op)
+The workflow runs autonomously and **cannot** apply an owner-gated DB migration mid-run, so today a
+declared migration is deferred to the end of the batch — and every downstream card is built and
+verified against a schema that is not yet live (`validation_commands` / QA / E2E / DB-generated `tsc`
+types fail falsely → those cards cascade into deferral). This gate front-loads the migration: it runs
+**here, in the skill** (the main loop — it can interact; the workflow's zero-ask contract is
+untouched), so the schema is live **before** the workflow starts. It mirrors `/new` Phase 0 step 1b
+(`references/setup.md`) — same algorithm, same `migration_plan` epic-card convention.
+1. **Find the batch's epic card(s)** — distinct `group.parent` of the resolved cards (or the `-full`
+   parent). Read each epic YAML in `${paths.backlog_dir}` (`<PARENT>-*-epic.yml`, or the card whose
+   `id == <PARENT>` / `group.is_epic: true`).
+2. **Look for `migration_plan` with `required: true`.** None → set `migration = { status: 'none' }`
+   and go to Step 4 (identical to today — zero extra prompts; do not surface anything).
+3. **(declared) Verify `migration_plan.artifacts` exist on disk under `$MAIN`.** Any missing → do NOT
+   apply or prompt; set `migration = { status: 'degraded', reason: 'artifact missing', artifacts }`
+   and go to Step 4 (the migration falls back to the current end-of-batch owner-gated deferral — no
+   regression).
+4. **(declared + present) Assemble the apply modalities**, de-duped by `id`, in source order:
+   (a) `migration_plan.apply_modalities`; (b) a `## Migration modalities` section in
+   `.baldart/overlays/new.md`; (c) any project-memory note on how this project applies migrations;
+   (d) built-in tail `["Già applicata — prosegui", "Abort"]`.
+5. **`AskUserQuestion`** — `"La epic dichiara una migrazione DB (<summary>). La applico PRIMA del
+   batch così le card downstream verificano contro lo schema reale. Come procedo?"` with up to 4
+   modalities (always include "Già applicata — prosegui" and "Abort"). This is the SAME class as the
+   Step-2 "ONE pre-launch question" — pre-launch, not a mid-run gate; the zero-ask contract is about
+   the *workflow*, which is untouched.
+6. **Execute the choice in `$MAIN`**:
+   - a **command** modality → run with output to `/tmp/migration-<firstCard>.log`, surface only the
+     exit code; on exit 0 run the optional `migration_plan.verify` probe and require it green. On
+     failure → bounded extract (`tail -n 30`) + re-ask. **Never run a command the user did not pick.**
+   - **"Già applicata — prosegui"** → run `verify` if present; else trust the user.
+   - **"Abort"** → HALT cleanly; do NOT call the workflow.
+7. **Build the manifest** to pass to the workflow:
+   `migration = { status: 'applied'|'skipped', modality: <id>, summary, artifacts: [...], affects_cards: [...], appliedAt: ts }`.
 ### Step 4 — Delegate the whole batch to the workflow
 Call the `Workflow` tool:
@@ -100,6 +137,7 @@ Workflow({ name: 'new2', args: {
   refModulesBase,     // .claude/skills/new/references (semantic SSOT)
   config,             // the parsed baldart.config.yml (paths.*/stack.*/features.*/git.*)
   ts,                 // ISO timestamp NOW — the workflow has no clock (Date.now() unavailable there)
+  migration,          // Step-3.5 manifest: { status:'none'|'applied'|'skipped'|'degraded', modality?, summary?, artifacts?, affects_cards? }
   flags: { stats, effort, full }
 }})
 ```
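Steps 4 and 7 of the Migration Gate (assemble the modalities de-duplicated by `id`, then build the manifest) can be sketched as below. This is a rough sketch under stated assumptions: the function names, the ids `already-applied` and `abort`, the `plan.summary` field, and the mapping of a command modality to `applied` and any other choice to `skipped` are all hypothetical. The source gives only display labels for the built-in tail and does not spell out the status mapping.

```javascript
// Built-in tail from step 4(d). The ids are hypothetical; the source only gives display labels.
const BUILT_IN_TAIL = [{ id: 'already-applied' }, { id: 'abort' }]
const MAX_OPTIONS = 4 // "up to 4 modalities"

// Merge the sources in order (epic block, overlay, project memory), keeping the first
// occurrence of each id, then append the built-in tail as the last options.
function assembleModalities(fromEpic = [], fromOverlay = [], fromMemory = []) {
  const tailIds = new Set(BUILT_IN_TAIL.map((m) => m.id))
  const seen = new Set()
  const merged = []
  for (const m of [...fromEpic, ...fromOverlay, ...fromMemory]) {
    if (seen.has(m.id) || tailIds.has(m.id)) continue
    seen.add(m.id)
    merged.push(m)
  }
  return [...merged.slice(0, MAX_OPTIONS - BUILT_IN_TAIL.length), ...BUILT_IN_TAIL]
}

// The timestamp is passed in, matching the `ts` argument the skill hands to the workflow.
function buildManifest(plan, chosen, affectsCards, ts) {
  return {
    status: chosen.command ? 'applied' : 'skipped', // assumption: the source does not spell out this mapping
    modality: chosen.id,
    summary: plan.summary,
    artifacts: plan.artifacts,
    affects_cards: affectsCards,
    appliedAt: ts,
  }
}

const options = assembleModalities(
  [{ id: 'cli', label: 'Run migration CLI', command: 'echo migrate' }],
  [{ id: 'cli', label: 'duplicate' }, { id: 'manual', label: 'Manual' }]
)
console.log(options.map((m) => m.id)) // [ 'cli', 'manual', 'already-applied', 'abort' ]
```

Keeping the first occurrence of each `id` is what makes the source order matter: an epic-level modality wins over an overlay entry with the same `id`.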

package/framework/.claude/workflows/new2-resolve.js

@@ -1,7 +1,7 @@
 export const meta = {
   name: 'new2-resolve',
   description:
-    "Self-healing resolution pass for the autonomous new2 batch workflow. Called by new2 whenever a deterministic gate would otherwise need a human: a card fail/blocker (ac-unmet | blocker | qa-fail | e2e-blocked | merge-blocker) or a legitimate scope-EXPANDING finding (scope-expansion). Tier-1 targeted fix with a TERMINAL short-circuit (skips the costly multi-attempt when the problem is impossible-by-definition, verified not trusted), then a judged multi-attempt; a MANDATORY adversarial judge cross-checks every verified claim against the real diff (prevents fabricated success) for every domain EXCEPT doc — doc-reviewer is a reviewer-writer that owns its domain (applies + self-verifies in one pass), so a second adversarial doc-reviewer would just judge itself: wasteful, so the doc judge is skipped. Specialized per domain (doc→doc-reviewer single pass, ui→ui-expert, security→security-reviewer judge, perf→api-perf-cost-auditor judge). Terminal is a tracked follow-up. Accepts a `findings` list (batched per area). Uses agent()/parallel() only — no nested workflows.",
+    "Self-healing resolution pass for the autonomous new2 batch workflow. Called by new2 whenever a deterministic gate would otherwise need a human: a card fail/blocker (ac-unmet | blocker | qa-fail | e2e-blocked | merge-blocker) or a legitimate scope-EXPANDING finding (scope-expansion). Tier-1 targeted fix with a TERMINAL short-circuit (skips the costly multi-attempt when the problem is impossible-by-definition, verified not trusted), then a judged multi-attempt; a MANDATORY adversarial judge cross-checks every verified claim against the real diff (prevents fabricated success) whenever the judge is a DIFFERENT specialist than the fixer. When fixer === judge (doc→doc-reviewer, security→security-reviewer) the second pass would just judge its own work — no cross-model diversity, pure waste — so the judge AND the terminal-verdict ratification are skipped (the reviewer-writer self-verifies in its single pass). Specialized per domain (doc→doc-reviewer single pass, ui→ui-expert fix + code-reviewer judge, security→security-reviewer single pass, perf→api-perf-cost-auditor judge). Terminal is a tracked follow-up. Accepts a `findings` list (batched per area). Uses agent()/parallel() only — no nested workflows.",
   phases: [
     { title: 'Diagnose', detail: 'classify + terminal short-circuit + scope-expansion boundary' },
     { title: 'Repair', detail: 'targeted fix, then judged multi-attempt if needed' },
@@ -110,6 +110,15 @@ const judgeAgent = (domain === 'security' || domain === 'migration') ? 'security
   : domain === 'doc' ? 'doc-reviewer'
   : domain === 'test' ? 'qa-sentinel'
   : 'code-reviewer'
+// Anti-self-judging — when the fixer and the adversarial judge resolve to the SAME agent type
+// (doc→doc-reviewer, security→security-reviewer), the "judge" would just review its own work:
+// no cross-model diversity, pure waste of tokens and time. These are reviewer-writer specialists
+// that own their domain and self-verify (re-run the gate) inside the fix pass, so we trust the
+// single pass and skip both the judge cross-check and the terminal-verdict ratification. Domains
+// where fixer ≠ judge (ui/perf/test/code/migration) keep the mandatory adversarial cross-check —
+// there the judge is a genuinely independent specialist. Condition is structural (not a domain
+// allowlist), so a future fixer/judge map change can't silently re-introduce self-judging.
+const selfJudges = fixerAgent === judgeAgent
 const findingsBlock = findings.map((f, i) => `  ${i + 1}. [${f.kind || kind}/${f.domain || domain}] ${f.evidence}`).join('\n')
 const brief = [
@@ -197,10 +206,10 @@ if (attempt && attempt.terminal) {
   let confirmed = false
   if (tr === 'out-of-ownership') {
     confirmed = !filesInScope(attempt.remedyFiles) // genuinely terminal iff remedy files are NOT in MAY-EDIT
-  } else if (domain === 'doc') {
-    // doc is fixed by doc-reviewer (reviewer-writer that owns its domain). Ratifying its
-    // terminal verdict with a SECOND doc-reviewer is doc-reviewer-judges-doc-reviewer —
-    // no cross-model diversity, pure waste. Trust the single pass; no adversarial re-run.
+  } else if (selfJudges) {
+    // fixer === judge (doc→doc-reviewer, security→security-reviewer): ratifying the terminal
+    // verdict with a SECOND instance of the same specialist is self-judging — no cross-model
+    // diversity, pure waste. Trust the reviewer-writer's single pass; no adversarial re-run.
     confirmed = true
   } else {
     // owner-gated / not-a-code-defect / baseline-not-reached — ratify with the judge.
@@ -265,11 +274,12 @@ return await materialiseFollowup(kind, (attempt && attempt.note) || 'unresolved
 // returns the files it independently confirmed changed; we cross-check ⊆ MAY-EDIT.
 async function judgeVerify(verifiedAttempts) {
   if (!verifiedAttempts.length) return { ok: false, best: 0 }
-  // doc-reviewer is a reviewer-writer that owns its domain: it applies the fix AND
-  // self-verifies in the same pass. A SECOND adversarial doc-reviewer cross-check is
-  // doc-reviewer-judges-doc-reviewer — no cross-model diversity, a pure waste of tokens
-  // and time. Trust the single pass; accept the first verified attempt without re-spawning.
-  if (domain === 'doc') {
+  // fixer === judge (doc→doc-reviewer, security→security-reviewer): the reviewer-writer
+  // already applied the fix AND self-verified (re-ran the gate) in the same pass. A SECOND
+  // adversarial instance of the same specialist would just judge its own work — no cross-model
+  // diversity, a pure waste of tokens and time. Trust the single pass; accept the first
+  // verified attempt without re-spawning. Domains where fixer ≠ judge keep the cross-check.
+  if (selfJudges) {
     const first = verifiedAttempts.find((v) => v.r && v.r.verified) || verifiedAttempts[0]
     return { ok: true, best: first.i }
   }
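The short-circuit that `judgeVerify()` takes when `selfJudges` is true can be exercised in isolation. This is a standalone sketch: `pickWithoutJudge` is a hypothetical name, and the attempt shape `{ i, r: { verified } }` is inferred from the two lines shown in the hunk rather than from the full file.

```javascript
// Hypothetical standalone copy of the selfJudges branch: accept the first verified attempt
// without spawning a second reviewer, falling back to the first attempt in the list.
function pickWithoutJudge(verifiedAttempts) {
  if (!verifiedAttempts.length) return { ok: false, best: 0 }
  const first = verifiedAttempts.find((v) => v.r && v.r.verified) || verifiedAttempts[0]
  return { ok: true, best: first.i }
}

const attempts = [
  { i: 0, r: { verified: false } },
  { i: 1, r: { verified: true } },
]
console.log(pickWithoutJudge(attempts)) // { ok: true, best: 1 }
console.log(pickWithoutJudge([]))       // { ok: false, best: 0 }
```

Note that the fallback to `verifiedAttempts[0]` still returns `ok: true`, so the branch relies on the caller having already filtered the list down to verified attempts.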

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.27.1",
+  "version": "4.27.2",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"