npm - baldart - Versions diffs - 4.26.0 → 4.26.1 - Mend

baldart 4.26.0 → 4.26.1

Files changed (5)

package/CHANGELOG.md +14 -0
package/VERSION +1 -1
package/framework/.claude/workflows/new2-resolve.js +11 -1
package/framework/.claude/workflows/new2.js +50 -16
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -5,6 +5,20 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.26.1] - 2026-06-11
+**`new2`: specialization integrity — a full audit of every agent spawn; no role mixing anywhere.** User principle: code is written ONLY by `coder`, UI only by `ui-expert`, each agent does one thing. The audit of all 16 spawn sites found two genuine violations and three under-specified plumbing roles. **PATCH** (role-integrity fixes on the EXPERIMENTAL `new2` surface; no config key, no change to `/new`).
+### Fixed
+- **`new2.js` E2 — the ops pre-flight agent no longer repairs the baseline.** "baseline FAILS → fix once" had a general-purpose git agent editing source code. Now: the pre-flight returns `baseline:'fail'` + an actionable log (explicit role boundary: it never edits source/doc files), and the WORKFLOW spawns the **coder** specialist for one bounded repair attempt — verified by deterministically re-running the baseline gates (no claim to trust), `E2-baseline: FIXED-BY-CODER` ledger row; still failing → batch-fatal as before. Zero extra spawns in the happy path.
+- **`new2.js` — the Codex driver never reviews.** Its runtime fallback was "perform the review yourself with the code-reviewer lens" — a general-purpose agent doing code review. Now the driver returns `note:'codex-unavailable'` with empty findings, and the workflow spawns the REAL `code-reviewer` (same `stdReview` call as the matrix), ledgered as `review-codex: FALLBACK`.
+- **`new2-resolve.js` — judge map completed per domain.** `doc` fixes are now judged by **doc-reviewer** (code-reviewer judging prose was cross-domain) and `test` fixes by **qa-sentinel**; `security`/`migration` → security-reviewer and `perf` → api-perf-cost-auditor unchanged; `ui`/`code` stay with code-reviewer (the judge verifies a CODE change — its charter, incl. DS rule 8 for UI). Fixer and judge of the same type remain two independent adversarial instances.
+### Changed
+- **`new2.js` — explicit ROLE BOUNDARY lines on every plumbing agent** (pre-flight, commit, merge, production-readiness): they never edit source/doc files; commit/merge touch ONLY card YAML status fields + registry rows; a content-level merge conflict is leave+report, never hand-resolved; production-readiness executes stack-matched commands but reports (never edits) code/config changes. The full writer map is now: code/perf/migration/test fixes → `coder`; UI → `ui-expert`; security fixes → `security-reviewer`; docs → `doc-reviewer`; backlog YAML → `prd-card-writer`; bookkeeping (status/registry) → mechanical commit/merge agents.
 ## [4.26.0] - 2026-06-11
 **`new2`: Phase 1 decomposed into specialist agents — the owner implements, it no longer explores.** The old per-card pipeline gave the owner agent one mega-prompt ("you ARE claim + architect + plan-auditor + owner"), so the owner absorbed the whole codebase exploration into its own context and reached the actual coding with a degraded window. Phase 1 now runs as dedicated specialists with file handoff (`/tmp`), per the "ognuno fa una cosa" principle. Verified premise: nested subagent spawning does NOT exist in Claude Code (official docs: "Subagents cannot spawn other subagents" + 2 empirical probes), so the decomposition lives at the WORKFLOW level — the JS is the orchestrator, exactly what dynamic workflows are for. **MINOR** (pipeline capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
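
As an illustration of the E2 fix listed under 4.26.1 above, the control flow can be sketched roughly as follows. This is a minimal sketch, not the workflow's code: `resolveBaseline`, `spawnCoder`, and the two-argument `ledger` are hypothetical names introduced here, standing in for the real spawn and ledger helpers. The field names (`ok`, `baseline`, `baselineLog`, `fixed`) and ledger statuses are taken from the diff.

```javascript
// Sketch only: `resolveBaseline`, `spawnCoder` and this `ledger` signature are
// hypothetical stand-ins for illustration, not the package's real helpers.
async function resolveBaseline(preflight, spawnCoder, ledger) {
  // An unworkable workspace is fatal before any repair is considered.
  if (!preflight || preflight.ok === false) {
    ledger('preflight', 'BATCH-FATAL')
    return { fatal: true }
  }
  // Happy path: the baseline passed, so no extra agent is spawned.
  if (preflight.baseline !== 'fail') return { fatal: false }

  // One bounded repair attempt, delegated to the coder specialist.
  let repair = null
  try {
    repair = await spawnCoder(preflight.baselineLog || '(missing)')
  } catch (_) {
    repair = null // a failed spawn is treated like a failed repair
  }
  if (repair && repair.fixed) {
    ledger('E2-baseline', 'FIXED-BY-CODER')
    return { fatal: false }
  }
  ledger('E2-baseline', 'BATCH-FATAL')
  return { fatal: true }
}
```

The point of the shape is that the ops agent only reports; the decision to spawn a repair, and the single-attempt bound, sit in the orchestrating function.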

package/VERSION CHANGED

@@ -1 +1 @@
-4.26.0
+4.26.1

package/framework/.claude/workflows/new2-resolve.js CHANGED

@@ -99,7 +99,17 @@ const FOLLOWUP_SCHEMA = {
 // F-024 — domain-specialized fixer + judge (full map; reviewer-owns-its-domain — a doc
 // finding is fixed by doc-reviewer, a security finding by security-reviewer, never coder).
 const fixerAgent = ({ doc: 'doc-reviewer', ui: 'ui-expert', security: 'security-reviewer' })[domain] || 'coder'
-const judgeAgent = (domain === 'security' || domain === 'migration') ? 'security-reviewer' : domain === 'perf' ? 'api-perf-cost-auditor' : 'code-reviewer'
+// Specialization integrity (v4.26.1) — the judge is the VERIFICATION specialist of the
+// finding's domain: doc fixes judged by doc-reviewer (code-reviewer judging prose was
+// cross-domain), test fixes by qa-sentinel (THE test specialist). `ui` and `code` stay with
+// code-reviewer: the judge verifies a CODE change, which is code-reviewer's charter
+// (including DS-coherence rule 8 for UI). Fixer and judge of the same type are still two
+// independent instances — the judge prompt is adversarial and greps the files itself.
+const judgeAgent = (domain === 'security' || domain === 'migration') ? 'security-reviewer'
+  : domain === 'perf' ? 'api-perf-cost-auditor'
+  : domain === 'doc' ? 'doc-reviewer'
+  : domain === 'test' ? 'qa-sentinel'
+  : 'code-reviewer'
 const findingsBlock = findings.map((f, i) => `  ${i + 1}. [${f.kind || kind}/${f.domain || domain}] ${f.evidence}`).join('\n')
 const brief = [
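
The fixer and judge selection in the hunk above can also be read as two lookup tables with a default. The sketch below restates the same mapping in that form, purely as a reading aid; `FIXERS`, `JUDGES`, `own`, and `pickAgents` are hypothetical names introduced here and do not appear in the package.

```javascript
// Sketch only: a table-driven restatement of the fixerAgent / judgeAgent
// expressions from the diff. All identifiers below are illustrative.
const FIXERS = { doc: 'doc-reviewer', ui: 'ui-expert', security: 'security-reviewer' }
const JUDGES = {
  security: 'security-reviewer',
  migration: 'security-reviewer',
  perf: 'api-perf-cost-auditor',
  doc: 'doc-reviewer',
  test: 'qa-sentinel',
}

// Own-property lookup, so a domain such as "toString" cannot match an
// inherited Object.prototype member.
const own = (table, key) =>
  Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined

function pickAgents(domain) {
  return {
    fixer: own(FIXERS, domain) || 'coder',         // default writer
    judge: own(JUDGES, domain) || 'code-reviewer', // default verifier
  }
}

console.log(pickAgents('doc'))  // { fixer: 'doc-reviewer', judge: 'doc-reviewer' }
console.log(pickAgents('test')) // { fixer: 'coder', judge: 'qa-sentinel' }
```

For `doc` the fixer and judge share an agent type; per the changelog they still run as two independent instances.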

package/framework/.claude/workflows/new2.js CHANGED

@@ -219,9 +219,10 @@ try {
     `You are the deterministic PRE-FLIGHT for an autonomous /new batch (variant new2). Follow ${REF}/setup.md (Phase 0 + Pre-flight) and ${REF}/implement.md (Phase 1 depends-on gate) for the SEMANTICS, but replace EVERY AskUserQuestion with the deterministic policy below. You run all git/bash yourself (the workflow cannot).\n\n` +
       `${projectBrief}\n\nCards in batch (Read each YAML):\n${cardPaths.join('\n')}\nCard IDs: ${cardIds.join(' ')}\n\n` +
       `Create/maintain the recovery tracker at /tmp/batch-tracker-${firstCard}.md (per setup.md § Context Tracking).\n\n` +
+      `ROLE BOUNDARY (specialization integrity): you are the OPS/GIT agent. You NEVER edit source or doc files — any needed content change belongs to the coder specialist; report it instead.\n\n` +
       `DETERMINISTIC GATE POLICIES (NO user prompts):\n` +
       `• G1 dirty-tree (main repo ${MAIN}): partition framework-managed noise exactly as setup.md step 3 ($METRICS=${METRICS}, .baldart/generated|state.json|skill-conflicts.json — NOT overlays/). Genuine user work → auto-stash 'baldart-new2-${firstCard}' (main checkout) and record the label. Never commit/abort/prompt.\n` +
-      `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → fix once; still failing → baseline:'fail' + baselineLog (batch-fatal).\n` +
+      `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files).\n` +
       codexResolveBullet +
       g3Bullet +
       `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
@@ -239,9 +240,28 @@ try {
   return finalReturn({ fatal: true, reason: 'pre-flight failed: ' + String(e && e.message) })
 }
-if (!preflight || preflight.ok === false || preflight.baseline === 'fail') {
-  ledger(firstCard, 'E2-baseline', 'BATCH-FATAL', (preflight && preflight.baselineLog) || 'workspace unworkable')
-  return finalReturn({ fatal: true, reason: 'baseline build irrecoverable — see baselineLog' })
+if (!preflight || preflight.ok === false) {
+  ledger(firstCard, 'preflight', 'BATCH-FATAL', (preflight && preflight.workspaceNote) || 'workspace unworkable')
+  return finalReturn({ fatal: true, reason: 'workspace unworkable — see pre-flight' })
+}
+if (preflight.baseline === 'fail') {
+  // E2 (specialization integrity) — baseline repair is CODE work: it belongs to the coder
+  // specialist, not the ops pre-flight agent (which never edits source). ONE bounded attempt;
+  // the verification is the deterministic re-run of the baseline gates themselves (no claim
+  // to trust — and every card's G26 re-exercises them anyway). Still failing → batch-fatal.
+  let repair = null
+  try {
+    repair = await agentSafe(
+      `You are the coder. The batch worktree BASELINE is failing on trunk-derived code (this is NOT card work — no card has run yet). Worktree: ${preflight.worktreePath} (cd into it).\n\nFailure log:\n${preflight.baselineLog || '(missing — re-run tsc/lint/build to reproduce)'}\n\nApply the minimal correct fix so the baseline gates (tsc + lint + build) pass, RE-RUN them, and report honestly (fixed:true ONLY if they now pass). Return: { fixed, log }`,
+      { label: 'baseline-repair', phase: 'Pre-flight', agentType: 'coder',
+        schema: { type: 'object', required: ['fixed'], additionalProperties: true, properties: { fixed: { type: 'boolean' }, log: { type: 'string' } } } }
+    )
+  } catch (e) { if (e && e.transientExhausted) noteDegraded('outage'); repair = null }
+  if (repair && repair.fixed) ledger(firstCard, 'E2-baseline', 'FIXED-BY-CODER', String(repair.log || '').slice(0, 200))
+  else {
+    ledger(firstCard, 'E2-baseline', 'BATCH-FATAL', preflight.baselineLog || 'baseline irrecoverable')
+    return finalReturn({ fatal: true, reason: 'baseline build irrecoverable — see baselineLog' })
+  }
 }
 for (const ex of preflight.excluded || []) ledger(ex.card, 'preflight-exclude', 'EXCLUDED', ex.reason)
@@ -563,28 +583,41 @@ async function runCard(cardId, cardPath) {
   const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
     properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
   let reviewResults = []
+  const onErr = (e) => { if (e && e.transientExhausted) noteDegraded('outage'); return null }
+  // The standard SPECIALIST reviewer spawn — also reused as the JS-level fallback when the
+  // Codex companion dies at runtime (specialization integrity: the driver never reviews).
+  const stdReview = (ra) => agentSafe(
+    `You are ${ra}. Review card ${cardId} per ${REF}/review-cycle.md + ${REF}/codex-gate.md (your domain only). Run your gates on the COMMITTED-or-working state.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\n\n` +
+      `Report ONLY blocking failures that survive your retry cap as blocks:[{gate,domain,evidence}] (each MUST have non-empty gate AND evidence — F-014). Report legitimate findings BEYOND this card's AC as scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
+      `Return: { blocks:[...], scopeExpansion:[...], note }`,
+    { label: `review:${cardId}:${ra}`, phase: 'Implement', agentType: ra, schema: reviewSchema }
+  ).catch(onErr)
   try {
     reviewResults = (await parallel(reviewers.map((ra) => () => {
-      const onErr = (e) => { if (e && e.transientExhausted) noteDegraded('outage'); return null }
       if (ra === 'codex') {
-        // Codex-light finder: a general-purpose agent drives the resolved companion (Bash, --wait).
+        // Codex-light finder: a general-purpose agent DRIVES the resolved companion (Bash,
+        // --wait). Driver role only — it never reviews; runtime failure → note flag, and the
+        // workflow spawns the real code-reviewer below.
         return agentSafe(
-          `You are the Codex review driver for card ${cardId} (review_profile=light — Codex is the SOLE finder since v4.18.0; you are NOT code-reviewer). Run the OpenAI Codex companion as a REVIEW-ONLY adversarial pass over this card's diff, then return its material findings in the schema below.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\nMAY-EDIT: ${JSON.stringify(mayEdit)}\n\n` +
+          `You are the Codex review DRIVER for card ${cardId} (review_profile=light — Codex is the SOLE finder since v4.18.0; you are a driver, NOT a reviewer). Run the OpenAI Codex companion as a REVIEW-ONLY adversarial pass over this card's diff, then return its material findings in the schema below.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\nMAY-EDIT: ${JSON.stringify(mayEdit)}\n\n` +
             `Run it in the FOREGROUND (it blocks; do NOT pass run_in_background):\n  node "${sharedCtx.codexScriptPath}" task "Review-only — DO NOT make edits, no --write flag. Adversarial review of card ${cardId} using the diff at /tmp/diff-${cardId}.txt. Focus: auth/permission boundaries, data-loss paths, race conditions, rollback safety, schema drift, invariant violations. Report ONLY material findings with file+line evidence." --wait\n` +
-            `Read the Codex output ONLY through a [codex]-trace-stripping filter. **Fallback**: if it exits non-zero / prints CODEX_NOT_FOUND / stays empty, set note='codex-unavailable' and perform the review yourself with the code-reviewer lens (your domain).\n\n` +
+            `Read the Codex output ONLY through a [codex]-trace-stripping filter. **Fallback**: if it exits non-zero / prints CODEX_NOT_FOUND / stays empty, return note:'codex-unavailable' with EMPTY blocks/scopeExpansion — do NOT review yourself (role boundary); the workflow spawns the real code-reviewer.\n\n` +
             `Map Codex BLOCKER/HIGH findings to blocks:[{gate:'codex-light',domain,evidence}] (each non-empty gate AND evidence — F-014). Map legitimate findings BEYOND this card's AC to scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
             `Return: { blocks:[...], scopeExpansion:[...], note }`,
           { label: `review:${cardId}:codex`, phase: 'Implement', agentType: 'general-purpose', schema: reviewSchema }
         ).catch(onErr)
       }
-      return agentSafe(
-        `You are ${ra}. Review card ${cardId} per ${REF}/review-cycle.md + ${REF}/codex-gate.md (your domain only). Run your gates on the COMMITTED-or-working state.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\n\n` +
-          `Report ONLY blocking failures that survive your retry cap as blocks:[{gate,domain,evidence}] (each MUST have non-empty gate AND evidence — F-014). Report legitimate findings BEYOND this card's AC as scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
-          `Return: { blocks:[...], scopeExpansion:[...], note }`,
-        { label: `review:${cardId}:${ra}`, phase: 'Implement', agentType: ra, schema: reviewSchema }
-      ).catch(onErr)
+      return stdReview(ra)
     }))).filter(Boolean)
   } catch (_) { /* parallel never rejects; nulls filtered */ }
+  // Specialization integrity — companion died at runtime: replace the empty driver result
+  // with a REAL code-reviewer pass (the same fallback the JS-level codexAvail gate uses).
+  if (reviewResults.some((r) => r && r.note === 'codex-unavailable')) {
+    reviewResults = reviewResults.filter((r) => !(r && r.note === 'codex-unavailable'))
+    g('review-codex', 'FALLBACK', 'companion failed at runtime → code-reviewer spawned (driver never reviews)')
+    const fb = await stdReview('code-reviewer')
+    if (fb) reviewResults.push(fb)
+  }
   // F-014 — only route well-formed blocks (non-empty gate+evidence).
   const blocks = reviewResults.flatMap((r) => (r.blocks || [])).filter((b) => b && b.gate && b.evidence)
@@ -642,7 +675,7 @@ async function runCard(cardId, cardPath) {
   let commitRes
   try {
     commitRes = await agentSafe(
-      `Commit card ${cardId} in worktree ${sharedCtx.worktreePath}. MECHANICAL — do NOT re-read reference modules.\n` +
+      `Commit card ${cardId} in worktree ${sharedCtx.worktreePath}. MECHANICAL — do NOT re-read reference modules. ROLE BOUNDARY: you NEVER modify file contents except the card YAML status/note fields and the ssot-registry row — source/doc changes are not yours.\n` +
         `Steps: (1) \`git status --porcelain\`; (2) stage = MAY-EDIT (${JSON.stringify(mayEdit)}) ∩ dirty — NEVER \`git add -A\`, NEVER \`git stash\`; if dirty has files OUTSIDE MAY-EDIT, do NOT stage them and set reconcileNote; (3) commit message \`[${cardId}] <concise>\`; ${doneStep} (5) 'nothing to commit' = already committed (record HEAD).\n` +
         `On COMMIT_LOCK: clear stale lock + retry once. Still locked → committed:false.\n\n` +
         `Return: { committed, commit, filesChanged, reconcileNote }`,
@@ -863,6 +896,7 @@ if (!committed.length) {
     mergeResult = await agentSafe(
       `Auto-merge the batch worktree to ${TRUNK} per ${REF}/merge-cleanup.md (Phase 6 via /mw programmatic checksAlreadyPassed:true, Phase 6b status reconciliation, Phase 6c hygiene). Run git yourself.\n\n${projectBrief}\nWorktree: ${sharedCtx.worktreePath}\nBranch: ${sharedCtx.branch}\nmerge_strategy: ${mergeStrategy}\nCommitted cards: ${committed.map((r) => r.card).join(' ')}\nPhase-0 stash to restore (if any): see /tmp/batch-tracker-${firstCard}.md.\n\n` +
         `DETERMINISTIC POLICIES (NO prompts):\n` +
+        `• ROLE BOUNDARY: you are the OPS/GIT agent — you NEVER edit source or doc files. Reconciliation touches ONLY card YAML status fields + registry rows. A merge conflict on content is leave+report, never hand-resolved here.\n` +
         `• G24 → auto-merge via merge_strategy.\n` +
         `• F-030 HARD RULE: NEVER \`git add\`/commit code that did not pass the per-card gates. If the worktree is dirty with uncommitted code → DO NOT commit it; leave it, set uncommittedLeft:true, and report. NO "safety commit". Security/migration code is NEVER swept in.\n` +
         `• F-029 HARD RULE: Phase 6b reconciliation marks a card DONE ONLY if it has a real commit in ${TRUNK}..HEAD AND its gates are green. NEVER force a non-implemented card to DONE. Return forcedDone:[] (must be empty).\n` +
@@ -894,7 +928,7 @@ phase('Production')
 if (mergeResult && mergeResult.merged) {
   try {
     prodReadiness = await agentSafe(
-      `Run the post-merge Production Readiness checklist per ${REF}/production-readiness.md (Phase 7) over the batch's changed files. Auto-EXECUTE only stack-matched index/access-rule/cron deploys; REPORT (do not execute) env vars, feature flags, DB migrations, secrets, DNS. NON-BLOCKING.\n\n${projectBrief}\nChanged files: ${dedupe(committed.flatMap((r) => r.filesChanged || [])).join(', ') || '(derive from git)'}\n\nReturn: { autoExecuted:[...], manualItems:[...], note }`,
+      `Run the post-merge Production Readiness checklist per ${REF}/production-readiness.md (Phase 7) over the batch's changed files. Auto-EXECUTE only stack-matched index/access-rule/cron deploys; REPORT (do not execute) env vars, feature flags, DB migrations, secrets, DNS. NON-BLOCKING. ROLE BOUNDARY: you EXECUTE commands, you never edit repository files — a needed code/config change is reported as a manual item.\n\n${projectBrief}\nChanged files: ${dedupe(committed.flatMap((r) => r.filesChanged || [])).join(', ') || '(derive from git)'}\n\nReturn: { autoExecuted:[...], manualItems:[...], note }`,
       { label: 'production-readiness', phase: 'Production', agentType: 'general-purpose',
         schema: { type: 'object', required: ['manualItems'], additionalProperties: true, properties: { autoExecuted: { type: 'array', items: { type: 'string' } }, manualItems: { type: 'array', items: { type: 'string' } }, note: { type: 'string' } } } }
     )
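
The runtime Codex fallback added in the review hunk of `new2.js` can be isolated into a small function for illustration. This is a sketch under stated assumptions: `applyCodexFallback`, `spawnReviewer`, and `log` are hypothetical names; in the diff the corresponding calls are `stdReview('code-reviewer')` and `g(...)`, whose full definitions are not shown here.

```javascript
// Sketch only: `applyCodexFallback`, `spawnReviewer` and `log` are
// illustrative stand-ins, not the workflow's real helpers.
async function applyCodexFallback(reviewResults, spawnReviewer, log) {
  const unavailable = (r) => r && r.note === 'codex-unavailable'
  // Nothing to do when the driver produced a real result.
  if (!reviewResults.some(unavailable)) return reviewResults

  // Drop the empty driver result, then replace it with a specialist review.
  const kept = reviewResults.filter((r) => !unavailable(r))
  log('review-codex', 'FALLBACK')
  const fallback = await spawnReviewer('code-reviewer')
  if (fallback) kept.push(fallback) // a null (failed) fallback is simply omitted
  return kept
}
```

Keeping the replacement at the orchestration level means the driver prompt never needs a review capability of its own: it either relays findings or returns the flag.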

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.26.0",
+  "version": "4.26.1",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"