baldart 4.25.0 → 4.26.1
- package/CHANGELOG.md +25 -0
- package/VERSION +1 -1
- package/framework/.claude/workflows/new2-resolve.js +11 -1
- package/framework/.claude/workflows/new2.js +127 -25
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,31 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.26.1] - 2026-06-11
+
+**`new2`: specialization integrity — a full audit of every agent spawn; no role mixing anywhere.** User principle: code is written ONLY by `coder`, UI only by `ui-expert`, each agent does one thing. The audit of all 16 spawn sites found two genuine violations and three under-specified plumbing roles. **PATCH** (role-integrity fixes on the EXPERIMENTAL `new2` surface; no config key, no change to `/new`).
+
+### Fixed
+
+- **`new2.js` E2 — the ops pre-flight agent no longer repairs the baseline.** "baseline FAILS → fix once" had a general-purpose git agent editing source code. Now: the pre-flight returns `baseline:'fail'` + an actionable log (explicit role boundary: it never edits source/doc files), and the WORKFLOW spawns the **coder** specialist for one bounded repair attempt — verified by deterministically re-running the baseline gates (no claim to trust), `E2-baseline: FIXED-BY-CODER` ledger row; still failing → batch-fatal as before. Zero extra spawns in the happy path.
+- **`new2.js` — the Codex driver never reviews.** Its runtime fallback was "perform the review yourself with the code-reviewer lens" — a general-purpose agent doing code review. Now the driver returns `note:'codex-unavailable'` with empty findings, and the workflow spawns the REAL `code-reviewer` (same `stdReview` call as the matrix), ledgered as `review-codex: FALLBACK`.
+- **`new2-resolve.js` — judge map completed per domain.** `doc` fixes are now judged by **doc-reviewer** (code-reviewer judging prose was cross-domain) and `test` fixes by **qa-sentinel**; `security`/`migration` → security-reviewer and `perf` → api-perf-cost-auditor unchanged; `ui`/`code` stay with code-reviewer (the judge verifies a CODE change — its charter, incl. DS rule 8 for UI). Fixer and judge of the same type remain two independent adversarial instances.
+
+### Changed
+
+- **`new2.js` — explicit ROLE BOUNDARY lines on every plumbing agent** (pre-flight, commit, merge, production-readiness): they never edit source/doc files; commit/merge touch ONLY card YAML status fields + registry rows; a content-level merge conflict is leave+report, never hand-resolved; production-readiness executes stack-matched commands but reports (never edits) code/config changes. The full writer map is now: code/perf/migration/test fixes → `coder`; UI → `ui-expert`; security fixes → `security-reviewer`; docs → `doc-reviewer`; backlog YAML → `prd-card-writer`; bookkeeping (status/registry) → mechanical commit/merge agents.
+
+## [4.26.0] - 2026-06-11
+
+**`new2`: Phase 1 decomposed into specialist agents — the owner implements, it no longer explores.** The old per-card pipeline gave the owner agent one mega-prompt ("you ARE claim + architect + plan-auditor + owner"), so the owner absorbed the whole codebase exploration into its own context and reached the actual coding with a degraded window. Phase 1 now runs as dedicated specialists with file handoff (`/tmp`), per the "ognuno fa una cosa" principle. Verified premise: nested subagent spawning does NOT exist in Claude Code (official docs: "Subagents cannot spawn other subagents" + 2 empirical probes), so the decomposition lives at the WORKFLOW level — the JS is the orchestrator, exactly what dynamic workflows are for. **MINOR** (pipeline capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new`).
+
+### Changed
+
+- **`framework/.claude/workflows/new2.js` — per-card `codebase-architect` specialist (B7).** Context retrieval is now a dedicated `codebase-architect` spawn that writes the COMPLETE untruncated baseline to `/tmp/arch-baseline-<CARD>.md` (the owner and reviewers Read the file; the structured return stays minimal). Bonus over the old pre-flight snapshot: the per-card architect sees prior in-batch commits — card N's baseline reflects what cards 1..N-1 changed. The pre-flight no longer writes baselines (generalist relieved of specialist work; `archBaselinePaths` removed from its contract). **Fail-safe**: architect crash/not-ok degrades to the old inline behavior (the owner explores itself) — never blocks the card; transient outage re-queues it.
+- **`new2.js` — `plan-auditor` demoted to a deterministic DRIFT gate.** `/prd` already validates every card at creation time, so an unconditional re-audit per card was duplicate work. plan-auditor now runs ONLY when execution-time drift is plausible: (a) a prior in-batch commit touched the card's declared surface (`filesLikelyTouched` ∩ prior `filesChanged`, computed in JS), (b) the architect found declared paths missing (stale card — factual `missingPaths` check, no judgement), or (c) `review_profile: deep` (Rule C escalation). Its corrections amend the implementation BRIEFING (never the backlog YAML). A fresh single-card batch with no drift evidence skips it at zero cost. Crash → non-blocking skip, ledgered.
+- **`new2.js` — epic guard moved to JS (zero spawns for trackers).** The pre-flight returns `cardGraph[].isEpic` (implement.md §6b rule); an epic card now short-circuits in JS before ANY spawn — the old path burned a full owner-agent spawn just to learn the card was a tracker. The impl agent's own epic flag stays as backstop.
+- **`new2.js` telemetry — `per_card[].phase1`** records `{architect: done|inline-fallback, audit: pass|fixes|skipped-no-drift|skipped-error|skipped-no-baseline}` plus `phase1-architect`/`phase1-audit` ledger rows, so the A/B accounting can price the decomposition.
+
 ## [4.25.0] - 2026-06-11
 
 **`new2`: deterministic per-card review matrix — `balanced` no longer spawns every specialist on every card.** The old fan-out ran code-reviewer + doc-reviewer + qa-sentinel + api-perf-cost-auditor (+ security-reviewer) unconditionally at `balanced`: an api-perf pass on a doc-only card, a doc pass on a pure-code card — 1–3 wasted spawns per card. Each specialist now runs IFF its domain is evidenced by the card's actual surface (`scopeFiles ∪ MAY-EDIT`), computed deterministically in JS (no agent judgement) and audited via a `review-matrix` ledger row per card. **MINOR** (review-behavior capability on the EXPERIMENTAL `new2` surface only; no config key, no change to `/new` — its interactive profiles are untouched; the schema-change propagation rule does not apply).
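The 4.25.0 entry describes the review matrix only in prose, and the matrix code itself is not part of this diff. As a rough illustration of "each specialist runs IFF its domain is evidenced by the card's surface", here is a minimal sketch. The `DOMAIN_PATTERNS` table, its path patterns, and the `reviewersFor` helper are all assumptions made for the example, not names or rules taken from `new2.js`.

```javascript
// Hypothetical domain detectors. The real rules in new2.js are not shown in this diff.
const DOMAIN_PATTERNS = {
  'doc-reviewer': [/\.md$/i, /^docs\//],
  'qa-sentinel': [/\.(test|spec)\.[jt]sx?$/, /^tests?\//],
  'api-perf-cost-auditor': [/^src\/api\//, /\.sql$/],
  'security-reviewer': [/auth/i, /^supabase\/migrations\//],
  'code-reviewer': [/\.[jt]sx?$/],
}

// Select reviewers from the card surface (scopeFiles ∪ MAY-EDIT) with no agent judgement.
// The returned matrix keeps the matching files as evidence, e.g. for a ledger row.
function reviewersFor(scopeFiles, mayEdit) {
  const surface = [...new Set([...scopeFiles, ...mayEdit])]
  const matrix = {}
  for (const [agent, patterns] of Object.entries(DOMAIN_PATTERNS)) {
    const evidence = surface.filter((f) => patterns.some((re) => re.test(f)))
    if (evidence.length) matrix[agent] = evidence
  }
  return { reviewers: Object.keys(matrix), matrix }
}
```

With these assumed patterns, a doc-only card selects only `doc-reviewer`, which is the saving the entry describes.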
package/VERSION
CHANGED
@@ -1 +1 @@
-4.
+4.26.1
package/framework/.claude/workflows/new2-resolve.js
CHANGED
@@ -99,7 +99,17 @@ const FOLLOWUP_SCHEMA = {
 // F-024 — domain-specialized fixer + judge (full map; reviewer-owns-its-domain — a doc
 // finding is fixed by doc-reviewer, a security finding by security-reviewer, never coder).
 const fixerAgent = ({ doc: 'doc-reviewer', ui: 'ui-expert', security: 'security-reviewer' })[domain] || 'coder'
-
+// Specialization integrity (v4.26.1) — the judge is the VERIFICATION specialist of the
+// finding's domain: doc fixes judged by doc-reviewer (code-reviewer judging prose was
+// cross-domain), test fixes by qa-sentinel (THE test specialist). `ui` and `code` stay with
+// code-reviewer: the judge verifies a CODE change, which is code-reviewer's charter
+// (including DS-coherence rule 8 for UI). Fixer and judge of the same type are still two
+// independent instances — the judge prompt is adversarial and greps the files itself.
+const judgeAgent = (domain === 'security' || domain === 'migration') ? 'security-reviewer'
+  : domain === 'perf' ? 'api-perf-cost-auditor'
+  : domain === 'doc' ? 'doc-reviewer'
+  : domain === 'test' ? 'qa-sentinel'
+  : 'code-reviewer'
 
 const findingsBlock = findings.map((f, i) => ` ${i + 1}. [${f.kind || kind}/${f.domain || domain}] ${f.evidence}`).join('\n')
 const brief = [
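The fixer lookup and the judge ternary chain in the hunk above can be read as two small tables. The sketch below restates them that way so the pairing per domain is easy to check. `FIXER_BY_DOMAIN`, `JUDGE_BY_DOMAIN` and `agentsFor` are illustrative names, not identifiers from `new2-resolve.js`.

```javascript
// Table-driven restatement of the fixer/judge selection shown in the hunk.
// These names are hypothetical. The package uses an inline object and a ternary chain.
const FIXER_BY_DOMAIN = { doc: 'doc-reviewer', ui: 'ui-expert', security: 'security-reviewer' }
const JUDGE_BY_DOMAIN = {
  security: 'security-reviewer',
  migration: 'security-reviewer',
  perf: 'api-perf-cost-auditor',
  doc: 'doc-reviewer',
  test: 'qa-sentinel',
}

function agentsFor(domain) {
  // Own-property check so inherited keys such as 'constructor' fall through to the default.
  const pick = (table, fallback) =>
    Object.prototype.hasOwnProperty.call(table, domain) ? table[domain] : fallback
  return { fixer: pick(FIXER_BY_DOMAIN, 'coder'), judge: pick(JUDGE_BY_DOMAIN, 'code-reviewer') }
}
```

For example, `agentsFor('test')` pairs the `coder` fixer with the `qa-sentinel` judge, while `agentsFor('doc')` names `doc-reviewer` for both roles, which the changelog says still run as two independent instances.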
package/framework/.claude/workflows/new2.js
CHANGED
@@ -148,6 +148,10 @@ const PREFLIGHT_SCHEMA = {
 // spawn per card in runCard (always false on a fresh run).
 alreadyCommitted: { type: 'boolean', description: 'commit referencing the card exists in trunk..HEAD of the worktree AND validation re-runs green AND no open follow-up' },
 alreadyCommittedSha: { type: 'string' },
+// B7 (v4.26.0) — epic guard moved to JS (an epic never reaches any spawn) + the
+// card's declared surface, used by the deterministic plan-audit drift gate.
+isEpic: { type: 'boolean', description: "epic/tracker card per implement.md §6b: id ends '-00' OR filename ends '-epic.yml' OR group.is_epic:true OR review_profile 'skip' with no requirements" },
+filesLikelyTouched: { type: 'array', items: { type: 'string' }, description: "the card's files_likely_touched, verbatim" },
 // F-016 — ACs whose only implementation file is outside the card MAY-EDIT,
 // pre-classified deferred-by-policy (never routed to resolve).
 policyDeferredACs: { type: 'array', items: { type: 'object', additionalProperties: true } },
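The `isEpic` description in the hunk above packs four alternative conditions into one sentence. In the workflow the pre-flight agent evaluates that rule while reading the YAML and the JS only consumes the resulting boolean. The sketch below just spells the predicate out. `isEpicCard` is a hypothetical helper, and the card shape (`id`, `group.is_epic`, `review_profile`, `requirements`) is assumed from the field names quoted in the description.

```javascript
// Sketch of the implement.md §6b epic rule as quoted in the isEpic schema description.
// Hypothetical helper: the package delegates this check to the pre-flight agent.
function isEpicCard(card, filename) {
  const requirements = Array.isArray(card.requirements) ? card.requirements : []
  return (
    String(card.id || '').endsWith('-00') ||
    String(filename || '').endsWith('-epic.yml') ||
    Boolean(card.group && card.group.is_epic === true) ||
    (card.review_profile === 'skip' && requirements.length === 0)
  )
}
```

Note that `review_profile: 'skip'` alone is not enough under this reading: the card must also carry no requirements.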
@@ -156,6 +160,9 @@ const PREFLIGHT_SCHEMA = {
 },
 excluded: { type: 'array', items: { type: 'object', additionalProperties: true } },
 ownershipMapPath: { type: 'string' },
+// B7 — archBaselinePaths removed: the per-card baseline is now written by the
+// codebase-architect SPECIALIST in runCard (it also sees prior in-batch commits,
+// which a pre-flight snapshot cannot), not by this generalist agent.
 crossCard: { type: 'string' },
 // G2 — deterministic Codex availability (glob-resolved, NOT a self-reported judgement),
 // so a false negative on the cross-card check is visible in telemetry. true=companion found.
@@ -164,7 +171,6 @@ const PREFLIGHT_SCHEMA = {
 // `light` card can run Codex as its finder. Empty string when codexResolved is false.
 codexScriptPath: { type: 'string' },
 workspaceNote: { type: 'string' },
-archBaselinePaths: { type: 'array', items: { type: 'string' } },
 },
 }
 
@@ -213,18 +219,19 @@ try {
   `You are the deterministic PRE-FLIGHT for an autonomous /new batch (variant new2). Follow ${REF}/setup.md (Phase 0 + Pre-flight) and ${REF}/implement.md (Phase 1 depends-on gate) for the SEMANTICS, but replace EVERY AskUserQuestion with the deterministic policy below. You run all git/bash yourself (the workflow cannot).\n\n` +
   `${projectBrief}\n\nCards in batch (Read each YAML):\n${cardPaths.join('\n')}\nCard IDs: ${cardIds.join(' ')}\n\n` +
   `Create/maintain the recovery tracker at /tmp/batch-tracker-${firstCard}.md (per setup.md § Context Tracking).\n\n` +
+  `ROLE BOUNDARY (specialization integrity): you are the OPS/GIT agent. You NEVER edit source or doc files — any needed content change belongs to the coder specialist; report it instead.\n\n` +
   `DETERMINISTIC GATE POLICIES (NO user prompts):\n` +
   `• G1 dirty-tree (main repo ${MAIN}): partition framework-managed noise exactly as setup.md step 3 ($METRICS=${METRICS}, .baldart/generated|state.json|skill-conflicts.json — NOT overlays/). Genuine user work → auto-stash 'baldart-new2-${firstCard}' (main checkout) and record the label. Never commit/abort/prompt.\n` +
-  `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → fix
+  `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files).\n` +
   codexResolveBullet +
   g3Bullet +
   `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
   `• G5 depends-on: a card whose depends_on names a non-DONE card NOT in this batch → EXCLUDE it AND every in-batch card that transitively depends on it.\n` +
-  `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs, alreadyCommitted, alreadyCommittedSha }.\n` +
+  `• cardGraph (REQUIRED, F-021): for every runnable card return { id, dependsOn:[IN-BATCH deps only], ownerAgent (the card's owner_agent; G25 unknown→'coder'), reviewProfile (the card's review_profile; default 'balanced'), policyDeferredACs, alreadyCommitted, alreadyCommittedSha, isEpic (implement.md §6b epic guard: id ends '-00' OR filename ends '-epic.yml' OR group.is_epic:true OR review_profile 'skip' with no requirements), filesLikelyTouched (verbatim from the YAML) }.\n` +
   `• B1/F-026 idempotency (per card, AFTER the worktree exists): set alreadyCommitted:true (+ alreadyCommittedSha) IFF ALL hold: (a) a commit referencing the card id exists in ${TRUNK}..HEAD of the worktree; (b) the card's validation_commands re-run GREEN right now; (c) NO open follow-up card for it exists in ${paths.backlog_dir || 'backlog'}. On a FRESH worktree ${TRUNK}..HEAD is empty → all false, zero extra work.\n` +
   `• F-016 AC↔ownership consistency: for each acceptance_criterion, derive the file(s) it requires editing. If those files are NOT a subset of the card's MAY-EDIT/files_likely_touched → add the AC to policyDeferredACs:[{n,text,owningCard|owningFile,reason}] (it will become ONE follow-up, never a resolve). Do the same for any AC whose remedy is an owner-gated infra action (remote db push / deploy / secret / DNS).\n` +
   `• Ownership (setup.md 3c): build the file-ownership map → /tmp; return ownershipMapPath. F-040: each card's MAY-EDIT = files_likely_touched ∪ every path NAMED EXPLICITLY in that card's acceptance_criteria/definition_of_done (an ADR the DoD says to update, the data-model / ER doc for a schema-change, etc.) — so editing a DoD-mandated doc is NOT a file-diff violation. Do NOT add another card's files this way.\n` +
-  `•
+  `• Do NOT write architecture baselines — the per-card codebase-architect specialist does that during the card pipeline (B7).\n\n` +
   `Return the structured PREFLIGHT object. ok:false ONLY if the workspace is unworkable.`,
   { label: 'preflight', phase: 'Pre-flight', agentType: 'general-purpose', schema: PREFLIGHT_SCHEMA }
 )
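The F-040 and F-016 bullets in the pre-flight prompt above describe two set operations: MAY-EDIT is a union, and an acceptance criterion is policy-deferred when the files it needs are not a subset of that union. A minimal sketch of that arithmetic follows. It assumes each AC already carries a hypothetical `files` array; in the workflow the pre-flight agent derives those files from the AC text, so `mayEditFor` and `splitACs` are illustrative helpers only.

```javascript
// F-040 (sketch): MAY-EDIT = files_likely_touched ∪ paths named explicitly in the AC/DoD.
function mayEditFor(card, explicitlyNamedPaths) {
  return new Set([...(card.files_likely_touched || []), ...explicitlyNamedPaths])
}

// F-016 (sketch): an AC whose required files are not all inside MAY-EDIT is deferred by policy.
// `ac.files` is an assumed field. The real derivation is done by the pre-flight agent.
function splitACs(acs, mayEdit) {
  const inScope = []
  const policyDeferred = []
  for (const ac of acs) {
    const outside = (ac.files || []).filter((f) => !mayEdit.has(f))
    if (outside.length === 0) inScope.push(ac)
    else policyDeferred.push({ n: ac.n, text: ac.text, owningFile: outside[0], reason: 'file outside MAY-EDIT' })
  }
  return { inScope, policyDeferred }
}
```

The deferred entries use the `{n, text, owningFile, reason}` shape named in the prompt, so each one can become a single follow-up instead of a resolve cycle.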
@@ -233,9 +240,28 @@ try {
 return finalReturn({ fatal: true, reason: 'pre-flight failed: ' + String(e && e.message) })
 }
 
-if (!preflight || preflight.ok === false
-ledger(firstCard, '
-return finalReturn({ fatal: true, reason: '
+if (!preflight || preflight.ok === false) {
+  ledger(firstCard, 'preflight', 'BATCH-FATAL', (preflight && preflight.workspaceNote) || 'workspace unworkable')
+  return finalReturn({ fatal: true, reason: 'workspace unworkable — see pre-flight' })
+}
+if (preflight.baseline === 'fail') {
+  // E2 (specialization integrity) — baseline repair is CODE work: it belongs to the coder
+  // specialist, not the ops pre-flight agent (which never edits source). ONE bounded attempt;
+  // the verification is the deterministic re-run of the baseline gates themselves (no claim
+  // to trust — and every card's G26 re-exercises them anyway). Still failing → batch-fatal.
+  let repair = null
+  try {
+    repair = await agentSafe(
+      `You are the coder. The batch worktree BASELINE is failing on trunk-derived code (this is NOT card work — no card has run yet). Worktree: ${preflight.worktreePath} (cd into it).\n\nFailure log:\n${preflight.baselineLog || '(missing — re-run tsc/lint/build to reproduce)'}\n\nApply the minimal correct fix so the baseline gates (tsc + lint + build) pass, RE-RUN them, and report honestly (fixed:true ONLY if they now pass). Return: { fixed, log }`,
+      { label: 'baseline-repair', phase: 'Pre-flight', agentType: 'coder',
+        schema: { type: 'object', required: ['fixed'], additionalProperties: true, properties: { fixed: { type: 'boolean' }, log: { type: 'string' } } } }
+    )
+  } catch (e) { if (e && e.transientExhausted) noteDegraded('outage'); repair = null }
+  if (repair && repair.fixed) ledger(firstCard, 'E2-baseline', 'FIXED-BY-CODER', String(repair.log || '').slice(0, 200))
+  else {
+    ledger(firstCard, 'E2-baseline', 'BATCH-FATAL', preflight.baselineLog || 'baseline irrecoverable')
+    return finalReturn({ fatal: true, reason: 'baseline build irrecoverable — see baselineLog' })
+  }
 }
 
 for (const ex of preflight.excluded || []) ledger(ex.card, 'preflight-exclude', 'EXCLUDED', ex.reason)
@@ -267,7 +293,6 @@ const sharedCtx = {
 worktreePath: preflight.worktreePath,
 branch: preflight.branch,
 ownershipMapPath: preflight.ownershipMapPath,
-archBaselinePaths: preflight.archBaselinePaths || [],
 // v4.18.0 — per-card Codex-light finder needs the resolved companion path in runCard scope.
 codexResolved: !!preflight.codexResolved,
 codexScriptPath: preflight.codexScriptPath || '',
@@ -390,6 +415,11 @@ async function runCard(cardId, cardPath) {
 const deferredClasses = new Set(deferredOpen ? ['policy-deferred-ac'] : [])
 function g(name, decision, detail) { gates.push({ gate: name, decision, detail: detail || '' }); ledger(cardId, name, decision, detail) }
 
+// B7 — epic guard in JS (pre-flight reads the YAML): an epic is a tracker, not implementable
+// work — skip it BEFORE any spawn (the old path burned a full owner-agent spawn to learn it).
+// The impl agent's own epic flag stays as backstop for a pre-flight false negative.
+if (node.isEpic) { g('router', 'EPIC-SKIPPED', 'epic card (pre-flight, zero spawns)'); return { card: cardId, status: 'epic-skipped', gates, commit: '-' } }
+
 // F-026/B1 — skip-completed from the PRE-FLIGHT's git-authoritative probe (cardGraph[].
 // alreadyCommitted), not a per-card agent spawn: on a fresh run the old Haiku probe was N
 // guaranteed-false spawns, and on resume the journal cache already covers it. Keyed on the
@@ -399,14 +429,70 @@ async function runCard(cardId, cardPath) {
   return { card: cardId, status: 'committed', commit: node.alreadyCommittedSha || '-', filesChanged: [], scopeFiles: [], archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates }
 }
 
-const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\
+const cardBrief = `${projectBrief}\n\nCard: ${cardId}\nCard YAML: ${cardPath}\nOwner agent: ${ownerAgent} · Review profile: ${reviewProfile}\nWorktree: ${sharedCtx.worktreePath} (cd into it)\nFile-ownership map: ${sharedCtx.ownershipMapPath}\nNOTE: ACs already pre-classified as policy-deferred MUST NOT be implemented or routed — they are tracked as follow-ups.`
 
-// --- Phase 1
+// --- Phase 1 (B7, v4.26.0) — SPECIALIST decomposition: each Phase-1 duty runs as its own
+// agent ("ognuno fa una cosa"), with handoff via /tmp files so the owner's context stays
+// clean of exploration noise. The old single mega-prompt ("you ARE claim+architect+
+// plan-auditor+owner") made the owner absorb the whole exploration into its context.
+const baselinePath = `/tmp/arch-baseline-${cardId}.md`
+const phase1 = { architect: 'inline-fallback', audit: 'skipped-no-drift' }
+let arch = null
+try {
+  arch = await agentSafe(
+    `You are the Phase-1 context retriever for card ${cardId} (per ${REF}/implement.md Phase 1 step 3 / 5b). cd into the worktree ${sharedCtx.worktreePath}.\n\n${cardBrief}\n\n` +
+    `Explore the codebase exactly as your system prompt mandates for this card's scope (requirements + files_likely_touched: ${JSON.stringify(node.filesLikelyTouched || [])}). Write your COMPLETE untruncated findings (file paths, type signatures, patterns, high-risk paths) to ${baselinePath} — refresh it if a stale copy exists. The owner agent and the per-card reviewers will Read that file; keep your structured return MINIMAL.\n\n` +
+    `Also report missingPaths: every path in files_likely_touched that does NOT exist in the worktree (factual ls/test check — no judgement). Return: { ok, missingPaths:[...], note }`,
+    { label: `architect:${cardId}`, phase: 'Implement', agentType: 'codebase-architect',
+      schema: { type: 'object', required: ['ok'], additionalProperties: true, properties: { ok: { type: 'boolean' }, missingPaths: { type: 'array', items: { type: 'string' } }, note: { type: 'string' } } } }
+  )
+  if (arch && arch.ok) { phase1.architect = 'done'; g('phase1-architect', 'DONE', `baseline → ${baselinePath}${(arch.missingPaths || []).length ? ' · missing: ' + arch.missingPaths.join(', ') : ''}`) }
+  else g('phase1-architect', 'FALLBACK-INLINE', (arch && arch.note) || 'architect returned not-ok — owner explores inline')
+} catch (e) {
+  if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates } }
+  g('phase1-architect', 'FALLBACK-INLINE', `architect crashed (${String(e && e.message)}) — owner explores inline`)
+}
+const architectOk = phase1.architect === 'done'
+
+// B7 — plan-auditor as a DETERMINISTIC DRIFT GATE, not an unconditional duplicate of the
+// validation /prd already ran at card-creation time. It runs ONLY when execution-time drift
+// is plausible: (a) a prior in-batch commit touched this card's declared surface (the
+// strongest signal — the codebase moved under the card), (b) the architect found declared
+// paths missing (stale card), or (c) review_profile 'deep' (Rule C escalation). Fresh
+// single-card batches with no drift evidence skip it — /prd's validation stands.
+const flt = (node.filesLikelyTouched || []).map(String)
+const priorTouched = perCardResults.filter((r) => r.status === 'committed').flatMap((r) => r.filesChanged || [])
+const inBatchOverlap = flt.length && priorTouched.some((f) => flt.some((c) => String(f).includes(c) || c.includes(String(f))))
+const needAudit = reviewProfile === 'deep' || inBatchOverlap || (architectOk && (arch.missingPaths || []).length > 0)
+let auditCorrections = []
+if (needAudit && architectOk) {
+  try {
+    const audit = await agentSafe(
+      `Audit card ${cardId} for EXECUTION-TIME DRIFT per ${REF}/implement.md Phase 1 step 4 (you are plan-auditor; the card was already validated by /prd at creation time — your job is what changed SINCE). cd into ${sharedCtx.worktreePath}.\n\n${cardBrief}\nArchitecture baseline (Read it): ${baselinePath}\nDrift signals: ${inBatchOverlap ? 'prior in-batch commits touched this card surface; ' : ''}${(arch.missingPaths || []).length ? 'missing declared paths: ' + arch.missingPaths.join(', ') : ''}\n\n` +
+      `Check ONLY: (1) paths in files_likely_touched still exist; (2) type/field references in the requirements still correct per the baseline; (3) [ASSUMED] items now answerable from the code. Return PASS or the exact corrections — corrections amend the IMPLEMENTATION BRIEFING, never the backlog YAML.\n\nReturn: { verdict, corrections:[strings] }`,
+      { label: `plan-audit:${cardId}`, phase: 'Implement', agentType: 'plan-auditor',
+        schema: { type: 'object', required: ['verdict'], additionalProperties: true, properties: { verdict: { enum: ['PASS', 'FIXES_NEEDED'] }, corrections: { type: 'array', items: { type: 'string' } } } } }
+    )
+    auditCorrections = (audit && audit.corrections) || []
+    phase1.audit = audit && audit.verdict === 'FIXES_NEEDED' ? 'fixes' : 'pass'
+    g('phase1-audit', phase1.audit === 'fixes' ? 'FIXES-APPLIED-TO-BRIEF' : 'PASS', `drift gate (${reviewProfile === 'deep' ? 'deep' : inBatchOverlap ? 'in-batch overlap' : 'missing paths'})${auditCorrections.length ? ' · ' + auditCorrections.length + ' corrections' : ''}`)
+  } catch (e) {
+    if (e && e.transientExhausted) { noteDegraded('outage'); return { card: cardId, status: 'pending', gates } }
+    phase1.audit = 'skipped-error'
+    g('phase1-audit', 'SKIPPED', `auditor crashed (${String(e && e.message)}) — proceeding without audit (non-blocking)`)
+  }
+} else if (needAudit && !architectOk) { phase1.audit = 'skipped-no-baseline'; g('phase1-audit', 'SKIPPED', 'drift signaled but no baseline — owner re-derives context inline') }
+else g('phase1-audit', 'SKIPPED', 'no drift evidence — /prd creation-time validation stands')
+
+// --- Phase 2: dispatch the card's OWNER_AGENT (F-024), not general-purpose. ---
+const phase1Brief = architectOk
+  ? `Phase 1 already ran as specialist agents. READ the architecture baseline at ${baselinePath} BEFORE writing any code — it is your codebase context; do NOT redo the exploration.${auditCorrections.length ? `\nPlan-audit corrections (apply them as amendments to this briefing — do NOT modify the backlog YAML):\n${auditCorrections.map((c, i) => ` ${i + 1}. ${c}`).join('\n')}` : ''}`
+  : `Phase 1 fallback: the architect specialist was unavailable — do the Phase 1 claim+architect exploration yourself per ${REF}/implement.md and persist the baseline to ${baselinePath} before coding.`
 let impl
 try {
   impl = await agentSafe(
-    `Implement card ${cardId} per ${REF}/implement.md
-    `POLICIES: G26 Phase-2 lint/tsc/test/build failing after the module's retry cap → buildBlocked:true + blockedGate. Build the AC Closure Ledger (one row per AC: implemented|unmet|policy-deferred). DO NOT silently defer; report unmet rows (excluding policy-deferred). Persist
+    `Implement card ${cardId} per ${REF}/implement.md Phase 2 — you ARE the owner_agent '${ownerAgent}' — and ${REF}/completeness.md (Phase 2.5 + 2.5b AC-closure ledger). Run all gates/bash yourself.\n\n${phase1Brief}\n\n${cardBrief}\n\n` +
+    `POLICIES: G26 Phase-2 lint/tsc/test/build failing after the module's retry cap → buildBlocked:true + blockedGate. Build the AC Closure Ledger (one row per AC: implemented|unmet|policy-deferred). DO NOT silently defer; report unmet rows (excluding policy-deferred). Persist the diff to /tmp/diff-${cardId}.txt.\n\n` +
     `E4 OWNERSHIP RECONCILE (implement.md §11b — do this BEFORE returning): the card's MAY-EDIT includes files_likely_touched ∪ paths NAMED EXPLICITLY in this card's acceptance_criteria/definition_of_done (e.g. an ADR the DoD says to update, the data-model / ER doc for a schema change). Editing THOSE is in-scope. For any OTHER dirty file outside MAY-EDIT (another card's file, or unrelated): \`git checkout -- <file>\` to revert it (NEVER leave it orphaned), list it in revertedOutOfOwnership. Set fileDiffViolation:true ONLY if such an edit genuinely could not be reverted (then say why in note) — it is no longer a silent label.\n\n` +
     `Return: { epic, buildBlocked, blockedGate, unmetACs:[{n,text}], scopeFiles, mayEditPaths, revertedOutOfOwnership:[paths], fileDiffViolation, note }`,
     { label: `implement:${cardId}`, phase: 'Implement', agentType: ownerAgent,
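The drift gate in the hunk above is interleaved with spawning and ledger calls, which makes the three signals harder to see. The sketch below isolates the decision as a pure function under a few assumptions: `needsPlanAudit` is a hypothetical name, `priorResults` stands in for `perCardResults`, and `missingPaths` is only passed when the architect returned ok (the hunk additionally skips the audit when no baseline exists).

```javascript
// Pure restatement of the drift-gate decision. Hypothetical helper, not part of new2.js.
function needsPlanAudit({ reviewProfile, filesLikelyTouched = [], priorResults = [], missingPaths = [] }) {
  const declared = filesLikelyTouched.map(String)
  const priorTouched = priorResults
    .filter((r) => r.status === 'committed')
    .flatMap((r) => r.filesChanged || [])
    .map(String)
  // Same loose substring match as the hunk, in both directions.
  const inBatchOverlap = declared.length > 0 &&
    priorTouched.some((f) => declared.some((c) => f.includes(c) || c.includes(f)))
  // Reason precedence follows the ledger detail in the hunk: deep, then overlap, then missing paths.
  if (reviewProfile === 'deep') return { audit: true, reason: 'deep' }
  if (inBatchOverlap) return { audit: true, reason: 'in-batch overlap' }
  if (missingPaths.length > 0) return { audit: true, reason: 'missing paths' }
  return { audit: false, reason: 'no drift evidence' }
}
```

One property worth noting: because the match is a substring test, a declared `a.ts` also matches a prior change to `src/data.ts`. That can trigger an audit that was not strictly needed, which costs one extra spawn but does not skip a needed one.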
@@ -497,28 +583,41 @@ async function runCard(cardId, cardPath) {
 const reviewSchema = { type: 'object', required: ['blocks', 'scopeExpansion'], additionalProperties: true,
   properties: { blocks: { type: 'array', items: { type: 'object', additionalProperties: true } }, scopeExpansion: { type: 'array', items: { type: 'object', additionalProperties: true } }, note: { type: 'string' } } }
 let reviewResults = []
+const onErr = (e) => { if (e && e.transientExhausted) noteDegraded('outage'); return null }
+// The standard SPECIALIST reviewer spawn — also reused as the JS-level fallback when the
+// Codex companion dies at runtime (specialization integrity: the driver never reviews).
+const stdReview = (ra) => agentSafe(
+  `You are ${ra}. Review card ${cardId} per ${REF}/review-cycle.md + ${REF}/codex-gate.md (your domain only). Run your gates on the COMMITTED-or-working state.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\n\n` +
+  `Report ONLY blocking failures that survive your retry cap as blocks:[{gate,domain,evidence}] (each MUST have non-empty gate AND evidence — F-014). Report legitimate findings BEYOND this card's AC as scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
+  `Return: { blocks:[...], scopeExpansion:[...], note }`,
+  { label: `review:${cardId}:${ra}`, phase: 'Implement', agentType: ra, schema: reviewSchema }
+).catch(onErr)
 try {
   reviewResults = (await parallel(reviewers.map((ra) => () => {
-    const onErr = (e) => { if (e && e.transientExhausted) noteDegraded('outage'); return null }
     if (ra === 'codex') {
-      // Codex-light finder: a general-purpose agent
+      // Codex-light finder: a general-purpose agent DRIVES the resolved companion (Bash,
+      // --wait). Driver role only — it never reviews; runtime failure → note flag, and the
+      // workflow spawns the real code-reviewer below.
       return agentSafe(
-        `You are the Codex review
+        `You are the Codex review DRIVER for card ${cardId} (review_profile=light — Codex is the SOLE finder since v4.18.0; you are a driver, NOT a reviewer). Run the OpenAI Codex companion as a REVIEW-ONLY adversarial pass over this card's diff, then return its material findings in the schema below.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\nMAY-EDIT: ${JSON.stringify(mayEdit)}\n\n` +
         `Run it in the FOREGROUND (it blocks; do NOT pass run_in_background):\n node "${sharedCtx.codexScriptPath}" task "Review-only — DO NOT make edits, no --write flag. Adversarial review of card ${cardId} using the diff at /tmp/diff-${cardId}.txt. Focus: auth/permission boundaries, data-loss paths, race conditions, rollback safety, schema drift, invariant violations. Report ONLY material findings with file+line evidence." --wait\n` +
-        `Read the Codex output ONLY through a [codex]-trace-stripping filter. **Fallback**: if it exits non-zero / prints CODEX_NOT_FOUND / stays empty,
+        `Read the Codex output ONLY through a [codex]-trace-stripping filter. **Fallback**: if it exits non-zero / prints CODEX_NOT_FOUND / stays empty, return note:'codex-unavailable' with EMPTY blocks/scopeExpansion — do NOT review yourself (role boundary); the workflow spawns the real code-reviewer.\n\n` +
         `Map Codex BLOCKER/HIGH findings to blocks:[{gate:'codex-light',domain,evidence}] (each non-empty gate AND evidence — F-014). Map legitimate findings BEYOND this card's AC to scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
         `Return: { blocks:[...], scopeExpansion:[...], note }`,
         { label: `review:${cardId}:codex`, phase: 'Implement', agentType: 'general-purpose', schema: reviewSchema }
       ).catch(onErr)
     }
-    return
-    `You are ${ra}. Review card ${cardId} per ${REF}/review-cycle.md + ${REF}/codex-gate.md (your domain only). Run your gates on the COMMITTED-or-working state.\n\n${cardBrief}\nDiff: /tmp/diff-${cardId}.txt\n\n` +
|
|
516
|
-
`Report ONLY blocking failures that survive your retry cap as blocks:[{gate,domain,evidence}] (each MUST have non-empty gate AND evidence — F-014). Report legitimate findings BEYOND this card's AC as scopeExpansion:[{evidence,domain,withinOwnership,newAC}].\n\n` +
|
|
517
|
-
`Return: { blocks:[...], scopeExpansion:[...], note }`,
|
|
518
|
-
{ label: `review:${cardId}:${ra}`, phase: 'Implement', agentType: ra, schema: reviewSchema }
|
|
519
|
-
).catch(onErr)
|
|
610
|
+
return stdReview(ra)
|
|
520
611
|
}))).filter(Boolean)
|
|
521
612
|
} catch (_) { /* parallel never rejects; nulls filtered */ }
|
|
613
|
+
// Specialization integrity — companion died at runtime: replace the empty driver result
|
|
614
|
+
// with a REAL code-reviewer pass (the same fallback the JS-level codexAvail gate uses).
|
|
615
|
+
if (reviewResults.some((r) => r && r.note === 'codex-unavailable')) {
|
|
616
|
+
reviewResults = reviewResults.filter((r) => !(r && r.note === 'codex-unavailable'))
|
|
617
|
+
g('review-codex', 'FALLBACK', 'companion failed at runtime → code-reviewer spawned (driver never reviews)')
|
|
618
|
+
const fb = await stdReview('code-reviewer')
|
|
619
|
+
if (fb) reviewResults.push(fb)
|
|
620
|
+
}
|
|
522
621
|
|
|
523
622
|
// F-014 — only route well-formed blocks (non-empty gate+evidence).
|
|
524
623
|
const blocks = reviewResults.flatMap((r) => (r.blocks || [])).filter((b) => b && b.gate && b.evidence)
|
|
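The runtime fallback and the F-014 filter in this hunk can be read in isolation. The following is a minimal sketch under stated assumptions, not the package's code: `replaceUnavailableDriver` and `routableBlocks` are hypothetical names, `spawnReviewer` stands in for `stdReview`, and `ledger` stands in for `g`.

```javascript
// Hypothetical stand-alone sketch of the fallback + F-014 logic shown in the hunk.
// `spawnReviewer` (for stdReview) and `ledger` (for g) are assumed stand-ins.
async function replaceUnavailableDriver(results, spawnReviewer, ledger) {
  const isUnavailable = (r) => r && r.note === 'codex-unavailable'
  if (!results.some(isUnavailable)) return results
  // Drop the empty driver result: the driver reports failure, it never reviews.
  const kept = results.filter((r) => !isUnavailable(r))
  ledger('review-codex', 'FALLBACK')
  // One real specialist pass replaces the dead companion.
  const fb = await spawnReviewer('code-reviewer')
  return fb ? [...kept, fb] : kept
}

// F-014: only blocks with a non-empty gate AND non-empty evidence are routed.
function routableBlocks(results) {
  return results
    .flatMap((r) => r.blocks || [])
    .filter((b) => b && b.gate && b.evidence)
}
```

If the spawned reviewer itself fails and yields `null`, the sketch simply keeps the remaining results, which mirrors the `if (fb)` guard in the diff.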
@@ -576,7 +675,7 @@ async function runCard(cardId, cardPath) {
  let commitRes
  try {
  commitRes = await agentSafe(
- `Commit card ${cardId} in worktree ${sharedCtx.worktreePath}. MECHANICAL — do NOT re-read reference modules.\n` +
+ `Commit card ${cardId} in worktree ${sharedCtx.worktreePath}. MECHANICAL — do NOT re-read reference modules. ROLE BOUNDARY: you NEVER modify file contents except the card YAML status/note fields and the ssot-registry row — source/doc changes are not yours.\n` +
  `Steps: (1) \`git status --porcelain\`; (2) stage = MAY-EDIT (${JSON.stringify(mayEdit)}) ∩ dirty — NEVER \`git add -A\`, NEVER \`git stash\`; if dirty has files OUTSIDE MAY-EDIT, do NOT stage them and set reconcileNote; (3) commit message \`[${cardId}] <concise>\`; ${doneStep} (5) 'nothing to commit' = already committed (record HEAD).\n` +
  `On COMMIT_LOCK: clear stale lock + retry once. Still locked → committed:false.\n\n` +
  `Return: { committed, commit, filesChanged, reconcileNote }`,
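Step (2) of the commit prompt, stage = MAY-EDIT ∩ dirty, amounts to a set intersection. A minimal sketch follows, assuming MAY-EDIT entries are exact relative paths (the diff does not show whether globs are allowed); `planStaging` and the wording of the note are hypothetical.

```javascript
// Hypothetical sketch: stage = MAY-EDIT ∩ dirty.
// Assumption: MAY-EDIT holds exact relative paths, not glob patterns.
function planStaging(mayEdit, dirty) {
  const allowed = new Set(mayEdit)
  const stage = dirty.filter((f) => allowed.has(f))
  const outside = dirty.filter((f) => !allowed.has(f))
  // Files outside MAY-EDIT are never staged; they only produce a reconcile note.
  const reconcileNote = outside.length
    ? `dirty outside MAY-EDIT: ${outside.join(', ')}`
    : ''
  return { stage, reconcileNote }
}
```

Computing the stage list explicitly, rather than using `git add -A`, is what keeps unrelated dirty files out of the card's commit.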
@@ -609,6 +708,8 @@ async function runCard(cardId, cardPath) {
  // B6 — which reviewers ACTUALLY ran (the gated matrix); Phase Final keys its slim/skip
  // decision on this, and per_card telemetry records it for the A/B spawn accounting.
  reviewersRun: reviewers,
+ // B7 — Phase-1 specialist record (architect: done|inline-fallback; audit: pass|fixes|skipped-*).
+ phase1,
  scopeFiles, archBaselinePath: `/tmp/arch-baseline-${cardId}.md`, gates,
  }
  }
@@ -795,6 +896,7 @@ if (!committed.length) {
  mergeResult = await agentSafe(
  `Auto-merge the batch worktree to ${TRUNK} per ${REF}/merge-cleanup.md (Phase 6 via /mw programmatic checksAlreadyPassed:true, Phase 6b status reconciliation, Phase 6c hygiene). Run git yourself.\n\n${projectBrief}\nWorktree: ${sharedCtx.worktreePath}\nBranch: ${sharedCtx.branch}\nmerge_strategy: ${mergeStrategy}\nCommitted cards: ${committed.map((r) => r.card).join(' ')}\nPhase-0 stash to restore (if any): see /tmp/batch-tracker-${firstCard}.md.\n\n` +
  `DETERMINISTIC POLICIES (NO prompts):\n` +
+ `• ROLE BOUNDARY: you are the OPS/GIT agent — you NEVER edit source or doc files. Reconciliation touches ONLY card YAML status fields + registry rows. A merge conflict on content is leave+report, never hand-resolved here.\n` +
  `• G24 → auto-merge via merge_strategy.\n` +
  `• F-030 HARD RULE: NEVER \`git add\`/commit code that did not pass the per-card gates. If the worktree is dirty with uncommitted code → DO NOT commit it; leave it, set uncommittedLeft:true, and report. NO "safety commit". Security/migration code is NEVER swept in.\n` +
  `• F-029 HARD RULE: Phase 6b reconciliation marks a card DONE ONLY if it has a real commit in ${TRUNK}..HEAD AND its gates are green. NEVER force a non-implemented card to DONE. Return forcedDone:[] (must be empty).\n` +
@@ -826,7 +928,7 @@ phase('Production')
  if (mergeResult && mergeResult.merged) {
  try {
  prodReadiness = await agentSafe(
- `Run the post-merge Production Readiness checklist per ${REF}/production-readiness.md (Phase 7) over the batch's changed files. Auto-EXECUTE only stack-matched index/access-rule/cron deploys; REPORT (do not execute) env vars, feature flags, DB migrations, secrets, DNS. NON-BLOCKING.\n\n${projectBrief}\nChanged files: ${dedupe(committed.flatMap((r) => r.filesChanged || [])).join(', ') || '(derive from git)'}\n\nReturn: { autoExecuted:[...], manualItems:[...], note }`,
+ `Run the post-merge Production Readiness checklist per ${REF}/production-readiness.md (Phase 7) over the batch's changed files. Auto-EXECUTE only stack-matched index/access-rule/cron deploys; REPORT (do not execute) env vars, feature flags, DB migrations, secrets, DNS. NON-BLOCKING. ROLE BOUNDARY: you EXECUTE commands, you never edit repository files — a needed code/config change is reported as a manual item.\n\n${projectBrief}\nChanged files: ${dedupe(committed.flatMap((r) => r.filesChanged || [])).join(', ') || '(derive from git)'}\n\nReturn: { autoExecuted:[...], manualItems:[...], note }`,
  { label: 'production-readiness', phase: 'Production', agentType: 'general-purpose',
  schema: { type: 'object', required: ['manualItems'], additionalProperties: true, properties: { autoExecuted: { type: 'array', items: { type: 'string' } }, manualItems: { type: 'array', items: { type: 'string' } }, note: { type: 'string' } } } }
  )
@@ -877,7 +979,7 @@ function buildTelemetry() {
  // cost — total_tokens via budget.spent() delta; agent_count via counter; wall_clock_s stamped by the SKILL.
  total_tokens: totalTokens,
  agent_count: agentCount,
- per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], gates: (r.gates || []).length })),
+ per_card: perCardResults.map((r) => ({ card: r.card, status: r.status, deferred: !!r.deferred, deferredClasses: r.deferredClasses || [], reviewers: r.reviewersRun || [], phase1: r.phase1 || null, gates: (r.gates || []).length })),
  stats_requested: !!FLAGS.stats,
  }
  }
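The new `phase1` telemetry field defaults to `null` for cards that never recorded a Phase-1 result. A minimal sketch of the `per_card` mapping follows; `toPerCard` is a hypothetical wrapper, and the `{ architect, audit }` shape used in the usage note is an assumption inferred from the B7 comment, not confirmed by the diff.

```javascript
// Hypothetical wrapper around the per_card mapping shown in the diff.
// Field names mirror the diff; the shape of `phase1` itself is assumed.
function toPerCard(perCardResults) {
  return perCardResults.map((r) => ({
    card: r.card,
    status: r.status,
    deferred: !!r.deferred,
    deferredClasses: r.deferredClasses || [],
    reviewers: r.reviewersRun || [],
    phase1: r.phase1 || null, // null when no Phase-1 record exists
    gates: (r.gates || []).length, // telemetry keeps only the gate count
  }))
}
```

For example, a result carrying `phase1: { architect: 'done', audit: 'pass' }` passes that object through unchanged, while a result without the field yields `phase1: null`.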