baldart 4.48.0 → 4.49.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -5,6 +5,26 @@ All notable changes to BALDART will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [4.49.1] - 2026-06-17
9
+
10
+ **The four `/new` dynamic workflows shed their paragraph-length `meta.description` — the last prose that re-entered the orchestrator context on every workflow completion.** The `Workflow` tool contract specifies `description` as a *one-line* field shown in the permission dialog, but `new-card-review` / `new-final-review` / `new2` / `new2-resolve` each carried a full-paragraph rationale (~150–230 words) that the runtime echoes back into the orchestrator prefix on the completion line — pure prose with zero runtime value (the orchestrator already knows what it invoked). Each is collapsed to a single line; the full semantics were already duplicated in the script's own args-contract comment block (source code, never injected into context) plus the `references/*.md` SSOT, so nothing is lost. Same family as the v4.49.0 prose-leak closures, applied to the workflow layer. **PATCH** (no behaviour change — descriptions are display/permission-dialog text only; **no new `baldart.config.yml` key**).
11
+
12
+ ### Changed
13
+
14
+ - **`framework/.claude/workflows/{new-card-review,new-final-review,new2,new2-resolve}.js`** — `meta.description` collapsed from a full paragraph to one line each (e.g. `'Per-wave review+fix cluster for /new (off-context).'`). A two-line comment above each marks why it must stay a one-liner (re-enters orchestrator context on completion) and points to the args-contract comment block + reference module that hold the full semantics.
15
+
16
+ ## [4.49.0] - 2026-06-17
17
+
18
+ **Caveman-style terse contracts close the last prose-leak the v4.47.0 turn-economy left out of scope: `codebase-architect` gets an opt-in `OUTPUT=terse` grounding mode, and `/new`'s orchestrator is forced near-silent.** A study of the [caveman](https://github.com/JuliusBrussee/caveman) skill (compress *how the agent talks*, not *what it builds* — output tokens only, reasoning untouched) plus a **prose-leak surface map of BALDART's own fleet** found the surface already covered almost everywhere — finders emit YAML/schema (`code-reviewer`, auditors, `plan-auditor`, `prd-card-writer`, `coder` completion-report), `senior-researcher` is file-resident, `context-primer` is already capped at a 30–50-line XML — with ONE genuinely-uncovered prose exploder: `codebase-architect`, whose budget cap is *input*-side only (`codebase-architect.md`), whose "Communication Style" explicitly licenses narrative prose, and whose return is outside the v4.47.0 rule by design (`new/SKILL.md` § "Context economy": *"It does NOT change any agent's return contract"*). The earlier ponytail-style "cap all finder prose" idea was **refuted** by a 3-skeptic adversarial pass (finders already capped + data-driven; new2 already isolates subagent output; driver is turn-count) — this release ships only the two data-validated survivors. A follow-up **return-consumption audit** of the whole fleet confirmed the floor is already reached everywhere else — finders/auditors return a parsed schema or override-to-`## FINDINGS`, and `qa-sentinel`/`doc-reviewer`/`senior-researcher`/`prd-card-writer` are file-resident with a thin manifest return; the only remaining lever was wiring the new `OUTPUT=terse` flag into the remaining machine-to-machine `codebase-architect` call-sites (`context-primer`, `/prd` discovery dimension-resolution + ISA spawns). The near-silent rule is scoped to `/new` only — `/prd` and `/bug` are conversational, where the prose IS the deliverable (caveman's own Auto-Clarity principle: never compress what a human reads to decide). `/new` is the opposite: not conversational — the user keeps the orchestrator attached only so it can surface a genuine decision, so its running narration serves no one. **MINOR** (additive agent capability + skill behaviour; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
19
+
20
+ ### Changed
21
+
22
+ - **`framework/.claude/agents/codebase-architect.md`** — new **"Output mode: terse grounding (opt-in)"** section: when the invoking prompt carries `OUTPUT=terse`, the architect emits the structured substance only (the `## Reuse Analysis` table + `## Canonical Evidence` block unchanged, affected code as `path:line — symbol — pattern/role` rows, high-risk paths one-line each, a closing `totals:`), drops the high-level-overview/​"why" narrative, keeps code identifiers verbatim, and keeps an **Auto-clarity carve-out** (`⚠` one-liner for a genuine ambiguity/risk a row can't carry). The existing "Communication Style" is retitled *(interactive / explanatory invocations only)* — without the flag the narrative path is unchanged.
23
+ - **`framework/.claude/skills/{new/references/implement.md` (Phase 1 step 3), `bug/SKILL.md` (Phase 0 step 3), `context-primer/SKILL.md` (loader spawn template), `prd/references/discovery-phase.md` (dimension-resolution spawn + ISA spawn)}** — every machine-to-machine `codebase-architect` grounding spawn now passes `OUTPUT=terse`. The `/new` baseline persisted verbatim at step 5b stays lean while keeping everything a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths).
24
+ - **`framework/.claude/skills/new/SKILL.md`** — new **HARD RULE in § "Context economy": "orchestrator is near-silent / maximally cryptic"** — the orchestrator emits **nothing user-facing during the run** (no progress, no "now I will…", no per-phase recap, no agent-activity description, no tracker restatement, no tool-call narration) and speaks in exactly **three** cases: (1) decision gates — `AskUserQuestion` + security/​irreversible-action confirmations, full clarity, never compressed; (2) genuine blockers / items needing user action, terse and exact; (3) one minimal end-of-batch result block (one line per card + actionable residue). Governs the orchestrator's *prose*, not any agent return contract.
25
+ - **`framework/.claude/agents/plan-auditor.md`** — the FULL-mode `## OUTPUT FORMAT` 10-section block is compressed in place (~78 → ~22 lines): same 10 sections + every load-bearing semantic (verdict definitions, `Priority ≥ 16 → BLOCK`, the three distinct pre-mortem root-cause classes), with the verbose per-section sub-bullet templates collapsed to one-line specs. This is dormant weight on the common QUICK-mode spawns (which explicitly skip it), trimmed from every spawn's system prompt. Compressed **in place** rather than extracted to an on-demand reference — agents don't Read external format files (that's a skill-only pattern), so extraction would introduce a cross-context path dependency not worth ~1k tokens. (Return-consumption audit verdict: this is a per-spawn *subagent input* cost, not the orchestrator's replayed-context multiplier — a deliberately small, low-risk trim, the only one of the dormant-format set worth touching.)
26
+ - **`framework/.claude/skills/new/references/final-review.md`** (Step F.6 item 4) — the per-card + batch **13-field summary report** is collapsed to the minimal result block (`CARD-ID: merged @<sha> | failed: … | deferred → …` one line per card + actionable residue only). All process telemetry (files changed, test/build/lint status, fix cycles, review counts, QA verdict, commit/merge hashes, cleanup, knowledge-sync) is no longer rendered — it lives in the tracker + `skill-runs.jsonl`. The Next Steps & Launch Command (4b) and Production Readiness manual steps stay (they are actionable residue).
27
+
8
28
  ## [4.48.0] - 2026-06-16
9
29
 
10
30
  **`new2` is relaxed: a post-batch, interactive-only escape hatch hands genuinely-blocked hard cases to `/new`'s real human gate — without re-introducing rigidity, a twin, or breaking the A/B.** `new2` runs the batch autonomously, so a card the deterministic policy can't salvage becomes a tracked follow-up + is left `IN_PROGRESS` — the "edge case that wants human intelligence" the user flagged. A 3-lens **adversarial review before implementation** killed the obvious fix (a Step-5 `AskUserQuestion` that lets the skill implement/resolve the residual): (1) **correctness** — the workflow auto-merges and removes the worktree *before* Step 5, so a skill-side fix has no worktree, lands unreviewed code on trunk, and bypasses the F-029 DONE gate; (2) **value** — on the only run with a full `deferral_breakdown` (14 residuals) a post-batch fix changes the outcome of *zero* of them (the batch already merged); only a mid-batch checkpoint could salvage-before-merge, and the data (n=1) doesn't justify breaking `new2`'s background/no-poll contract; (3) **prior-art** — it would twin `/new`'s Phase 2.5b AC-Closure gate and muddy the autonomous-vs-`/new` A/B. What survived all three: the sound escalation is **not to re-implement a gate but to invoke `/new` on the already-materialised follow-up** — `/new` owns the real worktree+review+AC-Closure+F-029+merge pipeline. **MINOR** (additive skill behaviour, interactive-only, autonomous-mode-safe; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
package/VERSION CHANGED
@@ -1 +1 @@
1
- 4.48.0
1
+ 4.49.1
@@ -276,7 +276,30 @@ canonical doc already told you.
276
276
  - Documentation sync: code and docs must stay aligned
277
277
  - Backlog-driven work: features are tracked in `${paths.backlog_dir}/*.yml` files
278
278
 
279
- **Communication Style:**
279
+ **Output mode: terse grounding (opt-in, since v4.49.0):**
280
+
281
+ When the invoking prompt contains `OUTPUT=terse` (architecture grounding handed to
282
+ an implementer or reviewer — not an interactive explanation for a human to read),
283
+ emit the structured substance ONLY, no narrative. Brain big, mouth small. Your
284
+ return value is injected verbatim into the caller's context, so every word of
285
+ prose costs them context budget on every subsequent turn.
286
+
287
+ - **Emit, unchanged (these ARE the contract):** the `## Reuse Analysis` table and
288
+ the `## Canonical Evidence` block.
289
+ - **Affected code as rows, not paragraphs:** `path:line — symbol — pattern/role`.
290
+ - **High-risk paths / integration points, one line each:** `path:line — risk`.
291
+ - **Close with** a `totals:` line (files, reuse candidates, high-risk paths).
292
+ - **Drop:** the high-level-overview paragraph, the "why" essays, restating the
293
+ request, any prose a row already carries.
294
+ - **Keep verbatim:** file paths, symbols, type signatures, error strings — never
295
+ abbreviate a code identifier.
296
+ - **Auto-clarity carve-out:** a genuine ambiguity, drift, or safety-relevant
297
+ caveat a row can't carry gets ONE plain short line flagged `⚠`. Never compress a
298
+ warning into ambiguity.
299
+
300
+ Without `OUTPUT=terse`, use the narrative style below.
301
+
302
+ **Communication Style (interactive / explanatory invocations only):**
280
303
 
281
304
  - Start with a high-level overview, then dive into specifics
282
305
  - Use precise technical terminology from `coding-standards.md`
@@ -471,85 +471,23 @@ Every HIGH and MEDIUM finding MUST be emitted in this exact shape so it can be p
471
471
 
472
472
  LOW findings can be one-liners with target tag (no full schema).
473
473
 
474
- ## OUTPUT FORMAT (MANDATORY — USE THIS EXACT STRUCTURE)
475
-
476
- ### 1) Executive Verdict
477
- - **Verdict**: {PASS | PASS WITH FIXES | BLOCK | NEEDS_REPLAN}
478
- - **Coverage Score**: <N>/8 audit categories passed (A–H)
479
- - **Memory matches**: <N> known pitfalls applied to this audit
480
- - **High-Risk Path Triggers**: [list or "none"] if any: "Per-card `/codexreview` MUST run BEFORE merge"
481
- - **Specialist auto-spawn**: [list of agents spawned, or "none triggered"]
482
- - **Active Code Context drift**: [list of colliding cards, or "no collision"]
483
- - **Tool budget used**: <reads>/20 reads, <greps>/30 greps
484
- - 3–7 bullet reasons (highest impact first)
485
-
486
- **Verdict definitions:**
487
- - `PASS`: thorough, explicit, ready for implementation. (Rare.)
488
- - `PASS WITH FIXES`: structurally sound, gaps enumerated and fixable before coding starts.
489
- - `BLOCK`: gap that would cause production incident, data loss, security breach, or mid-sprint rewrite. Work must not start.
490
- - `NEEDS_REPLAN`: framing is wrong, not just details. Premise / approach / scope is incorrect; restart planning, do not patch.
491
-
492
- ### 2) High-Risk Gaps (Must Fix)
493
- Findings list in YAML schema, grouped by category (integrity, architecture, security, reliability, testing, api-perf, traceability, verification, adr, drift, high-risk-path, simulation, injection).
494
-
495
- For backlog card findings, include the `target` field per TARGET TAG SYSTEM.
496
-
497
- ### 3) Plan Simulation Findings
498
- Walk through each step that broke during simulation. Format:
499
- - **Step N**: <step description>
500
- - **Broken invariant**: <what assumption fails>
501
- - **Why**: <causal chain>
502
- - **Fix**: <reorder, add prerequisite step, add rollback>
503
-
504
- ### 4) Assumptions & Hidden Dependencies
505
- - List each assumption found in or implied by the plan
506
- - For each: **Confidence level** (High / Med / Low) + **How to validate quickly**
507
-
508
- ### 5) Blocking Questions (If any)
509
- - Numbered list of precise questions
510
- - For each: provide **recommended default if unanswered** + **trade-off explanation**
511
-
512
- ### 6) Hardened Plan (Rewritten)
513
- - Provide a revised plan with:
514
- - Phases (numbered, with clear entry/exit criteria)
515
- - Tasks per phase (with estimated complexity: S/M/L)
516
- - Acceptance criteria per phase (testable, specific)
517
- - Explicit dependencies between phases and external systems
518
- - Rollout + rollback strategy
519
- - Instrumentation requirements (what to log, what to alert on)
520
- - Test plan (what to test at each phase)
521
-
522
- ### 7) Quantified Risk Register
523
- Format per risk: **Risk** | **Impact (1-5)** | **Likelihood (1-5)** | **Priority (I×L)** | **Severity** | **Mitigation** | **Owner**
524
-
525
- Rank by Priority descending. Highlight any Priority ≥ 16 as **BLOCK**.
526
-
527
- ### 8) Three-Scenario Pre-Mortem (counterfactual diversity)
528
- Generate 3 INDEPENDENT failure scenarios, each with a DIFFERENT root cause class:
529
-
530
- **Scenario A — Data Corruption** ("It's 30 days later and silent data loss occurred because…")
531
- - Causal chain
532
- - Top 3 failure modes feeding this scenario
533
- - How the hardened plan prevents each
534
-
535
- **Scenario B — Security Breach** ("It's 30 days later and a customer escalated unauthorized access because…")
536
- - Causal chain
537
- - Top 3 failure modes
538
- - Mitigation in hardened plan
539
-
540
- **Scenario C — Scale Collapse** ("It's 30 days later and prod is throttling because…")
541
- - Causal chain
542
- - Top 3 failure modes
543
- - Mitigation in hardened plan
544
-
545
- If a scenario class doesn't apply to this plan (e.g. no data writes → no Scenario A), state explicitly and skip — but argue why it doesn't apply.
546
-
547
- ### 9) Hallucinated Findings Dropped (CoVe)
548
- Findings disproven by Chain-of-Verification. Format:
549
- - **Finding title** — Verification: <command> → <result> → dropped because <reason>
550
-
551
- ### 10) Suppressed Findings (Challenge Pass)
552
- Already documented in the suppressed-findings collapsible block above.
474
+ ## OUTPUT FORMAT — FULL mode only (MANDATORY — USE THESE 10 SECTIONS)
475
+
476
+ > QUICK mode does NOT use this section — it returns `VERDICT: …` + numbered fixes (see top). The
477
+ > 10-section structure below is for a standalone/holistic FULL audit; emit every section in order.
478
+ > Terse spec one line per field, no prose padding; substance and exact values verbatim.
479
+
480
+ 1. **Executive Verdict** — `Verdict` {PASS | PASS WITH FIXES | BLOCK | NEEDS_REPLAN}; Coverage <N>/8 (A–H); Memory matches <N>; High-Risk Path Triggers [list|none] (if any: "Per-card `/codexreview` MUST run BEFORE merge"); Specialist auto-spawn [list|none]; Active-Code-Context drift [colliding cards|none]; Tool budget <reads>/20, <greps>/30; 3–7 reasons, highest impact first.
481
+ - **Verdicts:** `PASS` ready (rare) · `PASS WITH FIXES` sound, gaps fixable pre-coding · `BLOCK` gap → prod incident / data loss / security breach / mid-sprint rewrite, do not start · `NEEDS_REPLAN` premise/approach/scope wrong, restart not patch.
482
+ 2. **High-Risk Gaps** findings in the FINDINGS YAML SCHEMA above, grouped by category; include `target` for card findings.
483
+ 3. **Plan Simulation Findings** per broken step: `Step N` · broken invariant · why (causal chain) · fix (reorder / add prerequisite / add rollback).
484
+ 4. **Assumptions & Hidden Dependencies** — each assumption + confidence (High/Med/Low) + how to validate quickly.
485
+ 5. **Blocking Questions** — numbered; each + recommended default if unanswered + trade-off.
486
+ 6. **Hardened Plan** — phases (entry/exit criteria) · tasks S/M/L · per-phase ACs (testable) · explicit deps (phases + external) · rollout+rollback · instrumentation (log/alert) · test plan.
487
+ 7. **Quantified Risk Register** — `Risk | Impact 1-5 | Likelihood 1-5 | Priority I×L | Severity | Mitigation | Owner`, ranked by Priority desc; Priority ≥ 16 → **BLOCK**.
488
+ 8. **Three-Scenario Pre-Mortem** 3 INDEPENDENT "30 days later…" scenarios, each a DIFFERENT root-cause class (A Data Corruption / B Security Breach / C Scale Collapse): causal chain + top-3 failure modes + how the hardened plan prevents each. A class that can't apply (e.g. no writes → no A) is stated and skipped with the reason.
489
+ 9. **Hallucinated Findings Dropped (CoVe)** `Finding Verification: <command> <result> dropped because <reason>`.
490
+ 10. **Suppressed Findings (Challenge Pass)** per the suppressed-findings block above.
553
491
 
554
492
  ## TONE & STYLE
555
493
 
@@ -60,6 +60,9 @@ See [framework/docs/MCP-INTEGRATION.md](../../../docs/MCP-INTEGRATION.md) for th
60
60
  - **Auth** → the project's auth middleware/guard + auth error codes (per `stack.auth_provider`)
61
61
  - **Build/Deploy** → the deploy platform's logs + build output (per `stack.deployment`, e.g. Vercel)
62
62
  3. Launch `codebase-architect` agent to map the affected code paths — do NOT start debugging blind.
63
+ Pass `OUTPUT=terse` (since v4.49.0): you want the structured map (`path:line — symbol` rows,
64
+ high-risk paths, `totals:`), not a narrative essay — its return is injected verbatim into this
65
+ investigation context, so keep it lean.
63
66
  When `features.has_lsp_layer: true` and the symptom names a concrete symbol
64
67
  (function/type/handler), `codebase-architect` will use LSP find-references /
65
68
  go-to-definition to locate the exact callsites instead of grepping. See
@@ -61,6 +61,10 @@ precedence caveats: `framework/agents/effort-protocol.md`.
61
61
  Load context about: {keywords}
62
62
  Task type: {task_type}
63
63
 
64
+ OUTPUT=terse — you are a silent loader's machine consumer, not a human Q&A. Return the
65
+ structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`)
66
+ with no narrative prose; the loader condenses this into the XML context block.
67
+
64
68
  TOKEN BUDGET: stay under 15K tokens total. Prefer the canonical router + targeted reads over broad file scans.
65
69
 
66
70
  SEARCH STRATEGY — canonical-router-first, verify-only-if-needed:
@@ -207,6 +207,24 @@ baselines. Keep that bulk on disk and pass **paths**, not bodies.
207
207
  > completion — end your turn after spawning; never `sleep N; echo "waiting…"` (already stated for
208
208
  > team mode in `references/team-mode.md` — it applies to every barrier in both modes).
209
209
  >
210
+
211
+ > **HARD RULE — orchestrator is near-silent / maximally cryptic (since v4.49.0).** `/new` is NOT
212
+ > conversational: the user does not read its running narration and keeps the orchestrator attached ONLY so it
213
+ > can surface a genuine decision when something unpredictable needs a human. So emit **nothing user-facing
214
+ > during the run** — no progress, no "now I will…", no per-phase recap, no description of what an agent is
215
+ > doing, no tracker restatement, no tool-call narration (each also costs a full replayed-context turn per
216
+ > § turn-count above; the tracker is the only state surface). The orchestrator speaks to the user in exactly
217
+ > THREE cases, and nowhere else:
218
+ > 1. **Decision gates — full clarity, NEVER compress:** `AskUserQuestion` prompts and any security /
219
+ > irreversible-action confirmation. This is the sole reason the human is in the loop — it must be readable.
220
+ > 2. **Genuine blockers / items needing user action** — surfaced terse and exact (`CARD-ID`, `path:line`,
221
+ > error string verbatim), no prose around them.
222
+ > 3. **One minimal end-of-batch result block** — one line per card (`CARD-ID: merged @<sha> | failed:
223
+ > <one clause> | deferred → <followup-id>`) plus only the actionable residue (unresolved/flagged items,
224
+ > the launch command, any production-readiness manual step). NOT a per-phase report — process telemetry
225
+ > lives in the tracker + `skill-runs.jsonl`, never rendered.
226
+ > This governs the orchestrator's *prose*, not any agent's return contract (that is § "Context economy" above).
227
+ >
210
228
  > These rules reduce the *number* of replays; the bulk-content rule above reduces the *size* of each.
211
229
  > Both compound — apply them together.
212
230
 
@@ -295,21 +295,10 @@ that is a **gate violation**: log it as
295
295
  If the project ships a `knowledge-sync` corpus agent (declared in `.baldart/overlays/new.md`), invoke it after doc updates so the external corpus stays aligned. If no such agent exists, skip this step with a one-line notice (do NOT dispatch a non-existent agent).
296
296
 
297
297
  3. **Proceed to Phase 6** (post-batch merge & cleanup).
298
- 4. Present a **single summary report** to the user per card (and a batch summary at the end):
299
- - **Files changed** (short list per card)
300
- - **Test results** (new tests + existing tests count, pass rate at each iteration)
301
- - **Build/lint status** (pass + retry count if any)
302
- - **Fix cycles** (total number of self-healing retries across phases)
303
- - **Final review** (findings: N raised / M verified / K false positives | fixes applied: N | highest severity)
304
- - **UX testing** (PASS/FAIL/SKIP | test file path if written)
305
- - **QA result** (profile: skip/light/balanced/deep | verdict: PASS/FAIL/SKIP | confidence % | failing gates: which of lint/tsc/test/build/markdownlint, or "none" | findings file path). E2E is reported separately (Phase 2.6 e2e-review), not by qa-sentinel.
306
- - **Issues needing user attention** (anything unresolved, partially wired, or flagged)
307
- - **Commit hashes** (from tracker)
308
- - **Merge commit hash** (from Phase 6)
309
- - **Card status reconciliation** (Phase 6b: N cards verified DONE, K force-updated)
310
- - **Worktree cleanup status** (success/failed)
311
- - **Knowledge-corpus sync status** (from Step F.6 item 2 "Knowledge Base Sync" — COMPLETED/FAILED/PARTIAL, or SKIPPED if no corpus-sync agent is configured)
312
- - Overall implementation status
298
+ 4. Emit the **minimal end-of-batch result block** (per § "orchestrator is near-silent" in the core SKILL.md — this is NOT a per-phase report). ONE line per card and nothing more of the process telemetry:
299
+ - `<CARD-ID>: merged @<sha> | failed: <one clause> | deferred → <followup-id>`
300
+
301
+ Then, **only if non-empty**, the **actionable residue**: issues needing user attention (anything unresolved, partially wired, or flagged — terse, with `path:line` / `CARD-ID` exact). All process telemetry — files changed, test/build/lint status, fix cycles, review-finding counts, UX/QA verdicts, commit + merge hashes, card-status reconciliation, worktree cleanup, knowledge-corpus sync — is **NOT rendered**: it lives in the tracker and `$METRICS/skill-runs.jsonl`. (The Next Steps & Launch Command in 4b below and any Production Readiness manual steps ARE part of the actionable residue — keep them.)
313
302
 
314
303
  4b. **Next Steps & Launch Command** (MANDATORY section — always present):
315
304
 
@@ -10,7 +10,7 @@
10
10
  - If `DONE` (or the user chose "proceed anyway") → continue.
11
11
  2b. **Claim** — only now set the card status to `IN_PROGRESS` (in the backlog YAML / tracker) and assign yourself. This is a state write, not a visibility emission — do not also spend a TaskUpdate / Progress-Bar turn (§ "State surface — the tracker only").
12
12
  2c. **Trivial-card classification (BLOCKING gate for steps 3–4)** — evaluate `IS_TRIVIAL(card)` per § "Trivial-card fast-lane". Note: condition 3 (non-source diff) cannot be fully evaluated until the coder has produced the diff, so at this point compute the **provisional** trivial flag from conditions 1+2 only (`review_profile == skip` AND no Step-A trigger sourced from the card YAML text + `files_likely_touched` extensions — if EVERY path in `files_likely_touched` is non-source, condition 3 is provisionally satisfied). If provisionally trivial → **SKIP steps 3, 3a, and 4** (architecture grounding); log `trivial: architecture grounding skipped (review_profile=skip + non-source files_likely_touched + 0 triggers)` and jump to Phase 2. Re-confirm `IS_TRIVIAL` on the ACTUAL committed diff at the review gates (Phase 2.55/3.5/3.7); if the coder unexpectedly touched a source file, the guard flips the card back onto the normal review path there. If NOT provisionally trivial → run steps 3, 3a, 4 as normal.
13
- 3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.)
13
+ 3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.) **Pass `OUTPUT=terse` in the architect prompt (since v4.49.0)** — this is grounding for the coder/reviewer, not an explanation, so the architect returns its structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`) without narrative prose. The verbatim baseline persisted at step 5b keeps all the substance a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths) while staying small in the orchestrator's context and on disk.
14
14
  3a. Update `${paths.references_dir}/project-status.md` Active Code Context (skip when the file does not exist in the project) — do this AFTER the codebase-architect run (step 3) so the "Active Code Context" reflects the architect's findings (which files are actually in scope), not just the card YAML's `files_likely_touched`. Writing it before the architect run would persist pre-analysis claims that downstream agents (e.g. a parallel card) would then read as truth.
15
15
  4. **Plan-auditor grounding check** — but first, a **provenance skip (since v4.7.0 — mirror of Step 3d)**: the `/prd` Step 6.6 holistic audit already ran a per-card grounding pass when the card was authored. Re-grounding here is only worth a plan-auditor spawn if something changed since. Skip the plan-auditor when **BOTH** hold:
16
16
  - **P1 — provenance present**: the card has `metadata.holistic_audit` with non-empty `audited_at`, `audited_commit`, `audited_set`.
@@ -663,9 +663,11 @@ section of the state file. Mark dimension as covered.
663
663
  If more specific detail is needed beyond what context-primer provided,
664
664
  invoke `codebase-architect` agent (foreground) with a targeted question:
665
665
  ```
666
+ OUTPUT=terse — machine consumer (discovery dimension resolution), not human Q&A.
666
667
  For feature `<slug>`: regarding the dimension `<dimension>`, does the codebase
667
668
  already answer this? Specifically check: existing collections, routes, components,
668
- permissions, and patterns. Report what you find with file references.
669
+ permissions, and patterns. Report what you find with file references — structured
670
+ rows (`path:line — symbol — note`) + `totals:`, no narrative prose.
669
671
  ```
670
672
  Avoid launching codebase-architect when the context-primer output already
671
673
  provides sufficient information for the dimension.
@@ -743,6 +745,8 @@ Instead, run the ISA scan and present findings for validation.
743
745
  1. Invoke `codebase-architect` agent with this specific prompt:
744
746
 
745
747
  ```
748
+ OUTPUT=terse — machine consumer (ISA table builder), not human Q&A: per-touchpoint
749
+ rows only, no narrative overview.
746
750
  For feature `<slug>`: perform an Integration Surface Analysis (ISA).
747
751
  Find ALL existing code that should reference, link to, or interact with this
748
752
  new feature. Scan these categories systematically:
@@ -1,7 +1,8 @@
1
1
  export const meta = {
2
2
  name: 'new-card-review',
3
- description:
4
- "Per-wave review+fix cluster for /new. Takes 1..N co-located cards (1 = sequential per-card; N = a team-mode group/wave) and runs the whole review fan-out OUTSIDE the orchestrator context: Simplify + cross-model Codex (agent-launched, code-reviewer fallback) + qa-sentinel gates + security-reviewer (high-risk only), each specialist FP-checking its OWN findings; then ONE coder applies all VERIFIED code/perf/security/simplify findings in a single pass (files disjoint by ownership) and re-verifies lint/tsc/build. Doc-review and api-perf are deliberately OUT (doc runs post-E2E in the skill; api-perf is deferred to the Final FULL gate). Returns a compact {perCard:{fixesApplied,residual}, gateTable, summary} — the only object that re-enters the orchestrator prefix. Maps to references/review-cycle.md (Phase 2.55+3.5+3.7) and references/team-mode.md (Step D.2-D.4b). NOT new2: no human gates (returned as residual), no AC-closure, no E2E, no merge/commit/backlog.",
3
+ // One-liner only: this string re-enters the orchestrator context on completion.
4
+ // Full semantics live in the args-contract comment block below + references/review-cycle.md.
5
+ description: 'Per-wave review+fix cluster for /new (off-context).',
5
6
  phases: [
6
7
  { title: 'Baseline', detail: 'architecture grounding for the wave scope' },
7
8
  { title: 'Discovery', detail: 'parallel finders per card (simplify / codex / qa / security)' },
@@ -1,7 +1,8 @@
1
1
  export const meta = {
2
2
  name: 'new-final-review',
3
- description:
4
- "Cross-batch final code review for /new. Fans out a multi-agent review (Codex primary + doc-reviewer + api-perf-cost-auditor + qa-sentinel gates) over the WHOLE batch diff. Each domain specialist OWNS its lane end-to-end — it FP-checks its own findings in the finding pass, so there is no generic code-reviewer re-judge over another specialist's domain (cross-domain) nor over its own findings (self-judge); any residual unresolved finding is routed to its domain specialist. Read-only: returns classified findings + a gate table, applies NO fixes (the calling skill owns fix application + user gates). Maps to references/final-review.md steps F.2–F.4.",
3
+ // One-liner only: this string re-enters the orchestrator context on completion.
4
+ // Full semantics live in the args-contract comment block below + references/final-review.md.
5
+ description: 'Cross-batch final code review for /new (read-only, off-context).',
5
6
  phases: [
6
7
  { title: 'Baseline', detail: 'architecture grounding for the batch scope (F.2)' },
7
8
  { title: 'Review', detail: 'parallel multi-agent review of the batch diff (F.3)' },
@@ -1,7 +1,8 @@
1
1
  export const meta = {
2
2
  name: 'new2-resolve',
3
- description:
4
- "Self-healing resolution pass for the autonomous new2 batch workflow. Called by new2 whenever a deterministic gate would otherwise need a human: a card fail/blocker (ac-unmet | blocker | qa-fail | e2e-blocked | merge-blocker) or a legitimate scope-EXPANDING finding (scope-expansion). Tier-1 targeted fix with a TERMINAL short-circuit (skips the costly multi-attempt when the problem is impossible-by-definition, verified not trusted), then a judged multi-attempt; a MANDATORY adversarial judge cross-checks every verified claim against the real diff (prevents fabricated success) whenever the judge is a DIFFERENT specialist than the fixer. When fixer === judge (doc→doc-reviewer, security→security-reviewer) the second pass would just judge its own work — no cross-model diversity, pure waste — so the judge AND the terminal-verdict ratification are skipped (the reviewer-writer self-verifies in its single pass). Specialized per domain (doc→doc-reviewer single pass, ui→ui-expert fix + code-reviewer judge, security→security-reviewer single pass, perf→api-perf-cost-auditor judge). Terminal is a tracked follow-up. Accepts a `findings` list (batched per area). Uses agent()/parallel() only — no nested workflows.",
3
+ // One-liner only: this string re-enters the orchestrator context on completion.
4
+ // Full semantics live in the args-contract comment block below.
5
+ description: 'Self-healing resolution pass for the autonomous new2 batch workflow.',
5
6
  phases: [
6
7
  { title: 'Diagnose', detail: 'classify + terminal short-circuit + scope-expansion boundary' },
7
8
  { title: 'Repair', detail: 'targeted fix, then judged multi-attempt if needed' },
@@ -1,7 +1,8 @@
1
1
  export const meta = {
2
2
  name: 'new2',
3
- description:
4
- "EXPERIMENTAL workflow host for /new (A/B testing). Runs an ENTIRE backlog-card batch autonomously in the background runtime — pre-flight, a dependency-gated (DAG) per-card implement+review pipeline with specialized agents, cross-batch final review, and an integrity-gated auto-merge — so subagent output never enters the main orchestrator context. Zero AskUserQuestion: each /new gate is a deterministic policy; blocking gates and scope-expanding findings are routed to the new2-resolve self-healing workflow. Before the final review a cross-card integration pass implements the residuals that were deferred only by the per-card ownership artifact (out-of-ownership whose remedy lands inside the batch union) or by a transient outage — re-running resolve with batch-union ownership, domain-routed — so they are not left as follow-ups to manage later (owner-gated / not-a-code-defect / baseline / new-AC scope-expansion still defer). Resilient by design: transient API errors are retried, a sustained outage degrades cleanly (durable resume), follow-ups for residuals are reconciled by the skill (offline-safe), and the merge is blocked unless the batch is verifiably complete (no false-DONE, no unreviewed code). Agents Read /new's reference modules for semantics (args.refModulesBase) so this script encodes only orchestration shape + gate policy. Claude-only.",
3
+ // One-liner only: this string re-enters the orchestrator context on completion.
4
+ // Full semantics live in the args-contract comment block below + /new's reference modules.
5
+ description: 'EXPERIMENTAL autonomous workflow host for the whole /new batch (off-context, Claude-only).',
5
6
  phases: [
6
7
  { title: 'Pre-flight', detail: 'deterministic workspace hygiene + worktree + dep-graph + cross-card grounding (Phase 0)' },
7
8
  { title: 'Implement', detail: 'dependency-gated per-card implement+review pipeline with owner_agent + specialized review fan-out' },
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "baldart",
3
- "version": "4.48.0",
3
+ "version": "4.49.1",
4
4
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
5
5
  "bin": {
6
6
  "baldart": "./bin/baldart.js"