npm - baldart - Versions diffs - 4.48.0 → 4.49.1 - Mend

baldart 4.48.0 → 4.49.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/CHANGELOG.md +20 -0
package/VERSION +1 -1
package/framework/.claude/agents/codebase-architect.md +24 -1
package/framework/.claude/agents/plan-auditor.md +17 -79
package/framework/.claude/skills/bug/SKILL.md +3 -0
package/framework/.claude/skills/context-primer/SKILL.md +4 -0
package/framework/.claude/skills/new/SKILL.md +18 -0
package/framework/.claude/skills/new/references/final-review.md +4 -15
package/framework/.claude/skills/new/references/implement.md +1 -1
package/framework/.claude/skills/prd/references/discovery-phase.md +5 -1
package/framework/.claude/workflows/new-card-review.js +3 -2
package/framework/.claude/workflows/new-final-review.js +3 -2
package/framework/.claude/workflows/new2-resolve.js +3 -2
package/framework/.claude/workflows/new2.js +3 -2
package/package.json +1 -1

package/CHANGELOG.md CHANGED Viewed

@@ -5,6 +5,26 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.49.1] - 2026-06-17
+**The four `/new` dynamic workflows shed their paragraph-length `meta.description` — the last prose that re-entered the orchestrator context on every workflow completion.** The `Workflow` tool contract specifies `description` as a *one-line* field shown in the permission dialog, but `new-card-review` / `new-final-review` / `new2` / `new2-resolve` each carried a full-paragraph rationale (~150–230 words) that the runtime echoes back into the orchestrator prefix on the completion line — pure prose with zero runtime value (the orchestrator already knows what it invoked). Each is collapsed to a single line; the full semantics were already duplicated in the script's own args-contract comment block (source code, never injected into context) plus the `references/*.md` SSOT, so nothing is lost. Same family as the v4.49.0 prose-leak closures, applied to the workflow layer. **PATCH** (no behaviour change — descriptions are display/permission-dialog text only; **no new `baldart.config.yml` key**).
+### Changed
+- **`framework/.claude/workflows/{new-card-review,new-final-review,new2,new2-resolve}.js`** — `meta.description` collapsed from a full paragraph to one line each (e.g. `'Per-wave review+fix cluster for /new (off-context).'`). A two-line comment above each marks why it must stay a one-liner (re-enters orchestrator context on completion) and points to the args-contract comment block + reference module that hold the full semantics.
+## [4.49.0] - 2026-06-17
+**Caveman-style terse contracts close the last prose-leak the v4.47.0 turn-economy left out of scope: `codebase-architect` gets an opt-in `OUTPUT=terse` grounding mode, and `/new`'s orchestrator is forced near-silent.** A study of the [caveman](https://github.com/JuliusBrussee/caveman) skill (compress *how the agent talks*, not *what it builds* — output tokens only, reasoning untouched) plus a **prose-leak surface map of BALDART's own fleet** found the surface already covered almost everywhere — finders emit YAML/schema (`code-reviewer`, auditors, `plan-auditor`, `prd-card-writer`, `coder` completion-report), `senior-researcher` is file-resident, `context-primer` is already capped at a 30–50-line XML — with ONE genuinely-uncovered prose exploder: `codebase-architect`, whose budget cap is *input*-side only (`codebase-architect.md`), whose "Communication Style" explicitly licenses narrative prose, and whose return is outside the v4.47.0 rule by design (`new/SKILL.md` § "Context economy": *"It does NOT change any agent's return contract"*). The earlier ponytail-style "cap all finder prose" idea was **refuted** by a 3-skeptic adversarial pass (finders already capped + data-driven; new2 already isolates subagent output; driver is turn-count) — this release ships only the two data-validated survivors. A follow-up **return-consumption audit** of the whole fleet confirmed the floor is already reached everywhere else — finders/auditors return a parsed schema or override-to-`## FINDINGS`, and `qa-sentinel`/`doc-reviewer`/`senior-researcher`/`prd-card-writer` are file-resident with a thin manifest return; the only remaining lever was wiring the new `OUTPUT=terse` flag into the remaining machine-to-machine `codebase-architect` call-sites (`context-primer`, `/prd` discovery dimension-resolution + ISA spawns). The near-silent rule is scoped to `/new` only — `/prd` and `/bug` are conversational, where the prose IS the deliverable (caveman's own Auto-Clarity principle: never compress what a human reads to decide). `/new` is the opposite: not conversational — the user keeps the orchestrator attached only so it can surface a genuine decision, so its running narration serves no one. **MINOR** (additive agent capability + skill behaviour; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
+### Changed
+- **`framework/.claude/agents/codebase-architect.md`** — new **"Output mode: terse grounding (opt-in)"** section: when the invoking prompt carries `OUTPUT=terse`, the architect emits the structured substance only (the `## Reuse Analysis` table + `## Canonical Evidence` block unchanged, affected code as `path:line — symbol — pattern/role` rows, high-risk paths one-line each, a closing `totals:`), drops the high-level-overview/"why" narrative, keeps code identifiers verbatim, and keeps an **Auto-clarity carve-out** (`⚠` one-liner for a genuine ambiguity/risk a row can't carry). The existing "Communication Style" is retitled *(interactive / explanatory invocations only)* — without the flag the narrative path is unchanged.
+- **`framework/.claude/skills/{new/references/implement.md` (Phase 1 step 3), `bug/SKILL.md` (Phase 0 step 3), `context-primer/SKILL.md` (loader spawn template), `prd/references/discovery-phase.md` (dimension-resolution spawn + ISA spawn)}** — every machine-to-machine `codebase-architect` grounding spawn now passes `OUTPUT=terse`. The `/new` baseline persisted verbatim at step 5b stays lean while keeping everything a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths).
+- **`framework/.claude/skills/new/SKILL.md`** — new **HARD RULE in § "Context economy": "orchestrator is near-silent / maximally cryptic"** — the orchestrator emits **nothing user-facing during the run** (no progress, no "now I will…", no per-phase recap, no agent-activity description, no tracker restatement, no tool-call narration) and speaks in exactly **three** cases: (1) decision gates — `AskUserQuestion` + security/irreversible-action confirmations, full clarity, never compressed; (2) genuine blockers / items needing user action, terse and exact; (3) one minimal end-of-batch result block (one line per card + actionable residue). Governs the orchestrator's *prose*, not any agent return contract.
+- **`framework/.claude/agents/plan-auditor.md`** — the FULL-mode `## OUTPUT FORMAT` 10-section block is compressed in place (~78 → ~22 lines): same 10 sections + every load-bearing semantic (verdict definitions, `Priority ≥ 16 → BLOCK`, the three distinct pre-mortem root-cause classes), with the verbose per-section sub-bullet templates collapsed to one-line specs. This is dormant weight on the common QUICK-mode spawns (which explicitly skip it), trimmed from every spawn's system prompt. Compressed **in place** rather than extracted to an on-demand reference — agents don't Read external format files (that's a skill-only pattern), so extraction would introduce a cross-context path dependency not worth ~1k tokens. (Return-consumption audit verdict: this is a per-spawn *subagent input* cost, not the orchestrator's replayed-context multiplier — a deliberately small, low-risk trim, the only one of the dormant-format set worth touching.)
+- **`framework/.claude/skills/new/references/final-review.md`** (Step F.6 item 4) — the per-card + batch **13-field summary report** is collapsed to the minimal result block (`CARD-ID: merged @<sha> | failed: … | deferred → …` one line per card + actionable residue only). All process telemetry (files changed, test/build/lint status, fix cycles, review counts, QA verdict, commit/merge hashes, cleanup, knowledge-sync) is no longer rendered — it lives in the tracker + `skill-runs.jsonl`. The Next Steps & Launch Command (4b) and Production Readiness manual steps stay (they are actionable residue).
 ## [4.48.0] - 2026-06-16
 **`new2` is relaxed: a post-batch, interactive-only escape hatch hands genuinely-blocked hard cases to `/new`'s real human gate — without re-introducing rigidity, a twin, or breaking the A/B.** `new2` runs the batch autonomously, so a card the deterministic policy can't salvage becomes a tracked follow-up + is left `IN_PROGRESS` — the "edge case that wants human intelligence" the user flagged. A 3-lens **adversarial review before implementation** killed the obvious fix (a Step-5 `AskUserQuestion` that lets the skill implement/resolve the residual): (1) **correctness** — the workflow auto-merges and removes the worktree *before* Step 5, so a skill-side fix has no worktree, lands unreviewed code on trunk, and bypasses the F-029 DONE gate; (2) **value** — on the only run with a full `deferral_breakdown` (14 residuals) a post-batch fix changes the outcome of *zero* of them (the batch already merged); only a mid-batch checkpoint could salvage-before-merge, and the data (n=1) doesn't justify breaking `new2`'s background/no-poll contract; (3) **prior-art** — it would twin `/new`'s Phase 2.5b AC-Closure gate and muddy the autonomous-vs-`/new` A/B. What survived all three: the sound escalation is **not to re-implement a gate but to invoke `/new` on the already-materialised follow-up** — `/new` owns the real worktree+review+AC-Closure+F-029+merge pipeline. **MINOR** (additive skill behaviour, interactive-only, autonomous-mode-safe; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).

package/VERSION CHANGED Viewed

	@@ -1 +1 @@
1	- 4.48.0
1	+ 4.49.1

package/framework/.claude/agents/codebase-architect.md CHANGED Viewed

@@ -276,7 +276,30 @@ canonical doc already told you.
 - Documentation sync: code and docs must stay aligned
 - Backlog-driven work: features are tracked in `${paths.backlog_dir}/*.yml` files
-**Communication Style:**
+**Output mode: terse grounding (opt-in, since v4.49.0):**
+When the invoking prompt contains `OUTPUT=terse` (architecture grounding handed to
+an implementer or reviewer — not an interactive explanation for a human to read),
+emit the structured substance ONLY, no narrative. Brain big, mouth small. Your
+return value is injected verbatim into the caller's context, so every word of
+prose costs them context budget on every subsequent turn.
+- **Emit, unchanged (these ARE the contract):** the `## Reuse Analysis` table and
+  the `## Canonical Evidence` block.
+- **Affected code as rows, not paragraphs:** `path:line — symbol — pattern/role`.
+- **High-risk paths / integration points, one line each:** `path:line — risk`.
+- **Close with** a `totals:` line (files, reuse candidates, high-risk paths).
+- **Drop:** the high-level-overview paragraph, the "why" essays, restating the
+  request, any prose a row already carries.
+- **Keep verbatim:** file paths, symbols, type signatures, error strings — never
+  abbreviate a code identifier.
+- **Auto-clarity carve-out:** a genuine ambiguity, drift, or safety-relevant
+  caveat a row can't carry gets ONE plain short line flagged `⚠`. Never compress a
+  warning into ambiguity.
+Without `OUTPUT=terse`, use the narrative style below.
+**Communication Style (interactive / explanatory invocations only):**
 - Start with a high-level overview, then dive into specifics
 - Use precise technical terminology from `coding-standards.md`

package/framework/.claude/agents/plan-auditor.md CHANGED Viewed

@@ -471,85 +471,23 @@ Every HIGH and MEDIUM finding MUST be emitted in this exact shape so it can be p
 LOW findings can be one-liners with target tag (no full schema).
-## OUTPUT FORMAT (MANDATORY — USE THIS EXACT STRUCTURE)
-### 1) Executive Verdict
-- **Verdict**: {PASS | PASS WITH FIXES | BLOCK | NEEDS_REPLAN}
-- **Coverage Score**: <N>/8 audit categories passed (A–H)
-- **Memory matches**: <N> known pitfalls applied to this audit
-- **High-Risk Path Triggers**: [list or "none"] — if any: "Per-card `/codexreview` MUST run BEFORE merge"
-- **Specialist auto-spawn**: [list of agents spawned, or "none triggered"]
-- **Active Code Context drift**: [list of colliding cards, or "no collision"]
-- **Tool budget used**: <reads>/20 reads, <greps>/30 greps
-- 3–7 bullet reasons (highest impact first)
-**Verdict definitions:**
-- `PASS`: thorough, explicit, ready for implementation. (Rare.)
-- `PASS WITH FIXES`: structurally sound, gaps enumerated and fixable before coding starts.
-- `BLOCK`: gap that would cause production incident, data loss, security breach, or mid-sprint rewrite. Work must not start.
-- `NEEDS_REPLAN`: framing is wrong, not just details. Premise / approach / scope is incorrect; restart planning, do not patch.
-### 2) High-Risk Gaps (Must Fix)
-Findings list in YAML schema, grouped by category (integrity, architecture, security, reliability, testing, api-perf, traceability, verification, adr, drift, high-risk-path, simulation, injection).
-For backlog card findings, include the `target` field per TARGET TAG SYSTEM.
-### 3) Plan Simulation Findings
-Walk through each step that broke during simulation. Format:
-- **Step N**: <step description>
-- **Broken invariant**: <what assumption fails>
-- **Why**: <causal chain>
-- **Fix**: <reorder, add prerequisite step, add rollback>
-### 4) Assumptions & Hidden Dependencies
-- List each assumption found in or implied by the plan
-- For each: **Confidence level** (High / Med / Low) + **How to validate quickly**
-### 5) Blocking Questions (If any)
-- Numbered list of precise questions
-- For each: provide **recommended default if unanswered** + **trade-off explanation**
-### 6) Hardened Plan (Rewritten)
-- Provide a revised plan with:
-  - Phases (numbered, with clear entry/exit criteria)
-  - Tasks per phase (with estimated complexity: S/M/L)
-  - Acceptance criteria per phase (testable, specific)
-  - Explicit dependencies between phases and external systems
-  - Rollout + rollback strategy
-  - Instrumentation requirements (what to log, what to alert on)
-  - Test plan (what to test at each phase)
-### 7) Quantified Risk Register
-Format per risk: **Risk** | **Impact (1-5)** | **Likelihood (1-5)** | **Priority (I×L)** | **Severity** | **Mitigation** | **Owner**
-Rank by Priority descending. Highlight any Priority ≥ 16 as **BLOCK**.
-### 8) Three-Scenario Pre-Mortem (counterfactual diversity)
-Generate 3 INDEPENDENT failure scenarios, each with a DIFFERENT root cause class:
-**Scenario A — Data Corruption** ("It's 30 days later and silent data loss occurred because…")
-- Causal chain
-- Top 3 failure modes feeding this scenario
-- How the hardened plan prevents each
-**Scenario B — Security Breach** ("It's 30 days later and a customer escalated unauthorized access because…")
-- Causal chain
-- Top 3 failure modes
-- Mitigation in hardened plan
-**Scenario C — Scale Collapse** ("It's 30 days later and prod is throttling because…")
-- Causal chain
-- Top 3 failure modes
-- Mitigation in hardened plan
-If a scenario class doesn't apply to this plan (e.g. no data writes → no Scenario A), state explicitly and skip — but argue why it doesn't apply.
-### 9) Hallucinated Findings Dropped (CoVe)
-Findings disproven by Chain-of-Verification. Format:
-- **Finding title** — Verification: <command> → <result> → dropped because <reason>
-### 10) Suppressed Findings (Challenge Pass)
-Already documented in the suppressed-findings collapsible block above.
+## OUTPUT FORMAT — FULL mode only (MANDATORY — USE THESE 10 SECTIONS)
+> QUICK mode does NOT use this section — it returns `VERDICT: …` + numbered fixes (see top). The
+> 10-section structure below is for a standalone/holistic FULL audit; emit every section in order.
+> Terse spec — one line per field, no prose padding; substance and exact values verbatim.
+1. **Executive Verdict** — `Verdict` {PASS | PASS WITH FIXES | BLOCK | NEEDS_REPLAN}; Coverage <N>/8 (A–H); Memory matches <N>; High-Risk Path Triggers [list|none] (if any: "Per-card `/codexreview` MUST run BEFORE merge"); Specialist auto-spawn [list|none]; Active-Code-Context drift [colliding cards|none]; Tool budget <reads>/20, <greps>/30; 3–7 reasons, highest impact first.
+   - **Verdicts:** `PASS` ready (rare) · `PASS WITH FIXES` sound, gaps fixable pre-coding · `BLOCK` gap → prod incident / data loss / security breach / mid-sprint rewrite, do not start · `NEEDS_REPLAN` premise/approach/scope wrong, restart not patch.
+2. **High-Risk Gaps** — findings in the FINDINGS YAML SCHEMA above, grouped by category; include `target` for card findings.
+3. **Plan Simulation Findings** — per broken step: `Step N` · broken invariant · why (causal chain) · fix (reorder / add prerequisite / add rollback).
+4. **Assumptions & Hidden Dependencies** — each assumption + confidence (High/Med/Low) + how to validate quickly.
+5. **Blocking Questions** — numbered; each + recommended default if unanswered + trade-off.
+6. **Hardened Plan** — phases (entry/exit criteria) · tasks S/M/L · per-phase ACs (testable) · explicit deps (phases + external) · rollout+rollback · instrumentation (log/alert) · test plan.
+7. **Quantified Risk Register** — `Risk | Impact 1-5 | Likelihood 1-5 | Priority I×L | Severity | Mitigation | Owner`, ranked by Priority desc; Priority ≥ 16 → **BLOCK**.
+8. **Three-Scenario Pre-Mortem** — 3 INDEPENDENT "30 days later…" scenarios, each a DIFFERENT root-cause class (A Data Corruption / B Security Breach / C Scale Collapse): causal chain + top-3 failure modes + how the hardened plan prevents each. A class that can't apply (e.g. no writes → no A) is stated and skipped with the reason.
+9. **Hallucinated Findings Dropped (CoVe)** — `Finding — Verification: <command> → <result> → dropped because <reason>`.
+10. **Suppressed Findings (Challenge Pass)** — per the suppressed-findings block above.
 ## TONE & STYLE

package/framework/.claude/skills/bug/SKILL.md CHANGED Viewed

@@ -60,6 +60,9 @@ See [framework/docs/MCP-INTEGRATION.md](../../../docs/MCP-INTEGRATION.md) for th
    - **Auth** → the project's auth middleware/guard + auth error codes (per `stack.auth_provider`)
    - **Build/Deploy** → the deploy platform's logs + build output (per `stack.deployment`, e.g. Vercel)
 3. Launch `codebase-architect` agent to map the affected code paths — do NOT start debugging blind.
+   Pass `OUTPUT=terse` (since v4.49.0): you want the structured map (`path:line — symbol` rows,
+   high-risk paths, `totals:`), not a narrative essay — its return is injected verbatim into this
+   investigation context, so keep it lean.
    When `features.has_lsp_layer: true` and the symptom names a concrete symbol
    (function/type/handler), `codebase-architect` will use LSP find-references /
    go-to-definition to locate the exact callsites instead of grepping. See

package/framework/.claude/skills/context-primer/SKILL.md CHANGED Viewed

@@ -61,6 +61,10 @@ precedence caveats: `framework/agents/effort-protocol.md`.
 Load context about: {keywords}
 Task type: {task_type}
+OUTPUT=terse — you are a silent loader's machine consumer, not a human Q&A. Return the
+structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`)
+with no narrative prose; the loader condenses this into the XML context block.
 TOKEN BUDGET: stay under 15K tokens total. Prefer the canonical router + targeted reads over broad file scans.
 SEARCH STRATEGY — canonical-router-first, verify-only-if-needed:

package/framework/.claude/skills/new/SKILL.md CHANGED Viewed

@@ -207,6 +207,24 @@ baselines. Keep that bulk on disk and pass **paths**, not bodies.
 >    completion — end your turn after spawning; never `sleep N; echo "waiting…"` (already stated for
 >    team mode in `references/team-mode.md` — it applies to every barrier in both modes).
 >
+> **HARD RULE — orchestrator is near-silent / maximally cryptic (since v4.49.0).** `/new` is NOT
+> conversational: the user does not read its running narration and keeps the orchestrator attached ONLY so it
+> can surface a genuine decision when something unpredictable needs a human. So emit **nothing user-facing
+> during the run** — no progress, no "now I will…", no per-phase recap, no description of what an agent is
+> doing, no tracker restatement, no tool-call narration (each also costs a full replayed-context turn per
+> § turn-count above; the tracker is the only state surface). The orchestrator speaks to the user in exactly
+> THREE cases, and nowhere else:
+> 1. **Decision gates — full clarity, NEVER compress:** `AskUserQuestion` prompts and any security /
+>    irreversible-action confirmation. This is the sole reason the human is in the loop — it must be readable.
+> 2. **Genuine blockers / items needing user action** — surfaced terse and exact (`CARD-ID`, `path:line`,
+>    error string verbatim), no prose around them.
+> 3. **One minimal end-of-batch result block** — one line per card (`CARD-ID: merged @<sha> | failed:
+>    <one clause> | deferred → <followup-id>`) plus only the actionable residue (unresolved/flagged items,
+>    the launch command, any production-readiness manual step). NOT a per-phase report — process telemetry
+>    lives in the tracker + `skill-runs.jsonl`, never rendered.
+> This governs the orchestrator's *prose*, not any agent's return contract (that is § "Context economy" above).
+>
 > These rules reduce the *number* of replays; the bulk-content rule above reduces the *size* of each.
 > Both compound — apply them together.

package/framework/.claude/skills/new/references/final-review.md CHANGED Viewed

@@ -295,21 +295,10 @@ that is a **gate violation**: log it as
    If the project ships a `knowledge-sync` corpus agent (declared in `.baldart/overlays/new.md`), invoke it after doc updates so the external corpus stays aligned. If no such agent exists, skip this step with a one-line notice (do NOT dispatch a non-existent agent).
 3. **Proceed to Phase 6** (post-batch merge & cleanup).
-4. Present a **single summary report** to the user per card (and a batch summary at the end):
-   - **Files changed** (short list per card)
-   - **Test results** (new tests + existing tests count, pass rate at each iteration)
-   - **Build/lint status** (pass + retry count if any)
-   - **Fix cycles** (total number of self-healing retries across phases)
-   - **Final review** (findings: N raised / M verified / K false positives | fixes applied: N | highest severity)
-   - **UX testing** (PASS/FAIL/SKIP | test file path if written)
-   - **QA result** (profile: skip/light/balanced/deep | verdict: PASS/FAIL/SKIP | confidence % | failing gates: which of lint/tsc/test/build/markdownlint, or "none" | findings file path). E2E is reported separately (Phase 2.6 e2e-review), not by qa-sentinel.
-   - **Issues needing user attention** (anything unresolved, partially wired, or flagged)
-   - **Commit hashes** (from tracker)
-   - **Merge commit hash** (from Phase 6)
-   - **Card status reconciliation** (Phase 6b: N cards verified DONE, K force-updated)
-   - **Worktree cleanup status** (success/failed)
-   - **Knowledge-corpus sync status** (from Step F.6 item 2 "Knowledge Base Sync" — COMPLETED/FAILED/PARTIAL, or SKIPPED if no corpus-sync agent is configured)
-   - Overall implementation status
+4. Emit the **minimal end-of-batch result block** (per § "orchestrator is near-silent" in the core SKILL.md — this is NOT a per-phase report). ONE line per card and nothing more of the process telemetry:
+   - `<CARD-ID>: merged @<sha> | failed: <one clause> | deferred → <followup-id>`
+   Then, **only if non-empty**, the **actionable residue**: issues needing user attention (anything unresolved, partially wired, or flagged — terse, with `path:line` / `CARD-ID` exact). All process telemetry — files changed, test/build/lint status, fix cycles, review-finding counts, UX/QA verdicts, commit + merge hashes, card-status reconciliation, worktree cleanup, knowledge-corpus sync — is **NOT rendered**: it lives in the tracker and `$METRICS/skill-runs.jsonl`. (The Next Steps & Launch Command in 4b below and any Production Readiness manual steps ARE part of the actionable residue — keep them.)
 4b. **Next Steps & Launch Command** (MANDATORY section — always present):

package/framework/.claude/skills/new/references/implement.md CHANGED Viewed

@@ -10,7 +10,7 @@
    - If `DONE` (or the user chose "proceed anyway") → continue.
 2b. **Claim** — only now set the card status to `IN_PROGRESS` (in the backlog YAML / tracker) and assign yourself. This is a state write, not a visibility emission — do not also spend a TaskUpdate / Progress-Bar turn (§ "State surface — the tracker only").
 2c. **Trivial-card classification (BLOCKING gate for steps 3–4)** — evaluate `IS_TRIVIAL(card)` per § "Trivial-card fast-lane". Note: condition 3 (non-source diff) cannot be fully evaluated until the coder has produced the diff, so at this point compute the **provisional** trivial flag from conditions 1+2 only (`review_profile == skip` AND no Step-A trigger sourced from the card YAML text + `files_likely_touched` extensions — if EVERY path in `files_likely_touched` is non-source, condition 3 is provisionally satisfied). If provisionally trivial → **SKIP steps 3, 3a, and 4** (architecture grounding); log `trivial: architecture grounding skipped (review_profile=skip + non-source files_likely_touched + 0 triggers)` and jump to Phase 2. Re-confirm `IS_TRIVIAL` on the ACTUAL committed diff at the review gates (Phase 2.55/3.5/3.7); if the coder unexpectedly touched a source file, the guard flips the card back onto the normal review path there. If NOT provisionally trivial → run steps 3, 3a, 4 as normal.
-3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.)
+3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.) **Pass `OUTPUT=terse` in the architect prompt (since v4.49.0)** — this is grounding for the coder/reviewer, not an explanation, so the architect returns its structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`) without narrative prose. The verbatim baseline persisted at step 5b keeps all the substance a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths) while staying small in the orchestrator's context and on disk.
 3a. Update `${paths.references_dir}/project-status.md` Active Code Context (skip when the file does not exist in the project) — do this AFTER the codebase-architect run (step 3) so the "Active Code Context" reflects the architect's findings (which files are actually in scope), not just the card YAML's `files_likely_touched`. Writing it before the architect run would persist pre-analysis claims that downstream agents (e.g. a parallel card) would then read as truth.
 4. **Plan-auditor grounding check** — but first, a **provenance skip (since v4.7.0 — mirror of Step 3d)**: the `/prd` Step 6.6 holistic audit already ran a per-card grounding pass when the card was authored. Re-grounding here is only worth a plan-auditor spawn if something changed since. Skip the plan-auditor when **BOTH** hold:
    - **P1 — provenance present**: the card has `metadata.holistic_audit` with non-empty `audited_at`, `audited_commit`, `audited_set`.

package/framework/.claude/skills/prd/references/discovery-phase.md CHANGED Viewed

@@ -663,9 +663,11 @@ section of the state file. Mark dimension as covered.
    If more specific detail is needed beyond what context-primer provided,
    invoke `codebase-architect` agent (foreground) with a targeted question:
    ```
+   OUTPUT=terse — machine consumer (discovery dimension resolution), not human Q&A.
    For feature `<slug>`: regarding the dimension `<dimension>`, does the codebase
    already answer this? Specifically check: existing collections, routes, components,
-   permissions, and patterns. Report what you find with file references.
+   permissions, and patterns. Report what you find with file references — structured
+   rows (`path:line — symbol — note`) + `totals:`, no narrative prose.
    ```
    Avoid launching codebase-architect when the context-primer output already
    provides sufficient information for the dimension.
@@ -743,6 +745,8 @@ Instead, run the ISA scan and present findings for validation.
 1. Invoke `codebase-architect` agent with this specific prompt:
    ```
+   OUTPUT=terse — machine consumer (ISA table builder), not human Q&A: per-touchpoint
+   rows only, no narrative overview.
    For feature `<slug>`: perform an Integration Surface Analysis (ISA).
    Find ALL existing code that should reference, link to, or interact with this
    new feature. Scan these categories systematically:

package/framework/.claude/workflows/new-card-review.js CHANGED Viewed

@@ -1,7 +1,8 @@
 export const meta = {
   name: 'new-card-review',
-  description:
-    "Per-wave review+fix cluster for /new. Takes 1..N co-located cards (1 = sequential per-card; N = a team-mode group/wave) and runs the whole review fan-out OUTSIDE the orchestrator context: Simplify + cross-model Codex (agent-launched, code-reviewer fallback) + qa-sentinel gates + security-reviewer (high-risk only), each specialist FP-checking its OWN findings; then ONE coder applies all VERIFIED code/perf/security/simplify findings in a single pass (files disjoint by ownership) and re-verifies lint/tsc/build. Doc-review and api-perf are deliberately OUT (doc runs post-E2E in the skill; api-perf is deferred to the Final FULL gate). Returns a compact {perCard:{fixesApplied,residual}, gateTable, summary} — the only object that re-enters the orchestrator prefix. Maps to references/review-cycle.md (Phase 2.55+3.5+3.7) and references/team-mode.md (Step D.2-D.4b). NOT new2: no human gates (returned as residual), no AC-closure, no E2E, no merge/commit/backlog.",
+  // One-liner only: this string re-enters the orchestrator context on completion.
+  // Full semantics live in the args-contract comment block below + references/review-cycle.md.
+  description: 'Per-wave review+fix cluster for /new (off-context).',
   phases: [
     { title: 'Baseline', detail: 'architecture grounding for the wave scope' },
     { title: 'Discovery', detail: 'parallel finders per card (simplify / codex / qa / security)' },

package/framework/.claude/workflows/new-final-review.js CHANGED Viewed

@@ -1,7 +1,8 @@
 export const meta = {
   name: 'new-final-review',
-  description:
-    "Cross-batch final code review for /new. Fans out a multi-agent review (Codex primary + doc-reviewer + api-perf-cost-auditor + qa-sentinel gates) over the WHOLE batch diff. Each domain specialist OWNS its lane end-to-end — it FP-checks its own findings in the finding pass, so there is no generic code-reviewer re-judge over another specialist's domain (cross-domain) nor over its own findings (self-judge); any residual unresolved finding is routed to its domain specialist. Read-only: returns classified findings + a gate table, applies NO fixes (the calling skill owns fix application + user gates). Maps to references/final-review.md steps F.2–F.4.",
+  // One-liner only: this string re-enters the orchestrator context on completion.
+  // Full semantics live in the args-contract comment block below + references/final-review.md.
+  description: 'Cross-batch final code review for /new (read-only, off-context).',
   phases: [
     { title: 'Baseline', detail: 'architecture grounding for the batch scope (F.2)' },
     { title: 'Review', detail: 'parallel multi-agent review of the batch diff (F.3)' },

package/framework/.claude/workflows/new2-resolve.js CHANGED Viewed

@@ -1,7 +1,8 @@
 export const meta = {
   name: 'new2-resolve',
-  description:
-    "Self-healing resolution pass for the autonomous new2 batch workflow. Called by new2 whenever a deterministic gate would otherwise need a human: a card fail/blocker (ac-unmet | blocker | qa-fail | e2e-blocked | merge-blocker) or a legitimate scope-EXPANDING finding (scope-expansion). Tier-1 targeted fix with a TERMINAL short-circuit (skips the costly multi-attempt when the problem is impossible-by-definition, verified not trusted), then a judged multi-attempt; a MANDATORY adversarial judge cross-checks every verified claim against the real diff (prevents fabricated success) whenever the judge is a DIFFERENT specialist than the fixer. When fixer === judge (doc→doc-reviewer, security→security-reviewer) the second pass would just judge its own work — no cross-model diversity, pure waste — so the judge AND the terminal-verdict ratification are skipped (the reviewer-writer self-verifies in its single pass). Specialized per domain (doc→doc-reviewer single pass, ui→ui-expert fix + code-reviewer judge, security→security-reviewer single pass, perf→api-perf-cost-auditor judge). Terminal is a tracked follow-up. Accepts a `findings` list (batched per area). Uses agent()/parallel() only — no nested workflows.",
+  // One-liner only: this string re-enters the orchestrator context on completion.
+  // Full semantics live in the args-contract comment block below.
+  description: 'Self-healing resolution pass for the autonomous new2 batch workflow.',
   phases: [
     { title: 'Diagnose', detail: 'classify + terminal short-circuit + scope-expansion boundary' },
     { title: 'Repair', detail: 'targeted fix, then judged multi-attempt if needed' },

package/framework/.claude/workflows/new2.js CHANGED Viewed

@@ -1,7 +1,8 @@
 export const meta = {
   name: 'new2',
-  description:
-    "EXPERIMENTAL workflow host for /new (A/B testing). Runs an ENTIRE backlog-card batch autonomously in the background runtime — pre-flight, a dependency-gated (DAG) per-card implement+review pipeline with specialized agents, cross-batch final review, and an integrity-gated auto-merge — so subagent output never enters the main orchestrator context. Zero AskUserQuestion: each /new gate is a deterministic policy; blocking gates and scope-expanding findings are routed to the new2-resolve self-healing workflow. Before the final review a cross-card integration pass implements the residuals that were deferred only by the per-card ownership artifact (out-of-ownership whose remedy lands inside the batch union) or by a transient outage — re-running resolve with batch-union ownership, domain-routed — so they are not left as follow-ups to manage later (owner-gated / not-a-code-defect / baseline / new-AC scope-expansion still defer). Resilient by design: transient API errors are retried, a sustained outage degrades cleanly (durable resume), follow-ups for residuals are reconciled by the skill (offline-safe), and the merge is blocked unless the batch is verifiably complete (no false-DONE, no unreviewed code). Agents Read /new's reference modules for semantics (args.refModulesBase) so this script encodes only orchestration shape + gate policy. Claude-only.",
+  // One-liner only: this string re-enters the orchestrator context on completion.
+  // Full semantics live in the args-contract comment block below + /new's reference modules.
+  description: 'EXPERIMENTAL autonomous workflow host for the whole /new batch (off-context, Claude-only).',
   phases: [
     { title: 'Pre-flight', detail: 'deterministic workspace hygiene + worktree + dep-graph + cross-card grounding (Phase 0)' },
     { title: 'Implement', detail: 'dependency-gated per-card implement+review pipeline with owner_agent + specialized review fan-out' },

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.48.0",
+  "version": "4.49.1",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"