baldart 4.48.0 → 4.49.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +20 -0
- package/VERSION +1 -1
- package/framework/.claude/agents/codebase-architect.md +24 -1
- package/framework/.claude/agents/plan-auditor.md +17 -79
- package/framework/.claude/skills/bug/SKILL.md +3 -0
- package/framework/.claude/skills/context-primer/SKILL.md +4 -0
- package/framework/.claude/skills/new/SKILL.md +18 -0
- package/framework/.claude/skills/new/references/final-review.md +4 -15
- package/framework/.claude/skills/new/references/implement.md +1 -1
- package/framework/.claude/skills/prd/references/discovery-phase.md +5 -1
- package/framework/.claude/workflows/new-card-review.js +3 -2
- package/framework/.claude/workflows/new-final-review.js +3 -2
- package/framework/.claude/workflows/new2-resolve.js +3 -2
- package/framework/.claude/workflows/new2.js +3 -2
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -5,6 +5,26 @@ All notable changes to BALDART will be documented in this file.
|
|
|
5
5
|
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
|
6
6
|
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
7
|
|
|
8
|
+
## [4.49.1] - 2026-06-17
|
|
9
|
+
|
|
10
|
+
**The four `/new` dynamic workflows shed their paragraph-length `meta.description` — the last prose that re-entered the orchestrator context on every workflow completion.** The `Workflow` tool contract specifies `description` as a *one-line* field shown in the permission dialog, but `new-card-review` / `new-final-review` / `new2` / `new2-resolve` each carried a full-paragraph rationale (~150–230 words) that the runtime echoes back into the orchestrator prefix on the completion line — pure prose with zero runtime value (the orchestrator already knows what it invoked). Each is collapsed to a single line; the full semantics were already duplicated in the script's own args-contract comment block (source code, never injected into context) plus the `references/*.md` SSOT, so nothing is lost. Same family as the v4.49.0 prose-leak closures, applied to the workflow layer. **PATCH** (no behaviour change — descriptions are display/permission-dialog text only; **no new `baldart.config.yml` key**).
|
|
11
|
+
|
|
12
|
+
### Changed
|
|
13
|
+
|
|
14
|
+
- **`framework/.claude/workflows/{new-card-review,new-final-review,new2,new2-resolve}.js`** — `meta.description` collapsed from a full paragraph to one line each (e.g. `'Per-wave review+fix cluster for /new (off-context).'`). A two-line comment above each marks why it must stay a one-liner (re-enters orchestrator context on completion) and points to the args-contract comment block + reference module that hold the full semantics.
|
|
15
|
+
|
|
16
|
+
## [4.49.0] - 2026-06-17
|
|
17
|
+
|
|
18
|
+
**Caveman-style terse contracts close the last prose-leak the v4.47.0 turn-economy left out of scope: `codebase-architect` gets an opt-in `OUTPUT=terse` grounding mode, and `/new`'s orchestrator is forced near-silent.** A study of the [caveman](https://github.com/JuliusBrussee/caveman) skill (compress *how the agent talks*, not *what it builds* — output tokens only, reasoning untouched) plus a **prose-leak surface map of BALDART's own fleet** found the surface already covered almost everywhere — finders emit YAML/schema (`code-reviewer`, auditors, `plan-auditor`, `prd-card-writer`, `coder` completion-report), `senior-researcher` is file-resident, `context-primer` is already capped at a 30–50-line XML — with ONE genuinely-uncovered prose exploder: `codebase-architect`, whose budget cap is *input*-side only (`codebase-architect.md`), whose "Communication Style" explicitly licenses narrative prose, and whose return is outside the v4.47.0 rule by design (`new/SKILL.md` § "Context economy": *"It does NOT change any agent's return contract"*). The earlier ponytail-style "cap all finder prose" idea was **refuted** by a 3-skeptic adversarial pass (finders already capped + data-driven; new2 already isolates subagent output; driver is turn-count) — this release ships only the two data-validated survivors. A follow-up **return-consumption audit** of the whole fleet confirmed the floor is already reached everywhere else — finders/auditors return a parsed schema or override-to-`## FINDINGS`, and `qa-sentinel`/`doc-reviewer`/`senior-researcher`/`prd-card-writer` are file-resident with a thin manifest return; the only remaining lever was wiring the new `OUTPUT=terse` flag into the remaining machine-to-machine `codebase-architect` call-sites (`context-primer`, `/prd` discovery dimension-resolution + ISA spawns). The near-silent rule is scoped to `/new` only — `/prd` and `/bug` are conversational, where the prose IS the deliverable (caveman's own Auto-Clarity principle: never compress what a human reads to decide). `/new` is the opposite: not conversational — the user keeps the orchestrator attached only so it can surface a genuine decision, so its running narration serves no one. **MINOR** (additive agent capability + skill behaviour; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
|
|
19
|
+
|
|
20
|
+
### Changed
|
|
21
|
+
|
|
22
|
+
- **`framework/.claude/agents/codebase-architect.md`** — new **"Output mode: terse grounding (opt-in)"** section: when the invoking prompt carries `OUTPUT=terse`, the architect emits the structured substance only (the `## Reuse Analysis` table + `## Canonical Evidence` block unchanged, affected code as `path:line — symbol — pattern/role` rows, high-risk paths one-line each, a closing `totals:`), drops the high-level-overview/"why" narrative, keeps code identifiers verbatim, and keeps an **Auto-clarity carve-out** (`⚠` one-liner for a genuine ambiguity/risk a row can't carry). The existing "Communication Style" is retitled *(interactive / explanatory invocations only)* — without the flag the narrative path is unchanged.
|
|
23
|
+
- **`framework/.claude/skills/{new/references/implement.md` (Phase 1 step 3), `bug/SKILL.md` (Phase 0 step 3), `context-primer/SKILL.md` (loader spawn template), `prd/references/discovery-phase.md` (dimension-resolution spawn + ISA spawn)}** — every machine-to-machine `codebase-architect` grounding spawn now passes `OUTPUT=terse`. The `/new` baseline persisted verbatim at step 5b stays lean while keeping everything a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths).
|
|
24
|
+
- **`framework/.claude/skills/new/SKILL.md`** — new **HARD RULE in § "Context economy": "orchestrator is near-silent / maximally cryptic"** — the orchestrator emits **nothing user-facing during the run** (no progress, no "now I will…", no per-phase recap, no agent-activity description, no tracker restatement, no tool-call narration) and speaks in exactly **three** cases: (1) decision gates — `AskUserQuestion` + security/irreversible-action confirmations, full clarity, never compressed; (2) genuine blockers / items needing user action, terse and exact; (3) one minimal end-of-batch result block (one line per card + actionable residue). Governs the orchestrator's *prose*, not any agent return contract.
|
|
25
|
+
- **`framework/.claude/agents/plan-auditor.md`** — the FULL-mode `## OUTPUT FORMAT` 10-section block is compressed in place (~78 → ~22 lines): same 10 sections + every load-bearing semantic (verdict definitions, `Priority ≥ 16 → BLOCK`, the three distinct pre-mortem root-cause classes), with the verbose per-section sub-bullet templates collapsed to one-line specs. This is dormant weight on the common QUICK-mode spawns (which explicitly skip it), trimmed from every spawn's system prompt. Compressed **in place** rather than extracted to an on-demand reference — agents don't Read external format files (that's a skill-only pattern), so extraction would introduce a cross-context path dependency not worth ~1k tokens. (Return-consumption audit verdict: this is a per-spawn *subagent input* cost, not the orchestrator's replayed-context multiplier — a deliberately small, low-risk trim, the only one of the dormant-format set worth touching.)
|
|
26
|
+
- **`framework/.claude/skills/new/references/final-review.md`** (Step F.6 item 4) — the per-card + batch **13-field summary report** is collapsed to the minimal result block (`CARD-ID: merged @<sha> | failed: … | deferred → …` one line per card + actionable residue only). All process telemetry (files changed, test/build/lint status, fix cycles, review counts, QA verdict, commit/merge hashes, cleanup, knowledge-sync) is no longer rendered — it lives in the tracker + `skill-runs.jsonl`. The Next Steps & Launch Command (4b) and Production Readiness manual steps stay (they are actionable residue).
|
|
27
|
+
|
|
8
28
|
## [4.48.0] - 2026-06-16
|
|
9
29
|
|
|
10
30
|
**`new2` is relaxed: a post-batch, interactive-only escape hatch hands genuinely-blocked hard cases to `/new`'s real human gate — without re-introducing rigidity, a twin, or breaking the A/B.** `new2` runs the batch autonomously, so a card the deterministic policy can't salvage becomes a tracked follow-up + is left `IN_PROGRESS` — the "edge case that wants human intelligence" the user flagged. A 3-lens **adversarial review before implementation** killed the obvious fix (a Step-5 `AskUserQuestion` that lets the skill implement/resolve the residual): (1) **correctness** — the workflow auto-merges and removes the worktree *before* Step 5, so a skill-side fix has no worktree, lands unreviewed code on trunk, and bypasses the F-029 DONE gate; (2) **value** — on the only run with a full `deferral_breakdown` (14 residuals) a post-batch fix changes the outcome of *zero* of them (the batch already merged); only a mid-batch checkpoint could salvage-before-merge, and the data (n=1) doesn't justify breaking `new2`'s background/no-poll contract; (3) **prior-art** — it would twin `/new`'s Phase 2.5b AC-Closure gate and muddy the autonomous-vs-`/new` A/B. What survived all three: the sound escalation is **not to re-implement a gate but to invoke `/new` on the already-materialised follow-up** — `/new` owns the real worktree+review+AC-Closure+F-029+merge pipeline. **MINOR** (additive skill behaviour, interactive-only, autonomous-mode-safe; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
|
package/VERSION
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
4.
|
|
1
|
+
4.49.1
|
|
@@ -276,7 +276,30 @@ canonical doc already told you.
|
|
|
276
276
|
- Documentation sync: code and docs must stay aligned
|
|
277
277
|
- Backlog-driven work: features are tracked in `${paths.backlog_dir}/*.yml` files
|
|
278
278
|
|
|
279
|
-
**
|
|
279
|
+
**Output mode: terse grounding (opt-in, since v4.49.0):**
|
|
280
|
+
|
|
281
|
+
When the invoking prompt contains `OUTPUT=terse` (architecture grounding handed to
|
|
282
|
+
an implementer or reviewer — not an interactive explanation for a human to read),
|
|
283
|
+
emit the structured substance ONLY, no narrative. Brain big, mouth small. Your
|
|
284
|
+
return value is injected verbatim into the caller's context, so every word of
|
|
285
|
+
prose costs them context budget on every subsequent turn.
|
|
286
|
+
|
|
287
|
+
- **Emit, unchanged (these ARE the contract):** the `## Reuse Analysis` table and
|
|
288
|
+
the `## Canonical Evidence` block.
|
|
289
|
+
- **Affected code as rows, not paragraphs:** `path:line — symbol — pattern/role`.
|
|
290
|
+
- **High-risk paths / integration points, one line each:** `path:line — risk`.
|
|
291
|
+
- **Close with** a `totals:` line (files, reuse candidates, high-risk paths).
|
|
292
|
+
- **Drop:** the high-level-overview paragraph, the "why" essays, restating the
|
|
293
|
+
request, any prose a row already carries.
|
|
294
|
+
- **Keep verbatim:** file paths, symbols, type signatures, error strings — never
|
|
295
|
+
abbreviate a code identifier.
|
|
296
|
+
- **Auto-clarity carve-out:** a genuine ambiguity, drift, or safety-relevant
|
|
297
|
+
caveat a row can't carry gets ONE plain short line flagged `⚠`. Never compress a
|
|
298
|
+
warning into ambiguity.
|
|
299
|
+
|
|
300
|
+
Without `OUTPUT=terse`, use the narrative style below.
|
|
301
|
+
|
|
302
|
+
**Communication Style (interactive / explanatory invocations only):**
|
|
280
303
|
|
|
281
304
|
- Start with a high-level overview, then dive into specifics
|
|
282
305
|
- Use precise technical terminology from `coding-standards.md`
|
|
@@ -471,85 +471,23 @@ Every HIGH and MEDIUM finding MUST be emitted in this exact shape so it can be p
|
|
|
471
471
|
|
|
472
472
|
LOW findings can be one-liners with target tag (no full schema).
|
|
473
473
|
|
|
474
|
-
## OUTPUT FORMAT (MANDATORY — USE
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
-
|
|
478
|
-
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
- **
|
|
482
|
-
- **
|
|
483
|
-
|
|
484
|
-
|
|
485
|
-
|
|
486
|
-
**
|
|
487
|
-
|
|
488
|
-
-
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
### 2) High-Risk Gaps (Must Fix)
|
|
493
|
-
Findings list in YAML schema, grouped by category (integrity, architecture, security, reliability, testing, api-perf, traceability, verification, adr, drift, high-risk-path, simulation, injection).
|
|
494
|
-
|
|
495
|
-
For backlog card findings, include the `target` field per TARGET TAG SYSTEM.
|
|
496
|
-
|
|
497
|
-
### 3) Plan Simulation Findings
|
|
498
|
-
Walk through each step that broke during simulation. Format:
|
|
499
|
-
- **Step N**: <step description>
|
|
500
|
-
- **Broken invariant**: <what assumption fails>
|
|
501
|
-
- **Why**: <causal chain>
|
|
502
|
-
- **Fix**: <reorder, add prerequisite step, add rollback>
|
|
503
|
-
|
|
504
|
-
### 4) Assumptions & Hidden Dependencies
|
|
505
|
-
- List each assumption found in or implied by the plan
|
|
506
|
-
- For each: **Confidence level** (High / Med / Low) + **How to validate quickly**
|
|
507
|
-
|
|
508
|
-
### 5) Blocking Questions (If any)
|
|
509
|
-
- Numbered list of precise questions
|
|
510
|
-
- For each: provide **recommended default if unanswered** + **trade-off explanation**
|
|
511
|
-
|
|
512
|
-
### 6) Hardened Plan (Rewritten)
|
|
513
|
-
- Provide a revised plan with:
|
|
514
|
-
- Phases (numbered, with clear entry/exit criteria)
|
|
515
|
-
- Tasks per phase (with estimated complexity: S/M/L)
|
|
516
|
-
- Acceptance criteria per phase (testable, specific)
|
|
517
|
-
- Explicit dependencies between phases and external systems
|
|
518
|
-
- Rollout + rollback strategy
|
|
519
|
-
- Instrumentation requirements (what to log, what to alert on)
|
|
520
|
-
- Test plan (what to test at each phase)
|
|
521
|
-
|
|
522
|
-
### 7) Quantified Risk Register
|
|
523
|
-
Format per risk: **Risk** | **Impact (1-5)** | **Likelihood (1-5)** | **Priority (I×L)** | **Severity** | **Mitigation** | **Owner**
|
|
524
|
-
|
|
525
|
-
Rank by Priority descending. Highlight any Priority ≥ 16 as **BLOCK**.
|
|
526
|
-
|
|
527
|
-
### 8) Three-Scenario Pre-Mortem (counterfactual diversity)
|
|
528
|
-
Generate 3 INDEPENDENT failure scenarios, each with a DIFFERENT root cause class:
|
|
529
|
-
|
|
530
|
-
**Scenario A — Data Corruption** ("It's 30 days later and silent data loss occurred because…")
|
|
531
|
-
- Causal chain
|
|
532
|
-
- Top 3 failure modes feeding this scenario
|
|
533
|
-
- How the hardened plan prevents each
|
|
534
|
-
|
|
535
|
-
**Scenario B — Security Breach** ("It's 30 days later and a customer escalated unauthorized access because…")
|
|
536
|
-
- Causal chain
|
|
537
|
-
- Top 3 failure modes
|
|
538
|
-
- Mitigation in hardened plan
|
|
539
|
-
|
|
540
|
-
**Scenario C — Scale Collapse** ("It's 30 days later and prod is throttling because…")
|
|
541
|
-
- Causal chain
|
|
542
|
-
- Top 3 failure modes
|
|
543
|
-
- Mitigation in hardened plan
|
|
544
|
-
|
|
545
|
-
If a scenario class doesn't apply to this plan (e.g. no data writes → no Scenario A), state explicitly and skip — but argue why it doesn't apply.
|
|
546
|
-
|
|
547
|
-
### 9) Hallucinated Findings Dropped (CoVe)
|
|
548
|
-
Findings disproven by Chain-of-Verification. Format:
|
|
549
|
-
- **Finding title** — Verification: <command> → <result> → dropped because <reason>
|
|
550
|
-
|
|
551
|
-
### 10) Suppressed Findings (Challenge Pass)
|
|
552
|
-
Already documented in the suppressed-findings collapsible block above.
|
|
474
|
+
## OUTPUT FORMAT — FULL mode only (MANDATORY — USE THESE 10 SECTIONS)
|
|
475
|
+
|
|
476
|
+
> QUICK mode does NOT use this section — it returns `VERDICT: …` + numbered fixes (see top). The
|
|
477
|
+
> 10-section structure below is for a standalone/holistic FULL audit; emit every section in order.
|
|
478
|
+
> Terse spec — one line per field, no prose padding; substance and exact values verbatim.
|
|
479
|
+
|
|
480
|
+
1. **Executive Verdict** — `Verdict` {PASS | PASS WITH FIXES | BLOCK | NEEDS_REPLAN}; Coverage <N>/8 (A–H); Memory matches <N>; High-Risk Path Triggers [list|none] (if any: "Per-card `/codexreview` MUST run BEFORE merge"); Specialist auto-spawn [list|none]; Active-Code-Context drift [colliding cards|none]; Tool budget <reads>/20, <greps>/30; 3–7 reasons, highest impact first.
|
|
481
|
+
- **Verdicts:** `PASS` ready (rare) · `PASS WITH FIXES` sound, gaps fixable pre-coding · `BLOCK` gap → prod incident / data loss / security breach / mid-sprint rewrite, do not start · `NEEDS_REPLAN` premise/approach/scope wrong, restart not patch.
|
|
482
|
+
2. **High-Risk Gaps** — findings in the FINDINGS YAML SCHEMA above, grouped by category; include `target` for card findings.
|
|
483
|
+
3. **Plan Simulation Findings** — per broken step: `Step N` · broken invariant · why (causal chain) · fix (reorder / add prerequisite / add rollback).
|
|
484
|
+
4. **Assumptions & Hidden Dependencies** — each assumption + confidence (High/Med/Low) + how to validate quickly.
|
|
485
|
+
5. **Blocking Questions** — numbered; each + recommended default if unanswered + trade-off.
|
|
486
|
+
6. **Hardened Plan** — phases (entry/exit criteria) · tasks S/M/L · per-phase ACs (testable) · explicit deps (phases + external) · rollout+rollback · instrumentation (log/alert) · test plan.
|
|
487
|
+
7. **Quantified Risk Register** — `Risk | Impact 1-5 | Likelihood 1-5 | Priority I×L | Severity | Mitigation | Owner`, ranked by Priority desc; Priority ≥ 16 → **BLOCK**.
|
|
488
|
+
8. **Three-Scenario Pre-Mortem** — 3 INDEPENDENT "30 days later…" scenarios, each a DIFFERENT root-cause class (A Data Corruption / B Security Breach / C Scale Collapse): causal chain + top-3 failure modes + how the hardened plan prevents each. A class that can't apply (e.g. no writes → no A) is stated and skipped with the reason.
|
|
489
|
+
9. **Hallucinated Findings Dropped (CoVe)** — `Finding — Verification: <command> → <result> → dropped because <reason>`.
|
|
490
|
+
10. **Suppressed Findings (Challenge Pass)** — per the suppressed-findings block above.
|
|
553
491
|
|
|
554
492
|
## TONE & STYLE
|
|
555
493
|
|
|
@@ -60,6 +60,9 @@ See [framework/docs/MCP-INTEGRATION.md](../../../docs/MCP-INTEGRATION.md) for th
|
|
|
60
60
|
- **Auth** → the project's auth middleware/guard + auth error codes (per `stack.auth_provider`)
|
|
61
61
|
- **Build/Deploy** → the deploy platform's logs + build output (per `stack.deployment`, e.g. Vercel)
|
|
62
62
|
3. Launch `codebase-architect` agent to map the affected code paths — do NOT start debugging blind.
|
|
63
|
+
Pass `OUTPUT=terse` (since v4.49.0): you want the structured map (`path:line — symbol` rows,
|
|
64
|
+
high-risk paths, `totals:`), not a narrative essay — its return is injected verbatim into this
|
|
65
|
+
investigation context, so keep it lean.
|
|
63
66
|
When `features.has_lsp_layer: true` and the symptom names a concrete symbol
|
|
64
67
|
(function/type/handler), `codebase-architect` will use LSP find-references /
|
|
65
68
|
go-to-definition to locate the exact callsites instead of grepping. See
|
|
@@ -61,6 +61,10 @@ precedence caveats: `framework/agents/effort-protocol.md`.
|
|
|
61
61
|
Load context about: {keywords}
|
|
62
62
|
Task type: {task_type}
|
|
63
63
|
|
|
64
|
+
OUTPUT=terse — you are a silent loader's machine consumer, not a human Q&A. Return the
|
|
65
|
+
structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`)
|
|
66
|
+
with no narrative prose; the loader condenses this into the XML context block.
|
|
67
|
+
|
|
64
68
|
TOKEN BUDGET: stay under 15K tokens total. Prefer the canonical router + targeted reads over broad file scans.
|
|
65
69
|
|
|
66
70
|
SEARCH STRATEGY — canonical-router-first, verify-only-if-needed:
|
|
@@ -207,6 +207,24 @@ baselines. Keep that bulk on disk and pass **paths**, not bodies.
|
|
|
207
207
|
> completion — end your turn after spawning; never `sleep N; echo "waiting…"` (already stated for
|
|
208
208
|
> team mode in `references/team-mode.md` — it applies to every barrier in both modes).
|
|
209
209
|
>
|
|
210
|
+
|
|
211
|
+
> **HARD RULE — orchestrator is near-silent / maximally cryptic (since v4.49.0).** `/new` is NOT
|
|
212
|
+
> conversational: the user does not read its running narration and keeps the orchestrator attached ONLY so it
|
|
213
|
+
> can surface a genuine decision when something unpredictable needs a human. So emit **nothing user-facing
|
|
214
|
+
> during the run** — no progress, no "now I will…", no per-phase recap, no description of what an agent is
|
|
215
|
+
> doing, no tracker restatement, no tool-call narration (each also costs a full replayed-context turn per
|
|
216
|
+
> § turn-count above; the tracker is the only state surface). The orchestrator speaks to the user in exactly
|
|
217
|
+
> THREE cases, and nowhere else:
|
|
218
|
+
> 1. **Decision gates — full clarity, NEVER compress:** `AskUserQuestion` prompts and any security /
|
|
219
|
+
> irreversible-action confirmation. This is the sole reason the human is in the loop — it must be readable.
|
|
220
|
+
> 2. **Genuine blockers / items needing user action** — surfaced terse and exact (`CARD-ID`, `path:line`,
|
|
221
|
+
> error string verbatim), no prose around them.
|
|
222
|
+
> 3. **One minimal end-of-batch result block** — one line per card (`CARD-ID: merged @<sha> | failed:
|
|
223
|
+
> <one clause> | deferred → <followup-id>`) plus only the actionable residue (unresolved/flagged items,
|
|
224
|
+
> the launch command, any production-readiness manual step). NOT a per-phase report — process telemetry
|
|
225
|
+
> lives in the tracker + `skill-runs.jsonl`, never rendered.
|
|
226
|
+
> This governs the orchestrator's *prose*, not any agent's return contract (that is § "Context economy" above).
|
|
227
|
+
>
|
|
210
228
|
> These rules reduce the *number* of replays; the bulk-content rule above reduces the *size* of each.
|
|
211
229
|
> Both compound — apply them together.
|
|
212
230
|
|
|
@@ -295,21 +295,10 @@ that is a **gate violation**: log it as
|
|
|
295
295
|
If the project ships a `knowledge-sync` corpus agent (declared in `.baldart/overlays/new.md`), invoke it after doc updates so the external corpus stays aligned. If no such agent exists, skip this step with a one-line notice (do NOT dispatch a non-existent agent).
|
|
296
296
|
|
|
297
297
|
3. **Proceed to Phase 6** (post-batch merge & cleanup).
|
|
298
|
-
4.
|
|
299
|
-
-
|
|
300
|
-
|
|
301
|
-
- **
|
|
302
|
-
- **Fix cycles** (total number of self-healing retries across phases)
|
|
303
|
-
- **Final review** (findings: N raised / M verified / K false positives | fixes applied: N | highest severity)
|
|
304
|
-
- **UX testing** (PASS/FAIL/SKIP | test file path if written)
|
|
305
|
-
- **QA result** (profile: skip/light/balanced/deep | verdict: PASS/FAIL/SKIP | confidence % | failing gates: which of lint/tsc/test/build/markdownlint, or "none" | findings file path). E2E is reported separately (Phase 2.6 e2e-review), not by qa-sentinel.
|
|
306
|
-
- **Issues needing user attention** (anything unresolved, partially wired, or flagged)
|
|
307
|
-
- **Commit hashes** (from tracker)
|
|
308
|
-
- **Merge commit hash** (from Phase 6)
|
|
309
|
-
- **Card status reconciliation** (Phase 6b: N cards verified DONE, K force-updated)
|
|
310
|
-
- **Worktree cleanup status** (success/failed)
|
|
311
|
-
- **Knowledge-corpus sync status** (from Step F.6 item 2 "Knowledge Base Sync" — COMPLETED/FAILED/PARTIAL, or SKIPPED if no corpus-sync agent is configured)
|
|
312
|
-
- Overall implementation status
|
|
298
|
+
4. Emit the **minimal end-of-batch result block** (per § "orchestrator is near-silent" in the core SKILL.md — this is NOT a per-phase report). ONE line per card and nothing more of the process telemetry:
|
|
299
|
+
- `<CARD-ID>: merged @<sha> | failed: <one clause> | deferred → <followup-id>`
|
|
300
|
+
|
|
301
|
+
Then, **only if non-empty**, the **actionable residue**: issues needing user attention (anything unresolved, partially wired, or flagged — terse, with `path:line` / `CARD-ID` exact). All process telemetry — files changed, test/build/lint status, fix cycles, review-finding counts, UX/QA verdicts, commit + merge hashes, card-status reconciliation, worktree cleanup, knowledge-corpus sync — is **NOT rendered**: it lives in the tracker and `$METRICS/skill-runs.jsonl`. (The Next Steps & Launch Command in 4b below and any Production Readiness manual steps ARE part of the actionable residue — keep them.)
|
|
313
302
|
|
|
314
303
|
4b. **Next Steps & Launch Command** (MANDATORY section — always present):
|
|
315
304
|
|
|
@@ -10,7 +10,7 @@
|
|
|
10
10
|
- If `DONE` (or the user chose "proceed anyway") → continue.
|
|
11
11
|
2b. **Claim** — only now set the card status to `IN_PROGRESS` (in the backlog YAML / tracker) and assign yourself. This is a state write, not a visibility emission — do not also spend a TaskUpdate / Progress-Bar turn (§ "State surface — the tracker only").
|
|
12
12
|
2c. **Trivial-card classification (BLOCKING gate for steps 3–4)** — evaluate `IS_TRIVIAL(card)` per § "Trivial-card fast-lane". Note: condition 3 (non-source diff) cannot be fully evaluated until the coder has produced the diff, so at this point compute the **provisional** trivial flag from conditions 1+2 only (`review_profile == skip` AND no Step-A trigger sourced from the card YAML text + `files_likely_touched` extensions — if EVERY path in `files_likely_touched` is non-source, condition 3 is provisionally satisfied). If provisionally trivial → **SKIP steps 3, 3a, and 4** (architecture grounding); log `trivial: architecture grounding skipped (review_profile=skip + non-source files_likely_touched + 0 triggers)` and jump to Phase 2. Re-confirm `IS_TRIVIAL` on the ACTUAL committed diff at the review gates (Phase 2.55/3.5/3.7); if the coder unexpectedly touched a source file, the guard flips the card back onto the normal review path there. If NOT provisionally trivial → run steps 3, 3a, 4 as normal.
|
|
13
|
-
3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.)
|
|
13
|
+
3. **(skip when provisionally trivial — see 2c)** Invoke the **codebase-architect** agent (MUST per AGENTS.md) to understand the relevant codebase area, existing patterns, and architecture before any implementation. When `features.has_lsp_layer: true`, the architect uses LSP find-references for identifier-shaped lookups — this needs NO handoff from the orchestrator: the architect reads `features.has_lsp_layer` from `baldart.config.yml` directly (the flag is ambient) per `agents/code-search-protocol.md`. Likewise, when `features.has_code_graph: true`, the architect uses the Graphify code graph for structural/relational lookups (ambient flag) per `agents/code-graph-protocol.md`. The orchestrator does NOT propagate either flag. (Earlier doc versions numbered this step 4; the step that read project-status BEFORE the architect was removed because it persisted pre-analysis context — see step 3a.) **Pass `OUTPUT=terse` in the architect prompt (since v4.49.0)** — this is grounding for the coder/reviewer, not an explanation, so the architect returns its structured contract (reuse table + canonical evidence + `path:line — symbol` rows + `totals:`) without narrative prose. The verbatim baseline persisted at step 5b keeps all the substance a lean `/codexreview` needs (paths, signatures, patterns, high-risk paths) while staying small in the orchestrator's context and on disk.
|
|
14
14
|
3a. Update `${paths.references_dir}/project-status.md` Active Code Context (skip when the file does not exist in the project) — do this AFTER the codebase-architect run (step 3) so the "Active Code Context" reflects the architect's findings (which files are actually in scope), not just the card YAML's `files_likely_touched`. Writing it before the architect run would persist pre-analysis claims that downstream agents (e.g. a parallel card) would then read as truth.
|
|
15
15
|
4. **Plan-auditor grounding check** — but first, a **provenance skip (since v4.7.0 — mirror of Step 3d)**: the `/prd` Step 6.6 holistic audit already ran a per-card grounding pass when the card was authored. Re-grounding here is only worth a plan-auditor spawn if something changed since. Skip the plan-auditor when **BOTH** hold:
|
|
16
16
|
- **P1 — provenance present**: the card has `metadata.holistic_audit` with non-empty `audited_at`, `audited_commit`, `audited_set`.
|
|
@@ -663,9 +663,11 @@ section of the state file. Mark dimension as covered.
|
|
|
663
663
|
If more specific detail is needed beyond what context-primer provided,
|
|
664
664
|
invoke `codebase-architect` agent (foreground) with a targeted question:
|
|
665
665
|
```
|
|
666
|
+
OUTPUT=terse — machine consumer (discovery dimension resolution), not human Q&A.
|
|
666
667
|
For feature `<slug>`: regarding the dimension `<dimension>`, does the codebase
|
|
667
668
|
already answer this? Specifically check: existing collections, routes, components,
|
|
668
|
-
permissions, and patterns. Report what you find with file references
|
|
669
|
+
permissions, and patterns. Report what you find with file references — structured
|
|
670
|
+
rows (`path:line — symbol — note`) + `totals:`, no narrative prose.
|
|
669
671
|
```
|
|
670
672
|
Avoid launching codebase-architect when the context-primer output already
|
|
671
673
|
provides sufficient information for the dimension.
|
|
@@ -743,6 +745,8 @@ Instead, run the ISA scan and present findings for validation.
|
|
|
743
745
|
1. Invoke `codebase-architect` agent with this specific prompt:
|
|
744
746
|
|
|
745
747
|
```
|
|
748
|
+
OUTPUT=terse — machine consumer (ISA table builder), not human Q&A: per-touchpoint
|
|
749
|
+
rows only, no narrative overview.
|
|
746
750
|
For feature `<slug>`: perform an Integration Surface Analysis (ISA).
|
|
747
751
|
Find ALL existing code that should reference, link to, or interact with this
|
|
748
752
|
new feature. Scan these categories systematically:
|
|
@@ -1,7 +1,8 @@
|
|
|
1
1
|
export const meta = {
|
|
2
2
|
name: 'new-card-review',
|
|
3
|
-
|
|
4
|
-
|
|
3
|
+
// One-liner only: this string re-enters the orchestrator context on completion.
|
|
4
|
+
// Full semantics live in the args-contract comment block below + references/review-cycle.md.
|
|
5
|
+
description: 'Per-wave review+fix cluster for /new (off-context).',
|
|
5
6
|
phases: [
|
|
6
7
|
{ title: 'Baseline', detail: 'architecture grounding for the wave scope' },
|
|
7
8
|
{ title: 'Discovery', detail: 'parallel finders per card (simplify / codex / qa / security)' },
|
|
@@ -1,7 +1,8 @@
|
|
|
1
1
|
export const meta = {
|
|
2
2
|
name: 'new-final-review',
|
|
3
|
-
|
|
4
|
-
|
|
3
|
+
// One-liner only: this string re-enters the orchestrator context on completion.
|
|
4
|
+
// Full semantics live in the args-contract comment block below + references/final-review.md.
|
|
5
|
+
description: 'Cross-batch final code review for /new (read-only, off-context).',
|
|
5
6
|
phases: [
|
|
6
7
|
{ title: 'Baseline', detail: 'architecture grounding for the batch scope (F.2)' },
|
|
7
8
|
{ title: 'Review', detail: 'parallel multi-agent review of the batch diff (F.3)' },
|
|
@@ -1,7 +1,8 @@
|
|
|
1
1
|
export const meta = {
|
|
2
2
|
name: 'new2-resolve',
|
|
3
|
-
|
|
4
|
-
|
|
3
|
+
// One-liner only: this string re-enters the orchestrator context on completion.
|
|
4
|
+
// Full semantics live in the args-contract comment block below.
|
|
5
|
+
description: 'Self-healing resolution pass for the autonomous new2 batch workflow.',
|
|
5
6
|
phases: [
|
|
6
7
|
{ title: 'Diagnose', detail: 'classify + terminal short-circuit + scope-expansion boundary' },
|
|
7
8
|
{ title: 'Repair', detail: 'targeted fix, then judged multi-attempt if needed' },
|
|
@@ -1,7 +1,8 @@
|
|
|
1
1
|
export const meta = {
|
|
2
2
|
name: 'new2',
|
|
3
|
-
|
|
4
|
-
|
|
3
|
+
// One-liner only: this string re-enters the orchestrator context on completion.
|
|
4
|
+
// Full semantics live in the args-contract comment block below + /new's reference modules.
|
|
5
|
+
description: 'EXPERIMENTAL autonomous workflow host for the whole /new batch (off-context, Claude-only).',
|
|
5
6
|
phases: [
|
|
6
7
|
{ title: 'Pre-flight', detail: 'deterministic workspace hygiene + worktree + dep-graph + cross-card grounding (Phase 0)' },
|
|
7
8
|
{ title: 'Implement', detail: 'dependency-gated per-card implement+review pipeline with owner_agent + specialized review fan-out' },
|