npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/commands/hatch3r-spec.md ADDED Viewed

@@ -0,0 +1,216 @@
+---
+id: hatch3r-spec
+type: command
+orchestrator: true
+agentPipeline: [hatch3r-greenfield-spec, hatch3r-brownfield-spec]
+description: Spec orchestrator — detects greenfield vs brownfield project state and runs the corresponding spec agent to produce requirements, acceptance criteria, risk inventory, and test plan (greenfield adds market/competitive/persona/tech-stack; brownfield adds codebase-map/pattern-detection/integration/migration).
+tags: [spec, planning, orchestrator]
+pillars:
+  governance: [P1, P2, P8]
+  content-quality: [CQ8, CQ9]
+triage_tiers: [1, 2, 3]
+supports_resume: true
+quality_charter: agents/shared/quality-charter.md
+efficiency_patterns: agents/shared/efficiency-patterns.md
+efficiency_tier: standard
+cache_friendly: true
+parallel_tool_default: true
+sub_agents_spawned:
+  count: 1
+  rationale: "one spec sub-agent per invocation — chosen between greenfield/brownfield by project-state detection (not parallel; mutually exclusive)"
+---
+# /hatch3r-spec
+Spec orchestrator that detects project state, picks the matching spec agent (`hatch3r-greenfield-spec` or `hatch3r-brownfield-spec`), and aggregates the 8-deliverable contract — shared core (requirements, acceptance criteria, risk inventory, test plan) plus state-specific deliverables (greenfield: market/competitive/persona/tech-stack; brownfield: codebase-map/pattern-detection/integration/migration). Routing is mutually exclusive — exactly one spec agent runs per invocation.
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the user's request for unresolved questions per `agents/shared/user-question-protocol.md`. Ask via the platform-native question tool — do not proceed under silent assumption. Orchestrator-level ambiguities to resolve up front:
+1. **Subsystem scope** — does the spec cover the whole project, a named module, or a single feature? If the request says "spec the project" without a scope marker, ask which slice and offer three options (full / named-subsystem / feature-only).
+2. **MVP vs full vision** — for greenfield, ask whether the spec targets a shippable MVP (CQ8 surface area minimized) or the full envisioned product. Default `MVP` if no response.
+3. **Triage tier** — Light (single-feature), Standard (subsystem), Deep (full-project). Auto-classify from scope answer; ask only if the request explicitly overrides.
+Acceptable to proceed without asking ONLY when the brief alone is single-target, single-concern, and produces a testable acceptance criterion.
+## Phase 0 — Project state detection
+Detect greenfield vs brownfield from the working directory. Run these read-only probes in parallel:
+| Signal | Source | Brownfield score |
+|--------|--------|------------------|
+| Tracked source files | `git ls-files` count under `src/`, `lib/`, `app/`, language-default dirs | +1 per ≥10 files |
+| Manifest present | `package.json`, `go.mod`, `Cargo.toml`, `pom.xml`, `pyproject.toml`, `Gemfile`, `composer.json` | +1 per manifest found |
+| Commit count | `git rev-list --count HEAD` | +1 if ≥5 |
+| Test suite | tests/`, `__tests__/`, `spec/`, `*_test.go`, `*.test.ts` | +1 if present |
+| Existing README with body | README.md ≥30 lines and not the npm/cargo init template | +1 |
+| Adapter outputs already present | `.cursor/`, `CLAUDE.md`, `.github/copilot-instructions.md` | +1 |
+**Classification:** brownfield score ≥3 → brownfield. Otherwise → greenfield. Threshold configurable via `--state=greenfield|brownfield` to override the heuristic.
+Cache the score, the matched signals, and the resolved state in the run context. Report to the user before delegating.
+## Triage
+Resolve the triage tier from scope answer (or auto-detect):
+- **Light** — single-feature spec. Skip optional deliverables (market sizing for greenfield, migration plan for brownfield) unless the user opts in. Default sub-agent input budget ~6,000 tokens.
+- **Standard** — subsystem-level spec covering a coherent module. All 8 deliverables produced. Default budget ~12,000 tokens.
+- **Deep** — full-project spec, multi-domain. All 8 deliverables plus depth on risk inventory (≥5 named risks with mitigation owners) and test plan (per-layer coverage matrix). Default budget ~24,000 tokens.
+## Effort Override (Decision 17)
+Auto-tiering can misclassify — a trivial brief scored as Deep, or a multi-domain brief scored as Light. The user override is the recovery path mandated by hatch3r's universal `--effort` override contract ("User overridable via `--effort` flag"):
+- `--effort=light|standard|deep` forces the named tier, bypassing the Triage auto-classification above.
+- The override wins over the auto-detected tier; record both the auto-detected tier and the override in the run context so the Cost estimate block reports the budget delta.
+- When `--effort=deep` lands on a Light-classified scope (or `--effort=light` on a Deep-classified scope), accept the override and emit the resized `estimated_input_tokens_static_frame` in the Cost estimate block.
+- No override passed → the Triage auto-classification stands.
+## Confidence Propagation Contract
+Every sub-agent delegation prompt in this command MUST include the confidence expression requirement below (verbatim). Sub-agents are invoked with the `quality_charter: agents/shared/quality-charter.md` reference in their frontmatter, but the orchestrator repeats the directive to override runtime prompt defaults per the charter §1 rule.
+> Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+Downstream propagation: the Phase 0 project-state classification (already scored against the brownfield-signal table), each of the 8 deliverables' risk and acceptance-criteria confidence, and the Phase 6 iteration-summary Confidence field MUST carry a high/medium/low rating sourced from the spec agent. Market-research figures (greenfield) are medium at best unless tied to a cited source per Decision 14. Every ASK checkpoint that reports verification quality, every gate that evaluates a sub-agent verdict, and every output block that surfaces merge-readiness MUST carry a high/medium/low confidence rating sourced from the upstream sub-agent. Dropping the signal between stages is a gate failure.
+## Phase 1.5 — Emit Pre-Execution Cost Preview
+Before the Phase 2 dispatch, surface the `cost_estimate` block (the pre-execution half of the Cost estimate section below) so the spec run is never started blind. The Phase 0 detection + §0 ASK gate are user-driven and excluded from the duration estimate. This is the explicit pre-execution emission point mandated by `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate.
+## Phase 2 — Delegate to chosen spec agent
+Spawn exactly one spec agent via the Task tool. Routing is mutually exclusive:
+- `brownfield` → `hatch3r-brownfield-spec`
+- `greenfield` → `hatch3r-greenfield-spec`
+**Inputs passed to the sub-agent:**
+| Field | Source |
+|-------|--------|
+| `project_state` | Phase 0 result (greenfield or brownfield + matched signals) |
+| `triage_tier` | Phase 1 result (light, standard, deep) |
+| `scope` | §0 answer (full project / named subsystem / single feature) |
+| `output_root` | `.hatch3r/spec/<ISO-8601-timestamp>/` |
+| `mvp_or_full` | §0 answer (greenfield only); brownfield ignores |
+| `maturity_tier` | Read from `.hatch3r/hatch.json::maturity` (defaults to `solo` per Decision 16) |
+No inline mutations from this orchestrator turn — the spec agent owns all file writes.
+## Phase 3 — Aggregate deliverables
+Wait for the spec agent's structured result. Verify all 8 deliverables landed:
+**Shared core (both states):**
+1. `requirements.md` — functional + non-functional requirements with priority labels.
+2. `acceptance-criteria.md` — testable acceptance criteria, one block per requirement.
+3. `risk-inventory.md` — named risks with likelihood × impact, mitigation owner, residual risk.
+4. `test-plan.md` — coverage by layer (unit / integration / E2E / contract / mutation per `rules/hatch3r-testing.md` mandate-map).
+**Greenfield-specific (`hatch3r-greenfield-spec`):**
+5. `market-research.md` — addressable market, growth signals, ≥2 reputable sources ≤12 months old per `agents/shared/rigor-contract.md` (Web Research Mandate).
+6. `competitive-analysis.md` — named competitors, capability matrix, differentiation hypothesis.
+7. `personas.md` — ≥2 personas with jobs-to-be-done, current workaround, acceptance signal.
+8. `tech-stack.md` — chosen stack with rationale citing CQ7 (performance budgets) and CQ6 (scalability) impact.
+**Brownfield-specific (`hatch3r-brownfield-spec`):**
+5. `codebase-map.md` — directory tree summary, top 10 modules by LOC + churn, public interface inventory.
+6. `pattern-detection.md` — recurring patterns (naming, error handling, test style) the new work must follow per CQ8.
+7. `integration-plan.md` — touchpoint inventory (which existing files extend or wrap), expand-contract migration shape per CQ8.
+8. `migration-notes.md` — pre/post snapshot diff plan, rollback path, observability touchpoints per CQ4.
+If any deliverable is missing or empty, halt and surface the gap to the user — do not silently accept a partial spec. Resume via `--resume` per the resumability contract (Decision 27).
+## Phase 4 — Per-Turn Pipeline-State Header
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping: `1` = detection + triage, `2` = spec-agent delegation, `3` = aggregation + verification, `4` = summary. Light runs are exempt from the header per the Tier 1 exemption.
+## Phase 5 — End-of-Turn Delegation Attestation
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: the artifacts this command writes.
+## Phase 6 — Iteration Summary
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md`. Sections required: Status, Outcome, Done, Not Done / Deferred / Unverified, Open Questions / Blockers, Confidence, Artifacts Touched (the 8 deliverable paths), Verifications Run (deliverable-presence checks), Suggested Next Action (typically `/hatch3r-board-fill` to convert acceptance criteria into a board).
+## Output paths
+All deliverables land in a single timestamped directory:
+```
+.hatch3r/spec/<ISO-8601-timestamp>/
+├── requirements.md
+├── acceptance-criteria.md
+├── risk-inventory.md
+├── test-plan.md
+└── {state-specific 4 files per Phase 3}
+```
+The directory is the unit of versioning — re-running `/hatch3r-spec` produces a new timestamped tree, never overwriting prior runs.
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output (the Phase 6 Iteration Summary above is the spec-specific rendering — both the Phase 6 block and the 9-section canonical contract apply). The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+Field names match the canonical contract in `rules/hatch3r-cost-visibility.md` (Pre-Execution Estimate + Post-Execution Actuals). Emit the pre-execution block before Phase 2 dispatch:
+```yaml
+cost_estimate:
+  expected_sa_count: 1
+  estimated_input_tokens_static_frame: 12000  # standard tier; light ~6000, deep ~24000
+  estimated_web_research_queries: 4            # greenfield 2-6; brownfield is local-only (0)
+  triage_tier: light | standard | deep
+  estimated_duration_min: 8                    # standard tier; light ~4, deep ~12
+```
+Post-execution: emit the actuals + delta block per `rules/hatch3r-cost-visibility.md` before declaring iteration-summary status. Token telemetry sources from `src/pipeline/observability.ts`:
+```yaml
+cost_actuals:
+  actual_sa_count: <int>
+  actual_input_tokens: <int>
+  actual_output_tokens: <int>
+  actual_web_research_queries: <int>
+  actual_duration_min: <float>
+delta:
+  sa_count_delta: <int>
+  input_tokens_delta_percent: <float>
+  duration_delta_percent: <float>
+```
+Both blocks land in the iteration summary's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2. Deltas beyond 25% absolute value carry `flagged_for_review: true`.
+## References
+- hatch3r design decisions: reputable-source mandate (Decision #14) and the 2-spec-agents-with-shared-core split (Decision #23)
+- hatch3r design decisions: C8-D5-M1 orchestrator-marker contract (`orchestrator: true` + `agentPipeline`) and the Command-vs-Skill authoring criterion (Decision #13)
+- `agents/shared/user-question-protocol.md` (B1 gate)
+- `agents/shared/quality-charter.md` §1, §3, §7, §8 (confidence, ambiguity, measurable criteria)
+- `rules/hatch3r-agent-orchestration.md` (Per-Turn Pipeline-State Header, End-of-Turn Delegation Attestation, Mandatory Delegation Directive)
+- `rules/hatch3r-iteration-summary.md` (canonical end-of-turn block)
+- `rules/hatch3r-cost-visibility.md` (Decision 24 cost_estimate / cost_actuals / delta field contract)
+- hatch3r design decision: `--effort` universal override + triage_tiers (Decision 17)
+- `commands/hatch3r-board-fill.md` (orchestrator pattern reference)

package/dist/content/commands/hatch3r-test-plan.md CHANGED Viewed

@@ -9,15 +9,17 @@ quality_charter: agents/shared/quality-charter.md
 efficiency_patterns: agents/shared/efficiency-patterns.md
 cache_friendly: true
 parallel_tool_default: true
+efficiency_tier: deep
 triage_tiers: [1, 2, 3]
+supports_resume: true
 sub_agents_spawned:
   count: 5
-  rationale: Five parallel hatch3r-researcher modes per test-planning brief — coverage-analysis, complexity-and-risk, test-pattern, boundary-analysis, risk-prioritization — dispatched concurrently in Step 3; a docs-writer composes the test plan on their merged output.
+  rationale: Five parallel hatch3r-researcher modes per test-planning brief — coverage-analysis, complexity-and-risk, test-pattern, boundary-analysis, risk-prioritization — dispatched concurrently in Step 3; a docs-writer composes the test plan on their merged output. Cost-dominance per CONSTITUTION §2 P8 — token cost never serializes independent work.
 ---
 ## §0 Detect Ambiguity (P8 B1)
-Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (contradictory inputs, missing target, unknown convention). If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → §0 Detect Ambiguity (P8 B1). Triggers: contradictory inputs, missing target, unknown convention.
 ## Agent Pipeline
@@ -27,9 +29,11 @@ Before any action, scan the user's request and provided context for unresolved q
 | 2. Document Generation | `hatch3r-docs-writer` (test plan spec, ADRs) | Yes | Yes |
 | 3. Todo Generation | Orchestrator (inline) | No | Yes |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): every parallel fan-out above holds all three — read-only or disjoint writes, deterministic aggregation, no shared mutable state.
 # Test Plan -- Comprehensive Test Strategy from Scope to Board-Ready Epic
-Take a test planning scope (feature, module, or codebase area) and produce a complete test plan specification (`docs/specs/`), architectural decision records (`docs/adr/`) when significant testing infrastructure decisions are involved, and structured `todo.md` entries (epic + sub-items) ready for `hatch3r-board-fill`. Spawns parallel researcher sub-agents (coverage analysis, complexity & risk mapping, test pattern extraction, boundary analysis, risk-based prioritization) to analyze the testing landscape from multiple angles before generating artifacts. AI proposes all outputs; user confirms before any files are written. Supports two modes: feature-scoped test planning (plan tests for a specific feature) and module/codebase-level coverage auditing (assess and improve test coverage across an area). Optionally chains into `hatch3r-test-writer` for immediate test implementation or `hatch3r-board-fill` to create tracking issues.
+Take a test planning scope (feature, module, or codebase area) and produce a complete test plan specification (`docs/specs/`), architectural decision records (`docs/adr/`) when significant testing infrastructure decisions are involved, and structured `todo.md` entries (epic + sub-items) ready for `hatch3r-board-fill`. Spawns parallel researcher sub-agents (coverage analysis, complexity & risk mapping, test pattern extraction, boundary analysis, risk-based prioritization) to analyze the testing landscape from multiple angles before generating artifacts. AI proposes all outputs; user confirms before any files are written. Supports two modes: feature-scoped test planning (plan tests for a specific feature) and module/codebase-level coverage auditing (assess and improve test coverage across an area). Optionally chains into `hatch3r-testability` for CQ5 mandate verification or `hatch3r-board-fill` to create tracking issues.
 ---
@@ -45,6 +49,12 @@ Take a test planning scope (feature, module, or codebase area) and produce a com
 ---
+## Confidence Propagation Contract
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Confidence Propagation Contract. Readiness kind: plan.
+---
 ## Workflow
 Execute these steps in order. **Do not skip any step.** Ask the user at every checkpoint marked with ASK.
@@ -59,6 +69,25 @@ Classify the test-planning request before delegating:
 If Tier 1, run the reduced researcher set and skip Step 6 (ADRs). If Tier 2, run the standard pipeline below. If Tier 3, run the full pipeline including ADR generation for testing infrastructure decisions and confirm scope with the user before writing files.
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the first researcher dispatch (Step 3), surface the cost preview so a multi-researcher test-planning run is never started blind. Emit the `cost_estimate` block per `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate, calibrated to the Step 0 triage tier:
+```yaml
+cost_estimate:
+  expected_sa_count: <triage tier → Tier 1 ~2, Tier 2 ~5, Tier 3 up to 5>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+Post-execution actuals + delta land in the iteration summary's Fan-out + Cost section per `rules/hatch3r-cost-visibility.md` Post-Execution Actuals. Token telemetry sources from `src/pipeline/observability.ts`.
+### Effort Override (Decision 17)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Effort Override (Decision 17). Misclassification example: a single-module test plan scored as Deep, or a full-suite restructure scored as Light.
 ---
 ### Step 1: Gather Test Planning Scope
@@ -100,7 +129,7 @@ After the test planning brief is confirmed, probe for missing requirements acros
    - **Test data requirements**: Data generation approach? Fixture vs factory preference? Seeding requirements? PII concerns in test data?
    - **Browser/E2E scope**: Which user flows need E2E coverage? Browser matrix? Mobile viewport testing?
    - **Flaky test tolerance**: Existing flaky tests? Quarantine process in place? Retry budget in CI?
-3. Skip dimensions that the brief already addresses clearly.
+3. Skip dimensions that the brief already addresses with a stated answer.
 **ASK:** "Before research begins, I have {N} questions to confirm the test plan covers all relevant dimensions:
 {numbered question list -- each with the dimension label and why the answer matters}
@@ -175,7 +204,7 @@ The `coverage-analysis` sub-agent establishes the baseline -- its output tells u
 The `risk-prioritization` sub-agent's output is critical for ordering the test plan -- it determines which tests should be written first for maximum risk reduction.
-The `test-pattern` sub-agent ensures the plan aligns with existing conventions, so new tests fit naturally into the codebase.
+The `test-pattern` sub-agent ensures the plan aligns with existing conventions, so new tests fit the codebase's existing patterns (file layout, fixture style, assertion idiom).
 **Each sub-agent prompt must also include** the Resolved Requirements from Step 1b (user's answers to dimension-probing questions) so researchers can factor in the user's explicit decisions.
@@ -340,7 +369,7 @@ Reference conventions from `hatch3r-testing` rule:
 | Deterministic (no wall clock) | {aligned / needs attention} | {details} |
 | Isolated (own setup/teardown) | {aligned / needs attention} | {details} |
 | Fast (unit < 50ms, integration < 2s) | {aligned / needs attention} | {details} |
-| Named clearly (behavior descriptions) | {aligned / needs attention} | {details} |
+| Named per behavior description (verb + outcome) | {aligned / needs attention} | {details} |
 | No network in unit tests | {aligned / needs attention} | {details} |
 | Fakes > stubs > mocks hierarchy | {aligned / needs attention} | {details} |
 | Factory over fixtures | {aligned / needs attention} | {details} |
@@ -519,16 +548,63 @@ Files Created/Updated:
 ### Step 9 (Optional): Chain into Test Writer or Board-Fill
 **ASK:** "All files written. What would you like to do next?
-- **Run `hatch3r-test-writer`** to implement the highest-priority (P0) tests immediately
+- **Run `hatch3r-testability`** to verify the highest-priority (P0) tests meet the CQ5 mandate map / coverage floor
 - **Run `hatch3r-board-fill`** to create GitHub issues from the new todo.md entries
 - **Neither** -- I'll take it from here"
-If `hatch3r-test-writer`: instruct the user to invoke `hatch3r-test-writer`, passing the P0 test cases from the test plan spec as the scope.
+If `hatch3r-testability`: instruct the user to invoke `hatch3r-testability`, passing the P0 test cases from the test plan spec as the scope.
 If `hatch3r-board-fill`: instruct the user to invoke `hatch3r-board-fill`. Note that board-fill will perform its own deduplication, grouping, dependency analysis, and readiness assessment on the entries.
 ---
+## Resumability (Decision 27/30)
+test-plan is long-running — a Tier 3 plan fans out parallel researcher sub-agents across coverage, framework, and convention modes (Step 3), drives an ASK synthesis (Step 4), then writes a multi-file test plan spec + ADRs + todo.md (Steps 5–8). Per hatch3r's workspace-checkpointed resumability contract, checkpoint progress so an interrupted run re-enters at the last completed step rather than re-running the researcher batch.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.test-plan-workspace/`; step range the Step 0 → Step 9 progression; `wave` = researcher-batch index; snapshot/rollback paths `docs/specs/`, `docs/adr/`, and `todo.md`; `meta` adds `testPlanSlug`. Write points: after Step 1 scope confirmation, after Step 2 context load, after Step 3 researcher fan-out completes, after the Step 4 ASK synthesis is confirmed, and after each Step 5–8 file write (test plan spec, ADRs, todo.md) so already-generated artifacts survive a crash and are not regenerated on resume.
+---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping for test-plan: `1` = feature/diff intake + mandate-class detection, `2` = hatch3r-testability sub-agent dispatch (fuzz / mutation / contract / property / visual / AI-eval), `3` = plan synthesis + coverage analysis, `4` = plan write + iteration-summary. Tier 1 runs are exempt per the Tier 1 exemption.
+## End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: test-plan doc, mandate matrix, coverage spec.
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output. The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+This command emits cost transparency per `rules/hatch3r-cost-visibility.md` and CONSTITUTION §6 Decision 24/29:
+- **Pre-execution `cost_estimate`** -- emitted in Step 0.5 before the first researcher dispatch.
+- **Post-execution `cost_actuals` + `delta`** -- appended to the iteration summary's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2.
+Per-tier `expected_sa_count` calibration (from frontmatter `sub_agents_spawned.count: 5` × tier heuristic in `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate): Tier 1 ≈ 2 (coverage-analysis + risk-prioritization); Tier 2/3 spawn up to 5 parallel researcher modes. Deltas beyond 25% absolute value carry `flagged_for_review: true`. Token telemetry sources from `src/pipeline/observability.ts`; estimation primitives from `src/pipeline/costEstimator.ts`.
+---
 ## Error Handling
 - **Sub-agent failure:** Retry the failed sub-agent once. If it fails again, present partial results from the remaining sub-agents and ask the user how to proceed (continue without that researcher's input / provide the missing information manually / abort).
@@ -549,7 +625,7 @@ If `hatch3r-board-fill`: instruct the user to invoke `hatch3r-board-fill`. Note
 - **Stay within the test planning scope** defined by the user in Step 1. Do not invent test areas the user did not describe or imply. Flag coverage expansion opportunities but do not act on them without explicit approval.
 - **Coverage targets must align with `hatch3r-testing` rule thresholds.** Statement 80%, branch 70%, critical modules 90%/85%. If the user requests lower targets, note the divergence explicitly.
 - **Test cases must follow the convention hierarchy.** Fakes > stubs > mocks, as specified in `hatch3r-testing` rule. If the codebase uses a different convention, note the divergence and recommend gradual alignment.
-- **Do not prescribe implementation details.** The test plan specifies what to test, not how to implement the tests. Implementation details are `hatch3r-test-writer`'s responsibility. Test case outlines include behavior descriptions and acceptance criteria, not code.
+- **Do not prescribe implementation details.** The test plan specifies what to test, not how to implement the tests. Implementation details are the implementer's responsibility, gated by `hatch3r-testability` (CQ5). Test case outlines include behavior descriptions and acceptance criteria, not code.
 - **Property-based and mutation testing are opt-in.** Only include these in the plan if the user opts in during Step 1b or the codebase already uses them.
 - **All 5 researchers must complete before proceeding to Step 4.** Do not generate specs from partial research.
 - **todo.md must be compatible with board-fill format** -- markdown checklist with bold titles, grouped by priority, referencing source specs.