npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/commands/hatch3r-workflow.md CHANGED Viewed

@@ -2,22 +2,24 @@
 id: hatch3r-workflow
 type: command
 orchestrator: true
-agentPipeline: [hatch3r-researcher, hatch3r-implementer, hatch3r-reviewer, hatch3r-fixer, hatch3r-test-writer, hatch3r-security-auditor, hatch3r-docs-writer, hatch3r-lint-fixer, hatch3r-a11y-auditor, hatch3r-perf-profiler]
+agentPipeline: [hatch3r-researcher, hatch3r-implementer, hatch3r-reviewer, hatch3r-fixer, hatch3r-docs-writer, hatch3r-lint-fixer, hatch3r-ui, hatch3r-ux, hatch3r-security, hatch3r-reliability, hatch3r-testability, hatch3r-scalability, hatch3r-performance, hatch3r-maintainability, hatch3r-enhancability]
 description: Guided development lifecycle with 4 phases (Analyze, Plan, Implement, Review) and scale-adaptive Quick Mode for small tasks.
 tags: [implementation, orchestration]
 quality_charter: agents/shared/quality-charter.md
 efficiency_patterns: agents/shared/efficiency-patterns.md
 cache_friendly: true
 parallel_tool_default: true
+efficiency_tier: standard
 triage_tiers: [1, 2, 3]
+supports_resume: true
 sub_agents_spawned:
-  count: 10
-  rationale: Full 4-phase delivery pipeline — researcher (Phase 1), implementer (one per independent module, Phase 3), reviewer ↔ fixer review loop (Phase 4a), then a parallel Phase-4b final-quality batch (test-writer, security-auditor, docs-writer, lint-fixer, a11y-auditor, perf-profiler) bounded by max_phase4_parallel.
+  count: 15
+  rationale: Full 4-phase delivery pipeline — researcher (Phase 1), implementer (one per independent module, Phase 3), reviewer ↔ fixer review loop (Phase 4a), then a parallel Phase-4b final-quality batch (docs-writer + lint-fixer + CQ1-CQ9 vector specialists ui/ux/security/reliability/testability/scalability/performance/maintainability/enhancability — testability and security cover the always-on test + security gates) bounded by max_phase4_parallel. Cost-dominance per CONSTITUTION §2 P8 — token cost never serializes independent work.
 ---
 ## §0 Detect Ambiguity (P8 B1)
-Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (contradictory inputs, missing target, unknown convention). If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → §0 Detect Ambiguity (P8 B1). Triggers: contradictory inputs, missing target, unknown convention.
 # Development Workflow -- Guided Lifecycle for Structured Implementation
@@ -32,10 +34,12 @@ Optional guided development lifecycle command that walks through structured phas
 | 1. Research | `hatch3r-researcher` (modes by task type) | Per focus area | Yes (skip for trivial edits) |
 | 2. Implementation | `hatch3r-implementer` (one per module) | Yes (independent modules) | Yes |
 | 3a. Review Loop | `hatch3r-reviewer` -> `hatch3r-fixer` (max 3 iterations until clean) | No (sequential loop) | Yes |
-| 3b. Final Quality — Testing | `hatch3r-test-writer` | Yes | Yes (code changes) |
-| 3c. Final Quality — Security | `hatch3r-security-auditor` | Yes | Yes (code changes) |
+| 3b. Final Quality — Testing | `hatch3r-testability` | Yes | Yes (code changes) |
+| 3c. Final Quality — Security | `hatch3r-security` | Yes | Yes (code changes) |
 | 3d. Final Quality — Docs | `hatch3r-docs-writer` | Yes | When APIs/architecture/UX affected |
-| 3e. Final Quality — Conditional | `hatch3r-lint-fixer`, `hatch3r-a11y-auditor`, `hatch3r-perf-profiler` | Yes | When triggered |
+| 3e. Final Quality — Conditional | `hatch3r-lint-fixer` + all conditional CQ specialists (CQ1-CQ9: `hatch3r-ui`, `hatch3r-ux`, `hatch3r-reliability`, `hatch3r-scalability`, `hatch3r-performance`, `hatch3r-maintainability`, `hatch3r-enhancability`) per `SPECIALIST_TRIGGER_TABLE` | Yes | Spawn each whose trigger matches the diff |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): every parallel fan-out above (multi-module implementers in Phase 3, the Phase-4b final-quality batch) holds all three — read-only or disjoint writes, deterministic aggregation, no shared mutable state.
 ## Browser Automation
@@ -69,14 +73,16 @@ Every sub-agent delegation prompt in this command MUST include the confidence ex
 Downstream propagation: every ASK checkpoint that reports verification quality, every gate that evaluates a sub-agent verdict, and every output block that surfaces merge-readiness MUST carry a high/medium/low confidence rating sourced from the upstream sub-agent. Dropping the signal between stages is a gate failure.
+Absent-confidence clause (D13-SA13.2-F3): a clean verdict (0 Critical + 0 Warning) whose reviewer `confidence` field is absent or unparseable is treated as `confidence: low` at every gate — trigger the second pass / ASK, never proceed. This matches the code gate in `src/pipeline/reviewLoop.ts` (`evaluateReviewGate`), where an `unknown`/absent confidence ranks below `low` (`CONFIDENCE_RANK.unknown = 0`) and so does not pass. A prose gate that reads `confidence != low` would otherwise let absence pass silently — inverting the code gate. Resolve absence to `low` before applying the Step 0.7 floor.
 ---
 ## Triage
 Classify the development task before delegating. Detailed mode classification runs in Step 0 (Triage / Scale-Adaptive Mode Selection); this section summarizes the routing:
-- **Tier 1 (trivial)**: single-line edit, typo, or trivial config change; Quick Mode skips most ASK checkpoints and runs the streamlined 3-step path.
-- **Tier 2 (standard)**: bug fix or small feature in 1–3 files; Quick Mode with full sub-agent delegation (researcher, implementer, reviewer, fixer, test-writer, security-auditor).
+- **Tier 1 (trivial)**: single-line edit, typo, or trivial config change; Quick Mode runs the streamlined 3-step path. The B1 ambiguity gate (`§0 Detect Ambiguity` per `.claude/rules/clarification-default.md`) is NEVER skipped — Tier 1 admission already requires that the brief alone be testable, single-file, and single-concern, so the gate evaluates trivially and passes silently when those preconditions hold. ASK checkpoints downstream of the brief (mid-plan, end-of-implementation, mid-review) are reduced to one consolidated end-of-run "merge or revise?" prompt rather than per-phase prompts — Tier 1 work is short enough that incremental ASK fatigue would dominate the workflow without proportional benefit. Any mid-run ambiguity that wasn't visible at the brief surface re-invokes the B1 protocol on the spot. This satisfies P8 B1 default-not-exception: the protocol still applies; the checkpoint cadence is right-sized (Finding D7-M11 / D7-SA7.4-4).
+- **Tier 2 (standard)**: bug fix or small feature in 1–3 files; Quick Mode with full sub-agent delegation (researcher, implementer, reviewer, fixer, hatch3r-testability, hatch3r-security).
 - **Tier 3 (deep)**: multi-module feature, architectural change, or cross-cutting refactor; Full Mode with all 4 phases (Analyze, Plan, Implement, Review) and deep research before mutating files.
 If Tier 1, take Quick Mode with reduced sub-agent prompts. If Tier 2, take Quick Mode below. If Tier 3, switch to Full Mode and confirm the plan with the user before implementation.
@@ -116,6 +122,40 @@ Evaluate the task against both signal sets. Count matching signals to determine
 **ASK:** "Task: {user's task description}. Complexity assessment: {assessment}. Recommended mode: {Full/Quick}. Proceed with {recommended}? (yes / switch to {other} / let me decide per phase)"
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the first sub-agent dispatch (Phase 1 / Quick Step 1), surface the cost preview to the user so a multi-agent run is never started blind. Emit the `cost_estimate` block per `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate, calibrated to the triage tier selected in Step 0:
+```yaml
+cost_estimate:
+  expected_sa_count: <triage tier → Quick Tier 1 ~2, Quick Tier 2 ~6, Full Tier 3 up to 15>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+Post-execution actuals + delta land in the Phase 4 / Quick Step 3 iteration summary's Fan-out + Cost section per `rules/hatch3r-cost-visibility.md` Post-Execution Actuals. Token telemetry sources from `src/pipeline/observability.ts`.
+### Step 0.6: Effort Override (Decision 17)
+Auto-tiering (Step 0 mode selection) can misclassify — a single-file edit scored as Full Mode, or a cross-cutting refactor scored as Quick Mode. The user override is the recovery path mandated by hatch3r's universal `--effort` override contract ("User overridable via `--effort` flag"):
+- `--effort=light|standard|deep` forces the named tier (light → Quick Tier 1, standard → Quick Tier 2, deep → Full Tier 3), bypassing the Step 0 auto-classification. This composes with the existing `--mode=full|quick` flag: an explicit `--mode` wins over the `--effort`-derived mode.
+- The override wins over the auto-detected tier; record both the auto-detected tier and the override in the run context so the Cost estimate block reports the budget delta.
+- The override never disables the Safety Guardrails (destructive operations, breaking changes, open questions, quality-gate failures always stop) — those are mode-independent.
+- No override passed → the Step 0 auto-classification stands.
+### Step 0.7: Confidence Floor (Decision 16 / D13-SA13.3-F13.3.3)
+`--effort` calibrates work-effort depth; `--confidence-floor` calibrates the confidence threshold at which the review gate blocks. They are orthogonal — a Tier 1 typo fix and a Tier 3 refactor can each carry any floor. This is the user's pre-flight assertiveness knob (the forced-second-pass on low confidence in Phase 4a is post-hoc; the floor lets the user set the bar before the run):
+- `--confidence-floor=any|medium|high` (default `any`). Resolution order: explicit flag wins over the persisted `hatch3r config confidence_floor=...` default, which wins over the built-in `any`.
+- **`any`** (current behavior): the Phase 4a confidence-aware gate forces a second reviewer pass only when reviewer confidence `== low` with 0 Critical + 0 Warning.
+- **`medium`**: force a second pass on ANY finding rated `confidence == low`, even with 0 Critical + 0 Warning.
+- **`high`**: force a second pass on any finding rated `confidence != high`, AND ASK the user on every low-confidence finding regardless of severity.
+- Per P1 maturity tier (Decision 16): solo defaults `any`, enterprise defaults `high`. The floor never relaxes a Safety Guardrail — it only tightens the second-pass / ASK trigger.
 ---
 ## Full Mode
@@ -208,7 +248,7 @@ Map the task type to the appropriate skill:
 | Visual refactor  | hatch3r-visual-refactor        |
 | QA validation    | hatch3r-qa-validation          |
-Identify supporting agents needed: test-writer, docs-writer, reviewer, security-auditor.
+Identify supporting agents needed: hatch3r-testability, docs-writer, reviewer, hatch3r-security.
 #### 2c. Identify Risks
@@ -237,18 +277,19 @@ Implementation Plan:
 ### Phase 3: Implement
-**Goal:** Execute the plan using the selected hatch3r skill, delegating to sub-agents per the Universal Sub-Agent Pipeline.
+**Goal:** Execute the plan using the selected hatch3r skill, delegating to sub-agents per the Universal sub-agent Pipeline.
-#### 3a. Context Gathering (Researcher Sub-Agent)
+#### 3a. Context Gathering (Researcher sub-agent)
 You MUST spawn a `hatch3r-researcher` sub-agent via the Task tool (`subagent_type: "generalPurpose"`) before implementation. Skip only for trivial single-line edits (typos, comment fixes, single-value config changes).
 - Select research modes by task type (bug → symptom-trace/root-cause/codebase-impact, feature → codebase-impact/feature-design/architecture, refactor → current-state/refactoring-strategy/migration-path, QA → codebase-impact).
 - Add tier-appropriate modes per the `hatch3r-deep-context` rule if not already run in Phase 1 Step 1b.
 - Use depth `quick` for low-risk, `standard` for medium-risk, `deep` for high-risk. The complexity tier may override depth upward.
-- Await the researcher result. Use its structured output to inform Step 3b.
+- **Question decomposition (K-parallel-researcher path, per `rules/hatch3r-agent-orchestration.md` → Scaling Heuristic):** when the task decomposes into ≥2 independent research questions — answers that do not depend on each other (e.g. "what is the current auth flow?" and "what does the billing webhook expect?" for a cross-cutting feature) — spawn one `hatch3r-researcher` sub-agent per question in parallel (each scoped to its question with the modes above), then union their structured findings into a single Phase-1 brief before Step 3b. This is the parallel-safe Phase-1 case (read-only, deterministic union, no shared mutable state per the Three Conditions to Parallelize). Keep the single-researcher path for a single-question task; do not serialize independent questions to save tokens (P8 dominates P7).
+- Await the researcher result(s). Use the structured output (unioned across researchers when fanned out) to inform Step 3b.
-#### 3b. Core Implementation (Implementer Sub-Agent)
+#### 3b. Core Implementation (Implementer sub-agent)
 You MUST spawn a `hatch3r-implementer` sub-agent via the Task tool (`subagent_type: "generalPurpose"`). Do NOT implement inline — always delegate to a dedicated implementer.
@@ -267,6 +308,7 @@ The implementer sub-agent prompt MUST include:
 - **Reference conventions** from `similar-implementation` output (Tier 2/3) — triggers the implementer's Convention Lock step.
 - **Resolved requirements** from `requirements-elicitation` answers (Tier 2/3) — explicit decisions on ambiguities.
 - **Blast radius data** from enhanced `codebase-impact` (Tier 3) — transitive dependency trace and API consumer map.
+- `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID) — the sub-agent echoes it in logs, outputs, and status reports for cross-phase attribution.
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 Await the implementer sub-agent. Collect its structured result.
@@ -278,10 +320,10 @@ Await the implementer sub-agent. Collect its structured result.
 #### 3d. Run Quality Checks
-Run the project's quality checks (adapt to project conventions):
+Run the project's quality checks (adapt to project conventions; resolved to the project's language-aware command set at sync time, fallback when detection is unknown: `npm run lint && npm run typecheck && npm run test`):
 ```bash
-npm run lint && npm run typecheck && npm run test
+${HATCH3R:VERIFY_GATE_ALL}
 ```
 Fix any issues before proceeding. If quality checks fail, loop back and resolve before advancing to Phase 4.
@@ -290,7 +332,7 @@ Fix any issues before proceeding. If quality checks fail, loop back and resolve
 ---
-### Phase 4: Review (Sub-Agent Quality Pipeline)
+### Phase 4: Review (sub-agent Quality Pipeline)
 **Goal:** Verify quality and completeness via a two-stage sub-agent pipeline before finalizing. The Review Loop (4a) iterates until code quality is clean, then Final Quality (4b) runs remaining specialists in parallel.
@@ -299,12 +341,14 @@ Fix any issues before proceeding. If quality checks fail, loop back and resolve
 Spawn a `hatch3r-reviewer` sub-agent via the Task tool (`subagent_type: "generalPurpose"`). Include the diff and acceptance criteria in the prompt.
 1. **Review:** Await the reviewer result. Extract Critical and Warning findings AND the reviewer's top-level `confidence` field (high/medium/low).
-2. **Confidence-aware gate:**
-   - **0 Critical + 0 Warning AND reviewer confidence != low:** Review loop is clean. Proceed to 4b.
-   - **0 Critical + 0 Warning AND reviewer confidence == low:** Trigger a second reviewer pass before exiting. Do not proceed to 4b until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS at the ASK checkpoint in step 5.
+2. **Confidence-aware gate** (the second-pass trigger tightens with the `--confidence-floor` set in Step 0.7 — `any` = default below, `medium`/`high` raise the bar). First resolve the reviewer `confidence` field per the Confidence Propagation Contract absent-confidence clause: an absent or unparseable value is treated as `low` (it does NOT satisfy `!= low`), matching the code gate where `unknown` ranks below `low`.
+   - **0 Critical + 0 Warning AND reviewer confidence == high or medium:** Review loop is clean. Proceed to 4b. (Floor `medium`: also force a second pass if any individual finding is `confidence == low`. Floor `high`: force a second pass if reviewer confidence `!= high` OR any finding is `!= high`, AND ASK on every low-confidence finding.)
+   - **0 Critical + 0 Warning AND reviewer confidence == low (including absent/unparseable, resolved to `low` above):** Trigger a second reviewer pass before exiting. Do not proceed to 4b until the second pass returns high/medium confidence OR the user explicitly accepts the low-confidence PASS at the ASK checkpoint in step 5.
 3. **If Critical or Warning findings exist:** Spawn a `hatch3r-fixer` sub-agent with the reviewer output. The fixer applies fixes for all Critical and Warning findings.
 4. **Re-review:** After the fixer completes, spawn `hatch3r-reviewer` again to verify fixes.
-5. **Repeat** steps 2-4 for a maximum of **3 iterations**. If still not clean after 3 iterations, **ASK** the user how to proceed (force continue / manual fix / abort).
+5. **Repeat** steps 2-4 for a maximum of **3 iterations** (code-class cap). If still not clean after 3 iterations, **ASK** the user how to proceed (force continue / manual fix / abort).
+> **Iteration-cap rationale (D10-SA10.7-F10.7.7).** Code reviews diverge faster than spec reviews — a code finding can spawn a regression the next iteration must catch — so the code-class loop here caps at 3. The spec-class loop in `hatch3r-board-fill` Step 7.9d caps at 4 because issue-spec reviews converge more slowly and deterministically (text refinement, no runtime regressions). Both are bounded below `DEFAULT_MAX_REVIEW_ITERATIONS` (4) in `src/pipeline/reviewLoop.ts`, which keeps the oscillation detector reachable in default config. Expected convergence is 1–2 iterations; the cap is the divergence backstop, not the target.
 After each reviewer iteration, assess the reviewer's findings confidence: if the reviewer rates any finding as low-confidence, flag it separately in the ASK prompt so the user can prioritize human review of uncertain findings. The reviewer sub-agent output MUST include a top-level `confidence: high | medium | low` field (not just per-finding) so step 2 can evaluate it deterministically.
@@ -313,32 +357,45 @@ Each reviewer/fixer sub-agent prompt MUST include:
 - All `scope: always` rule directives from `rules/`.
 - The diff or file changes to review/fix.
 - The task's acceptance criteria.
+- `correlation_id` (UUID v4 per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 #### 4b. Final Quality (Parallel Specialists)
-**ONLY after the review loop (4a) reports 0 Critical + 0 Warning findings**, spawn the remaining specialist sub-agents. Use the Task tool with `subagent_type: "generalPurpose"`. Dispatch is bounded by `max_phase4_parallel` (default `3`, env-overridable via `HATCH3R_MAX_PHASE4_PARALLEL`, valid range 1-16) per `rules/hatch3r-agent-orchestration.md` Phase 4 — Final Quality. When the applicable specialists exceed the bound, batch by severity priority `CRITICAL → HIGH → MEDIUM → LOW`; each batch runs to completion before the next.
+**ONLY after the review loop (4a) reports 0 Critical + 0 Warning findings**, spawn the remaining specialist sub-agents. Use the Task tool with `subagent_type: "generalPurpose"`. Dispatch is bounded by the orchestrator-honored fan-out width `max_phase4_parallel` (default `8`) per `rules/hatch3r-agent-orchestration.md` Phase 4 — Final Quality — LLM-honored guidance, not a code-enforced cap (the host Task tool applies no platform fan-out limit). The bound exists for upstream provider rate-limit headroom, not per-orchestrator context cost (P8 dominates P7). When the applicable specialists exceed the bound, batch by severity priority `CRITICAL → HIGH → MEDIUM → LOW`; each batch runs to completion before the next.
 **Always spawn (mandatory for every code change):**
-1. **`hatch3r-test-writer`** — tests for all code changes. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
-2. **`hatch3r-security-auditor`** — security review of all code changes. Audit data flows, access control, input validation, and secret management.
+1. **`hatch3r-testability`** (CQ5) — confirm tests meet the mandate map / coverage floor for all code changes. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
+2. **`hatch3r-security`** (CQ3) — security review of all code changes. Audit data flows, access control, input validation, and secret management against the CQ3 threshold set.
 **Always evaluate (spawn when applicable):**
 3. **`hatch3r-docs-writer`** — spawn when changes affect public APIs, architectural patterns, user-facing behavior, or when specs/ADRs need updating. Skip silently if no documentation impact.
-**Conditional specialists (spawn when triggered):**
+**Conditional specialists (spawn each whose trigger matches the diff):**
+Spawn **all conditional CQ specialists (CQ1-CQ9) per `SPECIALIST_TRIGGER_TABLE`** plus `hatch3r-lint-fixer` — not the lint/ui/performance subset alone. Evaluate each via `shouldTriggerSpecialist(specialist, changedFiles, projectType)` (D6-M11); spawn the ones that return `{ triggered: true }`:
-4. **`hatch3r-lint-fixer`** — when lint or type errors are present after implementation.
-5. **`hatch3r-a11y-auditor`** — when UI or accessibility changes are made.
-6. **`hatch3r-perf-profiler`** — when performance-sensitive changes are made.
+4. **`hatch3r-lint-fixer`** — lint or type errors present after implementation.
+5. **`hatch3r-ui`** (CQ1) — UI component / design-token / theme files modified.
+6. **`hatch3r-ux`** (CQ2) — flow / route-transition / modal / error-state files or microcopy/i18n strings modified.
+7. **`hatch3r-reliability`** (CQ4) — service/request handler, OTel/SLO, retry/circuit-breaker, or Kubernetes-probe code modified.
+8. **`hatch3r-scalability`** (CQ6) — request handler, queue client, connection-pool, cache, or background-job code modified.
+9. **`hatch3r-performance`** (CQ7) — ORM/data-access, UI-rendering, or bundle/hot-path code modified.
+10. **`hatch3r-maintainability`** (CQ8) — any code mutation (duplication + complexity scan); schema / migration / API spec modified.
+11. **`hatch3r-enhancability`** (CQ9) — user-visible behavior, public API surface, config schema, or extension-point interface modified.
+(`hatch3r-architect` and `hatch3r-devops` are also conditional in `SPECIALIST_TRIGGER_TABLE` but are not CQ-vector specialists; spawn them too when their architectural / CI-CD triggers match.)
+> **Single source of truth for triggers (D6-M11):** the canonical trigger map lives in `src/pipeline/pipelineContext.ts::SPECIALIST_TRIGGER_TABLE` and the predicate `shouldTriggerSpecialist(specialist, changedFiles, projectType)` returns `{ triggered, reasons }` for any specialist. The brief prose list above is a quick reference only; for the authoritative trigger evaluation at runtime, call `shouldTriggerSpecialist` from the orchestrator harness (or the equivalent mirror in `rules/hatch3r-agent-orchestration.md` → Phase 4 Specialist Trigger Table). Adding a specialist to the prose without updating the TS table is rejected by `npm run validate:specialist-roster`.
 Each specialist sub-agent prompt MUST include:
 - The agent protocol to follow.
 - All `scope: always` rule directives from `rules/`.
 - The diff or file changes to review.
 - The task's acceptance criteria.
+- `correlation_id` (UUID v4 per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 Await all specialist sub-agents. Apply their feedback (fixes, additional tests, documentation updates).
@@ -347,6 +404,14 @@ Await all specialist sub-agents. Apply their feedback (fixes, additional tests,
 If any Phase 4 specialist produced fixes (not just findings), run a lightweight re-review to catch regressions introduced by the specialist changes. Spawn `hatch3r-reviewer` with a focused prompt covering only the files modified by Phase 4 specialists. If the re-review finds Critical findings, spawn `hatch3r-fixer` and re-review once more (max 1 additional iteration). This prevents Phase 4 fixes from bypassing the review gate.
+#### 4b.2. Post-Write Duplication Scan (Decision 21)
+Before clearing the review gate, run a duplication scan on the working-tree diff to catch near-duplicate code that parallel Phase-3 implementers (one per module) can each pass their own review independently (D13-SA13.2-F7). This operationalizes the CONSTITUTION §6 Decision 21 post-write duplication scan at runtime, not only at audit time.
+1. Run `npx jscpd --min-lines 40 --threshold 80 --reporters json --silent <changed-paths>` (or the project's configured duplication tool). The gate fires when any cross-file clone block is **≥40 lines OR ≥80% byte-similar**.
+2. **If a clone is detected:** route the duplication report back to `hatch3r-fixer` to extract the shared logic (DRY refactor), then re-run 4b.1 re-review on the refactored files. Max 1 duplication-fix iteration; if it persists, surface to the user with the clone locations.
+3. **If no clone is detected:** proceed to 4c. Skip silently when the diff touches a single file (no cross-file clone possible).
 #### 4c. Verify Against Acceptance Criteria
 Check each acceptance criterion from the original task or issue. Mark as met or not-met with evidence.
@@ -359,8 +424,8 @@ For each criterion, rate verification confidence: high (tested and confirmed via
 Review Results:
   Acceptance Criteria: {N/M met}
   Code Quality: {reviewer findings}
-  Security: {security-auditor findings}
-  Test Coverage: {test-writer results}
+  Security: {hatch3r-security findings}
+  Test Coverage: {hatch3r-testability results}
   Documentation: {docs-writer results / not applicable}
   Performance: {pass/issues}
   Overall Confidence: {high/medium/low}
@@ -397,24 +462,34 @@ Collapses the 4 phases into a streamlined flow for small, well-defined tasks. Su
 2. Run quality checks (lint, typecheck, test).
 3. Fix any issues before proceeding.
-### Quick Step 3: Quick Review (Sub-Agent Quality Pipeline)
+### Quick Step 3: Quick Review (sub-agent Quality Pipeline)
 Same two-stage pipeline as Full Mode, with lighter prompts:
 **Stage 1 — Review Loop:**
-1. Spawn **`hatch3r-reviewer`** with a focused prompt covering correctness and quality.
+1. Spawn **`hatch3r-reviewer`** with a focused prompt covering correctness and quality. Extract Critical/Warning findings AND the reviewer's top-level `confidence` field.
 2. If Critical or Warning findings exist, spawn **`hatch3r-fixer`**, then re-review. Max 3 iterations.
+3. **Confidence-aware gate (parity with Full Mode 4a step 2 — the `--confidence-floor` from Step 0.7 is NOT inert in Quick Mode):** resolve an absent/unparseable reviewer `confidence` to `low` per the Confidence Propagation Contract, then apply the same floor branch as Full Mode 4a:
+   - **0 Critical + 0 Warning AND confidence == high or medium:** clean — proceed to Stage 2. (Floor `medium`: also force a second pass on any finding `confidence == low`. Floor `high`: force a second pass if reviewer confidence `!= high` OR any finding is `!= high`, AND ASK on every low-confidence finding.)
+   - **0 Critical + 0 Warning AND confidence == low (including absent/unparseable):** trigger a second reviewer pass before exiting; do not proceed to Stage 2 until it returns high/medium confidence OR the user accepts the low-confidence PASS at the finalize ASK.
 **Stage 2 — Final Quality (after review loop is clean):**
-3. **`hatch3r-test-writer`** — ALWAYS for code changes.
-4. **`hatch3r-security-auditor`** — ALWAYS for code changes.
-5. **`hatch3r-docs-writer`** — evaluate; spawn when documentation impact exists.
-6. Verify acceptance criteria are met.
-7. Confirm lint/typecheck/test pass.
+4. **`hatch3r-testability`** (CQ5) — ALWAYS for code changes.
+5. **`hatch3r-security`** (CQ3) — ALWAYS for code changes.
+6. **`hatch3r-docs-writer`** — evaluate; spawn when documentation impact exists.
+7. Verify acceptance criteria are met (rate each criterion high/medium/low per Full Mode 4c).
+8. Confirm lint/typecheck/test pass.
+Before the finalize ASK, emit an `Overall Confidence` line (parity with Full Mode 4d) sourced from the lowest upstream confidence across reviewer, testability, security, and the acceptance-criteria checks:
-**ASK:** "Changes complete. Quality checks pass. Finalize? (yes / deeper review needed → switch to Full Mode Phase 4)"
+```
+Overall Confidence: {high/medium/low}
+  Lowest-confidence area: {description or "none"}
+```
+**ASK:** "Changes complete. Quality checks pass. Overall confidence: {high/medium/low}. Finalize? (yes / deeper review needed → switch to Full Mode Phase 4)"
 ---
@@ -456,7 +531,7 @@ These checkpoints are NEVER skipped, even in auto mode:
 - **Breaking changes**: API contract changes, public interface modifications always require confirmation
 - **Open questions**: If Phase 1 analysis surfaces unresolvable ambiguity, stop and ASK regardless of mode
 - **Quality gate failures**: If lint/typecheck/test fail after 2 fix attempts, stop and ASK
-- **Cost thresholds**: Stop if estimated token cost exceeds configured limit (default: $10 per task)
+- **Cost thresholds**: When the estimated cost for the selected tier exceeds the configured limit (default: $10 per task), do NOT abort silently. Call `proposeAlternativeTier(currentTier, currentEstimate, budget)` from `src/pipeline/costEstimator.ts` and surface a 3-option ASK: **(a) downgrade** to the suggested lower tier (saves the reported delta — drops the deep researcher modes / Phase-4 specialist depth that the lower tier omits), **(b) raise the budget** and proceed at the current tier, **(c) abort**. Default-if-no-response: abort (preserves the fail-closed contract). When `proposeAlternativeTier` returns `null` (current tier is already the cheapest, or no lower tier fits), present only raise-budget / abort.
 ### Activation
@@ -482,12 +557,60 @@ At the end of an auto workflow session, generate a summary:
 ---
+## Resumability (Decision 27/30)
+workflow is long-running — a Tier 2/3 run walks the 4-phase delivery pipeline (Analyze → Plan → Implement → Review), fans out one implementer per independent module in Phase 3, and runs a reviewer ↔ fixer loop plus Phase 4b CQ1–CQ9 specialist batch in Phase 4. Per hatch3r's workspace-checkpointed resumability contract, checkpoint progress so an interrupted run re-enters at the last completed phase rather than re-running researchers or re-implementing already-applied module changes.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.workflow-workspace/`; step range the command's step progression; `wave` = per-module implementer batch index for Phase 3 and reviewer-fixer iteration count for Phase 4a; snapshot/rollback paths every file touched by Phase 3 implementers and Phase 4a fixers. Write points: after Phase 1 researcher fan-out returns, after the Phase 2 plan synthesis is confirmed by ASK, after each Phase 3 implementer sub-agent returns (one write per module so a mid-batch crash preserves prior `delegation_proof_id`s), after each Phase 4a reviewer-fixer iteration, and after the Phase 4b parallel specialist batch (docs-writer + lint-fixer + CQ1–CQ9 specialists) completes.
+---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping for workflow: `1` = workflow intake + step decomposition, `2` = per-step sub-agent dispatch, `3` = aggregation + verification of step outputs, `4` = workflow report + iteration-summary. Tier 1 runs are exempt per the Tier 1 exemption.
+## End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: workflow definition, step outputs, automation manifests.
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output. The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+This command emits cost transparency per `rules/hatch3r-cost-visibility.md` and CONSTITUTION §6 Decision 24/29:
+- **Pre-execution `cost_estimate`** — emitted in Step 0.5 before the first sub-agent dispatch (both Full and Quick Mode).
+- **Post-execution `cost_actuals` + `delta`** — appended to the Phase 4 / Quick Step 3 iteration summary's Fan-out + Cost section.
+Per-tier `expected_sa_count` calibration (from frontmatter `sub_agents_spawned.count: 15` × tier heuristic in `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate): Quick Tier 1 ≈ 2 (researcher + implementer, reviewer/fixer/testability/security when triggered), Quick Tier 2 ≈ 6 (researcher + implementer + reviewer + fixer + testability + security), Full Tier 3 up to 15 (full pipeline including the Phase-4b CQ1-CQ9 specialist batch bounded by `max_phase4_parallel`). Token telemetry sources from `src/pipeline/observability.ts`; estimation primitives from `src/pipeline/costEstimator.ts`.
+---
 ## Error Handling
 - **Quality check failure in Phase 3:** Loop back and fix before proceeding to Phase 4. Do not advance with failing checks.
 - **Acceptance criteria not met in Phase 4:** Loop back to Phase 3 with specific items to address.
-- **Sub-agent failure:** Retry once, then fall back to direct implementation.
-- **Context degradation (>25 turns):** Suggest starting a fresh chat with a progress summary capturing completed work and remaining items.
+- **Sub-agent failure:** Per the shared sub-agent-failure clause in `rules/hatch3r-agent-orchestration.md` -> Cross-Phase Error Propagation: retry once, then re-spawn `hatch3r-fixer` with the failure context, then `BLOCKED_OTHER` + ASK. Never fall back to inline implementation (issue #73 bypass mode).
+- **Context degradation:** per the canonical Context-Degradation Policy (`rules/hatch3r-agent-orchestration-detail.md` -> Context-Degradation Policy) — compress at `>50%` context window, restart at `>75%`; the coarse turn-count fallback for this command is ~25 turns, at which point suggest a fresh chat with a progress summary capturing completed work and remaining items.
+- **Handoff information loss (>0.3):** When a lossy phase transition crosses the `informationLossEstimate > 0.3` threshold, emit the `formatPhaseHandoffWarning` line in the iteration summary per `rules/hatch3r-agent-orchestration.md` -> Phase Handoff Contract (Handoff-loss trigger) so the next phase verifies critical context survived — distinct from the turn-count degradation rule above.
 - **Mode switch:** User can switch from Quick to Full (or vice versa) at any ASK checkpoint. State carries forward — no work is lost.
 ## Guardrails
@@ -501,4 +624,5 @@ At the end of an auto workflow session, generate a summary:
 - **All phases produce structured output** that can feed into other hatch3r commands.
 - **Respect the project's tooling hierarchy** for knowledge augmentation (Context7 MCP for library docs, web research for current events).
 - **Never force a mode** — user always has final say at every ASK checkpoint.
+- **Concurrent invocation:** acquire `.hatch3r/.lock` before Phase 1 and detect-then-warn on a conflicting active pipeline (same branch / open `.hatch3r/hatch.json` transaction) per `rules/hatch3r-agent-orchestration.md` → Parallel Safety → Concurrent Invocation Handling. Cross-task learnings consolidate at completion, never mid-pipeline.
 - **This command composes existing hatch3r agents and skills** — it does not replace them.

package/dist/content/commands/revision/revision-delegation.md CHANGED Viewed

@@ -42,14 +42,14 @@ For Tier 2/3: cache researcher output (reference conventions, blast radius data)
 |-----------------|-----------|----------|
 | Bugs, missing features, error handling, logic fixes | `hatch3r-implementer` | hatch3r-implementer agent protocol |
 | Dead code, unused imports, type fixes, lint errors | `hatch3r-lint-fixer` | hatch3r-lint-fixer agent protocol |
-| Missing tests, insufficient coverage | `hatch3r-test-writer` | hatch3r-test-writer agent protocol |
+| Missing tests, insufficient coverage | `hatch3r-testability` | hatch3r-testability agent protocol |
 ### Blast-Radius-Aware Grouping
 When multiple findings affect the same file or module, batch them to a single sub-agent to avoid cross-agent file conflicts:
 1. Build a file-to-findings map from all [FIX NOW] items.
-2. Findings in the same file go to the same sub-agent instance, even if they span categories (use the highest-priority specialist: implementer > lint-fixer > test-writer).
+2. Findings in the same file go to the same sub-agent instance, even if they span categories (use the highest-priority specialist: hatch3r-implementer > hatch3r-lint-fixer > hatch3r-testability).
 3. Findings in disjoint files can run in parallel sub-agents.
 4. If findings span independent areas within the same specialist type, spawn one sub-agent per area to parallelize.
@@ -68,12 +68,13 @@ Each sub-agent prompt MUST include:
 5. Relevant learnings from `.hatch3r/learnings/` (if found in Step 1d).
 6. Explicit instruction: do NOT create branches, commits, or PRs.
 7. Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
-8. Revision-specific constraint: "You are fixing existing code, not implementing new features. Stay within the architecture established by the original implementation."
+8. `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID) — the sub-agent echoes it in logs, outputs, and status reports for cross-phase attribution.
+9. Revision-specific constraint: "You are fixing existing code, not implementing new features. Stay within the architecture established by the original implementation."
 **When Tier 2/3 research was performed (6.pre):**
-9. Reference conventions from `similar-implementation` output — triggers the implementer's Convention Lock step.
-10. Blast radius data from `codebase-impact` output (Tier 3) — transitive dependency trace informing which consumers and contracts the fix must preserve.
+10. Reference conventions from `similar-implementation` output — triggers the implementer's Convention Lock step.
+11. Blast radius data from `codebase-impact` output (Tier 3) — transitive dependency trace informing which consumers and contracts the fix must preserve.
 ---

package/dist/content/commands/revision/revision-modes.md CHANGED Viewed

@@ -61,6 +61,51 @@ Revision Session Report:
 ---
+## Review-Only Mode (D13-SA13.1-F2)
+When invoked with `--review-only`, revision becomes a **read-only code-review surface**: it runs the reviewer against the change set and emits the structured review report, but mutates nothing — no fix implementation, no commit, no push, no learnings write. This is the standalone "review this code, no changes" entry that fills development-workflow activity (3) Code review (`governance/audit/domains/D13-human-ai-collaboration.md` §13.1) — the only other code-facing surfaces (`hatch3r-pr-resolve`, the default `hatch3r-revision` flow) both mutate.
+### Behavior Changes in Review-Only Mode
+| Step | Normal Mode | Review-Only Mode |
+|------|-------------|------------------|
+| Step 1 Context Reconstruction | Run | Run (diff scope is the review target) |
+| Step 2 Context validation | ASK user | ASK user (confirm review target only) |
+| Step 3 User feedback interview | ASK user for feedback | Skip — the agent is the reviewer, not the interviewer |
+| Step 4 Proactive leftover scan | Run | Run (read-only; findings feed the report, not a fix queue) |
+| Step 5 Findings consolidation + routing | Suggest FIX NOW / DEFER | Consolidate into the report; no routing (nothing is fixed) |
+| Step 6 Fix implementation | Delegate to fixers | **Skipped** — no `hatch3r-implementer` / `hatch3r-fixer` / `hatch3r-lint-fixer` spawn |
+| Step 7 Quality verification | Stage 1 review loop -> Stage 2 specialists | **Single `hatch3r-reviewer` pass only** — no fixer, no re-review loop, no Stage 2 mutation specialists |
+| Step 8 Commit and push | `git add` / `commit` / `push` | **Skipped** — zero git mutation |
+| Step 9 Merge readiness | Verdict + board housekeeping | Emit the review report (see below); no PR-body write, no board mutation |
+| Step 10 Capture learnings | Write `.hatch3r/learnings/` | **Skipped** — no learnings file written |
+The single reviewer pass still carries the Confidence Propagation Contract: the reviewer's high/medium/low confidence is surfaced verbatim in the report. The `--confidence-floor` knob is inert in review-only mode (it gates the second-pass/fix loop, which does not run); state that in the report header rather than silently dropping it. `--review-only` and `--auto` are independent — `--auto` only relaxes ASK checkpoints, so `--review-only --auto` runs the read-only review with Step 2 auto-proceeding when a PR + linked issues are found.
+### Activation
+```
+/hatch3r revision --review-only
+```
+### Review Report
+In place of the Step 9 merge-readiness assessment, emit a read-only report:
+```
+Review-Only Report:
+  Branch: {branch}
+  Diff: {files_changed} files changed (+{additions} / -{deletions})
+  Reviewer confidence: {high/medium/low}
+  Confidence floor: inert (review-only — no fix loop to gate)
+  Critical ({n}): {finding} — {file:line}
+  Warning ({n}):  {finding} — {file:line}
+  Suggestion ({n}): {finding} — {file:line}
+  No changes were made. Run /hatch3r revision (without --review-only) to fix.
+```
+---
 ## Error Handling
 > Platform-specific CLI commands: see `commands/board/shared-{platform}.md` for fallback chains
@@ -68,10 +113,10 @@ Revision Session Report:
 - **Git diff failure**: If `git diff` fails (e.g., no commits on branch, detached HEAD), **ASK** the user for the correct branch or base ref.
 - **No changes detected**: If the diff is empty, inform the user and exit. There is nothing to revise.
 - **PR/issue fetch failure**: Retry once using the platform CLI. If retry fails, proceed without PR/issue context. Work from the diff alone. Warn the user that acceptance criteria are unavailable.
-- **Sub-agent failure**: Retry once. If the retry fails, fall back to direct implementation for that finding.
+- **Sub-agent failure**: Per the shared sub-agent-failure clause in `rules/hatch3r-agent-orchestration.md` -> Cross-Phase Error Propagation: retry once, then re-spawn `hatch3r-fixer` with the failure context for that finding, then `BLOCKED_OTHER` + ASK. Never fall back to inline implementation (issue #73 bypass mode).
 - **Quality check failure after 2 retries**: Present the specific failures and **ASK** the user whether to proceed with a partial fix commit or continue debugging.
 - **Push failure**: Present the error. Common fixes: `git push -u origin {branch}` for new branches, `git pull --rebase` for diverged branches.
-- **Context degradation (>25 turns)**: Suggest starting a fresh chat with a progress summary. The revision command is designed for fresh contexts — it can be re-run.
+- **Context degradation**: per the canonical Context-Degradation Policy (`rules/hatch3r-agent-orchestration-detail.md` -> Context-Degradation Policy) — compress at `>50%` context window, restart at `>75%` (the coarse turn-count fallback is ~25 turns). The revision command is designed for fresh contexts — at the restart threshold, suggest a fresh chat with a progress summary; it can be re-run.
 - **Board sync failure** (when board context exists): Warn and continue. Board sync is advisory in revision — it does not block the fix pipeline.
 ---
@@ -82,11 +127,11 @@ Revision Session Report:
 - **Never skip the proactive scan (Step 4)** — even if the user reports no issues. Agents leave leftovers.
 - **Always run quality checks (Step 7)** before committing. Never commit code that fails lint, typecheck, or tests.
 - **Stay within the revision scope.** Fix what was reported and what the scan found. Do not refactor unrelated code, add new features, or expand beyond the original implementation's intent.
-- **Always commit and push** at the end of a revision cycle. The user invoked this command to get fixes merged — do not exit without committing (unless the user explicitly abandons).
+- **Always commit and push** at the end of a revision cycle. The user invoked this command to get fixes merged — do not exit without committing (unless the user explicitly abandons, or `--review-only` is active, in which case there is nothing to commit).
 - **Respect the original implementation's architecture.** Revision fixes issues within the existing patterns. If the architecture itself is flawed, note it as a finding but do not restructure — suggest a separate refactor instead.
 - **One sub-agent per concern.** Delegate to specialist sub-agents based on finding type. Do not ask the implementer to also fix lint issues or write tests.
 - **Git safety.** Never force-push. Never rewrite history. Always create new commits for revision changes.
-- **This command composes existing hatch3r agents** — it does not replace them. The reviewer, implementer, lint-fixer, and test-writer agents handle the actual work.
+- **This command composes existing hatch3r agents** — it does not replace them. The reviewer, implementer, lint-fixer, and hatch3r-testability agents handle the actual work.
 - **Critical findings default to FIX NOW.** If the user overrides this, execute the Critical Deferral Protocol (Step 5b): structured warning with specific risk, require written rationale, record in todo.md with `Critical-deferred` tag, and flag for elevated triage in board-fill. The user is never blocked — rationale adds accountability, not a veto.
 - **Deferred findings go to `todo.md`, not directly to GitHub/GitLab/Azure DevOps issues.** The board-fill pipeline handles triage, epic creation, dependency analysis, and readiness assessment. Revision does not shortcut this process.
 - **Always format deferred items as a single epic block** in `todo.md`, regardless of count. This groups them together during the next board-fill run.

package/dist/content/commands/revision/revision-quality.md CHANGED Viewed

@@ -41,15 +41,17 @@ The reviewer prompt MUST include:
 - All `scope: always` rule directives from `rules/`.
 - Iteration number and previous findings (if not the first iteration).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+- `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID) — the sub-agent echoes it in logs, outputs, and status reports for cross-phase attribution.
 **When Tier 2/3 research was performed** (from Step 6.pre):
 - Include blast radius data so the reviewer can verify fixes preserve dependent consumers and contracts.
 - Include reference conventions so the reviewer can verify fixes follow established patterns.
-2. Process reviewer output (confidence-aware gate):
+2. Process reviewer output per the canonical **Confidence-Aware Review Gate** in `agents/shared/confidence-gate.md`, passing in the resolved `--confidence-floor` (`any` | `medium` | `high`) routed here from `hatch3r-revision` → Confidence Floor. At the default `any` floor:
    - If **0 Critical + 0 Warning AND reviewer confidence != low:** review loop is clean. Proceed to Stage 2.
    - If **0 Critical + 0 Warning AND reviewer confidence == low:** trigger a second reviewer pass before exiting. Do not proceed to Stage 2 until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS.
-   - If Critical or Warning findings remain: spawn `hatch3r-fixer` sub-agent to address them. When fixes touch shared or public interfaces, include blast radius data and reference conventions in the fixer prompt. Then re-run the reviewer (next iteration).
+   - **Floor tightening:** at floor `medium` the pass surface above is unchanged; at floor `high` a `medium`-confidence clean verdict also triggers a second pass (and any low-confidence finding triggers an ASK). Apply the floor-tier branches from the shared gate — do not collapse `medium`/`high` to the `any` row.
+   - If Critical or Warning findings remain: spawn `hatch3r-fixer` sub-agent to address them. The fixer prompt MUST include `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID) — the sub-agent echoes it in logs, outputs, and status reports for cross-phase attribution. When fixes touch shared or public interfaces, include blast radius data and reference conventions in the fixer prompt. Then re-run the reviewer (next iteration).
 3. If 3 iterations complete and findings remain, **ASK** the user whether to proceed or fix manually.
@@ -65,8 +67,8 @@ After the review loop is clean, spawn specialist agents in parallel via the Task
 ### Always Spawn (Mandatory for Code Changes)
-- **`hatch3r-test-writer`** — write or update tests for code changes. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
-- **`hatch3r-security-auditor`** — security review of code changes. Audit data flows, access control, input validation, and secret management.
+- **`hatch3r-testability`** (CQ5) — verify tests for code changes meet the mandate map / coverage floor. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
+- **`hatch3r-security`** (CQ3) — security review of code changes. Audit data flows, access control, input validation, and secret management.
 ### Always Evaluate (Spawn When Applicable)
@@ -75,17 +77,18 @@ After the review loop is clean, spawn specialist agents in parallel via the Task
 ### Conditional Specialists (Spawn When Triggered)
 - **`hatch3r-lint-fixer`** — spawn when lint errors are present after fix implementation (Step 6 lint-fixer may have missed errors introduced by other sub-agents).
-- **`hatch3r-a11y-auditor`** — spawn when the diff includes UI component changes (`area:ui` or `area:a11y` label on linked issues, or component/style files in the diff).
-- **`hatch3r-perf-profiler`** — spawn when the diff includes hot-path changes (`area:performance` label on linked issues, or changes to database queries, API handlers, rendering loops).
+- **`hatch3r-ui`** (CQ1) — spawn when the diff includes UI component changes (`area:ui` or `area:a11y` label on linked issues, or component/style files in the diff).
+- **`hatch3r-performance`** (CQ7) — spawn when the diff includes hot-path changes (`area:performance` label on linked issues, or changes to database queries, API handlers, rendering loops).
 ### Specialist Prompt Requirements
 Each specialist sub-agent prompt MUST include:
-- The agent protocol to follow (e.g., "Follow the hatch3r-test-writer agent protocol").
+- The agent protocol to follow (e.g., "Follow the hatch3r-testability agent protocol").
 - All `scope: always` rule directives from `rules/` (sub-agents do not inherit rules automatically).
 - The diff or file changes to review.
 - The linked issue's acceptance criteria (if available).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+- `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID) — the sub-agent echoes it in logs, outputs, and status reports for cross-phase attribution.
 Await all specialist sub-agents. Apply their feedback (fixes, additional tests, documentation updates). Re-run quality gates (7a) if changes were made.