npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/rules/hatch3r-agent-orchestration.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 id: hatch3r-agent-orchestration
 type: rule
-description: Mandatory agent delegation, skill loading, and subagent usage directives for ALL tasks in ALL contexts
+description: Mandatory agent delegation, skill loading, and sub-agent usage directives for ALL tasks in ALL contexts
 scope: always
 tags: [orchestration, floor:protocol]
 precedence: high
@@ -10,99 +10,62 @@ cache_friendly: true
 ---
 # Agent Orchestration
-This rule governs when and how to delegate work to hatch3r agents, load skills, and spawn subagents. These directives are mandatory — not suggestions. For extended reference on pipeline context schemas, resilience/failure handling, and observability, see `hatch3r-agent-orchestration-detail`.
-## Orchestration Differentiation
-Hatch3r's orchestration uses a **phase-gated pipeline** (Research, Implement, Review, Quality) with **structured handoffs** via `PipelineContext` and a **mandatory review gate** before the quality phase. This is not free-form agent chat.
+This rule governs when and how to delegate work to hatch3r agents, load skills, and spawn sub-agents — mandatory directives, not suggestions. Hatch3r orchestration is a **phase-gated pipeline** (not free-form agent chat) with **structured handoffs** via `PipelineContext` and a **mandatory review gate** before the quality phase. For extended reference (PipelineContext schemas, resilience/failure handling, observability), see `hatch3r-agent-orchestration-detail`.
 ## Universal Applicability
-This rule applies to EVERY context without exception: board-pickup (epic, sub-issue, standalone, batch), workflow command (full/quick), plain chat, issue references, and natural language requests. The full sub-agent pipeline is mandatory — never implement code inline without sub-agents.
+This rule applies to EVERY context without exception: board-pickup (epic, sub-issue, standalone, batch), workflow command (full/quick), plain chat, issue references, and natural-language requests. Every task MUST follow the four-phase pipeline — **Phase 1 Research** (`hatch3r-researcher`), **Phase 2 Implement** (`hatch3r-implementer`), **Phase 3 Review Loop** (`hatch3r-reviewer` + `hatch3r-fixer`), **Phase 4 Final Quality** (parallel specialists) — per Mandatory Delegation Directives below; never implement code inline without sub-agents.
 **"Inline implementation" defined.** Inline implementation means calling any code-writing tool — `Edit`, `Write`, `MultiEdit`, `NotebookEdit`, `replace_string_in_file`, `multi_replace_string_in_file`, `create_file`, `str_replace_based_edit_tool`, `apply_patch`, or any platform equivalent — from the orchestrator turn itself, rather than from inside a spawned `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3) sub-agent. The only carve-out is `hatch3r-quick-change` for Tier 1 single-line trivial edits per its declared scope.
-## Universal Sub-Agent Pipeline
-Every task MUST follow this four-phase pipeline: **Phase 1 — Research** (`hatch3r-researcher`), **Phase 2 — Implement** (`hatch3r-implementer`), **Phase 3 — Review Loop** (`hatch3r-reviewer` + `hatch3r-fixer`), **Phase 4 — Final Quality** (parallel specialists). See Mandatory Delegation Directives below.
 ## Agent Roster
+Pipeline-phase agents (Phases 1-3):
 | Agent | Purpose | Invoke When |
 |-------|---------|-------------|
-| `hatch3r-researcher` | Context gathering (15 modes) | Always — before implementation (skip trivial edits) |
-| `hatch3r-implementer` | Single-task implementation | Always — one per task |
-| `hatch3r-reviewer` | Code review | Always — Phase 3 review loop |
+| `hatch3r-researcher` | Context gathering (15 modes) | Phase 1 — before implementation (skip trivial edits) |
+| `hatch3r-implementer` | Single-task implementation | Phase 2 — one per task |
+| `hatch3r-reviewer` | Code review | Phase 3 — review loop |
 | `hatch3r-fixer` | Fix reviewer findings | Phase 3 — Critical/Warning findings |
-| `hatch3r-test-writer` | Tests | Always — Phase 4 (every code change; skip per Phase Skip Criteria) |
-| `hatch3r-security-auditor` | Security review | Always — Phase 4 (every code change; skip per Phase Skip Criteria) |
-| `hatch3r-docs-writer` | Documentation | Phase 4 — evaluate when APIs/architecture/UX affected |
-| `hatch3r-lint-fixer` | Lint/type fixes | Conditional — lint errors present |
-| `hatch3r-a11y-auditor` | WCAG AA checks | Conditional — UI/accessibility changes |
-| `hatch3r-perf-profiler` | Performance profiling | Conditional — performance-sensitive changes |
-| `hatch3r-dependency-auditor` | CVE/supply chain | Conditional — dependencies change |
 | `hatch3r-ci-watcher` | CI failure diagnosis | Conditional — CI fails |
-| `hatch3r-architect` | Architecture design | Conditional — architectural decisions needed |
-| `hatch3r-devops` | CI/CD and deployment | Conditional — infrastructure tasks |
+Phase 4 specialists (docs-writer, lint-fixer, architect, devops, and the CQ1-CQ9 vector specialists ui/ux/security/reliability/testability/scalability/performance/maintainability/enhancability) and their trigger conditions are enumerated once in the Phase 4 Specialist Trigger Table below. That table mirrors the single source of truth `src/pipeline/pipelineContext.ts::SPECIALIST_TRIGGER_TABLE` — add a specialist there first, never here.
 ## Deep Context Integration
 Score task complexity per the `hatch3r-deep-context` rule before Phase 1. Apply the resulting tier:
-- **Tier 2 hard gate (B1).** Before Phase 2, run `hatch3r-researcher` with `requirements-elicitation:quick` mode to detect ambiguity and ask the user via the platform-native question tool (per `agents/shared/user-question-protocol.md`). Orchestrator awaits answers and integrates them into the Phase 1 brief; do not begin Phase 2 with unresolved questions. Tier 1 is exempt only when scope is single-file, single-concern, and acceptance is testable from the user message alone.
+- **Tier 2 hard gate (B1).** Before Phase 2, run `hatch3r-researcher` with `requirements-elicitation:quick` mode to detect ambiguity. The researcher sub-agent emits the elicited questions as a structured list in its result — it does NOT call the platform-native question tool (a spawned sub-agent has no interactive surface). The **orchestrator** renders that list to the user via the platform-native question tool (per `agents/shared/user-question-protocol.md`), mirroring `commands/hatch3r-workflow.md` Phase 1 "Present the `requirements-elicitation` questions inline and await answers". Orchestrator awaits answers and integrates them into the Phase 1 brief; do not begin Phase 2 with unresolved questions. Tier 1 is exempt only when scope is single-file, single-concern, and acceptance is testable from the user message alone.
 - **Tier 3 (Deep):** Present Pre-Implementation Summary and ASK for confirmation. Do NOT proceed until all unresolved questions are answered.
+**Tier-to-Phase-4 specialist depth mapping** (Finding D7-M9 / D7-SA7.4-1). Deep-context tier drives Phase 1 researcher depth; the same tier drives Phase 4 specialist depth so quality coverage scales with task risk: Tier 1 → run only the always-mode floor (`hatch3r-security` + `hatch3r-testability`) at `quick` depth — UI/perf/maintainability/etc. specialists are skipped per Phase Skip Criteria. Tier 2 → always-mode floor at `standard` depth + every triggered conditional specialist at `quick` depth. Tier 3 → every applicable specialist at `deep` depth — full WCAG AA / OWASP ASI / CWV / mandate-map sweep with N=3 sampling on always-mode specialists when `floor:security` items are touched (per `agents/shared/quality-charter.md` -> Non-Determinism Budget). The depth signal rides on the specialist prompt as the explicit field `depth: quick | standard | deep`; specialists read it via the shared `agents/shared/quality-specialist-frame.md`. Tier also drives **model class** as a first-order effort lever (Tier 1 → economy, Tier 2 → default, Tier 3 → strongest), resolved per-adapter against `models.default` / `src/models/resolve.ts` and ignored where an adapter has no model-routing surface — see `hatch3r-deep-context` -> Tier Assignment.
 ## Mandatory Delegation Directives
 ### Context Gathering (Before Implementation)
-Spawn `hatch3r-researcher` before implementing any task. Skip only for trivial single-line edits. Select modes by task type, then add tier-appropriate modes per Deep Context Integration:
-- **`type:bug`**: `symptom-trace`, `root-cause`, `codebase-impact` + tier modes
-- **`type:feature`**: `codebase-impact`, `feature-design`, `architecture` + tier modes
-- **`type:refactor`**: `current-state`, `refactoring-strategy`, `migration-path` + tier modes
-- **`type:qa`**: `codebase-impact` + tier modes
-Use depth `quick` for low-risk, `standard` for medium-risk, `deep` for high-risk. Tier 3 always uses `deep` depth.
+Spawn `hatch3r-researcher` before implementing any task (skip only for trivial single-line edits). Select modes by task type plus tier-appropriate modes per Deep Context Integration: `type:bug` → `symptom-trace`, `root-cause`, `codebase-impact`; `type:feature` → `codebase-impact`, `feature-design`, `architecture`; `type:refactor` → `current-state`, `refactoring-strategy`, `migration-path`; `type:qa` → `codebase-impact`. Depth: `quick` low-risk, `standard` medium-risk, `deep` high-risk; Tier 3 always `deep`.
 ### Research Completeness Checklist
-Before Phase 1 to Phase 2 handoff, verify:
-- [ ] **All affected files identified** — files to be created, modified, or deleted are listed.
-- [ ] **Blast radius assessed** — downstream consumers and integration points documented.
-- [ ] **Existing tests located** — test files covering affected code identified (or absence noted).
-- [ ] **Dependencies mapped** — internal and external dependencies enumerated.
-If any item is unconfirmed, re-run researcher with additional modes or surface to user.
+Before Phase 1 to Phase 2 handoff, verify all four: (1) **affected files identified** (create/modify/delete listed); (2) **blast radius assessed** (downstream consumers + integration points documented); (3) **existing tests located** (or absence noted); (4) **dependencies mapped** (internal + external). If any item is unconfirmed, re-run researcher with additional modes or surface to user.
 ### Implementation Delegation
-Spawn `hatch3r-implementer` via Task tool for ALL code changes. Never implement inline.
-- **Single issue**: One implementer. Orchestrator owns git/PR/board.
-- **Plain chat task**: One implementer. Create synthetic issue context first.
-- **Epics**: One implementer per sub-issue, level-by-level respecting dependency order.
-- **Batch**: Group by dependency level, one implementer per issue, shared branch + combined PR.
-**Implementer prompt enrichment (Tier 2+):** Include `similar-implementation` findings as "Reference Conventions", resolved `requirements-elicitation` answers as "Resolved Requirements", and blast radius data (Tier 3 only).
+Spawn `hatch3r-implementer` via Task tool for ALL code changes; never implement inline. **Single issue / plain chat task:** one implementer (orchestrator owns git/PR/board; plain chat creates synthetic issue context first). **Epics:** one implementer per sub-issue, level-by-level respecting dependency order. **Batch:** group by dependency level, one implementer per issue, shared branch + combined PR. **Prompt enrichment (Tier 2+):** include `similar-implementation` findings as "Reference Conventions", resolved `requirements-elicitation` answers as "Resolved Requirements", and blast radius (Tier 3 only).
 ### Per-Turn Pipeline-State Header
-Whenever a tracked task is active at Tier 2 or Tier 3 (deep-context score >= 3), the orchestrator MUST emit a single-line pipeline-state header at the very start of every assistant turn that touches the task. Format:
+Whenever a tracked task is active at Tier 2 or Tier 3 (deep-context score >= 3), the orchestrator MUST emit a single-line pipeline-state header at the start of every assistant turn that touches the task. Format (next can also be `user-confirmation` or `complete`):
 ```
-[hatch3r-pipeline: phase {1|2|3|4} | last: {agent} → {SUCCESS|PARTIAL|FAILED|BLOCKED|n/a} | next: {agent or "user-confirmation" or "complete"}]
+[hatch3r-pipeline: phase {1|2|3|4} | last: {agent} → {SUCCESS|PARTIAL|FAILED|BLOCKED|n/a} | next: {agent}]
 ```
-Examples:
-- `[hatch3r-pipeline: phase 1 | last: n/a | next: hatch3r-researcher]`
-- `[hatch3r-pipeline: phase 2 | last: hatch3r-researcher → SUCCESS | next: hatch3r-implementer]`
-- `[hatch3r-pipeline: phase 3 | last: hatch3r-reviewer → PARTIAL | next: hatch3r-fixer]`
-- `[hatch3r-pipeline: phase 3 | last: hatch3r-implementer → SUCCESS | next: user-confirmation]`
+Example: `[hatch3r-pipeline: phase 2 | last: hatch3r-researcher → SUCCESS | next: hatch3r-implementer]`
-A missing header on a tracked Tier >= 2 task is a self-detectable drift signal — the user may halt the turn and request re-grounding. The header also functions as a per-reply cache prime: rendering it forces the orchestrator to re-resolve which phase it is in before choosing tools. Tier 1 tasks, read-only answers, and chat-only iterations do NOT require the header.
+A missing header on a tracked Tier >= 2 task is a self-detectable drift signal — the user may halt and re-ground. The header also primes the orchestrator to re-resolve its phase before choosing tools. Tier 1, read-only, and chat-only turns do NOT require it.
 ### End-of-Turn Delegation Attestation
@@ -120,30 +83,18 @@ inline_edits_by_orchestrator: none | <carve-out: hatch3r-quick-change Tier-1 + q
 Rules:
-- Each `files_mutated_this_turn` row MUST cite the spawning sub-agent invocation and quote the `delegation_proof_id` returned by that sub-agent verbatim. Unattributable rows are self-declared P8 B2 violations and the orchestrator MUST queue re-delegation in the next turn.
-- `inline_edits_by_orchestrator: none` is the only acceptable value outside the `hatch3r-quick-change` Tier-1 carve-out declared in the "Inline implementation" definition above.
-- Tier 1 read-only and chat-only turns are exempt — same scope as the Per-Turn Pipeline-State Header.
-- Missing block on a Tier >= 2 mutating turn is a self-detectable drift signal — the user may halt the turn and re-ground per the same protocol as the missing-header signal.
-- The block is consumed by reviewers and the next orchestrator turn; it sits beside the Iteration Summary, not inside it, preserving the existing 5-field iteration-summary contract verbatim.
+- Each `files_mutated_this_turn` row MUST cite the spawning sub-agent invocation and quote its `delegation_proof_id` verbatim. Unattributable rows are self-declared P8 B2 violations; the orchestrator MUST queue re-delegation next turn.
+- `inline_edits_by_orchestrator: none` is the only value accepted outside the `hatch3r-quick-change` Tier-1 carve-out (per the "Inline implementation" definition above).
+- Tier 1 read-only and chat-only turns are exempt (same scope as the Per-Turn Pipeline-State Header); a missing block on a Tier >= 2 mutating turn is a self-detectable drift signal — halt and re-ground per the missing-header protocol.
+- The block is consumed by reviewers and the next orchestrator turn; it sits beside the Iteration Summary, not inside it, preserving the 5-field iteration-summary contract verbatim.
 ### Mandatory Delegation Directive (No Inline Implementation)
-Restating with maximum clarity for sub-agent prompt inclusion: the orchestrator MUST NOT call `Edit`, `Write`, `MultiEdit`, `NotebookEdit`, `replace_string_in_file`, `multi_replace_string_in_file`, `create_file`, `str_replace_based_edit_tool`, `apply_patch`, or any platform-equivalent code-writing tool from its own turn. The only path for code mutation is the Task tool spawning `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3). Carve-out: `hatch3r-quick-change` Tier 1 trivial items per its declared scope. No other carve-out exists. Violations are bypass mode (see issue #73) — surface them by halting the turn and re-delegating.
+For sub-agent prompt inclusion: the orchestrator MUST NOT call any code-writing tool (enumerated under "Inline implementation" above) from its own turn. The only path for code mutation is the Task tool spawning `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3). Sole carve-out: `hatch3r-quick-change` Tier 1 trivial items per its declared scope. Violations are bypass mode (issue #73) — halt the turn and re-delegate.
 ### Mid-Implementation Research Gap Checkpoint
-At the midpoint of Phase 2 (after initial files are modified but before completion), the implementer MUST evaluate whether research gaps exist. This prevents discovering missing context too late in the pipeline.
-**Checkpoint triggers:**
-1. Implementation requires modifying a file not listed in `researchFindings.affectedFiles`.
-2. An undocumented dependency or integration point is discovered.
-3. The implementer's confidence drops below "medium" for any sub-task.
-4. A test file expected from research does not exist or covers different behavior.
-**Actions when gaps are detected:**
-- Log the gap in `PipelineContext.researchGaps`.
-- If the gap is blocking (cannot proceed without the missing context): pause implementation, surface the gap to the orchestrator, and request a targeted re-run of `hatch3r-researcher` with the specific modes needed.
-- If the gap is non-blocking (can proceed with assumptions): document the assumption, continue implementation, and flag for reviewer attention in Phase 3.
+At the Phase 2 midpoint (after initial files modified, before completion), the implementer MUST evaluate research gaps to avoid discovering missing context too late. **Triggers:** modifying a file not in `researchFindings.affectedFiles`; an undocumented dependency/integration point; confidence dropping below "medium" on any sub-task; an expected test file missing or covering different behavior. **Actions:** log the gap in `PipelineContext.researchGaps`; if blocking, pause and request a targeted `hatch3r-researcher` re-run with the needed modes; if non-blocking, document the assumption, continue, and flag for Phase 3 reviewer attention.
 ### Per-Task Mini-Review
@@ -153,149 +104,96 @@ For multi-sub-task implementations, the implementer performs a lightweight mini-
 **Phase 3 — Review Loop:**
-1. Spawn `hatch3r-reviewer` with diff and acceptance criteria. Reviewer includes blast radius summary.
-2. Critical/Warning findings: spawn `hatch3r-fixer` with full reviewer output.
-3. Re-review after fixes. Repeat until 0 Critical + 0 Warning, or max 4 iterations (matches `DEFAULT_MAX_REVIEW_ITERATIONS` in `src/pipeline/reviewLoop.ts`; raised from 3 to 4 in Cycle 7.5 W2B2 finding H26 so the oscillation detector becomes reachable in default config). The rule default and the code constant are kept in sync by `src/__tests__/pipeline/reviewLoop.test.ts` (CI-enforced).
-4. **Confirmation pass** after clean review: lightweight re-review for fix-driven regressions and acceptance criteria completeness. The confirmation pass checks only: (a) no new test failures compared to Phase 2 baseline, (b) no type errors introduced, (c) acceptance criteria from the issue are still met. It does not re-run the full review checklist.
-5. Max iterations reached: surface to user with a structured summary: iteration count, remaining Critical findings (with file:line), remaining Warning findings, and a recommendation (fix manually vs. accept risk). Never present raw reviewer output without summarization.
-6. **Review gate confidence signal:** When the review loop exits with a clean verdict, record the iteration count in `PipelineContext.reviewResult.iterations`. Clean-on-first-pass (iteration 1) signals higher confidence than clean-after-multiple-iterations (iteration 2-3). Phase 4 specialists and the orchestrator should factor this into their risk assessment.
+1. Spawn `hatch3r-reviewer` with diff, acceptance criteria, and blast-radius summary.
+2. Critical/Warning findings: spawn `hatch3r-fixer` with full reviewer output, then re-review. Repeat until 0 Critical + 0 Warning, or max 4 iterations (matches `DEFAULT_MAX_REVIEW_ITERATIONS` in `src/pipeline/reviewLoop.ts`, raised 3→4 in Cycle 7.5 W2B2 H26 to reach the oscillation detector in default config; kept in sync by `src/__tests__/pipeline/reviewLoop.test.ts`, CI-enforced).
+   - **Re-run honor-rule (anti-self-approval, F15.2-H2 / D13-SA13.2-F6 / D15-SA15.2-F3).** The fixer's `Reviewer re-run required` boolean is advisory; the orchestrator derives the authoritative value `reRunRequired = (fixer Files changed list is non-empty)`. When `true`, another `hatch3r-reviewer` pass is MANDATORY before the loop may exit clean — a fixer `Status: SUCCESS` with a non-empty `Files changed` can never self-approve to a clean exit. A `Reviewer re-run required: false` printed alongside a non-empty `Files changed` is overridden to `true` and noted as a self-declared protocol violation. The `Files changed` list is SSOT-bound: it is attested by the same fixer `delegation_proof_id` the orchestrator quotes in its End-of-Turn Delegation Attestation, so the signal cannot be forged without spawning the fixer.
+3. **Confirmation pass** after clean review: lightweight re-review checking only (a) no new test failures vs Phase 2 baseline, (b) no type errors introduced, (c) acceptance criteria still met. Does not re-run the full checklist.
+   - **Runtime calibration second pass (orchestrator-owned).** At this would-be-clean exit, the orchestrator — not the stateless reviewer — evaluates the `rules/hatch3r-reviewer-calibration.md` trigger: read the cross-run `consecutive_clean_pass_count` from `.hatch3r/calibration-state.json`, increment it for this clean run, and fire a second-pass review (different model class, else same-class re-roll) when the post-increment count is a multiple of `N` (default 5) OR — for a high-risk diff (`floor:security` / auth / CQ3-security-dispatch files) — on the first clean PASS. A divergent second pass reverts the exit to step 2 (`REQUEST CHANGES`) and resets the count to 0; persist the count (atomic write via `src/merge/safeWrite.ts`) and append the calibration-log record. A REQUEST CHANGES or DESIGN_OBJECTION at any iteration also resets the persisted count to 0.
+4. Max iterations reached: surface to user with a structured summary (iteration count, remaining Critical findings with file:line, remaining Warnings, fix-manually-vs-accept-risk recommendation). Never present raw reviewer output unsummarized.
+5. **Review gate confidence signal:** on a clean verdict, record the iteration count in `PipelineContext.reviewResult.iterations`. Clean-on-first-pass signals higher confidence than clean-after-multiple-iterations; Phase 4 and the orchestrator factor this into risk assessment.
 **Phase 4 — Final Quality** (after review loop is clean):
-Launch Phase 4 specialists in parallel, bounded by `max_phase4_parallel` (default `3`, override via `HATCH3R_MAX_PHASE4_PARALLEL` env var; valid range 1-16, values outside the range fall back to default with a logged warning). The bound exists to cap per-orchestrator concurrent context cost — it does not soften the P8 B2 directive that fan-out scales with task decomposition. When the number of applicable specialists exceeds `max_phase4_parallel`, batch them by severity-descending priority: `CRITICAL → HIGH → MEDIUM → LOW` (severity is the worst-case finding class the specialist is expected to surface, per the `hatch3r-test-writer` / `hatch3r-security-auditor` always-on baseline → CRITICAL, conditional UI/security/perf → HIGH, docs/lint → MEDIUM, low-impact specialists → LOW). Within the same severity bucket, dispatch order is the trigger-table order in the table above. Each batch runs to completion (all specialists return SUCCESS/PARTIAL/FAILED) before the next batch starts; the validation pass below runs once after the final batch.
+Launch Phase 4 specialists in parallel, bounded by an orchestrator-honored fan-out width `max_phase4_parallel` (default `8` — covers the empirical maximum of applicable specialists per the trigger table, so a typical Tier 3 change fans out in at most 2 batches). This bound is LLM-honored orchestrator guidance, not a code-enforced cap: the host Task tool is the actual dispatcher and applies no platform fan-out limit, so no hatch3r module reads an env var or clamps the count — the orchestrator self-limits per this prose. The bound exists for upstream provider rate-limit headroom (RPM/TPM) — a true dependency edge — NOT per-orchestrator context cost; token cost never serializes independent work (P8 dominates P7). A non-rate-limited orchestrator MAY raise the width up to the full applicable-specialist set. **Rate-limit back-off (orchestrator-LLM guidance):** when the orchestrator observes ≥3 consecutive rate-limit-class transient failures, reduce the active fan-out width by 1 for the next batch and record the back-off in the Iteration Summary; never silently cap a healthy run. When applicable specialists exceed the bound, batch by severity-descending priority `CRITICAL → HIGH → MEDIUM → LOW` (severity is the worst-case finding class the specialist surfaces: always-on testability (CQ5) / security (CQ3) → CRITICAL, conditional CQ1/CQ4/CQ7 (ui/reliability/performance) → HIGH, docs/lint → MEDIUM, low-impact → LOW); within a bucket, dispatch in trigger-table order. Each batch runs to completion before the next starts; the validation pass runs once after the final batch. The applicable specialists and their trigger conditions are listed in the Phase 4 Specialist Trigger Table below.
-- **Always** (except when Phase Skip Criteria applies — see below)**:** `hatch3r-test-writer`, `hatch3r-security-auditor`
-- **Evaluate:** `hatch3r-docs-writer` (when APIs/architecture/UX affected)
-- **Conditional:** `hatch3r-lint-fixer`, `hatch3r-a11y-auditor`, `hatch3r-perf-profiler`, `hatch3r-dependency-auditor`, `hatch3r-architect`, `hatch3r-devops`
+**Specialist Prompt Enrichment:** When spawning Phase 4 specialists, include the Phase 2 `filesChanged` list (focus on affected code), the Phase 3 review verdict summary (avoid re-flagging reviewed issues), and `researchFindings.blastRadius` (assess downstream impact).
-**Specialist Prompt Enrichment:** When spawning Phase 4 specialists, include:
-- The `filesChanged` list from Phase 2 so specialists focus on affected code.
-- The review verdict summary from Phase 3 so specialists do not re-flag already-reviewed issues.
-- The `researchFindings.blastRadius` so specialists can assess downstream impact of their changes.
+**Runtime trigger evaluation (D6-M11):** the orchestrator harness calls `shouldTriggerSpecialist(specialist, changedFiles, projectType)` from `src/pipeline/pipelineContext.ts` to evaluate whether each specialist applies to the current change set. The function returns `{ triggered: boolean, reasons: string[] }` and consumes the same `SPECIALIST_TRIGGER_TABLE` that the prose table below mirrors. Treat the prose as a quick reference; treat the TS predicate as the authoritative gate. `npm run validate:specialist-roster` enforces parity.
 **Phase 4 Specialist Trigger Table:**
 | Specialist | Mode | Trigger Conditions |
 |-----------|------|--------------------|
-| `hatch3r-test-writer` | Always | Any code change; may skip per Phase Skip Criteria |
-| `hatch3r-security-auditor` | Always | Any code change; may skip per Phase Skip Criteria |
 | `hatch3r-docs-writer` | Evaluate | Public API, architecture, or UX changes |
 | `hatch3r-lint-fixer` | Conditional | Lint/type errors present |
-| `hatch3r-a11y-auditor` | Conditional | UI/accessibility changes |
-| `hatch3r-perf-profiler` | Conditional | Performance-sensitive changes |
-| `hatch3r-dependency-auditor` | Conditional | Dependency files modified (package.json, go.mod, Cargo.toml, requirements.txt, Gemfile, pom.xml, pubspec.yaml, mix.exs, composer.json, and their lockfiles) |
 | `hatch3r-architect` | Conditional | Architectural decisions, new modules/services |
 | `hatch3r-devops` | Conditional | CI/CD or infrastructure changes |
+| `hatch3r-ui` (CQ1) | Conditional | UI component / theme / token files modified (`*.{tsx,jsx,vue,svelte}`, `tailwind.config.*`, design-token registries) |
+| `hatch3r-ux` (CQ2) | Conditional | Flow / modal / route-transition / error-state files modified; microcopy or i18n strings changed |
+| `hatch3r-security` (CQ3) | Always | Any code change (always-mode floor — absorbs legacy security-auditor scope); auth / JWT / OAuth / WebAuthn code modified; release workflow modified; cookie or session handling modified; dependency files modified |
+| `hatch3r-reliability` (CQ4) | Conditional | Service handler / OTel / SLO / retry / circuit-breaker code modified; Kubernetes probe manifests modified |
+| `hatch3r-testability` (CQ5) | Always | Any code change (always-mode floor — absorbs legacy test-writer scope); test code added or modified; mandate-map feature class introduced; coverage threshold or runner config modified |
+| `hatch3r-scalability` (CQ6) | Conditional | Request handler / queue client / connection-pool / cache / background-job code modified |
+| `hatch3r-performance` (CQ7) | Conditional | ORM query / data-access layer / UI-rendering component / bundle config modified; vendor dependency >50KB introduced |
+| `hatch3r-maintainability` (CQ8) | Conditional | Any code mutation (duplication + complexity scan); schema / migration / API spec (OpenAPI / GraphQL SDL / Protobuf) modified |
+| `hatch3r-enhancability` (CQ9) | Conditional | User-visible behavior modified; public API surface modified (OpenAPI / GraphQL SDL / AsyncAPI); config schema or feature-flag definition modified |
-**Project-Type-Aware Specialist Selection:**
+**CQ specialist consolidation.** Each CQ-vector specialist owns the full scope previously split between a legacy specialist and the CQ row. `ui` (CQ1) covers axe-core + design-token + four-state + reuse plus deep ARIA / reduced-motion; `security` (CQ3) covers OAuth 2.1 + OIDC + DPoP, SBOM/cosign, OWASP ASI plus the always-on security floor and project-specific deep audits; `performance` (CQ7) covers CWV, p95/p99, bundle size, N+1 plus profile-driven hot-path analysis; `testability` (CQ5) covers mandate-map verification plus test authoring. Per-agent boundaries are documented in each agent file's opening section.
-When `PipelineContext.projectType` is available (populated from repo analysis), use the detected languages and frameworks to enrich specialist prompts with language-specific hints. For example:
-- **TypeScript/JavaScript:** Include strict mode checks for lint-fixer, framework-specific test patterns for test-writer.
-- **Python:** Include ruff/mypy hints for lint-fixer, pytest patterns for test-writer, SSTI/SQLi checks for security-auditor.
-- **Go:** Include golangci-lint for lint-fixer, govulncheck for security-auditor, table-driven test patterns for test-writer.
-- **Rust:** Include clippy lints for lint-fixer, cargo-audit for security-auditor.
+**Verification harness binding (CQ specialist → verify skill).** A CQ specialist runs its matching verify-class skill as the pass/fail evidence harness for its Phase 4 gate, so audit semantics are not re-authored in two places (D16.3): `ui` (CQ1) + `ux` (CQ2) → `hatch3r-ui-ux-verify`; `reliability` (CQ4) → `hatch3r-reliability-verify` + `hatch3r-observability-verify` (the latter covers OTel span / trace-id correlation on the request path). `hatch3r-qa-validation` (no 1:1 CQ specialist — release/acceptance E2E) and `hatch3r-browser-verify` (multi-purpose Playwright tool, default-ON per UI-affecting invocation) stay standalone harnesses invoked by the orchestrator, not bound to one CQ row. The reciprocal "Invoked by" upstream-citation lives in each verify skill's `## Invoked by` subsection.
-See `src/pipeline/pipelineContext.ts` for the full `LANGUAGE_SPECIALIST_CONFIGS` mapping.
+**Project-Type-Aware Specialist Selection:** When `PipelineContext.projectType` is available, enrich specialist prompts with language-specific hints (e.g., ruff/mypy + pytest + SSTI/SQLi for Python; golangci-lint + govulncheck for Go; clippy + cargo-audit for Rust). See `src/pipeline/pipelineContext.ts` `LANGUAGE_SPECIALIST_CONFIGS` for the full mapping.
 ### Phase 4 Validation Pass
-After all Phase 4 specialists complete, run a validation pass to catch regressions:
-1. Run test suite and type checker. Compare against Phase 3 baseline cached in `PipelineContext`.
-2. No new failures: proceed to completion.
-3. New failures: identify causing specialist, spawn `hatch3r-fixer`, re-validate (max 2 iterations).
-4. Persistent regressions: surface to user. Do not silently accept.
-5. If any specialist produced code fixes (not just findings), spawn a lightweight `hatch3r-reviewer` re-review scoped to files modified by Phase 4 specialists. This prevents specialist fixes from bypassing the Phase 3 review gate. Max 1 re-review iteration; Critical findings trigger a single fixer pass.
+After all Phase 4 specialists complete, run a validation pass: run the test suite + type checker against the Phase 3 baseline cached in `PipelineContext`. No new failures → complete. New failures → identify the causing specialist, spawn `hatch3r-fixer`, re-validate (max 2 iterations, matches `DEFAULT_MAX_VALIDATION_PASS_ITERATIONS` in `src/pipeline/pipelineContext.ts`; basis + recalibration triggers in `VALIDATION_PASS_CALIBRATION`; kept in sync by `src/__tests__/pipeline/pipelineContext.test.ts`, CI-enforced); persistent regressions surface to user (never silently accept). If any specialist produced code fixes (not just findings), spawn a lightweight `hatch3r-reviewer` re-review scoped to the specialist-modified files (prevents specialist fixes bypassing the Phase 3 gate; max 1 re-review iteration, Critical findings trigger a single fixer pass).
 ### Specialist Success Criteria
-| Specialist | Success Criterion |
-|-----------|-------------------|
-| `hatch3r-test-writer` | All new/modified code paths have tests; no untested branches in changed files. |
-| `hatch3r-security-auditor` | No HIGH/CRITICAL findings unresolved; MEDIUM findings documented with plan. |
-| `hatch3r-docs-writer` | Affected APIs, architecture, and UX changes reflected in docs. |
-| `hatch3r-lint-fixer` | Zero lint/type errors in changed files. |
-| `hatch3r-a11y-auditor` | WCAG AA compliance; no new a11y violations. |
-| `hatch3r-perf-profiler` | No performance regressions; new hot paths benchmarked. |
-| `hatch3r-dependency-auditor` | No known CVEs; license compatibility verified. |
-| `hatch3r-architect` | ADRs documented; design aligns with patterns or divergence justified. |
-| `hatch3r-devops` | CI/CD passes end-to-end; deployment config validated. |
+- **testability (CQ5):** all new/modified code paths have tests meeting the mandate map; no untested branches in changed files.
+- **security (CQ3):** no HIGH/CRITICAL findings unresolved; MEDIUM documented with plan; CQ3 thresholds (npm provenance, SBOM, SHA-pin, OWASP ASI) met for in-scope changes.
+- **docs-writer:** affected APIs, architecture, and UX reflected in docs. **lint-fixer:** zero lint/type errors in changed files.
+- **ui (CQ1):** WCAG AA compliance; no new a11y violations; design-token + four-state coverage; reuse-first delta.
+- **performance (CQ7):** no performance regressions; new hot paths benchmarked; CWV / p95/p99 / bundle-size budgets met.
+- **architect:** ADRs documented; design aligns with patterns or divergence justified. **devops:** CI/CD passes end-to-end; deployment config validated.
 ## Skill Loading Directives
-Load the matching skill before implementation:
-| Task Type | Skill |
-|-----------|-------|
-| `type:bug` | `hatch3r-bug-fix` |
-| `type:feature` | `hatch3r-feature` |
-| `type:refactor` + `area:ui` | `hatch3r-visual-refactor` |
-| `type:refactor` + behavior change | `hatch3r-logical-refactor` |
-| `type:refactor` (other) | `hatch3r-refactor` |
-| `type:qa` | `hatch3r-qa-validation` |
-Skill-referenced agent delegations are mandatory.
+Load the matching skill before implementation: `type:bug` → `hatch3r-bug-fix`; `type:feature` → `hatch3r-feature`; `type:refactor` + `area:ui` → `hatch3r-visual-refactor`; `type:refactor` + behavior change → `hatch3r-logical-refactor`; `type:refactor` (other) → `hatch3r-refactor`; `type:qa` → `hatch3r-qa-validation`. Skill-referenced agent delegations are mandatory.
 ## Subagent Spawning Protocol
-1. Use `subagent_type: "generalPurpose"` for all delegations.
-2. Include: agent protocol, applicable `scope: always` rules, tooling hierarchy, relevant learnings.
-3. Launch independent subagents in parallel — maximum parallelism.
-4. Await and review results. Surface BLOCKED or PARTIAL to user.
-## Parallel Safety
-This section documents when spawning multiple sub-agents concurrently is safe and when it must remain sequential.
-### Design Rationale
+Use `subagent_type: "generalPurpose"` for all delegations. Include the agent protocol (the hatch3r role id, e.g. `hatch3r-reviewer`, named in the prompt), applicable `scope: always` rules, tooling hierarchy, and relevant learnings. Launch independent sub-agents in parallel (maximum parallelism); await and review results, surfacing BLOCKED or PARTIAL to the user.
-The pipeline's default is **linear per task** — Phase 1 → Phase 2 → Phase 3 → Phase 4, serially. `PipelineContext` captures a single logical handoff token that flows through sub-agents in sequence. LLM-driven orchestrators reason better with sequential, linearly-ordered context. Within Phase 4, parallel specialists are safe because they operate on read-only artifacts. Extending parallelism to Phases 1-3 requires explicit conditions.
+**Tool-allowlist enforcement boundary (ASI02/ASI03).** The generic-spawn convention has a trust-boundary consequence the orchestrator MUST account for: the Claude Code PreToolUse hook (`src/pipeline/agentToolAllowlist.ts::buildClaudePreToolUseHookScript`) gates only when the payload `agent_type` starts with `hatch3r-`; a `generalPurpose` spawn carries Claude Code's own `agent_type` (`general-purpose`), so the runtime hook passes it through (this is intentional — Claude Code's built-in sub-agents must not be governed by hatch3r policy). Therefore the **active** allowlist enforcement for delegated hatch3r work is the orchestrator-boundary gate `checkToolAccess(roleId, toolCategory)` (`src/pipeline/agentToolAllowlist.ts`), which the orchestrator applies using the hatch3r role id it placed in the prompt — deny-by-default before forwarding a tool category to the sub-agent. The runtime PreToolUse hook is a defense-in-depth second layer that fires only for adapters/sessions that spawn role-bearing native sub-agents (`subagent_type: "hatch3r-<role>"`); under the generic-spawn default it does not fire, and the boundary gate is the sole enforcement point.
-### Parallel-Safe Operations
-1. **Phase 4 Specialists** — test-writer, security-auditor, docs-writer, lint-fixer, etc. Read-only input from Phase 3 completion; independent outputs; no shared state mutation.
-2. **Intra-Phase-1 Researcher Modes** — multiple modes on the SAME task (symptom-trace + root-cause + codebase-impact) when the task is self-contained. Operate on read-only codebase; produce independent findings.
-3. **Per-Module Phase 2 Fan-Out (Disjoint `affectedFiles`)** — multi-module tasks with non-overlapping file sets; each implementer commits independently; merged post-Phase-2.
-4. **Tier 2/3 Elicitation Researchers** — parallel researchers on the same task to surface different perspectives; outputs tagged with confidence + perspective; orchestrator aggregates.
-### NOT Parallel-Safe
+## Parallel Safety
-1. **Cross-Phase Execution** — Phase 1 must complete before Phase 2 (Phase 2 depends on researchFindings). Phase 2 must complete before Phase 3 (review needs diff). Phase 3 must complete before Phase 4 (quality checks assume review-clean state).
-2. **Phase 3 Review Loop Iterations** — reviewer → fixer → re-reviewer must be serial.
-3. **Overlapping-File Implementers** — two Phase 2 implementers touching the same file must execute serially or use a merge-conflict detection gate.
-4. **Shared `PipelineContext` Field Writers** — if multiple agents mutate `state` / `featureFlags` / `metadata`, serialize them. Parallel agents must only READ context.
-5. **Phase 4 Validation Re-Review** — the confirmation pass after Phase 4 specialists must run serially; it checks fix-driven regressions.
+Default is **linear per task** (Phase 1 → 2 → 3 → 4 serially) — `PipelineContext` is a single handoff token and LLM orchestrators reason better with sequential context. Phase 4 specialists parallelize because they read-only Phase 3 artifacts; extending parallelism to Phases 1-3 requires the conditions below.
 ### Three Conditions to Parallelize
-ALL three must hold:
-1. **Read-only or disjoint writes** — agents read-only from context OR write to disjoint files/fields (no conflict zone).
-2. **Deterministic aggregation** — outputs merge without orchestrator intervention (tests: pass if all pass; findings: union).
-3. **Overhead < savings** — coordination cost (merge, conflict detection) is less than latency savings (max-of-agents vs sum-of-agents).
+ALL three must hold: (1) **read-only or disjoint writes** (no conflict zone); (2) **deterministic aggregation** (outputs merge without orchestrator intervention — tests pass-if-all-pass, findings union); (3) **no shared mutable state** (agents that mutate `PipelineContext.state`/`featureFlags`/`metadata` serialize; parallel agents only READ).
-**Default:** When in doubt, serialize. For typical hatch3r tasks (1–5 sub-tasks) the DAG-scheduling overhead often outweighs concurrency gain.
+**Parallel-safe:** Phase 4 specialists; intra-Phase-1 researcher modes on a self-contained task; per-module Phase 2 fan-out on disjoint `affectedFiles` (merged post-Phase-2); Tier 2/3 elicitation researchers (outputs tagged with confidence + perspective). **NOT parallel-safe:** cross-phase execution (each phase depends on the prior's output); Phase 3 review-loop iterations (reviewer → fixer → re-reviewer serial); overlapping-file implementers (serialize or use a merge-conflict gate); Phase 4 validation re-review.
-### Cost-Dominance Principle
+**Cost-Dominance Principle.** Token cost of sub-agent invocation never justifies serialization of independent work. The three safety conditions govern WHEN parallelism is safe; cost does not govern WHETHER to parallelize. When in doubt, fan out. Serialization is only valid on true dependency edges.
-Token cost of sub-agent invocation never justifies serialization of independent work. The three safety conditions (read-only or disjoint writes, deterministic aggregation, no shared mutable state) govern WHEN parallelism is safe; cost does not govern WHETHER to parallelize. When in doubt, fan out. Serialization is only valid on true dependency edges.
+**Scaling Heuristic.** Sub-agent count tracks task decomposition: N independent modules → N parallel Phase-2 implementers; M specialist gates → M parallel Phase-4 specialists; K independent research questions → K parallel `hatch3r-researcher` sub-agents (one per question, findings unioned post-Phase-1). The K-parallel-researcher path is mandatory only when Phase 1 decomposes into ≥2 questions whose answers do not depend on each other; a single-question task keeps one researcher running its mode set serially. Orchestrators emit `sub_agents_spawned: {count, rationale}` in their structured output.
-### Scaling Heuristic
+### Concurrent Invocation Handling
-Sub-agent count tracks task decomposition: N independent modules → N parallel Phase-2 implementers; M specialist gates → M parallel Phase-4 specialists; K independent research questions → K parallel researcher modes. Orchestrators emit `sub_agents_spawned: {count, rationale}` in their structured output.
+Two top-level pipelines running at once (e.g. `hatch3r-workflow` in one shell, `hatch3r-board-pickup` in another) are bounded by the same three conditions, applied cross-pipeline (D7-SA7.5-F7.5.6): (1) **Advisory lock-note (best-effort, NOT atomic — D7-27)** — before Phase 1, the orchestrator writes/reads `.hatch3r/.lock` (JSON: `pid`, `command`, `branch`, `correlation_id`, `started_at`); clear on completion/abort; treat a note older than 6h as stale. This is an advisory coordination note, not a mutual-exclusion primitive: there is no `hatch3r` lock verb and no atomic acquire in `src/` (the only cross-process lock that ships is `acquireWriteLock` in `src/merge/safeWrite.ts`, scoped to single-file atomic writes, not top-level pipelines), so the LLM orchestrator's read-then-write is TOCTOU by construction and `src/cli/commands/status.ts` already labels this file "advisory". It exists to surface a likely collision, never to guarantee exclusion. (2) **Detect-then-warn** — if a live note names the same branch or an open `.hatch3r/hatch.json` board transaction, WARN and ASK (proceed on a separate branch / wait / abort); never silently co-mutate shared state. (3) **True isolation via worktree, not the lock-note (D7-27)** — when concurrency must actually be conflict-free rather than merely warned (the parallel-implementer path), route each pipeline through `hatch3r worktree-setup <name>` — the isolation primitive hatch3r already ships (`src/cli/commands/worktreeSetup.ts`; `commands/board/pickup-delegation-multi.md` already uses it per implementer) — so the pipelines write disjoint working trees and integrate back on completion, satisfying the disjoint-writes safety condition without relying on the non-atomic note. (4) **Cache-sharing** — `.hatch3r/learnings/` is read-many, write-once-at-completion with timestamp-ordered conflict resolution. Cross-task context sharing is bounded by the no-shared-mutable-state condition: learnings consolidate at pipeline completion; mid-pipeline writes are out of scope to preserve parallel-safety determinism (D7-SA7.5-F7.5.8). Each command's Guardrails cite this subsection.
 ## Cross-Phase Error Propagation
-When a phase produces a non-SUCCESS status, the orchestrator must propagate error context to downstream phases rather than silently dropping it:
+On a non-SUCCESS status, the orchestrator MUST propagate error context downstream, never silently drop it. **Phase 1 PARTIAL:** include `researchGaps` in the implementer prompt and set confidence expectations accordingly. **Phase 2 PARTIAL:** include `reason` + unimplemented acceptance criteria in the reviewer prompt (reviewer distinguishes "not done yet" from "done incorrectly"). **Phase 3 UNRESOLVED:** include the unresolved findings in Phase 4 specialist prompts (specialists must not conflict with known issues). **Phase 4 specialist FAILED:** include the failure reason when surfacing — never report "Phase 4 failed" without naming which specialist and why.
-1. **Phase 1 PARTIAL** (incomplete research): Include the `researchGaps` list in the implementer prompt so the implementer knows which areas lack verified context. Set implementer confidence expectations accordingly.
-2. **Phase 2 PARTIAL** (incomplete implementation): Include the `reason` field and list of unimplemented acceptance criteria in the reviewer prompt. The reviewer must distinguish between "not done yet" and "done incorrectly."
-3. **Phase 3 UNRESOLVED** (review loop exhausted): Include the unresolved findings list in the Phase 4 specialist prompts. Specialists must not introduce changes that conflict with known unresolved issues.
-4. **Phase 4 specialist FAILED**: Include the failure reason when surfacing to the user. Never report "Phase 4 failed" without specifying which specialist failed and why.
+**Sub-agent-failure handling (shared clause; all commands cite this — never inline).** When any spawned implementer/fixer/specialist sub-agent FAILS or returns no usable output: (1) retry once with the same prompt; (2) if the retry fails, re-spawn `hatch3r-fixer` (Phase 3) with the failure reason + partial output as failure context — `hatch3r-fixer` is the code-mutation path, so the work stays delegated; (3) if the re-spawn also fails, emit `BLOCKED_OTHER` with a one-sentence reason and ASK the user (fix-manually vs adjust-scope vs accept-risk). The orchestrator MUST NOT fall back to inline implementation — that is the issue #73 bypass mode (see Mandatory Delegation Directive). The sole exception is `hatch3r-quick-change`, whose Tier-1 carve-out permits inline retry per its declared scope.
 ## Correlation ID
-Generate a UUID v4 per top-level task before Phase 1. Include in every subagent prompt as `correlation_id`. All subagents include it in logs, outputs, and status reports. Epic sub-issues get individual IDs; batch tasks share one ID with a sub-task index.
+Generate a UUID v4 per top-level task before Phase 1. Include in every sub-agent prompt as `correlation_id`. All sub-agents include it in logs, outputs, and status reports. Epic sub-issues get individual IDs; batch tasks share one ID with a sub-task index.
 ## Severity Scale
@@ -307,70 +205,46 @@ Generate a UUID v4 per top-level task before Phase 1. Include in every subagent
 | **LOW** | Track for future. Style nits, minor improvements. | Log only. No merge gate. |
 | **INFO** | Informational. Observations, suggestions. | Awareness only. |
-All subagents MUST map findings to this scale.
 ## Status Codes
-| Status | Meaning |
-|--------|---------|
-| **SUCCESS** | Fully completed, all criteria met. |
-| **PARTIAL** | Partially completed; include `reason` field. |
-| **FAILED** | No usable output; include `reason` field. |
-| **SKIPPED** | Intentionally not executed. |
-| **TIMEOUT** | Time budget exceeded; forward partial output. |
+All sub-agents MUST map findings to the Severity Scale above. **SUCCESS** (fully completed, all criteria met) · **PARTIAL** (include `reason`) · **FAILED** (no usable output; include `reason`) · **SKIPPED** (intentionally not executed) · **TIMEOUT** (time budget exceeded; forward partial output) · **BLOCKED_AMBIGUITY** · **BLOCKED_MISSING_CONTEXT** · **BLOCKED_CONFLICTING_SPECS** · **BLOCKED_MISSING_TOOL** · **BLOCKED_PREMISE_CHALLENGE** · **BLOCKED_OTHER** (one-sentence reason required). The six BLOCKED_* values are the canonical named escalation enum codified in `agents/shared/quality-charter.md` §17; every `agents/hatch3r-*.md` main agent MUST declare a Status field selecting from this enum. BLOCKED_PREMISE_CHALLENGE triggers `isHaltStatus()` from `src/pipeline/pipelineContext.ts::AgentStatus` — orchestrator halts and surfaces the premise concern + ≥1 alternative approach (Finding D7-M1 / D7-SA7.1-1). The reviewer's Phase-3 equivalent is the `DESIGN_OBJECTION` verdict; implementer/researcher/fixer emit the agent status, reviewer emits the verdict — both surface the same premise-challenge across non-overlapping phases.
+## Phase Handoff Contract
+Each phase populates a typed slice of `PipelineContext` (canonical schema: `src/pipeline/pipelineContext.ts::PipelineContext`). Required fields per transition: Phase 1 → 2 sets `researchFindings` (or `researchGaps[]` when skip-documented); Phase 2 → 3 sets `implementationResult` (filesChanged, testsWritten, status ∈ `AgentStatus`, reason); Phase 3 → 4 sets `reviewResult` (iterations, finalVerdict, findings, confirmationPassResult, confidence); Phase 4 → completion sets `qualityResults` (specialists[], validationPass). The typed gate `validatePhaseTransition(context, targetPhase, options?)` returns the `ValidationError[]` set the orchestrator MUST resolve before advancing; forwarding sub-agent output that omits a required field is a Phase Handoff Contract violation (Finding D7-M3 / D7-SA7.1-3) — re-spawn the upstream agent with the gap named. The Phase 3 → 4 advance rejects an `UNRESOLVED` verdict unless `options.allowUnresolvedAdvance` is set (the user-chose-manual skip condition, Finding D7-10) and accepts an absent `reviewResult`/`iterations: 0` only under `options.phase3Skipped` (the docs-only/trivial Phase 3 skip, Finding D7-11). Phase 4 completion is additionally gated by `evaluatePhase4Completion(qualityResults, options)` — a typed predicate aggregating Phase 4 Validation Pass criteria; when `complete: false`, surface `incompletionReason` (Finding D7-M8 / D7-SA7.3-3). **Handoff-loss trigger (Finding D7-23):** when a transition applies lossy compression (`hatch3r-agent-orchestration-detail` § Context-Degradation Policy), the orchestrator records `createPhaseHandoffMetrics` (passing the protected-byte count for never-truncate strategy-#4 bytes) and, when `informationLossEstimate > 0.3`, emits the `formatPhaseHandoffWarning` line in the iteration summary so downstream phases verify critical context survived — this is the command-layer trigger; every pipeline command inherits it via this always-loaded rule.
 ## Phase Skip Criteria
-Consistent criteria for when each pipeline phase can be safely skipped. All commands that use the pipeline MUST reference these criteria — do not invent command-specific skip rules.
+All commands that use the pipeline MUST reference these criteria — do not invent command-specific skip rules.
 | Phase | Can Skip When | Mandatory Minimum (even when skipped) |
 |-------|--------------|--------------------------------------|
 | **Phase 1 (Research)** | Trivial single-line edit (typo, comment, single-value config); Tier 1 single-file change with no cross-module impact; Research already cached in PipelineContext | Affected files identified (even via quick scan); existing tests noted |
 | **Phase 2 (Implement)** | Never — implementation is always required for code changes | All changes via hatch3r-implementer (never inline except trivial items in quick-change) |
 | **Phase 3 (Review)** | All items trivial (quick-change only); documentation-only change with no code | Quality checks (lint/typecheck/test) must pass; acceptance criteria verified |
-| **Phase 4 (Quality)** | Review loop unresolved AND user chose manual resolution; documentation-only; all trivial + quality checks pass (quick-change only) | test-writer + security-auditor always required for code changes; quality checks must pass |
+| **Phase 4 (Quality)** | Review loop unresolved AND user chose manual resolution; documentation-only; all trivial + quality checks pass (quick-change only) | testability (CQ5) + security (CQ3) always required for code changes; quality checks must pass |
 See `src/pipeline/pipelineContext.ts` for the programmatic `PHASE_SKIP_CRITERIA` constant.
 ## Root-Cause Depth Requirements
-When a pipeline phase reports a failure or unexpected result, the orchestrator must perform root-cause classification before deciding the next action:
+When a phase reports a failure or unexpected result, the orchestrator MUST classify root cause before the next action — reject the shallow fix, require the root-cause fix:
-| Symptom | Shallow Fix (avoid) | Root-Cause Fix (required) |
-|---------|---------------------|---------------------------|
-| Test failure after Phase 2 | Disable or skip the failing test | Identify why the implementation breaks the test -- fix the code or update the test with justification |
-| Lint errors after Phase 4 | Add `eslint-disable` comments | Fix the underlying code pattern that triggers the lint rule |
-| Type errors after fixer changes | Cast with `as any` | Trace the type mismatch to its source and fix the type definition or usage |
-| Review loop not converging | Surface to user after 3 iterations without analysis | Classify whether findings are oscillating (fixer A breaks what fixer B fixed) and surface the conflict pattern |
+- **Test failure after Phase 2:** not disable/skip the test — identify why the implementation breaks it; fix the code or update the test with justification.
+- **Lint errors after Phase 4:** not `eslint-disable` comments — fix the underlying code pattern.
+- **Type errors after fixer changes:** not `as any` casts — trace the mismatch to its source and fix the type definition or usage.
+- **Review loop not converging:** not surface after 3 iterations without analysis — classify whether findings oscillate (fixer A breaks what fixer B fixed) and surface the conflict pattern.
-The orchestrator must reject superficial fixes from any subagent. If a fixer's output contains suppression patterns (disable comments, `any` casts, test skips without linked issues), classify as PARTIAL and re-run with an adjusted prompt that requests a root-cause fix.
+Reject superficial fixes from any sub-agent. If a fixer's output contains suppression patterns (disable comments, `any` casts, test skips without linked issues), classify as PARTIAL and re-run with a prompt requesting a root-cause fix. This rejection is backed by a typed advisory: the reviewer runs `detectSuppressionPatterns(diff)` (`src/pipeline/reviewLoop.ts`) over the fixer diff — it flags `as any` casts, `eslint-disable` directives with no issue reference, and `test.skip`/`it.skip`/`describe.skip` with no linked issue. A non-empty `found` is the machine-checkable signal for this gate (mirroring how the orchestrator consults `detectOscillation`); the reviewer downgrades the verdict so the existing review loop forces the re-run.
 ## Task Context Protocols
-**Single-task plain chat:** Classify task type, create synthetic issue context, run full pipeline. For issue references, fetch details via platform CLI.
-**Multi-task plain chat:** Parse into discrete tasks, classify each, build dependency graph, parallelize researchers and implementers per dependency level, run review loop after all implementations, then Phase 4 specialists. When parallel implementers modify the same file: accept disjoint region edits, merge overlapping regions using the larger-scope change as base, and halt on semantic conflicts (contradictory interface/contract changes) for user resolution.
-**Auto-mode guardrails:** In unattended execution, verify scope containment, no unapproved destructive operations, and output schema compliance after each phase. Halt on violation. See `hatch3r-agent-orchestration-detail` for full guardrail specifications.
+**Single-task plain chat:** classify task type, create synthetic issue context, run the full pipeline (fetch issue-reference details via platform CLI). **Multi-task plain chat:** parse into discrete tasks, classify each, build a dependency graph, parallelize researchers + implementers per dependency level, run the review loop after all implementations, then Phase 4 specialists; when parallel implementers touch the same file, accept disjoint-region edits, merge overlapping regions using the larger-scope change as base, and halt on semantic conflicts for user resolution. **Auto-mode guardrails:** verify scope containment, no unapproved destructive operations, and output-schema compliance after each phase; halt on violation. Full specs in `hatch3r-agent-orchestration-detail`.
 ## Rule Application
-All `scope: always` rules apply to every task including subagent work. Include rule directives in subagent prompts.
-### Tiered Rule Inclusion
-**Tier 1 -- Always include (every subagent):**
-- `hatch3r-security-patterns` -- security invariants
-- `hatch3r-code-standards` -- code quality
-**Tier 2 -- Include by phase:**
-- `hatch3r-testing` -- test-writer, implementer, reviewer
-- `hatch3r-accessibility-standards` -- a11y-auditor, reviewer (UI)
-- `hatch3r-git-conventions` -- orchestrator git ops
-- `hatch3r-ci-cd` -- ci-watcher, devops
-- `hatch3r-dependency-management` -- dependency-auditor
-**Tier 3 -- On-demand:**
-- `hatch3r-api-design`, `hatch3r-secrets-management`, `hatch3r-data-classification`, `hatch3r-performance-budgets`, `hatch3r-browser-verification`, `hatch3r-component-conventions`, `hatch3r-i18n`, `hatch3r-theming`, `hatch3r-migrations`, `hatch3r-feature-flags`, `hatch3r-observability-logging`, `hatch3r-observability-metrics`, `hatch3r-observability-tracing`
+All `scope: always` rules apply to every task including sub-agent work; include rule directives in sub-agent prompts. For limited context windows, Tier 1 is mandatory; Tier 2/3 included selectively. Inclusion tiers:
-For limited context windows, Tier 1 is mandatory. Tier 2/3 included selectively by agent role and task scope.
+- **Tier 1 — always include (every sub-agent):** `hatch3r-security-patterns`, `hatch3r-code-standards`.
+- **Tier 2 — by phase:** `hatch3r-testing` (testability/implementer/reviewer); `hatch3r-accessibility-standards` (ui, UI reviewer); `hatch3r-git-conventions` (orchestrator git ops); `hatch3r-ci-cd` (ci-watcher/devops); `hatch3r-dependency-management` (security CQ3 supply-chain slice).
+- **Tier 3 — on-demand by role + scope:** `hatch3r-api-design`, `hatch3r-secrets-management`, `hatch3r-data-classification`, `hatch3r-performance-budgets`, `hatch3r-browser-verification`, `hatch3r-component-conventions`, `hatch3r-i18n`, `hatch3r-theming`, `hatch3r-migrations`, `hatch3r-feature-flags`, `hatch3r-observability-logging`, `hatch3r-observability-metrics`, `hatch3r-observability-tracing`.