npm - hatch3r - Versions diffs - 1.8.0 → 2.0.0 - Mend

hatch3r 1.8.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (396) hide show

package/dist/content/rules/hatch3r-agent-orchestration-detail.mdc ADDED Viewed

@@ -0,0 +1,245 @@
+---
+description: Extended orchestration reference — PipelineContext schemas, resilience protocols, observability integration, and auto-mode guardrails
+globs: ["**/.hatch3r/**", "**/pipeline/**", "**/*orchestrat*", "**/*agent*"]
+alwaysApply: false
+precedence: high
+detail_rule: true
+consumed_by: hatch3r-agent-orchestration
+---
+# Agent Orchestration — Extended Reference
+This is the on-demand companion to `hatch3r-agent-orchestration`. Load when you need detailed schemas, failure handling protocols, or guardrail specifications.
+## PipelineContext Schema
+The `PipelineContext` is the structured handoff object passed between pipeline phases. Each phase reads its inputs and writes its outputs to this context.
+```
+PipelineContext {
+  correlationId: string          // UUID v4, generated before Phase 1
+  taskType: "bug" | "feature" | "refactor" | "qa"
+  issueRef: string | null        // Issue number or null for plain chat
+  deepContextTier: 1 | 2 | 3    // Pre-Phase-1 baseline from hatch3r-deep-context scoring
+  // Mid-run tier upgrade (Finding D7-14): set when execution surfaces complexity
+  // the baseline missed (see Complexity-Driven Adaptation + Tier-upgrade propagation).
+  tierUpgrade?: { from: 1|2|3; to: 1|2|3; reason: string; atPhase: 1|2|3|4 }
+  // Detected project type for specialist selection (Finding #56)
+  projectType?: {
+    languages: string[]          // From repo analysis (e.g., "typescript", "python", "go")
+    frameworks: string[]         // Detected frameworks (e.g., "next", "express")
+    isMonorepo: boolean
+    packageManager: string       // "npm" | "yarn" | "pnpm" | "bun" | "unknown"
+  }
+  // Phase 1 outputs (Research)
+  researchFindings: {
+    modes: string[]              // Researcher modes used
+    affectedFiles: string[]      // Files to create/modify/delete
+    blastRadius: string[]        // Downstream consumers
+    existingTests: string[]      // Test files covering affected code
+    dependencies: string[]       // Internal + external dependencies
+    conventions: object | null   // From similar-implementation mode
+    resolvedRequirements: object | null  // From requirements-elicitation
+  }
+  // Research gap flags from mid-implementation checkpoint (Finding #52)
+  researchGaps?: string[]        // Gaps identified during Phase 2
+  // Phase 2 outputs (Implementation)
+  implementationResult: {
+    filesChanged: string[]
+    testsWritten: string[]
+    status: "SUCCESS" | "PARTIAL" | "FAILED" | "SKIPPED" | "TIMEOUT"
+    reason: string | null
+  }
+  // Phase 3 outputs (Review)
+  reviewResult: {
+    iterations: number           // 1 to code-class cap (DEFAULT_MAX_REVIEW_ITERATIONS - 1 = 3)
+    finalVerdict: "CLEAN" | "UNRESOLVED"
+    findings: ReviewFinding[]
+    confirmationPassResult: "PASS" | "FAIL"
+  }
+  // Phase 4 outputs (Quality)
+  qualityResults: {
+    specialists: SpecialistResult[]
+    validationPass: {
+      testsPass: boolean
+      typecheckPass: boolean
+      fixAttempts: number
+      regressionsPersist: boolean
+    }
+  }
+  // Metadata
+  startedAt: string              // ISO-8601
+  completedAt: string | null
+  totalDuration: number | null   // milliseconds
+}
+```
+The TypeScript implementation of this schema with runtime validation is in `src/pipeline/pipelineContext.ts`. Use `validatePhaseTransition()` to verify context completeness before advancing between phases.
+## Resilience and Failure Handling
+### Phase Failure Protocols
+| Phase | Failure Mode | Protocol |
+|-------|-------------|----------|
+| Phase 1 (Research) | Researcher timeout | Proceed with partial findings; flag missing modes. |
+| Phase 1 (Research) | No relevant findings | Surface to user; ask whether to proceed with implementation. |
+| Phase 2 (Implementation) | Build/test failure | Attempt self-fix (max 2 iterations). Escalate to user if unresolved. |
+| Phase 2 (Implementation) | Scope creep detected | Halt. Surface deviation to user. Resume only with approval. |
+| Phase 3 (Review) | Max iterations (3) (code-class cap = `DEFAULT_MAX_REVIEW_ITERATIONS` - 1) | Surface unresolved findings to user. Do not merge. |
+| Phase 3 (Review) | DESIGN_OBJECTION verdict | Terminate review loop immediately. Surface the objection and alternative approaches to the user for an architectural decision. Do not spawn fixer. |
+| Phase 3 (Review) | Fixer introduces regressions | Revert fixer changes. Surface original findings + regression to user. |
+| Phase 4 (Quality) | Conditional-specialist timeout | Log timeout. Continue with available results. Flag in output. |
+| Phase 4 (Quality) | Mandatory CQ3/CQ5 specialist non-completion (TIMEOUT / crash / no-output) | Fail closed. Surface `BLOCKED`. A mandatory always-mode specialist (`hatch3r-security` CQ3, `hatch3r-testability` CQ5 per `hatch3r-agent-orchestration.md` Phase 4 Specialist Trigger Table) that does not return `COMPLETE` leaves its gate absent; absence-of-finding is NOT an implicit pass. This row is the prose face of the typed gate `evaluatePhase4Completion` (`src/pipeline/pipelineContext.ts`): a non-SUCCESS floor specialist yields `mandatoryFloorsSatisfied: false` → `complete: false`. Do not merge/release. Require an explicit operator decision (re-run, accept-risk, or abort) before advancing. |
+| Phase 4 (Quality) | Validation pass fails | Spawn fixer (max 2 attempts). Surface if unresolved. |
+### Subagent Error Recovery
+1. **Timeout:** Forward partial output. Mark status `TIMEOUT`. Continue pipeline ONLY for conditional specialists. A mandatory always-mode CQ3/CQ5 specialist on TIMEOUT fails closed — surface `BLOCKED`, never treat the missing gate as a pass.
+2. **Crash/no output:** Mark status `FAILED`. Log reason. Continue if non-blocking. A mandatory always-mode CQ3/CQ5 specialist is blocking — a crash or no-output return fails closed (surface `BLOCKED`, explicit operator decision before merge/release), it is never "non-blocking".
+3. **Conflicting outputs:** When two specialists disagree (e.g., security vs performance), escalate to user with both positions.
+4. **Resource exhaustion:** Apply the Context-Degradation Policy (this file) — compress at `>50%` window, restart at `>75%`.
+### Retry Policies
+- Subagent retries: 0 — never retry the same failed operation identically; spawn a new agent with an adjusted prompt/approach instead.
+- Phase retries: Phase 3 review loop retries up to 3 iterations (code-class cap = `DEFAULT_MAX_REVIEW_ITERATIONS` - 1). All other phases: 0 retries (escalate to user).
+## Observability Integration
+### Structured Logging
+All pipeline events should produce structured log entries when the project has observability infrastructure:
+```
+{
+  "event": "pipeline.phase.start" | "pipeline.phase.end" | "subagent.spawn" | "subagent.complete",
+  "correlationId": "...",
+  "phase": 1-4,
+  "agent": "hatch3r-implementer",
+  "status": "SUCCESS" | "PARTIAL" | "FAILED" | "TIMEOUT",
+  "duration": 12345,
+  "metadata": {}
+}
+```
+### Metrics to Track
+| Metric | Description |
+|--------|-------------|
+| Pipeline duration | Total time from Phase 1 start to Phase 4 end |
+| Phase duration | Time per phase |
+| Review iterations | Number of Phase 3 review cycles |
+| Specialist invocations | Count of Phase 4 specialists launched |
+| Fix attempts | Number of fixer invocations across all phases |
+| Failure rate | Proportion of tasks not reaching SUCCESS |
+### Correlation ID Propagation
+The correlation ID generated before Phase 1 MUST be:
+- Included in every subagent prompt
+- Included in every structured log entry
+- Included in every status report and output
+- Used as the key for cross-referencing pipeline artifacts
+## Auto-Mode Guardrails
+When operating in unattended/auto mode (no human in the loop), enforce these guardrails after each phase:
+### Scope Containment
+- **File scope:** Only modify files identified in Phase 1 research + files discovered during implementation that are direct dependencies. No drive-by refactors.
+- **Dependency scope:** Do not add new external dependencies without explicit approval.
+- **Destructive operations:** Never execute `rm -rf`, `DROP TABLE`, force push, or other destructive operations in auto mode. Queue for human review.
+### Output Schema Compliance
+After each phase, validate that the output conforms to the expected PipelineContext schema fields. Missing required fields trigger a HALT.
+### Escalation Triggers
+Auto-mode MUST halt and surface to user when:
+1. A CRITICAL finding is detected in Phase 3.
+2. Phase 4 validation pass fails after 2 fix attempts.
+3. Any specialist reports FAILED status, OR a mandatory always-mode CQ3/CQ5 specialist (`hatch3r-security`, `hatch3r-testability`) returns any of {FAILED, TIMEOUT, no-output} — a mandatory gate that did not return `COMPLETE` is BLOCKED, not a silent pass.
+4. Scope containment violation detected.
+5. Implementation touches more than 20 files (may indicate scope creep).
+### Budget Guards
+- **Token budget:** If cumulative subagent token usage exceeds 80% of estimated budget, surface to user before spawning additional agents.
+- **Time budget:** If pipeline duration exceeds 2x the estimated time (based on deep context tier), surface status and request continuation approval.
+## Adaptive Pipeline Behavior
+### Complexity-Driven Adaptation
+The pipeline should adapt its behavior based on observed task complexity, not just the initial tier assignment:
+| Signal During Execution | Adaptation |
+|------------------------|------------|
+| Phase 1 research finds >10 affected files (initial estimate was <5) | Upgrade tier to 3 if currently 2. Record the upgrade in `PipelineContext.tierUpgrade` (`src/pipeline/pipelineContext.ts::TierUpgrade`: `{from, to, reason, atPhase}`) and re-run `codebase-impact` at `deep` depth before Phase 2. |
+| Phase 2 implementer reports >3 research gaps | Pause Phase 2. Run targeted researcher with gap-specific modes before continuing. |
+| Phase 3 review loop reaches iteration 2 with increasing Critical count | Classify as complexity underestimate. Surface to user with recommendation to break the task into smaller sub-tasks. |
+| Phase 4 validation pass fails on first attempt | Check whether failure is in hatch3r-testability's new tests (expected -- fix test) or in pre-existing tests (regression -- fix implementation). Route to appropriate fixer. |
+**Tier-upgrade propagation (Finding D7-14).** A mid-run upgrade is not just a log line — it MUST change downstream behavior. After populating `PipelineContext.tierUpgrade`, the orchestrator reads the tier for every subsequent depth decision via `resolveEffectiveTier(context)` (returns `tierUpgrade.to` when an upgrade is recorded, else `deepContextTier`), so the Tier→Phase-4-specialist-depth mapping in `hatch3r-agent-orchestration.md` (Deep Context Integration) scales to the upgraded tier instead of staying pinned to the stale baseline. The carrier only ever raises the tier (a recorded `to <= from` is ignored). Surface the upgrade once in the iteration summary via `formatTierUpgradeNote(context)` (one line, returns `null` when no upgrade occurred) so the adaptation is visible, not silent.
+### Post-Pipeline Learning
+After pipeline completion, the orchestrator captures lessons for future runs:
+1. **Tier accuracy:** Was the initial tier correct? If the pipeline needed adaptation (above), persist a tier-accuracy record (`taskId`, `initialTier`, `finalTier`, `adjustmentReasons`, `correlationId`, `ts`) to `.hatch3r/telemetry/<session-id>-tier.json` via the atomic-write path in `src/pipeline/costEstimator.ts` (sibling of `CostTelemetryRecord`). Tier mismatch beyond ±10% across 50 tasks triggers a CL-3 signal-weight recalibration proposal.
+2. **Phase duration ratios:** Record time spent per phase. Anomalous ratios (e.g., Phase 3 taking 5x Phase 2) indicate systemic issues worth investigating.
+3. **Specialist value:** Record which Phase 4 specialists produced actionable findings vs. clean reports. Over time, this data informs smarter specialist dispatch.
+## Multi-Task and Concurrent Pipeline Support
+Canonical schema for the one-sentence multi-task / epic / batch handling in the orchestration rule's `Task Context Protocols`, so pack integrators have a deterministic specification (Finding D7-M13 / D7-SA7.5-3).
+**Dependency-graph construction.** Multi-task input (epic, plain-chat multi-request, or board batch) is parsed into discrete units. Each unit carries its own `correlationId` (epic sub-issues get individual IDs sharing a parent epic ID; batch tasks share one ID with a sub-task index). The orchestrator builds a directed acyclic dependency graph from declared inter-unit constraints (e.g., "issue B depends on issue A's API changes"); units with no declared dependency form the root level.
+**Per-level parallelism.** At each dependency level, the orchestrator parallelizes researchers + implementers across all units in that level subject to the three Parallel Safety conditions in the canonical rule. The parallelism width per level is bounded by the same orchestrator-honored `max_phase4_parallel` width (default `8`) the Phase 4 specialists honor — LLM-honored guidance, not a code-enforced cap (no hatch3r module reads an env var; the host Task tool is the dispatcher and applies no platform fan-out limit, per the canonical rule's Phase 4 — Final Quality).
+**Concurrent primitive — `concurrent_pipeline_unit`.** Each unit in a level is a `concurrent_pipeline_unit` record: `{ unitId: string; correlationId: string; parentEpicId?: string; level: number; dependsOn: string[]; priority: "p0"|"p1"|"p2"|"p3"; status: "pending"|"running"|"complete"|"blocked"; }`. Within a level the orchestrator dispatches by priority descending (p0 first); when concurrency limits cap the level, the in-flight pool is filled with highest-priority units first and the rest queue for the next dispatch slot.
+**File-overlap reconciliation.** When two parallel implementers in the same level touch the same file: accept disjoint-region edits without conflict; merge overlapping regions using the larger-scope change as base (the smaller change replays onto the larger); halt on semantic conflicts for user resolution. Per Parallel Safety condition 3, NO mid-pipeline writes to shared mutable state (`.hatch3r/hatch.json`, `.hatch3r/learnings/INDEX.md`) — learnings consolidation happens at pipeline completion only.
+**Review loop coordination.** After all level-N implementers complete, the orchestrator runs ONE consolidated Phase 3 review loop covering the union diff produced by the level. Per-unit Phase 4 specialist dispatch then runs in parallel bounded by `max_phase4_parallel`. Level-N+1 begins only after Level-N reaches Phase 4 completion (validated by `evaluatePhase4Completion`). Cross-pipeline concurrent invocation (two `hatch3r` commands in two shells against the same repo) is deferred per the cross-command note below + the audit CL-2 spec.
+## Pipeline Pattern (Cross-Command Consistency)
+Finding D7-M12 / D7-SA7.5-2: implementation-flavored orchestrators (`workflow`, `board-pickup`, `revision`, `quick-change`, `board-fill`) MUST follow the canonical pattern below. Per-command deviations require an explicit rationale in the command body's "Pipeline Deviations" subsection.
+| Stage | Canonical agent | Required at Tier | Carve-out |
+|-------|-----------------|------------------|-----------|
+| Phase 1 Research | `hatch3r-researcher` | T2/T3 | T1 skip per Phase Skip Criteria |
+| Phase 2 Implement | `hatch3r-implementer` | All | T1 quick-change inline carve-out only |
+| Phase 3 Review Loop | `hatch3r-reviewer` ↔ `hatch3r-fixer` (max `DEFAULT_MAX_REVIEW_ITERATIONS`) | T2/T3 nontrivial | T1 all-trivial skip per Phase Skip Criteria |
+| Phase 4 Final Quality | CQ + SSOT specialists, batched by severity, bounded by `max_phase4_parallel` | T2/T3 | T1 — only always-mode floor (`security` + `testability`) |
+| Phase 4 Validation Pass | re-run tests/typecheck vs Phase-3 baseline; re-review on specialist code mutations | T2/T3 | — |
+Cross-command error-handling defaults: sub-agent failure → retry once then fall back to direct/inline implementation per command's carve-out; quality-check failure → max 2 retry loops then ASK; context degradation → the single Context-Degradation Policy below (window-fraction primary: compress `>50%`, restart `>75%`; turn counts a coarse fallback). Concurrent-invocation handling and lockfile semantics are deferred to a future cycle pending the Decision 27 resumability work.
+## Context-Degradation Policy
+Single canonical policy for every pipeline command (reconciles the per-command turn bullets, Finding D7-24). **Window-fraction is the authoritative axis**; the per-command turn count is a coarse fallback for the same threshold at that command's pace, used only when the host surfaces no context-window percentage. Commands cite this policy, not restated numbers.
+| Window fraction (primary) | Action | Turn-count fallback (coarse) |
+|---------------------------|--------|------------------------------|
+| `> 50%` | Compress: apply the numbered strategies below in order. | implementation/review ≈ 25 turns; quick-change ≈ 15 (fast-completion scope); debug ≈ 20 |
+| `> 75%` | Restart: suggest a fresh chat / batch split carrying a progress summary of completed + remaining work; a fresh-context command (`hatch3r-revision`) just re-runs. | ≈ 1.5× the compress turn count |
+1. **Summarize Phase 1 output.** Replace full research findings with a structured summary: affected files (list), blast radius (count + top 3), key conventions (bullet points). Keep raw data only for the fields the current phase needs.
+2. **Prune resolved findings.** After Phase 3 review loop, remove findings that were fixed and confirmed. Only carry forward unresolved findings.
+3. **Collapse specialist results.** In the final output, summarize specialist results as a single status table rather than including full specialist reports. Full reports are available on request.
+4. **Never truncate security findings.** Security auditor output is always included in full regardless of context pressure.
+**Handoff loss measurement.** Compression is lossy, so measure it. At each phase transition the orchestrator records a `PhaseHandoffMetrics` record (`src/pipeline/observability.ts::createPhaseHandoffMetrics`) capturing input bytes, output bytes, whether summarisation was applied, and an `informationLossEstimate` (0-1 fraction of input bytes dropped). When `informationLossEstimate` exceeds `0.3`, surface the `formatPhaseHandoffWarning` line in the iteration summary so downstream phases validate that critical context survived — closing the gap where a phase silently receives a summary when it needed the full upstream output.

package/dist/content/rules/hatch3r-agent-orchestration.md ADDED Viewed

@@ -0,0 +1,250 @@
+---
+id: hatch3r-agent-orchestration
+type: rule
+description: Mandatory agent delegation, skill loading, and sub-agent usage directives for ALL tasks in ALL contexts
+scope: always
+tags: [orchestration, floor:protocol]
+precedence: high
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# Agent Orchestration
+This rule governs when and how to delegate work to hatch3r agents, load skills, and spawn sub-agents — mandatory directives, not suggestions. Hatch3r orchestration is a **phase-gated pipeline** (not free-form agent chat) with **structured handoffs** via `PipelineContext` and a **mandatory review gate** before the quality phase. For extended reference (PipelineContext schemas, resilience/failure handling, observability), see `hatch3r-agent-orchestration-detail`.
+## Universal Applicability
+This rule applies to EVERY context without exception: board-pickup (epic, sub-issue, standalone, batch), workflow command (full/quick), plain chat, issue references, and natural-language requests. Every task MUST follow the four-phase pipeline — **Phase 1 Research** (`hatch3r-researcher`), **Phase 2 Implement** (`hatch3r-implementer`), **Phase 3 Review Loop** (`hatch3r-reviewer` + `hatch3r-fixer`), **Phase 4 Final Quality** (parallel specialists) — per Mandatory Delegation Directives below; never implement code inline without sub-agents.
+**"Inline implementation" defined.** Inline implementation means calling any code-writing tool — `Edit`, `Write`, `MultiEdit`, `NotebookEdit`, `replace_string_in_file`, `multi_replace_string_in_file`, `create_file`, `str_replace_based_edit_tool`, `apply_patch`, or any platform equivalent — from the orchestrator turn itself, rather than from inside a spawned `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3) sub-agent. The only carve-out is `hatch3r-quick-change` for Tier 1 single-line trivial edits per its declared scope.
+## Agent Roster
+Pipeline-phase agents (Phases 1-3):
+| Agent | Purpose | Invoke When |
+|-------|---------|-------------|
+| `hatch3r-researcher` | Context gathering (15 modes) | Phase 1 — before implementation (skip trivial edits) |
+| `hatch3r-implementer` | Single-task implementation | Phase 2 — one per task |
+| `hatch3r-reviewer` | Code review | Phase 3 — review loop |
+| `hatch3r-fixer` | Fix reviewer findings | Phase 3 — Critical/Warning findings |
+| `hatch3r-ci-watcher` | CI failure diagnosis | Conditional — CI fails |
+Phase 4 specialists (docs-writer, lint-fixer, architect, devops, and the CQ1-CQ9 vector specialists ui/ux/security/reliability/testability/scalability/performance/maintainability/enhancability) and their trigger conditions are enumerated once in the Phase 4 Specialist Trigger Table below. That table mirrors the single source of truth `src/pipeline/pipelineContext.ts::SPECIALIST_TRIGGER_TABLE` — add a specialist there first, never here.
+## Deep Context Integration
+Score task complexity per the `hatch3r-deep-context` rule before Phase 1. Apply the resulting tier:
+- **Tier 2 hard gate (B1).** Before Phase 2, run `hatch3r-researcher` with `requirements-elicitation:quick` mode to detect ambiguity. The researcher sub-agent emits the elicited questions as a structured list in its result — it does NOT call the platform-native question tool (a spawned sub-agent has no interactive surface). The **orchestrator** renders that list to the user via the platform-native question tool (per `agents/shared/user-question-protocol.md`), mirroring `commands/hatch3r-workflow.md` Phase 1 "Present the `requirements-elicitation` questions inline and await answers". Orchestrator awaits answers and integrates them into the Phase 1 brief; do not begin Phase 2 with unresolved questions. Tier 1 is exempt only when scope is single-file, single-concern, and acceptance is testable from the user message alone.
+- **Tier 3 (Deep):** Present Pre-Implementation Summary and ASK for confirmation. Do NOT proceed until all unresolved questions are answered.
+**Tier-to-Phase-4 specialist depth mapping** (Finding D7-M9 / D7-SA7.4-1). Deep-context tier drives Phase 1 researcher depth; the same tier drives Phase 4 specialist depth so quality coverage scales with task risk: Tier 1 → run only the always-mode floor (`hatch3r-security` + `hatch3r-testability`) at `quick` depth — UI/perf/maintainability/etc. specialists are skipped per Phase Skip Criteria. Tier 2 → always-mode floor at `standard` depth + every triggered conditional specialist at `quick` depth. Tier 3 → every applicable specialist at `deep` depth — full WCAG AA / OWASP ASI / CWV / mandate-map sweep with N=3 sampling on always-mode specialists when `floor:security` items are touched (per `agents/shared/quality-charter.md` -> Non-Determinism Budget). The depth signal rides on the specialist prompt as the explicit field `depth: quick | standard | deep`; specialists read it via the shared `agents/shared/quality-specialist-frame.md`. Tier also drives **model class** as a first-order effort lever (Tier 1 → economy, Tier 2 → default, Tier 3 → strongest), resolved per-adapter against `models.default` / `src/models/resolve.ts` and ignored where an adapter has no model-routing surface — see `hatch3r-deep-context` -> Tier Assignment.
+## Mandatory Delegation Directives
+### Context Gathering (Before Implementation)
+Spawn `hatch3r-researcher` before implementing any task (skip only for trivial single-line edits). Select modes by task type plus tier-appropriate modes per Deep Context Integration: `type:bug` → `symptom-trace`, `root-cause`, `codebase-impact`; `type:feature` → `codebase-impact`, `feature-design`, `architecture`; `type:refactor` → `current-state`, `refactoring-strategy`, `migration-path`; `type:qa` → `codebase-impact`. Depth: `quick` low-risk, `standard` medium-risk, `deep` high-risk; Tier 3 always `deep`.
+### Research Completeness Checklist
+Before Phase 1 to Phase 2 handoff, verify all four: (1) **affected files identified** (create/modify/delete listed); (2) **blast radius assessed** (downstream consumers + integration points documented); (3) **existing tests located** (or absence noted); (4) **dependencies mapped** (internal + external). If any item is unconfirmed, re-run researcher with additional modes or surface to user.
+### Implementation Delegation
+Spawn `hatch3r-implementer` via Task tool for ALL code changes; never implement inline. **Single issue / plain chat task:** one implementer (orchestrator owns git/PR/board; plain chat creates synthetic issue context first). **Epics:** one implementer per sub-issue, level-by-level respecting dependency order. **Batch:** group by dependency level, one implementer per issue, shared branch + combined PR. **Prompt enrichment (Tier 2+):** include `similar-implementation` findings as "Reference Conventions", resolved `requirements-elicitation` answers as "Resolved Requirements", and blast radius (Tier 3 only).
+### Per-Turn Pipeline-State Header
+Whenever a tracked task is active at Tier 2 or Tier 3 (deep-context score >= 3), the orchestrator MUST emit a single-line pipeline-state header at the start of every assistant turn that touches the task. Format (next can also be `user-confirmation` or `complete`):
+```
+[hatch3r-pipeline: phase {1|2|3|4} | last: {agent} → {SUCCESS|PARTIAL|FAILED|BLOCKED|n/a} | next: {agent}]
+```
+Example: `[hatch3r-pipeline: phase 2 | last: hatch3r-researcher → SUCCESS | next: hatch3r-implementer]`
+A missing header on a tracked Tier >= 2 task is a self-detectable drift signal — the user may halt and re-ground. The header also primes the orchestrator to re-resolve its phase before choosing tools. Tier 1, read-only, and chat-only turns do NOT require it.
+### End-of-Turn Delegation Attestation
+When the turn is on a tracked task at Tier >= 2 AND caused at least one file mutation, the orchestrator MUST emit a closing block immediately before the Iteration Summary. The block enumerates every file mutated this turn, the spawning sub-agent invocation, and the `delegation_proof_id` returned by that sub-agent.
+Format:
+```
+[hatch3r-delegation-attestation]
+files_mutated_this_turn:
+  - <relative path>: via <agent-name> (proof: <delegation_proof_id>)
+mutating_subagent_invocations: <integer>
+inline_edits_by_orchestrator: none | <carve-out: hatch3r-quick-change Tier-1 + queued re-delegation>
+```
+Rules:
+- Each `files_mutated_this_turn` row MUST cite the spawning sub-agent invocation and quote its `delegation_proof_id` verbatim. Unattributable rows are self-declared P8 B2 violations; the orchestrator MUST queue re-delegation next turn.
+- `inline_edits_by_orchestrator: none` is the only value accepted outside the `hatch3r-quick-change` Tier-1 carve-out (per the "Inline implementation" definition above).
+- Tier 1 read-only and chat-only turns are exempt (same scope as the Per-Turn Pipeline-State Header); a missing block on a Tier >= 2 mutating turn is a self-detectable drift signal — halt and re-ground per the missing-header protocol.
+- The block is consumed by reviewers and the next orchestrator turn; it sits beside the Iteration Summary, not inside it, preserving the 5-field iteration-summary contract verbatim.
+### Mandatory Delegation Directive (No Inline Implementation)
+For sub-agent prompt inclusion: the orchestrator MUST NOT call any code-writing tool (enumerated under "Inline implementation" above) from its own turn. The only path for code mutation is the Task tool spawning `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3). Sole carve-out: `hatch3r-quick-change` Tier 1 trivial items per its declared scope. Violations are bypass mode (issue #73) — halt the turn and re-delegate.
+### Mid-Implementation Research Gap Checkpoint
+At the Phase 2 midpoint (after initial files modified, before completion), the implementer MUST evaluate research gaps to avoid discovering missing context too late. **Triggers:** modifying a file not in `researchFindings.affectedFiles`; an undocumented dependency/integration point; confidence dropping below "medium" on any sub-task; an expected test file missing or covering different behavior. **Actions:** log the gap in `PipelineContext.researchGaps`; if blocking, pause and request a targeted `hatch3r-researcher` re-run with the needed modes; if non-blocking, document the assumption, continue, and flag for Phase 3 reviewer attention.
+### Per-Task Mini-Review
+For multi-sub-task implementations, the implementer performs a lightweight mini-review after each sub-task: verify correctness, check interface contracts, validate no regressions, gate progression. Mini-reviews are internal (no separate reviewer agent).
+### Post-Implementation Quality Pipeline
+**Phase 3 — Review Loop:**
+1. Spawn `hatch3r-reviewer` with diff, acceptance criteria, and blast-radius summary.
+2. Critical/Warning findings: spawn `hatch3r-fixer` with full reviewer output, then re-review. Repeat until 0 Critical + 0 Warning, or max 4 iterations (matches `DEFAULT_MAX_REVIEW_ITERATIONS` in `src/pipeline/reviewLoop.ts`, raised 3→4 in Cycle 7.5 W2B2 H26 to reach the oscillation detector in default config; kept in sync by `src/__tests__/pipeline/reviewLoop.test.ts`, CI-enforced).
+   - **Re-run honor-rule (anti-self-approval, F15.2-H2 / D13-SA13.2-F6 / D15-SA15.2-F3).** The fixer's `Reviewer re-run required` boolean is advisory; the orchestrator derives the authoritative value `reRunRequired = (fixer Files changed list is non-empty)`. When `true`, another `hatch3r-reviewer` pass is MANDATORY before the loop may exit clean — a fixer `Status: SUCCESS` with a non-empty `Files changed` can never self-approve to a clean exit. A `Reviewer re-run required: false` printed alongside a non-empty `Files changed` is overridden to `true` and noted as a self-declared protocol violation. The `Files changed` list is SSOT-bound: it is attested by the same fixer `delegation_proof_id` the orchestrator quotes in its End-of-Turn Delegation Attestation, so the signal cannot be forged without spawning the fixer.
+3. **Confirmation pass** after clean review: lightweight re-review checking only (a) no new test failures vs Phase 2 baseline, (b) no type errors introduced, (c) acceptance criteria still met. Does not re-run the full checklist.
+   - **Runtime calibration second pass (orchestrator-owned).** At this would-be-clean exit, the orchestrator — not the stateless reviewer — evaluates the `rules/hatch3r-reviewer-calibration.md` trigger: read the cross-run `consecutive_clean_pass_count` from `.hatch3r/calibration-state.json`, increment it for this clean run, and fire a second-pass review (different model class, else same-class re-roll) when the post-increment count is a multiple of `N` (default 5) OR — for a high-risk diff (`floor:security` / auth / CQ3-security-dispatch files) — on the first clean PASS. A divergent second pass reverts the exit to step 2 (`REQUEST CHANGES`) and resets the count to 0; persist the count (atomic write via `src/merge/safeWrite.ts`) and append the calibration-log record. A REQUEST CHANGES or DESIGN_OBJECTION at any iteration also resets the persisted count to 0.
+4. Max iterations reached: surface to user with a structured summary (iteration count, remaining Critical findings with file:line, remaining Warnings, fix-manually-vs-accept-risk recommendation). Never present raw reviewer output unsummarized.
+5. **Review gate confidence signal:** on a clean verdict, record the iteration count in `PipelineContext.reviewResult.iterations`. Clean-on-first-pass signals higher confidence than clean-after-multiple-iterations; Phase 4 and the orchestrator factor this into risk assessment.
+**Phase 4 — Final Quality** (after review loop is clean):
+Launch Phase 4 specialists in parallel, bounded by an orchestrator-honored fan-out width `max_phase4_parallel` (default `8` — covers the empirical maximum of applicable specialists per the trigger table, so a typical Tier 3 change fans out in at most 2 batches). This bound is LLM-honored orchestrator guidance, not a code-enforced cap: the host Task tool is the actual dispatcher and applies no platform fan-out limit, so no hatch3r module reads an env var or clamps the count — the orchestrator self-limits per this prose. The bound exists for upstream provider rate-limit headroom (RPM/TPM) — a true dependency edge — NOT per-orchestrator context cost; token cost never serializes independent work (P8 dominates P7). A non-rate-limited orchestrator MAY raise the width up to the full applicable-specialist set. **Rate-limit back-off (orchestrator-LLM guidance):** when the orchestrator observes ≥3 consecutive rate-limit-class transient failures, reduce the active fan-out width by 1 for the next batch and record the back-off in the Iteration Summary; never silently cap a healthy run. When applicable specialists exceed the bound, batch by severity-descending priority `CRITICAL → HIGH → MEDIUM → LOW` (severity is the worst-case finding class the specialist surfaces: always-on testability (CQ5) / security (CQ3) → CRITICAL, conditional CQ1/CQ4/CQ7 (ui/reliability/performance) → HIGH, docs/lint → MEDIUM, low-impact → LOW); within a bucket, dispatch in trigger-table order. Each batch runs to completion before the next starts; the validation pass runs once after the final batch. The applicable specialists and their trigger conditions are listed in the Phase 4 Specialist Trigger Table below.
+**Specialist Prompt Enrichment:** When spawning Phase 4 specialists, include the Phase 2 `filesChanged` list (focus on affected code), the Phase 3 review verdict summary (avoid re-flagging reviewed issues), and `researchFindings.blastRadius` (assess downstream impact).
+**Runtime trigger evaluation (D6-M11):** the orchestrator harness calls `shouldTriggerSpecialist(specialist, changedFiles, projectType)` from `src/pipeline/pipelineContext.ts` to evaluate whether each specialist applies to the current change set. The function returns `{ triggered: boolean, reasons: string[] }` and consumes the same `SPECIALIST_TRIGGER_TABLE` that the prose table below mirrors. Treat the prose as a quick reference; treat the TS predicate as the authoritative gate. `npm run validate:specialist-roster` enforces parity.
+**Phase 4 Specialist Trigger Table:**
+| Specialist | Mode | Trigger Conditions |
+|-----------|------|--------------------|
+| `hatch3r-docs-writer` | Evaluate | Public API, architecture, or UX changes |
+| `hatch3r-lint-fixer` | Conditional | Lint/type errors present |
+| `hatch3r-architect` | Conditional | Architectural decisions, new modules/services |
+| `hatch3r-devops` | Conditional | CI/CD or infrastructure changes |
+| `hatch3r-ui` (CQ1) | Conditional | UI component / theme / token files modified (`*.{tsx,jsx,vue,svelte}`, `tailwind.config.*`, design-token registries) |
+| `hatch3r-ux` (CQ2) | Conditional | Flow / modal / route-transition / error-state files modified; microcopy or i18n strings changed |
+| `hatch3r-security` (CQ3) | Always | Any code change (always-mode floor — absorbs legacy security-auditor scope); auth / JWT / OAuth / WebAuthn code modified; release workflow modified; cookie or session handling modified; dependency files modified |
+| `hatch3r-reliability` (CQ4) | Conditional | Service handler / OTel / SLO / retry / circuit-breaker code modified; Kubernetes probe manifests modified |
+| `hatch3r-testability` (CQ5) | Always | Any code change (always-mode floor — absorbs legacy test-writer scope); test code added or modified; mandate-map feature class introduced; coverage threshold or runner config modified |
+| `hatch3r-scalability` (CQ6) | Conditional | Request handler / queue client / connection-pool / cache / background-job code modified |
+| `hatch3r-performance` (CQ7) | Conditional | ORM query / data-access layer / UI-rendering component / bundle config modified; vendor dependency >50KB introduced |
+| `hatch3r-maintainability` (CQ8) | Conditional | Any code mutation (duplication + complexity scan); schema / migration / API spec (OpenAPI / GraphQL SDL / Protobuf) modified |
+| `hatch3r-enhancability` (CQ9) | Conditional | User-visible behavior modified; public API surface modified (OpenAPI / GraphQL SDL / AsyncAPI); config schema or feature-flag definition modified |
+**CQ specialist consolidation.** Each CQ-vector specialist owns the full scope previously split between a legacy specialist and the CQ row. `ui` (CQ1) covers axe-core + design-token + four-state + reuse plus deep ARIA / reduced-motion; `security` (CQ3) covers OAuth 2.1 + OIDC + DPoP, SBOM/cosign, OWASP ASI plus the always-on security floor and project-specific deep audits; `performance` (CQ7) covers CWV, p95/p99, bundle size, N+1 plus profile-driven hot-path analysis; `testability` (CQ5) covers mandate-map verification plus test authoring. Per-agent boundaries are documented in each agent file's opening section.
+**Verification harness binding (CQ specialist → verify skill).** A CQ specialist runs its matching verify-class skill as the pass/fail evidence harness for its Phase 4 gate, so audit semantics are not re-authored in two places (D16.3): `ui` (CQ1) + `ux` (CQ2) → `hatch3r-ui-ux-verify`; `reliability` (CQ4) → `hatch3r-reliability-verify` + `hatch3r-observability-verify` (the latter covers OTel span / trace-id correlation on the request path). `hatch3r-qa-validation` (no 1:1 CQ specialist — release/acceptance E2E) and `hatch3r-browser-verify` (multi-purpose Playwright tool, default-ON per UI-affecting invocation) stay standalone harnesses invoked by the orchestrator, not bound to one CQ row. The reciprocal "Invoked by" upstream-citation lives in each verify skill's `## Invoked by` subsection.
+**Project-Type-Aware Specialist Selection:** When `PipelineContext.projectType` is available, enrich specialist prompts with language-specific hints (e.g., ruff/mypy + pytest + SSTI/SQLi for Python; golangci-lint + govulncheck for Go; clippy + cargo-audit for Rust). See `src/pipeline/pipelineContext.ts` `LANGUAGE_SPECIALIST_CONFIGS` for the full mapping.
+### Phase 4 Validation Pass
+After all Phase 4 specialists complete, run a validation pass: run the test suite + type checker against the Phase 3 baseline cached in `PipelineContext`. No new failures → complete. New failures → identify the causing specialist, spawn `hatch3r-fixer`, re-validate (max 2 iterations, matches `DEFAULT_MAX_VALIDATION_PASS_ITERATIONS` in `src/pipeline/pipelineContext.ts`; basis + recalibration triggers in `VALIDATION_PASS_CALIBRATION`; kept in sync by `src/__tests__/pipeline/pipelineContext.test.ts`, CI-enforced); persistent regressions surface to user (never silently accept). If any specialist produced code fixes (not just findings), spawn a lightweight `hatch3r-reviewer` re-review scoped to the specialist-modified files (prevents specialist fixes bypassing the Phase 3 gate; max 1 re-review iteration, Critical findings trigger a single fixer pass).
+### Specialist Success Criteria
+- **testability (CQ5):** all new/modified code paths have tests meeting the mandate map; no untested branches in changed files.
+- **security (CQ3):** no HIGH/CRITICAL findings unresolved; MEDIUM documented with plan; CQ3 thresholds (npm provenance, SBOM, SHA-pin, OWASP ASI) met for in-scope changes.
+- **docs-writer:** affected APIs, architecture, and UX reflected in docs. **lint-fixer:** zero lint/type errors in changed files.
+- **ui (CQ1):** WCAG AA compliance; no new a11y violations; design-token + four-state coverage; reuse-first delta.
+- **performance (CQ7):** no performance regressions; new hot paths benchmarked; CWV / p95/p99 / bundle-size budgets met.
+- **architect:** ADRs documented; design aligns with patterns or divergence justified. **devops:** CI/CD passes end-to-end; deployment config validated.
+## Skill Loading Directives
+Load the matching skill before implementation: `type:bug` → `hatch3r-bug-fix`; `type:feature` → `hatch3r-feature`; `type:refactor` + `area:ui` → `hatch3r-visual-refactor`; `type:refactor` + behavior change → `hatch3r-logical-refactor`; `type:refactor` (other) → `hatch3r-refactor`; `type:qa` → `hatch3r-qa-validation`. Skill-referenced agent delegations are mandatory.
+## Subagent Spawning Protocol
+Use `subagent_type: "generalPurpose"` for all delegations. Include the agent protocol (the hatch3r role id, e.g. `hatch3r-reviewer`, named in the prompt), applicable `scope: always` rules, tooling hierarchy, and relevant learnings. Launch independent sub-agents in parallel (maximum parallelism); await and review results, surfacing BLOCKED or PARTIAL to the user.
+**Tool-allowlist enforcement boundary (ASI02/ASI03).** The generic-spawn convention has a trust-boundary consequence the orchestrator MUST account for: the Claude Code PreToolUse hook (`src/pipeline/agentToolAllowlist.ts::buildClaudePreToolUseHookScript`) gates only when the payload `agent_type` starts with `hatch3r-`; a `generalPurpose` spawn carries Claude Code's own `agent_type` (`general-purpose`), so the runtime hook passes it through (this is intentional — Claude Code's built-in sub-agents must not be governed by hatch3r policy). Therefore the **active** allowlist enforcement for delegated hatch3r work is the orchestrator-boundary gate `checkToolAccess(roleId, toolCategory)` (`src/pipeline/agentToolAllowlist.ts`), which the orchestrator applies using the hatch3r role id it placed in the prompt — deny-by-default before forwarding a tool category to the sub-agent. The runtime PreToolUse hook is a defense-in-depth second layer that fires only for adapters/sessions that spawn role-bearing native sub-agents (`subagent_type: "hatch3r-<role>"`); under the generic-spawn default it does not fire, and the boundary gate is the sole enforcement point.
+## Parallel Safety
+Default is **linear per task** (Phase 1 → 2 → 3 → 4 serially) — `PipelineContext` is a single handoff token and LLM orchestrators reason better with sequential context. Phase 4 specialists parallelize because they read-only Phase 3 artifacts; extending parallelism to Phases 1-3 requires the conditions below.
+### Three Conditions to Parallelize
+ALL three must hold: (1) **read-only or disjoint writes** (no conflict zone); (2) **deterministic aggregation** (outputs merge without orchestrator intervention — tests pass-if-all-pass, findings union); (3) **no shared mutable state** (agents that mutate `PipelineContext.state`/`featureFlags`/`metadata` serialize; parallel agents only READ).
+**Parallel-safe:** Phase 4 specialists; intra-Phase-1 researcher modes on a self-contained task; per-module Phase 2 fan-out on disjoint `affectedFiles` (merged post-Phase-2); Tier 2/3 elicitation researchers (outputs tagged with confidence + perspective). **NOT parallel-safe:** cross-phase execution (each phase depends on the prior's output); Phase 3 review-loop iterations (reviewer → fixer → re-reviewer serial); overlapping-file implementers (serialize or use a merge-conflict gate); Phase 4 validation re-review.
+**Cost-Dominance Principle.** Token cost of sub-agent invocation never justifies serialization of independent work. The three safety conditions govern WHEN parallelism is safe; cost does not govern WHETHER to parallelize. When in doubt, fan out. Serialization is only valid on true dependency edges.
+**Scaling Heuristic.** Sub-agent count tracks task decomposition: N independent modules → N parallel Phase-2 implementers; M specialist gates → M parallel Phase-4 specialists; K independent research questions → K parallel `hatch3r-researcher` sub-agents (one per question, findings unioned post-Phase-1). The K-parallel-researcher path is mandatory only when Phase 1 decomposes into ≥2 questions whose answers do not depend on each other; a single-question task keeps one researcher running its mode set serially. Orchestrators emit `sub_agents_spawned: {count, rationale}` in their structured output.
+### Concurrent Invocation Handling
+Two top-level pipelines running at once (e.g. `hatch3r-workflow` in one shell, `hatch3r-board-pickup` in another) are bounded by the same three conditions, applied cross-pipeline (D7-SA7.5-F7.5.6): (1) **Advisory lock-note (best-effort, NOT atomic — D7-27)** — before Phase 1, the orchestrator writes/reads `.hatch3r/.lock` (JSON: `pid`, `command`, `branch`, `correlation_id`, `started_at`); clear on completion/abort; treat a note older than 6h as stale. This is an advisory coordination note, not a mutual-exclusion primitive: there is no `hatch3r` lock verb and no atomic acquire in `src/` (the only cross-process lock that ships is `acquireWriteLock` in `src/merge/safeWrite.ts`, scoped to single-file atomic writes, not top-level pipelines), so the LLM orchestrator's read-then-write is TOCTOU by construction and `src/cli/commands/status.ts` already labels this file "advisory". It exists to surface a likely collision, never to guarantee exclusion. (2) **Detect-then-warn** — if a live note names the same branch or an open `.hatch3r/hatch.json` board transaction, WARN and ASK (proceed on a separate branch / wait / abort); never silently co-mutate shared state. (3) **True isolation via worktree, not the lock-note (D7-27)** — when concurrency must actually be conflict-free rather than merely warned (the parallel-implementer path), route each pipeline through `hatch3r worktree-setup <name>` — the isolation primitive hatch3r already ships (`src/cli/commands/worktreeSetup.ts`; `commands/board/pickup-delegation-multi.md` already uses it per implementer) — so the pipelines write disjoint working trees and integrate back on completion, satisfying the disjoint-writes safety condition without relying on the non-atomic note. (4) **Cache-sharing** — `.hatch3r/learnings/` is read-many, write-once-at-completion with timestamp-ordered conflict resolution. Cross-task context sharing is bounded by the no-shared-mutable-state condition: learnings consolidate at pipeline completion; mid-pipeline writes are out of scope to preserve parallel-safety determinism (D7-SA7.5-F7.5.8). Each command's Guardrails cite this subsection.
+## Cross-Phase Error Propagation
+On a non-SUCCESS status, the orchestrator MUST propagate error context downstream, never silently drop it. **Phase 1 PARTIAL:** include `researchGaps` in the implementer prompt and set confidence expectations accordingly. **Phase 2 PARTIAL:** include `reason` + unimplemented acceptance criteria in the reviewer prompt (reviewer distinguishes "not done yet" from "done incorrectly"). **Phase 3 UNRESOLVED:** include the unresolved findings in Phase 4 specialist prompts (specialists must not conflict with known issues). **Phase 4 specialist FAILED:** include the failure reason when surfacing — never report "Phase 4 failed" without naming which specialist and why.
+**Sub-agent-failure handling (shared clause; all commands cite this — never inline).** When any spawned implementer/fixer/specialist sub-agent FAILS or returns no usable output: (1) retry once with the same prompt; (2) if the retry fails, re-spawn `hatch3r-fixer` (Phase 3) with the failure reason + partial output as failure context — `hatch3r-fixer` is the code-mutation path, so the work stays delegated; (3) if the re-spawn also fails, emit `BLOCKED_OTHER` with a one-sentence reason and ASK the user (fix-manually vs adjust-scope vs accept-risk). The orchestrator MUST NOT fall back to inline implementation — that is the issue #73 bypass mode (see Mandatory Delegation Directive). The sole exception is `hatch3r-quick-change`, whose Tier-1 carve-out permits inline retry per its declared scope.
+## Correlation ID
+Generate a UUID v4 per top-level task before Phase 1. Include in every sub-agent prompt as `correlation_id`. All sub-agents include it in logs, outputs, and status reports. Epic sub-issues get individual IDs; batch tasks share one ID with a sub-task index.
+## Severity Scale
+| Severity | Definition | Pipeline Action |
+|----------|-----------|-----------------|
+| **CRITICAL** | Blocks merge. Security vulnerabilities, data loss, broken core functionality. | Must resolve before Phase 3 exit. |
+| **HIGH** | Should fix before merge. Significant bugs, performance regressions. | Fix or escalate to user. |
+| **MEDIUM** | Fix in same sprint. Code quality, minor bugs. | Document with remediation plan. |
+| **LOW** | Track for future. Style nits, minor improvements. | Log only. No merge gate. |
+| **INFO** | Informational. Observations, suggestions. | Awareness only. |
+## Status Codes
+All sub-agents MUST map findings to the Severity Scale above. **SUCCESS** (fully completed, all criteria met) · **PARTIAL** (include `reason`) · **FAILED** (no usable output; include `reason`) · **SKIPPED** (intentionally not executed) · **TIMEOUT** (time budget exceeded; forward partial output) · **BLOCKED_AMBIGUITY** · **BLOCKED_MISSING_CONTEXT** · **BLOCKED_CONFLICTING_SPECS** · **BLOCKED_MISSING_TOOL** · **BLOCKED_PREMISE_CHALLENGE** · **BLOCKED_OTHER** (one-sentence reason required). The six BLOCKED_* values are the canonical named escalation enum codified in `agents/shared/quality-charter.md` §17; every `agents/hatch3r-*.md` main agent MUST declare a Status field selecting from this enum. BLOCKED_PREMISE_CHALLENGE triggers `isHaltStatus()` from `src/pipeline/pipelineContext.ts::AgentStatus` — orchestrator halts and surfaces the premise concern + ≥1 alternative approach (Finding D7-M1 / D7-SA7.1-1). The reviewer's Phase-3 equivalent is the `DESIGN_OBJECTION` verdict; implementer/researcher/fixer emit the agent status, reviewer emits the verdict — both surface the same premise-challenge across non-overlapping phases.
+## Phase Handoff Contract
+Each phase populates a typed slice of `PipelineContext` (canonical schema: `src/pipeline/pipelineContext.ts::PipelineContext`). Required fields per transition: Phase 1 → 2 sets `researchFindings` (or `researchGaps[]` when skip-documented); Phase 2 → 3 sets `implementationResult` (filesChanged, testsWritten, status ∈ `AgentStatus`, reason); Phase 3 → 4 sets `reviewResult` (iterations, finalVerdict, findings, confirmationPassResult, confidence); Phase 4 → completion sets `qualityResults` (specialists[], validationPass). The typed gate `validatePhaseTransition(context, targetPhase, options?)` returns the `ValidationError[]` set the orchestrator MUST resolve before advancing; forwarding sub-agent output that omits a required field is a Phase Handoff Contract violation (Finding D7-M3 / D7-SA7.1-3) — re-spawn the upstream agent with the gap named. The Phase 3 → 4 advance rejects an `UNRESOLVED` verdict unless `options.allowUnresolvedAdvance` is set (the user-chose-manual skip condition, Finding D7-10) and accepts an absent `reviewResult`/`iterations: 0` only under `options.phase3Skipped` (the docs-only/trivial Phase 3 skip, Finding D7-11). Phase 4 completion is additionally gated by `evaluatePhase4Completion(qualityResults, options)` — a typed predicate aggregating Phase 4 Validation Pass criteria; when `complete: false`, surface `incompletionReason` (Finding D7-M8 / D7-SA7.3-3). **Handoff-loss trigger (Finding D7-23):** when a transition applies lossy compression (`hatch3r-agent-orchestration-detail` § Context-Degradation Policy), the orchestrator records `createPhaseHandoffMetrics` (passing the protected-byte count for never-truncate strategy-#4 bytes) and, when `informationLossEstimate > 0.3`, emits the `formatPhaseHandoffWarning` line in the iteration summary so downstream phases verify critical context survived — this is the command-layer trigger; every pipeline command inherits it via this always-loaded rule.
+## Phase Skip Criteria
+All commands that use the pipeline MUST reference these criteria — do not invent command-specific skip rules.
+| Phase | Can Skip When | Mandatory Minimum (even when skipped) |
+|-------|--------------|--------------------------------------|
+| **Phase 1 (Research)** | Trivial single-line edit (typo, comment, single-value config); Tier 1 single-file change with no cross-module impact; Research already cached in PipelineContext | Affected files identified (even via quick scan); existing tests noted |
+| **Phase 2 (Implement)** | Never — implementation is always required for code changes | All changes via hatch3r-implementer (never inline except trivial items in quick-change) |
+| **Phase 3 (Review)** | All items trivial (quick-change only); documentation-only change with no code | Quality checks (lint/typecheck/test) must pass; acceptance criteria verified |
+| **Phase 4 (Quality)** | Review loop unresolved AND user chose manual resolution; documentation-only; all trivial + quality checks pass (quick-change only) | testability (CQ5) + security (CQ3) always required for code changes; quality checks must pass |
+See `src/pipeline/pipelineContext.ts` for the programmatic `PHASE_SKIP_CRITERIA` constant.
+## Root-Cause Depth Requirements
+When a phase reports a failure or unexpected result, the orchestrator MUST classify root cause before the next action — reject the shallow fix, require the root-cause fix:
+- **Test failure after Phase 2:** not disable/skip the test — identify why the implementation breaks it; fix the code or update the test with justification.
+- **Lint errors after Phase 4:** not `eslint-disable` comments — fix the underlying code pattern.
+- **Type errors after fixer changes:** not `as any` casts — trace the mismatch to its source and fix the type definition or usage.
+- **Review loop not converging:** not surface after 3 iterations without analysis — classify whether findings oscillate (fixer A breaks what fixer B fixed) and surface the conflict pattern.
+Reject superficial fixes from any sub-agent. If a fixer's output contains suppression patterns (disable comments, `any` casts, test skips without linked issues), classify as PARTIAL and re-run with a prompt requesting a root-cause fix. This rejection is backed by a typed advisory: the reviewer runs `detectSuppressionPatterns(diff)` (`src/pipeline/reviewLoop.ts`) over the fixer diff — it flags `as any` casts, `eslint-disable` directives with no issue reference, and `test.skip`/`it.skip`/`describe.skip` with no linked issue. A non-empty `found` is the machine-checkable signal for this gate (mirroring how the orchestrator consults `detectOscillation`); the reviewer downgrades the verdict so the existing review loop forces the re-run.
+## Task Context Protocols
+**Single-task plain chat:** classify task type, create synthetic issue context, run the full pipeline (fetch issue-reference details via platform CLI). **Multi-task plain chat:** parse into discrete tasks, classify each, build a dependency graph, parallelize researchers + implementers per dependency level, run the review loop after all implementations, then Phase 4 specialists; when parallel implementers touch the same file, accept disjoint-region edits, merge overlapping regions using the larger-scope change as base, and halt on semantic conflicts for user resolution. **Auto-mode guardrails:** verify scope containment, no unapproved destructive operations, and output-schema compliance after each phase; halt on violation. Full specs in `hatch3r-agent-orchestration-detail`.
+## Rule Application
+All `scope: always` rules apply to every task including sub-agent work; include rule directives in sub-agent prompts. For limited context windows, Tier 1 is mandatory; Tier 2/3 included selectively. Inclusion tiers:
+- **Tier 1 — always include (every sub-agent):** `hatch3r-security-patterns`, `hatch3r-code-standards`.
+- **Tier 2 — by phase:** `hatch3r-testing` (testability/implementer/reviewer); `hatch3r-accessibility-standards` (ui, UI reviewer); `hatch3r-git-conventions` (orchestrator git ops); `hatch3r-ci-cd` (ci-watcher/devops); `hatch3r-dependency-management` (security CQ3 supply-chain slice).
+- **Tier 3 — on-demand by role + scope:** `hatch3r-api-design`, `hatch3r-secrets-management`, `hatch3r-data-classification`, `hatch3r-performance-budgets`, `hatch3r-browser-verification`, `hatch3r-component-conventions`, `hatch3r-i18n`, `hatch3r-theming`, `hatch3r-migrations`, `hatch3r-feature-flags`, `hatch3r-observability-logging`, `hatch3r-observability-metrics`, `hatch3r-observability-tracing`.