npm - hatch3r - Versions diffs - 1.8.0 → 2.0.0 - Mend

hatch3r 1.8.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (396) hide show

package/dist/content/rules/hatch3r-agent-orchestration.mdc ADDED Viewed

@@ -0,0 +1,245 @@
+---
+description: Mandatory agent delegation, skill loading, and sub-agent usage directives for ALL tasks in ALL contexts
+alwaysApply: true
+precedence: high
+---
+# Agent Orchestration
+This rule governs when and how to delegate work to hatch3r agents, load skills, and spawn sub-agents — mandatory directives, not suggestions. Hatch3r orchestration is a **phase-gated pipeline** (not free-form agent chat) with **structured handoffs** via `PipelineContext` and a **mandatory review gate** before the quality phase. For extended reference (PipelineContext schemas, resilience/failure handling, observability), see `hatch3r-agent-orchestration-detail`.
+## Universal Applicability
+This rule applies to EVERY context without exception: board-pickup (epic, sub-issue, standalone, batch), workflow command (full/quick), plain chat, issue references, and natural-language requests. Every task MUST follow the four-phase pipeline — **Phase 1 Research** (`hatch3r-researcher`), **Phase 2 Implement** (`hatch3r-implementer`), **Phase 3 Review Loop** (`hatch3r-reviewer` + `hatch3r-fixer`), **Phase 4 Final Quality** (parallel specialists) — per Mandatory Delegation Directives below; never implement code inline without sub-agents.
+**"Inline implementation" defined.** Inline implementation means calling any code-writing tool — `Edit`, `Write`, `MultiEdit`, `NotebookEdit`, `replace_string_in_file`, `multi_replace_string_in_file`, `create_file`, `str_replace_based_edit_tool`, `apply_patch`, or any platform equivalent — from the orchestrator turn itself, rather than from inside a spawned `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3) sub-agent. The only carve-out is `hatch3r-quick-change` for Tier 1 single-line trivial edits per its declared scope.
+## Agent Roster
+Pipeline-phase agents (Phases 1-3):
+| Agent | Purpose | Invoke When |
+|-------|---------|-------------|
+| `hatch3r-researcher` | Context gathering (15 modes) | Phase 1 — before implementation (skip trivial edits) |
+| `hatch3r-implementer` | Single-task implementation | Phase 2 — one per task |
+| `hatch3r-reviewer` | Code review | Phase 3 — review loop |
+| `hatch3r-fixer` | Fix reviewer findings | Phase 3 — Critical/Warning findings |
+| `hatch3r-ci-watcher` | CI failure diagnosis | Conditional — CI fails |
+Phase 4 specialists (docs-writer, lint-fixer, architect, devops, and the CQ1-CQ9 vector specialists ui/ux/security/reliability/testability/scalability/performance/maintainability/enhancability) and their trigger conditions are enumerated once in the Phase 4 Specialist Trigger Table below. That table mirrors the single source of truth `src/pipeline/pipelineContext.ts::SPECIALIST_TRIGGER_TABLE` — add a specialist there first, never here.
+## Deep Context Integration
+Score task complexity per the `hatch3r-deep-context` rule before Phase 1. Apply the resulting tier:
+- **Tier 2 hard gate (B1).** Before Phase 2, run `hatch3r-researcher` with `requirements-elicitation:quick` mode to detect ambiguity. The researcher sub-agent emits the elicited questions as a structured list in its result — it does NOT call the platform-native question tool (a spawned sub-agent has no interactive surface). The **orchestrator** renders that list to the user via the platform-native question tool (per `agents/shared/user-question-protocol.md`), mirroring `commands/hatch3r-workflow.md` Phase 1 "Present the `requirements-elicitation` questions inline and await answers". Orchestrator awaits answers and integrates them into the Phase 1 brief; do not begin Phase 2 with unresolved questions. Tier 1 is exempt only when scope is single-file, single-concern, and acceptance is testable from the user message alone.
+- **Tier 3 (Deep):** Present Pre-Implementation Summary and ASK for confirmation. Do NOT proceed until all unresolved questions are answered.
+**Tier-to-Phase-4 specialist depth mapping** (Finding D7-M9 / D7-SA7.4-1). Deep-context tier drives Phase 1 researcher depth; the same tier drives Phase 4 specialist depth so quality coverage scales with task risk: Tier 1 → run only the always-mode floor (`hatch3r-security` + `hatch3r-testability`) at `quick` depth — UI/perf/maintainability/etc. specialists are skipped per Phase Skip Criteria. Tier 2 → always-mode floor at `standard` depth + every triggered conditional specialist at `quick` depth. Tier 3 → every applicable specialist at `deep` depth — full WCAG AA / OWASP ASI / CWV / mandate-map sweep with N=3 sampling on always-mode specialists when `floor:security` items are touched (per `agents/shared/quality-charter.md` -> Non-Determinism Budget). The depth signal rides on the specialist prompt as the explicit field `depth: quick | standard | deep`; specialists read it via the shared `agents/shared/quality-specialist-frame.md`. Tier also drives **model class** as a first-order effort lever (Tier 1 → economy, Tier 2 → default, Tier 3 → strongest), resolved per-adapter against `models.default` / `src/models/resolve.ts` and ignored where an adapter has no model-routing surface — see `hatch3r-deep-context` -> Tier Assignment.
+## Mandatory Delegation Directives
+### Context Gathering (Before Implementation)
+Spawn `hatch3r-researcher` before implementing any task (skip only for trivial single-line edits). Select modes by task type plus tier-appropriate modes per Deep Context Integration: `type:bug` → `symptom-trace`, `root-cause`, `codebase-impact`; `type:feature` → `codebase-impact`, `feature-design`, `architecture`; `type:refactor` → `current-state`, `refactoring-strategy`, `migration-path`; `type:qa` → `codebase-impact`. Depth: `quick` low-risk, `standard` medium-risk, `deep` high-risk; Tier 3 always `deep`.
+### Research Completeness Checklist
+Before Phase 1 to Phase 2 handoff, verify all four: (1) **affected files identified** (create/modify/delete listed); (2) **blast radius assessed** (downstream consumers + integration points documented); (3) **existing tests located** (or absence noted); (4) **dependencies mapped** (internal + external). If any item is unconfirmed, re-run researcher with additional modes or surface to user.
+### Implementation Delegation
+Spawn `hatch3r-implementer` via Task tool for ALL code changes; never implement inline. **Single issue / plain chat task:** one implementer (orchestrator owns git/PR/board; plain chat creates synthetic issue context first). **Epics:** one implementer per sub-issue, level-by-level respecting dependency order. **Batch:** group by dependency level, one implementer per issue, shared branch + combined PR. **Prompt enrichment (Tier 2+):** include `similar-implementation` findings as "Reference Conventions", resolved `requirements-elicitation` answers as "Resolved Requirements", and blast radius (Tier 3 only).
+### Per-Turn Pipeline-State Header
+Whenever a tracked task is active at Tier 2 or Tier 3 (deep-context score >= 3), the orchestrator MUST emit a single-line pipeline-state header at the start of every assistant turn that touches the task. Format (next can also be `user-confirmation` or `complete`):
+```
+[hatch3r-pipeline: phase {1|2|3|4} | last: {agent} → {SUCCESS|PARTIAL|FAILED|BLOCKED|n/a} | next: {agent}]
+```
+Example: `[hatch3r-pipeline: phase 2 | last: hatch3r-researcher → SUCCESS | next: hatch3r-implementer]`
+A missing header on a tracked Tier >= 2 task is a self-detectable drift signal — the user may halt and re-ground. The header also primes the orchestrator to re-resolve its phase before choosing tools. Tier 1, read-only, and chat-only turns do NOT require it.
+### End-of-Turn Delegation Attestation
+When the turn is on a tracked task at Tier >= 2 AND caused at least one file mutation, the orchestrator MUST emit a closing block immediately before the Iteration Summary. The block enumerates every file mutated this turn, the spawning sub-agent invocation, and the `delegation_proof_id` returned by that sub-agent.
+Format:
+```
+[hatch3r-delegation-attestation]
+files_mutated_this_turn:
+  - <relative path>: via <agent-name> (proof: <delegation_proof_id>)
+mutating_subagent_invocations: <integer>
+inline_edits_by_orchestrator: none | <carve-out: hatch3r-quick-change Tier-1 + queued re-delegation>
+```
+Rules:
+- Each `files_mutated_this_turn` row MUST cite the spawning sub-agent invocation and quote its `delegation_proof_id` verbatim. Unattributable rows are self-declared P8 B2 violations; the orchestrator MUST queue re-delegation next turn.
+- `inline_edits_by_orchestrator: none` is the only value accepted outside the `hatch3r-quick-change` Tier-1 carve-out (per the "Inline implementation" definition above).
+- Tier 1 read-only and chat-only turns are exempt (same scope as the Per-Turn Pipeline-State Header); a missing block on a Tier >= 2 mutating turn is a self-detectable drift signal — halt and re-ground per the missing-header protocol.
+- The block is consumed by reviewers and the next orchestrator turn; it sits beside the Iteration Summary, not inside it, preserving the 5-field iteration-summary contract verbatim.
+### Mandatory Delegation Directive (No Inline Implementation)
+For sub-agent prompt inclusion: the orchestrator MUST NOT call any code-writing tool (enumerated under "Inline implementation" above) from its own turn. The only path for code mutation is the Task tool spawning `hatch3r-implementer` (Phase 2) or `hatch3r-fixer` (Phase 3). Sole carve-out: `hatch3r-quick-change` Tier 1 trivial items per its declared scope. Violations are bypass mode (issue #73) — halt the turn and re-delegate.
+### Mid-Implementation Research Gap Checkpoint
+At the Phase 2 midpoint (after initial files modified, before completion), the implementer MUST evaluate research gaps to avoid discovering missing context too late. **Triggers:** modifying a file not in `researchFindings.affectedFiles`; an undocumented dependency/integration point; confidence dropping below "medium" on any sub-task; an expected test file missing or covering different behavior. **Actions:** log the gap in `PipelineContext.researchGaps`; if blocking, pause and request a targeted `hatch3r-researcher` re-run with the needed modes; if non-blocking, document the assumption, continue, and flag for Phase 3 reviewer attention.
+### Per-Task Mini-Review
+For multi-sub-task implementations, the implementer performs a lightweight mini-review after each sub-task: verify correctness, check interface contracts, validate no regressions, gate progression. Mini-reviews are internal (no separate reviewer agent).
+### Post-Implementation Quality Pipeline
+**Phase 3 — Review Loop:**
+1. Spawn `hatch3r-reviewer` with diff, acceptance criteria, and blast-radius summary.
+2. Critical/Warning findings: spawn `hatch3r-fixer` with full reviewer output, then re-review. Repeat until 0 Critical + 0 Warning, or max 4 iterations (matches `DEFAULT_MAX_REVIEW_ITERATIONS` in `src/pipeline/reviewLoop.ts`, raised 3→4 in Cycle 7.5 W2B2 H26 to reach the oscillation detector in default config; kept in sync by `src/__tests__/pipeline/reviewLoop.test.ts`, CI-enforced).
+   - **Re-run honor-rule (anti-self-approval, F15.2-H2 / D13-SA13.2-F6 / D15-SA15.2-F3).** The fixer's `Reviewer re-run required` boolean is advisory; the orchestrator derives the authoritative value `reRunRequired = (fixer Files changed list is non-empty)`. When `true`, another `hatch3r-reviewer` pass is MANDATORY before the loop may exit clean — a fixer `Status: SUCCESS` with a non-empty `Files changed` can never self-approve to a clean exit. A `Reviewer re-run required: false` printed alongside a non-empty `Files changed` is overridden to `true` and noted as a self-declared protocol violation. The `Files changed` list is SSOT-bound: it is attested by the same fixer `delegation_proof_id` the orchestrator quotes in its End-of-Turn Delegation Attestation, so the signal cannot be forged without spawning the fixer.
+3. **Confirmation pass** after clean review: lightweight re-review checking only (a) no new test failures vs Phase 2 baseline, (b) no type errors introduced, (c) acceptance criteria still met. Does not re-run the full checklist.
+   - **Runtime calibration second pass (orchestrator-owned).** At this would-be-clean exit, the orchestrator — not the stateless reviewer — evaluates the `rules/hatch3r-reviewer-calibration.md` trigger: read the cross-run `consecutive_clean_pass_count` from `.hatch3r/calibration-state.json`, increment it for this clean run, and fire a second-pass review (different model class, else same-class re-roll) when the post-increment count is a multiple of `N` (default 5) OR — for a high-risk diff (`floor:security` / auth / CQ3-security-dispatch files) — on the first clean PASS. A divergent second pass reverts the exit to step 2 (`REQUEST CHANGES`) and resets the count to 0; persist the count (atomic write via `src/merge/safeWrite.ts`) and append the calibration-log record. A REQUEST CHANGES or DESIGN_OBJECTION at any iteration also resets the persisted count to 0.
+4. Max iterations reached: surface to user with a structured summary (iteration count, remaining Critical findings with file:line, remaining Warnings, fix-manually-vs-accept-risk recommendation). Never present raw reviewer output unsummarized.
+5. **Review gate confidence signal:** on a clean verdict, record the iteration count in `PipelineContext.reviewResult.iterations`. Clean-on-first-pass signals higher confidence than clean-after-multiple-iterations; Phase 4 and the orchestrator factor this into risk assessment.
+**Phase 4 — Final Quality** (after review loop is clean):
+Launch Phase 4 specialists in parallel, bounded by an orchestrator-honored fan-out width `max_phase4_parallel` (default `8` — covers the empirical maximum of applicable specialists per the trigger table, so a typical Tier 3 change fans out in at most 2 batches). This bound is LLM-honored orchestrator guidance, not a code-enforced cap: the host Task tool is the actual dispatcher and applies no platform fan-out limit, so no hatch3r module reads an env var or clamps the count — the orchestrator self-limits per this prose. The bound exists for upstream provider rate-limit headroom (RPM/TPM) — a true dependency edge — NOT per-orchestrator context cost; token cost never serializes independent work (P8 dominates P7). A non-rate-limited orchestrator MAY raise the width up to the full applicable-specialist set. **Rate-limit back-off (orchestrator-LLM guidance):** when the orchestrator observes ≥3 consecutive rate-limit-class transient failures, reduce the active fan-out width by 1 for the next batch and record the back-off in the Iteration Summary; never silently cap a healthy run. When applicable specialists exceed the bound, batch by severity-descending priority `CRITICAL → HIGH → MEDIUM → LOW` (severity is the worst-case finding class the specialist surfaces: always-on testability (CQ5) / security (CQ3) → CRITICAL, conditional CQ1/CQ4/CQ7 (ui/reliability/performance) → HIGH, docs/lint → MEDIUM, low-impact → LOW); within a bucket, dispatch in trigger-table order. Each batch runs to completion before the next starts; the validation pass runs once after the final batch. The applicable specialists and their trigger conditions are listed in the Phase 4 Specialist Trigger Table below.
+**Specialist Prompt Enrichment:** When spawning Phase 4 specialists, include the Phase 2 `filesChanged` list (focus on affected code), the Phase 3 review verdict summary (avoid re-flagging reviewed issues), and `researchFindings.blastRadius` (assess downstream impact).
+**Runtime trigger evaluation (D6-M11):** the orchestrator harness calls `shouldTriggerSpecialist(specialist, changedFiles, projectType)` from `src/pipeline/pipelineContext.ts` to evaluate whether each specialist applies to the current change set. The function returns `{ triggered: boolean, reasons: string[] }` and consumes the same `SPECIALIST_TRIGGER_TABLE` that the prose table below mirrors. Treat the prose as a quick reference; treat the TS predicate as the authoritative gate. `npm run validate:specialist-roster` enforces parity.
+**Phase 4 Specialist Trigger Table:**
+| Specialist | Mode | Trigger Conditions |
+|-----------|------|--------------------|
+| `hatch3r-docs-writer` | Evaluate | Public API, architecture, or UX changes |
+| `hatch3r-lint-fixer` | Conditional | Lint/type errors present |
+| `hatch3r-architect` | Conditional | Architectural decisions, new modules/services |
+| `hatch3r-devops` | Conditional | CI/CD or infrastructure changes |
+| `hatch3r-ui` (CQ1) | Conditional | UI component / theme / token files modified (`*.{tsx,jsx,vue,svelte}`, `tailwind.config.*`, design-token registries) |
+| `hatch3r-ux` (CQ2) | Conditional | Flow / modal / route-transition / error-state files modified; microcopy or i18n strings changed |
+| `hatch3r-security` (CQ3) | Always | Any code change (always-mode floor — absorbs legacy security-auditor scope); auth / JWT / OAuth / WebAuthn code modified; release workflow modified; cookie or session handling modified; dependency files modified |
+| `hatch3r-reliability` (CQ4) | Conditional | Service handler / OTel / SLO / retry / circuit-breaker code modified; Kubernetes probe manifests modified |
+| `hatch3r-testability` (CQ5) | Always | Any code change (always-mode floor — absorbs legacy test-writer scope); test code added or modified; mandate-map feature class introduced; coverage threshold or runner config modified |
+| `hatch3r-scalability` (CQ6) | Conditional | Request handler / queue client / connection-pool / cache / background-job code modified |
+| `hatch3r-performance` (CQ7) | Conditional | ORM query / data-access layer / UI-rendering component / bundle config modified; vendor dependency >50KB introduced |
+| `hatch3r-maintainability` (CQ8) | Conditional | Any code mutation (duplication + complexity scan); schema / migration / API spec (OpenAPI / GraphQL SDL / Protobuf) modified |
+| `hatch3r-enhancability` (CQ9) | Conditional | User-visible behavior modified; public API surface modified (OpenAPI / GraphQL SDL / AsyncAPI); config schema or feature-flag definition modified |
+**CQ specialist consolidation.** Each CQ-vector specialist owns the full scope previously split between a legacy specialist and the CQ row. `ui` (CQ1) covers axe-core + design-token + four-state + reuse plus deep ARIA / reduced-motion; `security` (CQ3) covers OAuth 2.1 + OIDC + DPoP, SBOM/cosign, OWASP ASI plus the always-on security floor and project-specific deep audits; `performance` (CQ7) covers CWV, p95/p99, bundle size, N+1 plus profile-driven hot-path analysis; `testability` (CQ5) covers mandate-map verification plus test authoring. Per-agent boundaries are documented in each agent file's opening section.
+**Verification harness binding (CQ specialist → verify skill).** A CQ specialist runs its matching verify-class skill as the pass/fail evidence harness for its Phase 4 gate, so audit semantics are not re-authored in two places (D16.3): `ui` (CQ1) + `ux` (CQ2) → `hatch3r-ui-ux-verify`; `reliability` (CQ4) → `hatch3r-reliability-verify` + `hatch3r-observability-verify` (the latter covers OTel span / trace-id correlation on the request path). `hatch3r-qa-validation` (no 1:1 CQ specialist — release/acceptance E2E) and `hatch3r-browser-verify` (multi-purpose Playwright tool, default-ON per UI-affecting invocation) stay standalone harnesses invoked by the orchestrator, not bound to one CQ row. The reciprocal "Invoked by" upstream-citation lives in each verify skill's `## Invoked by` subsection.
+**Project-Type-Aware Specialist Selection:** When `PipelineContext.projectType` is available, enrich specialist prompts with language-specific hints (e.g., ruff/mypy + pytest + SSTI/SQLi for Python; golangci-lint + govulncheck for Go; clippy + cargo-audit for Rust). See `src/pipeline/pipelineContext.ts` `LANGUAGE_SPECIALIST_CONFIGS` for the full mapping.
+### Phase 4 Validation Pass
+After all Phase 4 specialists complete, run a validation pass: run the test suite + type checker against the Phase 3 baseline cached in `PipelineContext`. No new failures → complete. New failures → identify the causing specialist, spawn `hatch3r-fixer`, re-validate (max 2 iterations, matches `DEFAULT_MAX_VALIDATION_PASS_ITERATIONS` in `src/pipeline/pipelineContext.ts`; basis + recalibration triggers in `VALIDATION_PASS_CALIBRATION`; kept in sync by `src/__tests__/pipeline/pipelineContext.test.ts`, CI-enforced); persistent regressions surface to user (never silently accept). If any specialist produced code fixes (not just findings), spawn a lightweight `hatch3r-reviewer` re-review scoped to the specialist-modified files (prevents specialist fixes bypassing the Phase 3 gate; max 1 re-review iteration, Critical findings trigger a single fixer pass).
+### Specialist Success Criteria
+- **testability (CQ5):** all new/modified code paths have tests meeting the mandate map; no untested branches in changed files.
+- **security (CQ3):** no HIGH/CRITICAL findings unresolved; MEDIUM documented with plan; CQ3 thresholds (npm provenance, SBOM, SHA-pin, OWASP ASI) met for in-scope changes.
+- **docs-writer:** affected APIs, architecture, and UX reflected in docs. **lint-fixer:** zero lint/type errors in changed files.
+- **ui (CQ1):** WCAG AA compliance; no new a11y violations; design-token + four-state coverage; reuse-first delta.
+- **performance (CQ7):** no performance regressions; new hot paths benchmarked; CWV / p95/p99 / bundle-size budgets met.
+- **architect:** ADRs documented; design aligns with patterns or divergence justified. **devops:** CI/CD passes end-to-end; deployment config validated.
+## Skill Loading Directives
+Load the matching skill before implementation: `type:bug` → `hatch3r-bug-fix`; `type:feature` → `hatch3r-feature`; `type:refactor` + `area:ui` → `hatch3r-visual-refactor`; `type:refactor` + behavior change → `hatch3r-logical-refactor`; `type:refactor` (other) → `hatch3r-refactor`; `type:qa` → `hatch3r-qa-validation`. Skill-referenced agent delegations are mandatory.
+## Subagent Spawning Protocol
+Use `subagent_type: "generalPurpose"` for all delegations. Include the agent protocol (the hatch3r role id, e.g. `hatch3r-reviewer`, named in the prompt), applicable `scope: always` rules, tooling hierarchy, and relevant learnings. Launch independent sub-agents in parallel (maximum parallelism); await and review results, surfacing BLOCKED or PARTIAL to the user.
+**Tool-allowlist enforcement boundary (ASI02/ASI03).** The generic-spawn convention has a trust-boundary consequence the orchestrator MUST account for: the Claude Code PreToolUse hook (`src/pipeline/agentToolAllowlist.ts::buildClaudePreToolUseHookScript`) gates only when the payload `agent_type` starts with `hatch3r-`; a `generalPurpose` spawn carries Claude Code's own `agent_type` (`general-purpose`), so the runtime hook passes it through (this is intentional — Claude Code's built-in sub-agents must not be governed by hatch3r policy). Therefore the **active** allowlist enforcement for delegated hatch3r work is the orchestrator-boundary gate `checkToolAccess(roleId, toolCategory)` (`src/pipeline/agentToolAllowlist.ts`), which the orchestrator applies using the hatch3r role id it placed in the prompt — deny-by-default before forwarding a tool category to the sub-agent. The runtime PreToolUse hook is a defense-in-depth second layer that fires only for adapters/sessions that spawn role-bearing native sub-agents (`subagent_type: "hatch3r-<role>"`); under the generic-spawn default it does not fire, and the boundary gate is the sole enforcement point.
+## Parallel Safety
+Default is **linear per task** (Phase 1 → 2 → 3 → 4 serially) — `PipelineContext` is a single handoff token and LLM orchestrators reason better with sequential context. Phase 4 specialists parallelize because they read-only Phase 3 artifacts; extending parallelism to Phases 1-3 requires the conditions below.
+### Three Conditions to Parallelize
+ALL three must hold: (1) **read-only or disjoint writes** (no conflict zone); (2) **deterministic aggregation** (outputs merge without orchestrator intervention — tests pass-if-all-pass, findings union); (3) **no shared mutable state** (agents that mutate `PipelineContext.state`/`featureFlags`/`metadata` serialize; parallel agents only READ).
+**Parallel-safe:** Phase 4 specialists; intra-Phase-1 researcher modes on a self-contained task; per-module Phase 2 fan-out on disjoint `affectedFiles` (merged post-Phase-2); Tier 2/3 elicitation researchers (outputs tagged with confidence + perspective). **NOT parallel-safe:** cross-phase execution (each phase depends on the prior's output); Phase 3 review-loop iterations (reviewer → fixer → re-reviewer serial); overlapping-file implementers (serialize or use a merge-conflict gate); Phase 4 validation re-review.
+**Cost-Dominance Principle.** Token cost of sub-agent invocation never justifies serialization of independent work. The three safety conditions govern WHEN parallelism is safe; cost does not govern WHETHER to parallelize. When in doubt, fan out. Serialization is only valid on true dependency edges.
+**Scaling Heuristic.** Sub-agent count tracks task decomposition: N independent modules → N parallel Phase-2 implementers; M specialist gates → M parallel Phase-4 specialists; K independent research questions → K parallel `hatch3r-researcher` sub-agents (one per question, findings unioned post-Phase-1). The K-parallel-researcher path is mandatory only when Phase 1 decomposes into ≥2 questions whose answers do not depend on each other; a single-question task keeps one researcher running its mode set serially. Orchestrators emit `sub_agents_spawned: {count, rationale}` in their structured output.
+### Concurrent Invocation Handling
+Two top-level pipelines running at once (e.g. `hatch3r-workflow` in one shell, `hatch3r-board-pickup` in another) are bounded by the same three conditions, applied cross-pipeline (D7-SA7.5-F7.5.6): (1) **Advisory lock-note (best-effort, NOT atomic — D7-27)** — before Phase 1, the orchestrator writes/reads `.hatch3r/.lock` (JSON: `pid`, `command`, `branch`, `correlation_id`, `started_at`); clear on completion/abort; treat a note older than 6h as stale. This is an advisory coordination note, not a mutual-exclusion primitive: there is no `hatch3r` lock verb and no atomic acquire in `src/` (the only cross-process lock that ships is `acquireWriteLock` in `src/merge/safeWrite.ts`, scoped to single-file atomic writes, not top-level pipelines), so the LLM orchestrator's read-then-write is TOCTOU by construction and `src/cli/commands/status.ts` already labels this file "advisory". It exists to surface a likely collision, never to guarantee exclusion. (2) **Detect-then-warn** — if a live note names the same branch or an open `.hatch3r/hatch.json` board transaction, WARN and ASK (proceed on a separate branch / wait / abort); never silently co-mutate shared state. (3) **True isolation via worktree, not the lock-note (D7-27)** — when concurrency must actually be conflict-free rather than merely warned (the parallel-implementer path), route each pipeline through `hatch3r worktree-setup <name>` — the isolation primitive hatch3r already ships (`src/cli/commands/worktreeSetup.ts`; `commands/board/pickup-delegation-multi.md` already uses it per implementer) — so the pipelines write disjoint working trees and integrate back on completion, satisfying the disjoint-writes safety condition without relying on the non-atomic note. (4) **Cache-sharing** — `.hatch3r/learnings/` is read-many, write-once-at-completion with timestamp-ordered conflict resolution. Cross-task context sharing is bounded by the no-shared-mutable-state condition: learnings consolidate at pipeline completion; mid-pipeline writes are out of scope to preserve parallel-safety determinism (D7-SA7.5-F7.5.8). Each command's Guardrails cite this subsection.
+## Cross-Phase Error Propagation
+On a non-SUCCESS status, the orchestrator MUST propagate error context downstream, never silently drop it. **Phase 1 PARTIAL:** include `researchGaps` in the implementer prompt and set confidence expectations accordingly. **Phase 2 PARTIAL:** include `reason` + unimplemented acceptance criteria in the reviewer prompt (reviewer distinguishes "not done yet" from "done incorrectly"). **Phase 3 UNRESOLVED:** include the unresolved findings in Phase 4 specialist prompts (specialists must not conflict with known issues). **Phase 4 specialist FAILED:** include the failure reason when surfacing — never report "Phase 4 failed" without naming which specialist and why.
+**Sub-agent-failure handling (shared clause; all commands cite this — never inline).** When any spawned implementer/fixer/specialist sub-agent FAILS or returns no usable output: (1) retry once with the same prompt; (2) if the retry fails, re-spawn `hatch3r-fixer` (Phase 3) with the failure reason + partial output as failure context — `hatch3r-fixer` is the code-mutation path, so the work stays delegated; (3) if the re-spawn also fails, emit `BLOCKED_OTHER` with a one-sentence reason and ASK the user (fix-manually vs adjust-scope vs accept-risk). The orchestrator MUST NOT fall back to inline implementation — that is the issue #73 bypass mode (see Mandatory Delegation Directive). The sole exception is `hatch3r-quick-change`, whose Tier-1 carve-out permits inline retry per its declared scope.
+## Correlation ID
+Generate a UUID v4 per top-level task before Phase 1. Include in every sub-agent prompt as `correlation_id`. All sub-agents include it in logs, outputs, and status reports. Epic sub-issues get individual IDs; batch tasks share one ID with a sub-task index.
+## Severity Scale
+| Severity | Definition | Pipeline Action |
+|----------|-----------|-----------------|
+| **CRITICAL** | Blocks merge. Security vulnerabilities, data loss, broken core functionality. | Must resolve before Phase 3 exit. |
+| **HIGH** | Should fix before merge. Significant bugs, performance regressions. | Fix or escalate to user. |
+| **MEDIUM** | Fix in same sprint. Code quality, minor bugs. | Document with remediation plan. |
+| **LOW** | Track for future. Style nits, minor improvements. | Log only. No merge gate. |
+| **INFO** | Informational. Observations, suggestions. | Awareness only. |
+## Status Codes
+All sub-agents MUST map findings to the Severity Scale above. **SUCCESS** (fully completed, all criteria met) · **PARTIAL** (include `reason`) · **FAILED** (no usable output; include `reason`) · **SKIPPED** (intentionally not executed) · **TIMEOUT** (time budget exceeded; forward partial output) · **BLOCKED_AMBIGUITY** · **BLOCKED_MISSING_CONTEXT** · **BLOCKED_CONFLICTING_SPECS** · **BLOCKED_MISSING_TOOL** · **BLOCKED_PREMISE_CHALLENGE** · **BLOCKED_OTHER** (one-sentence reason required). The six BLOCKED_* values are the canonical named escalation enum codified in `agents/shared/quality-charter.md` §17; every `agents/hatch3r-*.md` main agent MUST declare a Status field selecting from this enum. BLOCKED_PREMISE_CHALLENGE triggers `isHaltStatus()` from `src/pipeline/pipelineContext.ts::AgentStatus` — orchestrator halts and surfaces the premise concern + ≥1 alternative approach (Finding D7-M1 / D7-SA7.1-1). The reviewer's Phase-3 equivalent is the `DESIGN_OBJECTION` verdict; implementer/researcher/fixer emit the agent status, reviewer emits the verdict — both surface the same premise-challenge across non-overlapping phases.
+## Phase Handoff Contract
+Each phase populates a typed slice of `PipelineContext` (canonical schema: `src/pipeline/pipelineContext.ts::PipelineContext`). Required fields per transition: Phase 1 → 2 sets `researchFindings` (or `researchGaps[]` when skip-documented); Phase 2 → 3 sets `implementationResult` (filesChanged, testsWritten, status ∈ `AgentStatus`, reason); Phase 3 → 4 sets `reviewResult` (iterations, finalVerdict, findings, confirmationPassResult, confidence); Phase 4 → completion sets `qualityResults` (specialists[], validationPass). The typed gate `validatePhaseTransition(context, targetPhase, options?)` returns the `ValidationError[]` set the orchestrator MUST resolve before advancing; forwarding sub-agent output that omits a required field is a Phase Handoff Contract violation (Finding D7-M3 / D7-SA7.1-3) — re-spawn the upstream agent with the gap named. The Phase 3 → 4 advance rejects an `UNRESOLVED` verdict unless `options.allowUnresolvedAdvance` is set (the user-chose-manual skip condition, Finding D7-10) and accepts an absent `reviewResult`/`iterations: 0` only under `options.phase3Skipped` (the docs-only/trivial Phase 3 skip, Finding D7-11). Phase 4 completion is additionally gated by `evaluatePhase4Completion(qualityResults, options)` — a typed predicate aggregating Phase 4 Validation Pass criteria; when `complete: false`, surface `incompletionReason` (Finding D7-M8 / D7-SA7.3-3). **Handoff-loss trigger (Finding D7-23):** when a transition applies lossy compression (`hatch3r-agent-orchestration-detail` § Context-Degradation Policy), the orchestrator records `createPhaseHandoffMetrics` (passing the protected-byte count for never-truncate strategy-#4 bytes) and, when `informationLossEstimate > 0.3`, emits the `formatPhaseHandoffWarning` line in the iteration summary so downstream phases verify critical context survived — this is the command-layer trigger; every pipeline command inherits it via this always-loaded rule.
+## Phase Skip Criteria
+All commands that use the pipeline MUST reference these criteria — do not invent command-specific skip rules.
+| Phase | Can Skip When | Mandatory Minimum (even when skipped) |
+|-------|--------------|--------------------------------------|
+| **Phase 1 (Research)** | Trivial single-line edit (typo, comment, single-value config); Tier 1 single-file change with no cross-module impact; Research already cached in PipelineContext | Affected files identified (even via quick scan); existing tests noted |
+| **Phase 2 (Implement)** | Never — implementation is always required for code changes | All changes via hatch3r-implementer (never inline except trivial items in quick-change) |
+| **Phase 3 (Review)** | All items trivial (quick-change only); documentation-only change with no code | Quality checks (lint/typecheck/test) must pass; acceptance criteria verified |
+| **Phase 4 (Quality)** | Review loop unresolved AND user chose manual resolution; documentation-only; all trivial + quality checks pass (quick-change only) | testability (CQ5) + security (CQ3) always required for code changes; quality checks must pass |
+See `src/pipeline/pipelineContext.ts` for the programmatic `PHASE_SKIP_CRITERIA` constant.
+## Root-Cause Depth Requirements
+When a phase reports a failure or unexpected result, the orchestrator MUST classify root cause before the next action — reject the shallow fix, require the root-cause fix:
+- **Test failure after Phase 2:** not disable/skip the test — identify why the implementation breaks it; fix the code or update the test with justification.
+- **Lint errors after Phase 4:** not `eslint-disable` comments — fix the underlying code pattern.
+- **Type errors after fixer changes:** not `as any` casts — trace the mismatch to its source and fix the type definition or usage.
+- **Review loop not converging:** not surface after 3 iterations without analysis — classify whether findings oscillate (fixer A breaks what fixer B fixed) and surface the conflict pattern.
+Reject superficial fixes from any sub-agent. If a fixer's output contains suppression patterns (disable comments, `any` casts, test skips without linked issues), classify as PARTIAL and re-run with a prompt requesting a root-cause fix. This rejection is backed by a typed advisory: the reviewer runs `detectSuppressionPatterns(diff)` (`src/pipeline/reviewLoop.ts`) over the fixer diff — it flags `as any` casts, `eslint-disable` directives with no issue reference, and `test.skip`/`it.skip`/`describe.skip` with no linked issue. A non-empty `found` is the machine-checkable signal for this gate (mirroring how the orchestrator consults `detectOscillation`); the reviewer downgrades the verdict so the existing review loop forces the re-run.
+## Task Context Protocols
+**Single-task plain chat:** classify task type, create synthetic issue context, run the full pipeline (fetch issue-reference details via platform CLI). **Multi-task plain chat:** parse into discrete tasks, classify each, build a dependency graph, parallelize researchers + implementers per dependency level, run the review loop after all implementations, then Phase 4 specialists; when parallel implementers touch the same file, accept disjoint-region edits, merge overlapping regions using the larger-scope change as base, and halt on semantic conflicts for user resolution. **Auto-mode guardrails:** verify scope containment, no unapproved destructive operations, and output-schema compliance after each phase; halt on violation. Full specs in `hatch3r-agent-orchestration-detail`.
+## Rule Application
+All `scope: always` rules apply to every task including sub-agent work; include rule directives in sub-agent prompts. For limited context windows, Tier 1 is mandatory; Tier 2/3 included selectively. Inclusion tiers:
+- **Tier 1 — always include (every sub-agent):** `hatch3r-security-patterns`, `hatch3r-code-standards`.
+- **Tier 2 — by phase:** `hatch3r-testing` (testability/implementer/reviewer); `hatch3r-accessibility-standards` (ui, UI reviewer); `hatch3r-git-conventions` (orchestrator git ops); `hatch3r-ci-cd` (ci-watcher/devops); `hatch3r-dependency-management` (security CQ3 supply-chain slice).
+- **Tier 3 — on-demand by role + scope:** `hatch3r-api-design`, `hatch3r-secrets-management`, `hatch3r-data-classification`, `hatch3r-performance-budgets`, `hatch3r-browser-verification`, `hatch3r-component-conventions`, `hatch3r-i18n`, `hatch3r-theming`, `hatch3r-migrations`, `hatch3r-feature-flags`, `hatch3r-observability-logging`, `hatch3r-observability-metrics`, `hatch3r-observability-tracing`.

package/{rules → dist/content/rules}/hatch3r-ai-evals.md RENAMED Viewed

@@ -2,8 +2,10 @@
 id: hatch3r-ai-evals
 type: rule
 description: AI feature evaluation, prompt versioning, cost telemetry, prompt caching, model fallback, and hallucination-as-SLI for end-user projects shipping LLM features
-scope: "**/ai/**,**/llm/**,**/chat/**,**/assistant/**,**/agents/**,**/copilot/**,**/evals/**,**/prompts/**,**/rag/**"
+scope: conditional
+globs: "**/ai/**,**/llm/**,**/chat/**,**/assistant/**,**/agents/**,**/copilot/**,**/evals/**,**/prompts/**,**/rag/**"
 tags: [review, implementation, ai]
+precedence: high
 quality_charter: agents/shared/quality-charter.md
 cache_friendly: true
 ---
@@ -24,7 +26,7 @@ Pick one tool by task class:
 - **promptfoo** — broad coverage, declarative YAML, model-comparison defaults
 - **DeepEval** — pytest-style assertions for CI gate integration
 - **RAGAS** — retrieval-augmented generation metrics (context_precision, context_recall, faithfulness, answer_relevance)
-- **Inspect** — UK AISI framework for safety and agentic evals
+- **Inspect** — UK AISI framework for safety and agentic evals. At `scaleup`/`enterprise` maturity (CONSTITUTION §6 Decision 16), use its external-agent runner (one harness drives Claude Code / Codex CLI / Gemini CLI) with bootstrap statistical scoring for multi-agent statistical-significance gating — point-estimate eval scores carry a confidence interval rather than a single number.
 - **braintrust** — SaaS + OSS hybrid, run history retained per prompt version
 - **TruLens** — observability-coupled, runs evals against live traces
 - **Arize Phoenix** — open-source observability with eval modules
@@ -67,7 +69,7 @@ Match the metric to the task class:
 Every LLM call logs: `tokens_in`, `tokens_out`, `cache_hit` (boolean + cached_tokens count), `model`, `cost_usd`, `latency_ms`, `cost_center` (feature ID), `prompt_version`, `prompt_hash`, `user_id_hash`.
-Aggregate dashboards in the observability stack — cross-reference `rules/hatch3r-observability-metrics.md` and `rules/hatch3r-observability-tracing-detail.md` for the SLI/SLO vocabulary, and `skills/hatch3r-observability-verify` for the wiring checklist. Per-feature budget alerts fire at 50%, 75%, and 90% of monthly budget; abuse-detection alert at 10x user p99 cost over a 1-hour window.
+Aggregate dashboards in the observability stack — cross-reference `rules/hatch3r-observability-metrics.md` and `rules/hatch3r-observability-tracing.md` for the SLI/SLO vocabulary, and `skills/hatch3r-observability-verify` for the wiring checklist. Per-feature budget alerts fire at 50%, 75%, and 90% of monthly budget; abuse-detection alert at 10x user p99 cost over a 1-hour window.
 ## Prompt Caching (Anthropic)
@@ -126,7 +128,7 @@ Methodology aligned with **BFCL v4** (Berkeley Function Calling Leaderboard) and
 ## OpenTelemetry GenAI Semantic Conventions
-Every LLM call emits an OpenTelemetry span named `gen_ai.<operation>` with the attributes prescribed by the OpenTelemetry GenAI semantic conventions: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cached_tokens`, `gen_ai.request.temperature`, `gen_ai.tool.name` (when tools used). Cross-reference Slice 2 observability rules for the broader span taxonomy.
+Every LLM call emits an OpenTelemetry span named `{gen_ai.operation.name} {gen_ai.request.model}` with the attributes named by the OpenTelemetry GenAI semantic conventions (v1.41.1): `gen_ai.operation.name`, `gen_ai.provider.name` (renamed from the deprecated `gen_ai.system`), `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cached_tokens`, `gen_ai.request.temperature`, `gen_ai.tool.name` (when tools used). These `gen_ai.*` keys are Development-status as of v1.41.1 — names may change; pin the SemConv version you emit and re-verify each P3 currency cycle. Cross-reference Slice 2 observability rules for the broader span taxonomy.
 ## User-Feedback Loop
@@ -146,7 +148,7 @@ Write eval before prompt, measure baseline, write prompt, measure delta, iterate
 ## References
-- promptfoo — `promptfoo.dev`
+- promptfoo — `promptfoo.dev` (acquired by OpenAI 2026-03-09; remains OSS MIT)
 - DeepEval — `github.com/confident-ai/deepeval`
 - RAGAS — `docs.ragas.io`
 - Inspect (UK AISI) — `github.com/UKGovernmentBEIS/inspect_ai`

package/{rules → dist/content/rules}/hatch3r-ai-evals.mdc RENAMED Viewed

@@ -2,6 +2,7 @@
 description: AI feature evaluation, prompt versioning, cost telemetry, prompt caching, model fallback, and hallucination-as-SLI for end-user projects shipping LLM features
 globs: ["**/ai/**", "**/llm/**", "**/chat/**", "**/assistant/**", "**/agents/**", "**/copilot/**", "**/evals/**", "**/prompts/**", "**/rag/**"]
 alwaysApply: false
+precedence: high
 ---
 # AI Feature Evaluation and Cost Governance (2026)
@@ -20,7 +21,7 @@ Pick one tool by task class:
 - **promptfoo** — broad coverage, declarative YAML, model-comparison defaults
 - **DeepEval** — pytest-style assertions for CI gate integration
 - **RAGAS** — retrieval-augmented generation metrics (context_precision, context_recall, faithfulness, answer_relevance)
-- **Inspect** — UK AISI framework for safety and agentic evals
+- **Inspect** — UK AISI framework for safety and agentic evals. At `scaleup`/`enterprise` maturity (CONSTITUTION §6 Decision 16), use its external-agent runner (one harness drives Claude Code / Codex CLI / Gemini CLI) with bootstrap statistical scoring for multi-agent statistical-significance gating — point-estimate eval scores carry a confidence interval rather than a single number.
 - **braintrust** — SaaS + OSS hybrid, run history retained per prompt version
 - **TruLens** — observability-coupled, runs evals against live traces
 - **Arize Phoenix** — open-source observability with eval modules
@@ -63,7 +64,7 @@ Match the metric to the task class:
 Every LLM call logs: `tokens_in`, `tokens_out`, `cache_hit` (boolean + cached_tokens count), `model`, `cost_usd`, `latency_ms`, `cost_center` (feature ID), `prompt_version`, `prompt_hash`, `user_id_hash`.
-Aggregate dashboards in the observability stack — cross-reference `rules/hatch3r-observability-metrics.md` and `rules/hatch3r-observability-tracing-detail.md` for the SLI/SLO vocabulary, and `skills/hatch3r-observability-verify` for the wiring checklist. Per-feature budget alerts fire at 50%, 75%, and 90% of monthly budget; abuse-detection alert at 10x user p99 cost over a 1-hour window.
+Aggregate dashboards in the observability stack — cross-reference `rules/hatch3r-observability-metrics.md` and `rules/hatch3r-observability-tracing.md` for the SLI/SLO vocabulary, and `skills/hatch3r-observability-verify` for the wiring checklist. Per-feature budget alerts fire at 50%, 75%, and 90% of monthly budget; abuse-detection alert at 10x user p99 cost over a 1-hour window.
 ## Prompt Caching (Anthropic)
@@ -122,7 +123,7 @@ Methodology aligned with **BFCL v4** (Berkeley Function Calling Leaderboard) and
 ## OpenTelemetry GenAI Semantic Conventions
-Every LLM call emits an OpenTelemetry span named `gen_ai.<operation>` with the attributes prescribed by the OpenTelemetry GenAI semantic conventions: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cached_tokens`, `gen_ai.request.temperature`, `gen_ai.tool.name` (when tools used). Cross-reference Slice 2 observability rules for the broader span taxonomy.
+Every LLM call emits an OpenTelemetry span named `{gen_ai.operation.name} {gen_ai.request.model}` with the attributes named by the OpenTelemetry GenAI semantic conventions (v1.41.1): `gen_ai.operation.name`, `gen_ai.provider.name` (renamed from the deprecated `gen_ai.system`), `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.usage.cached_tokens`, `gen_ai.request.temperature`, `gen_ai.tool.name` (when tools used). These `gen_ai.*` keys are Development-status as of v1.41.1 — names may change; pin the SemConv version you emit and re-verify each P3 currency cycle. Cross-reference Slice 2 observability rules for the broader span taxonomy.
 ## User-Feedback Loop
@@ -142,7 +143,7 @@ Write eval before prompt, measure baseline, write prompt, measure delta, iterate
 ## References
-- promptfoo — `promptfoo.dev`
+- promptfoo — `promptfoo.dev` (acquired by OpenAI 2026-03-09; remains OSS MIT)
 - DeepEval — `github.com/confident-ai/deepeval`
 - RAGAS — `docs.ragas.io`
 - Inspect (UK AISI) — `github.com/UKGovernmentBEIS/inspect_ai`

package/{rules → dist/content/rules}/hatch3r-ai-ux-patterns.md RENAMED Viewed

@@ -2,13 +2,17 @@
 id: hatch3r-ai-ux-patterns
 type: rule
 description: 2026 AI/agentic UX patterns for end-user projects shipping AI features — streaming, tool-call UI, human-approval gates, cancel/abort/undo, citations
-scope: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/ai/**,**/chat/**,**/assistant/**,**/agents/**,**/llm/**,**/copilot/**"
-tags: [ux, ai, frontend]
+scope: conditional
+globs: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/ai/**,**/chat/**,**/assistant/**,**/agents/**,**/llm/**,**/copilot/**"
+tags: [implementation, floor:ui-ux, ux, ai, frontend]
+precedence: high
 quality_charter: agents/shared/quality-charter.md
 cache_friendly: true
 ---
 # AI/Agentic UX Patterns (2026)
+**Pillars:** P2 (Scientific & Practical Quality), CQ2 (UX Quality)
 ## Scope
 This rule applies when the end-user project ships LLM-driven UI — chat, assistant, copilot, agent dashboards, generative UI surfaces. It does NOT govern the LLM backend itself (model selection, prompt engineering, retrieval pipeline). For non-AI UX rules (loading, empty, error, partial states; form patterns; microcopy), cross-reference `rules/hatch3r-ux-states-and-flows.md`. When both rules apply to the same surface, the non-AI rule sets the baseline and this rule layers AI-specific behavior on top.
@@ -110,7 +114,7 @@ Three distinct affordances — do not collapse them into a single control:
 ## Verification Gate
-Before declaring an AI surface "done":
+These 7 checks are operationalized as Gate 8 of `skills/hatch3r-ui-ux-verify` — the skill executes all 7, not just the streaming/tool-call subset. Before declaring an AI surface "done":
 - Streaming verified end-to-end: scripted Playwright test that asserts progressive token render (first token <1s after request, last token marked complete).
 - Tool-call card snapshot per state (`pending`, `in-progress`, `complete`, `failed`) — missing any state is a blocker.

package/{rules → dist/content/rules}/hatch3r-ai-ux-patterns.mdc RENAMED Viewed

@@ -2,9 +2,12 @@
 description: 2026 AI/agentic UX patterns for end-user projects shipping AI features — streaming, tool-call UI, human-approval gates, cancel/abort/undo, citations
 globs: ["**/*.vue", "**/*.jsx", "**/*.tsx", "**/*.svelte", "**/ai/**", "**/chat/**", "**/assistant/**", "**/agents/**", "**/llm/**", "**/copilot/**"]
 alwaysApply: false
+precedence: high
 ---
 # AI/Agentic UX Patterns (2026)
+**Pillars:** P2 (Scientific & Practical Quality), CQ2 (UX Quality)
 ## Scope
 This rule applies when the end-user project ships LLM-driven UI — chat, assistant, copilot, agent dashboards, generative UI surfaces. It does NOT govern the LLM backend itself (model selection, prompt engineering, retrieval pipeline). For non-AI UX rules (loading, empty, error, partial states; form patterns; microcopy), cross-reference `rules/hatch3r-ux-states-and-flows.md`. When both rules apply to the same surface, the non-AI rule sets the baseline and this rule layers AI-specific behavior on top.
@@ -106,7 +109,7 @@ Three distinct affordances — do not collapse them into a single control:
 ## Verification Gate
-Before declaring an AI surface "done":
+These 7 checks are operationalized as Gate 8 of `skills/hatch3r-ui-ux-verify` — the skill executes all 7, not just the streaming/tool-call subset. Before declaring an AI surface "done":
 - Streaming verified end-to-end: scripted Playwright test that asserts progressive token render (first token <1s after request, last token marked complete).
 - Tool-call card snapshot per state (`pending`, `in-progress`, `complete`, `failed`) — missing any state is a blocker.

package/dist/content/rules/hatch3r-android-patterns.md ADDED Viewed

@@ -0,0 +1,107 @@
+---
+id: hatch3r-android-patterns
+type: rule
+description: Android Kotlin conventions covering Jetpack Compose, coroutines + Flow, Hilt DI, Room, modular Gradle, AGP 8.x, target SDK 35, and Compose testing
+scope: conditional
+globs: "**/*.kt,**/*.kts,**/build.gradle,**/build.gradle.kts,**/settings.gradle,**/settings.gradle.kts,**/gradle.properties,**/AndroidManifest.xml,**/proguard-rules.pro,**/app/src/**,**/android/app/**,**/libs.versions.toml"
+tags: [implementation, lang:java]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# Android Patterns
+**Pillars:** P2 (Scientific & Practical Quality), CQ8 (Maintainability Quality)
+> Applies when the project ships an Android app or Kotlin library. Detection signals: `build.gradle` / `build.gradle.kts`, `AndroidManifest.xml`, `app/` module, or any `*.kt` file paired with Gradle wrapper. Native Android takes precedence over Flutter/React-Native cross-platform shells.
+## Kotlin Language Floor
+- Target Kotlin 2.0+ with K2 compiler (`kotlin.experimental.tryK2=true`). Treat warnings as errors in CI.
+- Use `data class` for value-equality types, `sealed class` / `sealed interface` for closed-set hierarchies (UI state, events, errors). Exhaust `when` over sealed types — the compiler enforces it.
+- Coroutines + Flow for concurrency. Never use `Thread` directly. Wrap legacy callback APIs with `suspendCancellableCoroutine` at the boundary.
+- Nullability: explicit `?` for nullable types; avoid `!!` (force-unwrap) outside test fixtures. Platform types from Java interop annotated with `@Nullable` / `@NonNull` where possible.
+## Build Tooling
+- Android Gradle Plugin (AGP) 8.7+ with Gradle 8.10+. Use the version catalog (`gradle/libs.versions.toml`) for every dependency — no hard-coded versions in `build.gradle.kts`.
+- Kotlin DSL (`build.gradle.kts`) for new modules. Migrate Groovy scripts during regular refactors; do not mix dialects in a single module's chain.
+- Target SDK 35 (Android 15) — Google Play requires `targetSdk = 35` for new apps and updates in 2025. Set `compileSdk = 35` alongside.
+- Min SDK: 24 (Android 7.0) covers >97% of active devices per Android Distribution Dashboard. Going below 24 means losing modern Kotlin stdlib calls that require API 24.
+## Architecture
+- Pick ONE app-architecture pattern per app and document it in `docs/architecture.md`:
+  - **MVI / Unidirectional Data Flow with Compose + ViewModel + StateFlow** — recommended default.
+  - **MVVM with LiveData** — only for legacy apps that haven't migrated to Compose.
+- Modular Gradle structure: `app/` (entry point), `feature:<name>/` (feature modules), `core:<network|database|design-system>/` (shared modules). Cross-feature imports go through `core:` interfaces.
+- Composition root in `app/` wires Hilt graphs. Feature modules expose `@Module @InstallIn(SingletonComponent::class)` providers; they never instantiate dependencies directly.
+## Dependency Injection
+- Use Hilt for runtime DI (annotation-processor based, sits on top of Dagger). Configure `@HiltAndroidApp` on the `Application` class and `@AndroidEntryPoint` on Activities, Fragments, Services.
+- Constructor injection for ViewModels via `@HiltViewModel` + `@Inject constructor`. Do not use `ViewModelProvider.Factory` manually unless integrating with a non-Hilt host.
+- Scoped bindings: `@SingletonComponent` for app-wide singletons, `@ViewModelScoped` for per-ViewModel instances. Never share `@SingletonComponent` state with UI lifecycle.
+## Jetpack Compose
+- Compose is the UI floor for new screens. Migrate XML views during regular refactors; do not mix `ConstraintLayout` XML with Compose for a single screen.
+- State hoisting: `@Composable` functions are stateless by default. Hoist `MutableState` to the ViewModel; pass values + lambdas down.
+- Use `collectAsStateWithLifecycle()` on Flow → State conversions inside composables. Plain `collectAsState()` ignores lifecycle and leaks during backgrounded screens.
+- Recompose discipline: stable types only as composable parameters. Use `@Stable` / `@Immutable` annotations or `kotlinx-collections-immutable` for lists. `MutableList<T>` parameters force every consumer to recompose on any mutation.
+- Material 3 (`androidx.compose.material3`) is the design system floor for Compose. Material 2 is legacy.
+## Coroutines & Flow
+- Structured concurrency: every coroutine launched in a `CoroutineScope` tied to a lifecycle. Use `viewModelScope` (auto-cancels on `onCleared()`), `lifecycleScope` (auto-cancels on lifecycle destroy), and `WorkManager` for background tasks.
+- Never launch from `GlobalScope`. Never call `runBlocking` on the main thread.
+- Cold flows (`flow { }`, `flowOf`) for one-shot streams; hot flows (`MutableStateFlow`, `MutableSharedFlow`) for UI state and event buses. `LiveData` is legacy — use `StateFlow` for new code.
+- Use `Dispatchers.IO` for blocking I/O, `Dispatchers.Default` for CPU-bound work. The main dispatcher is for UI updates only; no blocking work there.
+## Persistence
+- Room is the SQL persistence floor. Define `@Entity`, `@Dao`, and a `@Database` class with a versioned schema. Migration paths via `Migration` objects committed to VCS.
+- DataStore (Preferences or Proto) for key-value and structured preferences. SharedPreferences is legacy — do not adopt for new code.
+- For network sync: WorkManager with `Constraints` (network, charging, battery) + exponential backoff. Never use `JobScheduler` directly — WorkManager wraps it and handles edge cases.
+- Bind Room DAOs and DataStore to coroutines / Flow — suspend functions for one-shot queries, `Flow<T>` for observed reads.
+## Networking
+- Use Retrofit 2.11+ with kotlinx-serialization-converter or Moshi (Gson is in maintenance — prefer one of the modern serializers). Configure OkHttp logging interceptor only in debug builds.
+- Ktor Client is the alternative when the app shares a multi-platform stack with Kotlin Multiplatform.
+- Configure timeouts on every OkHttp client: connect (10s), read (30s), write (15s). Default timeouts are unbounded — production apps must override.
+- Certificate pinning (`CertificatePinner`) for high-security endpoints. Always include a backup pin and a rotation schedule.
+## Testing
+- Unit tests with JUnit 5 (`org.junit.jupiter:junit-jupiter`) + MockK for mocking. JUnit 4 + Mockito is legacy — do not add for new modules.
+- Coroutine tests with `kotlinx-coroutines-test` (`runTest`, `TestDispatcher`). Never use `Thread.sleep` to wait for async work.
+- Compose UI tests with `androidx.compose.ui:ui-test-junit4`. Use semantics-based matchers (`hasContentDescription`, `hasTestTag`) — avoid text matchers for resilience.
+- Instrumented tests via `androidTest/` source set on emulators or Firebase Test Lab matrices. Configure `testCoverage` in the AGP block; floor 80% line coverage in `feature:` modules.
+- Screenshot tests via Paparazzi (host-side, no emulator needed) or Roborazzi. Update goldens through PR review.
+## Accessibility
+- Every interactive composable has `Modifier.semantics { contentDescription = "..." }` or uses Material 3 widgets that provide semantics by default.
+- TalkBack tested on every PR touching UI. The TalkBack-on screenshot test in Paparazzi catches regressions automatically.
+- Dynamic font scaling: respect `MaterialTheme.typography` and avoid hard-coded `.sp` values in layouts. Test with the largest accessibility font size in the system settings.
+- Touch targets: minimum 48dp × 48dp per Material 3 guidelines. `Modifier.minimumInteractiveComponentSize()` enforces this when composables are below 48dp.
+## Distribution
+- Sign release builds with Play App Signing — Google holds the upload key, you hold the signing key in escrow. Configure via Play Console.
+- App Bundle (`*.aab`) for Play Store, APK only for sideload / direct distribution. Enable `bundle { language { enableSplit = true } }`.
+- ProGuard / R8 on release builds (`isMinifyEnabled = true`, `isShrinkResources = true`). Maintain `proguard-rules.pro` per module; verify post-shrink builds in CI before publishing.
+- Use Crashlytics or Sentry for crash reporting. Symbolicate native crashes via NDK symbol upload in CI.
+## References
+- AGP 8.x release notes: https://developer.android.com/build/releases/gradle-plugin (accessed 2026-05-27, official-docs)
+- Jetpack Compose: https://developer.android.com/jetpack/compose (accessed 2026-05-27, official-docs)
+- Coroutines + Flow: https://kotlinlang.org/docs/coroutines-overview.html (accessed 2026-05-27, official-docs)
+- Google Play target SDK 35 requirement: https://developer.android.com/google/play/requirements/target-sdk (accessed 2026-05-27, official-docs)
+## Cross-References
+- `rules/hatch3r-component-conventions.md` — four-state surface contract maps to Compose `StateFlow<UiState>` patterns.
+- `rules/hatch3r-testing.md` — coverage thresholds and determinism rules apply to JUnit / Compose tests.
+- `rules/hatch3r-accessibility-standards.md` — WCAG mapping for Compose `semantics { ... }` modifiers.