npm - hatch3r - Versions diffs - 1.1.0 → 1.3.0 - Mend

hatch3r 1.1.0 → 1.3.0


Files changed (146)

package/README.md +109 -364
package/agents/hatch3r-a11y-auditor.md +8 -8
package/agents/hatch3r-architect.md +2 -4
package/agents/hatch3r-ci-watcher.md +2 -4
package/agents/hatch3r-context-rules.md +2 -4
package/agents/hatch3r-dependency-auditor.md +5 -7
package/agents/hatch3r-devops.md +2 -4
package/agents/hatch3r-docs-writer.md +2 -4
package/agents/hatch3r-fixer.md +2 -0
package/agents/hatch3r-implementer.md +32 -0
package/agents/hatch3r-learnings-loader.md +189 -13
package/agents/hatch3r-lint-fixer.md +3 -14
package/agents/hatch3r-perf-profiler.md +2 -4
package/agents/hatch3r-researcher.md +247 -0
package/agents/hatch3r-reviewer.md +76 -7
package/agents/hatch3r-security-auditor.md +4 -7
package/agents/hatch3r-test-writer.md +3 -11
package/agents/modes/architecture.md +44 -0
package/agents/modes/boundary-analysis.md +45 -0
package/agents/modes/codebase-impact.md +81 -0
package/agents/modes/complexity-risk.md +40 -0
package/agents/modes/coverage-analysis.md +44 -0
package/agents/modes/current-state.md +52 -0
package/agents/modes/feature-design.md +39 -0
package/agents/modes/impact-analysis.md +45 -0
package/agents/modes/library-docs.md +31 -0
package/agents/modes/migration-path.md +55 -0
package/agents/modes/prior-art.md +31 -0
package/agents/modes/refactoring-strategy.md +55 -0
package/agents/modes/regression.md +45 -0
package/agents/modes/requirements-elicitation.md +68 -0
package/agents/modes/risk-assessment.md +41 -0
package/agents/modes/risk-prioritization.md +43 -0
package/agents/modes/root-cause.md +39 -0
package/agents/modes/similar-implementation.md +70 -0
package/agents/modes/symptom-trace.md +39 -0
package/agents/modes/test-pattern.md +61 -0
package/agents/shared/external-knowledge.md +11 -0
package/commands/board/pickup-azure-devops.md +81 -0
package/commands/board/pickup-delegation-multi.md +197 -0
package/commands/board/pickup-delegation.md +100 -0
package/commands/board/pickup-github.md +82 -0
package/commands/board/pickup-gitlab.md +81 -0
package/commands/board/pickup-modes.md +143 -0
package/commands/board/pickup-post-impl.md +120 -0
package/commands/board/shared-azure-devops.md +149 -0
package/commands/board/shared-board-overview.md +215 -0
package/commands/board/shared-github.md +169 -0
package/commands/board/shared-gitlab.md +142 -0
package/commands/hatch3r-agent-customize.md +3 -2
package/commands/hatch3r-api-spec.md +1 -0
package/commands/hatch3r-benchmark.md +1 -0
package/commands/hatch3r-board-fill.md +15 -16
package/commands/hatch3r-board-groom.md +50 -10
package/commands/hatch3r-board-init.md +1 -0
package/commands/hatch3r-board-pickup.md +44 -572
package/commands/hatch3r-board-refresh.md +31 -10
package/commands/hatch3r-board-shared.md +87 -439
package/commands/hatch3r-bug-plan.md +1 -0
package/commands/hatch3r-codebase-map.md +1 -0
package/commands/hatch3r-command-customize.md +1 -0
package/commands/hatch3r-context-health.md +23 -2
package/commands/hatch3r-cost-tracking.md +15 -0
package/commands/hatch3r-debug.md +1 -0
package/commands/hatch3r-dep-audit.md +2 -1
package/commands/hatch3r-feature-plan.md +1 -0
package/commands/hatch3r-healthcheck.md +2 -1
package/commands/hatch3r-hooks.md +1 -0
package/commands/hatch3r-learn.md +69 -2
package/commands/hatch3r-migration-plan.md +1 -0
package/commands/hatch3r-onboard.md +1 -0
package/commands/hatch3r-project-spec.md +1 -0
package/commands/hatch3r-quick-change.md +1 -0
package/commands/hatch3r-recipe.md +1 -0
package/commands/hatch3r-refactor-plan.md +1 -0
package/commands/hatch3r-release.md +2 -1
package/commands/hatch3r-revision.md +1 -0
package/commands/hatch3r-roadmap.md +8 -1
package/commands/hatch3r-rule-customize.md +1 -0
package/commands/hatch3r-security-audit.md +2 -1
package/commands/hatch3r-skill-customize.md +1 -0
package/commands/hatch3r-test-plan.md +532 -0
package/commands/hatch3r-workflow.md +1 -0
package/dist/cli/index.js +4735 -1426
package/dist/cli/index.js.map +1 -1
package/github-agents/hatch3r-docs-agent.md +1 -0
package/github-agents/hatch3r-lint-agent.md +1 -0
package/github-agents/hatch3r-security-agent.md +1 -0
package/github-agents/hatch3r-test-agent.md +1 -0
package/hooks/hatch3r-ci-failure.md +1 -0
package/hooks/hatch3r-file-save.md +1 -0
package/hooks/hatch3r-post-merge.md +1 -0
package/hooks/hatch3r-pre-commit.md +1 -0
package/hooks/hatch3r-pre-push.md +1 -0
package/hooks/hatch3r-session-start.md +1 -0
package/package.json +2 -2
package/prompts/hatch3r-bug-triage.md +1 -0
package/prompts/hatch3r-code-review.md +1 -0
package/prompts/hatch3r-pr-description.md +1 -0
package/rules/hatch3r-accessibility-standards.md +1 -0
package/rules/hatch3r-agent-orchestration.md +289 -73
package/rules/hatch3r-api-design.md +1 -0
package/rules/hatch3r-browser-verification.md +1 -0
package/rules/hatch3r-ci-cd.md +1 -0
package/rules/hatch3r-code-standards.md +9 -0
package/rules/hatch3r-component-conventions.md +1 -0
package/rules/hatch3r-data-classification.md +1 -0
package/rules/hatch3r-deep-context.md +1 -0
package/rules/hatch3r-dependency-management.md +13 -0
package/rules/hatch3r-feature-flags.md +1 -0
package/rules/hatch3r-git-conventions.md +1 -0
package/rules/hatch3r-i18n.md +1 -0
package/rules/hatch3r-learning-consult.md +1 -0
package/rules/hatch3r-migrations.md +12 -0
package/rules/hatch3r-observability.md +290 -0
package/rules/hatch3r-performance-budgets.md +1 -0
package/rules/hatch3r-secrets-management.md +1 -0
package/rules/hatch3r-security-patterns.md +12 -0
package/rules/hatch3r-testing.md +1 -0
package/rules/hatch3r-theming.md +1 -0
package/rules/hatch3r-tooling-hierarchy.md +1 -0
package/skills/hatch3r-a11y-audit/SKILL.md +1 -0
package/skills/hatch3r-agent-customize/SKILL.md +1 -0
package/skills/hatch3r-api-spec/SKILL.md +1 -0
package/skills/hatch3r-architecture-review/SKILL.md +1 -0
package/skills/hatch3r-bug-fix/SKILL.md +1 -0
package/skills/hatch3r-ci-pipeline/SKILL.md +1 -0
package/skills/hatch3r-command-customize/SKILL.md +1 -0
package/skills/hatch3r-context-health/SKILL.md +1 -0
package/skills/hatch3r-cost-tracking/SKILL.md +1 -0
package/skills/hatch3r-dep-audit/SKILL.md +2 -1
package/skills/hatch3r-feature/SKILL.md +1 -0
package/skills/hatch3r-gh-agentic-workflows/SKILL.md +1 -0
package/skills/hatch3r-incident-response/SKILL.md +1 -0
package/skills/hatch3r-issue-workflow/SKILL.md +1 -0
package/skills/hatch3r-logical-refactor/SKILL.md +1 -0
package/skills/hatch3r-migration/SKILL.md +1 -0
package/skills/hatch3r-perf-audit/SKILL.md +1 -0
package/skills/hatch3r-pr-creation/SKILL.md +1 -0
package/skills/hatch3r-qa-validation/SKILL.md +1 -0
package/skills/hatch3r-recipe/SKILL.md +1 -0
package/skills/hatch3r-refactor/SKILL.md +1 -0
package/skills/hatch3r-release/SKILL.md +1 -0
package/skills/hatch3r-rule-customize/SKILL.md +1 -0
package/skills/hatch3r-skill-customize/SKILL.md +1 -0
package/skills/hatch3r-visual-refactor/SKILL.md +1 -0

package/github-agents/hatch3r-docs-agent.md CHANGED

@@ -2,6 +2,7 @@
 name: hatch3r-docs-agent
 description: Technical writer who maintains specs, ADRs, and documentation
 # Simplified agent for GitHub Copilot/Codex
+tags: [team, devops]
 ---
 You are an expert technical writer for the project.

package/github-agents/hatch3r-lint-agent.md CHANGED

@@ -2,6 +2,7 @@
 name: hatch3r-lint-agent
 description: Code quality enforcer who fixes style, formatting, and type issues
 # Simplified agent for GitHub Copilot/Codex
+tags: [team, devops]
 ---
 You are a code quality engineer for the project.

package/github-agents/hatch3r-security-agent.md CHANGED

@@ -2,6 +2,7 @@
 name: hatch3r-security-agent
 description: Security analyst who audits code, rules, and data flows
 # Simplified agent for GitHub Copilot/Codex
+tags: [team, devops]
 ---
 You are an expert security analyst for the project.

package/github-agents/hatch3r-test-agent.md CHANGED

@@ -2,6 +2,7 @@
 name: hatch3r-test-agent
 description: QA engineer who writes and maintains tests
 # Simplified agent for GitHub Copilot/Codex
+tags: [team, devops]
 ---
 You are an expert QA engineer for the project.

package/hooks/hatch3r-ci-failure.md CHANGED

@@ -4,6 +4,7 @@ type: hook
 event: ci-failure
 agent: ci-watcher
 description: Diagnose CI pipeline failures
+tags: [core]
 ---
 # Hook: ci-failure → ci-watcher

package/hooks/hatch3r-file-save.md CHANGED

@@ -5,6 +5,7 @@ event: file-save
 agent: context-rules
 description: Activate context-specific rules on file save
 globs: "**/*.ts, **/*.tsx, **/*.js, **/*.jsx"
+tags: [core]
 ---
 # Hook: file-save → context-rules

package/hooks/hatch3r-post-merge.md CHANGED

@@ -4,6 +4,7 @@ type: hook
 event: post-merge
 agent: ci-watcher
 description: Check CI pipeline status after merge
+tags: [core]
 ---
 # Hook: post-merge → ci-watcher

package/hooks/hatch3r-pre-commit.md CHANGED

@@ -5,6 +5,7 @@ event: pre-commit
 agent: lint-fixer
 description: Auto-fix lint and formatting issues before commit
 globs: "**/*.ts, **/*.tsx, **/*.js, **/*.jsx"
+tags: [core]
 ---
 # Hook: pre-commit → lint-fixer

package/hooks/hatch3r-pre-push.md CHANGED

@@ -4,6 +4,7 @@ type: hook
 event: pre-push
 agent: security-auditor
 description: Scan for secrets and security issues before push
+tags: [core]
 ---
 # Hook: pre-push → security-auditor

package/hooks/hatch3r-session-start.md CHANGED

@@ -4,6 +4,7 @@ type: hook
 event: session-start
 agent: learnings-loader
 description: Load relevant learnings at session start
+tags: [core]
 ---
 # Hook: session-start → learnings-loader

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "hatch3r",
-  "version": "1.1.0",
+  "version": "1.3.0",
   "description": "Battle-tested agentic coding setup framework. One command to hatch your agent stack -- agents, skills, rules, commands, and MCP for every major AI coding tool.",
   "type": "module",
   "bin": {
@@ -40,7 +40,7 @@
     "type": "git",
     "url": "https://github.com/hatch3r/hatch3r.git"
   },
-  "homepage": "https://github.com/hatch3r/hatch3r#readme",
+  "homepage": "https://docs.hatch3r.com",
   "bugs": {
     "url": "https://github.com/hatch3r/hatch3r/issues"
   },

package/prompts/hatch3r-bug-triage.md CHANGED

@@ -2,6 +2,7 @@
 id: hatch3r-bug-triage
 type: prompt
 description: Triage a bug report and suggest investigation steps
+tags: [core]
 ---
 # Bug Triage

package/prompts/hatch3r-code-review.md CHANGED

@@ -2,6 +2,7 @@
 id: hatch3r-code-review
 type: prompt
 description: Review code changes for quality, security, and correctness
+tags: [core]
 ---
 # Code Review

package/prompts/hatch3r-pr-description.md CHANGED

@@ -2,6 +2,7 @@
 id: hatch3r-pr-description
 type: prompt
 description: Generate a pull request description from staged changes
+tags: [core]
 ---
 # PR Description

package/rules/hatch3r-accessibility-standards.md CHANGED

@@ -3,6 +3,7 @@ id: hatch3r-accessibility-standards
 type: rule
 description: Accessibility standards covering WCAG 2.2 AA compliance, keyboard navigation, screen readers, and ARIA patterns
 scope: always
+tags: [a11y]
 ---
 # Accessibility Standards

package/rules/hatch3r-agent-orchestration.md CHANGED

@@ -3,11 +3,16 @@ id: hatch3r-agent-orchestration
 type: rule
 description: Mandatory agent delegation, skill loading, and subagent usage directives for ALL tasks in ALL contexts
 scope: always
+tags: [core]
 ---
 # Agent Orchestration
 This rule governs when and how to delegate work to hatch3r agents, load skills, and spawn subagents. These directives are mandatory — not suggestions.
+## Orchestration Differentiation
+Hatch3r's orchestration is not free-form agent chat. It differs from simpler approaches in three structural ways: (1) a **phase-gated pipeline** that enforces Research, Implement, Review, and Quality as distinct stages with explicit entry/exit criteria; (2) **structured handoffs** between phases via the `PipelineContext` schema, ensuring no context is lost or fabricated between agents; and (3) a **mandatory review gate** before the quality phase, preventing untested or unreviewed code from reaching final quality checks.
 ## Universal Applicability
 This rule applies to EVERY context without exception:
@@ -22,83 +27,33 @@ Whether the user invokes a command or simply asks for a task in conversation, th
 ## Universal Sub-Agent Pipeline
-Every task MUST follow this four-phase pipeline:
-**Phase 1 — Research:** Spawn `hatch3r-researcher` for context gathering. Skip only for trivial single-line edits (typos, comment fixes, single-value config changes). All other tasks require researcher context. **Before spawning researchers, score the task's complexity per the `hatch3r-deep-context` rule** and add the tier-appropriate researcher modes alongside the standard task-type modes (see Deep Context Integration below).
-**Phase 2 — Implement:** Spawn `hatch3r-implementer` for ALL code changes. One dedicated implementer per task. Never implement inline — always delegate via the Task tool. **Include reference conventions, resolved requirements, and blast radius data** from Phase 1 in the implementer prompt when available (see Deep Context Integration below).
-**Phase 3 — Review Loop:**
-- 3a. Spawn `hatch3r-reviewer` to review the implementation.
-- 3b. If Critical or Warning findings exist: spawn `hatch3r-fixer` with the reviewer output.
-- 3c. Re-review: spawn `hatch3r-reviewer` on the fixed code.
-- 3d. Repeat 3b–3c until the reviewer reports 0 Critical + 0 Warning, or max 3 iterations reached.
-- 3e. If max iterations reached with remaining findings: surface to user for manual resolution.
-**Phase 4 — Final Quality** (runs ONLY after the review loop is clean):
-Spawn all applicable specialists in parallel:
-| Specialist | When | Mandatory? |
-|-----------|------|------------|
-| `hatch3r-test-writer` | After every code change | YES — always for code changes |
-| `hatch3r-security-auditor` | After every code change | YES — always for code changes |
-| `hatch3r-docs-writer` | After every implementation | EVALUATE — spawn when changes affect APIs, architecture, user-facing behavior, or when specs/ADRs need updating |
-| `hatch3r-lint-fixer` | When lint errors present | Conditional |
-| `hatch3r-a11y-auditor` | When UI/accessibility changes | Conditional |
-| `hatch3r-perf-profiler` | When performance-sensitive changes | Conditional |
-| `hatch3r-dependency-auditor` | When dependencies change | Conditional |
-| `hatch3r-ci-watcher` | When CI fails | Conditional |
-| `hatch3r-architect` | When architectural decisions are needed or system design review is requested | Conditional |
-| `hatch3r-devops` | When CI/CD, deployment, or infrastructure tasks are involved | Conditional |
+Every task MUST follow this four-phase pipeline: **Phase 1 — Research** (context gathering via `hatch3r-researcher`), **Phase 2 — Implement** (code changes via `hatch3r-implementer`), **Phase 3 — Review Loop** (review/fix cycle via `hatch3r-reviewer` and `hatch3r-fixer`), **Phase 4 — Final Quality** (parallel specialists after review is clean). See **Mandatory Delegation Directives** below for full phase definitions, entry/exit criteria, and specialist invocation rules.
 ## Agent Roster
 | Agent | Purpose | Invoke When |
 |-------|---------|-------------|
-| `hatch3r-researcher` | Context gathering across 15 research modes | ALWAYS before implementation. Skip only for trivial single-line edits. Select modes by task type + tier-appropriate deep context modes. |
-| `hatch3r-implementer` | Focused single-task implementation | ALWAYS. One dedicated implementer per task — standalone issues, epic sub-issues, batched issues, and plain chat tasks all get dedicated implementers. |
-| `hatch3r-reviewer` | Code review for quality, security, performance | ALWAYS in review loop (Phase 3). Reviews implementation, then re-reviews after fixes. |
-| `hatch3r-fixer` | Targeted fixes for reviewer findings | When `hatch3r-reviewer` reports Critical or Warning findings during the review loop (Phase 3). |
-| `hatch3r-test-writer` | Regression and coverage tests | ALWAYS for code changes in final quality (Phase 4). Not just bugs — every code change gets tests. |
-| `hatch3r-security-auditor` | Security rules, data flows, access control | ALWAYS for code changes in final quality (Phase 4). Not just `area:security` — every code change gets a security review. |
-| `hatch3r-docs-writer` | Specs, ADRs, documentation maintenance | ALWAYS evaluate in final quality (Phase 4). Spawn when changes affect APIs, architecture, or user-facing behavior. |
-| `hatch3r-lint-fixer` | Style, formatting, type error cleanup | After implementation when lint errors are present. |
-| `hatch3r-a11y-auditor` | WCAG AA compliance checks | When UI/accessibility changes are made. |
-| `hatch3r-perf-profiler` | Performance profiling and optimization | When performance-sensitive changes are made. |
-| `hatch3r-dependency-auditor` | Supply chain security, CVE scanning | When dependencies change or new packages are added. |
-| `hatch3r-ci-watcher` | CI/CD failure diagnosis and fix suggestions | When CI fails during or after implementation. |
-| `hatch3r-architect` | Architecture design, system design review, technical decision documentation | When architectural decisions are needed or system design review is requested. |
-| `hatch3r-devops` | CI/CD pipeline operations, deployment configuration, infrastructure setup | When CI/CD, deployment, or infrastructure tasks are involved. |
+| `hatch3r-researcher` | Context gathering (15 modes) | Always — before implementation (skip trivial edits) |
+| `hatch3r-implementer` | Single-task implementation | Always — one per task |
+| `hatch3r-reviewer` | Code review | Always — Phase 3 review loop |
+| `hatch3r-fixer` | Fix reviewer findings | Phase 3 — Critical/Warning findings |
+| `hatch3r-test-writer` | Tests | Always — Phase 4 (every code change) |
+| `hatch3r-security-auditor` | Security review | Always — Phase 4 (every code change) |
+| `hatch3r-docs-writer` | Documentation | Phase 4 — evaluate when APIs/architecture/UX affected |
+| `hatch3r-lint-fixer` | Lint/type fixes | Conditional — lint errors present |
+| `hatch3r-a11y-auditor` | WCAG AA checks | Conditional — UI/accessibility changes |
+| `hatch3r-perf-profiler` | Performance profiling | Conditional — performance-sensitive changes |
+| `hatch3r-dependency-auditor` | CVE/supply chain | Conditional — dependencies change |
+| `hatch3r-ci-watcher` | CI failure diagnosis | Conditional — CI fails |
+| `hatch3r-architect` | Architecture design | Conditional — architectural decisions needed |
+| `hatch3r-devops` | CI/CD and deployment | Conditional — infrastructure tasks |
 ## Deep Context Integration
-Before spawning researchers in Phase 1, score the task's complexity using the `hatch3r-deep-context` rule criteria. The resulting tier determines which additional researcher modes to include alongside the standard task-type modes.
-### Tier-Adjusted Research Modes
+Score task complexity per the `hatch3r-deep-context` rule (always-loaded) before Phase 1. That rule defines the full tier criteria, researcher modes per tier, and implementer enrichment fields. Apply the resulting tier as follows:
-**Tier 1 (Light — score 0–2):** Use only the standard task-type modes below. No additional modes.
-**Tier 2 (Standard — score 3–5):** Add these modes at `quick` depth alongside the task-type modes:
-- `requirements-elicitation` — scan for top ambiguities, ask 3–5 clarifying questions
-- `similar-implementation` — find 1 reference implementation, extract top-level patterns
-Present the elicitation questions to the user inline. Await answers before proceeding to Phase 2.
-**Tier 3 (Deep — score 6+):** Add these modes at `deep` depth alongside the task-type modes:
-- `requirements-elicitation` — full 10-dimension ambiguity scan, dependency questions, cross-cutting concern checklist
-- `similar-implementation` — find 2–3 references, full convention extraction, divergence analysis
-- `codebase-impact` at `deep` depth (with transitive tracing, API consumer map, blast radius)
-**Mandatory Tier 3 checkpoint:** Present a consolidated Pre-Implementation Summary to the user and ASK for confirmation. Do NOT proceed to Phase 2 until all unresolved questions are answered.
-### Implementer Prompt Enrichment
-When spawning `hatch3r-implementer` in Phase 2, include the following from Phase 1 results when available:
-- **Reference Conventions**: `similar-implementation` output — the implementer uses this in its Convention Lock step (Step 1b)
-- **Resolved Requirements**: User's answers to `requirements-elicitation` questions — explicit decisions the implementer should follow instead of guessing
-- **Blast Radius**: Enhanced `codebase-impact` output with transitive traces and API consumer maps — informs which consumers and contracts must be preserved
+- **Tier 2 (Standard):** Present elicitation questions to the user inline. Await answers before proceeding to Phase 2.
+- **Tier 3 (Deep):** Present a consolidated Pre-Implementation Summary and ASK for confirmation. Do NOT proceed to Phase 2 until all unresolved questions are answered.
 ## Mandatory Delegation Directives
@@ -113,6 +68,17 @@ You MUST spawn a `hatch3r-researcher` subagent before implementing any task. Ski
 Use depth `quick` for low-risk tasks, `standard` for medium-risk, `deep` for high-risk. The `hatch3r-deep-context` tier may override depth upward (e.g., a Tier 3 task always uses `deep` depth for the additional modes, even if the task-type modes use `standard`).
+### Research Completeness Checklist
+Before handing off from Phase 1 (Research) to Phase 2 (Implement), the researcher output MUST be verified against this completeness checklist. Do NOT proceed to implementation until all items are confirmed:
+- [ ] **All affected files identified** — every file that will be created, modified, or deleted is listed explicitly.
+- [ ] **Blast radius assessed** — downstream consumers, dependents, and integration points that could break are documented.
+- [ ] **Existing tests located** — relevant test files and test cases that cover the affected code are identified (or absence of coverage is noted).
+- [ ] **Dependencies mapped** — internal module dependencies and external package dependencies relevant to the change are enumerated.
+If any item cannot be confirmed, the researcher MUST flag the gap and the orchestrator MUST either: (a) re-run the researcher with additional modes targeting the gap, or (b) surface the gap to the user for manual input before proceeding.
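Reviewer note: the completeness checklist added above can be read as a simple gate. The following is a minimal sketch under stated assumptions: the `ResearchChecklist` shape, the `GateResult` type, and `checkResearchGate` are hypothetical names introduced for illustration, since the rule lists the four items in prose only and defines no data structure for them.

```typescript
// Hypothetical shape: one boolean per checklist item named in the rule.
interface ResearchChecklist {
  affectedFilesIdentified: boolean;
  blastRadiusAssessed: boolean;
  existingTestsLocated: boolean; // true also when absence of coverage is noted
  dependenciesMapped: boolean;
}

// Either the handoff may proceed, or the unconfirmed items are reported.
type GateResult =
  | { proceed: true }
  | { proceed: false; gaps: (keyof ResearchChecklist)[] };

// Returns the list of unconfirmed items so the orchestrator can re-run the
// researcher on those gaps or surface them to the user.
function checkResearchGate(checklist: ResearchChecklist): GateResult {
  const gaps = (Object.keys(checklist) as (keyof ResearchChecklist)[]).filter(
    (item) => !checklist[item],
  );
  return gaps.length === 0 ? { proceed: true } : { proceed: false, gaps };
}
```

Whether a gap leads to a researcher re-run or a user prompt is left to the caller here, matching options (a) and (b) in the rule text.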
 ### Implementation Delegation
 You MUST spawn a `hatch3r-implementer` subagent via the Task tool for ALL code changes. Never implement inline.
@@ -127,19 +93,29 @@ You MUST spawn a `hatch3r-implementer` subagent via the Task tool for ALL code c
 - Resolved `requirements-elicitation` answers as "Resolved Requirements"
 - Enhanced `codebase-impact` blast radius data (Tier 3 only)
+### Per-Task Mini-Review
+When a single implementation involves multiple sub-tasks (e.g., an epic with ordered steps, a feature requiring schema change + service layer + UI), the implementer MUST perform a lightweight mini-review after completing each sub-task before starting the next:
+1. **Verify sub-task correctness** — confirm the sub-task's output compiles/parses without errors and meets its local acceptance criteria.
+2. **Check interface contracts** — ensure any interfaces, types, or contracts introduced or modified by the sub-task are consistent with what subsequent sub-tasks will consume.
+3. **Validate no regressions** — confirm the sub-task has not broken existing functionality visible at that point (e.g., existing tests still pass if applicable).
+4. **Gate progression** — if the mini-review surfaces issues, fix them before moving to the next sub-task. Do not accumulate debt across sub-tasks.
+Mini-reviews are internal to the implementer and do not require spawning a separate reviewer agent. They are lighter weight than the full Phase 3 review loop, which still runs after all sub-tasks are complete.
 ### Post-Implementation Quality Pipeline
 You MUST run the review loop and final quality phases after implementation completes.
 **Phase 3 — Review Loop:**
-1. Spawn `hatch3r-reviewer` — code review. Include the diff and acceptance criteria in the prompt.
-2. If the reviewer reports Critical or Warning findings: spawn `hatch3r-fixer` with the full reviewer output (findings, file paths, line references, suggested fixes). When fixes touch shared or public interfaces, also include:
-   - **Blast radius data** from Phase 1 (if available) — so the fixer knows which consumers and contracts must be preserved.
-   - **Reference conventions** from Phase 1 (if available) — so the fixer maintains established patterns when applying fixes.
+1. Spawn `hatch3r-reviewer` — code review. Include the diff and acceptance criteria in the prompt. The reviewer MUST include a **blast radius summary** in its output: number of files changed, number of lines added/removed, and whether any public APIs (exported interfaces, route signatures, event schemas) were changed. This summary gives the orchestrator and the user a quick gauge of change scope and risk.
+2. If the reviewer reports Critical or Warning findings: spawn `hatch3r-fixer` with the full reviewer output (findings, file paths, line references, suggested fixes). When fixes touch shared or public interfaces, also include deep context enrichment (blast radius data, reference conventions) per the Implementation Delegation section above.
 3. After fixes: spawn `hatch3r-reviewer` again to re-review the fixed code.
 4. Repeat steps 2–3 until the reviewer reports 0 Critical + 0 Warning, or max 3 iterations reached.
-5. If max iterations reached with remaining findings: surface to user for manual resolution. Do not proceed to Phase 4 until the user acknowledges.
+5. **Confirmation pass** — after the reviewer reports 0 Critical + 0 Warning, run one final lightweight re-review. This confirmation pass checks ONLY that: (1) no fix-driven change was missed or introduced new issues, (2) no accidental regressions occurred in adjacent code touched by fixes, and (3) all acceptance criteria are fully met. If the confirmation pass surfaces new Critical or Warning findings, route them back through steps 2–4 (these iterations count toward the max 3 cap).
+6. If max iterations reached with remaining findings: surface to user for manual resolution. Do not proceed to Phase 4 until the user acknowledges.
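Reviewer note: the revised review loop (review, fix, re-review, confirmation pass, cap of 3) can be sketched as a small control loop. This is an illustration under assumptions, not the package's implementation: `Reviewer`, `Fixer`, `LoopOutcome`, and `runReviewLoop` are hypothetical names, the `"Info"` level stands in for any non-blocking finding, and one "iteration" is read as one fix followed by a re-review.

```typescript
// Hypothetical types; the rule describes the loop in prose only.
type ReviewSeverity = "Critical" | "Warning" | "Info";
interface ReviewFinding { severity: ReviewSeverity; message: string }

type Reviewer = (code: string) => ReviewFinding[];
type Fixer = (code: string, findings: ReviewFinding[]) => string;

interface LoopOutcome {
  code: string;
  iterations: number;         // fix + re-review cycles used
  clean: boolean;             // true only if the confirmation pass was also clean
  remaining: ReviewFinding[]; // blocking findings to surface to the user
}

const MAX_ITERATIONS = 3;

// Only Critical and Warning findings keep the loop going.
const blocking = (findings: ReviewFinding[]): ReviewFinding[] =>
  findings.filter((f) => f.severity !== "Info");

function runReviewLoop(code: string, review: Reviewer, fix: Fixer): LoopOutcome {
  let iterations = 0;
  let awaitingConfirmation = false;
  for (;;) {
    const findings = blocking(review(code));
    if (findings.length === 0) {
      // A second consecutive clean review is the confirmation pass.
      if (awaitingConfirmation) return { code, iterations, clean: true, remaining: [] };
      awaitingConfirmation = true;
      continue;
    }
    awaitingConfirmation = false;
    // Cap reached with findings left: stop and hand over to the user.
    if (iterations >= MAX_ITERATIONS) return { code, iterations, clean: false, remaining: findings };
    code = fix(code, findings);
    iterations += 1;
  }
}
```

Findings raised by the confirmation pass re-enter the same loop, so they count toward the same cap, as step 5 requires.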
 **Phase 4 — Final Quality** (runs ONLY after the review loop is clean):
@@ -163,6 +139,22 @@ Launch as many independent subagents in parallel as the platform supports — no
 8. `hatch3r-architect` — when architectural decisions are needed or system design review is requested.
 9. `hatch3r-devops` — when CI/CD, deployment, or infrastructure tasks are involved.
+### Specialist Success Criteria
+Each Phase 4 specialist agent has a defined success criterion. The specialist's output is considered successful only when its criterion is met. If not met, the orchestrator MUST surface the gap to the user.
+| Specialist | Success Criterion |
+|-----------|-------------------|
+| `hatch3r-test-writer` | All new and modified code paths have corresponding tests; no untested branches remain in changed files. |
+| `hatch3r-security-auditor` | No HIGH or CRITICAL severity findings remain unresolved; all MEDIUM findings are documented with remediation plan. |
+| `hatch3r-docs-writer` | All affected APIs, architectural changes, and user-facing behavior changes are reflected in documentation. |
+| `hatch3r-lint-fixer` | Zero lint errors and zero type errors in all changed files. |
+| `hatch3r-a11y-auditor` | All changed UI components meet WCAG AA compliance; no new accessibility violations introduced. |
+| `hatch3r-perf-profiler` | No performance regressions detected; any new hot paths are documented with benchmark baselines. |
+| `hatch3r-dependency-auditor` | No known CVEs in added or updated dependencies; license compatibility verified. |
+| `hatch3r-architect` | Architectural decisions are documented in ADRs; design aligns with existing system patterns or divergence is justified. |
+| `hatch3r-devops` | CI/CD pipeline passes end-to-end; deployment configuration is validated against target environment. |
 ## Skill Loading Directives
 Before implementing any task, you MUST read and follow the matching hatch3r skill:
@@ -191,6 +183,180 @@ When spawning any subagent via the Task tool:
 3. **Launch as many independent subagents in parallel as the platform supports.** Do not impose an artificial concurrency limit. Use maximum parallelism for independent work.
 4. **Await and review results** before proceeding. If a subagent reports BLOCKED or PARTIAL, surface to the user.
+## Correlation ID
+The orchestrator MUST generate a unique correlation ID (UUID v4 or equivalent) for each top-level task at the start of the pipeline. This ID enables end-to-end tracing across multi-agent workflows.
+1. **Generation**: Create one correlation ID per top-level task before Phase 1 begins. Format: UUID v4 (e.g., `550e8400-e29b-41d4-a716-446655440000`).
+2. **Propagation**: Include the correlation ID in every subagent prompt — researchers, implementers, reviewers, fixers, and all Phase 4 specialists. Pass it as a top-level field: `correlation_id: "<value>"`.
+3. **Usage in subagents**: All subagents MUST include the correlation ID in any logs, error messages, structured outputs, or status reports they produce. This applies to both success and failure paths.
+4. **Scope**: One correlation ID per top-level task. Epic sub-issues each get their own correlation ID. Batch tasks share one correlation ID per batch but include a sub-task index (e.g., `correlation_id: "<uuid>", sub_task: 2`).
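Reviewer note: a minimal sketch of generating and propagating a correlation ID as described in the four points above. It assumes a Node.js runtime (`randomUUID` from `node:crypto` produces a UUID v4). The helper names `newCorrelationId`, `withCorrelation`, and `tagsForBatch` are hypothetical, and the 1-based `sub_task` index is an assumption: the rule shows `sub_task: 2` as an example but does not state the base.

```typescript
import { randomUUID } from "node:crypto";

// Field names follow the rule text: correlation_id and sub_task.
interface CorrelationTag {
  correlation_id: string;
  sub_task?: number;
}

// One ID per top-level task, created before Phase 1.
function newCorrelationId(): string {
  return randomUUID();
}

// Prefix a subagent prompt with the correlation ID as a top-level field.
function withCorrelation(prompt: string, tag: CorrelationTag): string {
  const header =
    tag.sub_task === undefined
      ? `correlation_id: "${tag.correlation_id}"`
      : `correlation_id: "${tag.correlation_id}", sub_task: ${tag.sub_task}`;
  return `${header}\n${prompt}`;
}

// Batch tasks share one ID and differ only by sub-task index (1-based here).
function tagsForBatch(taskCount: number): CorrelationTag[] {
  const correlation_id = newCorrelationId();
  return Array.from({ length: taskCount }, (_, i) => ({ correlation_id, sub_task: i + 1 }));
}
```

Epic sub-issues would each call `newCorrelationId()` separately, while a batch would call `tagsForBatch` once.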
+## Severity Scale
+All agents across the pipeline MUST use this canonical severity scale when classifying findings, issues, or audit results. This ensures consistent triage and gating across phases.
+| Severity | Definition | Pipeline Action |
+|----------|-----------|-----------------|
+| **CRITICAL** | Blocks merge; must fix immediately. Security vulnerabilities, data loss risks, broken core functionality. | Merge is blocked. Findings must be resolved before the pipeline can proceed past Phase 3. |
+| **HIGH** | Should fix before merge. Significant bugs, performance regressions, incomplete acceptance criteria. | Strongly recommended to fix before merge. Escalate to user if the fix is deferred. |
+| **MEDIUM** | Fix in same sprint. Code quality issues, minor bugs, non-critical security findings. | Document with a remediation plan. May merge with tracking issue created. |
+| **LOW** | Track for future. Style nits, minor refactoring opportunities, non-blocking improvements. | Log in findings summary. No merge gate. |
+| **INFO** | Informational only. Observations, suggestions, context for future work. | Include in output for awareness. No action required. |
+All subagents — reviewers, security auditors, test writers, and other specialists — MUST map their findings to this scale. When a subagent uses a different internal scale, it MUST translate to this canonical scale in its output.
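Reviewer note: the canonical scale and the translation requirement above could be encoded roughly as follows. This is a sketch under assumptions: the `PipelineAction` labels are shorthand for the table's "Pipeline Action" column, and the internal scale (`blocker`, `major`, `minor`, `trivial`, `note`) is entirely hypothetical, since the rule does not name any agent's internal scale or how it maps.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFO";

// Shorthand labels for the "Pipeline Action" column of the table.
type PipelineAction =
  | "block-merge"
  | "fix-or-escalate"
  | "track-with-remediation-plan"
  | "log-only"
  | "no-action";

const PIPELINE_ACTION: Record<Severity, PipelineAction> = {
  CRITICAL: "block-merge",
  HIGH: "fix-or-escalate",
  MEDIUM: "track-with-remediation-plan",
  LOW: "log-only",
  INFO: "no-action",
};

// Hypothetical internal scale of some subagent, translated to the canonical one.
const INTERNAL_TO_CANONICAL: Record<string, Severity> = {
  blocker: "CRITICAL",
  major: "HIGH",
  minor: "MEDIUM",
  trivial: "LOW",
  note: "INFO",
};

// Fail loudly on an unmapped label rather than guessing a severity.
function toCanonical(internal: string): Severity {
  const mapped: Severity | undefined = INTERNAL_TO_CANONICAL[internal];
  if (mapped === undefined) throw new Error(`Unmapped severity: ${internal}`);
  return mapped;
}

// Per the table, only unresolved CRITICAL findings block the merge outright.
function isMergeBlocked(unresolved: readonly Severity[]): boolean {
  return unresolved.includes("CRITICAL");
}
```

Throwing on an unknown label is a design choice of this sketch; the rule only requires that translation happens, not how unmapped values are handled.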
+## Pipeline Context
+The orchestrator MUST maintain a `PipelineContext` object throughout the pipeline lifecycle. This object serves as the data contract between pipeline phases, ensuring structured handoff of findings, decisions, and artifacts.
+### PipelineContext Schema
+```
+PipelineContext {
+  correlationId: string           // UUID v4 from the Correlation ID directive
+  phase: "research" | "implement" | "review" | "quality"  // Current active phase
+  findings: Finding[]             // Accumulated findings from all phases
+  decisions: Decision[]           // Decisions made during the pipeline (user answers, trade-offs, overrides)
+  artifacts: string[]             // File paths created or modified during the pipeline
+}
+Finding {
+  id: string                      // Unique finding identifier (e.g., "F-001")
+  phase: string                   // Phase that produced the finding
+  agent: string                   // Agent that produced the finding
+  severity: "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" | "INFO"  // Per Severity Scale
+  description: string             // Human-readable finding description
+  filePath?: string               // Affected file, if applicable
+  resolved: boolean               // Whether the finding has been addressed
+}
+Decision {
+  id: string                      // Unique decision identifier (e.g., "D-001")
+  phase: string                   // Phase where the decision was made
+  description: string             // What was decided
+  rationale: string               // Why this option was chosen
+  madeBy: "user" | "agent"        // Who made the decision
+}
+```
+### Phase Handoff Metadata
+When transitioning between pipeline phases, the orchestrator MUST include the following metadata fields in each handoff to enable traceability and performance analysis:
+- `timestamp` -- ISO 8601 timestamp of the handoff event
+- `agentId` -- identifier of the agent completing the phase (e.g., `hatch3r-researcher`, `hatch3r-implementer`)
+- `phase` -- the phase being completed (e.g., `research`, `implement`, `review`, `quality`)
+- `duration` -- elapsed time in seconds for the completed phase
+- `filesModified` -- list of file paths created, modified, or deleted during the phase
+These fields are appended to the `PipelineContext` at each phase transition, providing a structured audit trail of which agent did what, when, and for how long.
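As a rough illustration of the handoff fields listed above, a record could be assembled as follows. The function name and the choice to sort and de-duplicate `filesModified` are assumptions made for this sketch; the source only names the five fields.

```python
from datetime import datetime, timezone


def build_handoff(agent_id, phase, started_at, finished_at, files_modified):
    """Build one phase-handoff record with the five fields named in the directive."""
    return {
        # ISO 8601 timestamp of the handoff event, normalized to UTC
        "timestamp": finished_at.astimezone(timezone.utc).isoformat(),
        "agentId": agent_id,
        "phase": phase,
        # Elapsed time in seconds for the completed phase
        "duration": (finished_at - started_at).total_seconds(),
        # Sorting and de-duplication are assumptions of this sketch
        "filesModified": sorted(set(files_modified)),
    }
```

Both datetimes are expected to be timezone-aware so the duration and the UTC conversion are unambiguous.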
+### Context Caching
+When multiple agents need the same context (e.g., project structure, test results, blast radius data, reference conventions), cache it in the shared `PipelineContext` rather than having each agent re-read or re-compute it independently. Specifically:
+- Research output from Phase 1 (file lists, dependency maps, convention extractions) should be stored once and passed by reference to the implementer, reviewer, and any Phase 4 specialists that need it.
+- Test suite results captured during implementation verification should be cached and forwarded to the reviewer and test-writer rather than re-running the full suite in each phase.
+- This reduces redundant file reads, avoids inconsistencies from reading files at different points in time, and conserves token budget across subagent prompts.
+### Cache Enforcement
+The orchestrator MUST enforce caching at each phase transition. Caching is not optional guidance -- it is a pipeline invariant.
+1. **Pre-handoff cache check**: Before spawning any downstream subagent, the orchestrator MUST verify that all cacheable outputs from prior phases are stored in `PipelineContext`. If a cacheable output is missing, the orchestrator MUST populate it before proceeding. Cacheable outputs include:
+   - Phase 1: file lists, dependency maps, convention extractions, blast radius data
+   - Phase 2: test suite results, modified file list, build output
+   - Phase 3: reviewer findings, fixer diffs, resolved/unresolved finding status
+2. **No redundant reads**: If a subagent prompt would include context that exists in the cache, the orchestrator MUST pass the cached version. Subagents MUST NOT re-read files or re-run commands whose results are already cached and fresh (per Cache Verification above).
+3. **Cache population logging**: Log every cache write with the key and size: `"Cache WRITE <cache_key>: <token_estimate> tokens"`. This provides visibility into which data is being cached and its cost.
+4. **Enforcement violation**: If a subagent re-reads or re-computes data that was available in the cache, log a warning: `"Cache BYPASS detected: <agent> re-computed <cache_key> instead of using cached value"`. This warning is informational (severity INFO) and does not block the pipeline, but it flags an optimization gap for future runs.
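The cache-write logging and bypass warning described above might look like this in practice. The class name, the logger name, and the four-characters-per-token estimate are assumptions for illustration; only the two log message formats come from the directive.

```python
import logging

log = logging.getLogger("hatch3r.cache")  # logger name is an assumption


class ContextCache:
    """Minimal in-memory stand-in for the cached part of PipelineContext."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def estimate_tokens(value: str) -> int:
        # Rough heuristic (about 4 characters per token); not from the source.
        return max(1, len(value) // 4)

    def write(self, key: str, value: str) -> str:
        """Store a value and log the write with its key and estimated size."""
        self._entries[key] = value
        message = f"Cache WRITE {key}: {self.estimate_tokens(value)} tokens"
        log.info(message)
        return message

    def get(self, key: str):
        """Return the cached value, or None if nothing is cached under key."""
        return self._entries.get(key)

    def report_bypass(self, agent: str, key: str):
        """Log the bypass warning if the key was cached; return the message or None."""
        if key not in self._entries:
            return None  # nothing was cached, so re-computing is not a bypass
        message = (
            f"Cache BYPASS detected: {agent} re-computed {key} "
            "instead of using cached value"
        )
        log.warning(message)
        return message
```

The sketch does not model freshness; a real cache would also need the verification step that the directive refers to.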
+### PipelineContext Usage
+1. **Initialization**: The orchestrator creates a `PipelineContext` at the start of Phase 1 with the `correlationId` and `phase` set to `"research"`. All other fields are initialized as empty arrays.
+2. **Phase transitions**: When moving between phases, update the `phase` field. Do not clear previous phase data — findings and decisions accumulate across the full pipeline.
+3. **Subagent input**: Pass the current `PipelineContext` (or the relevant subset) to each subagent so it has the pipeline history it needs.
+4. **Subagent output**: Each subagent appends its findings and decisions to the context. The orchestrator merges subagent outputs back into the canonical `PipelineContext`.
+5. **Final output**: The completed `PipelineContext` is included in the task summary, giving the user full traceability from research through quality.
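The lifecycle steps above (initialize, advance, merge) can be sketched with dataclasses. This is one possible rendering of the schema, not the package's implementation: the snake_case field names, the helper names, and the de-duplication of artifact paths are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

PHASES = ("research", "implement", "review", "quality")


@dataclass
class Finding:
    id: str
    phase: str
    agent: str
    severity: str
    description: str
    resolved: bool = False
    file_path: Optional[str] = None


@dataclass
class PipelineContext:
    correlation_id: str
    phase: str = "research"
    findings: List[Finding] = field(default_factory=list)
    decisions: List[dict] = field(default_factory=list)
    artifacts: List[str] = field(default_factory=list)


def new_context() -> PipelineContext:
    """Step 1: create the context at the start of Phase 1 with empty collections."""
    return PipelineContext(correlation_id=str(uuid.uuid4()))


def advance(ctx: PipelineContext, next_phase: str) -> None:
    """Step 2: update the phase only; earlier findings and decisions are kept."""
    if next_phase not in PHASES:
        raise ValueError(f"unknown phase: {next_phase}")
    ctx.phase = next_phase


def merge_subagent_output(ctx, findings, artifacts) -> None:
    """Step 4: fold a subagent's output back into the canonical context."""
    ctx.findings.extend(findings)
    for path in artifacts:
        if path not in ctx.artifacts:  # de-duplication is an assumption
            ctx.artifacts.append(path)
```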
+## Resilience Directives
+This section covers all failure/recovery paths — researcher failure, test failure, reviewer failure, and all other subagent failures.
+When a subagent fails (error, timeout, or BLOCKED status), apply the following retry-and-fallback protocol:
+1. **Retry once**: Re-send the same prompt to the same agent type exactly once. Do not modify the prompt on retry.
+2. **Fallback on second failure**: If the retry also fails, fall back to degraded mode for that phase:
+   - **Researcher failure** → Proceed to Phase 2 (Implement) without research context. Add a warning to the implementer prompt: `"WARNING: Research phase failed. Proceeding without research context. Exercise extra caution with assumptions."` The orchestrator should note this gap in the final output.
+   - **Reviewer failure** → Surface the raw diff to the user for manual review. Do not proceed to Phase 4 automatically.
+   - **Test-writer failure** → Flag the deliverable as "untested" in the PR description. Add label `needs-tests` if the platform supports it.
+   - **Fixer failure** → Surface the original reviewer findings to the user. Do not re-enter the review loop.
+   - **Security-auditor failure** → Flag as "security-unaudited" in the PR description. Add label `needs-security-review` if the platform supports it.
+   - **Other specialist failure** → Skip that specialist, document the gap in the final output (e.g., "docs-writer skipped due to failure").
+3. **Retry budget**: Maximum 3 total retries across all subagents per top-level task. Once the budget is exhausted, any subsequent failures go directly to fallback without retry.
+4. **Reporting**: Include all failures and fallbacks in the task summary so the user has full visibility into degraded phases.
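A minimal sketch of the retry-and-fallback protocol above, assuming a hypothetical `invoke(prompt)` callable that returns output or raises on error, timeout, or BLOCKED status. The fallback strings paraphrase the directive; the function and class names are illustrative.

```python
FALLBACKS = {
    "hatch3r-researcher": "proceed to implement without research context",
    "hatch3r-reviewer": "surface raw diff to user for manual review",
    "hatch3r-test-writer": "flag deliverable as untested",
    "hatch3r-fixer": "surface original reviewer findings to user",
    "hatch3r-security-auditor": "flag as security-unaudited",
}
DEFAULT_FALLBACK = "skip specialist and document the gap"


class RetryBudget:
    """Shared budget: at most 3 retries across all subagents per top-level task."""

    def __init__(self, max_retries: int = 3):
        self.remaining = max_retries


def run_with_retry(agent: str, invoke, prompt: str, budget: RetryBudget):
    """Return ("ok", output) or ("fallback", action) following the protocol."""
    try:
        return ("ok", invoke(prompt))
    except Exception:
        pass  # first failure: consider a retry
    if budget.remaining > 0:
        budget.remaining -= 1
        try:
            return ("ok", invoke(prompt))  # same prompt, unmodified
        except Exception:
            pass  # second failure: fall through to degraded mode
    return ("fallback", FALLBACKS.get(agent, DEFAULT_FALLBACK))
```

Passing the same `RetryBudget` instance to every call within a task is what makes the budget global, so that later failures go straight to fallback once it is exhausted.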
+### Circuit Breaker Tracking
+The orchestrator MUST track consecutive failures per agent type and per pipeline phase to prevent repeated invocations of persistently failing agents.
+1. **Tracking**: Maintain a per-agent failure counter that increments on each consecutive failure (error, timeout, or BLOCKED) and resets to zero on any success.
+2. **Trip threshold**: After **3 consecutive failures** for the same agent type within a single pipeline run, mark that agent as **"tripped"** and skip all subsequent invocations of it for the remainder of the task.
+3. **State transitions**: Log every circuit breaker state change with the correlation ID, agent type, and transition:
+   - `CLOSED → OPEN` — agent tripped after 3 consecutive failures. Log: `"Circuit breaker OPEN for <agent>: <failure_count> consecutive failures"`.
+   - `OPEN → HALF-OPEN` — cooldown period elapsed or manual reset issued. Log: `"Circuit breaker HALF-OPEN for <agent>: attempting probe"`.
+   - `HALF-OPEN → CLOSED` — probe invocation succeeded. Log: `"Circuit breaker CLOSED for <agent>: probe succeeded"`.
+   - `HALF-OPEN → OPEN` — probe invocation failed. Log: `"Circuit breaker re-OPEN for <agent>: probe failed"`.
+4. **Skipping tripped agents**: When an agent is tripped, apply its fallback behavior from the Resilience Directives above and note `"Skipped: circuit breaker OPEN"` in the task summary.
+5. **Reset policy**: A tripped agent can be re-enabled by either:
+   - **Manual reset** — the user explicitly requests retrying the agent (e.g., "retry the reviewer").
+   - **Cooldown period** — if the pipeline spans multiple top-level tasks in a session, a tripped agent automatically transitions to HALF-OPEN after **10 minutes** of inactivity. The next invocation is a probe: success closes the breaker; failure re-opens it.
+6. **Cross-task persistence**: Circuit breaker state persists within a session. If an agent trips during task A, it remains tripped for task B unless manually reset or the cooldown period has elapsed.
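The state transitions above can be sketched as a small state machine with an injectable clock. Two simplifications are assumptions of this sketch: the cooldown is measured from the moment the breaker opened (the directive says "inactivity"), and log lines are collected in a list and omit the correlation ID.

```python
import time


class CircuitBreaker:
    TRIP_THRESHOLD = 3        # consecutive failures before tripping
    COOLDOWN_SECONDS = 600    # 10 minutes

    def __init__(self, agent: str, clock=time.monotonic):
        self.agent = agent
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None
        self.log = []
        self._clock = clock

    def allow(self) -> bool:
        """Return True if the agent may be invoked (normally or as a probe)."""
        if self.state == "OPEN" and self._clock() - self.opened_at >= self.COOLDOWN_SECONDS:
            self._half_open()
        return self.state != "OPEN"

    def manual_reset(self) -> None:
        """User explicitly asked to retry a tripped agent."""
        if self.state == "OPEN":
            self._half_open()

    def record_success(self) -> None:
        if self.state == "HALF-OPEN":
            self.log.append(f"Circuit breaker CLOSED for {self.agent}: probe succeeded")
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF-OPEN":
            self._open(f"Circuit breaker re-OPEN for {self.agent}: probe failed")
        elif self.state == "CLOSED" and self.failures >= self.TRIP_THRESHOLD:
            self._open(
                f"Circuit breaker OPEN for {self.agent}: "
                f"{self.failures} consecutive failures"
            )

    def _half_open(self) -> None:
        self.state = "HALF-OPEN"
        self.log.append(f"Circuit breaker HALF-OPEN for {self.agent}: attempting probe")

    def _open(self, message: str) -> None:
        self.state = "OPEN"
        self.opened_at = self._clock()
        self.log.append(message)
```

Keeping one breaker instance per agent type for the whole session gives the cross-task persistence described in item 6.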
+### Stall Detection
+If an agent produces no output for 2 minutes, consider it stalled. The orchestrator MUST:
+1. **Log the stall** with the correlation ID, agent type, phase, and elapsed idle time: `"STALL detected for <agent> in <phase>: <elapsed>s with no output"`.
+2. **Terminate the stalled agent** and capture any partial output produced before the stall.
+3. **Retry once** by re-spawning the same agent type with the same prompt. If the retry also stalls, skip the agent and apply the relevant fallback from the Resilience Directives (e.g., proceed without research context, flag as untested).
+4. **Include a warning** in the `PipelineContext` noting the stall and whether the retry succeeded or the agent was skipped.
+A stalled invocation counts as a failure for both the retry budget and the circuit breaker failure counter.
+### Timeout Policy
+Each pipeline phase has an explicit time budget: a per-item timeout for every phase and, for Phase 1 only, a total phase timeout. If a timeout is exceeded, capture partial results and move to the next phase. Do not block the pipeline indefinitely.
+| Phase / Activity | Per-Item Timeout | Phase Total Timeout |
+|-----------------|-----------------|-------------------|
+| **Phase 1 — Research** | 5 minutes per file | 30 minutes total |
+| **Phase 2 — Implement** | 10 minutes per task | — |
+| **Phase 3 — Review Loop** | 5 minutes per review cycle | — |
+| **Phase 4 — Final Quality** | 5 minutes per specialist | — |
+**Timeout behavior:**
+1. **Partial capture**: When a timeout fires, the orchestrator MUST capture whatever output the subagent has produced so far. Partial research context, partial reviews, or partial test suites are preferable to no output.
+2. **Logging**: Log the timeout with the correlation ID, phase, agent, elapsed time, and whether partial results were captured: `"TIMEOUT in <phase> for <agent>: <elapsed>s elapsed, partial results captured: <yes/no>"`.
+3. **Phase advancement**: After capturing partial results, proceed to the next phase. Include a warning in downstream prompts: `"WARNING: <phase> timed out. Partial results only. Exercise extra caution."`.
+4. **Retry interaction**: A timed-out invocation counts as a failure for both the retry budget and the circuit breaker failure counter.
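The timeout table and the two message formats above could be encoded as plain data plus helpers. The dictionary and function names are hypothetical; the numbers and message templates are taken from the policy.

```python
# Per-item timeouts in seconds, from the table (5 or 10 minutes)
PER_ITEM_TIMEOUT_S = {
    "research": 300,
    "implement": 600,
    "review": 300,
    "quality": 300,
}
# Only Phase 1 defines a phase-total timeout (30 minutes)
PHASE_TOTAL_TIMEOUT_S = {"research": 1800}


def timed_out(phase: str, item_elapsed_s: float, phase_elapsed_s: float) -> bool:
    """True if either the per-item or the phase-total budget is exceeded."""
    if item_elapsed_s > PER_ITEM_TIMEOUT_S[phase]:
        return True
    total = PHASE_TOTAL_TIMEOUT_S.get(phase)
    return total is not None and phase_elapsed_s > total


def timeout_log(phase: str, agent: str, elapsed_s: int, partial_captured: bool) -> str:
    """Format the timeout log line from the policy."""
    captured = "yes" if partial_captured else "no"
    return (
        f"TIMEOUT in {phase} for {agent}: {elapsed_s}s elapsed, "
        f"partial results captured: {captured}"
    )


def downstream_warning(phase: str) -> str:
    """Warning to include in prompts of later phases."""
    return f"WARNING: {phase} timed out. Partial results only. Exercise extra caution."
```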
+### Observability Span Naming
+For observability, name tracing spans consistently using the pattern `hatch3r.{phase}.{agent}`. This convention enables filtering and aggregation across pipeline runs in any OpenTelemetry-compatible backend.
+Examples:
+- `hatch3r.research.researcher`
+- `hatch3r.implement.implementer`
+- `hatch3r.review.reviewer`
+- `hatch3r.review.fixer`
+- `hatch3r.quality.test-writer`
+- `hatch3r.quality.security-auditor`
+The orchestrator creates a root span `hatch3r.pipeline` for the full task, with child spans for each phase and grandchild spans for each agent invocation within that phase. Include the `correlationId` as a span attribute on every span.
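A small helper can generate span names in the pattern above. It assumes, based only on the listed examples, that the agent segment is the agent identifier with its `hatch3r-` prefix removed; the helper names are hypothetical and no tracing library is used.

```python
PHASES = ("research", "implement", "review", "quality")
ROOT_SPAN = "hatch3r.pipeline"


def phase_span(phase: str) -> str:
    """Name for a phase-level span; this naming is an assumption of the sketch."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    return f"hatch3r.{phase}"


def agent_span(phase: str, agent_id: str) -> str:
    """Name for an agent invocation span: hatch3r.{phase}.{agent}."""
    agent = agent_id.removeprefix("hatch3r-")  # Python 3.9+
    return f"{phase_span(phase)}.{agent}"


def span_attributes(correlation_id: str) -> dict:
    """Attributes attached to every span, per the directive."""
    return {"correlationId": correlation_id}
```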
 ## Single-Task Plain Chat Protocol
 When the user provides a single task in plain chat (no command invoked, no issue reference), the full sub-agent pipeline still applies:
@@ -224,6 +390,56 @@ When the user provides multiple tasks in a single message — numbered lists, co
 This directive applies regardless of whether board-pickup was invoked. Any context where implementation tasks are identified MUST use one subagent per task with maximum parallelism.
+## Auto-Mode Guardrails
+When agents run in auto-mode (unattended execution without real-time user oversight), the orchestrator MUST apply additional verification after each phase completes:
+1. **Scope containment** — verify the agent stayed within its declared scope. If a researcher was scoped to `codebase-impact`, it must not have performed `feature-design` work. If an implementer was scoped to specific files, it must not have modified files outside that set.
+2. **No destructive operations without prior approval** — verify the agent did not perform destructive operations (file deletions, database migrations, force-pushes, dependency removals) unless those operations were explicitly listed in the task prompt as approved actions. Any destructive operation not pre-approved MUST be flagged and rolled back before proceeding.
+3. **Output schema compliance** — verify all agent outputs match their expected schemas. Researcher output must contain the required sections for its modes. Implementer output must include changed file paths and acceptance criteria status. Reviewer output must use the canonical severity scale. Malformed outputs MUST trigger a retry or escalation, not silent acceptance.
+If any guardrail check fails, the orchestrator MUST halt the pipeline and surface the violation to the user (or to a persistent log if fully unattended) before continuing.
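The first two guardrails reduce to set comparisons, sketched below under the assumption that the orchestrator already knows which files were modified and which destructive operations were performed. All names are illustrative, and rollback itself is out of scope here.

```python
def scope_violations(allowed_files, modified_files):
    """Files the agent modified outside its declared scope."""
    return sorted(set(modified_files) - set(allowed_files))


def unapproved_destructive_ops(performed_ops, approved_ops):
    """Destructive operations that were not pre-approved in the task prompt."""
    return sorted(set(performed_ops) - set(approved_ops))


def check_guardrails(allowed_files, modified_files, performed_ops, approved_ops):
    """Return a list of violation messages; an empty list means the checks passed."""
    violations = []
    for path in scope_violations(allowed_files, modified_files):
        violations.append(f"scope: modified file outside declared set: {path}")
    for op in unapproved_destructive_ops(performed_ops, approved_ops):
        violations.append(f"destructive: operation not pre-approved: {op}")
    return violations
```

A non-empty result would be the trigger to halt the pipeline and surface the violations.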
+## Status Codes
+All agents MUST use these canonical status codes when reporting task or phase outcomes. This ensures consistent interpretation across the pipeline.
+| Status | Meaning |
+|--------|---------|
+| **SUCCESS** | Task completed fully, all acceptance criteria met. |
+| **PARTIAL** | Task partially completed; some acceptance criteria met, others remain open or degraded. |
+| **FAILED** | Task could not be completed; no usable output produced. |
+| **SKIPPED** | Task was intentionally not executed (e.g., non-applicable phase, trivial edit bypass). |
+| **TIMEOUT** | Task exceeded its time budget; partial results may be available. |
+When a subagent returns PARTIAL or FAILED, it MUST include a `reason` field explaining what succeeded and what did not. When a subagent returns TIMEOUT, any captured partial output MUST be forwarded to the next phase.
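The status table and the `reason` requirement can be checked mechanically. In this sketch the result is assumed to be a dictionary with `status` and optional `reason` keys; that shape is hypothetical, since the source does not define the result format.

```python
from enum import Enum


class Status(str, Enum):
    SUCCESS = "SUCCESS"
    PARTIAL = "PARTIAL"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"
    TIMEOUT = "TIMEOUT"


def validate_result(result: dict) -> list:
    """Return a list of problems with a subagent result; empty means valid."""
    try:
        status = Status(result.get("status"))
    except ValueError:
        return [f"unknown status: {result.get('status')!r}"]
    problems = []
    if status in (Status.PARTIAL, Status.FAILED) and not result.get("reason"):
        problems.append(f"{status.value} requires a reason field")
    return problems
```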
 ## Rule Application
 All hatch3r rules with `scope: always` apply to every implementation task, including work delegated to subagents. When constructing subagent prompts, include the rule directives — subagents do not automatically inherit the parent's rule context.
+### Tiered Rule Inclusion
+To manage token budgets when constructing subagent prompts, include rules in tiers. Higher tiers are only loaded when relevant to the specific agent or task phase.
+**Tier 1 -- Always include (every subagent prompt):**
+- `hatch3r-security-patterns` -- security invariants apply to all code changes
+- `hatch3r-code-standards` -- code quality conventions apply universally
+**Tier 2 -- Include by phase (match to the active agent):**
+- `hatch3r-testing` -- include for `hatch3r-test-writer`, `hatch3r-implementer`, `hatch3r-reviewer`
+- `hatch3r-accessibility-standards` -- include for `hatch3r-a11y-auditor`, `hatch3r-reviewer` (UI changes)
+- `hatch3r-git-conventions` -- include for orchestrator git operations
+- `hatch3r-ci-cd` -- include for `hatch3r-ci-watcher`, `hatch3r-devops`
+- `hatch3r-dependency-management` -- include for `hatch3r-dependency-auditor`
+**Tier 3 -- On-demand (reference only when the task context requires it):**
+- `hatch3r-api-design` -- when designing or reviewing API contracts
+- `hatch3r-secrets-management` -- when handling credentials or environment config
+- `hatch3r-data-classification` -- when handling PII or sensitive data flows
+- `hatch3r-performance-budgets` -- when profiling or reviewing performance
+- `hatch3r-browser-verification` -- when verifying UI in browser
+- `hatch3r-component-conventions` -- when writing UI components
+- `hatch3r-i18n`, `hatch3r-theming`, `hatch3r-migrations`, `hatch3r-feature-flags`, `hatch3r-observability` -- when the task specifically touches these areas
+For tools with limited context windows, Tier 1 rules are mandatory. Tier 2 and Tier 3 rules should be included selectively based on the subagent's role and the task scope to avoid exceeding token budgets.
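The tier selection above can be sketched as a lookup. The `ui_change` flag and the `"orchestrator"` key are assumptions used to model the "(UI changes)" and "orchestrator git operations" qualifiers; Tier 3 rules are passed in explicitly because the directive makes them task-dependent.

```python
TIER_1 = ["hatch3r-security-patterns", "hatch3r-code-standards"]

# Tier 2 rule -> agents that receive it
TIER_2 = {
    "hatch3r-testing": {"hatch3r-test-writer", "hatch3r-implementer", "hatch3r-reviewer"},
    "hatch3r-accessibility-standards": {"hatch3r-a11y-auditor", "hatch3r-reviewer"},
    "hatch3r-git-conventions": {"orchestrator"},
    "hatch3r-ci-cd": {"hatch3r-ci-watcher", "hatch3r-devops"},
    "hatch3r-dependency-management": {"hatch3r-dependency-auditor"},
}


def rules_for(agent: str, on_demand=(), ui_change: bool = False) -> list:
    """Return the ordered rule list to include in a subagent prompt."""
    rules = list(TIER_1)  # Tier 1 is always included
    for rule, agents in TIER_2.items():
        if agent not in agents:
            continue
        # The reviewer only needs accessibility rules for UI changes
        if (
            rule == "hatch3r-accessibility-standards"
            and agent == "hatch3r-reviewer"
            and not ui_change
        ):
            continue
        rules.append(rule)
    # Tier 3: caller names the on-demand rules the task context requires
    rules.extend(r for r in on_demand if r not in rules)
    return rules
```

For example, `rules_for("hatch3r-reviewer", on_demand=["hatch3r-api-design"])` yields the two Tier 1 rules, `hatch3r-testing`, and the requested Tier 3 rule.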

package/rules/hatch3r-api-design.md CHANGED

@@ -3,6 +3,7 @@ id: hatch3r-api-design
 type: rule
 description: API endpoint and contract design patterns for the project
 scope: always
+tags: [planning]
 ---
 # API Design

package/rules/hatch3r-browser-verification.md CHANGED

@@ -3,6 +3,7 @@ id: hatch3r-browser-verification
 type: rule
 description: Browser-based verification for UI and user-facing changes
 scope: conditional
+tags: [review]
 ---
 # Browser Verification

package/rules/hatch3r-ci-cd.md CHANGED

@@ -3,6 +3,7 @@ id: hatch3r-ci-cd
 type: rule
 description: CI/CD pipeline standards covering stage gates, deployment strategies, and rollback procedures
 scope: always
+tags: [devops]
 ---
 # CI/CD Standards