npm - hatch3r - Versions diffs - 1.0.0 → 1.2.0 - Mend

hatch3r 1.0.0 → 1.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (144) hide show

package/README.md +93 -322
package/agents/hatch3r-a11y-auditor.md +24 -6
package/agents/hatch3r-architect.md +20 -1
package/agents/hatch3r-ci-watcher.md +31 -8
package/agents/hatch3r-context-rules.md +14 -2
package/agents/hatch3r-dependency-auditor.md +21 -5
package/agents/hatch3r-devops.md +37 -6
package/agents/hatch3r-docs-writer.md +19 -3
package/agents/hatch3r-fixer.md +171 -0
package/agents/hatch3r-implementer.md +84 -11
package/agents/hatch3r-learnings-loader.md +69 -13
package/agents/hatch3r-lint-fixer.md +19 -14
package/agents/hatch3r-perf-profiler.md +18 -1
package/agents/hatch3r-researcher.md +440 -5
package/agents/hatch3r-reviewer.md +97 -5
package/agents/hatch3r-security-auditor.md +23 -5
package/agents/hatch3r-test-writer.md +21 -10
package/checks/README.md +49 -0
package/checks/code-quality.md +49 -0
package/checks/performance.md +58 -0
package/checks/security.md +58 -0
package/checks/testing.md +53 -0
package/commands/board/pickup-azure-devops.md +81 -0
package/commands/board/pickup-delegation-multi.md +197 -0
package/commands/board/pickup-delegation.md +100 -0
package/commands/board/pickup-github.md +82 -0
package/commands/board/pickup-gitlab.md +81 -0
package/commands/board/pickup-modes.md +143 -0
package/commands/board/pickup-post-impl.md +120 -0
package/commands/board/shared-azure-devops.md +149 -0
package/commands/board/shared-board-overview.md +215 -0
package/commands/board/shared-github.md +169 -0
package/commands/board/shared-gitlab.md +142 -0
package/commands/hatch3r-agent-customize.md +40 -2
package/commands/hatch3r-api-spec.md +294 -32
package/commands/hatch3r-benchmark.md +386 -32
package/commands/hatch3r-board-fill.md +161 -25
package/commands/hatch3r-board-groom.md +595 -0
package/commands/hatch3r-board-init.md +203 -46
package/commands/hatch3r-board-pickup.md +79 -457
package/commands/hatch3r-board-refresh.md +98 -27
package/commands/hatch3r-board-shared.md +87 -238
package/commands/hatch3r-bug-plan.md +16 -3
package/commands/hatch3r-codebase-map.md +43 -10
package/commands/hatch3r-command-customize.md +6 -0
package/commands/hatch3r-context-health.md +5 -0
package/commands/hatch3r-cost-tracking.md +5 -0
package/commands/hatch3r-debug.md +426 -0
package/commands/hatch3r-dep-audit.md +7 -1
package/commands/hatch3r-feature-plan.md +74 -12
package/commands/hatch3r-healthcheck.md +17 -1
package/commands/hatch3r-hooks.md +16 -10
package/commands/hatch3r-learn.md +15 -9
package/commands/hatch3r-migration-plan.md +333 -33
package/commands/hatch3r-onboard.md +327 -38
package/commands/hatch3r-project-spec.md +46 -10
package/commands/hatch3r-quick-change.md +336 -0
package/commands/hatch3r-recipe.md +6 -0
package/commands/hatch3r-refactor-plan.md +29 -13
package/commands/hatch3r-release.md +13 -3
package/commands/hatch3r-revision.md +395 -0
package/commands/hatch3r-roadmap.md +18 -3
package/commands/hatch3r-rule-customize.md +6 -0
package/commands/hatch3r-security-audit.md +17 -1
package/commands/hatch3r-skill-customize.md +6 -0
package/commands/hatch3r-test-plan.md +532 -0
package/commands/hatch3r-workflow.md +113 -38
package/dist/cli/index.js +5184 -2593
package/dist/cli/index.js.map +1 -0
package/github-agents/hatch3r-docs-agent.md +1 -0
package/github-agents/hatch3r-lint-agent.md +1 -0
package/github-agents/hatch3r-security-agent.md +1 -0
package/github-agents/hatch3r-test-agent.md +1 -0
package/hooks/hatch3r-ci-failure.md +30 -0
package/hooks/hatch3r-file-save.md +22 -0
package/hooks/hatch3r-post-merge.md +23 -0
package/hooks/hatch3r-pre-commit.md +23 -0
package/hooks/hatch3r-pre-push.md +22 -0
package/hooks/hatch3r-session-start.md +22 -0
package/mcp/mcp.json +22 -3
package/package.json +4 -7
package/prompts/hatch3r-bug-triage.md +1 -0
package/prompts/hatch3r-code-review.md +1 -0
package/prompts/hatch3r-pr-description.md +1 -0
package/rules/hatch3r-accessibility-standards.md +1 -0
package/rules/hatch3r-agent-orchestration.md +326 -53
package/rules/hatch3r-agent-orchestration.mdc +225 -0
package/rules/hatch3r-api-design.md +4 -1
package/rules/hatch3r-browser-verification.md +33 -1
package/rules/hatch3r-browser-verification.mdc +29 -0
package/rules/hatch3r-ci-cd.md +5 -1
package/rules/hatch3r-ci-cd.mdc +4 -1
package/rules/hatch3r-code-standards.md +18 -0
package/rules/hatch3r-code-standards.mdc +10 -1
package/rules/hatch3r-component-conventions.md +4 -1
package/rules/hatch3r-data-classification.md +1 -0
package/rules/hatch3r-deep-context.md +94 -0
package/rules/hatch3r-deep-context.mdc +69 -0
package/rules/hatch3r-dependency-management.md +13 -0
package/rules/hatch3r-feature-flags.md +4 -1
package/rules/hatch3r-git-conventions.md +1 -0
package/rules/hatch3r-i18n.md +4 -1
package/rules/hatch3r-learning-consult.md +4 -2
package/rules/hatch3r-learning-consult.mdc +3 -2
package/rules/hatch3r-migrations.md +12 -0
package/rules/hatch3r-observability.md +293 -1
package/rules/hatch3r-performance-budgets.md +5 -2
package/rules/hatch3r-performance-budgets.mdc +1 -1
package/rules/hatch3r-secrets-management.md +11 -3
package/rules/hatch3r-secrets-management.mdc +10 -3
package/rules/hatch3r-security-patterns.md +23 -3
package/rules/hatch3r-security-patterns.mdc +8 -2
package/rules/hatch3r-testing.md +1 -0
package/rules/hatch3r-theming.md +4 -1
package/rules/hatch3r-tooling-hierarchy.md +42 -15
package/rules/hatch3r-tooling-hierarchy.mdc +27 -4
package/skills/hatch3r-a11y-audit/SKILL.md +1 -0
package/skills/hatch3r-agent-customize/SKILL.md +3 -0
package/skills/hatch3r-api-spec/SKILL.md +1 -0
package/skills/hatch3r-architecture-review/SKILL.md +6 -2
package/skills/hatch3r-bug-fix/SKILL.md +4 -1
package/skills/hatch3r-ci-pipeline/SKILL.md +1 -0
package/skills/hatch3r-command-customize/SKILL.md +1 -0
package/skills/hatch3r-context-health/SKILL.md +2 -1
package/skills/hatch3r-cost-tracking/SKILL.md +1 -0
package/skills/hatch3r-dep-audit/SKILL.md +6 -2
package/skills/hatch3r-feature/SKILL.md +9 -2
package/skills/hatch3r-gh-agentic-workflows/SKILL.md +130 -21
package/skills/hatch3r-incident-response/SKILL.md +11 -5
package/skills/hatch3r-issue-workflow/SKILL.md +12 -7
package/skills/hatch3r-logical-refactor/SKILL.md +1 -0
package/skills/hatch3r-migration/SKILL.md +1 -0
package/skills/hatch3r-perf-audit/SKILL.md +2 -1
package/skills/hatch3r-pr-creation/SKILL.md +20 -10
package/skills/hatch3r-qa-validation/SKILL.md +2 -1
package/skills/hatch3r-recipe/SKILL.md +1 -0
package/skills/hatch3r-refactor/SKILL.md +7 -1
package/skills/hatch3r-release/SKILL.md +15 -11
package/skills/hatch3r-rule-customize/SKILL.md +1 -0
package/skills/hatch3r-skill-customize/SKILL.md +1 -0
package/skills/hatch3r-visual-refactor/SKILL.md +1 -0
package/dist/cli/hooks-ZOTFDEA3.js +0 -59
package/rules/hatch3r-error-handling.md +0 -17
package/rules/hatch3r-error-handling.mdc +0 -15

package/commands/hatch3r-benchmark.md CHANGED Viewed

@@ -2,49 +2,403 @@
 id: hatch3r-benchmark
 type: command
 description: Run and analyze performance benchmarks. Compare results against baselines, identify regressions, and produce performance reports.
+tags: [review, performance]
 ---
-# Performance Benchmark
-## Inputs
+## Agent Pipeline
-- **Target:** specific file, function, endpoint, or `all` for full suite
-- **Baseline:** `previous-run` (default), git ref (branch/tag/SHA), or `none` (no comparison)
-- **Iterations:** number of benchmark runs for statistical significance (default: 5)
+| Stage | Agent(s) | Parallel | Required |
+|-------|----------|----------|----------|
+| 1. Discovery | `hatch3r-researcher` (codebase-analysis mode) | No | Yes |
+| 2. Execution | Orchestrator (inline, runs benchmarks) | No | Yes |
+| 3. Analysis | `hatch3r-perf-profiler` | No | Yes |
+| 4. Reporting | `hatch3r-docs-writer` | No | If regressions found |
-## Procedure
+# Performance Benchmark — Run, Compare, and Report on Performance Metrics
-1. **Discover benchmarks:**
-   - Scan for benchmark files (`.bench.ts`, `.benchmark.ts`, `__benchmarks__/`, `bench/`)
-   - If no benchmarks exist, identify critical paths and suggest benchmark candidates
+Run performance benchmarks against a target (file, function, endpoint, or full suite), compare results against a baseline (previous run, git ref, or none), and produce a structured performance report. Discovers existing benchmark files or proposes new ones for critical paths. Executes with configurable iterations, performs statistical analysis on results, and flags regressions with root cause tracing. Persists results to `.benchmarks/results.json` for longitudinal tracking. AI proposes all actions; user confirms at every checkpoint.
-2. **Run benchmarks:**
-   - Execute benchmark suite with the specified iterations
-   - Capture: execution time (mean, p50, p95, p99), memory usage, throughput (ops/sec)
-   - Run in a clean state (cold start) and warm state
+---
+## Shared Context
+**Read the `hatch3r-board-shared` command at the start of the run** if it exists. While this command does not perform board operations directly, it establishes patterns and context (GitHub owner/repo, tooling directives) that may be useful for regression issue creation. Cache any values found.
+## Token-Saving Directives
+1. **Do not re-read files already cached.** Once benchmark discovery results are collected, reference them in memory — do not re-scan the filesystem.
+2. **Limit source reads.** When reading source files to identify critical paths, read function signatures and hot-path sections only — not entire files.
+3. **Structured output only.** All sub-agent prompts and benchmark results require structured markdown output — no prose dumps.
+4. **Compress raw metrics.** Store full raw data in `.benchmarks/results.json` but present only summary statistics (mean, p50, p95, p99, stddev) in the report.
+---
+## Workflow
+Execute these steps in order. **Do not skip any step.** Ask the user at every checkpoint marked with ASK.
+### Step 1: Gather Benchmark Context
+1. **ASK:** "Tell me about the benchmarks you want to run. I need:
+   - **Target:** specific file, function, endpoint, module, or `all` for the full suite
+   - **Baseline:** `previous-run` (default — loads last saved results), a git ref (branch/tag/SHA to compare against), or `none` (no comparison)
+   - **Iterations:** number of benchmark runs for statistical significance (default: 5, minimum: 3)
+   - **Environment constraints:** CI vs. local, Node version, memory limits, specific flags (e.g., `--expose-gc`)
+   - **Metrics of interest:** time (default), memory, throughput (ops/sec), or `all`
+   You can also point me to an existing benchmark config file and I'll extract these from it."
+2. If the user provides a config reference, read it and extract the fields above.
+3. Present a structured summary:
+```
+Benchmark Brief:
+  Target:       {target — file/function/endpoint/all}
+  Baseline:     {previous-run / git-ref / none}
+  Iterations:   {N}
+  Environment:  {CI / local — Node version, flags}
+  Metrics:      {time / memory / throughput / all}
+```
+**ASK:** "Does this capture the benchmark plan correctly? Adjust anything before I begin discovery."
+---
+### Step 2: Discover Benchmarks
+Delegate to `hatch3r-researcher` in `codebase-analysis` mode with focus on benchmark discovery.
+1. Scan the codebase for existing benchmark infrastructure:
+   - Benchmark files: `*.bench.ts`, `*.benchmark.ts`, `*.bench.js`, `*.benchmark.js`
+   - Benchmark directories: `__benchmarks__/`, `bench/`, `benchmarks/`
+   - Test runner bench support: vitest `bench` mode, jest-bench, tinybench, benny
+   - Package.json scripts: any script containing `bench` or `benchmark`
+   - Existing results: `.benchmarks/`, `benchmark-results/`, or similar
+2. If benchmarks are found, catalog them:
+```
+Benchmark Discovery:
+  Files found:      {N} benchmark files
+  Runner:           {vitest bench / tinybench / benny / custom}
+  Existing results: {found at path / not found}
+  Coverage:         {which modules/functions have benchmarks}
+  Gaps:             {critical paths without benchmarks}
+```
+3. If the target is a specific file/function, verify a corresponding benchmark exists.
+**ASK:** "Here are the benchmarks I found. Confirm which to run:
+{numbered list of benchmark files/suites with brief description}
+Select: (a) all listed, (b) specific numbers, (c) let me suggest new benchmarks first"
+---
+### Step 3: Suggest Benchmarks (If Needed)
+Skip this step if the user confirmed existing benchmarks in Step 2 and no gaps were identified.
+1. Identify critical paths that lack benchmarks:
+   - Functions with high cyclomatic complexity
+   - Hot paths (called frequently based on import graph analysis)
+   - I/O-bound operations (database queries, file operations, network calls)
+   - Data transformation pipelines (parsing, serialization, mapping)
+   - Functions the user specifically targeted in Step 1
+2. For each candidate, propose a benchmark skeleton:
+```
+Benchmark Candidates:
+  1. {function/module} — {why it's a critical path}
+     Suggested benchmark: {brief description of what to measure}
+  2. {function/module} — {why it's a critical path}
+     Suggested benchmark: {brief description of what to measure}
+```
+**ASK:** "These critical paths lack benchmarks. Options:
+- **(a) Create benchmarks** for selected candidates (tell me which numbers)
+- **(b) Skip** — run only existing benchmarks
+- **(c) Create all** suggested benchmarks"
+3. If the user approves creation, generate benchmark files following the detected runner conventions (vitest bench, tinybench, etc.). Present file contents for review before writing.
+---
+### Step 4: Environment Preparation
+1. Verify clean execution state:
+   - Check for running dev servers, watchers, or other processes that consume CPU/memory
+   - Warn if detected: "Found {N} background processes that may affect benchmark stability: {list}"
+2. Set environment:
+   - `NODE_ENV=production` (unless the user specified otherwise)
+   - Apply any flags from Step 1 (e.g., `--expose-gc` for memory benchmarks)
+3. If baseline is a git ref:
+   - Verify the ref exists: `git rev-parse --verify {ref}`
+   - Check for uncommitted changes that would block checkout
+4. Present readiness check:
+```
+Environment Ready:
+  NODE_ENV:          {production}
+  Node version:      {version}
+  Background noise:  {none / warnings listed}
+  Baseline ref:      {verified / N/A}
+  Working tree:      {clean / uncommitted changes — list}
+```
+If uncommitted changes exist and baseline requires git checkout, **ASK:** "Uncommitted changes detected. Options: (a) stash and continue, (b) abort baseline comparison, (c) use `previous-run` baseline instead"
+---
+### Step 5: Execute Benchmarks
+1. Run the benchmark suite with the specified iterations:
+   - Cold start run (first iteration, not counted in statistics)
+   - Warm runs (remaining N iterations, used for statistics)
+2. Capture metrics per benchmark:
+   - **Time:** mean, median (p50), p95, p99, min, max, standard deviation
+   - **Memory:** heap used (mean, peak), RSS delta, GC pause time (if `--expose-gc`)
+   - **Throughput:** operations per second, iterations completed
+3. Monitor execution:
+   - Report progress: "Running benchmark {M}/{total} — {name}... iteration {I}/{N}"
+   - If a single benchmark takes >60s per iteration, warn the user
+   - If a benchmark crashes, capture the error and continue with remaining benchmarks
+4. Store raw results in memory for analysis.
+---
+### Step 6: Baseline Comparison
+Skip if baseline is `none`.
+1. **If baseline is `previous-run`:**
+   - Load `.benchmarks/results.json`
+   - If file does not exist, warn and skip comparison
+   - Match benchmarks by name — report any that exist in one set but not the other
+2. **If baseline is a git ref:**
+   - Stash current changes (if any)
+   - Checkout the baseline ref
+   - Run the same benchmark suite with the same iterations and environment
+   - Checkout the original branch and restore stash
+   - Match benchmarks by name
+3. Compute deltas for each matched benchmark:
+```
+Comparison:
+  Benchmark        | Metric   | Baseline   | Current    | Delta     | Change
+  {name}           | time p50 | {value}    | {value}    | {±value}  | {±%}
+  {name}           | ops/sec  | {value}    | {value}    | {±value}  | {±%}
+  {name}           | heap     | {value}    | {value}    | {±value}  | {±%}
+```
+---
+### Step 7: Statistical Analysis
+Delegate to `hatch3r-perf-profiler` for analysis of the collected metrics.
+1. Calculate statistical significance for each delta:
+   - Use coefficient of variation (CV) to assess measurement noise
+   - Flag results with CV > 15% as **noisy** — increase iterations recommended
+   - Apply t-test or equivalent for significance (p < 0.05 threshold)
+2. Identify outliers:
+   - Runs that deviate > 2 standard deviations from mean
+   - Report outlier count and whether they skew results
+3. Classify each benchmark result:
-3. **Compare against baseline:**
-   - If baseline is `previous-run`, read last saved benchmark results from `.benchmarks/results.json`
-   - If baseline is a git ref, checkout that ref, run benchmarks, then compare
-   - Calculate: absolute difference, percentage change, statistical significance
+| Classification | Criteria | Action |
+|---------------|----------|--------|
+| `regression-critical` | > 50% slower or > 2x memory | Immediate attention |
+| `regression-warning` | 10–50% slower or 50–100% more memory | Investigation recommended |
+| `acceptable` | < 10% change in either direction | No action needed |
+| `improvement` | > 10% faster or less memory | Note for celebration |
+| `noisy` | CV > 15% — results unreliable | Rerun with more iterations |
+**ASK:** "Here is the benchmark summary:
+- {N} benchmarks executed across {M} iterations
+- {X} regressions ({critical count} critical, {warning count} warning)
+- {Y} improvements
+- {Z} noisy results
+Want me to perform deep analysis on any specific metric or benchmark? (list numbers / all regressions / skip to report)"
+---
+### Step 8: Root Cause Analysis (If Regressions Found)
+Skip if no regressions classified as `regression-critical` or `regression-warning` in Step 7.
+1. For each regression > 10%, trace to specific code changes:
+   - If baseline is a git ref: `git diff {ref}...HEAD -- {affected files}`
+   - Identify new code in hot paths, additional allocations, changed algorithms
+   - Cross-reference with import graph to find indirect causes
+2. Categorize root causes:
+| Pattern | Example | Typical Fix |
+|---------|---------|-------------|
+| Hot path expansion | New validation/logging in critical loop | Move out of loop, lazy evaluate |
+| Allocation increase | New object creation per iteration | Object pooling, pre-allocation |
+| Algorithm change | O(n) → O(n²) in data processing | Restore or optimize algorithm |
+| Dependency overhead | New import initializing at load time | Lazy import, tree-shake |
+| Serialization cost | Larger payloads, new fields | Selective serialization |
+3. Present findings per regression:
+```
+Regression Analysis: {benchmark name}
+  Metric:     {metric} — {baseline} → {current} ({+%} regression)
+  Root cause: {description}
+  Files:      {affected files with line ranges}
+  Category:   {pattern from table above}
+  Suggested fix: {brief recommendation}
+```
+---
+### Step 9: Generate Report
+Delegate to `hatch3r-docs-writer` if regressions were found (for a polished report); otherwise the orchestrator generates inline.
+Present the full report for review before saving. Use the **Output Template** below.
+**ASK:** "Here is the benchmark report. Review before I save:
+- {N} benchmarks, {M} iterations, {environment}
+- {X} regressions, {Y} improvements, {Z} stable
+- Report file: `.benchmarks/report-{date}.md`
+Confirm, or tell me what to adjust."
+---
+### Step 10: Save Results
+1. Save raw results to `.benchmarks/results.json` (create directory if needed):
+   - Timestamp, git SHA, branch, environment metadata
+   - Per-benchmark: all iterations, computed statistics, classification
+   - Comparison deltas (if baseline was used)
+2. Save the markdown report to `.benchmarks/report-{YYYY-MM-DD}.md`.
+3. Present a summary of files created:
+```
+Files Created/Updated:
+  .benchmarks/
+    results.json                — {N} benchmarks, {M} iterations, {timestamp}
+    report-{YYYY-MM-DD}.md     — full benchmark report
+```
+**ASK:** "Results saved. Should these become the new baseline for future comparisons? (yes — overwrites previous baseline / no — keep existing baseline)"
+If yes, ensure `results.json` is structured as the canonical baseline for the next `previous-run` comparison.
+---
+## Output Template
+The benchmark report follows this structure:
+```markdown
+# Performance Benchmark Report
+**Date:** {YYYY-MM-DD}
+**Branch:** {branch} @ {short SHA}
+**Baseline:** {previous-run / git ref / none}
+**Environment:** {Node version}, {OS}, {CI/local}
+**Iterations:** {N} (+ 1 cold start, excluded)
+## Summary
+| Status | Count |
+|--------|-------|
+| Regressions (critical) | {N} |
+| Regressions (warning) | {N} |
+| Stable | {N} |
+| Improvements | {N} |
+| Noisy (inconclusive) | {N} |
+## Results
+| Benchmark | Metric | Baseline | Current | Delta | Status |
+|-----------|--------|----------|---------|-------|--------|
+| {name} | time (p50) | {value}ms | {value}ms | {±%} | {status icon} |
+| {name} | ops/sec | {value} | {value} | {±%} | {status icon} |
+| {name} | heap (mean) | {value}MB | {value}MB | {±%} | {status icon} |
+### Statistical Confidence
+| Benchmark | CV (%) | Outliers | Significance | Reliable |
+|-----------|--------|----------|-------------|----------|
+| {name} | {value}% | {N}/{total} | p={value} | {yes/no} |
+## Regressions
+### {Benchmark Name} — {metric} regression ({+%})
+**Severity:** {critical / warning}
+**Baseline:** {value} → **Current:** {value} ({±absolute}, {±%})
+**Root Cause:**
+{description of what changed and why it's slower}
+**Affected Code:**
+- `{file}:{line range}` — {what changed}
+**Recommendation:**
+{specific optimization suggestion}
+## Improvements
+| Benchmark | Metric | Baseline | Current | Improvement |
+|-----------|--------|----------|---------|-------------|
+| {name} | {metric} | {value} | {value} | {-%} faster |
+## Environment Details
+| Property | Value |
+|----------|-------|
+| Node.js | {version} |
+| OS | {platform, arch} |
+| CPU | {model, cores} |
+| Memory | {total} |
+| Runner | {vitest bench / tinybench / custom} |
+| Flags | {--expose-gc, etc.} |
+## Recommendations
+1. {prioritized optimization recommendation}
+2. {recommendation}
+---
+*Generated by hatch3r-benchmark — {timestamp}*
+```
+---
-4. **Identify regressions:**
-   - Flag any metric that regressed > 10% from baseline
-   - For regressions, identify the likely cause (new code in hot path, additional allocations, etc.)
-   - Classify: `critical` (> 50% regression), `warning` (10-50%), `acceptable` (< 10%)
+## Error Handling
-5. **Generate report:**
-   - Summary table with all metrics and comparison
-   - Detailed analysis for any regressions
-   - Recommendations for optimization if regressions found
-   - Save results to `.benchmarks/results.json` for future comparisons
+- **Benchmarks fail to run:** Capture the error output. Check for missing dependencies, syntax errors, or runner misconfiguration. Present the error and ASK: "Benchmark {name} failed to execute: {error}. Options: (a) skip and continue with remaining, (b) attempt to fix the benchmark file, (c) abort."
+- **Baseline ref does not exist:** If `git rev-parse --verify {ref}` fails, report the error and ASK: "Baseline ref `{ref}` not found. Options: (a) use `previous-run` instead, (b) run without comparison, (c) provide a different ref."
+- **No previous results file:** If baseline is `previous-run` but `.benchmarks/results.json` does not exist, warn and proceed without comparison. Note in the report that this is a baseline-establishing run.
+- **Results too noisy (high variance):** If CV > 15% for more than half the benchmarks, flag the entire run as unreliable. ASK: "Results show high variance — likely environmental interference. Options: (a) rerun with more iterations ({current × 2}), (b) accept results with noise caveat, (c) abort and retry in a cleaner environment."
+- **Benchmark timeout:** If a single benchmark exceeds 5 minutes per iteration, kill it and report. ASK whether to skip or increase the timeout.
+- **Git checkout failure during baseline comparison:** If stash/checkout fails (merge conflicts, dirty state), abort the baseline comparison gracefully. Fall back to `previous-run` or `none` and inform the user.
+- **Disk space for results:** If `.benchmarks/results.json` grows excessively (> 10MB), warn and suggest pruning old entries.
-## Output
+## Guardrails
-- Performance report in markdown format
-- Updated baseline results file
-- Regression alerts with severity classification
+- **Never modify production code based on benchmark results.** The benchmark command observes and reports — it never changes application source code. Optimization changes require a separate implementation task.
+- **Never skip ASK checkpoints.** Every step with an ASK must pause for user confirmation.
+- **Flag CI vs. local execution.** Results from different environments must not share a baseline. Include environment fingerprint in `results.json` and warn if comparing across environments.
+- **Minimum 3 iterations for statistical validity.** If the user requests fewer, override to 3 and explain why.
+- **Always exclude the cold start run from statistics.** The first iteration warms caches and JIT — including it skews results.
+- **Never overwrite baseline without confirmation.** Step 10 explicitly asks before promoting results to baseline status.
+- **Preserve existing `.benchmarks/results.json` history.** Append new runs; do not truncate historical data without user approval.
+- **Do not benchmark in debug mode.** Ensure `NODE_ENV=production` and no debug flags are active unless explicitly requested.
+- **Respect the project's tooling hierarchy** for knowledge augmentation: project docs first, then codebase exploration, then Context7 MCP, then web research.
+- **Report, don't interpret subjectively.** Present statistical facts. Flag regressions by threshold, not opinion. Let the user decide what matters.
 ## Related
-- **Skill:** `hatch3r-perf-audit` — comprehensive performance profiling
-- **Agent:** `hatch3r-perf-profiler` — agent for deep performance investigation
+- **Agent:** `hatch3r-perf-profiler` — deep performance profiling and analysis
+- **Check:** `checks/performance.md` — performance budget checks
+- **Rule:** `hatch3r-performance-budgets` — performance budget thresholds and enforcement
+- **Command:** `hatch3r-refactor-plan` — plan optimizations identified by benchmark regressions