npm - hatch3r - Versions diffs - 1.7.1 → 1.8.0 - Mend

hatch3r 1.7.1 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (189) hide show

package/README.md +38 -12
package/agents/hatch3r-a11y-auditor.md +4 -0
package/agents/hatch3r-architect.md +4 -0
package/agents/hatch3r-ci-watcher.md +4 -0
package/agents/hatch3r-context-rules.md +26 -6
package/agents/hatch3r-creator.md +6 -1
package/agents/hatch3r-dependency-auditor.md +4 -0
package/agents/hatch3r-devops.md +4 -0
package/agents/hatch3r-docs-writer.md +4 -0
package/agents/hatch3r-fixer.md +4 -0
package/agents/hatch3r-handoff-loader.md +243 -0
package/agents/hatch3r-handoff-preparer.md +134 -0
package/agents/hatch3r-implementer.md +12 -0
package/agents/hatch3r-learnings-loader.md +5 -1
package/agents/hatch3r-lint-fixer.md +4 -0
package/agents/hatch3r-perf-profiler.md +8 -0
package/agents/hatch3r-researcher.md +4 -0
package/agents/hatch3r-reviewer.md +94 -0
package/agents/hatch3r-security-auditor.md +24 -0
package/agents/hatch3r-test-writer.md +4 -0
package/agents/modes/requirements-elicitation.md +4 -1
package/agents/modes/similar-implementation.md +6 -0
package/agents/modes/user-flows.md +76 -0
package/agents/shared/quality-charter.md +128 -0
package/agents/shared/user-content-templates.md +31 -1
package/commands/hatch3r-agent-customize.md +4 -0
package/commands/hatch3r-api-spec.md +7 -0
package/commands/hatch3r-benchmark.md +7 -0
package/commands/hatch3r-board-fill.md +8 -0
package/commands/hatch3r-board-groom.md +4 -0
package/commands/hatch3r-board-init.md +51 -0
package/commands/hatch3r-board-pickup.md +8 -0
package/commands/hatch3r-board-refresh.md +4 -0
package/commands/hatch3r-board-shared.md +6 -6
package/commands/hatch3r-bug-plan.md +7 -0
package/commands/hatch3r-codebase-map.md +8 -0
package/commands/hatch3r-command-customize.md +4 -0
package/commands/hatch3r-context-health.md +5 -0
package/commands/hatch3r-create.md +59 -4
package/commands/hatch3r-debug.md +7 -0
package/commands/hatch3r-dep-audit.md +4 -0
package/commands/hatch3r-feature-plan.md +7 -0
package/commands/hatch3r-handoff.md +133 -0
package/commands/hatch3r-healthcheck.md +4 -0
package/commands/hatch3r-hooks.md +4 -0
package/commands/hatch3r-learn.md +16 -0
package/commands/hatch3r-migration-plan.md +7 -0
package/commands/hatch3r-onboard.md +7 -0
package/commands/hatch3r-pr-resolve.md +12 -1
package/commands/hatch3r-project-spec.md +8 -0
package/commands/hatch3r-quick-change.md +11 -2
package/commands/hatch3r-recipe.md +4 -0
package/commands/hatch3r-refactor-plan.md +7 -0
package/commands/hatch3r-release.md +5 -0
package/commands/hatch3r-revision.md +7 -0
package/commands/hatch3r-roadmap.md +8 -0
package/commands/hatch3r-rule-customize.md +4 -0
package/commands/hatch3r-security-audit.md +4 -0
package/commands/hatch3r-skill-customize.md +4 -0
package/commands/hatch3r-test-plan.md +7 -0
package/commands/hatch3r-workflow.md +11 -1
package/dist/cli/index.js +4814 -1130
package/dist/cli/index.js.map +1 -1
package/package.json +10 -5
package/rules/hatch3r-accessibility-standards.md +21 -0
package/rules/hatch3r-accessibility-standards.mdc +21 -0
package/rules/hatch3r-agent-orchestration-detail.md +3 -0
package/rules/hatch3r-agent-orchestration-detail.mdc +3 -0
package/rules/hatch3r-agent-orchestration.md +34 -3
package/rules/hatch3r-agent-orchestration.mdc +34 -3
package/rules/hatch3r-ai-evals.md +158 -0
package/rules/hatch3r-ai-evals.mdc +154 -0
package/rules/hatch3r-ai-ux-patterns.md +131 -0
package/rules/hatch3r-ai-ux-patterns.mdc +127 -0
package/rules/hatch3r-api-design.md +67 -9
package/rules/hatch3r-api-design.mdc +67 -9
package/rules/hatch3r-api-versioning.md +119 -0
package/rules/hatch3r-api-versioning.mdc +115 -0
package/rules/hatch3r-auth-patterns.md +170 -0
package/rules/hatch3r-auth-patterns.mdc +166 -0
package/rules/hatch3r-component-conventions.md +30 -0
package/rules/hatch3r-component-conventions.mdc +30 -0
package/rules/hatch3r-container-hardening.md +131 -0
package/rules/hatch3r-container-hardening.mdc +127 -0
package/rules/hatch3r-contract-testing.md +117 -0
package/rules/hatch3r-contract-testing.mdc +113 -0
package/rules/hatch3r-deep-context.md +2 -0
package/rules/hatch3r-deep-context.mdc +2 -0
package/rules/hatch3r-dependency-management.md +73 -1
package/rules/hatch3r-dependency-management.mdc +72 -0
package/rules/hatch3r-design-system-detection.md +142 -0
package/rules/hatch3r-design-system-detection.mdc +138 -0
package/rules/hatch3r-event-schema-evolution.md +90 -0
package/rules/hatch3r-event-schema-evolution.mdc +86 -0
package/rules/hatch3r-handoff-readiness.md +45 -0
package/rules/hatch3r-handoff-readiness.mdc +40 -0
package/rules/hatch3r-i18n.md +13 -0
package/rules/hatch3r-i18n.mdc +13 -0
package/rules/hatch3r-iteration-summary.md +2 -0
package/rules/hatch3r-iteration-summary.mdc +2 -0
package/rules/hatch3r-migrations.md +61 -16
package/rules/hatch3r-migrations.mdc +61 -16
package/rules/hatch3r-observability-logging.md +1 -1
package/rules/hatch3r-observability-logging.mdc +1 -1
package/rules/hatch3r-observability-metrics.md +1 -1
package/rules/hatch3r-observability-metrics.mdc +1 -1
package/rules/hatch3r-observability-tracing-detail.md +8 -149
package/rules/hatch3r-observability-tracing-detail.mdc +7 -149
package/rules/hatch3r-observability-tracing.md +154 -6
package/rules/hatch3r-observability-tracing.mdc +154 -6
package/rules/hatch3r-observability.md +1 -0
package/rules/hatch3r-observability.mdc +1 -0
package/rules/hatch3r-operability.md +149 -0
package/rules/hatch3r-operability.mdc +145 -0
package/rules/hatch3r-passkey-server.md +181 -0
package/rules/hatch3r-passkey-server.mdc +177 -0
package/rules/hatch3r-progressive-delivery.md +120 -0
package/rules/hatch3r-progressive-delivery.mdc +116 -0
package/rules/hatch3r-resilience-patterns.md +154 -0
package/rules/hatch3r-resilience-patterns.mdc +150 -0
package/rules/hatch3r-secrets-management.md +29 -0
package/rules/hatch3r-secrets-management.mdc +29 -0
package/rules/hatch3r-testing.md +139 -43
package/rules/hatch3r-testing.mdc +139 -43
package/rules/hatch3r-ux-states-and-flows.md +149 -0
package/rules/hatch3r-ux-states-and-flows.mdc +145 -0
package/skills/hatch3r-a11y-audit/SKILL.md +14 -0
package/skills/hatch3r-agent-customize/SKILL.md +10 -0
package/skills/hatch3r-ai-feature/SKILL.md +136 -0
package/skills/hatch3r-api-spec/SKILL.md +73 -0
package/skills/hatch3r-architecture-review/SKILL.md +14 -0
package/skills/hatch3r-bug-fix/SKILL.md +5 -0
package/skills/hatch3r-ci-pipeline/SKILL.md +14 -0
package/skills/hatch3r-cli-aichat/SKILL.md +84 -0
package/skills/hatch3r-cli-ast-grep/SKILL.md +85 -0
package/skills/hatch3r-cli-az-devops/SKILL.md +89 -0
package/skills/hatch3r-cli-bat/SKILL.md +85 -0
package/skills/hatch3r-cli-comby/SKILL.md +85 -0
package/skills/hatch3r-cli-csvkit/SKILL.md +84 -0
package/skills/hatch3r-cli-delta/SKILL.md +86 -0
package/skills/hatch3r-cli-difftastic/SKILL.md +84 -0
package/skills/hatch3r-cli-docker/SKILL.md +89 -0
package/skills/hatch3r-cli-duckdb/SKILL.md +84 -0
package/skills/hatch3r-cli-fd/SKILL.md +85 -0
package/skills/hatch3r-cli-fzf/SKILL.md +84 -0
package/skills/hatch3r-cli-gh/SKILL.md +90 -0
package/skills/hatch3r-cli-glab/SKILL.md +89 -0
package/skills/hatch3r-cli-jq/SKILL.md +89 -0
package/skills/hatch3r-cli-lazygit/SKILL.md +78 -0
package/skills/hatch3r-cli-llm/SKILL.md +84 -0
package/skills/hatch3r-cli-miller/SKILL.md +84 -0
package/skills/hatch3r-cli-mods/SKILL.md +84 -0
package/skills/hatch3r-cli-overview/SKILL.md +60 -0
package/skills/hatch3r-cli-playwright/SKILL.md +89 -0
package/skills/hatch3r-cli-podman/SKILL.md +84 -0
package/skills/hatch3r-cli-qsv/SKILL.md +91 -0
package/skills/hatch3r-cli-ripgrep/SKILL.md +85 -0
package/skills/hatch3r-cli-rtk/SKILL.md +91 -0
package/skills/hatch3r-cli-sd/SKILL.md +85 -0
package/skills/hatch3r-cli-stagehand/SKILL.md +111 -0
package/skills/hatch3r-cli-taplo/SKILL.md +84 -0
package/skills/hatch3r-cli-yq/SKILL.md +85 -0
package/skills/hatch3r-cli-zstd/SKILL.md +85 -0
package/skills/hatch3r-command-customize/SKILL.md +10 -0
package/skills/hatch3r-context-health/SKILL.md +14 -0
package/skills/hatch3r-cost-tracking/SKILL.md +14 -0
package/skills/hatch3r-customize/SKILL.md +17 -0
package/skills/hatch3r-dep-audit/SKILL.md +14 -0
package/skills/hatch3r-design-system-detect/SKILL.md +164 -0
package/skills/hatch3r-feature/SKILL.md +2 -0
package/skills/hatch3r-gh-agentic-workflows/SKILL.md +13 -0
package/skills/hatch3r-handoff-prepare/SKILL.md +160 -0
package/skills/hatch3r-handoff-resume/SKILL.md +171 -0
package/skills/hatch3r-incident-response/SKILL.md +14 -0
package/skills/hatch3r-issue-workflow/SKILL.md +5 -0
package/skills/hatch3r-logical-refactor/SKILL.md +14 -0
package/skills/hatch3r-migration/SKILL.md +14 -0
package/skills/hatch3r-observability-verify/SKILL.md +134 -0
package/skills/hatch3r-perf-audit/SKILL.md +14 -0
package/skills/hatch3r-pr-creation/SKILL.md +14 -0
package/skills/hatch3r-qa-validation/SKILL.md +18 -0
package/skills/hatch3r-recipe/SKILL.md +14 -0
package/skills/hatch3r-refactor/SKILL.md +14 -0
package/skills/hatch3r-release/SKILL.md +14 -0
package/skills/hatch3r-reliability-verify/SKILL.md +146 -0
package/skills/hatch3r-rule-customize/SKILL.md +10 -0
package/skills/hatch3r-skill-customize/SKILL.md +10 -0
package/skills/hatch3r-ui-ux-verify/SKILL.md +138 -0
package/skills/hatch3r-visual-refactor/SKILL.md +15 -1

package/rules/hatch3r-testing.mdc CHANGED Viewed

@@ -8,17 +8,24 @@ alwaysApply: false
 ## Core Principles
 - Unit tests: project test runner. Integration: test runner + emulators/mocks. E2E: browser automation (Playwright or equivalent).
-- **Deterministic.** Mock time where needed. No wall clock dependency.
-- **Isolated.** Each test sets up and tears down its own state.
+- **Deterministic.** Mock time, seed RNG, pin timezone/locale. See Determinism Contract below.
+- **Isolated.** Each test sets up and tears down its own state. Vitest `isolate: true`; Jest `--runInBand` only for serialized DB tests.
 - **Fast.** Unit tests < 50ms. Integration tests < 2s.
 - **Named clearly.** Describe behavior: `"should award 15 XP for 25-min focus block"`.
 - **Regression.** Every bug fix includes a test that fails before the fix and passes after.
-- **No network.** Unit tests must not make network calls. Use mocks.
+- **No network.** Unit tests must not make network calls. Use mocks or Testcontainers (pinned by digest).
 - No type escape hatches in tests. No `.skip` without a linked issue.
 - Write tests to `tests/unit/`, `tests/integration/`, `tests/e2e/`, or equivalent.
 - Use test fixtures from `tests/fixtures/` or equivalent.
 - **Browser verification.** For UI changes, verify visually in the browser via browser automation MCP after automated tests pass. Capture screenshots as evidence.
+## Test Pyramid / Honeycomb / Trophy — Pick by Architecture
+Pick exactly one shape and document it in `docs/testing.md` (or equivalent):
+- **Pyramid** (heavy unit, light E2E): monoliths with rich domain logic.
+- **Honeycomb** (heavy integration, light unit + E2E): microservices; ~48% of microservice teams (Spotify model).
+- **Trophy** (unit + integration + E2E in similar ratios, light static): serverless functions; ~42% of serverless teams (Kent C. Dodds shape).
 ## Coverage Thresholds
 - **Statement coverage:** 80% minimum across the project. New code must not decrease overall coverage.
@@ -29,6 +36,10 @@ alwaysApply: false
 - Generate coverage reports in CI and publish as PR comments or artifacts for visibility.
 - Exclude generated code, type declarations, and config files from coverage metrics.
+## Coverage That Matters — Coverage AND Mutation
+Coverage alone is necessary, not sufficient. A PR that raises line coverage but drops mutation score is a regression. Reviewers verify the right test classes per the Per-Feature Mandate Map below; coverage numbers are a floor, not a finish line.
 ## Mocking Strategy
 - **Prefer fakes over mocks** for stateful dependencies (databases, caches). Fakes implement the real interface with in-memory state, making tests more realistic.
@@ -39,59 +50,144 @@ alwaysApply: false
 - **Type-safe mocks.** Mock implementations must satisfy the same TypeScript interface as the real dependency. Avoid `as any` in mock setup.
 - **No mocking the unit under test.** If you need to mock part of the module you are testing, the module has too many responsibilities — refactor first.
-## Property-Based Testing
+## Contract Testing
+Every cross-service interaction is covered by both consumer-side (Pact) and provider-side (Schemathesis against the OpenAPI/AsyncAPI schema) contract tests. `pact-broker can-i-deploy` gates production deploys: if the consumer/provider contract pair is incompatible, the deploy is blocked. See `rules/hatch3r-contract-testing.md` for the full pattern (broker setup, provider state handlers, versioning, breakage triage).
+## Property-Based Testing — Per Ecosystem
+Required for any pure function, parser, serializer, state machine, or invariant-bearing function. Default 100 trials per property; raise to 1000 for security-sensitive code. Shrinking must be enabled.
+- **TypeScript / JavaScript:** `fast-check` (latest 3.x). Use for pure functions, parsers, state machines (`fc.commands`).
+- **Python:** `Hypothesis` 6.151+. Stateful PBT via `RuleBasedStateMachine`.
+- **Rust:** `proptest`. Shrinks to minimal failing case.
+- **Scala:** `ScalaCheck`. Use for case-class invariants.
+- **Java:** `jqwik` (modern) or `junit-quickcheck`.
+- **Go:** `gopter` or stdlib `testing/quick` (limited shrinking).
-- Use a property-based testing library (fast-check or equivalent) for functions with wide input domains.
-- **Priority targets:** parsers, serializers, validators, encoders/decoders, mathematical functions, and any pure function with complex input types.
-- Define invariants as properties: round-trip (encode then decode equals original), idempotency (applying twice equals applying once), monotonicity, commutativity.
-- Use `fc.assert` with at least 100 runs per property. Increase to 1000 for critical paths.
-- When a property test finds a failure, add the minimal counterexample as a dedicated regression unit test.
-- Shrinking must be enabled — it reduces failing inputs to the smallest reproduction case.
-- Property tests belong alongside unit tests in `tests/unit/`. Name them clearly: `"property: round-trip serialization for UserProfile"`.
+Invariants to encode: round-trip (encode then decode equals original), idempotency (applying twice equals once), monotonicity, commutativity. When a property test finds a failure, add the minimal counterexample as a dedicated regression unit test.
-## Mutation Testing
+## Mutation Testing — Per Ecosystem + Thresholds
-- Use Stryker (or equivalent mutation testing framework) on critical modules to measure test effectiveness beyond line coverage.
-- **Mutation score target:** 70% minimum on critical modules (auth, data layer, business rules). 60% minimum project-wide.
-- Run mutation testing in CI on a weekly schedule (not per-PR — too slow). Report results as a CI artifact.
-- **Surviving mutants** indicate tests that pass regardless of code changes — these are false-coverage tests. Fix them by adding assertions that detect the mutation.
-- Focus mutation testing effort on modules where a bug would cause data loss, security vulnerability, or financial impact.
-- Exclude test files, generated code, and UI presentation logic from mutation analysis.
+Run on a nightly schedule (not per-commit) due to runtime cost. Mutation score is a quality gate alongside coverage. Surviving mutants indicate tests that pass regardless of code changes — fix by adding assertions that detect the mutation.
+- **TypeScript / JavaScript:** Stryker. Thresholds: break 50 / low 60 / high 80 (Stryker defaults).
+- **Python:** `mutmut` (88.5% score on reference suite, faster) or `Cosmic Ray` (82.7%, thorough).
+- **Java:** PIT 1.22 (November 2025). Business logic target 80–90.
+- **Go:** `go-mutesting` or `gremlins-dev/gremlins`.
+- **.NET:** `Stryker.NET`.
+**Mutation score target:** 70% minimum on critical modules (auth, data layer, business rules), 60% project-wide, 80%+ on payment/billing logic. Exclude test files, generated code, and UI presentation logic from mutation analysis.
+## Fuzz Testing — Per Ecosystem
+Required for any parser, deserializer, network handler, file-format handler, or untrusted-input boundary. Crash + hang + OOM detection; corpus minimization; persisted crash inputs become regression fixtures.
+- **Java:** jazzer (OSS-Fuzz integrated).
+- **JavaScript:** jazzer.js OSS was discontinued in 2025 — fall back to property-based testing for JS-only paths, or fuzz the underlying native binding.
+- **Python:** `atheris` (Google).
+- **Rust:** `cargo-fuzz` + libFuzzer.
+- **Go:** native `testing.F` (Go 1.18+); for advanced workflows use `gosentry` (Trail of Bits, 2026-05-12 fork of the discontinued jazzer.js workflow adapted for Go).
+- **C / C++:** AFL++ + OSS-Fuzz.
+## Determinism Contract
+Every test must be deterministic. Mandates:
+- **Clock injection.** Production code never calls `new Date()` / `time.Now()` / `datetime.now()` directly; inject a clock interface. In tests: `vi.useFakeTimers()` / `freezegun` / `mock.patch('time.time')`.
+- **Seeded RNG.** Every random call uses an injected seedable RNG. In tests: fixed seed per test.
+- **Pinned timezone and locale.** `TZ=UTC` and `LC_ALL=C.UTF-8` in the CI environment.
+- **Sorted iteration.** Any test asserting on map / dict iteration order sorts first.
+- **OS-assigned ports.** Never bind to a fixed port in tests — bind to `0` to get an OS-assigned port.
+- **Test isolation.** Vitest `isolate: true` (default); Jest `--runInBand` only when needed for serialized DB tests.
 ## Flaky Test Handling
-- **Zero tolerance policy.** A flaky test erodes trust in the entire suite. Fix or quarantine within 48 hours of detection.
-- **Quarantine process:** Move the flaky test to a `tests/quarantine/` directory or tag with `.skip("FLAKY: #issue-number")`. Create a tracking issue immediately.
-- **Retry strategy in CI:** Allow a maximum of 1 automatic retry for the full test suite. Never retry individual tests silently — that masks flakiness.
-- **Root cause investigation:** Common causes are shared mutable state, timing dependencies (real clocks, `setTimeout`), port conflicts, uncontrolled randomness, and external service calls.
-- **Fix patterns:** Replace `setTimeout` with fake timers, replace shared state with per-test setup, replace port binding with dynamic ports, seed random generators deterministically.
-- **Flaky test metrics:** Track flaky test rate over time. Target < 0.5% flaky rate (flaky runs / total runs). Alert when rate exceeds 1%.
-- **Quarantine review:** Review quarantined tests weekly. Tests quarantined for more than 30 days must be either fixed or deleted with justification.
+- **Detection.** CI retries failed tests once; tests failing on retry but passing on rerun are tagged `flake-suspected`.
+- **Quarantine.** Any test that flakes twice in 7 days moves to `tests/quarantine/` (runs but does not block PRs). Issue auto-filed with the `flake` label.
+- **SLA.** 14 days to root-cause and fix; otherwise the test is deleted. Quarantined tests reviewed weekly.
+- **Retry policy.** Allow at most 1 automatic retry for the full test suite. Never silently retry individual tests — that masks flakiness.
+- **Categorization on intake.** Tag each flake by root cause: timing (use fake timers), network (use mocks / Testcontainers), ordering (sort assertions), pollution (test isolation), resource (cleanup).
+- **Fix patterns.** Replace `setTimeout` with fake timers; replace shared state with per-test setup; replace fixed ports with OS-assigned (`0`); seed random generators deterministically.
+- **Metrics.** Track flaky rate over time. Target < 0.5% (flaky runs / total runs). Alert at 1%.
+- **Cost awareness.** Datadog 2026 telemetry reports 6–8 hrs/eng/week lost to flakes when quarantine + SLA is not enforced.
+## E2E Strategy
+- **Playwright is the 2026 default** (95k stars, ~290 ms/action, native sharding via `--shard`). Use for cross-browser, accessibility (`@axe-core/playwright`), and visual regression (`toHaveScreenshot()`).
+- Cypress requires paid Cloud for serious parallelization; WebdriverIO is the niche choice for web + mobile parity.
+- Retry policy: `retries: 2` for transient infra; never `retries: >= 5` (masks bugs).
+## Snapshot Testing
+- **Use sparingly.** 2–4 snapshots per component max. Appropriate for serialized output (JSON API responses, CLI output, rendered HTML structure) where the exact output matters and is stable.
+- **Not appropriate for:** UI component visual appearance (use visual regression tests via `toHaveScreenshot()` or `jest-image-snapshot`), objects with timestamps or random IDs (unstable), large objects (unreadable diffs).
+- **Review discipline.** Snapshot updates (`--update-snapshots`) must be reviewed with the same rigor as code changes. Reviewers verify the new snapshot is intentionally correct, not just "different."
+- **Keep snapshots small.** Files > 100 lines suggest the test is asserting too broadly. Narrow the assertion to the relevant subset.
+- **Inline snapshots** are preferred over external `.snap` files for short outputs (< 20 lines) — keeps the assertion co-located with the test.
+- **Design-system components:** Storybook + Chromatic.
 ## Test Data Management
-- **Factories over fixtures.** Use factory functions (builder pattern) to generate test data with sensible defaults and per-test overrides. Factories produce valid objects by default; tests override only the fields relevant to the scenario.
-- **Builder pattern example:** `buildUser({ role: "admin" })` returns a full valid User with admin role and random but valid defaults for all other fields.
-- **No shared mutable fixtures.** If multiple tests read the same fixture data, each test must get its own copy. Use `structuredClone()` or factory functions.
-- **Realistic data.** Use faker or equivalent for generating realistic names, emails, dates. Avoid magic strings like `"test"`, `"foo"`, `"abc123"`.
-- **Deterministic seeding.** When using random data generators, seed them per test file so failures are reproducible.
-- **Fixture files** (JSON, YAML) are acceptable for large, complex, or externally-sourced test inputs (API response snapshots, configuration samples). Store in `tests/fixtures/`.
-- **Database state:** Integration tests that require database state must set up and tear down within the test using helpers. Never depend on database state from a previous test.
+- **Factories over fixtures.** Use factory functions (factory-bot for Ruby, Fishery for TS, factory-boy for Python) seeded with Faker pinned to a fixed version.
+- **Builder pattern example:** `buildUser({ role: "admin" })` returns a full valid `User` with admin role and valid defaults for all other fields.
+- **No shared mutable fixtures.** If multiple tests read the same fixture data, each test gets its own copy via `structuredClone()` or a factory function.
+- **Realistic data.** Avoid magic strings like `"test"`, `"foo"`, `"abc123"`.
+- **Deterministic seeding.** Seed generators per test file so failures reproduce.
+- **Fixture files** (JSON, YAML) are acceptable for large, complex, or externally-sourced inputs (API response snapshots, configuration samples). Store in `tests/fixtures/`.
+- **Database state:** Integration tests set up and tear down within the test via helpers. Never depend on database state from a previous test. Enforce tenancy isolation via per-test schema or transaction rollback.
+- **Testcontainers** pinned by image digest, not tag.
 ## Error Path Coverage
 Error handling code is often under-tested because developers focus on happy paths. Enforce minimum error coverage:
+- **Every exported function that can fail** must have at least one test exercising the error path. "Can fail" includes functions returning `Result<T, E>`, functions with `throw` statements, async functions calling external services, and functions with input validation.
+- **Error message assertions.** Verify that messages, codes, and structured fields contain the expected values. Do not assert only that "an error was thrown" — verify the error content.
+- **Error propagation.** When a function wraps or transforms errors from a dependency, verify the original error context is preserved (cause chain, stack trace, original error code).
+- **Boundary error tests.** For each architectural boundary (API handler, event handler, background processor), verify that errors are caught, logged, and returned as safe responses without leaking internal details.
-- **Every exported function that can fail** must have at least one test exercising the error path. "Can fail" includes: functions returning `Result<T, E>`, functions with `throw` statements, async functions calling external services, and functions with input validation.
-- **Error message assertions.** Test that error messages, codes, and structured fields contain the expected values. Do not assert only that "an error was thrown" -- verify the error content.
-- **Error propagation.** When a function wraps or transforms errors from a dependency, test that the original error context is preserved (cause chain, stack trace, original error code).
-- **Boundary error tests.** For each architectural boundary (API handler, event handler, background processor), test that errors are caught, logged, and returned as safe responses without leaking internal details.
+## Load Testing in CI
-## Snapshot Testing
+- **k6** (k6 Operator v1.0 for Kubernetes-distributed runs), **Vegeta** (constant-rate, no coordinated omission), **Locust** (Python), **Artillery** (TS).
+- Baseline vs current diff in CI; SLO regression detection on p95, p99, and error-rate thresholds. Block the PR when a tracked SLO regresses.
+## Security Testing in CI
+- **SAST:** Semgrep + CodeQL.
+- **SCA / container / IaC / secrets:** Trivy (one-shot multi-scanner).
+- **DAST:** OWASP ZAP or Nuclei against an ephemeral environment.
+See `rules/hatch3r-container-hardening.md` and `rules/hatch3r-dependency-management.md` for the operational policy around hardening and pinning.
+## AI-Assisted Test Generation
+- **Qodo 2.0** (60.1% F1 on reference benchmark) for TS / JS unit tests + edge cases.
+- **Diffblue Cover** (symbolic, 20× cited productivity uplift on legacy code) for Java.
+These are accelerators, not substitutes for the Per-Feature Mandate Map. Every generated test still goes through review and must map to a required test class for the code under test.
+## Per-Feature Mandate Map
+Reviewers verify each PR satisfies the required test classes for the code class touched. A PR that adds a parser without a fuzz harness, or a payment path without mutation testing, fails review even if coverage is green.
+| Code class | Required test classes |
+|------------|----------------------|
+| Parser / deserializer | unit + property + fuzz |
+| Network handler / RPC entry | integration + contract + fuzz |
+| Payment / billing logic | unit + property + mutation (≥ 80 score) |
+| State machine | unit + property (with `RuleBasedStateMachine` analogue) |
+| Pure function | unit + property |
+| Service / RPC client | unit + contract (consumer side) |
+| Service / RPC server | integration + contract (provider side) + Schemathesis |
+| UI component | unit + visual regression + a11y (via `hatch3r-ui-ux-verify`) |
+| LLM feature | eval (via `hatch3r-ai-feature`) + unit on adapter + integration on fallback chain |
+| Background job | unit + integration with poison-message handling |
+## References
-- **Use sparingly.** Snapshots are appropriate for serialized output (JSON API responses, CLI output, rendered HTML structure) where the exact output matters and is stable.
-- **Not appropriate for:** UI component visual appearance (use visual regression tests), objects with timestamps or random IDs (unstable), large objects (unreadable diffs).
-- **Review discipline.** Snapshot updates (`--update-snapshots`) must be reviewed with the same rigor as code changes. Reviewers must verify the new snapshot is intentionally correct, not just "different."
-- **Keep snapshots small.** Snapshot files > 100 lines suggest the test is asserting too broadly. Narrow the assertion to the relevant subset.
-- **Inline snapshots** (where supported) are preferred over external `.snap` files for short outputs (< 20 lines) because they keep the assertion co-located with the test.
-- **Name snapshot files** to match their test file: `auth.test.ts` → `auth.test.ts.snap`.
+- Stryker (mutation testing): https://stryker-mutator.io/
+- fast-check (property-based testing, TS): https://fast-check.dev/
+- Hypothesis (property-based testing, Python): https://hypothesis.readthedocs.io/
+- proptest (property-based testing, Rust): https://github.com/proptest-rs/proptest
+- Pact (consumer-driven contract testing): https://docs.pact.io/
+- Schemathesis (OpenAPI provider testing): https://schemathesis.readthedocs.io/
+- OWASP Web Security Testing Guide: https://owasp.org/www-project-web-security-testing-guide/

package/rules/hatch3r-ux-states-and-flows.md ADDED Viewed

@@ -0,0 +1,149 @@
+---
+id: hatch3r-ux-states-and-flows
+type: rule
+description: Four-state surface contract (loading/empty/error/partial), user-flow decomposition before implementation, and microcopy + tone discipline for end-user UIs
+scope: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/components/**,**/*.html,**/messages/**,**/locales/**,**/copy/**"
+tags: [ux, ui, frontend]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# UX States and Flows
+## User-Flow Decomposition (Mandatory Before Implementation)
+Every user story produces three explicit flows before any UI code is written. Skipping this step is a regression — the implementer rejects specs lacking flow decomposition.
+1. **Happy Path** — numbered sequence of user actions paired with system responses and decision points covering the success route end-to-end. Includes entry point (route, prior screen, deep link), authentication state assumed, and exit point (resulting state, next screen).
+2. **Alternative Paths** — every documented branch (filters applied, multi-step forms, optional sections, permission tiers, locale variants). Each branch is numbered and references the decision point in the Happy Path where it diverges. Re-converges named.
+3. **Error-Recovery Path** — what the user does when a step fails (network drop, validation failure, expired session, permission denied, server outage, conflict on concurrent edit). Names the recovery action (retry, undo, refresh credential, contact support, drop changes) at every failure point.
+Each flow is a numbered list with three columns: user action, system response, decision/branch. Reference `agents/modes/user-flows.md` for the canonical template. A flow is incomplete if any documented decision point has no specified branch outcome. The implementer treats acceptance criteria as a downstream artifact of the flow set — flows come first, criteria are derived.
+Flow decomposition is also the contract between the feature designer and the reviewer: the reviewer walks every flow during code review and asserts each numbered step maps to a code path. A code path with no flow step is dead code or undocumented surface; a flow step with no code path is a missing implementation.
+## Four-State Surface Contract
+Every async view renders all four states. Missing any state on an async view is a blocker — NN/g 2025 measured 92% of AI-generated dashboards omit empty state and 78% omit error state, so this is the regression bar to beat.
+| State | Trigger | Rendering rules |
+|-------|---------|-----------------|
+| Loading | Request in flight >300ms | Skeleton matches final content dimensions (explicit width/height) to keep CLS <=0.1; spinner only for short modal actions <500ms; loading state announced via `aria-busy="true"` on the container |
+| Empty | Successful response, 0 results | Distinguish three sub-types below: cold start, active filter, no permission. Each has a distinct heading, visual cue, and CTA |
+| Error | Network failure, server error, validation failure | Plain-language cause + retry CTA + fallback or contact path; no "Oops" or "Something went wrong" — corrective verb required; render in `role="alert"` for assistive tech announcement |
+| Partial | Some data succeeded, some failed | Banner naming the failed subset + degraded data rendered for the succeeded subset + retry option for the failed subset; never silently drop the failed subset |
+State machines drive these transitions — model the view as `idle | loading | success | empty | error | partial` and assert every state has a render path in the component. Snapshot every state in tests (Storybook play function or vitest + Testing Library): missing snapshots are blockers.
+Transitions between states have their own rules. Loading -> success swaps the skeleton for content without layout shift (the skeleton was already the final dimensions). Loading -> error replaces the skeleton with the error surface and announces it via `role="alert"`. Loading -> empty replaces the skeleton with the sub-typed empty surface (cold / filter / permission). Success -> partial overlays the banner without re-rendering the succeeded subset. Every transition has a measurable duration and an announcement strategy — never silent.
+## First-Run vs Filter vs Network — Content Structure
+| State type | Heading level | Visual cue | CTA | Example |
+|------------|---------------|------------|-----|---------|
+| Cold start | h2 | Onboarding illustration | "Create your first <noun>" | "No <items> yet — create your first to begin" |
+| Active filter | h3 inside results | Filter icon | "Clear filters" | "No results match these filters" |
+| Network error | role=alert | Error icon | "Retry" + "Contact support" | "Couldn't load <items> — check your connection" |
+| No permission | h3 | Lock icon | "Request access" | "You don't have access to <thing>" |
+Cold start and active filter are not interchangeable — a first-run view directs the user to the onboarding action; an active-filter view offers "Clear filters" because the data exists behind the filter. Rendering "Clear filters" on a cold-start view is a usability defect (the button has no effect). Detect the sub-type from a state predicate (item count vs filter count vs auth scope), not from a single boolean.
+Predicate ordering matters: check permission first (no-permission overrides everything), then filter state (active-filter overrides cold-start), then total count (cold-start when zero items exist anywhere). Inverting the order renders the wrong empty sub-type and the wrong CTA — a frequent source of "the button does nothing" bug reports.
+## Form UX
+- Inline validation timing: validate on `blur`. After the first error on a field appears, re-validate on every keystroke with a 150ms debounce so the error tracks the user's correction in flight.
+- Error recovery: clear the inline error immediately when the field becomes valid (no debounce on clear); re-enable the submit button in real time so the user sees recovery without re-tabbing.
+- Error summary on submit: render at the top of the form with anchor links to each invalid field. Move focus to the summary heading (not to the first error field) so screen-reader users hear the count and list before being dropped into one field. Anchor link target is the input itself; the link text is the field label, not "Error 1".
+- Required attributes per input: `autocomplete` (e.g., `email`, `tel`, `cc-number`, `one-time-code`), `inputmode` (e.g., `numeric`, `decimal`, `email`), `aria-describedby` pointing at the error or help text, and `aria-invalid="true"` when invalid. `enterkeyhint` on multi-step forms maps the soft-keyboard enter key to the next action.
+- Never use a disabled submit button as the only error signal — users assume the page is broken. Allow submission and render validation outcome on submit. A submit attempt with invalid fields renders the summary, sets focus on it, and leaves the submit button operable for retry.
+- Identifier-first auth: separate email and password screens; offer passkeys via WebAuthn Conditional UI (`autocomplete="username webauthn"`) so the OS surfaces the credential inline in autofill rather than behind a separate button. Passkey enrollment copy states the user-visible benefit in one sentence ("Sign in with your fingerprint — no password to remember").
+- Positive confirmation for non-trivial fields (email, password strength, username availability): when valid, render an inline check or short affirmation. Pair with a screen-reader-only `aria-live="polite"` announcement.
+- Multi-step forms render a step indicator with current step, total steps, and per-step labels. Each step is a separate `<form>` with its own validation and submit; never collect all fields on one screen behind tabs. Browser back button maps to previous step without data loss.
+- Autosave drafts where the user expects persistence (long-form composition, settings panels). Surface autosave state with a passive indicator ("Saved 2s ago" or "Saving..."), never with a blocking modal.
+## Error-Recovery Patterns
+- Retry button retries the same operation with the same input. Disable for 1s after click to avoid duplicate submits; re-enable on completion.
+- Fallback offers an alternative path when retry will not help: cached view, contact support, alternate route. Render fallback below retry, not above.
+- Undo for reversible mutations: surface a toast with the undo action and a >=10-second timeout. Persist the undo action across screen navigations until the timeout fires or the user confirms.
+- Conflict on concurrent edit: surface a diff with three options (keep mine, keep theirs, merge); never overwrite silently.
+- Session expiration: re-auth inline (modal or banner with credential field), preserving the user's in-flight data. Forced re-auth that drops form data is a defect.
+## Microcopy and Tone
+Anchored to GOV.UK Design System (error pattern, plain English) and IBM Carbon (voice + tone).
+- Plain language; second person; corrective verb on every error string ("Enter your email address", not "Email is required"). The verb names the action the user takes to recover.
+- No jargon visible to end users: never expose "FIDO2", "WebAuthn", "HTTP 500", "null", "undefined", "401". Translate the implementation detail to the action the user can take. Implementation labels live in logs and error reports, not in UI strings.
+- CTA labels are action-oriented and specific: "Save changes", "Create account", "Delete project" — not "Submit", "OK", "Confirm". Destructive verbs name the object: "Delete project Foo permanently".
+- Error tone explains cause and offers recovery; never blame the user. "We couldn't save your changes — try again." not "You did something wrong." Distinguish system-fault from user-fault wording — for user-fault, name the field; for system-fault, use first-person plural ("we couldn't").
+- ICU MessageFormat for every translatable string. Never concatenate strings to build sentences — singular, plural, and gender variants all live in one message ID with explicit categories. Translator memory stays stable across releases when keys are stable; rename keys is a breaking change.
+- Cross-reference `rules/hatch3r-i18n.md` for translation mechanics and the Microcopy & Tone subsection that defines voice consistency across locales. Reserve 30-40% extra width in layouts for languages like DE and FI.
+- Treat AI as a copy assistant, not author — AI generates tone variants and the human approves final voice strings. Brand voice strings are owned by a named maintainer, not by a model.
+- Forbidden filler in any user-visible string: "Oops", "Whoops", "Something went wrong", "An unexpected error occurred". Lint these via a string-scan check in CI against the locale files. Replacements name the failure and the action — "Couldn't load <items>. Check your connection and retry."
+- Empty-state copy follows the same plain-language rule: name what is absent and the action to fill it. "No drafts yet — start writing." beats "You have no drafts." because the action is in the same sentence.
+- Truncation strategy: never mid-word in single-line truncations; use `text-overflow: ellipsis` with a tooltip or expand affordance for the full string. Multi-line truncation uses `-webkit-line-clamp` with a "Read more" affordance below.
+- Time and date formatting: respect the user's locale via `Intl.DateTimeFormat`; never hard-code "MM/DD/YYYY" or "12:34 PM" in UI strings. Relative time ("3 minutes ago") for recent events; absolute for events >24h old.
+- Numbers and units follow the locale's separator and grouping. Currency renders the user-selected currency, not the merchant's default. Round to the precision the user can act on — financial transactions to two decimals, weights to one.
+## Perceived Performance
+- Skeleton show threshold: 300ms first-content latency. Below 300ms render the final view directly; above 300ms render skeleton first to avoid spinner flash. Implement via a delayed `setTimeout(showSkeleton, 300)` cleared on early response.
+- Skeleton matches final content dimensions (explicit `width`, `height`, or `aspect-ratio`) so the layout does not shift when real content arrives — CLS <=0.1 baseline. Test by measuring CLS on the route under a throttled network profile (Slow 4G or 4x CPU).
+- Optimistic UI for safe mutations (add to list, mark done, toggle like): apply the change in local state immediately and reconcile on server response. On reject, roll back the local state AND surface a non-blocking toast with retry. Use `useOptimistic` in React or framework equivalent.
+- Pessimistic UI for destructive or financial actions (delete, payment, irreversible state change): wait for server confirmation before reflecting the change. Surface a progress indicator on the action itself, not on the page.
+- Stale-while-revalidate for non-critical data: render the cached value immediately, refresh in the background, surface a non-blocking "Refresh available" affordance when the fresh value differs from the cached value. Never overwrite the rendered view silently — the user is mid-task on the stale data.
+- 2026 Core Web Vitals gates at p75 of real-user data: LCP <=2.5s, INP <=200ms, CLS <=0.1. These are gating, not aspirational. Break long tasks (>50ms) with `scheduler.yield()` or `requestIdleCallback` to keep INP within budget.
+- Streaming UI for long-running AI or tool responses: render tokens as they arrive without re-parsing the full buffer per chunk. Show a typing indicator only while no token has arrived; switch to streamed content on first delta. Provide cancel/abort at every streamed surface.
+- Loading copy names the phase when total work is knowable ("Uploading 3 of 7..."), or names the activity when not ("Encoding..."). Generic "Loading..." is the fallback for sub-300ms cases that escape the skeleton threshold.
+## Streaming and Long-Running Operations
+- Streaming text (AI responses, log tails, real-time data) renders incrementally. First-token latency target <=500ms; render the typing indicator only until first token, then switch to streamed content.
+- Tool calls and side-effectful operations render as collapsible structured cards with status (pending, running, done, failed), arguments summary, duration, result preview. Hidden tool calls are a trust regression.
+- Human-in-the-loop approval required for any side-effectful tool (write, send, pay, deploy). Default approval mode is opt-in; auto-accept lives behind an explicit setting with named scope and one-click revocation.
+- Cancel/abort available at every long-running call. Abort triggers cleanup (rollback partial state, close streams, release locks). Show the user the post-abort state — never leave the UI in a "Cancelling..." limbo.
+- Citation rendering for retrieval-grounded claims: inline link to the source span. Spans the model cannot ground from retrieved evidence are flagged visibly, not silently emitted.
+## Mobile and Touch
+- Touch target minimum: 44pt on iOS HIG, 48dp on Material 3. The platform minimum supersedes WCAG SC 2.5.8 (24x24 CSS px) on touch surfaces — use the stricter bound.
+- Spacing between adjacent interactive elements: >=8px to avoid mis-taps. Measure on the rendered DOM, not on the design comp — padding and margins count toward hit area only when they belong to the same interactive element.
+- Avoid tap-to-reveal (hover-only tooltip) on touch surfaces. Use permanent labels or long-press with a visible affordance. Hover-only tooltips fail WCAG SC 1.4.13 Content on Hover or Focus on touch.
+- Apply `env(safe-area-inset-*)` on full-bleed surfaces so primary CTAs do not sit under the iOS home indicator, notch, or Android gesture nav. Native equivalents: `safeAreaInsets` on iOS, `WindowInsets` on Android.
+- Dynamic Type on iOS (`UIFont.preferredFont(forTextStyle:)` or SwiftUI `Font.body`) and rem-based font scaling on Android/web. Never use hard-coded `px` for text — fails text-resize verification at 200% zoom (WCAG SC 1.4.4) and 400% reflow (SC 1.4.10).
+- Drag-only gestures (range slider, kanban reorder, swipe-to-delete) require a non-drag alternative per WCAG SC 2.5.7 — keyboard arrows, click-to-place, explicit "Delete" button.
+- Pointer-event vs touch-event: prefer pointer events (`pointerdown`, `pointermove`, `pointerup`) so the same handler covers mouse, touch, pen. Synthesize click events on tap with a 300ms tolerance only on legacy mobile browsers — modern engines emit click immediately.
+- Pinch-zoom: never disable user zoom on the viewport meta tag. WCAG SC 1.4.4 fails on disabled zoom. Use `maximum-scale=5` to allow up to 5x zoom while suppressing zoom-on-focus jitter.
+- Orientation lock: respect WCAG SC 1.3.4 — content adapts to portrait and landscape unless an essential exception applies (e.g., piano keyboard, blueprint editor). Document the exception in the spec.
+## Heuristic Verification Gate
+Run every check below before declaring a feature done. A "done" claim without these checks is not done.
+- Nielsen 10 heuristics 5-minute walkthrough by a reviewer on the key user flow. Flag any heuristic violation (visibility of status, match to real world, user control, consistency, error prevention, recognition over recall, flexibility, minimalist design, error recovery, help/documentation); resolve before ship.
+- Keyboard-only run through the flow (no mouse, no touch). Every step reachable, every focus state visible (>=2px outline at >=3:1 contrast), no traps, no off-screen focus.
+- Screen-reader smoke test on the key route — one human pass per release using VoiceOver (macOS/iOS) or NVDA (Windows). Document the trace: which landmarks announced, which form labels heard, which dynamic updates registered.
+- First-time-user walkthrough for P0 features: measure task-completion time and observe error recovery. Note where the user paused, backtracked, or asked for help. Three observed users per P0 feature minimum.
+- State-coverage check: snapshot every async view in each of {loading, empty, error, partial, success} via Storybook play or component tests. Missing snapshots block the gate.
+- Microcopy lint: a CI string-scan rejects banned filler ("Oops", "Whoops", "Something went wrong") in any locale file; rejects `disabled` on submit buttons without an accompanying error-state announcement; flags strings without a corrective verb on error keys (`error.*`, `validation.*`).
+- Reduced-motion pass: toggle `prefers-reduced-motion: reduce` on the route and verify all non-essential transitions are removed; essential loaders simplify rather than disable.
+- Reflow pass: zoom the route to 200% and 400% in a 1280x1024 viewport — no horizontal scroll, no clipped content, no overlapping interactive elements.
+## References
+- WCAG 2.2 AA — new success criteria SC 2.5.8 (Target Size), SC 2.4.11 (Focus Not Obscured), SC 2.5.7 (Dragging Movements), SC 1.4.13 (Content on Hover or Focus)
+- NN/g state-omission research 2025 — empty-state and error-state omission rates in AI-generated UIs (92% empty omitted, 78% error omitted)
+- GOV.UK Design System — error message and error summary components, plain English readability guidance
+- IBM Carbon Design System — voice and tone content guidelines, error message patterns
+- Baymard Institute — inline form validation research and disabled-submit findings (top-10 form-abandonment driver)
+- Google Core Web Vitals 2026 thresholds — LCP, INP, CLS at p75 real-user data
+- Nielsen Norman Group — 10 usability heuristics for user interface design
+- Mailchimp Content Style Guide — voice and tone baseline, clarity over cleverness
+- Apple Human Interface Guidelines — Dynamic Type, safe area, touch target sizing
+- Material 3 Design System — touch target sizing (48dp), state layers, ripple feedback
+- W3C WAI-ARIA Authoring Practices — accessible widget patterns, live regions, focus management
+- FIDO Alliance / Passkey Central — WebAuthn Conditional UI design guidelines for passwordless auth
+- Vercel AI Elements — streaming message component patterns for AI chat surfaces
+- Anthropic engineering blog — agent UI patterns, human-in-the-loop approval, tool-call transparency
+- OpenAI Apps SDK UI guidelines — citation rendering, tool-result presentation

package/rules/hatch3r-ux-states-and-flows.mdc ADDED Viewed

@@ -0,0 +1,145 @@
+---
+description: Four-state surface contract (loading/empty/error/partial), user-flow decomposition before implementation, and microcopy + tone discipline for end-user UIs
+globs: ["**/*.vue", "**/*.jsx", "**/*.tsx", "**/*.svelte", "**/components/**", "**/*.html", "**/messages/**", "**/locales/**", "**/copy/**"]
+alwaysApply: false
+---
+# UX States and Flows
+## User-Flow Decomposition (Mandatory Before Implementation)
+Every user story produces three explicit flows before any UI code is written. Skipping this step is a regression — the implementer rejects specs lacking flow decomposition.
+1. **Happy Path** — numbered sequence of user actions paired with system responses and decision points covering the success route end-to-end. Includes entry point (route, prior screen, deep link), authentication state assumed, and exit point (resulting state, next screen).
+2. **Alternative Paths** — every documented branch (filters applied, multi-step forms, optional sections, permission tiers, locale variants). Each branch is numbered and references the decision point in the Happy Path where it diverges. Re-converges named.
+3. **Error-Recovery Path** — what the user does when a step fails (network drop, validation failure, expired session, permission denied, server outage, conflict on concurrent edit). Names the recovery action (retry, undo, refresh credential, contact support, drop changes) at every failure point.
+Each flow is a numbered list with three columns: user action, system response, decision/branch. Reference `agents/modes/user-flows.md` for the canonical template. A flow is incomplete if any documented decision point has no specified branch outcome. The implementer treats acceptance criteria as a downstream artifact of the flow set — flows come first, criteria are derived.
+Flow decomposition is also the contract between the feature designer and the reviewer: the reviewer walks every flow during code review and asserts each numbered step maps to a code path. A code path with no flow step is dead code or undocumented surface; a flow step with no code path is a missing implementation.
+## Four-State Surface Contract
+Every async view renders all four states. Missing any state on an async view is a blocker — NN/g 2025 measured 92% of AI-generated dashboards omit empty state and 78% omit error state, so this is the regression bar to beat.
+| State | Trigger | Rendering rules |
+|-------|---------|-----------------|
+| Loading | Request in flight >300ms | Skeleton matches final content dimensions (explicit width/height) to keep CLS <=0.1; spinner only for short modal actions <500ms; loading state announced via `aria-busy="true"` on the container |
+| Empty | Successful response, 0 results | Distinguish three sub-types below: cold start, active filter, no permission. Each has a distinct heading, visual cue, and CTA |
+| Error | Network failure, server error, validation failure | Plain-language cause + retry CTA + fallback or contact path; no "Oops" or "Something went wrong" — corrective verb required; render in `role="alert"` for assistive tech announcement |
+| Partial | Some data succeeded, some failed | Banner naming the failed subset + degraded data rendered for the succeeded subset + retry option for the failed subset; never silently drop the failed subset |
+State machines drive these transitions — model the view as `idle | loading | success | empty | error | partial` and assert every state has a render path in the component. Snapshot every state in tests (Storybook play function or vitest + Testing Library): missing snapshots are blockers.
+Transitions between states have their own rules. Loading -> success swaps the skeleton for content without layout shift (the skeleton was already the final dimensions). Loading -> error replaces the skeleton with the error surface and announces it via `role="alert"`. Loading -> empty replaces the skeleton with the sub-typed empty surface (cold / filter / permission). Success -> partial overlays the banner without re-rendering the succeeded subset. Every transition has a measurable duration and an announcement strategy — never silent.
+## First-Run vs Filter vs Network — Content Structure
+| State type | Heading level | Visual cue | CTA | Example |
+|------------|---------------|------------|-----|---------|
+| Cold start | h2 | Onboarding illustration | "Create your first <noun>" | "No <items> yet — create your first to begin" |
+| Active filter | h3 inside results | Filter icon | "Clear filters" | "No results match these filters" |
+| Network error | role=alert | Error icon | "Retry" + "Contact support" | "Couldn't load <items> — check your connection" |
+| No permission | h3 | Lock icon | "Request access" | "You don't have access to <thing>" |
+Cold start and active filter are not interchangeable — a first-run view directs the user to the onboarding action; an active-filter view offers "Clear filters" because the data exists behind the filter. Rendering "Clear filters" on a cold-start view is a usability defect (the button has no effect). Detect the sub-type from a state predicate (item count vs filter count vs auth scope), not from a single boolean.
+Predicate ordering matters: check permission first (no-permission overrides everything), then filter state (active-filter overrides cold-start), then total count (cold-start when zero items exist anywhere). Inverting the order renders the wrong empty sub-type and the wrong CTA — a frequent source of "the button does nothing" bug reports.
+## Form UX
+- Inline validation timing: validate on `blur`. After the first error on a field appears, re-validate on every keystroke with a 150ms debounce so the error tracks the user's correction in flight.
+- Error recovery: clear the inline error immediately when the field becomes valid (no debounce on clear); re-enable the submit button in real time so the user sees recovery without re-tabbing.
+- Error summary on submit: render at the top of the form with anchor links to each invalid field. Move focus to the summary heading (not to the first error field) so screen-reader users hear the count and list before being dropped into one field. Anchor link target is the input itself; the link text is the field label, not "Error 1".
+- Required attributes per input: `autocomplete` (e.g., `email`, `tel`, `cc-number`, `one-time-code`), `inputmode` (e.g., `numeric`, `decimal`, `email`), `aria-describedby` pointing at the error or help text, and `aria-invalid="true"` when invalid. `enterkeyhint` on multi-step forms maps the soft-keyboard enter key to the next action.
+- Never use a disabled submit button as the only error signal — users assume the page is broken. Allow submission and render validation outcome on submit. A submit attempt with invalid fields renders the summary, sets focus on it, and leaves the submit button operable for retry.
+- Identifier-first auth: separate email and password screens; offer passkeys via WebAuthn Conditional UI (`autocomplete="username webauthn"`) so the OS surfaces the credential inline in autofill rather than behind a separate button. Passkey enrollment copy states the user-visible benefit in one sentence ("Sign in with your fingerprint — no password to remember").
+- Positive confirmation for non-trivial fields (email, password strength, username availability): when valid, render an inline check or short affirmation. Pair with a screen-reader-only `aria-live="polite"` announcement.
+- Multi-step forms render a step indicator with current step, total steps, and per-step labels. Each step is a separate `<form>` with its own validation and submit; never collect all fields on one screen behind tabs. Browser back button maps to previous step without data loss.
+- Autosave drafts where the user expects persistence (long-form composition, settings panels). Surface autosave state with a passive indicator ("Saved 2s ago" or "Saving..."), never with a blocking modal.
+## Error-Recovery Patterns
+- Retry button retries the same operation with the same input. Disable for 1s after click to avoid duplicate submits; re-enable on completion.
+- Fallback offers an alternative path when retry will not help: cached view, contact support, alternate route. Render fallback below retry, not above.
+- Undo for reversible mutations: surface a toast with the undo action and a >=10-second timeout. Persist the undo action across screen navigations until the timeout fires or the user confirms.
+- Conflict on concurrent edit: surface a diff with three options (keep mine, keep theirs, merge); never overwrite silently.
+- Session expiration: re-auth inline (modal or banner with credential field), preserving the user's in-flight data. Forced re-auth that drops form data is a defect.
+## Microcopy and Tone
+Anchored to GOV.UK Design System (error pattern, plain English) and IBM Carbon (voice + tone).
+- Plain language; second person; corrective verb on every error string ("Enter your email address", not "Email is required"). The verb names the action the user takes to recover.
+- No jargon visible to end users: never expose "FIDO2", "WebAuthn", "HTTP 500", "null", "undefined", "401". Translate the implementation detail to the action the user can take. Implementation labels live in logs and error reports, not in UI strings.
+- CTA labels are action-oriented and specific: "Save changes", "Create account", "Delete project" — not "Submit", "OK", "Confirm". Destructive verbs name the object: "Delete project Foo permanently".
+- Error tone explains cause and offers recovery; never blame the user. "We couldn't save your changes — try again." not "You did something wrong." Distinguish system-fault from user-fault wording — for user-fault, name the field; for system-fault, use first-person plural ("we couldn't").
+- ICU MessageFormat for every translatable string. Never concatenate strings to build sentences — singular, plural, and gender variants all live in one message ID with explicit categories. Translator memory stays stable across releases when keys are stable; rename keys is a breaking change.
+- Cross-reference `rules/hatch3r-i18n.md` for translation mechanics and the Microcopy & Tone subsection that defines voice consistency across locales. Reserve 30-40% extra width in layouts for languages like DE and FI.
+- Treat AI as a copy assistant, not author — AI generates tone variants and the human approves final voice strings. Brand voice strings are owned by a named maintainer, not by a model.
+- Forbidden filler in any user-visible string: "Oops", "Whoops", "Something went wrong", "An unexpected error occurred". Lint these via a string-scan check in CI against the locale files. Replacements name the failure and the action — "Couldn't load <items>. Check your connection and retry."
+- Empty-state copy follows the same plain-language rule: name what is absent and the action to fill it. "No drafts yet — start writing." beats "You have no drafts." because the action is in the same sentence.
+- Truncation strategy: never mid-word in single-line truncations; use `text-overflow: ellipsis` with a tooltip or expand affordance for the full string. Multi-line truncation uses `-webkit-line-clamp` with a "Read more" affordance below.
+- Time and date formatting: respect the user's locale via `Intl.DateTimeFormat`; never hard-code "MM/DD/YYYY" or "12:34 PM" in UI strings. Relative time ("3 minutes ago") for recent events; absolute for events >24h old.
+- Numbers and units follow the locale's separator and grouping. Currency renders the user-selected currency, not the merchant's default. Round to the precision the user can act on — financial transactions to two decimals, weights to one.
+## Perceived Performance
+- Skeleton show threshold: 300ms first-content latency. Below 300ms render the final view directly; above 300ms render skeleton first to avoid spinner flash. Implement via a delayed `setTimeout(showSkeleton, 300)` cleared on early response.
+- Skeleton matches final content dimensions (explicit `width`, `height`, or `aspect-ratio`) so the layout does not shift when real content arrives — CLS <=0.1 baseline. Test by measuring CLS on the route under a throttled network profile (Slow 4G or 4x CPU).
+- Optimistic UI for safe mutations (add to list, mark done, toggle like): apply the change in local state immediately and reconcile on server response. On reject, roll back the local state AND surface a non-blocking toast with retry. Use `useOptimistic` in React or framework equivalent.
+- Pessimistic UI for destructive or financial actions (delete, payment, irreversible state change): wait for server confirmation before reflecting the change. Surface a progress indicator on the action itself, not on the page.
+- Stale-while-revalidate for non-critical data: render the cached value immediately, refresh in the background, surface a non-blocking "Refresh available" affordance when the fresh value differs from the cached value. Never overwrite the rendered view silently — the user is mid-task on the stale data.
+- 2026 Core Web Vitals gates at p75 of real-user data: LCP <=2.5s, INP <=200ms, CLS <=0.1. These are gating, not aspirational. Break long tasks (>50ms) with `scheduler.yield()` or `requestIdleCallback` to keep INP within budget.
+- Streaming UI for long-running AI or tool responses: render tokens as they arrive without re-parsing the full buffer per chunk. Show a typing indicator only while no token has arrived; switch to streamed content on first delta. Provide cancel/abort at every streamed surface.
+- Loading copy names the phase when total work is knowable ("Uploading 3 of 7..."), or names the activity when not ("Encoding..."). Generic "Loading..." is the fallback for sub-300ms cases that escape the skeleton threshold.
+## Streaming and Long-Running Operations
+- Streaming text (AI responses, log tails, real-time data) renders incrementally. First-token latency target <=500ms; render the typing indicator only until first token, then switch to streamed content.
+- Tool calls and side-effectful operations render as collapsible structured cards with status (pending, running, done, failed), arguments summary, duration, result preview. Hidden tool calls are a trust regression.
+- Human-in-the-loop approval required for any side-effectful tool (write, send, pay, deploy). Default approval mode is opt-in; auto-accept lives behind an explicit setting with named scope and one-click revocation.
+- Cancel/abort available at every long-running call. Abort triggers cleanup (rollback partial state, close streams, release locks). Show the user the post-abort state — never leave the UI in a "Cancelling..." limbo.
+- Citation rendering for retrieval-grounded claims: inline link to the source span. Spans the model cannot ground from retrieved evidence are flagged visibly, not silently emitted.
+## Mobile and Touch
+- Touch target minimum: 44pt on iOS HIG, 48dp on Material 3. The platform minimum supersedes WCAG SC 2.5.8 (24x24 CSS px) on touch surfaces — use the stricter bound.
+- Spacing between adjacent interactive elements: >=8px to avoid mis-taps. Measure on the rendered DOM, not on the design comp — padding and margins count toward hit area only when they belong to the same interactive element.
+- Avoid tap-to-reveal (hover-only tooltip) on touch surfaces. Use permanent labels or long-press with a visible affordance. Hover-only tooltips fail WCAG SC 1.4.13 Content on Hover or Focus on touch.
+- Apply `env(safe-area-inset-*)` on full-bleed surfaces so primary CTAs do not sit under the iOS home indicator, notch, or Android gesture nav. Native equivalents: `safeAreaInsets` on iOS, `WindowInsets` on Android.
+- Dynamic Type on iOS (`UIFont.preferredFont(forTextStyle:)` or SwiftUI `Font.body`) and rem-based font scaling on Android/web. Never use hard-coded `px` for text — fails text-resize verification at 200% zoom (WCAG SC 1.4.4) and 400% reflow (SC 1.4.10).
+- Drag-only gestures (range slider, kanban reorder, swipe-to-delete) require a non-drag alternative per WCAG SC 2.5.7 — keyboard arrows, click-to-place, explicit "Delete" button.
+- Pointer-event vs touch-event: prefer pointer events (`pointerdown`, `pointermove`, `pointerup`) so the same handler covers mouse, touch, pen. Synthesize click events on tap with a 300ms tolerance only on legacy mobile browsers — modern engines emit click immediately.
+- Pinch-zoom: never disable user zoom on the viewport meta tag. WCAG SC 1.4.4 fails on disabled zoom. Use `maximum-scale=5` to allow up to 5x zoom while suppressing zoom-on-focus jitter.
+- Orientation lock: respect WCAG SC 1.3.4 — content adapts to portrait and landscape unless an essential exception applies (e.g., piano keyboard, blueprint editor). Document the exception in the spec.
+## Heuristic Verification Gate
+Run every check below before declaring a feature done. A "done" claim without these checks is not done.
+- Nielsen 10 heuristics 5-minute walkthrough by a reviewer on the key user flow. Flag any heuristic violation (visibility of status, match to real world, user control, consistency, error prevention, recognition over recall, flexibility, minimalist design, error recovery, help/documentation); resolve before ship.
+- Keyboard-only run through the flow (no mouse, no touch). Every step reachable, every focus state visible (>=2px outline at >=3:1 contrast), no traps, no off-screen focus.
+- Screen-reader smoke test on the key route — one human pass per release using VoiceOver (macOS/iOS) or NVDA (Windows). Document the trace: which landmarks announced, which form labels heard, which dynamic updates registered.
+- First-time-user walkthrough for P0 features: measure task-completion time and observe error recovery. Note where the user paused, backtracked, or asked for help. Three observed users per P0 feature minimum.
+- State-coverage check: snapshot every async view in each of {loading, empty, error, partial, success} via Storybook play or component tests. Missing snapshots block the gate.
+- Microcopy lint: a CI string-scan rejects banned filler ("Oops", "Whoops", "Something went wrong") in any locale file; rejects `disabled` on submit buttons without an accompanying error-state announcement; flags strings without a corrective verb on error keys (`error.*`, `validation.*`).
+- Reduced-motion pass: toggle `prefers-reduced-motion: reduce` on the route and verify all non-essential transitions are removed; essential loaders simplify rather than disable.
+- Reflow pass: zoom the route to 200% and 400% in a 1280x1024 viewport — no horizontal scroll, no clipped content, no overlapping interactive elements.
+## References
+- WCAG 2.2 AA — new success criteria SC 2.5.8 (Target Size), SC 2.4.11 (Focus Not Obscured), SC 2.5.7 (Dragging Movements), SC 1.4.13 (Content on Hover or Focus)
+- NN/g state-omission research 2025 — empty-state and error-state omission rates in AI-generated UIs (92% empty omitted, 78% error omitted)
+- GOV.UK Design System — error message and error summary components, plain English readability guidance
+- IBM Carbon Design System — voice and tone content guidelines, error message patterns
+- Baymard Institute — inline form validation research and disabled-submit findings (top-10 form-abandonment driver)
+- Google Core Web Vitals 2026 thresholds — LCP, INP, CLS at p75 real-user data
+- Nielsen Norman Group — 10 usability heuristics for user interface design
+- Mailchimp Content Style Guide — voice and tone baseline, clarity over cleverness
+- Apple Human Interface Guidelines — Dynamic Type, safe area, touch target sizing
+- Material 3 Design System — touch target sizing (48dp), state layers, ripple feedback
+- W3C WAI-ARIA Authoring Practices — accessible widget patterns, live regions, focus management
+- FIDO Alliance / Passkey Central — WebAuthn Conditional UI design guidelines for passwordless auth
+- Vercel AI Elements — streaming message component patterns for AI chat surfaces
+- Anthropic engineering blog — agent UI patterns, human-in-the-loop approval, tool-call transparency
+- OpenAI Apps SDK UI guidelines — citation rendering, tool-result presentation

package/skills/hatch3r-a11y-audit/SKILL.md CHANGED Viewed

@@ -12,6 +12,7 @@ cache_friendly: true
 ```
 Task Progress:
+- [ ] Step 0: Detect ambiguity (P8 B1)
 - [ ] Step 1: Read accessibility requirements from rules and specs
 - [ ] Step 2: Automated scan — run axe-core or similar on all pages/components
 - [ ] Step 3: Manual audit — load references/manual-audit-checklist.md
@@ -20,6 +21,19 @@ Task Progress:
 - [ ] Step 6: Verify fixes with re-scan and manual check
 ```
+## Step 0 — Detect Ambiguity (P8 B1)
+Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: WCAG conformance target (A vs AA vs AAA), scope (single component vs full app), surfaces to audit (which routes), fix authority (audit-only vs audit-and-fix), and screen-reader pass scope (per release vs per audit).
+## Fan-out Discipline (P8 B2)
+This skill delegates per task size:
+- Tier 1 (trivial single-file): inline execution acceptable.
+- Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
+- Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
+Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
 ## Progressive Disclosure
 - **Main skill file (this):** Workflow steps, automated scan, fix process, DoD.