npm - hatch3r - Versions diffs - 1.7.1 → 1.8.0 - Mend

hatch3r 1.7.1 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (189)

package/README.md +38 -12
package/agents/hatch3r-a11y-auditor.md +4 -0
package/agents/hatch3r-architect.md +4 -0
package/agents/hatch3r-ci-watcher.md +4 -0
package/agents/hatch3r-context-rules.md +26 -6
package/agents/hatch3r-creator.md +6 -1
package/agents/hatch3r-dependency-auditor.md +4 -0
package/agents/hatch3r-devops.md +4 -0
package/agents/hatch3r-docs-writer.md +4 -0
package/agents/hatch3r-fixer.md +4 -0
package/agents/hatch3r-handoff-loader.md +243 -0
package/agents/hatch3r-handoff-preparer.md +134 -0
package/agents/hatch3r-implementer.md +12 -0
package/agents/hatch3r-learnings-loader.md +5 -1
package/agents/hatch3r-lint-fixer.md +4 -0
package/agents/hatch3r-perf-profiler.md +8 -0
package/agents/hatch3r-researcher.md +4 -0
package/agents/hatch3r-reviewer.md +94 -0
package/agents/hatch3r-security-auditor.md +24 -0
package/agents/hatch3r-test-writer.md +4 -0
package/agents/modes/requirements-elicitation.md +4 -1
package/agents/modes/similar-implementation.md +6 -0
package/agents/modes/user-flows.md +76 -0
package/agents/shared/quality-charter.md +128 -0
package/agents/shared/user-content-templates.md +31 -1
package/commands/hatch3r-agent-customize.md +4 -0
package/commands/hatch3r-api-spec.md +7 -0
package/commands/hatch3r-benchmark.md +7 -0
package/commands/hatch3r-board-fill.md +8 -0
package/commands/hatch3r-board-groom.md +4 -0
package/commands/hatch3r-board-init.md +51 -0
package/commands/hatch3r-board-pickup.md +8 -0
package/commands/hatch3r-board-refresh.md +4 -0
package/commands/hatch3r-board-shared.md +6 -6
package/commands/hatch3r-bug-plan.md +7 -0
package/commands/hatch3r-codebase-map.md +8 -0
package/commands/hatch3r-command-customize.md +4 -0
package/commands/hatch3r-context-health.md +5 -0
package/commands/hatch3r-create.md +59 -4
package/commands/hatch3r-debug.md +7 -0
package/commands/hatch3r-dep-audit.md +4 -0
package/commands/hatch3r-feature-plan.md +7 -0
package/commands/hatch3r-handoff.md +133 -0
package/commands/hatch3r-healthcheck.md +4 -0
package/commands/hatch3r-hooks.md +4 -0
package/commands/hatch3r-learn.md +16 -0
package/commands/hatch3r-migration-plan.md +7 -0
package/commands/hatch3r-onboard.md +7 -0
package/commands/hatch3r-pr-resolve.md +12 -1
package/commands/hatch3r-project-spec.md +8 -0
package/commands/hatch3r-quick-change.md +11 -2
package/commands/hatch3r-recipe.md +4 -0
package/commands/hatch3r-refactor-plan.md +7 -0
package/commands/hatch3r-release.md +5 -0
package/commands/hatch3r-revision.md +7 -0
package/commands/hatch3r-roadmap.md +8 -0
package/commands/hatch3r-rule-customize.md +4 -0
package/commands/hatch3r-security-audit.md +4 -0
package/commands/hatch3r-skill-customize.md +4 -0
package/commands/hatch3r-test-plan.md +7 -0
package/commands/hatch3r-workflow.md +11 -1
package/dist/cli/index.js +4814 -1130
package/dist/cli/index.js.map +1 -1
package/package.json +10 -5
package/rules/hatch3r-accessibility-standards.md +21 -0
package/rules/hatch3r-accessibility-standards.mdc +21 -0
package/rules/hatch3r-agent-orchestration-detail.md +3 -0
package/rules/hatch3r-agent-orchestration-detail.mdc +3 -0
package/rules/hatch3r-agent-orchestration.md +34 -3
package/rules/hatch3r-agent-orchestration.mdc +34 -3
package/rules/hatch3r-ai-evals.md +158 -0
package/rules/hatch3r-ai-evals.mdc +154 -0
package/rules/hatch3r-ai-ux-patterns.md +131 -0
package/rules/hatch3r-ai-ux-patterns.mdc +127 -0
package/rules/hatch3r-api-design.md +67 -9
package/rules/hatch3r-api-design.mdc +67 -9
package/rules/hatch3r-api-versioning.md +119 -0
package/rules/hatch3r-api-versioning.mdc +115 -0
package/rules/hatch3r-auth-patterns.md +170 -0
package/rules/hatch3r-auth-patterns.mdc +166 -0
package/rules/hatch3r-component-conventions.md +30 -0
package/rules/hatch3r-component-conventions.mdc +30 -0
package/rules/hatch3r-container-hardening.md +131 -0
package/rules/hatch3r-container-hardening.mdc +127 -0
package/rules/hatch3r-contract-testing.md +117 -0
package/rules/hatch3r-contract-testing.mdc +113 -0
package/rules/hatch3r-deep-context.md +2 -0
package/rules/hatch3r-deep-context.mdc +2 -0
package/rules/hatch3r-dependency-management.md +73 -1
package/rules/hatch3r-dependency-management.mdc +72 -0
package/rules/hatch3r-design-system-detection.md +142 -0
package/rules/hatch3r-design-system-detection.mdc +138 -0
package/rules/hatch3r-event-schema-evolution.md +90 -0
package/rules/hatch3r-event-schema-evolution.mdc +86 -0
package/rules/hatch3r-handoff-readiness.md +45 -0
package/rules/hatch3r-handoff-readiness.mdc +40 -0
package/rules/hatch3r-i18n.md +13 -0
package/rules/hatch3r-i18n.mdc +13 -0
package/rules/hatch3r-iteration-summary.md +2 -0
package/rules/hatch3r-iteration-summary.mdc +2 -0
package/rules/hatch3r-migrations.md +61 -16
package/rules/hatch3r-migrations.mdc +61 -16
package/rules/hatch3r-observability-logging.md +1 -1
package/rules/hatch3r-observability-logging.mdc +1 -1
package/rules/hatch3r-observability-metrics.md +1 -1
package/rules/hatch3r-observability-metrics.mdc +1 -1
package/rules/hatch3r-observability-tracing-detail.md +8 -149
package/rules/hatch3r-observability-tracing-detail.mdc +7 -149
package/rules/hatch3r-observability-tracing.md +154 -6
package/rules/hatch3r-observability-tracing.mdc +154 -6
package/rules/hatch3r-observability.md +1 -0
package/rules/hatch3r-observability.mdc +1 -0
package/rules/hatch3r-operability.md +149 -0
package/rules/hatch3r-operability.mdc +145 -0
package/rules/hatch3r-passkey-server.md +181 -0
package/rules/hatch3r-passkey-server.mdc +177 -0
package/rules/hatch3r-progressive-delivery.md +120 -0
package/rules/hatch3r-progressive-delivery.mdc +116 -0
package/rules/hatch3r-resilience-patterns.md +154 -0
package/rules/hatch3r-resilience-patterns.mdc +150 -0
package/rules/hatch3r-secrets-management.md +29 -0
package/rules/hatch3r-secrets-management.mdc +29 -0
package/rules/hatch3r-testing.md +139 -43
package/rules/hatch3r-testing.mdc +139 -43
package/rules/hatch3r-ux-states-and-flows.md +149 -0
package/rules/hatch3r-ux-states-and-flows.mdc +145 -0
package/skills/hatch3r-a11y-audit/SKILL.md +14 -0
package/skills/hatch3r-agent-customize/SKILL.md +10 -0
package/skills/hatch3r-ai-feature/SKILL.md +136 -0
package/skills/hatch3r-api-spec/SKILL.md +73 -0
package/skills/hatch3r-architecture-review/SKILL.md +14 -0
package/skills/hatch3r-bug-fix/SKILL.md +5 -0
package/skills/hatch3r-ci-pipeline/SKILL.md +14 -0
package/skills/hatch3r-cli-aichat/SKILL.md +84 -0
package/skills/hatch3r-cli-ast-grep/SKILL.md +85 -0
package/skills/hatch3r-cli-az-devops/SKILL.md +89 -0
package/skills/hatch3r-cli-bat/SKILL.md +85 -0
package/skills/hatch3r-cli-comby/SKILL.md +85 -0
package/skills/hatch3r-cli-csvkit/SKILL.md +84 -0
package/skills/hatch3r-cli-delta/SKILL.md +86 -0
package/skills/hatch3r-cli-difftastic/SKILL.md +84 -0
package/skills/hatch3r-cli-docker/SKILL.md +89 -0
package/skills/hatch3r-cli-duckdb/SKILL.md +84 -0
package/skills/hatch3r-cli-fd/SKILL.md +85 -0
package/skills/hatch3r-cli-fzf/SKILL.md +84 -0
package/skills/hatch3r-cli-gh/SKILL.md +90 -0
package/skills/hatch3r-cli-glab/SKILL.md +89 -0
package/skills/hatch3r-cli-jq/SKILL.md +89 -0
package/skills/hatch3r-cli-lazygit/SKILL.md +78 -0
package/skills/hatch3r-cli-llm/SKILL.md +84 -0
package/skills/hatch3r-cli-miller/SKILL.md +84 -0
package/skills/hatch3r-cli-mods/SKILL.md +84 -0
package/skills/hatch3r-cli-overview/SKILL.md +60 -0
package/skills/hatch3r-cli-playwright/SKILL.md +89 -0
package/skills/hatch3r-cli-podman/SKILL.md +84 -0
package/skills/hatch3r-cli-qsv/SKILL.md +91 -0
package/skills/hatch3r-cli-ripgrep/SKILL.md +85 -0
package/skills/hatch3r-cli-rtk/SKILL.md +91 -0
package/skills/hatch3r-cli-sd/SKILL.md +85 -0
package/skills/hatch3r-cli-stagehand/SKILL.md +111 -0
package/skills/hatch3r-cli-taplo/SKILL.md +84 -0
package/skills/hatch3r-cli-yq/SKILL.md +85 -0
package/skills/hatch3r-cli-zstd/SKILL.md +85 -0
package/skills/hatch3r-command-customize/SKILL.md +10 -0
package/skills/hatch3r-context-health/SKILL.md +14 -0
package/skills/hatch3r-cost-tracking/SKILL.md +14 -0
package/skills/hatch3r-customize/SKILL.md +17 -0
package/skills/hatch3r-dep-audit/SKILL.md +14 -0
package/skills/hatch3r-design-system-detect/SKILL.md +164 -0
package/skills/hatch3r-feature/SKILL.md +2 -0
package/skills/hatch3r-gh-agentic-workflows/SKILL.md +13 -0
package/skills/hatch3r-handoff-prepare/SKILL.md +160 -0
package/skills/hatch3r-handoff-resume/SKILL.md +171 -0
package/skills/hatch3r-incident-response/SKILL.md +14 -0
package/skills/hatch3r-issue-workflow/SKILL.md +5 -0
package/skills/hatch3r-logical-refactor/SKILL.md +14 -0
package/skills/hatch3r-migration/SKILL.md +14 -0
package/skills/hatch3r-observability-verify/SKILL.md +134 -0
package/skills/hatch3r-perf-audit/SKILL.md +14 -0
package/skills/hatch3r-pr-creation/SKILL.md +14 -0
package/skills/hatch3r-qa-validation/SKILL.md +18 -0
package/skills/hatch3r-recipe/SKILL.md +14 -0
package/skills/hatch3r-refactor/SKILL.md +14 -0
package/skills/hatch3r-release/SKILL.md +14 -0
package/skills/hatch3r-reliability-verify/SKILL.md +146 -0
package/skills/hatch3r-rule-customize/SKILL.md +10 -0
package/skills/hatch3r-skill-customize/SKILL.md +10 -0
package/skills/hatch3r-ui-ux-verify/SKILL.md +138 -0
package/skills/hatch3r-visual-refactor/SKILL.md +15 -1

package/rules/hatch3r-resilience-patterns.mdc ADDED

@@ -0,0 +1,150 @@
+---
+description: Resilience patterns in user code — circuit breakers, retry with decorrelated jitter, timeouts with deadline propagation, idempotency keys, bulkheads, hedged requests
+globs: ["**/services/**", "**/handlers/**", "**/clients/**", "**/integrations/**", "**/api/**", "**/middleware/**", "**/circuit*", "**/retry*", "**/resilience*"]
+alwaysApply: false
+---
+# Resilience Patterns
+## Scope
+This rule applies to every outbound call (database, cache, queue, external HTTP, external RPC) and every service entry point in user code. The rule governs user application code; hatch3r's own pipeline implementations (`src/pipeline/circuitBreaker.ts`, `src/pipeline/retryWithBackoff.ts`) are the reference implementations — copy their semantics, including the transient-vs-substantive failure classification.
+## Circuit Breakers — Per Ecosystem
+Use the maintained library for the project's runtime. The Hystrix project entered maintenance in 2018; new services use resilience4j or its peers below.
+- **Node / TypeScript:** `opossum` 9.x or `cockatiel`. `opossum` wraps the operation in a breaker instance with explicit timeout, error threshold, and reset interval.
+- **JVM (Java / Kotlin):** `resilience4j` 2.x — succeeds Hystrix; integrates with Reactor, RxJava 3, Spring Boot 3. Configure per-dependency `CircuitBreakerConfig` and register metrics with Micrometer.
+- **.NET:** `Polly` 8.x — pipeline-style API; chain `AddCircuitBreaker`, `AddRetry`, `AddTimeout` for layered resilience.
+- **Go:** `gobreaker` (sony/gobreaker) — synchronous wrapper around the call; counts consecutive failures and rolls over per `Interval` window.
+- **Python:** `pybreaker` (sync) or `circuitbreaker` (async via decorator); pair with `tenacity` for retry composition.
+Concrete configuration baseline:
+- Failure threshold: 50% over a 10-request rolling window.
+- Cooldown: 30s before a half-open trial request.
+- Half-open: 1 trial request; success closes the breaker, failure reopens for the next 30s.
+- Minimum request volume before threshold evaluation: 20 requests within the window — below that the breaker stays closed regardless of failure rate (avoids false trips on cold traffic).
+- Failure classification distinguishes transient (network blip, 5xx, timeout) from substantive (4xx validation, auth failure). Only transient failures trip the breaker. Reference `src/pipeline/circuitBreaker.ts` for the classification taxonomy.
+Emit a metric on every state transition (closed → open, open → half-open, half-open → closed|open) and alert when a per-dependency breaker stays open for more than 5 minutes — that is a sustained-outage signal, not transient.
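The baseline above can be sketched as a small count-based state machine. This is a minimal illustration with an injectable clock, not the API of `opossum`, `cockatiel`, or hatch3r's `src/pipeline/circuitBreaker.ts`; the names `CircuitBreaker`, `BreakerOptions`, and `onTransition` are assumptions made for the example. The window size and minimum volume are left as parameters, since the rate is only ever evaluated when the window can hold at least the minimum volume.

```typescript
// Illustrative sketch only: names and structure are hypothetical.
type BreakerState = "closed" | "open" | "half-open";

interface BreakerOptions {
  failureRateThreshold: number; // e.g. 0.5 for 50%
  windowSize: number;           // number of most recent calls kept
  minimumVolume: number;        // calls required before the rate is evaluated
  cooldownMs: number;           // time spent open before a half-open trial
}

class CircuitBreaker {
  private state: BreakerState = "closed";
  private outcomes: boolean[] = []; // true = transient failure
  private openedAt = 0;
  private trialInFlight = false;

  constructor(
    private readonly opts: BreakerOptions,
    private readonly now: () => number,
    private readonly onTransition: (from: BreakerState, to: BreakerState) => void = () => {},
  ) {}

  getState(): BreakerState {
    return this.state;
  }

  // Returns true if the caller may issue the request.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.opts.cooldownMs) {
      this.transition("half-open");
    }
    if (this.state === "closed") return true;
    if (this.state === "half-open" && !this.trialInFlight) {
      this.trialInFlight = true; // exactly one trial request
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    if (this.state === "half-open") {
      this.reset("closed");
      return;
    }
    this.push(false);
  }

  // Call only for transient failures; substantive failures never reach the breaker.
  recordTransientFailure(): void {
    if (this.state === "half-open") {
      this.reset("open");
      return;
    }
    this.push(true);
    const n = this.outcomes.length;
    if (this.state !== "closed" || n < this.opts.minimumVolume) return;
    const failures = this.outcomes.filter((failed) => failed).length;
    if (failures / n >= this.opts.failureRateThreshold) this.reset("open");
  }

  private push(failed: boolean): void {
    this.outcomes.push(failed);
    if (this.outcomes.length > this.opts.windowSize) this.outcomes.shift();
  }

  private reset(to: BreakerState): void {
    this.outcomes = [];
    this.trialInFlight = false;
    if (to === "open") this.openedAt = this.now();
    this.transition(to);
  }

  private transition(to: BreakerState): void {
    const from = this.state;
    this.state = to;
    this.onTransition(from, to); // hook for the state-transition metric
  }
}
```

The `onTransition` callback is where a per-dependency metric would be emitted; the "open for more than 5 minutes" alert would then be derived from that metric rather than from the breaker itself.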
+## Retry With Decorrelated Jitter
+The decorrelated-jitter formula from the AWS Architecture Blog post "Exponential Backoff and Jitter" (2015):
+```
+sleep = min(cap, random_between(base, prev_sleep * 3))
+```
+- Base: 100ms.
+- Cap: 30s.
+- Maximum retries: 3 attempts beyond the initial call.
+Never use `min(cap, base * 2 ** attempt)` (pure exponential backoff) without jitter — synchronized clients produce thundering-herd retry storms on recovery. Never use full random jitter (`random_between(0, cap)`) — it discards the prior delay signal and converges slower.
+**Retry budget:** cap retry traffic at 10% of base request traffic per service. Exceeding the budget signals a sustained outage, at which point the circuit breaker should already be open.
+Retry only on transient failures (see classification below). Never retry on substantive failures — they will fail again and waste budget. The library should expose a `shouldRetry(error)` predicate that returns `false` for 4xx (except 408 and 429), validation errors, and authentication failures.
+Respect `Retry-After` headers on 429 and 503 — the server has told the client how long to wait. Ignoring it adds load to a struggling dependency.
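As a rough sketch of the formula above with the stated base, cap, and retry count, the delay sequence could be computed as follows. The function names `nextDelayMs` and `retrySchedule` are hypothetical, and the random source is injectable only so the schedule can be pinned in tests.

```typescript
// Decorrelated jitter: sleep = min(cap, random_between(base, prev_sleep * 3)).
// `random` must return a value in [0, 1), as Math.random does.
function nextDelayMs(
  prevSleepMs: number,
  baseMs: number = 100,
  capMs: number = 30000,
  random: () => number = Math.random,
): number {
  const upper = Math.max(baseMs, prevSleepMs * 3);
  const candidate = baseMs + random() * (upper - baseMs);
  return Math.min(capMs, candidate);
}

// Delays for the retries that follow the initial call (3 by default).
function retrySchedule(maxRetries: number = 3, random: () => number = Math.random): number[] {
  const delays: number[] = [];
  let prev = 100; // seed prev_sleep with the base delay
  for (let i = 0; i < maxRetries; i++) {
    prev = nextDelayMs(prev, 100, 30000, random);
    delays.push(prev);
  }
  return delays;
}
```

Each delay depends on the previous one rather than on the attempt number, which is what distinguishes this scheme from exponential backoff with jitter added on top.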
+## Idempotency-Key on Retries
+Any retried request that is not naturally idempotent (POST, PATCH, non-idempotent RPC) must carry an `Idempotency-Key` header. Server stores key+response for 24h and replays the stored response on duplicate key. Cross-reference `rules/hatch3r-api-versioning.md` (Slice 4) for the wire-format contract.
+GET / HEAD / PUT / DELETE are idempotent by HTTP semantics — no key required, but verify handlers honor the contract (no side effects on GET; PUT replaces full resource state; DELETE on already-deleted resource returns 204 not 404).
+Generate keys as UUIDv4 or ULID at the call site, not at the retry layer — the key must remain stable across all retry attempts of the same logical operation. Reusing a key across logically distinct operations is a correctness bug; the server returns the cached prior response.
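The server-side replay behaviour described above might look roughly like this in-memory sketch. `IdempotencyStore` and `StoredResponse` are hypothetical names, the clock is injected for testability, and a real service would use a shared store (and handle concurrent duplicates) rather than a per-process object.

```typescript
// Illustrative in-memory store; not a production implementation.
interface StoredResponse {
  status: number;
  body: string;
}

interface StoredEntry {
  response: StoredResponse;
  expiresAt: number;
}

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24h retention

class IdempotencyStore {
  private entries: { [key: string]: StoredEntry } = Object.create(null);

  constructor(private readonly now: () => number) {}

  // Runs `handler` once per key; duplicates within the TTL replay the stored response.
  handle(key: string, handler: () => StoredResponse): StoredResponse {
    const hit = this.entries[key];
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.response;
    const response = handler();
    this.entries[key] = { response, expiresAt: this.now() + IDEMPOTENCY_TTL_MS };
    return response;
  }
}
```

On the client side, the key is created once per logical operation and passed unchanged to every retry attempt, so all attempts map to the same stored entry.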
+## Timeouts + Deadline Propagation
+Every external call has an explicit timeout. Nested timeout budgets propagate from the parent deadline:
+- **Go:** `context.WithDeadline` / `context.WithTimeout`.
+- **Web / Node:** chained `AbortSignal` via `AbortSignal.timeout(ms)` and `AbortSignal.any([...])`.
+- **gRPC:** `Deadline` propagated via metadata.
+- **JVM:** `CompletableFuture.orTimeout` or `Resilience4j.TimeLimiter`.
+Downstream timeout = remaining-budget − propagation-overhead (typically 50ms reserved). Default budgets:
+- Service-call timeout: 5s.
+- Database call: 2s.
+- Cache call: 200ms.
+- Health-probe call (cross-reference `rules/hatch3r-operability.md`): 1s.
+A call that exceeds its budget is cancelled, not allowed to complete past the deadline. Caller surfaces the cancellation as a transient failure to the retry layer; if the parent deadline is already exhausted, no retry is attempted (no budget remains).
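The budget arithmetic above can be sketched as a pure function. The names are hypothetical, and clamping the result to the per-call default budget is an assumption of this sketch (the text only states the subtraction); in Node the resulting value would typically feed `AbortSignal.timeout(ms)`.

```typescript
const PROPAGATION_OVERHEAD_MS = 50; // reserved for propagation, per the text above

// Timeout to hand to a downstream call, or 0 when no budget remains.
// Assumption: the call never gets more than its own default budget
// (for example 2000 for a database call), even if the parent has more left.
function downstreamTimeoutMs(
  parentDeadlineMs: number,
  nowMs: number,
  defaultBudgetMs: number,
  overheadMs: number = PROPAGATION_OVERHEAD_MS,
): number {
  const remaining = parentDeadlineMs - nowMs - overheadMs;
  return Math.max(0, Math.min(defaultBudgetMs, remaining));
}

// A retry is only worth attempting while some downstream budget is left.
function hasBudgetForAttempt(parentDeadlineMs: number, nowMs: number): boolean {
  return downstreamTimeoutMs(parentDeadlineMs, nowMs, Number.MAX_VALUE) > 0;
}
```

With a 5s parent deadline, a database call issued immediately gets its full 2s; the same call issued 4s in gets only the 950ms that remain after the overhead is reserved.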
+## Hedged Requests
+For tail-latency-sensitive reads, send a second request after p99 latency elapses; the first response wins, the second is cancelled. Dean and Barroso's "The Tail at Scale" (Communications of the ACM, 2013) reports a BigTable benchmark in which hedging after a 10ms delay cut p99.9 latency from 1,800ms to 74ms (approximately 96%) at approximately 2% extra request traffic.
+- **gRPC:** built-in via service-config `hedgingPolicy` (`maxAttempts`, `hedgingDelay`).
+- **Application-level:** issue duplicate request via `Promise.race` (Node) or equivalent; cancel the loser via `AbortController`.
+Apply only to idempotent reads. Never hedge writes — the duplicate could double-apply. Cap hedge traffic at 5% of total request volume; above that, the dependency is overloaded and hedging compounds the problem.
+## Bulkheads + Load Shedding
+Bound concurrent calls per dependency to prevent one slow downstream from saturating all worker threads:
+- 50 concurrent DB connections per service instance (per pool).
+- 20 concurrent external HTTP calls per remote dependency.
+- 10 concurrent cache calls per dependency (cache is usually fast — a queue depth above 10 means cache is itself a bottleneck).
+- Reject excess with HTTP 503 + `Retry-After: 1` header at the service entry; do not block the request worker waiting for a slot.
+- **JVM:** `resilience4j.Bulkhead` (semaphore or fixed thread pool).
+- **Node / TypeScript:** semaphore via `p-limit` or `async-sema`.
+- **Go:** buffered channel as semaphore.
+Hystrix (`commandPoolSize`) has been in maintenance mode since 2018; use resilience4j on the JVM instead.
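A non-blocking bulkhead with load shedding can be sketched as a counter plus an entry-point guard. `Bulkhead`, `admit`, and `Rejection` are hypothetical names for this example; in a real Node service `p-limit` or `async-sema` would usually provide the semaphore, and `release()` would be called in a `finally` block once the downstream call settles.

```typescript
interface Rejection {
  status: number;
  headers: { [name: string]: string };
}

class Bulkhead {
  private inFlight = 0;

  constructor(private readonly maxConcurrent: number) {}

  // Non-blocking: returns false instead of queueing when the cap is reached.
  tryAcquire(): boolean {
    if (this.inFlight >= this.maxConcurrent) return false;
    this.inFlight++;
    return true;
  }

  release(): void {
    if (this.inFlight > 0) this.inFlight--;
  }
}

// Entry-point guard: null means "admitted", otherwise the response to send.
function admit(bulkhead: Bulkhead): Rejection | null {
  if (bulkhead.tryAcquire()) return null;
  return { status: 503, headers: { "Retry-After": "1" } };
}
```

One instance per dependency (for example `new Bulkhead(20)` for a remote HTTP dependency) keeps a slow downstream from consuming slots that belong to the others.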
+## Failure Classification
+Every call result classifies into one of three buckets:
+- **Transient** — retryable: connection refused, timeout, 502/503/504, ECONNRESET, EAI_AGAIN, gRPC `UNAVAILABLE`/`DEADLINE_EXCEEDED`.
+- **Substantive** — non-retryable: 400/401/403/404/422, validation error, business-rule rejection, gRPC `INVALID_ARGUMENT`/`NOT_FOUND`/`PERMISSION_DENIED`.
+- **Unknown** — treated as transient with reduced retry budget (max 1 attempt). Includes unexpected exceptions and unmapped error codes.
+Only transient failures trip the breaker and get retried. Substantive failures return immediately to the caller. Reference `src/pipeline/circuitBreaker.ts` for the canonical classification function.
+429 Too Many Requests is transient but special — respect `Retry-After`; do not exponential-backoff past it. 408 Request Timeout is transient and retryable. 401 is substantive at the resilience layer (auth refresh is a separate concern handled by the auth client, not the breaker).
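The three buckets might be encoded along these lines. This is a sketch, not the canonical function in `src/pipeline/circuitBreaker.ts`; the function names are hypothetical, and treating every 5xx as transient follows the configuration baseline earlier in this rule, while the bucket list here names only 502/503/504. `ECONNREFUSED` and `ETIMEDOUT` stand in for "connection refused" and "timeout" as Node reports them.

```typescript
type FailureClass = "transient" | "substantive" | "unknown";

const TRANSIENT_ERRNO = ["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"];

function classifyHttpStatus(status: number): FailureClass {
  if (status === 408 || status === 429) return "transient"; // the two retryable 4xx codes
  if (status >= 500 && status <= 599) return "transient";   // assumption: all 5xx
  if (status >= 400 && status <= 499) return "substantive";
  return "unknown";
}

function classifyErrno(code: string): FailureClass {
  return TRANSIENT_ERRNO.indexOf(code) >= 0 ? "transient" : "unknown";
}

// Retries allowed beyond the initial call, per bucket.
function maxRetriesFor(kind: FailureClass): number {
  if (kind === "transient") return 3;
  if (kind === "unknown") return 1; // reduced budget for unmapped errors
  return 0;                         // substantive failures return immediately
}

function shouldRetry(kind: FailureClass, retriesSoFar: number): boolean {
  return retriesSoFar < maxRetriesFor(kind);
}
```

For a 429, the delay would still come from `Retry-After` rather than from the backoff schedule; the classification only decides whether a retry happens at all.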
+## Graceful Degradation / Fallback
+Every circuit-broken call has a fallback path:
+- Cached value (from the cache layer, even if stale-while-revalidate is in effect).
+- Static response (sensible default; e.g. empty list, feature-flagged-off behavior).
+- Degraded feature (skip non-critical enrichment; render the page without the personalization block).
+Never let a non-critical dependency take down the request path. A recommendation service outage must not 500 the product page; the page renders without recommendations. Document the per-feature degradation strategy in `docs/runbooks/<service>.md` so the on-call can verify the fallback actually fires when the upstream breaker is open.
+## Composition Order
+Stack the patterns in the order below — outermost first; reversing the order changes the failure semantics.
+1. **Bulkhead** (outermost) — reject if concurrency cap exceeded; protects the worker pool.
+2. **Circuit breaker** — fail-fast if breaker is open; protects the dependency.
+3. **Retry** — on transient failure; bounded by retry budget.
+4. **Timeout** (innermost, per attempt) — cancel the in-flight call.
+Wrong order example: retry around timeout around breaker would retry on breaker-open exceptions and amplify load against the failing dependency. Right order: retry sits inside breaker, so a tripped breaker short-circuits before any retry attempt fires.
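The difference between the two orderings can be made concrete with a deliberately simplified, synchronous sketch. `BreakerStub`, `withBreaker`, and `withRetry` are hypothetical helpers; the retry here is naive (it retries on any thrown error, with no delay or classification) purely to show what each ordering does when the breaker is open.

```typescript
type Op<T> = () => T;

interface BreakerStub {
  open: boolean;
  checks: number; // how many times the breaker was consulted
}

function withBreaker<T>(breaker: BreakerStub, op: Op<T>): Op<T> {
  return () => {
    breaker.checks++;
    if (breaker.open) throw new Error("breaker open");
    return op();
  };
}

// Naive retry for illustration: initial call plus `maxRetries` further attempts.
function withRetry<T>(maxRetries: number, op: Op<T>): Op<T> {
  return () => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return op();
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  };
}

// Right order: breaker outside, retry inside.
function rightOrder<T>(breaker: BreakerStub, op: Op<T>): Op<T> {
  return withBreaker(breaker, withRetry(3, op));
}

// Wrong order: retry outside, breaker inside.
function wrongOrder<T>(breaker: BreakerStub, op: Op<T>): Op<T> {
  return withRetry(3, withBreaker(breaker, op));
}
```

With an open breaker, `rightOrder` consults the breaker once and fails fast, while `wrongOrder` loops through all four attempts against the open breaker before giving up.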
+## Cross-References
+- `rules/hatch3r-operability.md` — probes, graceful shutdown, kill switches.
+- `rules/hatch3r-observability-metrics.md` — RED metrics emit on retries and circuit-state changes; SLO alerts trigger on retry-budget exhaustion.
+- `rules/hatch3r-api-versioning.md` — `Idempotency-Key` header contract.
+- `rules/hatch3r-progressive-delivery.md` — circuit-broken dependency triggers staged-rollout abort.
+## References
+- opossum (Node) — `github.com/nodeshift/opossum`
+- resilience4j 2.x (JVM) — `resilience4j.readme.io`
+- Polly 8.x (.NET) — `pollydocs.org`
+- gobreaker (Go) — `github.com/sony/gobreaker`
+- AWS Architecture Blog, "Exponential Backoff and Jitter" (2015) — `aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter`
+- Google SRE workbook, "Reducing Tail Latency" chapter — `sre.google/workbook`
+- 2024–2026 outage postmortems: CrowdStrike Jul 2024, AWS us-east-1 Oct 2025, Azure East-US2 Sep 2025, Cloudflare Nov 2025.

package/rules/hatch3r-secrets-management.md CHANGED

@@ -53,6 +53,35 @@ cache_friendly: true
   - **Azure DevOps:** Workload Identity Federation with service connections
   - **GitLab CI:** OIDC ID tokens via `id_tokens` keyword
+## OIDC Trust-Policy Conditions (Cloud Auth)
+Long-lived cloud secrets (`AWS_ACCESS_KEY_ID`, `GOOGLE_APPLICATION_CREDENTIALS` JSON file, Azure SP password) in CI are an anti-pattern in 2026. GitHub OIDC issues a per-run JWT; the cloud provider's trust policy exchanges it for short-lived credentials scoped to the workflow.
+- **Branch + environment conditions are required.** The `sub` claim must be matched to a specific `(repository, ref, environment)` triple. PR workflows assume a read-only role; main-branch workflows assume the production role.
+- **AWS:** IAM trust policy `Condition.StringEquals` on `token.actions.githubusercontent.com:sub` matching `repo:org/repo:ref:refs/heads/main` (or `:environment:production`). Federate via `aws-actions/configure-aws-credentials`.
+- **GCP:** Workload Identity Federation with attribute condition on `assertion.sub` and `assertion.repository`. The pool is restricted to a single GitHub org.
+- **Azure:** Federated credential `subject` field; create separate federated credentials for `pull_request`, `ref:refs/heads/main`, and per-environment.
+- Remove every long-lived cloud secret after migrating; the leak surface drops from "credential lifetime" to "single workflow run".
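To illustrate what an exact-match condition on the `sub` claim amounts to, the mapping from subject to role can be modelled as below. This is only a model of the trust-policy logic, not cloud configuration; the role names and the function `roleForSubject` are hypothetical, and the subject formats follow the `repo:org/repo:...` shapes quoted above.

```typescript
// Hypothetical role names; a real setup maps these to IAM roles or service accounts.
type Role = "production" | "read-only";

// Mirrors an exact-match (StringEquals-style) condition on the `sub` claim.
function roleForSubject(sub: string, repository: string): Role | null {
  const prefix = "repo:" + repository + ":";
  if (sub === prefix + "ref:refs/heads/main") return "production";
  if (sub === prefix + "environment:production") return "production";
  if (sub === prefix + "pull_request") return "read-only";
  return null; // any other repository, branch, or environment is denied
}
```

The point of exact matching is the final branch: a token from a fork, a feature branch, or another repository in the same organisation matches nothing and receives no credentials.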
+## Secret Manager Mandate
+Every production secret lives in a managed vault — AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, HashiCorp Vault, or a SaaS equivalent (Doppler, 1Password Secrets Automation, Infisical). Direct `.env`-file secret storage is permitted only for local development, and the file is never committed.
+- Repository ships `.env.example` only — a template with placeholder values and per-variable descriptions. Actual `.env` is in `.gitignore`.
+- Application code fetches secrets at startup via the vault SDK (AWS `GetSecretValue`, GCP `accessSecretVersion`, Vault `kv/data`). Secrets never bake into container images or build artifacts.
+- Automated rotation is configured per-secret (AWS Lambda rotators, GCP rotation schedules, Vault dynamic credentials). See the Secret Rotation Policies table above for cadence.
+- Vault access scoped per service via IAM / IRSA / Workload Identity / AppRole — no shared "master" reader.
+## Certificate Automation
+TLS certificate expiry is a recurring outage class — manual renewal is forbidden in production. Automate via ACME.
+- **Kubernetes:** `cert-manager` with a `ClusterIssuer` pointing at Let's Encrypt (production) or a private ACME server. Use HTTP-01 or DNS-01 solver per environment.
+- **Reverse proxies:** Caddy auto-issues and renews via ACME by default; nginx pairs with `lego` or `acme.sh`.
+- **Cloud-managed:** AWS ACM, GCP Managed SSL, Azure App Service Managed Certificate — auto-renew built in; pair with deploy-time CertificateNotAfter assertions.
+- Monitoring alerts fire at 30, 14, and 7 days before expiry — paging at 7 days. Track via Prometheus `ssl_expiry_seconds` exporter or cloud-native alarm.
+- Root and intermediate CA certificates pinned by SPKI hash in mTLS clients; rotate on a documented schedule before issuer key rollover.
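The alerting thresholds above reduce to a small lookup, sketched here with hypothetical names (`daysUntilExpiry`, `expiryAlertFor`). In practice the same logic would usually live in alerting rules rather than application code; the sketch only pins down the intended behaviour at each boundary.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const ALERT_THRESHOLDS_DAYS = [7, 14, 30]; // ascending
const PAGE_THRESHOLD_DAYS = 7;

interface ExpiryAlert {
  thresholdDays: number; // tightest threshold that has been crossed
  page: boolean;         // true once the 7-day threshold is reached
}

function daysUntilExpiry(notAfterMs: number, nowMs: number): number {
  return Math.floor((notAfterMs - nowMs) / DAY_MS);
}

// Returns null while the certificate is more than 30 days from expiry.
function expiryAlertFor(daysRemaining: number): ExpiryAlert | null {
  for (let i = 0; i < ALERT_THRESHOLDS_DAYS.length; i++) {
    const threshold = ALERT_THRESHOLDS_DAYS[i];
    if (daysRemaining <= threshold) {
      return { thresholdDays: threshold, page: threshold === PAGE_THRESHOLD_DAYS };
    }
  }
  return null;
}
```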
 ## Application-Level Secret Handling
 - **Never log secrets.** Sanitize log output to redact any field matching known secret patterns (tokens, keys, passwords). Use structured logging with an explicit allowlist of loggable fields.

package/rules/hatch3r-secrets-management.mdc CHANGED

@@ -49,6 +49,35 @@ alwaysApply: false
   - **Azure DevOps:** Workload Identity Federation with service connections
   - **GitLab CI:** OIDC ID tokens via `id_tokens` keyword
+## OIDC Trust-Policy Conditions (Cloud Auth)
+Long-lived cloud secrets (`AWS_ACCESS_KEY_ID`, `GOOGLE_APPLICATION_CREDENTIALS` JSON file, Azure SP password) in CI are an anti-pattern in 2026. GitHub OIDC issues a per-run JWT; the cloud provider's trust policy exchanges it for short-lived credentials scoped to the workflow.
+- **Branch + environment conditions are required.** The `sub` claim must be matched to a specific `(repository, ref, environment)` triple. PR workflows assume a read-only role; main-branch workflows assume the production role.
+- **AWS:** IAM trust policy `Condition.StringEquals` on `token.actions.githubusercontent.com:sub` matching `repo:org/repo:ref:refs/heads/main` (or `:environment:production`). Federate via `aws-actions/configure-aws-credentials`.
+- **GCP:** Workload Identity Federation with attribute condition on `assertion.sub` and `assertion.repository`. The pool is restricted to a single GitHub org.
+- **Azure:** Federated credential `subject` field; create separate federated credentials for `pull_request`, `ref:refs/heads/main`, and per-environment.
+- Remove every long-lived cloud secret after migrating; the leak surface drops from "credential lifetime" to "single workflow run".
+## Secret Manager Mandate
+Every production secret lives in a managed vault — AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, HashiCorp Vault, or a SaaS equivalent (Doppler, 1Password Secrets Automation, Infisical). Direct `.env`-file secret storage is permitted only for local development, and the file is never committed.
+- Repository ships `.env.example` only — a template with placeholder values and per-variable descriptions. Actual `.env` is in `.gitignore`.
+- Application code fetches secrets at startup via the vault SDK (AWS `GetSecretValue`, GCP `accessSecretVersion`, Vault `kv/data`). Secrets never bake into container images or build artifacts.
+- Automated rotation is configured per-secret (AWS Lambda rotators, GCP rotation schedules, Vault dynamic credentials). See the Secret Rotation Policies table above for cadence.
+- Vault access scoped per service via IAM / IRSA / Workload Identity / AppRole — no shared "master" reader.
+## Certificate Automation
+TLS certificate expiry is a recurring outage class — manual renewal is forbidden in production. Automate via ACME.
+- **Kubernetes:** `cert-manager` with a `ClusterIssuer` pointing at Let's Encrypt (production) or a private ACME server. Use HTTP-01 or DNS-01 solver per environment.
+- **Reverse proxies:** Caddy auto-issues and renews via ACME by default; nginx pairs with `lego` or `acme.sh`.
+- **Cloud-managed:** AWS ACM, GCP Managed SSL, Azure App Service Managed Certificate — auto-renew built in; pair with deploy-time CertificateNotAfter assertions.
+- Monitoring alerts fire at 30, 14, and 7 days before expiry — paging at 7 days. Track via Prometheus `ssl_expiry_seconds` exporter or cloud-native alarm.
+- Root and intermediate CA certificates pinned by SPKI hash in mTLS clients; rotate on a documented schedule before issuer key rollover.
 ## Application-Level Secret Handling
 - **Never log secrets.** Sanitize log output to redact any field matching known secret patterns (tokens, keys, passwords). Use structured logging with an explicit allowlist of loggable fields.

package/rules/hatch3r-testing.md CHANGED

@@ -12,17 +12,24 @@ cache_friendly: true
 ## Core Principles
 - Unit tests: project test runner. Integration: test runner + emulators/mocks. E2E: browser automation (Playwright or equivalent).
-- **Deterministic.** Mock time where needed. No wall clock dependency.
-- **Isolated.** Each test sets up and tears down its own state.
+- **Deterministic.** Mock time, seed RNG, pin timezone/locale. See Determinism Contract below.
+- **Isolated.** Each test sets up and tears down its own state. Vitest `isolate: true`; Jest `--runInBand` only for serialized DB tests.
 - **Fast.** Unit tests < 50ms. Integration tests < 2s.
 - **Named clearly.** Describe behavior: `"should award 15 XP for 25-min focus block"`.
 - **Regression.** Every bug fix includes a test that fails before the fix and passes after.
-- **No network.** Unit tests must not make network calls. Use mocks.
+- **No network.** Unit tests must not make network calls. Use mocks or Testcontainers (pinned by digest).
 - No type escape hatches in tests. No `.skip` without a linked issue.
 - Write tests to `tests/unit/`, `tests/integration/`, `tests/e2e/`, or equivalent.
 - Use test fixtures from `tests/fixtures/` or equivalent.
 - **Browser verification.** For UI changes, verify visually in the browser via browser automation MCP after automated tests pass. Capture screenshots as evidence.
+## Test Pyramid / Honeycomb / Trophy — Pick by Architecture
+Pick exactly one shape and document it in `docs/testing.md` (or equivalent):
+- **Pyramid** (heavy unit, light E2E): monoliths with rich domain logic.
+- **Honeycomb** (heavy integration, light unit + E2E): microservices; ~48% of microservice teams (Spotify model).
+- **Trophy** (unit + integration + E2E in similar ratios, light static): serverless functions; ~42% of serverless teams (Kent C. Dodds shape).
 ## Coverage Thresholds
 - **Statement coverage:** 80% minimum across the project. New code must not decrease overall coverage.
@@ -33,6 +40,10 @@ cache_friendly: true
 - Generate coverage reports in CI and publish as PR comments or artifacts for visibility.
 - Exclude generated code, type declarations, and config files from coverage metrics.
+## Coverage That Matters — Coverage AND Mutation
+Coverage alone is necessary, not sufficient. A PR that raises line coverage but drops mutation score is a regression. Reviewers verify the right test classes per the Per-Feature Mandate Map below; coverage numbers are a floor, not a finish line.
 ## Mocking Strategy
 - **Prefer fakes over mocks** for stateful dependencies (databases, caches). Fakes implement the real interface with in-memory state, making tests more realistic.
@@ -43,59 +54,144 @@ cache_friendly: true
 - **Type-safe mocks.** Mock implementations must satisfy the same TypeScript interface as the real dependency. Avoid `as any` in mock setup.
 - **No mocking the unit under test.** If you need to mock part of the module you are testing, the module has too many responsibilities — refactor first.
-## Property-Based Testing
+## Contract Testing
+Every cross-service interaction is covered by both consumer-side (Pact) and provider-side (Schemathesis against the OpenAPI/AsyncAPI schema) contract tests. `pact-broker can-i-deploy` gates production deploys: if the consumer/provider contract pair is incompatible, the deploy is blocked. See `rules/hatch3r-contract-testing.md` for the full pattern (broker setup, provider state handlers, versioning, breakage triage).
+## Property-Based Testing — Per Ecosystem
+Required for any pure function, parser, serializer, state machine, or invariant-bearing function. Default 100 trials per property; raise to 1000 for security-sensitive code. Shrinking must be enabled.
+- **TypeScript / JavaScript:** `fast-check` (latest 3.x). Use for pure functions, parsers, state machines (`fc.commands`).
+- **Python:** `Hypothesis` 6.151+. Stateful PBT via `RuleBasedStateMachine`.
+- **Rust:** `proptest`. Shrinks to minimal failing case.
+- **Scala:** `ScalaCheck`. Use for case-class invariants.
+- **Java:** `jqwik` (modern) or `junit-quickcheck`.
+- **Go:** `gopter` or stdlib `testing/quick` (limited shrinking).
-- Use a property-based testing library (fast-check or equivalent) for functions with wide input domains.
-- **Priority targets:** parsers, serializers, validators, encoders/decoders, mathematical functions, and any pure function with complex input types.
-- Define invariants as properties: round-trip (encode then decode equals original), idempotency (applying twice equals applying once), monotonicity, commutativity.
-- Use `fc.assert` with at least 100 runs per property. Increase to 1000 for critical paths.
-- When a property test finds a failure, add the minimal counterexample as a dedicated regression unit test.
-- Shrinking must be enabled — it reduces failing inputs to the smallest reproduction case.
-- Property tests belong alongside unit tests in `tests/unit/`. Name them clearly: `"property: round-trip serialization for UserProfile"`.
+Invariants to encode: round-trip (encode then decode equals original), idempotency (applying twice equals once), monotonicity, commutativity. When a property test finds a failure, add the minimal counterexample as a dedicated regression unit test.
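As an illustration of the round-trip invariant, here is a minimal hand-rolled sketch. It deliberately does not use `fast-check`: the run-length codec, the `mulberry32` generator, and `findCounterexample` are hypothetical names introduced for this example, and the loop performs no shrinking, which a real property-based library would provide.

```typescript
// Small seedable PRNG so every run generates the same inputs.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical codec under test: run-length encoding of a string.
function encode(input: string): Array<[string, number]> {
  const runs: Array<[string, number]> = [];
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0] === ch) {
      last[1] += 1;
    } else {
      runs.push([ch, 1]);
    }
  }
  return runs;
}

function decode(runs: Array<[string, number]>): string {
  let out = "";
  for (const [ch, count] of runs) {
    for (let i = 0; i < count; i++) out += ch;
  }
  return out;
}

// Property: decode(encode(x)) === x for every generated x.
// Returns the first failing input, or null if all trials pass.
function findCounterexample(trials: number, seed: number): string | null {
  const rand = mulberry32(seed);
  for (let i = 0; i < trials; i++) {
    const length = Math.floor(rand() * 21); // lengths 0..20
    let input = "";
    for (let j = 0; j < length; j++) {
      // Two-letter alphabet makes repeated runs likely.
      input += rand() < 0.5 ? "a" : "b";
    }
    if (decode(encode(input)) !== input) return input;
  }
  return null;
}
```

Calling `findCounterexample(100, 42)` runs the default 100 trials with a fixed seed. If it ever returned a string, that string would be the input to pin as a dedicated regression unit test.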
-## Mutation Testing
+## Mutation Testing — Per Ecosystem + Thresholds
-- Use Stryker (or equivalent mutation testing framework) on critical modules to measure test effectiveness beyond line coverage.
-- **Mutation score target:** 70% minimum on critical modules (auth, data layer, business rules). 60% minimum project-wide.
-- Run mutation testing in CI on a weekly schedule (not per-PR — too slow). Report results as a CI artifact.
-- **Surviving mutants** indicate tests that pass regardless of code changes — these are false-coverage tests. Fix them by adding assertions that detect the mutation.
-- Focus mutation testing effort on modules where a bug would cause data loss, security vulnerability, or financial impact.
-- Exclude test files, generated code, and UI presentation logic from mutation analysis.
+Run on a nightly schedule (not per-commit) due to runtime cost. Mutation score is a quality gate alongside coverage. Surviving mutants indicate tests that pass regardless of code changes — fix by adding assertions that detect the mutation.
+- **TypeScript / JavaScript:** Stryker. Thresholds: break 50 / low 60 / high 80 (Stryker defaults).
+- **Python:** `mutmut` (88.5% score on reference suite, faster) or `Cosmic Ray` (82.7%, thorough).
+- **Java:** PIT 1.22 (November 2025). Business logic target 80–90.
+- **Go:** `go-mutesting` or `gremlins-dev/gremlins`.
+- **.NET:** `Stryker.NET`.
+**Mutation score target:** 70% minimum on critical modules (auth, data layer, business rules), 60% project-wide, 80%+ on payment/billing logic. Exclude test files, generated code, and UI presentation logic from mutation analysis.
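The targets above could be enforced by a small gate in the nightly job. The sketch below is an assumption about how that might look: `ModuleTier`, `MutationReport`, and `passesGate` are hypothetical names, and the score is simplified to killed / (killed + survived), which ignores the timeout and no-coverage states that real tools report.

```typescript
type ModuleTier = "payment" | "critical" | "project";

// Minimum mutation scores in percent, taken from the targets above.
const MIN_SCORE: Record<ModuleTier, number> = {
  payment: 80,
  critical: 70,
  project: 60,
};

interface MutationReport {
  killed: number;
  survived: number;
}

// Simplified score: share of mutants the tests detected, in percent.
function mutationScore(report: MutationReport): number {
  const total = report.killed + report.survived;
  // Assumption: with no mutants generated there is nothing to fail on.
  if (total === 0) return 100;
  // Multiply before dividing to avoid rounding drift such as 0.7 * 100.
  return (report.killed * 100) / total;
}

function passesGate(tier: ModuleTier, report: MutationReport): boolean {
  return mutationScore(report) >= MIN_SCORE[tier];
}
```

For example, a module with 7 killed and 3 surviving mutants scores 70, which passes the `critical` tier but fails the `payment` tier.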
+## Fuzz Testing — Per Ecosystem
+Required for any parser, deserializer, network handler, file-format handler, or untrusted-input boundary. Harnesses must detect crashes, hangs, and OOMs; minimize the corpus; and persist crash inputs as regression fixtures.
+- **Java:** jazzer (OSS-Fuzz integrated).
+- **JavaScript:** jazzer.js OSS was discontinued in 2025 — fall back to property-based testing for JS-only paths, or fuzz the underlying native binding.
+- **Python:** `atheris` (Google).
+- **Rust:** `cargo-fuzz` + libFuzzer.
+- **Go:** native `testing.F` (Go 1.18+); for advanced workflows use `gosentry` (Trail of Bits, 2026-05-12 fork of the discontinued jazzer.js workflow adapted for Go).
+- **C / C++:** AFL++ + OSS-Fuzz.
+## Determinism Contract
+Every test must be deterministic. Mandates:
+- **Clock injection.** Production code never calls `new Date()` / `time.Now()` / `datetime.now()` directly; inject a clock interface. In tests: `vi.useFakeTimers()` / `freezegun` / `mock.patch('time.time')`.
+- **Seeded RNG.** Every random call uses an injected seedable RNG. In tests: fixed seed per test.
+- **Pinned timezone and locale.** `TZ=UTC` and `LC_ALL=C.UTF-8` in the CI environment.
+- **Sorted iteration.** Any test asserting on map / dict iteration order sorts first.
+- **OS-assigned ports.** Never bind to a fixed port in tests — bind to `0` to get an OS-assigned port.
+- **Test isolation.** Vitest `isolate: true` (default); Jest `--runInBand` only when needed for serialized DB tests.
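A minimal sketch of three of the mandates above (clock injection, seeded RNG, sorted iteration), written without any test framework. `Clock`, `fixedClock`, `seededRng`, `isExpired`, and `sortedEntries` are hypothetical names for this example; a real suite would more likely rely on `vi.useFakeTimers()` or an equivalent.

```typescript
// Production code depends on this interface instead of calling new Date().
interface Clock {
  now(): Date;
}

// Real clock for production wiring.
const systemClock: Clock = { now: () => new Date() };

// Test clock frozen at a given instant.
function fixedClock(iso: string): Clock {
  const instant = new Date(iso).getTime();
  return { now: () => new Date(instant) };
}

// Linear congruential generator: same seed, same sequence.
function seededRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Example unit under test: takes the clock as a parameter.
function isExpired(expiresAt: Date, clock: Clock): boolean {
  return clock.now().getTime() >= expiresAt.getTime();
}

// Sort before asserting so the test does not depend on insertion order.
function sortedEntries<V>(map: Map<string, V>): Array<[string, V]> {
  return Array.from(map.entries()).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
}
```

With `fixedClock("2026-01-01T00:00:00Z")`, the result of `isExpired` depends only on its arguments, so the same test passes regardless of when or where CI runs it.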
 ## Flaky Test Handling
-- **Zero tolerance policy.** A flaky test erodes trust in the entire suite. Fix or quarantine within 48 hours of detection.
-- **Quarantine process:** Move the flaky test to a `tests/quarantine/` directory or tag with `.skip("FLAKY: #issue-number")`. Create a tracking issue immediately.
-- **Retry strategy in CI:** Allow a maximum of 1 automatic retry for the full test suite. Never retry individual tests silently — that masks flakiness.
-- **Root cause investigation:** Common causes are shared mutable state, timing dependencies (real clocks, `setTimeout`), port conflicts, uncontrolled randomness, and external service calls.
-- **Fix patterns:** Replace `setTimeout` with fake timers, replace shared state with per-test setup, replace port binding with dynamic ports, seed random generators deterministically.
-- **Flaky test metrics:** Track flaky test rate over time. Target < 0.5% flaky rate (flaky runs / total runs). Alert when rate exceeds 1%.
-- **Quarantine review:** Review quarantined tests weekly. Tests quarantined for more than 30 days must be either fixed or deleted with justification.
+- **Detection.** CI retries failed tests once; a test that fails and then passes on a re-run of the same commit is tagged `flake-suspected`.
+- **Quarantine.** Any test that flakes twice in 7 days moves to `tests/quarantine/` (runs but does not block PRs). Issue auto-filed with the `flake` label.
+- **SLA.** 14 days to root-cause and fix; otherwise the test is deleted. Quarantined tests reviewed weekly.
+- **Retry policy.** Allow at most 1 automatic retry for the full test suite. Never silently retry individual tests — that masks flakiness.
+- **Categorization on intake.** Tag each flake by root cause: timing (use fake timers), network (use mocks / Testcontainers), ordering (sort assertions), pollution (test isolation), resource (cleanup).
+- **Fix patterns.** Replace `setTimeout` with fake timers; replace shared state with per-test setup; replace fixed ports with OS-assigned (`0`); seed random generators deterministically.
+- **Metrics.** Track flaky rate over time. Target < 0.5% (flaky runs / total runs). Alert at 1%.
+- **Cost awareness.** Datadog 2026 telemetry reports 6–8 hours per engineer per week lost to flakes when quarantine and the SLA are not enforced.
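The quarantine trigger and the metric thresholds above are simple enough to sketch as pure functions. The names `shouldQuarantine` and `flakeStatus` are hypothetical, and treating a rate of exactly 0.5% as above target is an assumption drawn from the "< 0.5%" wording.

```typescript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// True when a test has flaked at least twice in the 7 days before `now`.
function shouldQuarantine(flakeTimes: readonly Date[], now: Date): boolean {
  const end = now.getTime();
  const start = end - SEVEN_DAYS_MS;
  const recent = flakeTimes.filter((t) => {
    const ms = t.getTime();
    return ms >= start && ms <= end;
  });
  return recent.length >= 2;
}

type FlakeStatus = "ok" | "above-target" | "alert";

// Flaky rate = flaky runs / total runs, compared in percent.
function flakeStatus(flakyRuns: number, totalRuns: number): FlakeStatus {
  if (totalRuns === 0) return "ok";
  const ratePercent = (flakyRuns * 100) / totalRuns;
  if (ratePercent >= 1) return "alert";
  if (ratePercent >= 0.5) return "above-target";
  return "ok";
}
```

Both functions take `now` and the counts as arguments rather than reading the system clock, so they stay consistent with the Determinism Contract.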
+## E2E Strategy
+- **Playwright is the 2026 default** (95k stars, ~290 ms/action, native sharding via `--shard`). Use for cross-browser, accessibility (`@axe-core/playwright`), and visual regression (`toHaveScreenshot()`).
+- Cypress requires paid Cloud for serious parallelization; WebdriverIO is the niche choice for web + mobile parity.
+- Retry policy: `retries: 2` for transient infra; never set `retries` to 5 or more (masks bugs).
+## Snapshot Testing
+- **Use sparingly.** 2–4 snapshots per component max. Appropriate for serialized output (JSON API responses, CLI output, rendered HTML structure) where the exact output matters and is stable.
+- **Not appropriate for:** UI component visual appearance (use visual regression tests via `toHaveScreenshot()` or `jest-image-snapshot`), objects with timestamps or random IDs (unstable), large objects (unreadable diffs).
+- **Review discipline.** Snapshot updates (`--update-snapshots`) must be reviewed with the same rigor as code changes. Reviewers verify the new snapshot is intentionally correct, not just "different."
+- **Keep snapshots small.** Files > 100 lines suggest the test is asserting too broadly. Narrow the assertion to the relevant subset.
+- **Inline snapshots** are preferred over external `.snap` files for short outputs (< 20 lines) — keeps the assertion co-located with the test.
+- **Design-system components:** Storybook + Chromatic.
 ## Test Data Management
-- **Factories over fixtures.** Use factory functions (builder pattern) to generate test data with sensible defaults and per-test overrides. Factories produce valid objects by default; tests override only the fields relevant to the scenario.
-- **Builder pattern example:** `buildUser({ role: "admin" })` returns a full valid User with admin role and random but valid defaults for all other fields.
-- **No shared mutable fixtures.** If multiple tests read the same fixture data, each test must get its own copy. Use `structuredClone()` or factory functions.
-- **Realistic data.** Use faker or equivalent for generating realistic names, emails, dates. Avoid magic strings like `"test"`, `"foo"`, `"abc123"`.
-- **Deterministic seeding.** When using random data generators, seed them per test file so failures are reproducible.
-- **Fixture files** (JSON, YAML) are acceptable for large, complex, or externally-sourced test inputs (API response snapshots, configuration samples). Store in `tests/fixtures/`.
-- **Database state:** Integration tests that require database state must set up and tear down within the test using helpers. Never depend on database state from a previous test.
+- **Factories over fixtures.** Use factory functions (factory-bot for Ruby, Fishery for TS, factory-boy for Python) seeded with Faker pinned to a fixed version.
+- **Builder pattern example:** `buildUser({ role: "admin" })` returns a full valid `User` with admin role and valid defaults for all other fields.
+- **No shared mutable fixtures.** If multiple tests read the same fixture data, each test gets its own copy via `structuredClone()` or a factory function.
+- **Realistic data.** Avoid magic strings like `"test"`, `"foo"`, `"abc123"`.
+- **Deterministic seeding.** Seed generators per test file so failures reproduce.
+- **Fixture files** (JSON, YAML) are acceptable for large, complex, or externally-sourced inputs (API response snapshots, configuration samples). Store in `tests/fixtures/`.
+- **Database state:** Integration tests set up and tear down within the test via helpers. Never depend on database state from a previous test. Enforce tenancy isolation via per-test schema or transaction rollback.
+- **Testcontainers.** Pin images by digest, not tag.
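The `buildUser` example could be implemented roughly as follows. This is a hand-rolled sketch, not Fishery's API: the `User` shape, the sequence counter, and `resetUserFactory` are assumptions for illustration, and a real factory would more likely draw its defaults from a seeded Faker instance.

```typescript
interface User {
  id: string;
  email: string;
  displayName: string;
  role: "member" | "admin";
}

// Sequence counter gives each object unique but reproducible defaults.
let nextUserNumber = 1;

// Call in per-test setup so ids do not depend on test order.
function resetUserFactory(): void {
  nextUserNumber = 1;
}

// Returns a complete, valid User; tests override only what they care about.
function buildUser(overrides: Partial<User> = {}): User {
  const n = nextUserNumber++;
  const defaults: User = {
    id: `user-${n}`,
    email: `user${n}@example.test`,
    displayName: `User ${n}`,
    role: "member",
  };
  // A fresh object on every call, so no two tests share mutable state.
  return { ...defaults, ...overrides };
}
```

A test for an admin-only path would call `buildUser({ role: "admin" })` and leave every other field at its default, which keeps the scenario's intent visible in the test body.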
 ## Error Path Coverage
 Error handling code is often under-tested because developers focus on happy paths. Enforce minimum error coverage:
+- **Every exported function that can fail** must have at least one test exercising the error path. "Can fail" includes functions returning `Result<T, E>`, functions with `throw` statements, async functions calling external services, and functions with input validation.
+- **Error message assertions.** Verify that messages, codes, and structured fields contain the expected values. Do not assert only that "an error was thrown" — verify the error content.
+- **Error propagation.** When a function wraps or transforms errors from a dependency, verify the original error context is preserved (cause chain, stack trace, original error code).
+- **Boundary error tests.** For each architectural boundary (API handler, event handler, background processor), verify that errors are caught, logged, and returned as safe responses without leaking internal details.
-- **Every exported function that can fail** must have at least one test exercising the error path. "Can fail" includes: functions returning `Result<T, E>`, functions with `throw` statements, async functions calling external services, and functions with input validation.
-- **Error message assertions.** Test that error messages, codes, and structured fields contain the expected values. Do not assert only that "an error was thrown" -- verify the error content.
-- **Error propagation.** When a function wraps or transforms errors from a dependency, test that the original error context is preserved (cause chain, stack trace, original error code).
-- **Boundary error tests.** For each architectural boundary (API handler, event handler, background processor), test that errors are caught, logged, and returned as safe responses without leaking internal details.
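To make the error-content and error-propagation rules concrete, here is a minimal sketch. `ConfigError`, `parsePort`, and `captureError` are hypothetical names, and the `causedBy` field stands in for whatever cause-chain mechanism the codebase actually uses.

```typescript
// Structured error carrying a stable code and the original failure.
class ConfigError extends Error {
  readonly code: string;
  readonly causedBy: unknown;

  constructor(message: string, code: string, causedBy?: unknown) {
    super(message);
    this.name = "ConfigError";
    this.code = code;
    this.causedBy = causedBy;
  }
}

// Example unit with two distinct error paths.
function parsePort(raw: string): number {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    // Wrap the dependency's error but keep it for diagnosis.
    throw new ConfigError("port is not valid JSON", "CONFIG_PARSE", err);
  }
  if (
    typeof value !== "number" ||
    !Number.isInteger(value) ||
    value < 1 ||
    value > 65535
  ) {
    throw new ConfigError(`port out of range: ${raw}`, "CONFIG_RANGE");
  }
  return value;
}

// Test helper: returns the thrown value, or undefined if nothing was thrown.
function captureError(fn: () => unknown): unknown {
  try {
    fn();
  } catch (err) {
    return err;
  }
  return undefined;
}
```

A test would then assert on `code`, `message`, and `causedBy` of the captured error, rather than only checking that something was thrown.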
+## Load Testing in CI
-## Snapshot Testing
+- **k6** (k6 Operator v1.0 for Kubernetes-distributed runs), **Vegeta** (constant-rate, no coordinated omission), **Locust** (Python), **Artillery** (TS).
+- Baseline vs current diff in CI; SLO regression detection on p95, p99, and error-rate thresholds. Block the PR when a tracked SLO regresses.
+## Security Testing in CI
+- **SAST:** Semgrep + CodeQL.
+- **SCA / container / IaC / secrets:** Trivy (one-shot multi-scanner).
+- **DAST:** OWASP ZAP or Nuclei against an ephemeral environment.
+See `rules/hatch3r-container-hardening.md` and `rules/hatch3r-dependency-management.md` for the operational policy around hardening and pinning.
+## AI-Assisted Test Generation
+- **Qodo 2.0** (60.1% F1 on reference benchmark) for TS / JS unit tests + edge cases.
+- **Diffblue Cover** (symbolic, 20× cited productivity uplift on legacy code) for Java.
+These are accelerators, not substitutes for the Per-Feature Mandate Map. Every generated test still goes through review and must map to a required test class for the code under test.
+## Per-Feature Mandate Map
+Reviewers verify each PR satisfies the required test classes for the code class touched. A PR that adds a parser without a fuzz harness, or a payment path without mutation testing, fails review even if coverage is green.
+| Code class | Required test classes |
+|------------|----------------------|
+| Parser / deserializer | unit + property + fuzz |
+| Network handler / RPC entry | integration + contract + fuzz |
+| Payment / billing logic | unit + property + mutation (≥ 80 score) |
+| State machine | unit + property (with `RuleBasedStateMachine` analogue) |
+| Pure function | unit + property |
+| Service / RPC client | unit + contract (consumer side) |
+| Service / RPC server | integration + contract (provider side) + Schemathesis |
+| UI component | unit + visual regression + a11y (via `hatch3r-ui-ux-verify`) |
+| LLM feature | eval (via `hatch3r-ai-feature`) + unit on adapter + integration on fallback chain |
+| Background job | unit + integration with poison-message handling |
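A reviewer bot or PR check could encode part of this table as data. The sketch below covers only four rows and uses hypothetical names throughout (`TestClass`, `MANDATES`, `missingTestClasses`, and the code-class keys); how a PR's code class and present test classes are detected is left out.

```typescript
type TestClass =
  | "unit"
  | "property"
  | "fuzz"
  | "integration"
  | "contract"
  | "mutation";

// Subset of the mandate map above, keyed by a hypothetical code-class id.
const MANDATES: Record<string, readonly TestClass[]> = {
  "parser": ["unit", "property", "fuzz"],
  "network-handler": ["integration", "contract", "fuzz"],
  "payment": ["unit", "property", "mutation"],
  "pure-function": ["unit", "property"],
};

// Returns the required test classes that the PR does not yet include.
function missingTestClasses(
  codeClass: string,
  present: readonly TestClass[]
): TestClass[] {
  const required = MANDATES[codeClass];
  // Assumption: unknown code classes are rejected rather than waved through.
  if (required === undefined) {
    throw new Error(`unknown code class: ${codeClass}`);
  }
  return required.filter((tc) => present.indexOf(tc) === -1);
}
```

For a parser PR that ships only unit and property tests, `missingTestClasses("parser", ["unit", "property"])` reports the missing fuzz harness, which is the case the mandate map calls out as a review failure.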
+## References
-- **Use sparingly.** Snapshots are appropriate for serialized output (JSON API responses, CLI output, rendered HTML structure) where the exact output matters and is stable.
-- **Not appropriate for:** UI component visual appearance (use visual regression tests), objects with timestamps or random IDs (unstable), large objects (unreadable diffs).
-- **Review discipline.** Snapshot updates (`--update-snapshots`) must be reviewed with the same rigor as code changes. Reviewers must verify the new snapshot is intentionally correct, not just "different."
-- **Keep snapshots small.** Snapshot files > 100 lines suggest the test is asserting too broadly. Narrow the assertion to the relevant subset.
-- **Inline snapshots** (where supported) are preferred over external `.snap` files for short outputs (< 20 lines) because they keep the assertion co-located with the test.
-- **Name snapshot files** to match their test file: `auth.test.ts` → `auth.test.ts.snap`.
+- Stryker (mutation testing): https://stryker-mutator.io/
+- fast-check (property-based testing, TS): https://fast-check.dev/
+- Hypothesis (property-based testing, Python): https://hypothesis.readthedocs.io/
+- proptest (property-based testing, Rust): https://github.com/proptest-rs/proptest
+- Pact (consumer-driven contract testing): https://docs.pact.io/
+- Schemathesis (OpenAPI provider testing): https://schemathesis.readthedocs.io/
+- OWASP Web Security Testing Guide: https://owasp.org/www-project-web-security-testing-guide/