hatch3r 1.7.0 → 1.7.5

Files changed (160)
  1. package/README.md +38 -12
  2. package/agents/hatch3r-a11y-auditor.md +4 -0
  3. package/agents/hatch3r-architect.md +5 -1
  4. package/agents/hatch3r-ci-watcher.md +4 -0
  5. package/agents/hatch3r-context-rules.md +4 -0
  6. package/agents/hatch3r-creator.md +4 -0
  7. package/agents/hatch3r-dependency-auditor.md +4 -0
  8. package/agents/hatch3r-devops.md +4 -0
  9. package/agents/hatch3r-docs-writer.md +4 -0
  10. package/agents/hatch3r-fixer.md +5 -1
  11. package/agents/hatch3r-handoff-loader.md +243 -0
  12. package/agents/hatch3r-handoff-preparer.md +134 -0
  13. package/agents/hatch3r-implementer.md +5 -1
  14. package/agents/hatch3r-learnings-loader.md +4 -0
  15. package/agents/hatch3r-lint-fixer.md +4 -0
  16. package/agents/hatch3r-perf-profiler.md +8 -0
  17. package/agents/hatch3r-researcher.md +5 -1
  18. package/agents/hatch3r-reviewer.md +92 -0
  19. package/agents/hatch3r-security-auditor.md +24 -0
  20. package/agents/hatch3r-test-writer.md +4 -0
  21. package/agents/modes/requirements-elicitation.md +5 -1
  22. package/agents/modes/similar-implementation.md +6 -0
  23. package/agents/modes/user-flows.md +76 -0
  24. package/agents/shared/quality-charter.md +129 -0
  25. package/agents/shared/user-question-protocol.md +95 -0
  26. package/commands/board/shared-azure-devops.md +2 -0
  27. package/commands/board/shared-github.md +17 -0
  28. package/commands/board/shared-gitlab.md +4 -0
  29. package/commands/hatch3r-board-fill.md +2 -1
  30. package/commands/hatch3r-board-pickup.md +1 -1
  31. package/commands/hatch3r-board-shared.md +21 -0
  32. package/commands/hatch3r-create.md +2 -0
  33. package/commands/hatch3r-handoff.md +126 -0
  34. package/commands/hatch3r-pr-resolve.md +672 -0
  35. package/commands/hatch3r-quick-change.md +5 -3
  36. package/commands/hatch3r-report.md +167 -0
  37. package/commands/hatch3r-revision.md +1 -1
  38. package/commands/hatch3r-workflow.md +3 -1
  39. package/dist/cli/index.js +3144 -979
  40. package/dist/cli/index.js.map +1 -1
  41. package/package.json +4 -2
  42. package/rules/hatch3r-accessibility-standards.md +21 -0
  43. package/rules/hatch3r-accessibility-standards.mdc +21 -0
  44. package/rules/hatch3r-agent-orchestration.md +32 -1
  45. package/rules/hatch3r-agent-orchestration.mdc +32 -1
  46. package/rules/hatch3r-ai-evals.md +158 -0
  47. package/rules/hatch3r-ai-evals.mdc +154 -0
  48. package/rules/hatch3r-ai-ux-patterns.md +131 -0
  49. package/rules/hatch3r-ai-ux-patterns.mdc +127 -0
  50. package/rules/hatch3r-api-design.md +67 -9
  51. package/rules/hatch3r-api-design.mdc +67 -9
  52. package/rules/hatch3r-api-versioning.md +119 -0
  53. package/rules/hatch3r-api-versioning.mdc +115 -0
  54. package/rules/hatch3r-auth-patterns.md +170 -0
  55. package/rules/hatch3r-auth-patterns.mdc +166 -0
  56. package/rules/hatch3r-component-conventions.md +30 -0
  57. package/rules/hatch3r-component-conventions.mdc +30 -0
  58. package/rules/hatch3r-container-hardening.md +131 -0
  59. package/rules/hatch3r-container-hardening.mdc +127 -0
  60. package/rules/hatch3r-contract-testing.md +117 -0
  61. package/rules/hatch3r-contract-testing.mdc +113 -0
  62. package/rules/hatch3r-deep-context.md +3 -1
  63. package/rules/hatch3r-deep-context.mdc +3 -1
  64. package/rules/hatch3r-dependency-management.md +73 -1
  65. package/rules/hatch3r-dependency-management.mdc +72 -0
  66. package/rules/hatch3r-design-system-detection.md +142 -0
  67. package/rules/hatch3r-design-system-detection.mdc +138 -0
  68. package/rules/hatch3r-event-schema-evolution.md +90 -0
  69. package/rules/hatch3r-event-schema-evolution.mdc +86 -0
  70. package/rules/hatch3r-handoff-readiness.md +45 -0
  71. package/rules/hatch3r-handoff-readiness.mdc +40 -0
  72. package/rules/hatch3r-i18n.md +13 -0
  73. package/rules/hatch3r-i18n.mdc +13 -0
  74. package/rules/hatch3r-iteration-summary.md +2 -0
  75. package/rules/hatch3r-iteration-summary.mdc +2 -0
  76. package/rules/hatch3r-migrations.md +61 -16
  77. package/rules/hatch3r-migrations.mdc +61 -16
  78. package/rules/hatch3r-observability-logging.md +1 -1
  79. package/rules/hatch3r-observability-logging.mdc +1 -1
  80. package/rules/hatch3r-observability-metrics.md +1 -1
  81. package/rules/hatch3r-observability-metrics.mdc +1 -1
  82. package/rules/hatch3r-observability-tracing-detail.md +1 -1
  83. package/rules/hatch3r-observability-tracing-detail.mdc +1 -1
  84. package/rules/hatch3r-observability-tracing.md +1 -1
  85. package/rules/hatch3r-observability-tracing.mdc +1 -1
  86. package/rules/hatch3r-observability.md +1 -0
  87. package/rules/hatch3r-observability.mdc +1 -0
  88. package/rules/hatch3r-operability.md +149 -0
  89. package/rules/hatch3r-operability.mdc +145 -0
  90. package/rules/hatch3r-passkey-server.md +181 -0
  91. package/rules/hatch3r-passkey-server.mdc +177 -0
  92. package/rules/hatch3r-progressive-delivery.md +120 -0
  93. package/rules/hatch3r-progressive-delivery.mdc +116 -0
  94. package/rules/hatch3r-resilience-patterns.md +154 -0
  95. package/rules/hatch3r-resilience-patterns.mdc +150 -0
  96. package/rules/hatch3r-secrets-management.md +29 -0
  97. package/rules/hatch3r-secrets-management.mdc +29 -0
  98. package/rules/hatch3r-testing.md +139 -43
  99. package/rules/hatch3r-testing.mdc +139 -43
  100. package/rules/hatch3r-ux-states-and-flows.md +149 -0
  101. package/rules/hatch3r-ux-states-and-flows.mdc +145 -0
  102. package/skills/hatch3r-a11y-audit/SKILL.md +14 -0
  103. package/skills/hatch3r-ai-feature/SKILL.md +134 -0
  104. package/skills/hatch3r-api-spec/SKILL.md +5 -0
  105. package/skills/hatch3r-architecture-review/SKILL.md +14 -0
  106. package/skills/hatch3r-bug-fix/SKILL.md +5 -0
  107. package/skills/hatch3r-ci-pipeline/SKILL.md +14 -0
  108. package/skills/hatch3r-cli-aichat/SKILL.md +84 -0
  109. package/skills/hatch3r-cli-ast-grep/SKILL.md +85 -0
  110. package/skills/hatch3r-cli-az-devops/SKILL.md +89 -0
  111. package/skills/hatch3r-cli-bat/SKILL.md +85 -0
  112. package/skills/hatch3r-cli-comby/SKILL.md +85 -0
  113. package/skills/hatch3r-cli-csvkit/SKILL.md +84 -0
  114. package/skills/hatch3r-cli-delta/SKILL.md +86 -0
  115. package/skills/hatch3r-cli-difftastic/SKILL.md +84 -0
  116. package/skills/hatch3r-cli-docker/SKILL.md +89 -0
  117. package/skills/hatch3r-cli-duckdb/SKILL.md +84 -0
  118. package/skills/hatch3r-cli-fd/SKILL.md +85 -0
  119. package/skills/hatch3r-cli-fzf/SKILL.md +84 -0
  120. package/skills/hatch3r-cli-gh/SKILL.md +90 -0
  121. package/skills/hatch3r-cli-glab/SKILL.md +89 -0
  122. package/skills/hatch3r-cli-jq/SKILL.md +85 -0
  123. package/skills/hatch3r-cli-lazygit/SKILL.md +78 -0
  124. package/skills/hatch3r-cli-llm/SKILL.md +84 -0
  125. package/skills/hatch3r-cli-miller/SKILL.md +84 -0
  126. package/skills/hatch3r-cli-mods/SKILL.md +84 -0
  127. package/skills/hatch3r-cli-overview/SKILL.md +60 -0
  128. package/skills/hatch3r-cli-playwright/SKILL.md +89 -0
  129. package/skills/hatch3r-cli-podman/SKILL.md +84 -0
  130. package/skills/hatch3r-cli-ripgrep/SKILL.md +85 -0
  131. package/skills/hatch3r-cli-rtk/SKILL.md +91 -0
  132. package/skills/hatch3r-cli-sd/SKILL.md +85 -0
  133. package/skills/hatch3r-cli-stagehand/SKILL.md +79 -0
  134. package/skills/hatch3r-cli-taplo/SKILL.md +84 -0
  135. package/skills/hatch3r-cli-xsv/SKILL.md +89 -0
  136. package/skills/hatch3r-cli-yq/SKILL.md +85 -0
  137. package/skills/hatch3r-cli-zstd/SKILL.md +85 -0
  138. package/skills/hatch3r-context-health/SKILL.md +14 -0
  139. package/skills/hatch3r-cost-tracking/SKILL.md +14 -0
  140. package/skills/hatch3r-customize/SKILL.md +14 -0
  141. package/skills/hatch3r-dep-audit/SKILL.md +14 -0
  142. package/skills/hatch3r-design-system-detect/SKILL.md +162 -0
  143. package/skills/hatch3r-feature/SKILL.md +2 -0
  144. package/skills/hatch3r-gh-agentic-workflows/SKILL.md +13 -0
  145. package/skills/hatch3r-handoff-prepare/SKILL.md +160 -0
  146. package/skills/hatch3r-handoff-resume/SKILL.md +171 -0
  147. package/skills/hatch3r-incident-response/SKILL.md +14 -0
  148. package/skills/hatch3r-issue-workflow/SKILL.md +5 -0
  149. package/skills/hatch3r-logical-refactor/SKILL.md +14 -0
  150. package/skills/hatch3r-migration/SKILL.md +14 -0
  151. package/skills/hatch3r-observability-verify/SKILL.md +133 -0
  152. package/skills/hatch3r-perf-audit/SKILL.md +14 -0
  153. package/skills/hatch3r-pr-creation/SKILL.md +14 -0
  154. package/skills/hatch3r-qa-validation/SKILL.md +18 -0
  155. package/skills/hatch3r-recipe/SKILL.md +14 -0
  156. package/skills/hatch3r-refactor/SKILL.md +14 -0
  157. package/skills/hatch3r-release/SKILL.md +14 -0
  158. package/skills/hatch3r-reliability-verify/SKILL.md +144 -0
  159. package/skills/hatch3r-ui-ux-verify/SKILL.md +136 -0
  160. package/skills/hatch3r-visual-refactor/SKILL.md +15 -1
@@ -0,0 +1,133 @@
1
+ ---
2
+ id: hatch3r-observability-verify
3
+ type: skill
4
+ description: Verification gate before declaring an agent-produced service done — OTel span coverage on request path, structured-log + trace-id correlation, SLO definition, error-tracking integration, GenAI semconv on AI features
5
+ tags: [review, performance, devops]
6
+ quality_charter: agents/shared/quality-charter.md
7
+ ---
8
+ # Observability Verification Gate
9
+
10
+ ## Quick Start
11
+
12
+ This skill defines what "done" means for any feature shipping a service. Run before declaring a feature complete. The 9 gates below mix automated checks (machine-checkable on every PR) with one release-cadence gate (SLO + burn-rate alert review per release). Skipping any gate = the feature is not done. Reviewer approval and passing unit tests alone do not satisfy this bar.
13
+
14
+ ## Step 0 — Detect Ambiguity (P8 B1)
15
+
16
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: service scope (which routes), trace vendor (OTel collector vs vendor SDK), sample rates (head vs tail), SLO target values, and Gate 7 applicability (LLM-in-path vs pure service).
17
+
18
+ ## Fan-out Discipline (P8 B2)
19
+
20
+ This skill delegates per task size:
21
+ - Tier 1 (trivial single-file): inline execution acceptable.
22
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
23
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
24
+
25
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
26
+
27
+ ## Gate 1: OTel span on request path
28
+
29
+ - Every HTTP server entry point, every RPC handler, and every queue consumer emits a root span. Every outbound DB / cache / queue / external HTTP call is wrapped in a child span.
30
+ - Discovery: enumerate route declarations via `grep -E 'app\.(get|post|put|patch|delete)|router\.|@Get|@Post|fastify\.route' src/` and outbound calls via `grep -E 'fetch\(|axios|prisma|redis|pg\.query'`. Each match must have a tracer call on the same path: `grep -E 'tracer|startSpan|@WithSpan'` against the file.
31
+ - Auto-instrumentation packages (`@opentelemetry/auto-instrumentations-node`, `opentelemetry-instrumentation` Python) satisfy the spec when loaded before app imports — verify via process arg `--require @opentelemetry/auto-instrumentations-node/register` or equivalent loader.
32
+ - Pass criteria: >=1 root span per route + >=1 child span per outbound call. 0 routes without instrumentation. Coverage threshold: >=95% of declared routes emit at least one root span under fixture traffic.
33
+ - HTTP semconv attributes on every server span: `http.request.method`, `http.route`, `http.response.status_code`, `url.scheme`. DB spans carry `db.system` + `db.operation.name`. Span status `ERROR` set on every 5xx + every caught exception. Sources: `rules/hatch3r-observability-tracing.md`, OpenTelemetry semconv v1.29.
34
+
35
+ ## Gate 2: Structured logs with trace_id injection
36
+
37
+ - Every log line emitted from request scope is JSON (pino / winston / zap / loguru / `slog`). No `console.log` for application logs in production code paths.
38
+ - Every request-scoped logger carries `trace_id` and `span_id` from the active OTel context. Verify via Playwright or vitest fixture that emits a request and asserts both fields appear on the captured log line.
39
+ - Hook the logger to the active span: `@opentelemetry/instrumentation-pino` for Node, `LoggingInstrumentor` for Python — auto-injects trace_id + span_id. Manual injection acceptable when auto-instrumentation is unavailable for the logger.
40
+ - W3C Trace Context (`traceparent` + `tracestate` headers) propagated on every outbound HTTP call. Test: send a request, inspect the outbound call recorded by `nock` / `msw` / a recording proxy, assert the header is present and parses as a valid traceparent string `00-{32hex}-{16hex}-{2hex}`.
41
+ - Pass criteria: 0 unstructured app-log statements + 100% of request-scoped log lines carry `trace_id` + traceparent propagated on every outbound call. Sources: `rules/hatch3r-observability-logging.md`, W3C Trace Context Level 1 (W3C Recommendation 2020-02).
42
+
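The traceparent assertion in Gate 2 could be sketched as follows, assuming Python and only the standard library. `is_valid_traceparent` is a hypothetical helper name for illustration, not part of hatch3r; the all-zero rejections follow the W3C Trace Context rule that an all-zero trace-id or parent-id is invalid.

```python
import re

# Version 00 traceparent: 00-{32hex}-{16hex}-{2hex}, lowercase hex only.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def is_valid_traceparent(header: str) -> bool:
    """Return True when `header` is a well-formed version-00 traceparent value."""
    match = _TRACEPARENT_RE.fullmatch(header)
    if match is None:
        return False
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace-id and parent-id are explicitly invalid in the spec.
    return trace_id != "0" * 32 and parent_id != "0" * 16
```

A fixture test would capture the outbound header from the recording proxy and pass it to this check.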
43
+ ## Gate 3: Severity and message standards
44
+
45
+ - OTel `SeverityNumber` mapping documented in the logger initialization. Replace ad-hoc level strings with the OTel-aligned set: `TRACE / DEBUG / INFO / WARN / ERROR / FATAL` mapped to SeverityNumber 1 / 5 / 9 / 13 / 17 / 21.
46
+ - Log messages follow the verb-first structure: action + object + outcome. Example: `"created order" {order_id, amount}`. Never embed dynamic values into the message string — pass them as fields.
47
+ - PII / secret redaction enabled via a centralized redactor — pino redact paths, winston format redactor, or a structured-log middleware. Audit: grep for password / authorization / token / email fields in log payloads; 0 unredacted hits.
48
+ - Required envelope fields on every log entry: `service.name`, `service.version`, `deployment.environment`, `trace_id`, `span_id`, `severity_number`, `timestamp` (RFC 3339 with millisecond precision).
49
+ - No `console.log` for app logs. Enforced via eslint rule `no-console` with `error` severity in production code paths; test code is exempt via override. Sources: `rules/hatch3r-observability-logging.md`, OpenTelemetry Logs Data Model.
50
+
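As an illustration of the SeverityNumber mapping in Gate 3, here is a minimal sketch for a Python service using the standard `logging` module. The numeric values come from the gate text; the mapping from Python levels to OTel names (in particular `CRITICAL` to `FATAL`) and the function name `severity_number` are assumptions made for this example.

```python
import logging

# OTel-aligned level names and their SeverityNumber values, as listed in Gate 3.
OTEL_SEVERITY_NUMBER = {
    "TRACE": 1,
    "DEBUG": 5,
    "INFO": 9,
    "WARN": 13,
    "ERROR": 17,
    "FATAL": 21,
}

# Assumed mapping: stdlib logging has no TRACE level, and CRITICAL is treated as FATAL.
_PY_LEVEL_TO_OTEL = {
    logging.DEBUG: "DEBUG",
    logging.INFO: "INFO",
    logging.WARNING: "WARN",
    logging.ERROR: "ERROR",
    logging.CRITICAL: "FATAL",
}


def severity_number(py_level: int) -> int:
    """Translate a stdlib logging level into an OTel SeverityNumber."""
    return OTEL_SEVERITY_NUMBER[_PY_LEVEL_TO_OTEL[py_level]]
```

Raising `KeyError` on an unknown level is deliberate in this sketch: an ad-hoc level should fail loudly rather than be logged with a guessed severity.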
51
+ ## Gate 4: RED + USE metrics
52
+
53
+ - Services emit RED metrics: a Rate counter, an Error counter, and a Duration histogram, each labeled `route`, `method`, `status`. Histogram buckets follow the rule default `[5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]` ms.
54
+ - Resources emit USE metrics: Utilization gauge, Saturation gauge, Errors counter on the resource pool — DB connection pool, worker pool, queue depth, file descriptor count, in-memory cache fill ratio.
55
+ - Naming follows `{service}.{domain}.{metric}_{unit}` in snake_case. Counter names end in `_total`; histogram names end in the unit (`_ms`, `_bytes`).
56
+ - Cardinality budget per metric documented in a comment next to the instrument declaration. Cap label cardinality at the value defined in `rules/hatch3r-observability-metrics.md` (<100 unique values per label). Never use raw `user_id` or unbucketed `path` as a label.
57
+ - Exemplars attached to histogram observations when running with OTel Collector — link the metric data point to the corresponding trace_id for click-through from Grafana to the trace view.
58
+ - Pass criteria: RED triplet present per route + USE triplet present per pooled resource + cardinality cap declared + exemplars wired. Sources: Brendan Gregg USE method, Tom Wilkie RED method, `rules/hatch3r-observability-metrics.md`.
59
+
60
+ ## Gate 5: SLO defined
61
+
62
+ - Service declares at least one SLO covering availability, latency p95 or p99, and correctness where applicable. SLO target + measurement window + error-budget formula committed in `slo.yaml` or `service.yaml` at the service root.
63
+ - SLI definition uses the user-facing event ratio: `good_events / valid_events`. Source the numerator and denominator from the same signal (load-balancer logs OR application metrics, never mixed).
64
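+ - Burn-rate alerts follow the Google SRE workbook multi-window multi-burn-rate (MWMBR) pattern: a fast-burn alert at 2% of budget consumed in 1 hour and a slow-burn alert at 5% consumed in 6 hours; either alert pages on its own. Each alert pairs its long window with a shorter confirmation window (5 minutes for the 1-hour alert, 30 minutes for the 6-hour alert in the workbook), and both windows of a pair must confirm before paging. Window pair selected per the workbook table to keep detection time < 1 hour for full-budget-exhaustion incidents.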
65
+ - Error budget tracked on a rolling 30-day window. Burn-rate threshold = (budget_consumed_ratio / window_fraction).
66
+ - Pass criteria: SLO target documented + burn-rate alert config committed + runbook link present + error-budget tracker dashboard exists. Sources: Google SRE Workbook ch. 5 (Alerting on SLOs), `rules/hatch3r-observability-metrics.md`.
67
+
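The burn-rate formula in Gate 5 can be worked through with the two thresholds it names. This is a small sketch in Python, assuming the rolling 30-day window stated in the gate; `burn_rate` is a hypothetical helper name.

```python
def burn_rate(
    budget_consumed_ratio: float,
    window_hours: float,
    slo_period_hours: float = 30 * 24,
) -> float:
    """Burn rate = budget_consumed_ratio / window_fraction."""
    window_fraction = window_hours / slo_period_hours
    return budget_consumed_ratio / window_fraction


# Fast burn: 2% of the budget in 1 hour of a 720-hour period, about 14.4.
fast = burn_rate(0.02, window_hours=1)
# Slow burn: 5% of the budget in 6 hours of a 720-hour period, about 6.
slow = burn_rate(0.05, window_hours=6)
```

These are the alert thresholds to compare against the observed error ratio divided by the allowed error ratio (1 minus the SLO target).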
68
+ ## Gate 6: Error tracker integration
69
+
70
+ - Sentry / Honeycomb / Datadog / Bugsnag SDK initialized at process entry before any application code runs. Release version + commit SHA tagged via `release: process.env.GIT_SHA` or equivalent.
71
+ - Source maps uploaded in the build pipeline — verify via a grep of the deploy workflow for `sentry-cli sourcemaps upload` or vendor equivalent. Source-map upload step runs on tag-push and on every production deploy.
72
+ - Breadcrumbs configured: capture the last 50 user actions, network requests, and log entries leading to an error. Console-message breadcrumbs disabled in production to avoid leaking debug data.
73
+ - PII scrubbing enabled — `beforeSend` hook strips email, IP, password, authorization tokens from event payloads. Test via a fixture event with PII and assert the captured payload is clean.
74
+ - Sample rates: 100% for errors, 10% for transactions in production. Adjust per cost envelope; record the override in the SDK init comment.
75
+ - Pass criteria: SDK init present + release tag set + source-map upload in CI + PII scrubber wired + breadcrumb config explicit. Sources: `rules/hatch3r-observability-logging.md`, Sentry release tracking guide.
76
+
77
+ ## Gate 7: AI / LLM observability (when applicable)
78
+
79
+ Applies only when the feature calls an LLM or runs an agent:
80
+
81
+ - GenAI semconv span on every LLM call carrying `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`. Cache-hit flag emitted as a span attribute when the provider returns one.
82
+ - Tools invoked by the agent emit `tool.{name}.execute` spans per `rules/hatch3r-observability-tracing-detail.md`. Each tool span carries `tool.name`, `tool.input_hash`, `tool.output_status`, `tool.duration_ms`.
83
+ - Cost telemetry per request: a metric counter `gen_ai.tokens_total{direction, model, agent_name}` and a histogram `gen_ai.request_duration_ms`.
84
+ - GenAI spans sampled at 50-100% in production — higher than general spans because volume is low and per-call cost is high.
85
+
86
+ Cross-reference: `rules/hatch3r-ai-evals.md` (Slice 5), OpenLLMetry semantic conventions.
87
+
88
+ ## Gate 8: Sampling and cost control
89
+
90
+ - Head sampling configured in the SDK and tail sampling configured at the OpenTelemetry Collector. Default: `ParentBased(TraceIdRatioBased(0.1))` head sample + tail-sampling policy keeping 100% of error traces and 100% of traces with latency > p95.
91
+ - Spans-per-second budget documented per service alongside expected QPS. Budget formula: `target_sps = qps * head_sample * (1 + retry_factor)`. Re-check on every deploy.
92
+ - Log sampling for high-volume routes — health checks and static asset routes drop to 1% sample rate via a per-route override at the logger or middleware.
93
+ - Cardinality drop rules at the Collector or vendor — drop attributes that exceed the cardinality budget rather than failing ingestion. Example: drop `user_id` from spans before export when count > 10k unique values per 5-minute window.
94
+ - Cost-budget alert wired on monthly telemetry spend with an 80% threshold warning and 100% threshold page.
95
+ - Pass criteria: head + tail sampling declared + per-route log sample rule + cardinality drop policy + cost-budget alert. Sources: OpenTelemetry sampling docs, `rules/hatch3r-observability-tracing.md`.
96
+
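A worked instance of the spans-per-second budget formula from Gate 8, sketched in Python. The traffic figures (200 QPS, 5% retries) are made-up inputs for illustration; only the formula and the 0.1 head-sample default come from the gate text.

```python
def target_sps(qps: float, head_sample: float, retry_factor: float) -> float:
    """target_sps = qps * head_sample * (1 + retry_factor)."""
    return qps * head_sample * (1 + retry_factor)


# Hypothetical service: 200 requests/s, 10% head sampling, 5% of requests retried.
budget = target_sps(qps=200.0, head_sample=0.1, retry_factor=0.05)  # about 21 spans/s
```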
97
+ ## Gate 9: Alerts-as-code with runbook URL
98
+
99
+ - Every Prometheus / Datadog / Grafana alert defined in Terraform or YAML committed to the repo. No alerts created via vendor console.
100
+ - Every alert rule carries a `runbook_url` annotation linking to a runbook in `docs/runbooks/` or equivalent. Runbook contains: symptoms, likely causes, diagnostic steps, remediation actions, owner team, escalation policy.
101
+ - Severity tier set on every alert per the project policy: P1 page on-call within 15 min; P2 page within 1 hour; P3 Slack channel; P4 ticket only. Alerts without a severity tag fail the gate.
102
+ - CI check parses alert files and fails when `runbook_url` is missing or the target runbook file does not exist. Provide a `validate-alerts` script under `scripts/` or rely on `promtool check rules` for Prometheus.
103
+ - Pass criteria: 100% alerts in code + 100% alerts with runbook annotation + 100% alerts with severity tier + target runbook file exists. Sources: Grafana alerting-as-code docs, Datadog Terraform provider, `rules/hatch3r-observability-metrics.md`.
104
+
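The CI check in Gate 9 might look like the sketch below, which operates on an already-parsed Prometheus rule file (a dict with `groups`, each holding `rules`). It assumes the severity tier is stored in a `severity` label with values `P1` to `P4`, which the gate text does not specify; `gate9_violations` is a hypothetical name, and YAML parsing plus the runbook-file existence check are left out.

```python
ALLOWED_SEVERITIES = {"P1", "P2", "P3", "P4"}  # assumed label values


def gate9_violations(rule_file: dict) -> list[str]:
    """List alert rules that lack a runbook_url annotation or a severity tier."""
    problems = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            name = rule.get("alert")
            if name is None:
                # Recording rules use "record" instead of "alert"; skip them.
                continue
            annotations = rule.get("annotations") or {}
            labels = rule.get("labels") or {}
            if not annotations.get("runbook_url"):
                problems.append(f"{name}: missing runbook_url")
            if labels.get("severity") not in ALLOWED_SEVERITIES:
                problems.append(f"{name}: missing or unknown severity")
    return problems
```

A CI wrapper would exit non-zero whenever the returned list is non-empty.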
105
+ ## Verdict
106
+
107
+ All 9 gates pass = the feature is "done". Anything less = not done.
108
+
109
+ The orchestrator running this skill emits a single-line verdict per gate (`GATE_N: PASS|FAIL <evidence-path>`) and aggregates them. One FAIL on a required gate blocks the merge regardless of reviewer approval status.
110
+
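The per-gate verdict lines described above could be aggregated roughly as follows. This is a sketch in Python under two assumptions: the evidence path is everything after the status token, and a gate that does not apply (for example Gate 7 on a service with no LLM in the path) is excluded by the caller through `required_gates`. The function name `aggregate` is hypothetical.

```python
import re

# One verdict per line: GATE_N: PASS|FAIL <evidence-path>
_VERDICT_RE = re.compile(r"GATE_(\d+): (PASS|FAIL) (\S.*)")


def aggregate(lines, required_gates=range(1, 10)):
    """Return (done, failed_gates, missing_gates) for a list of verdict lines."""
    results = {}
    for line in lines:
        match = _VERDICT_RE.fullmatch(line.strip())
        if match is None:
            raise ValueError(f"malformed verdict line: {line!r}")
        results[int(match.group(1))] = match.group(2)
    failed = [g for g in required_gates if results.get(g) == "FAIL"]
    # A gate with no verdict counts as skipped, so the feature is not done.
    missing = [g for g in required_gates if g not in results]
    return (not failed and not missing, failed, missing)
```

Treating a missing verdict the same as a failure mirrors the rule that skipping any gate means the feature is not done.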
111
+ ## When this skill runs
112
+
113
+ - After `hatch3r-implementer` finishes service code and before `hatch3r-qa-validation` runs.
114
+ - On every PR that touches `src/routes/`, `src/handlers/`, `src/services/`, `src/api/`, `src/middleware/`, `src/controllers/`, `src/lib/`, or any file matching the four observability rule globs.
115
+ - Gate 5 (SLO + burn-rate alert review) executes at release-cut time per release; PR-level execution checks only that the SLO file exists and is non-empty.
116
+
117
+ ## Cross-References
118
+
119
+ - `rules/hatch3r-observability.md`
120
+ - `rules/hatch3r-observability-logging.md`
121
+ - `rules/hatch3r-observability-metrics.md`
122
+ - `rules/hatch3r-observability-tracing.md`
123
+ - `rules/hatch3r-observability-tracing-detail.md`
124
+
125
+ ## References
126
+
127
+ - OpenTelemetry Semantic Conventions v1.29 — `opentelemetry.io/docs/specs/semconv/`
128
+ - OpenTelemetry GenAI Semantic Conventions — `opentelemetry.io/docs/specs/semconv/gen-ai/`
129
+ - W3C Trace Context Level 1 — `www.w3.org/TR/trace-context/`
130
+ - Google SRE Workbook ch. 5 (SLO + multi-burn-rate alerts) — `sre.google/workbook/alerting-on-slos/`
131
+ - Grafana SLO and alerts-as-code — `grafana.com/docs/grafana/latest/alerting/`
132
+ - Sentry release tracking and source maps — `docs.sentry.io/product/releases/`
133
+ - OpenLLMetry GenAI conventions — `github.com/traceloop/openllmetry`
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Read performance budgets from rules and specs
18
19
  - [ ] Step 2: Profile — bundle size, runtime, memory
19
20
  - [ ] Step 3: Identify violations — which budgets exceeded, which hot paths slow
@@ -22,6 +23,19 @@ Task Progress:
22
23
  - [ ] Step 6: Verify all budgets met, no regressions
23
24
  ```
24
25
 
26
+ ## Step 0 — Detect Ambiguity (P8 B1)
27
+
28
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: target surface (frontend bundle vs backend cold start vs DB query), budget threshold values, profiling environment (local vs CI vs production), regression policy (revert vs ship-and-monitor), and whether optimization is allowed to introduce new deps.
29
+
30
+ ## Fan-out Discipline (P8 B2)
31
+
32
+ This skill delegates per task size:
33
+ - Tier 1 (trivial single-file): inline execution acceptable.
34
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
35
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
36
+
37
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
38
+
25
39
  ## Step 1: Read Performance Budgets
26
40
 
27
41
  Load the project's performance budgets from project rules and quality documentation:
@@ -16,12 +16,26 @@ cache_friendly: true
16
16
 
17
17
  ```
18
18
  Task Progress:
19
+ - [ ] Step 0: Detect ambiguity (P8 B1)
19
20
  - [ ] Step 1: Verify branch naming
20
21
  - [ ] Step 2: Self-review against checklist
21
22
  - [ ] Step 3: Fill PR/MR template
22
23
  - [ ] Step 4: Create the PR/MR
23
24
  ```
24
25
 
26
+ ## Step 0 — Detect Ambiguity (P8 B1)
27
+
28
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: target base branch (`board.defaultBranch` vs feature branch), draft vs ready-for-review, reviewers explicitly named, rollout plan (feature flag vs direct), and whether the diff includes irreversible operations (force-push, data migration).
29
+
30
+ ## Fan-out Discipline (P8 B2)
31
+
32
+ This skill delegates per task size:
33
+ - Tier 1 (trivial single-file): inline execution acceptable.
34
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
35
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
36
+
37
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
38
+
25
39
  ## Step 1: Branch Naming
26
40
 
27
41
  Branches must follow `{type}/{short-description}`:
@@ -12,6 +12,7 @@ cache_friendly: true
12
12
 
13
13
  ```
14
14
  Task Progress:
15
+ - [ ] Step 0: Detect ambiguity (P8 B1)
15
16
  - [ ] Step 1: Read the issue and relevant specs
16
17
  - [ ] Step 2: Produce a validation plan
17
18
  - [ ] Step 3: Execute all test cases
@@ -19,6 +20,19 @@ Task Progress:
19
20
  - [ ] Step 5: File follow-up issues
20
21
  ```
21
22
 
23
+ ## Step 0 — Detect Ambiguity (P8 B1)
24
+
25
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. This upgrades validation from exception-driven to default-driven. Triggers for THIS skill: validation scope (single feature vs release), target environment (staging vs prod), pass/fail thresholds, flaky-test policy (retry vs quarantine), and ship/hold authority (auto-block vs surface for review).
26
+
27
+ ## Fan-out Discipline (P8 B2)
28
+
29
+ This skill delegates per task size:
30
+ - Tier 1 (trivial single-file): inline execution acceptable.
31
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
32
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
33
+
34
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
35
+
22
36
  ## Step 1: Read Inputs
23
37
 
24
38
  - Parse the issue body: validation scope, test matrix, environments, preconditions, pass/fail criteria, evidence requirements.
@@ -61,6 +75,10 @@ For non-UI test cases (API, data integrity, background jobs), use appropriate no
61
75
 
62
76
  Do NOT fix bugs during validation. Document and file issues.
63
77
 
78
+ ### 3c. UI/UX Verification Gate
79
+
80
+ For any feature that ships UI, the UI/UX verification gate is **`hatch3r-ui-ux-verify`** (`skills/hatch3r-ui-ux-verify/SKILL.md`). All 9 gates in that skill must pass before declaring the feature done. QA validation alone (browser tests, screenshot evidence) does not constitute UI/UX done. Run `hatch3r-ui-ux-verify` before this report's SHIP recommendation and include its verdict in the report.
81
+
64
82
  ## Step 4: Validation Report
65
83
 
66
84
  Produce a structured report with:
@@ -12,6 +12,7 @@ cache_friendly: true
12
12
 
13
13
  ```
14
14
  Task Progress:
15
+ - [ ] Step 0: Detect ambiguity (P8 B1)
15
16
  - [ ] Step 1: Identify the workflow to capture as a recipe
16
17
  - [ ] Step 2: Design the step sequence and dependency graph
17
18
  - [ ] Step 3: Write the recipe YAML
@@ -19,6 +20,19 @@ Task Progress:
19
20
  - [ ] Step 5: Validate with a real execution
20
21
  ```
21
22
 
23
+ ## Step 0 — Detect Ambiguity (P8 B1)
24
+
25
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: recipe scope (single project vs shared), required variables and defaults, checkpoint policy (pause vs flow), error handling (resume vs restart), and target file location (`.hatch3r/recipes/` project vs global).
26
+
27
+ ## Fan-out Discipline (P8 B2)
28
+
29
+ This skill delegates per task size:
30
+ - Tier 1 (trivial single-file): inline execution acceptable.
31
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
32
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
33
+
34
+ Never under-fan-out to save tokens. Token cost is dominated by quality and completeness gains. Emit `sub_agents_spawned: { count, rationale }` in your output.
35
+
22
36
  ## Step 1: Identify Workflow
23
37
 
24
38
  Determine the repeatable workflow pattern:
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Read the issue, specs, and existing tests
18
19
  - [ ] Step 2: Produce a refactor plan
19
20
  - [ ] Step 3: Implement with behavioral preservation
@@ -21,6 +22,19 @@ Task Progress:
21
22
  - [ ] Step 5: Open PR
22
23
  ```
23
24
 
25
+ ## Step 0 — Detect Ambiguity (P8 B1)
26
+
27
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: refactor scope (one module vs cross-cutting), behavioral invariants to preserve, public API surface (preserved vs changed), test rewrite policy (preserve vs replace), and acceptable performance delta.
28
+
29
+ ## Fan-out Discipline (P8 B2)
30
+
31
+ This skill delegates per task size:
32
+ - Tier 1 (trivial single-file): inline execution acceptable.
33
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
34
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
35
+
36
+ Never under-fan-out to save tokens. The quality and completeness gains outweigh the token cost. Emit `sub_agents_spawned: { count, rationale }` in your output.
37
+
24
38
  ## Step 1: Read Inputs
25
39
 
26
40
  - Parse the issue body: motivation, proposed change, affected files, safety plan, risk analysis, acceptance criteria.
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Determine version bump (major/minor/patch) based on changes
18
19
  - [ ] Step 2: Generate changelog from merged PRs and commit history
19
20
  - [ ] Step 3: Update version in package.json and any other version references
@@ -23,6 +24,19 @@ Task Progress:
23
24
  - [ ] Step 7: Monitor post-deploy for errors/regressions
24
25
  ```
25
26
 
27
+ ## Step 0 — Detect Ambiguity (P8 B1)
28
+
29
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: bump level (major vs minor vs patch), deploy authority (cut-only vs deploy-and-monitor), staging gate (required vs skipped), rollback policy (auto vs manual), and irreversible tag/publish operations (npm publish, GitHub release).
30
+
31
+ ## Fan-out Discipline (P8 B2)
32
+
33
+ This skill delegates per task size:
34
+ - Tier 1 (trivial single-file): inline execution acceptable.
35
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
36
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
37
+
38
+ Never under-fan-out to save tokens. The quality and completeness gains outweigh the token cost. Emit `sub_agents_spawned: { count, rationale }` in your output.
39
+
26
40
  ## Step 1: Determine Version Bump
27
41
 
28
42
  - Review changes since last release: merged PRs/MRs, commit history.
@@ -0,0 +1,144 @@
1
+ ---
2
+ id: hatch3r-reliability-verify
3
+ type: skill
4
+ description: Reliability verification gate before declaring an agent-produced service done — SLO defined, kill switch, timeouts, retries, probes, runbook, staged rollout
5
+ tags: [review, devops]
6
+ quality_charter: agents/shared/quality-charter.md
7
+ ---
8
+ # Reliability Verification Gate
9
+
10
+ ## Quick Start
11
+
12
+ This skill defines what "done" means for any feature shipping a service to production. Run before declaring a feature complete. The 9 gates below are machine-checkable on the manifest, the source, and the alert configuration. Skipping any gate = the feature is not done. Functional tests passing alone do not satisfy this bar — a service that lacks an SLO, a kill switch, or a runbook will fail in production before its first alert reaches the on-call.
13
+
14
+ Inputs the skill expects:
15
+
16
+ - A service repository with `src/` and `k8s/` (or equivalent manifest path).
17
+ - A `docs/runbooks/` directory.
18
+ - Either a `slo/` directory or inline SLO definitions in the alert manifest (Prometheus rules, Datadog monitors, OpenSLO YAML).
19
+
20
+ Outputs the skill produces: a 9-line verdict block written to the PR conversation, plus a JSON artifact at `.audit-workspace/reliability-verify-<sha>.json` for downstream consumption by `hatch3r-release` (or any downstream release-prep skill).
21
+
22
+ ## Step 0 — Detect Ambiguity (P8 B1)
23
+
24
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: service scope, SLO target values and window, rollout strategy (canary stages, hold durations), kill-switch authority and provider, and blast-radius rollback drill cadence.
25
+
26
+ ## Fan-out Discipline (P8 B2)
27
+
28
+ This skill delegates per task size:
29
+ - Tier 1 (trivial single-file): inline execution acceptable.
30
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
31
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
32
+
33
+ Never under-fan-out to save tokens. The quality and completeness gains outweigh the token cost. Emit `sub_agents_spawned: { count, rationale }` in your output.
34
+
35
+ ## Gate 1: SLO defined
36
+
37
+ - The service has at least one Service Level Objective with target percentile, evaluation window, and a wired burn-rate alert.
38
+ - Format: `availability >= 99.9% over rolling 28d` or `p95 latency <= 300ms over rolling 28d`.
39
+ - Burn-rate alert pattern: multi-window multi-burn-rate (Google SRE) — fast burn (14.4x over 5m AND 6x over 1h) pages immediately; slow burn (3x over 6h AND 1x over 3d) opens a ticket.
40
+ - Output: SLO manifest path committed to the repo (e.g. `slo/<service>.yaml` or a Sloth / OpenSLO file).
41
+ - Check: grep for `slo:` or `objectives:` in the service manifest; reject if absent.
42
+ - Cross-reference: `rules/hatch3r-observability-metrics.md`.
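The burn-rate pattern in Gate 1 can be sketched as a small classifier. This is a minimal illustration that uses the thresholds exactly as stated in the bullet above; the names `burnRate`, `classifyBurn`, and `ErrorRatios` are hypothetical and not part of hatch3r, and the error ratios per window are assumed to come from whatever metrics backend the service uses.

```typescript
// Hypothetical helper names; thresholds mirror the Gate 1 text.
type WindowName = "5m" | "1h" | "6h" | "3d";
type ErrorRatios = Record<WindowName, number>;

// Burn rate = observed error ratio divided by the error budget (1 - SLO target).
function burnRate(errorRatio: number, sloTarget: number): number {
  return errorRatio / (1 - sloTarget);
}

// Both windows of a pair must exceed their threshold before the alert fires.
function classifyBurn(ratios: ErrorRatios, sloTarget: number): "page" | "ticket" | "ok" {
  const b = (w: WindowName): number => burnRate(ratios[w], sloTarget);
  if (b("5m") >= 14.4 && b("1h") >= 6) return "page"; // fast burn
  if (b("6h") >= 3 && b("3d") >= 1) return "ticket"; // slow burn
  return "ok";
}
```

Requiring both windows in a pair keeps a short spike from paging on its own while still reacting quickly when the longer window confirms the trend.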
43
+
44
+ ## Gate 2: Kill switch present
45
+
46
+ - Every risky feature is gated by an OpenFeature Ops flag with a documented flip procedure.
47
+ - The flag name appears in `docs/runbooks/<service>.md` next to the alert that would trigger its use.
48
+ - Default-on with OFF override; provider connectivity loss does not silently disable the kill switch.
49
+ - Check: open the runbook, locate the flag name, confirm a flip-procedure step exists with the exact CLI or UI action.
50
+ - Cross-reference: `rules/hatch3r-operability.md` §Feature Flags.
51
+
52
+ ## Gate 3: Timeouts on every outbound call
53
+
54
+ - Every DB, cache, queue, external HTTP, and external RPC call has an explicit timeout.
55
+ - Deadline propagation verified: parent timeout reaches child via `context.WithDeadline` (Go), chained `AbortSignal` (Web/Node), `Deadline` metadata (gRPC), or `TimeLimiter` (JVM).
56
+ - Default budgets: service-call 5s, DB 2s, cache 200ms, health-probe 1s.
57
+ - Check: grep the codebase for outbound-call sites and confirm each has a timeout argument or wrapper.
58
+ - Cross-reference: `rules/hatch3r-resilience-patterns.md` §Timeouts.
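Deadline propagation from Gate 3 amounts to giving each child call the smaller of its default budget and the time left on the parent deadline. The sketch below assumes deadlines are tracked as epoch milliseconds; `childTimeoutMs` and `DEFAULT_BUDGET_MS` are hypothetical names, with the budget values taken from the list above.

```typescript
// Default budgets from the Gate 3 text, in milliseconds.
const DEFAULT_BUDGET_MS = { service: 5000, db: 2000, cache: 200, healthProbe: 1000 } as const;
type CallKind = keyof typeof DEFAULT_BUDGET_MS;

// Returns the timeout for a child call: never longer than the call's default
// budget and never longer than what remains of the parent's deadline.
function childTimeoutMs(kind: CallKind, parentDeadlineEpochMs: number, nowEpochMs: number): number {
  const remaining = parentDeadlineEpochMs - nowEpochMs;
  if (remaining <= 0) {
    throw new Error("parent deadline already exceeded");
  }
  return Math.min(DEFAULT_BUDGET_MS[kind], remaining);
}
```

In Node or a browser, the returned value could be passed to `AbortSignal.timeout(ms)` and the resulting signal handed to the outbound call.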
59
+
60
+ ## Gate 4: Retries with decorrelated jitter
61
+
62
+ - Outbound calls wrap in a retry library — `opossum` (Node), `resilience4j` (JVM), `Polly` (.NET), `gobreaker` + `cenkalti/backoff` (Go), or `pybreaker` + `tenacity` (Python).
63
+ - Retry algorithm is decorrelated jitter: `sleep = min(cap, random_between(base, prev_sleep * 3))` with base 100ms, cap 30s, max 3 retries.
64
+ - `Idempotency-Key` header present on retried non-idempotent operations (POST, PATCH).
65
+ - Retry budget enforced: retry traffic capped at 10% of base traffic.
66
+ - Cross-reference: `rules/hatch3r-resilience-patterns.md` §Retry.
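The decorrelated-jitter formula in Gate 4 can be written out directly. This is a minimal sketch under the stated parameters (base 100ms, cap 30s, max 3 retries); `nextSleepMs` and `retrySchedule` are hypothetical names, and the random source is injectable only so the behaviour can be checked deterministically.

```typescript
const BASE_MS = 100;
const CAP_MS = 30_000;
const MAX_RETRIES = 3;

// sleep = min(cap, random_between(base, prev_sleep * 3))
function nextSleepMs(prevSleepMs: number, rng: () => number = Math.random): number {
  const upper = prevSleepMs * 3;
  const candidate = BASE_MS + rng() * (upper - BASE_MS);
  return Math.min(CAP_MS, candidate);
}

// Produces the sleep before each retry; the first draw is seeded with the base.
function retrySchedule(rng: () => number = Math.random): number[] {
  const sleeps: number[] = [];
  let prev = BASE_MS;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    prev = nextSleepMs(prev, rng);
    sleeps.push(prev);
  }
  return sleeps;
}
```

Because each sleep depends on the previous one rather than on the attempt number, concurrent clients drift apart instead of retrying in lockstep.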
67
+
68
+ ## Gate 5: Probes wired
69
+
70
+ - Kubernetes manifest defines `livenessProbe`, `readinessProbe`, and (for slow-starting services) `startupProbe`.
71
+ - Liveness is shallow (no downstream check); readiness is deep (downstream pings).
72
+ - Distinct endpoints — `/health/live`, `/health/ready`, `/health/startup` — not a single shared `/health`.
73
+ - Probe timeouts under 1s for live, under 2s for ready; periods 10s / 5s / 5s.
74
+ - Check: parse the k8s manifest YAML and verify `livenessProbe.httpGet.path != readinessProbe.httpGet.path` (shared endpoints fail this gate).
75
+ - Cross-reference: `rules/hatch3r-operability.md` §Probes.
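The Gate 5 check on distinct probe endpoints could look like the following. It assumes the manifest has already been parsed from YAML into a plain object (parsing is not shown); `probeFindings` and the trimmed-down `Container` shape are hypothetical and cover only the fields the check reads.

```typescript
interface Probe {
  httpGet?: { path: string };
}

interface Container {
  name: string;
  livenessProbe?: Probe;
  readinessProbe?: Probe;
  startupProbe?: Probe;
}

// Returns one finding per problem; an empty array means the container passes.
function probeFindings(c: Container): string[] {
  const findings: string[] = [];
  const live = c.livenessProbe?.httpGet?.path;
  const ready = c.readinessProbe?.httpGet?.path;
  if (!live) findings.push(`${c.name}: missing livenessProbe.httpGet.path`);
  if (!ready) findings.push(`${c.name}: missing readinessProbe.httpGet.path`);
  if (live && ready && live === ready) {
    findings.push(`${c.name}: liveness and readiness share ${live}`);
  }
  return findings;
}
```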
76
+
77
+ ## Gate 6: Graceful shutdown
78
+
79
+ - SIGTERM handler closes the listener, flips `/health/ready` to return 503, then drains in-flight requests.
80
+ - `preStop` hook delays 1–3s before SIGTERM to handle the endpoint-propagation race.
81
+ - `terminationGracePeriodSeconds >= 45`.
82
+ - Queue consumers commit offsets before disconnect.
83
+ - Cross-reference: `rules/hatch3r-operability.md` §Graceful Shutdown.
84
+
85
+ ## Gate 7: Runbook URL on every alert
86
+
87
+ - Every Prometheus / Datadog / Grafana alert has a `runbook_url` annotation linking to `docs/runbooks/<alert-name>.md`.
88
+ - Runbook contains the 5 required sections: Symptoms, Triage, Mitigation, Root cause, Follow-ups.
89
+ - CI check on the alert manifest fails any alert without `runbook_url` or with a 404 link.
90
+ - Cross-reference: `rules/hatch3r-operability.md` §Runbook URL.
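The first half of the Gate 7 CI check (every alert carries a `runbook_url`) can be sketched as below. It assumes alert rules were already loaded into objects shaped like Prometheus rule entries; `alertsMissingRunbook` is a hypothetical name, and the 404 link check is left out because it needs network or filesystem access.

```typescript
interface AlertRule {
  alert: string;
  annotations?: Record<string, string>;
}

// Returns the names of alerts whose runbook_url annotation is absent or blank.
function alertsMissingRunbook(rules: AlertRule[]): string[] {
  return rules
    .filter((rule) => !rule.annotations?.runbook_url?.trim())
    .map((rule) => rule.alert);
}
```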
91
+
92
+ ## Gate 8: Staged rollout configured
93
+
94
+ - Deployment uses Argo Rollouts, Flagger, or an equivalent controller with canary or blue-green configured.
95
+ - Stage cadence: 1% → 10% → 50% → 100% with minimum holds 30 min / 1 h / 2 h.
96
+ - Auto-rollback wired to the service SLO burn-rate alert (fast-burn triggers immediate rollback).
97
+ - Canary analysis gates error-rate ratio, p95/p99 latency, and business KPIs against a live baseline.
98
+ - Check: locate the `Rollout` or `Canary` resource in the deploy directory; reject if missing or if `steps:` skips the 1% stage.
99
+ - Cross-reference: `rules/hatch3r-progressive-delivery.md`.
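The Gate 8 rejection rule (missing steps, or steps that skip the 1% stage) can be sketched against an Argo Rollouts style step list. The `CanaryStep` shape below models only `setWeight` and `pause`; `rolloutFindings` is a hypothetical name, and the steps are assumed to be parsed from the `Rollout` resource beforehand.

```typescript
type CanaryStep = { setWeight: number } | { pause: { duration?: string } };

// Returns findings; an empty array means the canary starts at the 1% stage.
function rolloutFindings(steps: CanaryStep[] | undefined): string[] {
  if (!steps || steps.length === 0) return ["no canary steps found"];
  const weights: number[] = [];
  for (const step of steps) {
    if ("setWeight" in step) weights.push(step.setWeight);
  }
  if (weights.length === 0) return ["no setWeight step found"];
  return weights[0] === 1 ? [] : [`first canary weight is ${weights[0]}%, expected 1%`];
}
```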
100
+
101
+ ## Gate 9: Blast-radius documented
102
+
103
+ - PR description includes the blast-radius block: services affected, regions, traffic %, rollback time target (<5 min), exact rollback command.
104
+ - Rollback command verified by quarterly drill — drill date recorded in the runbook.
105
+ - Database migrations follow expand-contract; no destructive migration ships in the same deploy as the consuming code.
106
+ - Check: parse the PR body for the `## Blast radius` section; reject if absent or if any required field is empty.
107
+ - Cross-reference: `rules/hatch3r-progressive-delivery.md` §Blast-Radius Reasoning.
108
+
109
+ ## Verdict
110
+
111
+ All 9 gates pass = the service is "done" enough to ship to production. Anything less = not done; the missing gates are findings against this skill.
112
+
113
+ The orchestrator running this skill emits a single-line verdict per gate (`GATE_N: PASS|FAIL <evidence-path>`) and aggregates them. One FAIL on a required gate blocks the merge regardless of functional-test status.
114
+
115
+ Evidence path format: `path/to/file.yaml:LN` or `commit-sha`. The verdict is auditable — a downstream review or release-gate skill can replay the same checks against the same evidence paths and reproduce the verdict bit-for-bit.
116
+
117
+ Gates run independently — a FAIL on Gate 3 does not short-circuit the remaining gates; the run produces the full 9-line verdict so the developer fixes everything in one pass rather than serializing on rerun cycles.
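The verdict format described above can be sketched as a formatter plus an aggregate. `GateResult`, `verdictLine`, and `aggregateVerdict` are hypothetical names; the line format follows `GATE_N: PASS|FAIL <evidence-path>`, and every gate is reported even when an earlier one fails.

```typescript
interface GateResult {
  gate: number;
  pass: boolean;
  evidence: string; // e.g. "path/to/file.yaml:L12" or a commit sha
}

function verdictLine(result: GateResult): string {
  return `GATE_${result.gate}: ${result.pass ? "PASS" : "FAIL"} ${result.evidence}`;
}

// No short-circuit: all lines are emitted, and any FAIL blocks the merge.
function aggregateVerdict(results: GateResult[]): { lines: string[]; mergeBlocked: boolean } {
  const ordered = [...results].sort((a, b) => a.gate - b.gate);
  return {
    lines: ordered.map(verdictLine),
    mergeBlocked: ordered.some((r) => !r.pass),
  };
}
```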
118
+
119
+ ## When this skill runs
120
+
121
+ - After `hatch3r-implementer` finishes service code and before `hatch3r-qa-validation` runs.
122
+ - On every PR that touches `src/services/`, `src/handlers/`, `src/clients/`, `k8s/`, `manifests/`, or the alert / SLO configuration.
123
+ - Gate 9 (drill verification) requires manual confirmation from the on-call rota at release-cut time, not per PR.
124
+ - New-service bootstrap: run the full 9 gates before the first production deploy; failing any one is a blocker, not a follow-up.
125
+
126
+ ## Cross-References
127
+
128
+ - `rules/hatch3r-resilience-patterns.md` — circuit breakers, retries with decorrelated jitter, idempotency keys.
129
+ - `rules/hatch3r-operability.md` — probes, graceful shutdown, kill switches, runbooks.
130
+ - `rules/hatch3r-progressive-delivery.md` — canary, blue-green, auto-rollback on SLO burn.
131
+ - `rules/hatch3r-observability-metrics.md` — SLOs, RED metrics, burn-rate alerts.
132
+
133
+ ## References
134
+
135
+ - Google SRE workbook — `sre.google/workbook`
136
+ - Kubernetes probes — `kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes`
137
+ - Argo Rollouts — `argoproj.github.io/argo-rollouts`
138
+ - Flagger — `flagger.app`
139
+ - OpenFeature — `openfeature.dev`
140
+ - opossum (Node) — `github.com/nodeshift/opossum`
141
+ - resilience4j (JVM) — `resilience4j.readme.io`
142
+ - Polly (.NET) — `pollydocs.org`
143
+ - Sloth (Prometheus SLO generator) — `sloth.dev`
144
+ - OpenSLO specification — `openslo.com`
@@ -0,0 +1,136 @@
1
+ ---
2
+ id: hatch3r-ui-ux-verify
3
+ type: skill
4
+ description: UI/UX verification gate before declaring a feature done — axe-core, scripted keyboard trace, accessibility-tree snapshot, four-state coverage, visual-regression baseline, one human screen-reader pass per release
5
+ tags: [ui, ux, a11y]
6
+ quality_charter: agents/shared/quality-charter.md
7
+ ---
8
+ # UI/UX Verification Gate
9
+
10
+ ## Quick Start
11
+
12
+ This skill defines what "done" means for any feature shipping UI. Run before declaring a feature complete. The 9 gates below mix automated checks (machine-checkable on every PR) with one manual gate (one human screen-reader pass per release). Skipping any gate = the feature is not done. Browser tests and screenshots from `hatch3r-qa-validation` alone do not satisfy this bar.
13
+
14
+ ## Step 0 — Detect Ambiguity (P8 B1)
15
+
16
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: routes in scope (single vs all interactive), WCAG target (2.1 AA vs 2.2 AA), visual-regression baseline policy (regenerate vs keep), AI-UX gate applicability, and whether Gate 9 (manual SR pass) is required this run.
17
+
18
+ ## Fan-out Discipline (P8 B2)
19
+
20
+ This skill delegates per task size:
21
+ - Tier 1 (trivial single-file): inline execution acceptable.
22
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
23
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
24
+
25
+ Never under-fan-out to save tokens. The quality and completeness gains outweigh the token cost. Emit `sub_agents_spawned: { count, rationale }` in your output.
26
+
27
+ ## Gate 1: Automated a11y scan (axe-core via Playwright)
28
+
29
+ - Command: `npx playwright test --grep @a11y` with `@axe-core/playwright` integration on every interactive route.
30
+ - Pass criteria: 0 serious / 0 critical violations.
31
+ - WCAG 2.2 AA target with explicit checks for the new success criteria:
32
+ - **SC 2.5.8 Target Size:** assert minimum 24x24 CSS px on every focusable element.
33
+ - **SC 2.4.11 Focus Not Obscured:** assert the focus ring is fully visible — not hidden behind sticky headers, banners, or chatbots.
34
+ - **SC 2.5.7 Dragging Movements:** assert a non-drag alternative exists for any drag operation.
35
+ - Output: a11y report committed to PR. Merge gate: 0 violations.
36
+ - Setup: `import AxeBuilder from '@axe-core/playwright'`; call `new AxeBuilder({ page }).analyze()` inside each route test and assert `results.violations.length === 0` after filtering for `impact in ['serious', 'critical']`.
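The impact filter mentioned in the Setup bullet can be isolated as a pure function. The `Violation` interface below is a reduced, assumed shape holding only the fields the filter reads, and `blockingViolations` is a hypothetical name; in a real route test the input would be `results.violations` from `AxeBuilder.analyze()`.

```typescript
// Reduced shape of an axe-core violation: only the fields used here.
interface Violation {
  id: string;
  impact?: "minor" | "moderate" | "serious" | "critical" | null;
}

const BLOCKING_IMPACTS = new Set<string>(["serious", "critical"]);

// Keeps only the violations that should fail the merge gate.
function blockingViolations(violations: Violation[]): Violation[] {
  return violations.filter((v) => v.impact != null && BLOCKING_IMPACTS.has(v.impact));
}
```

A route test would then assert that `blockingViolations(results.violations)` has length 0.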
37
+
38
+ ## Gate 2: Scripted keyboard trace
39
+
40
+ - A Playwright script presses Tab / Shift+Tab / Enter / Space / Escape / Arrow keys through every interactive element on every route.
41
+ - Per-element assertions:
42
+ - Focus is visible (computed outline width > 0 or detectable focus ring).
43
+ - Focused element is within the viewport (scroll into view if not).
44
+ - No keyboard trap — Tab on the last element exits to the next region.
45
+ - Pass criteria: 100% interactive elements reached + 0 traps + 0 focus-visibility failures.
46
+ - Implementation: enumerate focusable elements via `page.locator('a, button, input, select, textarea, [tabindex]:not([tabindex="-1"])')`; iterate Tab presses up to `count + 5` and record the activeElement chain. Diff against the enumeration; any unreached element fails the gate.
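The diff step in the Implementation bullet can be sketched without a browser. Elements are represented here by stable string identifiers (an assumption; a test id or selector would do), and `maxTabPresses` and `unreachedElements` are hypothetical names.

```typescript
// Upper bound on Tab presses per the text: enumerated count plus 5.
function maxTabPresses(enumeratedCount: number): number {
  return enumeratedCount + 5;
}

// Returns enumerated elements that never appeared in the recorded focus chain.
function unreachedElements(enumerated: string[], focusChain: string[]): string[] {
  const reached = new Set(focusChain);
  return enumerated.filter((id) => !reached.has(id));
}
```

Any non-empty result from `unreachedElements` fails the gate.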
47
+
48
+ ## Gate 3: Accessibility-tree snapshot
49
+
50
+ - Playwright captures the accessibility tree on each route via `page.accessibility.snapshot()`.
51
+ - Per-route assertions:
52
+ - Exactly one `<h1>`.
53
+ - Landmark coverage: `banner`, `main`, `nav`, `contentinfo` present.
54
+ - Every form input has an accessible name.
55
+ - Every image has an `alt` attribute or `role="presentation"`.
56
+ - Snapshots committed to the repo. Diff on every PR surfaces visual a11y regression.
57
+
58
+ ## Gate 4: Four-state coverage check
59
+
60
+ - For every async surface, assert snapshots exist for all four states:
61
+ - **loading** (skeleton)
62
+ - **empty** (with CTA)
63
+ - **error** (cause + retry)
64
+ - **partial** (banner + degraded data)
65
+ - Missing snapshot = blocker.
66
+ - Convention: `src/__tests__/states/<feature>.<state>.spec.ts`.
67
+ - Discovery: a pre-test script greps for async data hooks (`useQuery`, `useSWR`, `fetch`, `axios`) and emits the list of features that must have all four state files. Missing files fail the gate before any test runs.
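The pre-test discovery check can be sketched as a set difference between required and existing spec files. The feature list is assumed to come from the grep step described above, and `missingStateSpecs` is a hypothetical name; the path pattern follows the stated convention.

```typescript
const STATES = ["loading", "empty", "error", "partial"] as const;

// Returns the spec paths that must exist but do not.
function missingStateSpecs(features: string[], existingFiles: string[]): string[] {
  const present = new Set(existingFiles);
  const missing: string[] = [];
  for (const feature of features) {
    for (const state of STATES) {
      const path = `src/__tests__/states/${feature}.${state}.spec.ts`;
      if (!present.has(path)) missing.push(path);
    }
  }
  return missing;
}
```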
68
+
69
+ ## Gate 5: Visual regression baseline
70
+
71
+ - Playwright's `expect(page).toHaveScreenshot()` for component-library projects; Chromatic or Percy for Storybook-heavy projects.
72
+ - Baselines committed to git or stored in the registry. Never auto-regenerated in CI on the same commit that introduces a visual change.
73
+ - Pass criteria: 0 unintentional drift. Intentional drift requires a reviewer to update the baseline.
74
+ - Pixel threshold: `maxDiffPixels: 0` for layout-critical screens (header, nav, primary CTA); `maxDiffPixelRatio: 0.001` for content-heavy screens. Tighter thresholds catch silent regressions; looser thresholds tolerate font-rendering noise on content text.
75
+
76
+ ## Gate 6: Microcopy lint
77
+
78
+ - Forbid filler tokens in user-facing strings: "oops", "whoops", "something went wrong", "uh oh".
79
+ - Require a corrective verb on error strings — scan the messages files for error messages, fail when no imperative verb appears.
80
+ - Require the `autocomplete` attribute on every input matching `email`, `password`, `name`, or `address`. axe-core covers part of this; add a custom rule for the rest.
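The filler-token rule in Gate 6 can be sketched as a whole-word, case-insensitive scan. `fillerTokens` is a hypothetical name, and the token list is copied from the first bullet; the corrective-verb and `autocomplete` checks are not covered here.

```typescript
const FILLER = ["oops", "whoops", "something went wrong", "uh oh"];

// Returns the forbidden tokens found in a user-facing string.
// Word boundaries keep "oops" from matching inside "whoops".
function fillerTokens(message: string): string[] {
  return FILLER.filter((token) => new RegExp(`\\b${token}\\b`, "i").test(message));
}
```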
81
+
82
+ ## Gate 7: Core Web Vitals (2026 thresholds)
83
+
84
+ - Lighthouse CI or the `web-vitals` library in a synthetic environment.
85
+ - p75 thresholds, measured on mobile with slow-4G + 4x CPU throttle:
86
+ - **LCP** <= 2.5s
87
+ - **INP** <= 200ms
88
+ - **CLS** <= 0.1
89
+ - Failure on any metric = merge blocker.
90
+ - Field data follow-up: when production has RUM (Real User Monitoring) wired via `web-vitals` posting to an analytics endpoint, compare p75 field values to synthetic budgets weekly. A 25% gap between synthetic and field is a finding — re-tune the synthetic environment.
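The p75 budget check in Gate 7 can be sketched as below. It assumes samples were already collected (for example from repeated synthetic runs) and uses the nearest-rank percentile method, which is one of several reasonable definitions; `p75`, `vitalsFailures`, and `VitalsSamples` are hypothetical names.

```typescript
interface VitalsSamples {
  lcpMs: number[];
  inpMs: number[];
  cls: number[];
}

// Thresholds from the Gate 7 text.
const BUDGET = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// Nearest-rank 75th percentile.
function p75(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length);
  return sorted[rank - 1];
}

// Returns the names of metrics whose p75 exceeds budget; empty means pass.
function vitalsFailures(s: VitalsSamples): string[] {
  const failures: string[] = [];
  if (p75(s.lcpMs) > BUDGET.lcpMs) failures.push("LCP");
  if (p75(s.inpMs) > BUDGET.inpMs) failures.push("INP");
  if (p75(s.cls) > BUDGET.cls) failures.push("CLS");
  return failures;
}
```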
91
+
92
+ ## Gate 8: AI-UX checks (when applicable)
93
+
94
+ Applies only when the feature ships LLM-driven UI:
95
+
96
+ - Streaming hooks in use — grep for `useChat`, `useCompletion`, `streamUI`, or the framework equivalent.
97
+ - Tool-call cards visible by default — assert at least one rendered card per tool invocation in fixtures.
98
+ - Human-approval gates present for side-effectful tools — assert an approval card before `write`, `send`, or `post` tool calls.
99
+ - Cancel/abort controls present and wired to an `AbortController`.
100
+
101
+ Cross-reference: `rules/hatch3r-ai-ux-patterns.md` (Slice 5).
102
+
103
+ ## Gate 9: Manual screen-reader pass (per release, not per PR)
104
+
105
+ - One human pass with VoiceOver (macOS or iOS) or NVDA (Windows) per release on the key user flow.
106
+ - Document the trace in the release notes: route walked, issues found, fixes applied.
107
+ - This gate cannot be skipped or automated away.
108
+ - Trace template: open route, enable screen reader, navigate by heading / by landmark / by form control. Record three things — what was announced, what was missing, what was wrong. Fix or file before release.
109
+
110
+ ## Verdict
111
+
112
+ All 9 gates pass = the feature is "done". Anything less = not done.
113
+
114
+ The orchestrator running this skill emits a single-line verdict per gate (`GATE_N: PASS|FAIL <evidence-path>`) and aggregates them. One FAIL on a required gate blocks the merge regardless of QA validation status.
115
+
116
+ ## When this skill runs
117
+
118
+ - After `hatch3r-implementer` finishes feature code and before `hatch3r-qa-validation` runs.
119
+ - On every PR that touches `src/components/`, `src/pages/`, `src/routes/`, or any file matched by the design-system glob.
120
+ - Gate 9 (manual screen-reader pass) is skipped on PR runs and required only at release-cut time.
121
+
122
+ ## Cross-References
123
+
124
+ - `rules/hatch3r-accessibility-standards.md`
125
+ - `rules/hatch3r-ux-states-and-flows.md`
126
+ - `rules/hatch3r-ai-ux-patterns.md`
127
+ - `rules/hatch3r-design-system-detection.md`
128
+ - `rules/hatch3r-performance-budgets.md`
129
+
130
+ ## References
131
+
132
+ - Playwright accessibility testing — `playwright.dev/docs/accessibility-testing`
133
+ - Deque axe-core — `github.com/dequelabs/axe-core`
134
+ - Google Core Web Vitals 2026 thresholds — `web.dev/articles/vitals`
135
+ - Vercel AI SDK UI documentation — `sdk.vercel.ai/docs/ai-sdk-ui`
136
+ - WCAG 2.2 — `www.w3.org/TR/WCAG22/`
@@ -14,6 +14,7 @@ cache_friendly: true
14
14
 
15
15
  ```
16
16
  Task Progress:
17
+ - [ ] Step 0: Detect ambiguity (P8 B1)
17
18
  - [ ] Step 1: Read the issue, mockups, and design system
18
19
  - [ ] Step 2: Produce a visual change plan
19
20
  - [ ] Step 3: Implement matching the mockup
@@ -21,11 +22,24 @@ Task Progress:
21
22
  - [ ] Step 5: Open PR with before/after screenshots
22
23
  ```
23
24
 
25
+ ## Step 0 — Detect Ambiguity (P8 B1)
26
+
27
+ Before any work, scan the invocation for unresolved questions in scope, intent, acceptance criteria, target environment, or irreversibility. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md`. Do not proceed under silent assumption. Default path, not an exception. Triggers for THIS skill: mockup source (provided vs derived from design system), reuse vs extend vs create verdict from `hatch3r-design-system-detect`, responsive breakpoint set, animation budget, and snapshot-regeneration authority.
28
+
29
+ ## Fan-out Discipline (P8 B2)
30
+
31
+ This skill delegates per task size:
32
+ - Tier 1 (trivial single-file): inline execution acceptable.
33
+ - Tier 2 (multi-file or multi-concern): spawn parallel sub-agents per concern via the Task tool.
34
+ - Tier 3 (multi-module / high-risk): one fresh sub-agent per independent module or gate; orchestrator integrates only.
35
+
36
+ Never under-fan-out to save tokens. The quality and completeness gains outweigh the token cost. Emit `sub_agents_spawned: { count, rationale }` in your output.
37
+
24
38
  ## Step 1: Read Inputs
25
39
 
26
40
  - Parse the issue body: proposed changes, before/after mockups, affected surfaces, accessibility checklist, responsiveness requirements.
27
41
  - Read project quality documentation (accessibility, animation budgets).
28
- - Review the existing design system tokens and component hierarchy.
42
+ - Invoke `hatch3r-design-system-detect` to produce the Design System Inventory (`skills/hatch3r-design-system-detect/SKILL.md`). Use the inventory to choose between reuse / extend / create paths. Skipping detection is a regression — visual refactors that invent new tokens or duplicate primitives are rejected at review.
29
43
  - For external library docs and current best practices, follow the project's tooling hierarchy.
30
44
 
31
45
  ## Step 2: Visual Change Plan