npm - hatch3r - Versions diffs - 1.7.0 → 1.7.5 - Mend

hatch3r 1.7.0 → 1.7.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (160) hide show

package/README.md +38 -12
package/agents/hatch3r-a11y-auditor.md +4 -0
package/agents/hatch3r-architect.md +5 -1
package/agents/hatch3r-ci-watcher.md +4 -0
package/agents/hatch3r-context-rules.md +4 -0
package/agents/hatch3r-creator.md +4 -0
package/agents/hatch3r-dependency-auditor.md +4 -0
package/agents/hatch3r-devops.md +4 -0
package/agents/hatch3r-docs-writer.md +4 -0
package/agents/hatch3r-fixer.md +5 -1
package/agents/hatch3r-handoff-loader.md +243 -0
package/agents/hatch3r-handoff-preparer.md +134 -0
package/agents/hatch3r-implementer.md +5 -1
package/agents/hatch3r-learnings-loader.md +4 -0
package/agents/hatch3r-lint-fixer.md +4 -0
package/agents/hatch3r-perf-profiler.md +8 -0
package/agents/hatch3r-researcher.md +5 -1
package/agents/hatch3r-reviewer.md +92 -0
package/agents/hatch3r-security-auditor.md +24 -0
package/agents/hatch3r-test-writer.md +4 -0
package/agents/modes/requirements-elicitation.md +5 -1
package/agents/modes/similar-implementation.md +6 -0
package/agents/modes/user-flows.md +76 -0
package/agents/shared/quality-charter.md +129 -0
package/agents/shared/user-question-protocol.md +95 -0
package/commands/board/shared-azure-devops.md +2 -0
package/commands/board/shared-github.md +17 -0
package/commands/board/shared-gitlab.md +4 -0
package/commands/hatch3r-board-fill.md +2 -1
package/commands/hatch3r-board-pickup.md +1 -1
package/commands/hatch3r-board-shared.md +21 -0
package/commands/hatch3r-create.md +2 -0
package/commands/hatch3r-handoff.md +126 -0
package/commands/hatch3r-pr-resolve.md +672 -0
package/commands/hatch3r-quick-change.md +5 -3
package/commands/hatch3r-report.md +167 -0
package/commands/hatch3r-revision.md +1 -1
package/commands/hatch3r-workflow.md +3 -1
package/dist/cli/index.js +3144 -979
package/dist/cli/index.js.map +1 -1
package/package.json +4 -2
package/rules/hatch3r-accessibility-standards.md +21 -0
package/rules/hatch3r-accessibility-standards.mdc +21 -0
package/rules/hatch3r-agent-orchestration.md +32 -1
package/rules/hatch3r-agent-orchestration.mdc +32 -1
package/rules/hatch3r-ai-evals.md +158 -0
package/rules/hatch3r-ai-evals.mdc +154 -0
package/rules/hatch3r-ai-ux-patterns.md +131 -0
package/rules/hatch3r-ai-ux-patterns.mdc +127 -0
package/rules/hatch3r-api-design.md +67 -9
package/rules/hatch3r-api-design.mdc +67 -9
package/rules/hatch3r-api-versioning.md +119 -0
package/rules/hatch3r-api-versioning.mdc +115 -0
package/rules/hatch3r-auth-patterns.md +170 -0
package/rules/hatch3r-auth-patterns.mdc +166 -0
package/rules/hatch3r-component-conventions.md +30 -0
package/rules/hatch3r-component-conventions.mdc +30 -0
package/rules/hatch3r-container-hardening.md +131 -0
package/rules/hatch3r-container-hardening.mdc +127 -0
package/rules/hatch3r-contract-testing.md +117 -0
package/rules/hatch3r-contract-testing.mdc +113 -0
package/rules/hatch3r-deep-context.md +3 -1
package/rules/hatch3r-deep-context.mdc +3 -1
package/rules/hatch3r-dependency-management.md +73 -1
package/rules/hatch3r-dependency-management.mdc +72 -0
package/rules/hatch3r-design-system-detection.md +142 -0
package/rules/hatch3r-design-system-detection.mdc +138 -0
package/rules/hatch3r-event-schema-evolution.md +90 -0
package/rules/hatch3r-event-schema-evolution.mdc +86 -0
package/rules/hatch3r-handoff-readiness.md +45 -0
package/rules/hatch3r-handoff-readiness.mdc +40 -0
package/rules/hatch3r-i18n.md +13 -0
package/rules/hatch3r-i18n.mdc +13 -0
package/rules/hatch3r-iteration-summary.md +2 -0
package/rules/hatch3r-iteration-summary.mdc +2 -0
package/rules/hatch3r-migrations.md +61 -16
package/rules/hatch3r-migrations.mdc +61 -16
package/rules/hatch3r-observability-logging.md +1 -1
package/rules/hatch3r-observability-logging.mdc +1 -1
package/rules/hatch3r-observability-metrics.md +1 -1
package/rules/hatch3r-observability-metrics.mdc +1 -1
package/rules/hatch3r-observability-tracing-detail.md +1 -1
package/rules/hatch3r-observability-tracing-detail.mdc +1 -1
package/rules/hatch3r-observability-tracing.md +1 -1
package/rules/hatch3r-observability-tracing.mdc +1 -1
package/rules/hatch3r-observability.md +1 -0
package/rules/hatch3r-observability.mdc +1 -0
package/rules/hatch3r-operability.md +149 -0
package/rules/hatch3r-operability.mdc +145 -0
package/rules/hatch3r-passkey-server.md +181 -0
package/rules/hatch3r-passkey-server.mdc +177 -0
package/rules/hatch3r-progressive-delivery.md +120 -0
package/rules/hatch3r-progressive-delivery.mdc +116 -0
package/rules/hatch3r-resilience-patterns.md +154 -0
package/rules/hatch3r-resilience-patterns.mdc +150 -0
package/rules/hatch3r-secrets-management.md +29 -0
package/rules/hatch3r-secrets-management.mdc +29 -0
package/rules/hatch3r-testing.md +139 -43
package/rules/hatch3r-testing.mdc +139 -43
package/rules/hatch3r-ux-states-and-flows.md +149 -0
package/rules/hatch3r-ux-states-and-flows.mdc +145 -0
package/skills/hatch3r-a11y-audit/SKILL.md +14 -0
package/skills/hatch3r-ai-feature/SKILL.md +134 -0
package/skills/hatch3r-api-spec/SKILL.md +5 -0
package/skills/hatch3r-architecture-review/SKILL.md +14 -0
package/skills/hatch3r-bug-fix/SKILL.md +5 -0
package/skills/hatch3r-ci-pipeline/SKILL.md +14 -0
package/skills/hatch3r-cli-aichat/SKILL.md +84 -0
package/skills/hatch3r-cli-ast-grep/SKILL.md +85 -0
package/skills/hatch3r-cli-az-devops/SKILL.md +89 -0
package/skills/hatch3r-cli-bat/SKILL.md +85 -0
package/skills/hatch3r-cli-comby/SKILL.md +85 -0
package/skills/hatch3r-cli-csvkit/SKILL.md +84 -0
package/skills/hatch3r-cli-delta/SKILL.md +86 -0
package/skills/hatch3r-cli-difftastic/SKILL.md +84 -0
package/skills/hatch3r-cli-docker/SKILL.md +89 -0
package/skills/hatch3r-cli-duckdb/SKILL.md +84 -0
package/skills/hatch3r-cli-fd/SKILL.md +85 -0
package/skills/hatch3r-cli-fzf/SKILL.md +84 -0
package/skills/hatch3r-cli-gh/SKILL.md +90 -0
package/skills/hatch3r-cli-glab/SKILL.md +89 -0
package/skills/hatch3r-cli-jq/SKILL.md +85 -0
package/skills/hatch3r-cli-lazygit/SKILL.md +78 -0
package/skills/hatch3r-cli-llm/SKILL.md +84 -0
package/skills/hatch3r-cli-miller/SKILL.md +84 -0
package/skills/hatch3r-cli-mods/SKILL.md +84 -0
package/skills/hatch3r-cli-overview/SKILL.md +60 -0
package/skills/hatch3r-cli-playwright/SKILL.md +89 -0
package/skills/hatch3r-cli-podman/SKILL.md +84 -0
package/skills/hatch3r-cli-ripgrep/SKILL.md +85 -0
package/skills/hatch3r-cli-rtk/SKILL.md +91 -0
package/skills/hatch3r-cli-sd/SKILL.md +85 -0
package/skills/hatch3r-cli-stagehand/SKILL.md +79 -0
package/skills/hatch3r-cli-taplo/SKILL.md +84 -0
package/skills/hatch3r-cli-xsv/SKILL.md +89 -0
package/skills/hatch3r-cli-yq/SKILL.md +85 -0
package/skills/hatch3r-cli-zstd/SKILL.md +85 -0
package/skills/hatch3r-context-health/SKILL.md +14 -0
package/skills/hatch3r-cost-tracking/SKILL.md +14 -0
package/skills/hatch3r-customize/SKILL.md +14 -0
package/skills/hatch3r-dep-audit/SKILL.md +14 -0
package/skills/hatch3r-design-system-detect/SKILL.md +162 -0
package/skills/hatch3r-feature/SKILL.md +2 -0
package/skills/hatch3r-gh-agentic-workflows/SKILL.md +13 -0
package/skills/hatch3r-handoff-prepare/SKILL.md +160 -0
package/skills/hatch3r-handoff-resume/SKILL.md +171 -0
package/skills/hatch3r-incident-response/SKILL.md +14 -0
package/skills/hatch3r-issue-workflow/SKILL.md +5 -0
package/skills/hatch3r-logical-refactor/SKILL.md +14 -0
package/skills/hatch3r-migration/SKILL.md +14 -0
package/skills/hatch3r-observability-verify/SKILL.md +133 -0
package/skills/hatch3r-perf-audit/SKILL.md +14 -0
package/skills/hatch3r-pr-creation/SKILL.md +14 -0
package/skills/hatch3r-qa-validation/SKILL.md +18 -0
package/skills/hatch3r-recipe/SKILL.md +14 -0
package/skills/hatch3r-refactor/SKILL.md +14 -0
package/skills/hatch3r-release/SKILL.md +14 -0
package/skills/hatch3r-reliability-verify/SKILL.md +144 -0
package/skills/hatch3r-ui-ux-verify/SKILL.md +136 -0
package/skills/hatch3r-visual-refactor/SKILL.md +15 -1

package/rules/hatch3r-ai-ux-patterns.md ADDED Viewed

@@ -0,0 +1,131 @@
+---
+id: hatch3r-ai-ux-patterns
+type: rule
+description: 2026 AI/agentic UX patterns for end-user projects shipping AI features — streaming, tool-call UI, human-approval gates, cancel/abort/undo, citations
+scope: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/ai/**,**/chat/**,**/assistant/**,**/agents/**,**/llm/**,**/copilot/**"
+tags: [ux, ai, frontend]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# AI/Agentic UX Patterns (2026)
+## Scope
+This rule applies when the end-user project ships LLM-driven UI — chat, assistant, copilot, agent dashboards, generative UI surfaces. It does NOT govern the LLM backend itself (model selection, prompt engineering, retrieval pipeline). For non-AI UX rules (loading, empty, error, partial states; form patterns; microcopy), cross-reference `rules/hatch3r-ux-states-and-flows.md`. When both rules apply to the same surface, the non-AI rule sets the baseline and this rule layers AI-specific behavior on top.
+Backend companion: `rules/hatch3r-ai-evals.md` defines eval harness, prompt versioning, cost telemetry, prompt caching, model fallback, and hallucination-as-SLI. Apply both rules for any LLM-driven feature.
+Detection: this rule activates when the project imports an AI SDK (`ai`, `@ai-sdk/*`, `openai`, `@anthropic-ai/sdk`, `@google/generative-ai`), or contains files under `ai/`, `chat/`, `assistant/`, `agents/`, `llm/`, `copilot/`. Adding any of these to a project that previously had no AI surface triggers the full ruleset on the next agent run.
+## Streaming-First Defaults
+- Every LLM-driven surface uses framework-agnostic streaming hooks: `useChat`, `useCompletion`, or `useObject` from Vercel AI SDK UI (or equivalent for non-React stacks). Hand-rolled SSE or `fetch` against the model endpoint is a regression in 2026.
+- Render progressive tokens from the moment the request starts. Pair with a skeleton state during the pre-token window (request-sent, first-token-pending). A blank surface during model latency is a regression.
+- Render markdown incrementally without re-parsing the whole buffer on every chunk (Vercel AI Elements `MessageResponse` pattern). Use a streaming-safe markdown renderer that tracks last-rendered offset and appends from there.
+- Emit chunk-level error boundaries. If a stream fails mid-flight, retain prior tokens, render an inline retry control adjacent to the truncated response, and do not blank the surface or reset the message body.
+- Indicate completion explicitly via a final-token marker or `onFinish` callback transition. Stale "thinking" indicators after stream-end erode trust and waste user attention.
+- Non-streaming responses on LLM-driven surfaces are a regression. The narrow exception is structured-output endpoints that must return a single validated JSON payload (`useObject` already streams partial objects — prefer that).
+- Typing indicator policy: show only while no token has arrived; switch to streamed content the moment the first delta lands. Two-second indicator timeouts that linger past first-token are a regression.
+- Backpressure: when the model emits faster than the renderer can paint, batch deltas to one paint per animation frame (`requestAnimationFrame`) rather than dropping tokens or queuing into an unbounded buffer.
+## Generative UI (RSC Streaming)
+- For dynamic dashboards, cards, and surfaces driven by tool calls, use the `streamUI()` RSC pattern (or framework equivalent). The model returns a tool invocation, the server renders an RSC fragment per tool, the client merges fragments into the thread without re-hydrating.
+- Build on shadcn primitives via Vercel AI Elements — 20+ shadcn-based React components for AI surfaces: messages, input, reasoning panel, tool-call display, response actions. Reuse before authoring; bespoke equivalents are a duplication regression.
+- Keep AI-generated UI off the client JS bundle. RSC streaming keeps the payload server-side; do not ship a client-side renderer that duplicates server-rendered output.
+- Each RSC fragment is independently hydratable — never wrap two tool outputs in a single fragment that requires both to complete before render.
+- For non-React stacks, the equivalent is server-rendered HTML fragments streamed via Server-Sent Events and merged into the DOM with explicit anchor elements per tool call. Do not bypass the server-render boundary with raw `innerHTML` of model output.
+## Tool-Call UI Cards
+Every tool invocation is user-visible by default. Hidden tool calls are a regression — transparency is the trust contract with the user.
+- Render each tool call as a structured card with: tool name, args (collapsed by default, expandable on click), status (`pending` → `in-progress` → `complete` / `failed`), elapsed duration, result preview (truncated, expand to full).
+- Indicate state through both color and icon — a single channel (color only) fails for color-vision-deficient users.
+- Args display redacts secrets (API keys, tokens, bearer credentials) and PII per project data-classification rules. Show a `[redacted]` placeholder so the user knows a value was scrubbed, not omitted.
+- Async sub-flows — a tool that triggers more tools — render as nested cards. Do not collapse a multi-tool sub-flow to a single line; the user must see every external action.
+- Failed tool calls show the error message in plain language plus a single-click retry control. Stack traces and raw API responses belong behind an "Details" toggle, not in the user-visible card body.
+- Result preview length: cap at 200 characters or 5 lines (whichever first); always paired with an "Expand" affordance. Long binary or base64 payloads display size + content-type, not the raw bytes.
+- Tool-call cards are keyboard-focusable and announce status transitions via `aria-live="polite"` regions so screen-reader users perceive state changes without polling the visual surface.
+## Human-Approval Gates for Side-Effects
+Required for any tool that:
+- Mutates external state (database write, email send, post to Slack/GitHub/external service)
+- Spends money (paid-API call, transaction, billable inference run on a third-party provider)
+- Modifies the user's environment (file write, shell command, git push, package install)
+Gate pattern:
+1. Pause execution before invoking the tool.
+2. Render an approval card showing tool name, full args (no collapse), diff preview if the tool mutates a file, named target ("Send email to alice@example.com", not "Send email").
+3. Require explicit user confirm (`Approve` button, not Enter-to-dismiss).
+4. On approve, invoke the tool and transition the card to `in-progress`. On deny, surface a cancel-reason field and persist the rejection in the run transcript.
+Read-only tools (search, fetch, list, read) do not require approval — gating them adds friction without trust value.
+Destructive irreversible actions (drop table, delete repo, permanent send) disclose irreversibility on the approval card with a typed-confirmation pattern (user types the named target to enable the confirm button).
+Auto-approve scopes (the "trusted tool" pattern) live behind a per-tool setting with a named scope ("Auto-approve git status, never git push"). Default to opt-in, never opt-out. Scopes are revocable from a single settings surface and revocation takes effect for in-flight runs, not just future runs.
+## Cancel / Abort / Undo
+Three distinct affordances — do not collapse them into a single control:
+- **Cancel** — every long-running tool call (>2s expected duration) and every streaming response renders a visible cancel control adjacent to the surface. Wire to an `AbortController` covering both the model stream and the tool worker. Cancellation produces a clean stop with a "Cancelled by user" marker in the transcript — not a hang, not a silent terminate.
+- **Abort** — distinct from cancel. Aborts the entire agent task (a multi-tool plan). Persists prior tool results so the user can resume from any completed step. Renders as a separate control on the agent-task header, not the in-flight tool card.
+- **Undo** — every reversible action (file write, message send, database row insert, configuration change) renders an undo affordance for 5-30s after completion (named target shown: "Undo: deleted file foo.txt"). Maps to a compensating tool call (delete-after-write, recall-after-send, soft-delete-restore). Destructive irreversible actions disclose this on the approval card per the previous section.
+- Keyboard shortcuts: cancel = `Esc` while the surface is focused; abort = `Esc` then confirm (double-tap pattern) to prevent accidental task termination; undo = `Cmd/Ctrl+Z` while the undo affordance is visible.
+- All three controls remain operable after the user navigates away from the surface — running streams and undo windows persist in a thread-level state, not component-local state. A user closing the panel does not cancel the in-flight task.
+## Citations and Grounding
+- Factual claims from retrieval show inline citations with span-level linkage. Hovering or clicking the citation highlights the source passage in the retrieved document.
+- Cite the retrieval source URL or document anchor, not just the document name — "Knowledge Base / Onboarding" without an anchor is unverifiable.
+- Ungroundable claims are flagged in the response surface ("I'm not certain — this isn't in your retrieved sources") rather than silently emitted. Span-level verification is preferred when the model exposes it.
+- Do not show uncalibrated confidence badges. A "90% confident" indicator without empirical calibration data actively misleads — drop it or replace with grounding framing ("Based on [doc]" or "Not found in your data").
+- When retrieval returns zero results, say so explicitly. Silent fall-through to model parametric memory is a regression — the user cannot distinguish retrieved from confabulated content.
+- Citation rendering: superscript number link in the response body that opens a side panel (or popover on hover) with source title, URL, and highlighted span. Avoid footnote-only patterns that force the user to scroll.
+- Source freshness indicator: when retrieval metadata includes document timestamps, surface the source date on the citation popover. Stale sources flagged at >12 months by default; threshold configurable per project domain.
+## Multi-Step Agent UX
+- Render the agent's plan as an ordered checklist before execution. Allow the user to edit step text, reorder, or skip individual steps before approving the plan. A pre-execution plan view is required for any agent run with >3 steps.
+- As each step completes, mark with status (success/failure/skipped) plus elapsed duration. Failed steps show the failure cause and a retry control; retry restarts from the failed step, not from the beginning.
+- Streaming reasoning (when the model exposes it via a reasoning channel) renders in a collapsed "Reasoning" panel that the user opens on click. Never inline reasoning with the user-facing message body — it dilutes the response and conflates the user-facing answer with internal model state.
+- Maintain a thread or run ID per conversation for resumption, audit trail, and observability backlink. The run ID is visible in the UI (footer or thread metadata) so users and support can reference a specific run.
+- Plan revision: when the model revises its plan mid-run (adds, removes, or reorders remaining steps), diff the previous plan against the new plan and surface the diff to the user before continuing. Silent plan mutation defeats the pre-execution review.
+- Parallel steps: when the plan contains independent steps that the agent runs in parallel, render them as a horizontal group with shared progress, not a flat vertical list — the user must see the parallelism to interpret elapsed time.
+## Failure Modes and Degradation
+- **Model timeout** — render a fallback card with a single-click recovery ("Model is slow — retry?" or "Switching to faster model"). Do not silently retry; the user must know the model failed.
+- **Rate limit** — show a countdown timer and queue position when available. Surface vendor rate-limit headers, not raw API error JSON. "Try again in 47s" beats "429 Too Many Requests".
+- **Token budget exceeded** — indicate context truncation in-line ("Conversation truncated — earlier turns dropped") and offer a summary/restart control. Do not silently truncate; the user must know which turns are still in context.
+- **Tool unavailability** — degrade gracefully. Disable the affected feature with a tooltip explaining why ("Slack integration offline — reconnect in Settings"), not a crash or an unhandled exception bubbled to the user.
+- **Stream interruption mid-response** — retain the partial response, render a "Stream interrupted" marker, offer continue-from-here (resume the stream from the last received token) and restart-from-scratch (drop partial, restart) as two separate controls.
+- **Content-policy refusal** — when the model refuses to answer (safety filter, policy violation), render the refusal verbatim in a distinct style (not a normal assistant message), and surface a single-click "Report this" affordance so users can flag false-positive refusals.
+- **Stale session** — if the user returns to a tab where the stream has died (network sleep, tab discard), detect on focus and offer a "Reconnect" control rather than silently re-issuing the request and double-billing the user.
+## Verification Gate
+Before declaring an AI surface "done":
+- Streaming verified end-to-end: scripted Playwright test that asserts progressive token render (first token <1s after request, last token marked complete).
+- Tool-call card snapshot per state (`pending`, `in-progress`, `complete`, `failed`) — missing any state is a blocker.
+- Approval-gate test: simulate a side-effecting tool, assert execution pauses, approval card renders required fields, deny path persists rejection in transcript.
+- Cancel/abort/undo each have at least one end-to-end test exercising the AbortController wire-up; cancellation produces a transcript marker, not a silent terminate.
+- Citation rendering: snapshot of a grounded response shows inline citation with source URL or anchor; an ungroundable claim renders the flag string, not silent emission.
+- Failure-mode coverage: at least one test per failure mode in this rule (timeout, rate limit, token budget, tool unavailability, stream interruption, content-policy refusal, stale session). Missing coverage is a blocker.
+- Accessibility: axe-core scan against every AI surface state (idle, streaming, tool-call pending, approval card open, error) returns 0 serious or critical violations.
+## References
+- Vercel — Introducing AI Elements (`vercel.com/changelog/introducing-ai-elements`)
+- Vercel AI SDK UI docs (`ai-sdk.dev/docs/ai-sdk-ui`) — `useChat`, `useCompletion`, `useObject`
+- Vercel — AI SDK 3 Generative UI / `streamUI()` (`vercel.com/blog/ai-sdk-3-generative-ui`)
+- Anthropic — Building agents with the Claude Agent SDK (`anthropic.com/engineering/building-agents-with-the-claude-agent-sdk`)
+- OpenAI Apps SDK — UI guidelines (`developers.openai.com/apps-sdk/concepts/ui-guidelines`)
+- ClarityArc — Hallucination, grounding, and citation in enterprise systems
+- Lakera — LLM hallucinations in 2026

package/rules/hatch3r-ai-ux-patterns.mdc ADDED Viewed

@@ -0,0 +1,127 @@
+---
+description: 2026 AI/agentic UX patterns for end-user projects shipping AI features — streaming, tool-call UI, human-approval gates, cancel/abort/undo, citations
+globs: ["**/*.vue", "**/*.jsx", "**/*.tsx", "**/*.svelte", "**/ai/**", "**/chat/**", "**/assistant/**", "**/agents/**", "**/llm/**", "**/copilot/**"]
+alwaysApply: false
+---
+# AI/Agentic UX Patterns (2026)
+## Scope
+This rule applies when the end-user project ships LLM-driven UI — chat, assistant, copilot, agent dashboards, generative UI surfaces. It does NOT govern the LLM backend itself (model selection, prompt engineering, retrieval pipeline). For non-AI UX rules (loading, empty, error, partial states; form patterns; microcopy), cross-reference `rules/hatch3r-ux-states-and-flows.md`. When both rules apply to the same surface, the non-AI rule sets the baseline and this rule layers AI-specific behavior on top.
+Backend companion: `rules/hatch3r-ai-evals.md` defines eval harness, prompt versioning, cost telemetry, prompt caching, model fallback, and hallucination-as-SLI. Apply both rules for any LLM-driven feature.
+Detection: this rule activates when the project imports an AI SDK (`ai`, `@ai-sdk/*`, `openai`, `@anthropic-ai/sdk`, `@google/generative-ai`), or contains files under `ai/`, `chat/`, `assistant/`, `agents/`, `llm/`, `copilot/`. Adding any of these to a project that previously had no AI surface triggers the full ruleset on the next agent run.
+## Streaming-First Defaults
+- Every LLM-driven surface uses framework-agnostic streaming hooks: `useChat`, `useCompletion`, or `useObject` from Vercel AI SDK UI (or equivalent for non-React stacks). Hand-rolled SSE or `fetch` against the model endpoint is a regression in 2026.
+- Render progressive tokens from the moment the request starts. Pair with a skeleton state during the pre-token window (request-sent, first-token-pending). A blank surface during model latency is a regression.
+- Render markdown incrementally without re-parsing the whole buffer on every chunk (Vercel AI Elements `MessageResponse` pattern). Use a streaming-safe markdown renderer that tracks last-rendered offset and appends from there.
+- Emit chunk-level error boundaries. If a stream fails mid-flight, retain prior tokens, render an inline retry control adjacent to the truncated response, and do not blank the surface or reset the message body.
+- Indicate completion explicitly via a final-token marker or `onFinish` callback transition. Stale "thinking" indicators after stream-end erode trust and waste user attention.
+- Non-streaming responses on LLM-driven surfaces are a regression. The narrow exception is structured-output endpoints that must return a single validated JSON payload (`useObject` already streams partial objects — prefer that).
+- Typing indicator policy: show only while no token has arrived; switch to streamed content the moment the first delta lands. Two-second indicator timeouts that linger past first-token are a regression.
+- Backpressure: when the model emits faster than the renderer can paint, batch deltas to one paint per animation frame (`requestAnimationFrame`) rather than dropping tokens or queuing into an unbounded buffer.
+## Generative UI (RSC Streaming)
+- For dynamic dashboards, cards, and surfaces driven by tool calls, use the `streamUI()` RSC pattern (or framework equivalent). The model returns a tool invocation, the server renders an RSC fragment per tool, the client merges fragments into the thread without re-hydrating.
+- Build on shadcn primitives via Vercel AI Elements — 20+ shadcn-based React components for AI surfaces: messages, input, reasoning panel, tool-call display, response actions. Reuse before authoring; bespoke equivalents are a duplication regression.
+- Keep AI-generated UI off the client JS bundle. RSC streaming keeps the payload server-side; do not ship a client-side renderer that duplicates server-rendered output.
+- Each RSC fragment is independently hydratable — never wrap two tool outputs in a single fragment that requires both to complete before render.
+- For non-React stacks, the equivalent is server-rendered HTML fragments streamed via Server-Sent Events and merged into the DOM with explicit anchor elements per tool call. Do not bypass the server-render boundary with raw `innerHTML` of model output.
+## Tool-Call UI Cards
+Every tool invocation is user-visible by default. Hidden tool calls are a regression — transparency is the trust contract with the user.
+- Render each tool call as a structured card with: tool name, args (collapsed by default, expandable on click), status (`pending` → `in-progress` → `complete` / `failed`), elapsed duration, result preview (truncated, expand to full).
+- Indicate state through both color and icon — a single channel (color only) fails for color-vision-deficient users.
+- Args display redacts secrets (API keys, tokens, bearer credentials) and PII per project data-classification rules. Show a `[redacted]` placeholder so the user knows a value was scrubbed, not omitted.
+- Async sub-flows — a tool that triggers more tools — render as nested cards. Do not collapse a multi-tool sub-flow to a single line; the user must see every external action.
+- Failed tool calls show the error message in plain language plus a single-click retry control. Stack traces and raw API responses belong behind an "Details" toggle, not in the user-visible card body.
+- Result preview length: cap at 200 characters or 5 lines (whichever first); always paired with an "Expand" affordance. Long binary or base64 payloads display size + content-type, not the raw bytes.
+- Tool-call cards are keyboard-focusable and announce status transitions via `aria-live="polite"` regions so screen-reader users perceive state changes without polling the visual surface.
+## Human-Approval Gates for Side-Effects
+Required for any tool that:
+- Mutates external state (database write, email send, post to Slack/GitHub/external service)
+- Spends money (paid-API call, transaction, billable inference run on a third-party provider)
+- Modifies the user's environment (file write, shell command, git push, package install)
+Gate pattern:
+1. Pause execution before invoking the tool.
+2. Render an approval card showing tool name, full args (no collapse), diff preview if the tool mutates a file, named target ("Send email to alice@example.com", not "Send email").
+3. Require explicit user confirm (`Approve` button, not Enter-to-dismiss).
+4. On approve, invoke the tool and transition the card to `in-progress`. On deny, surface a cancel-reason field and persist the rejection in the run transcript.
+Read-only tools (search, fetch, list, read) do not require approval — gating them adds friction without trust value.
+Destructive irreversible actions (drop table, delete repo, permanent send) disclose irreversibility on the approval card with a typed-confirmation pattern (user types the named target to enable the confirm button).
+Auto-approve scopes (the "trusted tool" pattern) live behind a per-tool setting with a named scope ("Auto-approve git status, never git push"). Default to opt-in, never opt-out. Scopes are revocable from a single settings surface and revocation takes effect for in-flight runs, not just future runs.
+## Cancel / Abort / Undo
+Three distinct affordances — do not collapse them into a single control:
+- **Cancel** — every long-running tool call (>2s expected duration) and every streaming response renders a visible cancel control adjacent to the surface. Wire to an `AbortController` covering both the model stream and the tool worker. Cancellation produces a clean stop with a "Cancelled by user" marker in the transcript — not a hang, not a silent terminate.
+- **Abort** — distinct from cancel. Aborts the entire agent task (a multi-tool plan). Persists prior tool results so the user can resume from any completed step. Renders as a separate control on the agent-task header, not the in-flight tool card.
+- **Undo** — every reversible action (file write, message send, database row insert, configuration change) renders an undo affordance for 5-30s after completion (named target shown: "Undo: deleted file foo.txt"). Maps to a compensating tool call (delete-after-write, recall-after-send, soft-delete-restore). Destructive irreversible actions disclose this on the approval card per the previous section.
+- Keyboard shortcuts: cancel = `Esc` while the surface is focused; abort = `Esc` then confirm (double-tap pattern) to prevent accidental task termination; undo = `Cmd/Ctrl+Z` while the undo affordance is visible.
+- All three controls remain operable after the user navigates away from the surface — running streams and undo windows persist in a thread-level state, not component-local state. A user closing the panel does not cancel the in-flight task.
+## Citations and Grounding
+- Factual claims from retrieval show inline citations with span-level linkage. Hovering or clicking the citation highlights the source passage in the retrieved document.
+- Cite the retrieval source URL or document anchor, not just the document name — "Knowledge Base / Onboarding" without an anchor is unverifiable.
+- Ungroundable claims are flagged in the response surface ("I'm not certain — this isn't in your retrieved sources") rather than silently emitted. Span-level verification is preferred when the model exposes it.
+- Do not show uncalibrated confidence badges. A "90% confident" indicator without empirical calibration data actively misleads — drop it or replace with grounding framing ("Based on [doc]" or "Not found in your data").
+- When retrieval returns zero results, say so explicitly. Silent fall-through to model parametric memory is a regression — the user cannot distinguish retrieved from confabulated content.
+- Citation rendering: superscript number link in the response body that opens a side panel (or popover on hover) with source title, URL, and highlighted span. Avoid footnote-only patterns that force the user to scroll.
+- Source freshness indicator: when retrieval metadata includes document timestamps, surface the source date on the citation popover. Stale sources flagged at >12 months by default; threshold configurable per project domain.
+## Multi-Step Agent UX
+- Render the agent's plan as an ordered checklist before execution. Allow the user to edit step text, reorder, or skip individual steps before approving the plan. A pre-execution plan view is required for any agent run with >3 steps.
+- As each step completes, mark with status (success/failure/skipped) plus elapsed duration. Failed steps show the failure cause and a retry control; retry restarts from the failed step, not from the beginning.
+- Streaming reasoning (when the model exposes it via a reasoning channel) renders in a collapsed "Reasoning" panel that the user opens on click. Never inline reasoning with the user-facing message body — it dilutes the response and conflates the user-facing answer with internal model state.
+- Maintain a thread or run ID per conversation for resumption, audit trail, and observability backlink. The run ID is visible in the UI (footer or thread metadata) so users and support can reference a specific run.
+- Plan revision: when the model revises its plan mid-run (adds, removes, or reorders remaining steps), diff the previous plan against the new plan and surface the diff to the user before continuing. Silent plan mutation defeats the pre-execution review.
+- Parallel steps: when the plan contains independent steps that the agent runs in parallel, render them as a horizontal group with shared progress, not a flat vertical list — the user must see the parallelism to interpret elapsed time.
+## Failure Modes and Degradation
+- **Model timeout** — render a fallback card with a single-click recovery ("Model is slow — retry?" or "Switching to faster model"). Do not silently retry; the user must know the model failed.
+- **Rate limit** — show a countdown timer and queue position when available. Surface vendor rate-limit headers, not raw API error JSON. "Try again in 47s" beats "429 Too Many Requests".
+- **Token budget exceeded** — indicate context truncation in-line ("Conversation truncated — earlier turns dropped") and offer a summary/restart control. Do not silently truncate; the user must know which turns are still in context.
+- **Tool unavailability** — degrade gracefully. Disable the affected feature with a tooltip explaining why ("Slack integration offline — reconnect in Settings"), not a crash or an unhandled exception bubbled to the user.
+- **Stream interruption mid-response** — retain the partial response, render a "Stream interrupted" marker, offer continue-from-here (resume the stream from the last received token) and restart-from-scratch (drop partial, restart) as two separate controls.
+- **Content-policy refusal** — when the model refuses to answer (safety filter, policy violation), render the refusal verbatim in a distinct style (not a normal assistant message), and surface a single-click "Report this" affordance so users can flag false-positive refusals.
+- **Stale session** — if the user returns to a tab where the stream has died (network sleep, tab discard), detect on focus and offer a "Reconnect" control rather than silently re-issuing the request and double-billing the user.
+## Verification Gate
+Before declaring an AI surface "done":
+- Streaming verified end-to-end: scripted Playwright test that asserts progressive token render (first token <1s after request, last token marked complete).
+- Tool-call card snapshot per state (`pending`, `in-progress`, `complete`, `failed`) — missing any state is a blocker.
+- Approval-gate test: simulate a side-effecting tool, assert execution pauses, approval card renders required fields, deny path persists rejection in transcript.
+- Cancel/abort/undo each have at least one end-to-end test exercising the AbortController wire-up; cancellation produces a transcript marker, not a silent terminate.
+- Citation rendering: snapshot of a grounded response shows inline citation with source URL or anchor; an ungroundable claim renders the flag string, not silent emission.
+- Failure-mode coverage: at least one test per failure mode in this rule (timeout, rate limit, token budget, tool unavailability, stream interruption, content-policy refusal, stale session). Missing coverage is a blocker.
+- Accessibility: axe-core scan against every AI surface state (idle, streaming, tool-call pending, approval card open, error) returns 0 serious or critical violations.
+## References
+- Vercel — Introducing AI Elements (`vercel.com/changelog/introducing-ai-elements`)
+- Vercel AI SDK UI docs (`ai-sdk.dev/docs/ai-sdk-ui`) — `useChat`, `useCompletion`, `useObject`
+- Vercel — AI SDK 3 Generative UI / `streamUI()` (`vercel.com/blog/ai-sdk-3-generative-ui`)
+- Anthropic — Building agents with the Claude Agent SDK (`anthropic.com/engineering/building-agents-with-the-claude-agent-sdk`)
+- OpenAI Apps SDK — UI guidelines (`developers.openai.com/apps-sdk/concepts/ui-guidelines`)
+- ClarityArc — Hallucination, grounding, and citation in enterprise systems
+- Lakera — LLM hallucinations in 2026

package/rules/hatch3r-api-design.md CHANGED Viewed

@@ -12,10 +12,10 @@ cache_friendly: true
 - Use consistent request/response schemas. Document in spec files.
 - All endpoints versioned (URL prefix or header). Backward-compatible changes only.
 - Additive changes only: add fields, never rename or remove without migration.
-- Error responses follow standard shape: `{ code, message, details? }`.
-- Idempotency keys for mutation endpoints. Reject duplicate requests.
+- Error responses follow RFC 9457 Problem Details (`application/problem+json`) — see "Error Envelope" below.
+- Idempotency keys for mutation endpoints. See `rules/hatch3r-api-versioning.md` for header name, TTL, and replay semantics.
 - Request validation at the boundary: validate and sanitize before processing.
-- List endpoints return envelopes: `{ data, pagination }`. Never raw arrays.
+- List endpoints return cursor-paginated envelopes — see "Pagination Standards" below for the exact shape.
 - Rate limiting enforced server-side. Return 429 with `Retry-After` when exceeded.
 - API contracts documented in project specs. Update on schema changes.
@@ -24,7 +24,7 @@ cache_friendly: true
 - Validate JWT signature, expiration (`exp`), audience (`aud`), and issuer (`iss`) on every request.
 - Enforce algorithm in server config (`RS256` for asymmetric, `HS256` for symmetric). Reject `alg: none`.
 - Access tokens short-lived (15–60 min). Use opaque refresh tokens with rotation and revocation.
-- OAuth 2.0 with PKCE for SPAs and mobile clients. Never use implicit flow.
+- OAuth 2.1 (draft-ietf-oauth-v2-1) with mandatory PKCE for SPAs, mobile, and confidential clients. Implicit + password grants removed. Detail in `rules/hatch3r-api-versioning.md`; cross-reference `rules/hatch3r-auth-patterns.md`.
 - API keys identify applications, not users. Scope keys to specific services and rotate on schedule.
 - Authorization middleware checks permissions before handler execution. Fail closed on missing claims.
 - Apply principle of least privilege: scope tokens to the minimum required permissions.
@@ -51,13 +51,12 @@ cache_friendly: true
 ## Pagination Standards
-- Default to cursor-based pagination for ordered data with high write frequency or deep traversal.
-- Use offset-based pagination only for small, stable datasets where total count is cheap.
+- Default to cursor-based pagination with a stable composite sort key (e.g., `(created_at, id)` using ULID/UUIDv7 as deterministic tiebreaker).
+- Use offset-based pagination only for small, bounded, append-immutable datasets (e.g., admin reports) where total count is cheap.
 - Enforce a maximum page size server-side (e.g., 100 items). Ignore client requests exceeding the limit.
-- Return `nextCursor` (or `null`) in the envelope. Clients must not construct cursors—they are opaque.
-- Total count is optional: provide it only when the query cost is bounded. Use `hasMore` boolean otherwise.
+- Cursors are opaque base64 strings. Clients must not parse or construct them.
 - Reject deep offset pagination beyond a threshold (e.g., offset > 10000) to protect database performance.
-- Include pagination metadata in the envelope: `{ data, pagination: { nextCursor, hasMore, pageSize } }`.
+- Standard envelope: `{ data, page_info: { has_next_page, end_cursor }, links: { next } }`. Add `has_previous_page` + `start_cursor` for bidirectional cursor navigation (Relay-style). Total count is optional — provide only when the query cost is bounded.
 ## Request Security
@@ -179,3 +178,62 @@ Follow the IETF draft standard (`draft-ietf-httpapi-ratelimit-headers`) for rate
 - When the rate limit is exceeded, return HTTP `429 Too Many Requests` with `Retry-After` header.
 - Document rate limits per endpoint tier in the API specification (e.g., auth endpoints: 100/min, read endpoints: 1000/min, write endpoints: 500/min).
 - Apply rate limiting by authenticated identity (API key, user token) as the primary key. Fall back to IP-based limiting for unauthenticated endpoints.
+## Spec Floor: OpenAPI 3.1 / 3.2 and AsyncAPI 3.1.0
+- **OpenAPI 3.1** is the 2026 floor for REST APIs. OpenAPI 3.2.0 (Sep 2025) is emerging — use only when its features (deeper webhook coverage, `pathItems`) are required; tooling lag is real.
+- **AsyncAPI 3.1.0** (Jan 2026, non-breaking over 3.0) is the canonical spec for event-driven and webhook APIs. Define every event channel, message schema, and binding (Kafka/AMQP/MQTT/WebSocket/HTTP) in an AsyncAPI document.
+- Generate code, mocks, and reference docs from the spec — never hand-write client SDKs or hand-roll mocks when a spec exists. Use Prism (Stoplight) for OAS-driven mock servers.
+## Error Envelope: RFC 9457 Problem Details
+Every error response uses `Content-Type: application/problem+json` and the RFC 9457 shape, superseding any local `{code, message, details?}` convention:
+| Field | Required | Purpose |
+|-------|----------|---------|
+| `type` | yes | URI identifying the error class (e.g., `https://errors.example.com/validation`). Stable across versions. |
+| `title` | yes | Short human-readable summary. Constant per `type`. |
+| `status` | yes | HTTP status code as integer. Mirrors the response status. |
+| `detail` | yes | Human-readable explanation specific to this occurrence. May include user input. |
+| `instance` | yes | URI reference identifying this specific occurrence (e.g., a log/trace URL). |
+| `trace_id` (extension) | recommended | W3C Trace Context trace ID for observability correlation. Alternative: propagate via `traceparent` response header. |
+- Extension fields are allowed for machine-readable detail (e.g., `errors: [{ field, code }]` for validation).
+- Frameworks emitting Problem Details by default (Spring 6, ASP.NET 8, Hono): keep the default — do not override with a custom shape.
+## GraphQL: Trusted Documents
+- Production GraphQL endpoints accept **only** registered query IDs (Apollo Persisted Queries, Relay Persisted Queries, or Hive). Reject ad-hoc queries from web clients. Apollo's APQ is performance-only — use pre-registered trusted documents for security parity.
+- Annotate fields with `@cost` (per-field cost) and lists with `@listSize` (assumed size, slicing args, `requireOneSlicingArgument: true`) for query-cost analysis. Block requests exceeding the global cost budget.
+- Federation v2 (`@key`, `@shareable`, `@external`, `@inaccessible`, `@override`) is the floor — Federation 1.0 supergraphs are deprecated in GraphOS.
+## gRPC / Connect-RPC
+- **Connect-RPC** (`buf/connect-go|web|swift|kotlin`) is the 2026 recommendation for web-facing gRPC — replaces gRPC-Web; supports HTTP/1.1, HTTP/2, and the Connect protocol.
+- **REST transcoding:** add `google.api.http` annotations per AIP-127 to map RPC to HTTP/JSON; expose via Envoy `grpc_json_transcoder` or grpc-gateway.
+- **Removed fields:** reserve both the tag and the field name (`reserved 4; reserved "password";`) to block accidental reuse.
+- Mark deprecated fields/methods with `[deprecated = true]` and `// Deprecated:` comments; consumers see the deprecation in generated code.
+## tRPC / Hono RPC
+- Choose tRPC v11 or Hono RPC **only** when both ends are TypeScript in a closed monorepo with no external consumers. Types flow via TS inference — no codegen, no cross-language interop.
+- The moment a non-TS consumer or public SDK is required, switch to OpenAPI emission. **oRPC** bridges tRPC ergonomics with OpenAPI for cross-language use (Standard Schema: Zod/Valibot/ArkType).
+- All tRPC routers use Zod (or Standard Schema) for input/output validation and a structured error formatter mapping errors to RFC 9457 Problem Details on the HTTP boundary.
+## Breaking-Change CI Gate
+Every PR that touches an API contract runs the matching diff tool against the last shipped tag. Breaking changes block merge unless paired with an RFC 9745 Deprecation + RFC 8594 Sunset lifecycle (see `rules/hatch3r-api-versioning.md`) or a major-version bump.
+| Contract type | Path glob | Tool | Action |
+|---------------|-----------|------|--------|
+| OpenAPI | `**/openapi.{yaml,yml,json}`, `**/openapi/**` | `oasdiff breaking` (450+ rules, OAS 3.0 + 3.1) | PR comment + fail on `breaking` exit |
+| Protobuf | `**/*.proto` | `buf breaking` (default `FILE` rule set) | PR comment + fail on wire/source incompatibility |
+| GraphQL | `**/*.graphql`, `**/*.gql` | `graphql-inspector diff` | PR comment + fail on breaking-classified changes |
+The reviewer agent (`agents/hatch3r-reviewer.md` item 11 "Contract preservation") invokes the matching tool on every PR touching `**/api/**`, `**/proto/**`, or schema files — verifies both in-repo and cross-repo consumers.
+## Cross-References
+- Versioning, deprecation, idempotency, OAuth 2.1, webhook signing: `rules/hatch3r-api-versioning.md`.
+- Pact / Schemathesis / `can-i-deploy` gate: `rules/hatch3r-contract-testing.md`.
+- OAuth 2.1, DPoP, Resource Indicators, mTLS: `rules/hatch3r-auth-patterns.md`.

package/rules/hatch3r-api-design.mdc CHANGED Viewed

@@ -8,10 +8,10 @@ alwaysApply: false
 - Use consistent request/response schemas. Document in spec files.
 - All endpoints versioned (URL prefix or header). Backward-compatible changes only.
 - Additive changes only: add fields, never rename or remove without migration.
-- Error responses follow standard shape: `{ code, message, details? }`.
-- Idempotency keys for mutation endpoints. Reject duplicate requests.
+- Error responses follow RFC 9457 Problem Details (`application/problem+json`) — see "Error Envelope" below.
+- Idempotency keys for mutation endpoints. See `rules/hatch3r-api-versioning.md` for header name, TTL, and replay semantics.
 - Request validation at the boundary: validate and sanitize before processing.
-- List endpoints return envelopes: `{ data, pagination }`. Never raw arrays.
+- List endpoints return cursor-paginated envelopes — see "Pagination Standards" below for the exact shape.
 - Rate limiting enforced server-side. Return 429 with `Retry-After` when exceeded.
 - API contracts documented in project specs. Update on schema changes.
@@ -20,7 +20,7 @@ alwaysApply: false
 - Validate JWT signature, expiration (`exp`), audience (`aud`), and issuer (`iss`) on every request.
 - Enforce algorithm in server config (`RS256` for asymmetric, `HS256` for symmetric). Reject `alg: none`.
 - Access tokens short-lived (15–60 min). Use opaque refresh tokens with rotation and revocation.
-- OAuth 2.0 with PKCE for SPAs and mobile clients. Never use implicit flow.
+- OAuth 2.1 (draft-ietf-oauth-v2-1) with mandatory PKCE for SPAs, mobile, and confidential clients. Implicit + password grants removed. Detail in `rules/hatch3r-api-versioning.md`; cross-reference `rules/hatch3r-auth-patterns.md`.
 - API keys identify applications, not users. Scope keys to specific services and rotate on schedule.
 - Authorization middleware checks permissions before handler execution. Fail closed on missing claims.
 - Apply principle of least privilege: scope tokens to the minimum required permissions.
@@ -47,13 +47,12 @@ alwaysApply: false
 ## Pagination Standards
-- Default to cursor-based pagination for ordered data with high write frequency or deep traversal.
-- Use offset-based pagination only for small, stable datasets where total count is cheap.
+- Default to cursor-based pagination with a stable composite sort key (e.g., `(created_at, id)` using ULID/UUIDv7 as deterministic tiebreaker).
+- Use offset-based pagination only for small, bounded, append-immutable datasets (e.g., admin reports) where total count is cheap.
 - Enforce a maximum page size server-side (e.g., 100 items). Ignore client requests exceeding the limit.
-- Return `nextCursor` (or `null`) in the envelope. Clients must not construct cursors—they are opaque.
-- Total count is optional: provide it only when the query cost is bounded. Use `hasMore` boolean otherwise.
+- Cursors are opaque base64 strings. Clients must not parse or construct them.
 - Reject deep offset pagination beyond a threshold (e.g., offset > 10000) to protect database performance.
-- Include pagination metadata in the envelope: `{ data, pagination: { nextCursor, hasMore, pageSize } }`.
+- Standard envelope: `{ data, page_info: { has_next_page, end_cursor }, links: { next } }`. Add `has_previous_page` + `start_cursor` for bidirectional cursor navigation (Relay-style). Total count is optional — provide only when the query cost is bounded.
 ## Request Security
@@ -175,3 +174,62 @@ Follow the IETF draft standard (`draft-ietf-httpapi-ratelimit-headers`) for rate
 - When the rate limit is exceeded, return HTTP `429 Too Many Requests` with `Retry-After` header.
 - Document rate limits per endpoint tier in the API specification (e.g., auth endpoints: 100/min, read endpoints: 1000/min, write endpoints: 500/min).
 - Apply rate limiting by authenticated identity (API key, user token) as the primary key. Fall back to IP-based limiting for unauthenticated endpoints.
+## Spec Floor: OpenAPI 3.1 / 3.2 and AsyncAPI 3.1.0
+- **OpenAPI 3.1** is the 2026 floor for REST APIs. OpenAPI 3.2.0 (Sep 2025) is emerging — use only when its features (deeper webhook coverage, `pathItems`) are required; tooling lag is real.
+- **AsyncAPI 3.1.0** (Jan 2026, non-breaking over 3.0) is the canonical spec for event-driven and webhook APIs. Define every event channel, message schema, and binding (Kafka/AMQP/MQTT/WebSocket/HTTP) in an AsyncAPI document.
+- Generate code, mocks, and reference docs from the spec — never hand-write client SDKs or hand-roll mocks when a spec exists. Use Prism (Stoplight) for OAS-driven mock servers.
+## Error Envelope: RFC 9457 Problem Details
+Every error response uses `Content-Type: application/problem+json` and the RFC 9457 shape, superseding any local `{code, message, details?}` convention:
+| Field | Required | Purpose |
+|-------|----------|---------|
+| `type` | yes | URI identifying the error class (e.g., `https://errors.example.com/validation`). Stable across versions. |
+| `title` | yes | Short human-readable summary. Constant per `type`. |
+| `status` | yes | HTTP status code as integer. Mirrors the response status. |
+| `detail` | yes | Human-readable explanation specific to this occurrence. May include user input. |
+| `instance` | yes | URI reference identifying this specific occurrence (e.g., a log/trace URL). |
+| `trace_id` (extension) | recommended | W3C Trace Context trace ID for observability correlation. Alternative: propagate via `traceparent` response header. |
+- Extension fields are allowed for machine-readable detail (e.g., `errors: [{ field, code }]` for validation).
+- Frameworks emitting Problem Details by default (Spring 6, ASP.NET 8, Hono): keep the default — do not override with a custom shape.
+## GraphQL: Trusted Documents
+- Production GraphQL endpoints accept **only** registered query IDs (Apollo Persisted Queries, Relay Persisted Queries, or Hive). Reject ad-hoc queries from web clients. Apollo's APQ is performance-only — use pre-registered trusted documents for security parity.
+- Annotate fields with `@cost` (per-field cost) and lists with `@listSize` (assumed size, slicing args, `requireOneSlicingArgument: true`) for query-cost analysis. Block requests exceeding the global cost budget.
+- Federation v2 (`@key`, `@shareable`, `@external`, `@inaccessible`, `@override`) is the floor — Federation 1.0 supergraphs are deprecated in GraphOS.
+## gRPC / Connect-RPC
+- **Connect-RPC** (`buf/connect-go|web|swift|kotlin`) is the 2026 recommendation for web-facing gRPC — replaces gRPC-Web; supports HTTP/1.1, HTTP/2, and the Connect protocol.
+- **REST transcoding:** add `google.api.http` annotations per AIP-127 to map RPC to HTTP/JSON; expose via Envoy `grpc_json_transcoder` or grpc-gateway.
+- **Removed fields:** reserve both the tag and the field name (`reserved 4; reserved "password";`) to block accidental reuse.
+- Mark deprecated fields/methods with `[deprecated = true]` and `// Deprecated:` comments; consumers see the deprecation in generated code.
+## tRPC / Hono RPC
+- Choose tRPC v11 or Hono RPC **only** when both ends are TypeScript in a closed monorepo with no external consumers. Types flow via TS inference — no codegen, no cross-language interop.
+- The moment a non-TS consumer or public SDK is required, switch to OpenAPI emission. **oRPC** bridges tRPC ergonomics with OpenAPI for cross-language use (Standard Schema: Zod/Valibot/ArkType).
+- All tRPC routers use Zod (or Standard Schema) for input/output validation and a structured error formatter mapping errors to RFC 9457 Problem Details on the HTTP boundary.
+## Breaking-Change CI Gate
+Every PR that touches an API contract runs the matching diff tool against the last shipped tag. Breaking changes block merge unless paired with an RFC 9745 Deprecation + RFC 8594 Sunset lifecycle (see `rules/hatch3r-api-versioning.md`) or a major-version bump.
+| Contract type | Path glob | Tool | Action |
+|---------------|-----------|------|--------|
+| OpenAPI | `**/openapi.{yaml,yml,json}`, `**/openapi/**` | `oasdiff breaking` (450+ rules, OAS 3.0 + 3.1) | PR comment + fail on `breaking` exit |
+| Protobuf | `**/*.proto` | `buf breaking` (default `FILE` rule set) | PR comment + fail on wire/source incompatibility |
+| GraphQL | `**/*.graphql`, `**/*.gql` | `graphql-inspector diff` | PR comment + fail on breaking-classified changes |
+The reviewer agent (`agents/hatch3r-reviewer.md` item 11 "Contract preservation") invokes the matching tool on every PR touching `**/api/**`, `**/proto/**`, or schema files — verifies both in-repo and cross-repo consumers.
+## Cross-References
+- Versioning, deprecation, idempotency, OAuth 2.1, webhook signing: `rules/hatch3r-api-versioning.md`.
+- Pact / Schemathesis / `can-i-deploy` gate: `rules/hatch3r-contract-testing.md`.
+- OAuth 2.1, DPoP, Resource Indicators, mTLS: `rules/hatch3r-auth-patterns.md`.

package/rules/hatch3r-api-versioning.md ADDED Viewed

@@ -0,0 +1,119 @@
+---
+id: hatch3r-api-versioning
+type: rule
+description: API versioning, deprecation lifecycle, and idempotency — RFC 9457 errors, RFC 9745 Deprecation header, RFC 8594 Sunset, OAuth 2.1, Idempotency-Key, semver vs CalVer for APIs
+scope: "**/api/**,**/openapi*,**/asyncapi*,**/*.proto,**/routes/**,**/handlers/**,**/controllers/**"
+tags: [implementation, devops]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# API Versioning, Deprecation & Idempotency
+Lifecycle rules for evolving REST, GraphQL, and gRPC contracts without breaking consumers. Pairs with `rules/hatch3r-api-design.md` (shape) and `rules/hatch3r-contract-testing.md` (verification).
+## Versioning Scheme
+- **Semver (`v1`, `v2`)** for stable contracts and published SDKs: major version goes in the URL prefix (`/v2/users`) for breaking changes; minor and patch updates are field-level additive and stay within the same major.
+- **CalVer (`2026-05-15`)** for rolling APIs that change frequently (Stripe-style date-based versioning): clients pin a date via `Stripe-Version` / `Api-Version` header; the server transforms newer responses down to the pinned date.
+- Pick one scheme per API and document it in the spec — never mix semver and CalVer in the same surface.
+- Within a major (semver) or pinned date (CalVer), only additive changes ship: new fields, new endpoints, new enum values (when consumers handle `unknown` fallback).
+## Deprecation Lifecycle (RFC 9745 + RFC 8594)
+Every retirement passes three stages. Skipping any stage breaks downstream consumers.
+| Stage | Header(s) | Action |
+|-------|-----------|--------|
+| Announce | `Deprecation: @<unix-ts>` (RFC 9745, Mar 2025 Standards Track) + `Link: <https://docs.example.com/migrate/v2>; rel="deprecation"` | Emit on every deprecated endpoint/field response; log usage by `X-Api-Key` / authenticated identity. |
+| Sunset | `Sunset: <HTTP-date>` (RFC 8594) | Set N months before removal: N=6 public, N=3 partner, N=1 internal. Pair with continued `Deprecation` header. |
+| Remove | HTTP 410 Gone + Problem Details (`type: https://errors.example.com/sunset`, `title: Endpoint removed`) | Only after the sunset date passes AND analytics show <0.1% usage for two consecutive 7-day windows. |
+- Field-level deprecation: emit `Deprecation` per endpoint that still returns the field; GraphQL uses `@deprecated(reason: "...")`; protobuf uses `[deprecated = true]`.
+- Document the migration in the `Link: ...rel="deprecation"` target — include before/after code samples, fallback strategy, and the removal date.
+## Idempotency-Key (draft-ietf-httpapi-idempotency-key-header)
+Every side-effectful endpoint (POST / PATCH / PUT / DELETE that mutates state) accepts an `Idempotency-Key: <client-generated-uuid>` request header.
+- **Key format:** UUIDv4 or ULID, max 255 chars. Clients generate; servers never accept server-generated keys.
+- **Storage:** server persists `(tenant_id, api_key, idempotency_key, request_body_hash, response_status, response_body)` for 24 hours (configurable per API; Stripe uses 24h, recommended range 24h-7d).
+- **Replay semantics:**
+  - Exact replay (same key + same request body hash) -> return the original response (same status, same body, same `Idempotency-Replay: true` header).
+  - Conflicting replay (same key + different request body hash) -> 422 with Problem Details `type: https://errors.example.com/idempotency-conflict`.
+  - In-flight replay (same key, original still processing) -> 409 Conflict + `Retry-After: <seconds>`.
+- **Scope:** per `(tenant_id, api_key)`, never global. Hashing the body prevents replay-with-mutation attacks.
+## Rate Limit Headers (draft-ietf-httpapi-ratelimit-headers)
+Emit structured-field headers on every response (not just 429):
+| Header | Value | Purpose |
+|--------|-------|---------|
+| `RateLimit-Policy` | `100;w=60` | Quota + window length (seconds). |
+| `RateLimit-Limit` | `100` | Maximum requests in current window. |
+| `RateLimit-Remaining` | `42` | Remaining quota in current window. |
+| `RateLimit-Reset` | `30` | Seconds until the window resets. |
+- On 429, also include `Retry-After: <seconds>` (RFC 9110).
+- Vendor-specific `X-RateLimit-*` headers may be emitted alongside for backward compatibility, but the IETF draft headers are the floor.
+## OAuth 2.1 (draft-ietf-oauth-v2-1)
+- **PKCE mandatory on all clients** (public and confidential) — `code_challenge` + `code_challenge_method=S256`.
+- **Implicit and password (ROPC) grants removed.** Use authorization code + PKCE for interactive; client credentials for server-server; device code for input-constrained.
+- **Exact redirect-URI matching** — no wildcards, no partial paths. Pre-register each callback URL.
+- **Refresh-token rotation with reuse detection** — on every refresh, issue a new refresh token and invalidate the old one. If the old token is presented after rotation, revoke the entire token family (replay attack signal).
+- Access tokens short-lived (5-15 min). Long-lived refresh tokens, sender-constrained (DPoP or mTLS) for public clients.
+- Full auth detail: `rules/hatch3r-auth-patterns.md`.
+## Resource Indicators (RFC 8707)
+- Token requests include a `resource=<uri>` parameter naming the protected resource the token will be presented to.
+- Authorization servers bind the token's `aud` claim to the requested resource — prevents audience-confusion attacks across multi-tenant AS deployments.
+- Multi-resource consent flows pass multiple `resource` params; the AS issues separate tokens per audience.
+## DPoP (RFC 9449)
+- Sender-constrained access and refresh tokens for browser-resident clients (SPAs) — alternative to mTLS for public clients.
+- Client generates a key pair (non-extractable in the browser via WebCrypto), signs a fresh DPoP JWT per request, sends in `DPoP: <jwt>` header.
+- Server binds the access token's `cnf.jkt` claim (JWK thumbprint) to the client's public key; rejects token presentation from any other key.
+- Recommended floor for any token leaving the first-party boundary (third-party SDK consumers, mobile apps with insecure storage).
+## Webhook Signing (Standard Webhooks convention)
+- Headers: `webhook-id` (unique delivery ID), `webhook-timestamp` (unix seconds), `webhook-signature` (`v1,<base64-HMAC-SHA256(<id>.<timestamp>.<body>)>`).
+- HMAC-SHA256 with a per-tenant shared secret, constant-time comparison.
+- Reject deliveries where `|now - webhook-timestamp| > 300` seconds (5-minute replay window).
+- `webhook-id` doubles as the idempotency key on the receiver — cache for >=5 minutes (Redis/SQLite) to dedupe duplicate deliveries.
+- **Key rotation:** signature spec supports comma-separated versions (`v1,sig1 v2,sig2`); receivers accept any valid version during the rollover window.
+- Deliveries that fail signature verification return HTTP 401 + Problem Details `type: https://errors.example.com/invalid-signature`.
+## Backward Compatibility Policy
+- **Field additions** are always safe.
+- **Field removal or rename** requires the full RFC 9745 + RFC 8594 lifecycle (Announce -> Sunset -> Remove). Never remove without the lifecycle.
+- **Enum extension** is safe only when consumers handle an `unknown` fallback value. Generated client code (OpenAPI Generator, openapi-typescript) must emit the `unknown` branch.
+- **Nullable transitions** (required -> optional, or non-null -> nullable) require a feature flag and a deprecation window — clients written against the stricter shape will break on `null`.
+- **Type changes** (e.g., `int32` -> `int64` in protobuf, `string` -> `string | null` in OpenAPI) require a new major version unless the wire format is unchanged (protobuf int32->int64 is wire-compatible for small values; document the edge case).
+## CI Gates
+Every PR touching an API contract runs the matching diff tool against the last shipped tag. Breaking changes block merge.
+- `oasdiff breaking` (OpenAPI 3.0 + 3.1, 450+ rules, CNCF Sandbox) — comments on the PR with the breaking-change list.
+- `buf breaking` (Protobuf, default `FILE` rule set) — blocks wire + source incompatibility.
+- `graphql-inspector diff` — blocks GraphQL schema breaking changes.
+All three integrate with PR comments and fail the build on a breaking exit code. Cross-reference the gate set in `rules/hatch3r-api-design.md` "Breaking-Change CI Gate".
+## References
+- RFC 9457 — Problem Details for HTTP APIs (March 2025, obsoletes RFC 7807).
+- RFC 9745 — The Deprecation HTTP Response Header Field (March 2025, Standards Track).
+- RFC 8594 — The Sunset HTTP Response Header Field.
+- RFC 8707 — Resource Indicators for OAuth 2.0.
+- RFC 9449 — OAuth 2.0 Demonstrating Proof of Possession (DPoP).
+- draft-ietf-oauth-v2-1 — The OAuth 2.1 Authorization Framework.
+- draft-ietf-httpapi-idempotency-key-header — The Idempotency-Key HTTP Header Field.
+- draft-ietf-httpapi-ratelimit-headers — RateLimit Header Fields for HTTP.
+- Standard Webhooks — https://github.com/standard-webhooks/standard-webhooks