npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/agents/shared/user-question-protocol.md CHANGED Viewed

@@ -6,11 +6,11 @@ tags: [shared, ux, p1, p4]
 cache_friendly: true
 ---
-> Last updated: 2026-05-26
 ## Purpose
-This protocol defines how hatch3r agents and commands surface clarifying or triage questions to the user across the 15 supported AI coding platforms. It is the single source of truth for the *how* of asking; the *whether* is governed by [quality-charter §3 "Question Unclear Requirements"](./quality-charter.md) and §8 "Escalate Ambiguity Early". Files that reference this protocol: the requirements-elicitation mode (`agents/modes/requirements-elicitation.md`), the five ASK-checkpoint commands, and the four ask-prone agents — researcher, fixer, architect, implementer.
+> Last updated: 2026-06-09
+This protocol defines how hatch3r agents and commands surface clarifying or triage questions to the user across the 3 supported AI coding tools (Claude Code, Cursor, GitHub Copilot per `governance/CONSTITUTION.md` §6 Decision 12). It is the single source of truth for the *how* of asking; the *whether* is governed by [quality-charter §3 "Question Unclear Requirements"](./quality-charter.md) and §8 "Escalate Ambiguity Early". Coverage is a 100% floor, not a fixed file list: every framework-dev workflow that can mutate canonical artifacts routes its ASK through this protocol — the requirements-elicitation mode (`agents/modes/requirements-elicitation.md`), the shared §0 gate block (`agents/shared/clarification-default-block.md`), and every `agents/hatch3r-*.md` agent and `commands/hatch3r-*.md` command that detects ambiguity (counts: `governance/inventory.json` `counts.agents`, `counts.commands`, `counts.skills`). The "3 supported AI coding tools" figure above is drift-guarded against `inventory.json` `counts.adapters` by `scripts/inventory.ts` (`npm run inventory:check-docs`).
 ## When To Ask
@@ -19,6 +19,7 @@ This protocol defines how hatch3r agents and commands surface clarifying or tria
 - **Branching path** — two or more viable approaches with materially different cost, scope, or risk.
 - **Conflicting constraints** — requirements that cannot all hold (e.g., "no new dependencies" and "use library X").
 - **Missing acceptance criteria** — no testable definition of done for the requested change.
+- **Architectural premise concern** — request is well-specified and single-interpretation, but the chosen approach is architecturally misguided (wrong pattern for the constraint, mis-applied abstraction, foreseeable scaling failure). Surface the concern as a §0.5 Challenge the Premise question per quality-charter §3 — phrase it constructively ("Before I implement this, I want to confirm the approach because [specific concern]"), then offer 2-4 options (proceed as requested / proposed alternative / hybrid). Default-if-no-response: proceed as requested (lowest-blast-radius assumption is that the user has context the agent lacks).
 ## When NOT To Ask
@@ -42,6 +43,8 @@ The marker below is replaced at canonical-write time with the enumeration table
 When viewing this file in the source repo (pre-generation), the marker is unsubstituted — refer to the adapter map in `src/pipeline/adapterToolTranslator.ts` for the same mappings.
+**Sub-agent caveat (Claude Code).** The native `AskUserQuestion` tool is a main-agent / orchestrator affordance only. Claude Code filters it out of every Task-tool sub-agent context (foreground and background) regardless of the agent's `tools` declaration, so a spawned `hatch3r-*` sub-agent cannot call it (upstream-confirmed via `anthropics/claude-code` issues #18721, #12890, #34592; verified 2026-06-06 @ https://code.claude.com/docs/en/sub-agents). A sub-agent that hits an ASK trigger therefore does NOT use the native tool: it RETURNS Status `BLOCKED_AMBIGUITY` (`agents/shared/quality-charter.md` §17) with the question rendered via the Plain-Text Fallback Template below, and the orchestrator owns the live ASK (`agents/shared/clarification-default-block.md` → Protocol). This exclusion is re-verified each audit cycle against the date stamp on `src/pipeline/adapterToolTranslator.ts::ASK_USER_TOOLS` (`claude` entry) — a date drift there is a D09 Medium finding.
 ## Plain-Text Fallback Template
 Use this exact shape when no native tool is available:
@@ -63,6 +66,19 @@ Rules for the template:
 - The default-if-no-response line is mandatory — it removes the deadlock when the user is away or replies "you decide".
 - The default option is the safest reversible choice, not the most ambitious one.
+### Optional preview attachment (orchestrator-scoped, platform-conditional)
+For questions whose options differ along a **visual or layout dimension** — a color/spacing/typography choice, two candidate component arrangements, a copy-tone A/B, a before/after of an async-view state — a rendered preview alongside the numbered options lets the user decide from the artifact instead of from prose. This is most useful for the UI (CQ1) and UX (CQ2) ASK gates, where "which of these reads better" is the decision.
+Attach a preview only when BOTH hold:
+- **You are the orchestrator** (main-conversation `commands/hatch3r-*.md`), not a Task-tool sub-agent. Sub-agents cannot call the native question tool at all (see the Sub-agent caveat above); they render the question — and any preview snippet — in the `BLOCKED_AMBIGUITY` structured result, and the orchestrator owns the live ASK.
+- **The runtime's native question tool supports rich/rendered option content.** Capability is per-platform; look up your runtime's row in the adapter map (`src/pipeline/adapterToolTranslator.ts::ASK_USER_TOOLS`) before relying on a preview, exactly as you would for the question tool itself. When the platform's native tool is text-only (or absent), embed the preview as a fenced code block inside the Plain-Text Fallback Template instead — never assume a rendering affordance the platform row does not document.
+**Concrete affordance (Claude Code orchestrator).** On the `claude` platform, populate the per-option `markdown` field of the `AskUserQuestion` tool: when any option carries a `markdown` value, Claude Code switches to a side-by-side preview layout (numbered options on the left, the rendered markdown on the right), so a diagram, code/diff block, or token-swatch table renders inline with the choice. The field accepts markdown only (no HTML), and long content is truncated to a scrollable panel — keep each option's preview to about one screen of markup. One documented constraint: supplying a `markdown` field suppresses the free-text "Other / Type something" option on that question, so reserve the preview layout for closed-option visual decisions. Other platforms expose no documented preview field (their `ASK_USER_TOOLS` row is `null` — `cursor`, `copilot` as of 2026-06-09); on those, fall back to the fenced-code-block-in-plain-text shape above.
+The preview is an enrichment, not a replacement: the numbered options and the mandatory `Default if no response:` line are still required. Keep the preview small (one screen of markup or a single mock) so it does not bury the decision.
 ## Examples
 **Example 1 — Ambiguous requirement.** Request: "Add caching to the user profile endpoint."
@@ -88,6 +104,23 @@ Default if no response: 2
 Default if no response: 2
 ```
+## Operationalising Default-if-no-Response
+The "Default if no response" line is mandatory in every plain-text fallback question (per the rules above) and is the canonical deadlock-breaker for every ASK gate across the framework. To operationalise the default at runtime rather than leaving it as documented prose:
+1. **Detect non-response.** If a question goes unanswered within the host runtime's question window (Claude Code: idle session timeout; Cursor: AskUserQuestion timeout; Copilot Workspace: prompt-cycle gap), or if the user replies "you decide" / "default" / empty, treat as non-response.
+2. **Apply the safe default.** Pick the option declared on the `Default if no response: <option number>` line — the lowest-blast-radius reversible choice the question's author named.
+3. **Log the default-taken decision.** Emit in the Iteration Summary §8 (Open Questions / Blockers) a single line: `Default applied: <question summary> → option <N> (<one-line reason>)`. This is the operational counterpart to the prose mandate — every agent / command ASK output that exercises the default MUST log the decision. The §8 line is a required field of the canonical Iteration Summary template (`rules/hatch3r-iteration-summary.md` → The 9 Sections, item 8), and the catching-gate ownership is named in `rules/hatch3r-clarification-default.md` → How to ask — those two files are the enforcing surface for this step.
+4. **Never silent-pick.** If no `Default if no response: <option>` line was emitted with the question (an authoring bug per the rules in this file), return `BLOCKED_AMBIGUITY` in the structured result rather than guessing.
+The §8 log is the audit-visible evidence that the default-if-no-response contract was honored; absence of the log when a default was applied is a P8 B1 gate failure. Runtime emission of the §8 line is orchestrator-produced interpreted markdown, so no static gate can verify it fired — D05 (prompt-engineering) and D13 (human-AI collaboration) audit-cycle spot checks plus the per-run Iteration Summary validation gate are the enforcement, not a compiled check.
+The contract has a single named owner so it is not re-declared per command: the always-on, `precedence: high` rule `rules/hatch3r-clarification-default.md` (`scope: always`) binds every `commands/hatch3r-*.md` orchestrator and mutating skill corpus-wide, and that rule's "How to ask" section is the catching gate. An individual command body therefore need not repeat the default-handling vocabulary to be bound by it — the rule + this protocol + the Iteration Summary §8 template are the three-anchor owner set, and a command inherits the contract by being in scope. Treat a command that *does* restate it as a convenience, not the source of truth.
+## Cross-Phase Aggregation
+This protocol defines the *shape* of a single question (numbered options, mandatory default). It does not define where pending questions accumulate when several fire across one pipeline run. That cross-phase aggregation layer is the `PipelineContext.pendingUserInputs: PendingUserInput[]` field (`src/pipeline/pipelineContext.ts`, Finding D7-SA7.1-F-10): each phase pushes a `PendingUserInput` — whose `options` + `defaultIfNoResponse` mirror the Plain-Text Fallback Template above — instead of emitting a direct prompt mid-phase. The orchestrator drains the array between phases, paginating when more than three accumulate, so a Tier 3 run's multiple ASK checkpoints are batched rather than each rendered independently. Per-question UX (this file) and cross-phase batching (the field) are complementary: author each request to this template, enqueue it on the field.
 ## Anti-Patterns
 - **Multi-question barrage** — asking five questions in one turn. Ask the highest-leverage one first; the answer often collapses the rest.
@@ -95,3 +128,12 @@ Default if no response: 2
 - **Silent assumption** — proceeding when ambiguity is real. Apply quality-charter §8: log the ambiguity in structured output even if you decide to proceed under a default.
 - **Echo-as-question** — restating the user's request back as a question ("So you want me to add caching?"). Confirm only when you have a specific decision point with options to offer.
 - **Inflated default** — choosing the most disruptive option as the no-response default. Defaults must be the reversible, lowest-blast-radius choice.
+## References
+The `markdown`-field preview affordance documented in "Optional preview attachment" above is corroborated by:
+- `anthropics/claude-code` issue #27348 — names the `markdown` field on `AskUserQuestion` options as the trigger for the preview layout and the "Other / Type something" suppression constraint (accessed 2026-06-09; trust tier: official-vendor issue tracker). https://github.com/anthropics/claude-code/issues/27348
+- `anthropics/claude-code` issue #33062 — documents the side-by-side preview panel and its scroll/truncation behavior for long content (accessed 2026-06-09; trust tier: official-vendor issue tracker). https://github.com/anthropics/claude-code/issues/33062
+Per-platform tool names and the `null`-means-no-native-tool convention are sourced in `src/pipeline/adapterToolTranslator.ts::ASK_USER_TOOLS` (each entry carries its own `// verified <date> @ <docs URL>` stamp, refreshed on the D09 per-cycle web-research pass).

package/dist/content/checks/README.md CHANGED Viewed

@@ -1,3 +1,8 @@
+---
+id: checks-readme
+type: documentation
+description: Authoring guide and directory contract for the checks/ review-criteria files. Not a check itself — excluded from check enumeration by its documentation type.
+---
 # Checks
 Review criteria definitions for automated and agent-assisted code review. Each check file defines a set of criteria that agents reference when reviewing code changes.

package/dist/content/checks/accessibility.md CHANGED Viewed

@@ -1,19 +1,20 @@
 ---
 id: accessibility
 type: check
-description: Accessibility review criteria covering WCAG compliance, semantic HTML, keyboard navigation, screen reader support, and inclusive design patterns
+description: Accessibility review criteria covering WCAG 2.2 AA compliance, semantic HTML, keyboard navigation, screen reader support, and inclusive design patterns
+tags: [accessibility]
 cache_friendly: true
 ---
 # Accessibility Check
-> **Severity vocabulary:** see [governance/audit/templates/severity-mapping.md](../governance/audit/templates/severity-mapping.md) for canonical 5-column mapping.
+> **Severity vocabulary:** see [agents/shared/severity-mapping.md](../agents/shared/severity-mapping.md) for canonical 5-column mapping.
 Review criteria for evaluating accessibility in pull requests.
 ## Semantic HTML and ARIA
 - `[CRITICAL]` Interactive elements use native HTML controls (`<button>`, `<a>`, `<input>`, `<select>`) rather than styled `<div>` or `<span>` elements with click handlers.
-- `[CRITICAL]` Custom interactive components have appropriate ARIA roles, states, and properties (`role`, `aria-expanded`, `aria-selected`, `aria-disabled`, etc.).
+- `[CRITICAL]` Custom interactive components carry ARIA roles, states, and properties that match the WAI-ARIA 1.2 Authoring Practices pattern for the widget type (`role`, `aria-expanded`, `aria-selected`, `aria-disabled`, etc.); axe-core 4.5+ reports 0 `aria-required-attr` / `aria-allowed-role` violations.
 - `[CRITICAL]` Images have meaningful `alt` text, or `alt=""` and `aria-hidden="true"` if purely decorative.
 - `[CRITICAL]` Form inputs have associated `<label>` elements (via `for`/`id` or nesting). No input relies solely on placeholder text for identification.
 - `[RECOMMENDED]` Headings follow a logical hierarchy (`h1` > `h2` > `h3`) without skipping levels.
@@ -21,22 +22,28 @@ Review criteria for evaluating accessibility in pull requests.
 ## Keyboard Navigation
-- `[CRITICAL]` All interactive elements are reachable and operable via keyboard (Tab, Shift+Tab, Enter, Space, Arrow keys as appropriate).
+- `[CRITICAL]` All interactive elements are reachable and operable via keyboard (Tab, Shift+Tab, Enter, Space; Arrow keys for composite widgets per the WAI-ARIA APG keyboard-interaction table for that widget — menu, listbox, tablist, grid).
 - `[CRITICAL]` Focus is not trapped in a component unless it is a modal dialog with an explicit close mechanism.
 - `[CRITICAL]` Custom keyboard shortcuts do not conflict with screen reader or browser shortcuts.
 - `[RECOMMENDED]` Focus order follows the visual reading order (logical DOM order). No use of positive `tabindex` values.
-- `[RECOMMENDED]` Focus indicators are visible and meet contrast requirements. No `outline: none` without a custom visible focus style.
+- `[CRITICAL]` WCAG 2.2 SC 2.4.7 Focus Visible (AA): the keyboard focus indicator is visible on every operable element. No `outline: none` without a conforming custom focus style.
+- `[CRITICAL]` WCAG 2.2 SC 2.4.11 Focus Not Obscured (Minimum) (AA): the focused element is not entirely hidden by author-created content (sticky headers, footers, cookie banners, overlays). `[RECOMMENDED]` SC 2.4.13 Focus Appearance (AAA) as the enhanced target: the focus indicator has a contrast ratio of at least 3:1 against adjacent colors and a minimum area equal to a 1px-thick perimeter (or 4px-thick along the shortest side).
 ## Visual Design and Color
-- `[CRITICAL]` Text meets WCAG 2.1 AA contrast ratios: 4.5:1 for normal text, 3:1 for large text (18px+ or 14px+ bold).
+- `[CRITICAL]` Text meets WCAG 2.2 AA contrast ratios: 4.5:1 for normal text, 3:1 for large text (18px+ or 14px+ bold). Verify with axe-core 4.5+ `color-contrast` rule — 0 violations.
 - `[CRITICAL]` Information is not conveyed by color alone. Status indicators, errors, and required fields use icons, text, or patterns in addition to color.
 - `[CRITICAL]` UI remains functional and readable at 200% browser zoom without horizontal scrolling or content clipping.
-- `[RECOMMENDED]` Touch targets are at least 44x44 CSS pixels for mobile interfaces.
 - `[RECOMMENDED]` Animations respect the `prefers-reduced-motion` media query — reduce or remove motion for users who have requested it.
+## Touch and Pointer Targets (WCAG 2.2)
+- `[CRITICAL]` WCAG 2.2 SC 2.5.8 Target Size (Minimum): pointer targets are at least 24x24 CSS px, OR have 24px of spacing to adjacent targets, unless an inline/essential exception applies. Mobile interfaces target 44x44 CSS px (Apple HIG / Material).
+- `[CRITICAL]` WCAG 2.2 SC 2.5.7 Dragging Movements: any drag operation (slider, drag-to-reorder, map pan) provides a single-pointer alternative that does not require dragging (tap, click, or button control).
 ## Screen Reader Support
+- `[CRITICAL]` WCAG 2.2 SC 4.1.3 Status Messages: status messages (success/error/progress, search-result counts, loading state) are programmatically conveyed via `role="status"`, `role="alert"`, or an `aria-live` region without moving focus, so assistive tech announces them.
 - `[CRITICAL]` Dynamic content updates (toast notifications, live regions, inline validation) use `aria-live` regions (`polite` or `assertive`) to announce changes.
 - `[CRITICAL]` Modal dialogs trap focus, announce their title via `aria-labelledby`, and return focus to the trigger element on close.
 - `[CRITICAL]` Icon-only buttons and links have accessible names via `aria-label`, `aria-labelledby`, or visually hidden text.

package/dist/content/checks/code-quality.md CHANGED Viewed

@@ -6,7 +6,7 @@ cache_friendly: true
 ---
 # Code Quality Check
-> **Severity vocabulary:** see [governance/audit/templates/severity-mapping.md](../governance/audit/templates/severity-mapping.md) for canonical 5-column mapping.
+> **Severity vocabulary:** see [agents/shared/severity-mapping.md](../agents/shared/severity-mapping.md) for canonical 5-column mapping.
 Review criteria for evaluating code quality in pull requests.

package/dist/content/checks/performance.md CHANGED Viewed

@@ -2,11 +2,14 @@
 id: performance
 type: check
 description: Performance review criteria covering bundle size, render performance, memory usage, network optimization, database queries, and runtime efficiency
+tags: [performance]
 cache_friendly: true
 ---
 # Performance Check
-> **Severity vocabulary:** see [governance/audit/templates/severity-mapping.md](../governance/audit/templates/severity-mapping.md) for canonical 5-column mapping.
+> **Severity vocabulary:** see [agents/shared/severity-mapping.md](../agents/shared/severity-mapping.md) for canonical 5-column mapping.
+**Applies when:** the project declares performance budgets. Without declared budgets this check is advisory, not gating (per `rules/hatch3r-right-sizing.md`).
 Review criteria for evaluating performance in pull requests.
@@ -14,7 +17,7 @@ Review criteria for evaluating performance in pull requests.
 - `[CRITICAL]` New dependencies do not increase the total bundle size (gzipped) beyond the defined budget. Measure before and after.
 - `[CRITICAL]` No unintentional import of full libraries when a subpath import or tree-shakable alternative exists (e.g., `import _ from "lodash"` vs `import groupBy from "lodash/groupBy"`).
-- `[RECOMMENDED]` Images and static assets are optimized (compressed, appropriately sized, modern formats like WebP/AVIF).
+- `[RECOMMENDED]` Images and static assets are compressed and served in WebP or AVIF, with intrinsic dimensions no larger than the maximum rendered size at 2x device pixel ratio (no downscaling a >2x-oversized source in the browser).
 - `[RECOMMENDED]` CSS and JavaScript are minified and dead-code-eliminated in production builds.
 ## Render and Paint Performance
@@ -35,7 +38,7 @@ Review criteria for evaluating performance in pull requests.
 - `[CRITICAL]` No N+1 request patterns — batch or aggregate related requests instead of issuing one per item.
 - `[CRITICAL]` API response payloads return only required fields. No over-fetching of large objects when a subset is needed.
-- `[RECOMMENDED]` Cacheable responses include appropriate cache headers (`Cache-Control`, `ETag`). Client-side caching (query cache, service worker) is used where applicable.
+- `[RECOMMENDED]` Cacheable responses set `Cache-Control` with an explicit `max-age` (immutable static assets `max-age=31536000, immutable`; mutable API responses `no-cache` or a documented TTL) plus an `ETag` or `Last-Modified` validator. Responses that mutate state or carry per-user data set `Cache-Control: private` or `no-store`.
 - `[RECOMMENDED]` Request waterfalls are minimized — parallelize independent requests and preload critical resources.
 ## Database Query Performance
@@ -49,7 +52,7 @@ Review criteria for evaluating performance in pull requests.
 ## Runtime Performance
 - `[CRITICAL]` No synchronous blocking operations (heavy computation, synchronous I/O) on the main thread or event loop.
-- `[CRITICAL]` Hot-path code (called per-request, per-frame, or per-event) does not perform redundant computation — memoize or cache where appropriate.
+- `[CRITICAL]` Hot-path code (called per-request, per-frame, or per-event) does not recompute a pure result whose inputs are unchanged since the last call — memoize or cache any such repeated pure computation, and bound the cache with an eviction policy (size cap or TTL).
 - `[RECOMMENDED]` CPU-intensive work is offloaded to workers, background jobs, or streaming pipelines.
 - `[RECOMMENDED]` Object allocation in tight loops is minimized — reuse buffers and avoid creating short-lived objects per iteration.

package/dist/content/checks/security.md CHANGED Viewed

@@ -6,7 +6,7 @@ cache_friendly: true
 ---
 # Security Check
-> **Severity vocabulary:** see [governance/audit/templates/severity-mapping.md](../governance/audit/templates/severity-mapping.md) for canonical 5-column mapping.
+> **Severity vocabulary:** see [agents/shared/severity-mapping.md](../agents/shared/severity-mapping.md) for canonical 5-column mapping.
 Review criteria for evaluating security posture in pull requests.
@@ -20,10 +20,10 @@ Review criteria for evaluating security posture in pull requests.
 ## Authentication and Authorization
-- `[CRITICAL]` New endpoints have appropriate authentication guards. No accidental public exposure of protected resources.
+- `[CRITICAL]` New endpoints enforce authentication and resource-level authorization per an OAuth 2.1 / RBAC rubric — every non-public route rejects unauthenticated requests (401) and out-of-scope authenticated requests (403). No accidental public exposure of protected resources.
 - `[CRITICAL]` Authorization checks verify the authenticated user has access to the specific resource, not just that they're logged in.
 - `[CRITICAL]` Authentication tokens are not logged, included in URLs, or exposed in error messages.
-- `[RECOMMENDED]` Session tokens use secure attributes: `HttpOnly`, `Secure`, `SameSite=Strict`, appropriate `Max-Age`.
+- `[RECOMMENDED]` Session tokens use secure attributes: `HttpOnly`, `Secure`, `SameSite=Strict`, and a `Max-Age` no longer than the session policy (default ≤24h for access tokens, ≤30d for refresh tokens).
 - `[RECOMMENDED]` Rate limiting is applied to authentication endpoints (login, password reset, OTP verification).
 ## Secrets and Credentials
@@ -38,8 +38,8 @@ Review criteria for evaluating security posture in pull requests.
 - `[CRITICAL]` New dependencies are from trusted sources with active maintenance (recent commits, multiple maintainers).
 - `[CRITICAL]` No known critical or high vulnerabilities in new or updated dependencies (`npm audit`, `pip audit`, etc.).
-- `[RECOMMENDED]` Dependency count increase is justified — prefer standard library solutions when adequate.
-- `[RECOMMENDED]` New dependencies have appropriate licenses compatible with the project.
+- `[RECOMMENDED]` Each added runtime dependency is justified in the PR description; a standard-library or already-present-dependency equivalent that covers the same use case is preferred over a new transitive dependency tree.
+- `[RECOMMENDED]` New dependencies carry an OSI-approved license compatible with the project license (no GPL/AGPL copyleft in a permissively-licensed product unless legal-approved).
 ## Data Exposure
@@ -58,4 +58,4 @@ Review criteria for evaluating security posture in pull requests.
 ## Error Handling
 - `[CRITICAL]` Error responses to clients do not include stack traces, internal paths, or database details.
-- `[RECOMMENDED]` Security-relevant errors (auth failures, permission denials) are logged with sufficient context for incident investigation.
+- `[RECOMMENDED]` Security-relevant errors (auth failures, permission denials) are logged with the five fields an incident responder needs — timestamp, actor/subject identifier, action attempted, resource, and outcome — and never the secret or credential that was rejected.

package/dist/content/checks/testing.md CHANGED Viewed

@@ -6,7 +6,7 @@ cache_friendly: true
 ---
 # Testing Check
-> **Severity vocabulary:** see [governance/audit/templates/severity-mapping.md](../governance/audit/templates/severity-mapping.md) for canonical 5-column mapping.
+> **Severity vocabulary:** see [agents/shared/severity-mapping.md](../agents/shared/severity-mapping.md) for canonical 5-column mapping.
 Review criteria for evaluating test coverage and quality in pull requests.

package/dist/content/commands/board/pickup-delegation-multi.md CHANGED Viewed

@@ -63,9 +63,11 @@ After the shared epic-level research, score each sub-issue individually and run
 ### 6b.3. Execute Level-by-Level With Parallel Sub-Agents
+Worktree isolation applies here identically to the batch path: when ≥2 implementers run concurrently in a level on a Tier 2/3 run and the platform writes into the orchestrator's tree, isolate each implementer per **Step 6c.3-iso** (`--isolate=auto|on|off`, default `auto`) and integrate via the merge protocol in Step 6b.4. Sub-issues in an epic share file overlap more often than standalone batch issues, so the missed-overlap risk that isolation removes is higher here.
 For each dependency level, starting at Level 1:
-1. **Spawn one implementer sub-agent per sub-issue in the current level.** Use the Task tool with `subagent_type: "generalPurpose"`. Launch as many sub-agents concurrently as the platform supports.
+1. **Spawn one implementer sub-agent per sub-issue in the current level.** Use the Task tool with `subagent_type: "generalPurpose"`. Launch as many sub-agents concurrently as the platform supports. When worktree isolation is active (per the note above), create one scratch worktree per implementer first and pin each sub-agent to its `.worktrees/pickup-iso-<sub-issue-number>/` path.
 2. **Each sub-agent prompt must include:**
    - The sub-issue number, title, full body, and acceptance criteria.
@@ -82,13 +84,15 @@ For each dependency level, starting at Level 1:
    - Relevant learnings from `.hatch3r/learnings/` (from Step 6.pre).
    - Instruction to use GitHub MCP for issue reads, and follow the project's tooling hierarchy for external knowledge augmentation.
    - Explicit instruction: do NOT create branches, commits, or PRs.
+   - `correlation_id` (UUID v4 per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID; each epic sub-issue gets its own id).
    - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 3. **Await all sub-agents in the current level.** Collect their structured results (files changed, tests written, issues encountered).
 4. **Review sub-agent results:**
    - If any sub-agent reports BLOCKED or PARTIAL, **ASK** the user how to proceed (skip, fix manually, retry).
-   - If sub-agents modified overlapping files, review for conflicts and resolve before proceeding.
+   - **When worktree isolation was active for this level** (Step 6c.3-iso, applied per the 6b.3 note): integrate each scratch worktree's file diff back onto the branch via the Step 6b.4 conflict-resolution step, then run `npx hatch3r worktree-cleanup --yes` before advancing.
+   - If sub-agents modified overlapping files (or the isolated integration above surfaced an overlap Step 3 missed), review for conflicts and resolve before proceeding.
 5. **Advance to the next dependency level.** Repeat steps 1-4 until all levels are complete.
@@ -97,7 +101,7 @@ For each dependency level, starting at Level 1:
 After all sub-agents complete:
 1. Run a combined quality check across all changes.
-2. Resolve any cross-sub-issue integration issues.
+2. Resolve any cross-sub-issue integration issues — when isolation ran, this is where each worktree's diff lands on the branch; physically disjoint writes apply cleanly and only Step-3-missed overlaps need manual resolution.
 3. Verify no file conflicts between parallel sub-agent outputs.
 ---
@@ -131,11 +135,31 @@ Unlike epics (which share a single researcher), standalone issues in a batch are
 3. **Await all researchers.** Collect structured outputs. Each researcher's output feeds exclusively into its corresponding implementer in Step 6c.3. For Tier 2/3 issues, present elicitation questions to the user and await answers before proceeding.
+### 6c.3-iso. Optional Worktree Isolation (Parallel Implementers, Filesystem Platforms)
+Step 3 collision detection predicts file overlap and moves overlapping issues to sequential levels. That prediction is fallible: a missed overlap (Step 3.4) puts two implementers into the orchestrator's single working tree, where concurrent writes to the same file silently clobber each other before the Step 6c.4 post-hoc merge can run. Worktree isolation converts the *predicted* disjointness into *physical* disjointness — each implementer writes into its own `git worktree`, so two implementers cannot touch the same on-disk file even when Step 3 missed the overlap. The existing Step 6c.4 merge protocol becomes the integration step plus the residual-conflict handler for whatever Step 3 missed.
+**Isolation gate — `--isolate=auto|on|off` (default `auto`).** Under `auto`, isolate this level when ALL three hold:
+1. **Concurrency:** ≥2 implementer sub-agents are dispatched into this level (a single implementer has no peer to collide with — skip isolation).
+2. **Tier:** the run is Tier 2 or Tier 3 batch mode (Step 0 auto-tier or the `--effort` override per `hatch3r-board-pickup` → Effort Override). Tier 1 batches skip isolation — their trivial edits carry low overlap risk and the worktree setup/cleanup cost is not justified.
+3. **Platform writes to the orchestrator's filesystem:** the platform applies sub-agent file edits into the orchestrator's working tree (CLI platforms — e.g. Claude Code Task sub-agents share the orchestrator's tree). Set this condition false for hosted platforms that sandbox each sub-agent in a separate per-agent workspace and merge results outside the orchestrator tree — those platforms already provide disjoint writes, so a second worktree layer adds no isolation.
+`--isolate=on` forces isolation whenever ≥2 implementers run (overrides the tier/platform conditions); `--isolate=off` keeps the legacy single-tree path (Step 6c.3 dispatches directly into the orchestrator's tree, relying solely on Step 3 prediction + the Step 6c.4 post-hoc merge). Resolution order: explicit `--isolate` flag wins over the `auto` default.
+**When isolation is active for a level, wrap Step 6c.3 dispatch as follows:**
+1. **Create one scratch worktree per implementer.** For each issue in the level, run `npx hatch3r worktree-setup pickup-iso-<issue-number> --yes` (the `--yes` flag suppresses the interactive secret-propagation confirmation for the non-interactive orchestrator; review `.worktree-include` first if `.env.*` files are in scope per the command's CWE-552 blast-radius warning). Each worktree lands on a throwaway branch under `.worktrees/pickup-iso-<issue-number>/` and is populated + `hatch3r sync`-ed by the command. These are scratch isolation directories, not per-issue PR branches — the batch keeps its single shared branch (per `hatch3r-board-pickup` Step 5), and changes are integrated back onto it in Step 6c.4.
+2. **Pin each implementer to its worktree.** In the Step 6c.3 per-agent prompt, set the sub-agent's working directory to its `.worktrees/pickup-iso-<issue-number>/` path and instruct it to read, edit, and run tests only within that path. The "do NOT create branches, commits, or PRs" instruction is unchanged — the orchestrator still owns all git operations; the throwaway worktree branch exists only to give `git worktree add` a ref and is never pushed.
+3. **Integrate and clean up after the level returns.** After all implementers in the level return (Step 6c.3 step 3), integrate each worktree's file diff back onto the batch branch in the orchestrator's main tree via the **Step 6c.4 file-conflict resolution protocol** — physically disjoint writes apply cleanly; any genuinely overlapping or semantically conflicting edits Step 3 missed surface there for resolution exactly as in the non-isolated path. Then run `npx hatch3r worktree-cleanup --yes` to remove the scratch worktrees and prune their throwaway branches. On a non-zero `worktree-setup`/`worktree-cleanup` exit, surface the error and **ASK** the user (retry, fall back to `--isolate=off` for the remaining levels, or abort) — never silently drop isolation mid-run.
 ### 6c.3. Execute Level-by-Level With Parallel Implementers
 For each dependency level, starting at Level 1:
-1. **Spawn one hatch3r-implementer sub-agent per issue in the current level.** Use the Task tool with `subagent_type: "generalPurpose"`. Launch as many sub-agents concurrently as the platform supports.
+0. **Resolve worktree isolation for this level** per Step 6c.3-iso. When isolation is active, create the per-implementer scratch worktrees (6c.3-iso step 1) before dispatching, and pin each sub-agent to its worktree path (6c.3-iso step 2). When inactive, dispatch directly into the orchestrator's tree as below.
+1. **Sort the level by priority, then spawn one hatch3r-implementer sub-agent per issue.** Within each level, sort issues by priority (`p0` > `p1` > `p2` > `p3`) before dispatching. When the platform concurrency limit caps the level, fill the in-flight pool with the highest-priority issues first and queue the rest for the next dispatch slot as in-flight sub-agents return. Issues within a level are independent for correctness; this ordering only governs which independent issues land first when concurrency is the binding constraint. Use the Task tool with `subagent_type: "generalPurpose"`. Launch as many sub-agents concurrently as the platform supports.
 2. **Each sub-agent prompt must include:**
    - The issue number, title, full body, and acceptance criteria.
@@ -150,13 +174,16 @@ For each dependency level, starting at Level 1:
    - All `scope: always` rule directives from `rules/` — subagents do not inherit rules automatically.
    - Relevant learnings from `.hatch3r/learnings/` (from Step 6.pre).
    - Explicit instruction: do NOT create branches, commits, or PRs.
+   - **When worktree isolation is active for this level** (Step 6c.3-iso): the sub-agent's working directory set to its `.worktrees/pickup-iso-<issue-number>/` path, with the instruction to read, edit, and run tests only within that path.
+   - `correlation_id` (UUID v4 per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID; batch tasks share one id with a per-issue sub-task index).
    - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 3. **Await all sub-agents in the current level.** Collect their structured results (files changed, tests written, issues encountered).
 4. **Review sub-agent results:**
    - If any sub-agent reports BLOCKED or PARTIAL, **ASK** the user how to proceed (skip, fix manually, retry).
-   - If sub-agents modified overlapping files, review for conflicts and resolve before proceeding.
+   - **When worktree isolation was active for this level** (Step 6c.3-iso): integrate each scratch worktree's file diff back onto the batch branch via the Step 6c.4 file-conflict resolution protocol, then run `npx hatch3r worktree-cleanup --yes` to remove the worktrees and prune their throwaway branches before advancing (6c.3-iso step 3).
+   - If sub-agents modified overlapping files (or the isolated integration above surfaced an overlap Step 3 missed), review for conflicts and resolve before proceeding.
 5. **Advance to the next dependency level.** Repeat steps 1-4 until all levels are complete.
@@ -183,7 +210,7 @@ After all implementations complete, run the two-stage quality pipeline across th
    - **Blast radius data** from Step 6c.2 (if available) — so the fixer knows which consumers and contracts must be preserved.
    - **Reference conventions** from Step 6c.2 (if available) — so the fixer maintains established patterns when applying fixes.
 3. Re-spawn **`hatch3r-reviewer`** to verify fixes.
-4. Repeat steps 2-3 for a maximum of **3 iterations** until the confidence-aware gate passes: **0 Critical + 0 Warning AND reviewer confidence != low**. If reviewer confidence is low but there are no Critical/Warning findings, trigger a second reviewer pass before exiting the loop; do not exit until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS.
+4. Repeat steps 2-3 for a maximum of **3 iterations** until the confidence-aware gate passes. Evaluate the gate per the canonical **Confidence-Aware Review Gate** in `agents/shared/confidence-gate.md`, passing in the resolved `--confidence-floor` (`any` | `medium` | `high`) routed here from `hatch3r-board-pickup` → Confidence Floor. At the default `any` floor: **0 Critical + 0 Warning AND reviewer confidence != low**; if reviewer confidence is low with no Critical/Warning findings, trigger a second reviewer pass before exiting and do not exit until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS. At floor `medium` the pass surface is unchanged; at floor `high` a `medium`-confidence clean verdict also forces a second pass (and any low-confidence finding triggers an ASK) — apply the floor-tier branches from the shared gate, do not collapse them to the `any` row.
    After each reviewer iteration, assess the reviewer's findings confidence: if the reviewer rates any finding as low-confidence, flag it separately in the ASK prompt so the user can prioritize human review of uncertain findings.
 5. If still not clean after 3 iterations, **ASK** the user how to proceed.
@@ -192,15 +219,15 @@ After all implementations complete, run the two-stage quality pipeline across th
 Launch as many independent sub-agents in parallel as the platform supports.
 **Always spawn (mandatory for every code change):**
-- **hatch3r-test-writer** — tests for all code changes across the batch.
-- **hatch3r-security-auditor** — security review of all code changes across the batch.
+- **hatch3r-testability** (CQ5) — verify tests for all code changes across the batch meet the mandate map / coverage floor.
+- **hatch3r-security** (CQ3) — security review of all code changes across the batch.
 **Always evaluate (spawn when applicable):**
 - **hatch3r-docs-writer** — spawn when any changes affect public APIs, architectural patterns, or user-facing behavior.
 **Conditional specialists (spawn when triggered by any issue in the batch):**
 - **hatch3r-lint-fixer** — spawn when lint errors are present after implementation.
-- **hatch3r-a11y-auditor** — spawn when any issue has `area:ui` or `area:a11y` labels.
-- **hatch3r-perf-profiler** — spawn when any issue has `area:performance` label.
+- **hatch3r-ui** (CQ1) — spawn when any issue has `area:ui` or `area:a11y` labels.
+- **hatch3r-performance** (CQ7) — spawn when any issue has `area:performance` label.
 Await all specialist sub-agents. Apply their feedback before proceeding to Step 7.

package/dist/content/commands/board/pickup-delegation.md CHANGED Viewed

@@ -60,6 +60,7 @@ The implementer sub-agent prompt MUST include:
 - All `scope: always` rule directives from `rules/` — subagents do not inherit rules automatically.
 - Relevant learnings from `.hatch3r/learnings/` (from Step 6.pre).
 - Explicit instruction: do NOT create branches, commits, or PRs.
+- `correlation_id` (UUID v4 generated per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID; epic sub-issues get individual ids, batch tasks share one id with a sub-task index).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 Await the implementer sub-agent. Collect its structured result (files changed, tests written, issues encountered).
@@ -75,7 +76,7 @@ After implementation completes, run the two-stage quality pipeline. Use the Task
    - **Blast radius data** from Step 6a.1 (if available) — so the fixer knows which consumers and contracts must be preserved.
    - **Reference conventions** from Step 6a.1 (if available) — so the fixer maintains established patterns when applying fixes.
 3. Re-spawn **`hatch3r-reviewer`** to verify fixes.
-4. Repeat steps 2-3 for a maximum of **3 iterations** until the confidence-aware gate passes: **0 Critical + 0 Warning AND reviewer confidence != low**. If reviewer confidence is low but there are no Critical/Warning findings, trigger a second reviewer pass before exiting the loop; do not exit until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS.
+4. Repeat steps 2-3 for a maximum of **3 iterations** until the confidence-aware gate passes. Evaluate the gate per the canonical **Confidence-Aware Review Gate** in `agents/shared/confidence-gate.md`, passing in the resolved `--confidence-floor` (`any` | `medium` | `high`) routed here from `hatch3r-board-pickup` → Confidence Floor. At the default `any` floor: **0 Critical + 0 Warning AND reviewer confidence != low**; if reviewer confidence is low with no Critical/Warning findings, trigger a second reviewer pass before exiting and do not exit until the second pass returns non-low confidence OR the user explicitly accepts the low-confidence PASS. At floor `medium` the pass surface is unchanged; at floor `high` a `medium`-confidence clean verdict also forces a second pass (and any low-confidence finding triggers an ASK) — apply the floor-tier branches from the shared gate, do not collapse them to the `any` row.
    After each reviewer iteration, assess the reviewer's findings confidence: if the reviewer rates any finding as low-confidence, flag it separately in the ASK prompt so the user can prioritize human review of uncertain findings.
 5. If still not clean after 3 iterations, **ASK** the user how to proceed.
@@ -84,22 +85,23 @@ After implementation completes, run the two-stage quality pipeline. Use the Task
 Launch as many independent sub-agents in parallel as the platform supports.
 **Always spawn (mandatory for every code change):**
-- **hatch3r-test-writer** — tests for all code changes. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
-- **hatch3r-security-auditor** — security review of all code changes. Audit data flows, access control, input validation, and secret management.
+- **hatch3r-testability** (CQ5) — verify tests for all code changes meet the mandate map / coverage floor. Unit tests for new logic, regression tests for bug fixes, integration tests for cross-module changes.
+- **hatch3r-security** (CQ3) — security review of all code changes. Audit data flows, access control, input validation, and secret management.
 **Always evaluate (spawn when applicable):**
 - **hatch3r-docs-writer** — spawn when changes affect public APIs, architectural patterns, or user-facing behavior. Skip silently if no documentation impact.
 **Conditional specialists (spawn when triggered):**
 - **hatch3r-lint-fixer** — spawn when lint errors are present after implementation.
-- **hatch3r-a11y-auditor** — spawn when issue has `area:ui` or `area:a11y` labels.
-- **hatch3r-perf-profiler** — spawn when issue has `area:performance` label or changes touch hot paths.
+- **hatch3r-ui** (CQ1) — spawn when issue has `area:ui` or `area:a11y` labels.
+- **hatch3r-performance** (CQ7) — spawn when issue has `area:performance` label or changes touch hot paths.
 Each specialist sub-agent prompt MUST include:
 - The agent protocol to follow (e.g., "Follow the hatch3r-reviewer agent protocol").
 - All `scope: always` rule directives from `rules/` (subagents do not inherit rules automatically).
 - The diff or file changes to review.
 - The issue's acceptance criteria.
+- `correlation_id` (UUID v4 per top-level task per `rules/hatch3r-agent-orchestration.md` → Correlation ID).
 - Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
 Await all specialist sub-agents. Apply their feedback (fixes, additional tests, documentation updates) before proceeding to Step 7.

package/dist/content/commands/board/pickup-modes.md CHANGED Viewed

@@ -126,6 +126,7 @@ At the end of an auto session, generate a summary:
 - **Issue update failure:** warn and continue (labels not blocking).
 - **Quality verification failure:** fix before creating PR/MR.
 - **PR/MR creation failure:** present error and manual instructions.
+- **Context degradation:** per the canonical Context-Degradation Policy (`rules/hatch3r-agent-orchestration-detail.md` -> Context-Degradation Policy) — compress at `>50%` context window, restart at `>75%`; the coarse turn-count fallback (inherited from `hatch3r-workflow`) is ~25 turns, at which point suggest splitting the batch or starting a fresh context with a progress summary of completed and remaining issues.
 ---

package/dist/content/commands/board/pickup-post-impl.md CHANGED Viewed

@@ -142,7 +142,7 @@ After PR creation, capture learnings from this development session.
    - Were any pitfalls discovered that should be avoided next time?
 2. If learnings are identified:
-   - Create learning files in `.hatch3r/learnings/` following the learning file format (see `hatch3r-learn` command).
+   - Create learning files in `.hatch3r/learnings/` following the learning file format (see `skills/hatch3r-learn/SKILL.md`).
    - Include the issue number as `source-issue`.
    - Tag with relevant area labels from the issue.
    - **ASK:** "Learnings captured: {list}. Anything else to note? (add more / done)"

package/dist/content/commands/hatch3r-api-spec.md CHANGED Viewed

@@ -9,15 +9,17 @@ quality_charter: agents/shared/quality-charter.md
 efficiency_patterns: agents/shared/efficiency-patterns.md
 cache_friendly: true
 parallel_tool_default: true
+efficiency_tier: deep
 triage_tiers: [1, 2, 3]
+supports_resume: true
 sub_agents_spawned:
   count: 3
-  rationale: Three-stage pipeline per agentPipeline — researcher scans routes, docs-writer assembles the OpenAPI spec, reviewer validates structure; researcher and reviewer fanned out concurrently around the shared docs-writer dependency.
+  rationale: Three-stage pipeline per agentPipeline — researcher scans routes, docs-writer assembles the OpenAPI spec, reviewer validates structure; researcher and reviewer fanned out concurrently around the shared docs-writer dependency. Cost-dominance per CONSTITUTION §2 P8 — token cost never serializes independent work.
 ---
 ## §0 Detect Ambiguity (P8 B1)
-Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (contradictory inputs, missing target, unknown convention). If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → §0 Detect Ambiguity (P8 B1). Triggers: contradictory inputs, missing target, unknown convention.
 ## Agent Pipeline
@@ -28,6 +30,8 @@ Before any action, scan the user's request and provided context for unresolved q
 | 3. Spec Generation | `hatch3r-docs-writer` | No | Yes |
 | 4. Validation | `hatch3r-reviewer` | No | Yes (if validate mode) |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): every parallel fan-out above holds all three — read-only or disjoint writes, deterministic aggregation, no shared mutable state.
 # API Specification Generator — OpenAPI from Code or Code-vs-Spec Drift Detection
 Take a codebase with HTTP or RPC endpoints and produce a complete OpenAPI 3.1 specification (`docs/api/`), or take an existing spec and compare it against the live codebase to surface undocumented, stale, or drifted endpoints. The researcher sub-agent scans route definitions, decorators, and handlers; the orchestrator extracts TypeScript types and validation schemas inline; the docs-writer assembles the final spec document; and the reviewer validates structural correctness. AI proposes all outputs; user confirms before any files are written.
@@ -47,6 +51,14 @@ Take a codebase with HTTP or RPC endpoints and produce a complete OpenAPI 3.1 sp
 3. **Structured output only.** All sub-agent prompts require structured markdown output — no prose dumps.
 4. **Batch schema extraction.** Group types by source file. Read each file once and extract all referenced types in a single pass.
+## Confidence Propagation Contract
+Every sub-agent delegation prompt in this command MUST include the confidence expression requirement below (verbatim). Sub-agents are invoked with the `quality_charter: agents/shared/quality-charter.md` reference in their frontmatter, but the orchestrator repeats the directive to override runtime prompt defaults per the charter §1 rule.
+> Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+Downstream propagation: the Step 2 framework-detection confidence (already high/medium/low), every endpoint-discovery completeness verdict, and each Step 7 drift classification MUST carry a high/medium/low confidence rating sourced from the researcher or reviewer sub-agent. Endpoints with unresolvable types map to low confidence. Dropping the signal between stages is a gate failure.
 ---
 ## Workflow
@@ -63,6 +75,25 @@ Classify the request before delegating:
 If Tier 1, complete inline and skip the parallel researcher fanout. If Tier 2, run the standard pipeline below. If Tier 3, run the full pipeline with research and confirm scope with the user before writing the spec.
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the first sub-agent dispatch (Step 3 endpoint-discovery researcher), surface the cost preview so a full-codebase spec scan is never started blind. Emit the `cost_estimate` block per `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate, calibrated to the Step 0 triage tier:
+```yaml
+cost_estimate:
+  expected_sa_count: <triage tier → Tier 1 inline ~0, Tier 2 ~2 (researcher + docs-writer), Tier 3 up to 3 (+ reviewer)>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+Post-execution actuals + delta land in the Step 8 output summary's Fan-out + Cost section per `rules/hatch3r-cost-visibility.md` Post-Execution Actuals. Token telemetry sources from `src/pipeline/observability.ts`.
+### Effort Override (Decision 17)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Effort Override (Decision 17). Misclassification example: a single-endpoint doc tweak scored as Deep, or a full-codebase drift scan scored as Light.
 ---
 ### Step 1: Gather API Context
@@ -301,6 +332,51 @@ Files Created/Updated:
 - **Respect the project's tooling hierarchy** for knowledge augmentation: project docs → codebase exploration → Context7 MCP → web research.
 - **Spec must be valid.** Never write a spec that fails structural validation. Fix issues before writing.
+## Resumability (Decision 27/30)
+api-spec is long-running — a Tier 3 full-codebase scan runs the hatch3r-researcher (codebase-analysis mode) over route definitions and handlers (Step 1), then orchestrator-inline TypeScript-schema + validation-schema extraction (Step 2), then hatch3r-docs-writer assembling the OpenAPI 3.1 spec under `docs/api/` (Step 3), then hatch3r-reviewer validating structural correctness in validate mode (Step 4). Per hatch3r's workspace-checkpointed resumability contract, checkpoint progress so an interrupted run re-enters at the last completed step rather than re-running the route scan or re-extracting schemas.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.api-spec-workspace/`; step range the Step 0 → Step 8 progression; `wave` = researcher-batch index in multi-framework detection; snapshot/rollback paths `docs/api/`. Write points: after Step 1 route-scan researcher returns, after Step 2 schema-extraction completes (so endpoint-to-schema map survives a crash), after Step 3 docs-writer spec assembly is confirmed by ASK, after each Step 4 file write under `docs/api/` (so already-generated spec sections survive a crash), after Step 5 reviewer validation returns in validate mode (the drift-classification verdict persists), and after the Step 6 chain handoff in generate mode.
+---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping for api-spec: `1` = discovery + route enumeration, `2` = spec generation + schema extraction, `3` = validation + cross-reference verification, `4` = write + iteration-summary. Tier 1 runs are exempt per the Tier 1 exemption.
+## End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: OpenAPI/AsyncAPI spec writes, schema files, documentation.
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output. The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+This command emits cost transparency per `rules/hatch3r-cost-visibility.md` and CONSTITUTION §6 Decision 24/29:
+- **Pre-execution `cost_estimate`** — emitted in Step 0.5 before the first researcher dispatch (Step 3 endpoint discovery).
+- **Post-execution `cost_actuals` + `delta`** — appended to the Step 8 output summary's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2.
+Per-tier `expected_sa_count` calibration (from frontmatter `sub_agents_spawned.count: 3` × tier heuristic in `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate): Tier 1 ≈ 0 (inline single-endpoint edit); Tier 2 ≈ 2 (researcher scans routes + docs-writer assembles); Tier 3 up to 3 (researcher + docs-writer + reviewer validation in validate/drift mode). Deltas beyond 25% absolute value carry `flagged_for_review: true`. Token telemetry sources from `src/pipeline/observability.ts`; estimation primitives from `src/pipeline/costEstimator.ts`.
 ## Output Templates
 ### Spec Header
@@ -329,6 +405,7 @@ servers:
 ## Related
+- **Skill:** `skills/hatch3r-api-spec/SKILL.md` — the inline single-pass procedure this command's docs-writer (Step 5) and reviewer (Step 6) stages follow; shares `id: hatch3r-api-spec` by the Decision 13 workflow-split (command = multi-agent fan-out, skill = inline procedure). The skill's "Relationship to commands/hatch3r-api-spec.md" section is the canonical handoff record disambiguating the shared id (F16.3-H3).
 - **Rule:** `hatch3r-api-design` — API design conventions the generated spec must follow
 - **Command:** `hatch3r-codebase-map` — structural map that can help scope the API scan
 - **Command:** `hatch3r-project-spec` — project-level specification for API info block context