npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/commands/hatch3r-auth-scaffold.md ADDED Viewed

@@ -0,0 +1,250 @@
+---
+id: hatch3r-auth-scaffold
+type: command
+orchestrator: true
+agentPipeline: [hatch3r-implementer, hatch3r-security]
+description: "Scaffold authentication boilerplate for a greenfield API service — OAuth 2.1 authorization-code-with-PKCE flow, OIDC ID-token validation, and hashed personal-access-token (PAT) issuance/verification. Implementer writes the code; hatch3r-security gates it against the CQ3 auth-depth floor."
+argument-hint: "[service-name]"
+tags: [implementation, security, floor:security, floor:content-quality]
+quality_charter: agents/shared/quality-charter.md
+efficiency_patterns: agents/shared/efficiency-patterns.md
+cache_friendly: true
+parallel_tool_default: true
+efficiency_tier: standard
+triage_tiers: [1, 2, 3]
+sub_agents_spawned:
+  count: 2
+  rationale: One hatch3r-implementer writes the OAuth 2.1 / OIDC / PAT boilerplate (code mutation flows through the implementer per the Mandatory Delegation Directive); one hatch3r-security gates the result against the CQ3 auth-depth floor (PKCE, exact redirect-URI match, ID-token claim validation, token-secret hashing). Independent auth modes (interactive OAuth vs machine-to-machine PAT) fan out to parallel implementers; the implement -> security-gate edge is the only serialization. Cost-dominance per CONSTITUTION §2 P8.
+---
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the request for unresolved questions in auth mode, threat model, and identity provider. If the request does not name which flow(s) to scaffold (interactive sign-in via OAuth 2.1, machine-to-machine via PAT, or both), the OIDC provider / issuer, or the client type (public SPA vs confidential server), ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — the token-binding decision (DPoP for browser vs bare bearer) and the redirect-URI allowlist depend on these, and a wrong assumption ships an exploitable flow. Proceed without asking ONLY when the flow set, provider, and client type are all explicit. Scaffolding auth boilerplate is high-blast-radius; default to asking. Source: `.claude/rules/clarification-default.md`.
+## Agent Pipeline
+| Stage | Agent(s) | Parallel | Required |
+|-------|----------|----------|----------|
+| 1. Parse auth spec | Orchestrator (inline) | No | Yes |
+| 2. Confirm flow + threat model + ASK gate | Orchestrator (inline) | No | Yes |
+| 3. Generate boilerplate | `hatch3r-implementer` | Per auth mode | Yes |
+| 4. Gate against CQ3 auth floor | `hatch3r-security` | Per auth mode | Yes |
+| 5. Verify + Iteration Summary | Orchestrator (inline) | No | Yes |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): when the spec covers both interactive OAuth and machine-to-machine PAT, fan out one `hatch3r-implementer` per mode — each writes a disjoint module (`src/auth/oauth/` vs `src/auth/pat/`), aggregation is deterministic (union of generated paths), no shared mutable state. The `hatch3r-security` gate runs once per generated mode after its implementer returns.
+---
+# Auth Scaffold -- OAuth 2.1 + OIDC + PAT Boilerplate for a Greenfield API
+Generates authentication boilerplate for a new API service to the OAuth 2.1 + OIDC + OWASP-ASVS-v5 floor: an authorization-code-with-PKCE flow, OIDC ID-token validation (issuer, audience, nonce, JWKS signature), and hashed personal-access-token issuance/verification for machine-to-machine clients. Output is project-language source plus a `.env.example` of the required secrets — never inlined credentials.
+Use `/hatch3r-auth-scaffold` on a greenfield service that needs an auth layer built to the CQ3 auth-depth floor (one of the CONSTITUTION §2B floors: "Auth depth coverage: 100%"). Use the `hatch3r-security-verify` skill to audit an existing auth implementation without regenerating it; use `/hatch3r-security-audit` for a broad security review beyond the auth surface.
+---
+## Argument Parsing
+Optional positional argument: `<service-name>`.
+- If supplied: seed Step 1 with that service.
+- If omitted: ASK for the service, the flow set, the OIDC provider, and the client type before delegating.
+---
+## Step 0: Triage
+Classify the auth scaffold before delegating. The tier names map to the canonical Light/Standard/Deep vocabulary (`agents/shared/triage-vocabulary.md`); the `Tier {1|2|3}` references in Step 2 resolve to these rows.
+- **Tier 1 (Light)** — a single auth mode (PAT only, or OAuth only) for a confidential server client with a named issuer. One `hatch3r-implementer` writes the single module; one `hatch3r-security` gate verifies it.
+- **Tier 2 (Standard)** — both interactive OAuth 2.1 and machine-to-machine PAT for one client type. One implementer per mode in parallel (`src/auth/oauth/` vs `src/auth/pat/`); one security gate per generated mode.
+- **Tier 3 (Deep)** — a public/browser client (the DPoP token-binding decision is in play), multiple identity providers, or a mixed public+confidential client matrix. Full fan-out (one implementer per mode × provider), each gated, plus the browser-bearer-is-a-High-finding threat check from Step 4 item 4.
+Rule: an unspecified client type or an undecided token-binding choice fires the §0 B1 gate before tiering — a public client mis-scaffolded as confidential ships an exploitable bearer flow. Classify upward when the client type or token binding is uncertain.
+---
+## Step 1: Parse Auth Spec
+Collect the inputs that determine the flow shape and the token-binding decision. Cache for the Step 3 implementer prompt.
+| Input | Default if unspecified | Notes |
+|-------|------------------------|-------|
+| Auth modes | OAuth 2.1 + PAT | interactive sign-in, machine-to-machine, or both |
+| Client type | confidential | public (SPA/mobile, no secret) vs confidential (server, holds a secret) |
+| OIDC provider / issuer | (required — ASK) | issuer URL → JWKS discovery; never a guessed default |
+| Token binding | DPoP for browser/mobile; bearer acceptable for confidential server-to-server | RFC 9449 — bare bearer for a browser client is a High finding |
+| ID-token clock skew | ≤ 300 s | documented skew window for `exp`/`iat` validation |
+| PAT hash | Argon2id (bcrypt fallback) | long-lived secrets are stored hashed, never plaintext |
+| Output module | `src/auth/` | `src/auth/oauth/`, `src/auth/oidc/`, `src/auth/pat/` |
+The client type drives the PKCE + refresh-token requirement: OAuth 2.1 mandates PKCE on every client (public AND confidential), and refresh tokens issued to public clients MUST be sender-constrained or one-time-use.
+---
+## Step 2: Confirm Flow + Threat Model + ASK Checkpoint (only mutation gate)
+Present the resolved spec and the threat-model decisions so the maintainer confirms before any auth code is written.
+```
+hatch3r-auth-scaffold — service: {name} (Tier {1|2|3})
+Resolved spec:
+  modes: OAuth 2.1 (authorization code + PKCE) + PAT (machine-to-machine)
+  client type: confidential (holds client_secret)
+  OIDC issuer: https://{provider}/  (JWKS auto-discovered)
+  token binding: bearer (confidential server-to-server); DPoP required if a browser client is added
+  ID-token validation: iss, aud, azp, exp, nonce, JWKS signature; skew ≤ 300s
+  PAT: 256-bit random, stored Argon2id-hashed, shown once on issue
+  output: src/auth/{oauth,oidc,pat}/ + .env.example
+OAuth 2.1 invariants enforced (draft-ietf-oauth-v2-1-15):
+  - PKCE (S256) on every authorization-code request
+  - exact-string redirect_uri allowlist (no wildcards)
+  - implicit grant + ROPC grant absent
+  - no bearer token in query string
+  - refresh-token rotation with reuse detection
+Tier: 2
+```
+ASK (only gate), per `agents/shared/user-question-protocol.md`:
+> Generate the auth scaffold for {name} with the flow + threat model above?
+> - `accept` — generate the boilerplate and run the CQ3 security gate
+> - `edit` — change a mode, client type, provider, or token binding first
+> - `skip` — cancel; write nothing
+>
+> (accept / edit / skip)
+After the user accepts, the run is autonomous through Step 5.
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the Step 2 ASK gate, emit the cost preview per `rules/hatch3r-cost-visibility.md`:
+```yaml
+cost_estimate:
+  expected_sa_count: <N auth modes × 1 implementer + N × 1 security gate>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>   # 0-1 — the spec set is fixed by the references below; web only for a fresh CVE check
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+Post-execution actuals + delta land in the Step 5 Iteration Summary. `--effort=light|standard|deep` (Decision 17) forces the tier; record both auto and override.
+---
+## Step 3: Generate Boilerplate (sub-agent delegation)
+Delegate to `hatch3r-implementer` via the Task tool, one per auth mode. Code mutation flows through the implementer per the Mandatory Delegation Directive — the orchestrator writes no auth code inline.
+Each implementer prompt MUST include the resolved spec, the target module paths, and this boilerplate contract:
+**OAuth 2.1 authorization-code flow (`src/auth/oauth/`):**
+1. PKCE on every authorization-code request — generate a `code_verifier` (43-128 char, high-entropy) and send the `S256` `code_challenge`; verify on the token exchange. PKCE is mandatory on public AND confidential clients in OAuth 2.1.
+2. Exact-string `redirect_uri` allowlist — match the callback URI by exact string against a pre-registered list; no wildcard or prefix matching.
+3. No implicit grant (`response_type=token`) and no ROPC (`grant_type=password`) — both are removed from OAuth 2.1; do not scaffold either.
+4. Refresh-token rotation with reuse detection — rotate the refresh token on every use; on detection of a reused (already-rotated) token, revoke the entire token family. For a public client, the refresh token MUST additionally be sender-constrained (DPoP) or one-time-use.
+5. Access tokens are never placed in a URI query string (OAuth 2.1 prohibition) — pass via the `Authorization` header.
+**OIDC ID-token validation (`src/auth/oidc/`):** validate `iss` (matches the configured issuer), `aud` (matches the client_id), `azp` when `aud` is multi-valued, `exp` (with the documented ≤300 s skew), and `nonce` (matches the value sent in the auth request — replay guard), and verify the JWKS signature with a pinned `alg` allow-list before creating a session. Reject `alg: none`; reject `HS*` when the key is asymmetric (key-confusion guard). Wire RP-initiated logout (`end_session_endpoint`).
+**PAT issuance/verification (`src/auth/pat/`):** generate a 256-bit cryptographically-random token, return it to the caller exactly once at issue time, and store only its Argon2id hash (bcrypt fallback) — never the plaintext. On verification, hash the presented token and compare against the stored hash in constant time. Tokens carry a scope set and an expiry; revocation is a hash-table delete.
+**Secrets:** the client secret, issuer URL, and signing keys are referenced via `${env:VAR}` and emitted to `.env.example` with placeholder values — never inlined. (Project secret convention: `.claude/rules/security-patterns.md` rule 3.)
+Also include in the prompt: all `scope: always` rule directives; the confidence expression requirement (verbatim, high/medium/low per `agents/shared/quality-charter.md` §1); the implementer's standing test obligation (unit tests for token validation: positive = valid token reaches the resource, negative = `alg:none`/expired/wrong-`aud` token is rejected); and the boundary "do NOT create branches, commits, or PRs". Await the structured result; capture `Files changed`, `Tests written`, and the `Delegation proof ID` per file.
+---
+## Step 4: Gate Against CQ3 Auth Floor (sub-agent delegation)
+After each mode's implementer returns, delegate to `hatch3r-security` via the Task tool — the implementer-pre-write/post-write auth invocation in that agent's "When to invoke" (touches `src/auth/**`).
+The security prompt MUST include the generated file paths and require these checklist items (from `agents/hatch3r-security.md` Audit checklist):
+1. **OAuth 2.1 grant hygiene** (item 1) — PKCE present on every client; implicit + ROPC absent; exact-string `redirect_uri` allowlist; refresh-token rotation with reuse detection.
+2. **OIDC ID-token validation** (item 2) — `iss`, `aud`, `azp`, `exp`, `nonce`, JWKS signature all verified before session creation; clock-skew window documented.
+3. **JWT BCP conformance** (item 4, RFC 8725) — `alg` pinned per issuer; `alg: none` rejected; `HS*`-with-asymmetric-key rejected (key-confusion guard).
+4. **Sender-constrained tokens** (item 3) — DPoP on any browser/mobile access token; a bare browser bearer is a High finding.
+5. **Token-secret storage** — the PAT is stored hashed (Argon2id/bcrypt), never plaintext (OWASP ASVS v5 V6 long-lived-secret storage).
+The security gate runs the relevant verification commands from that agent's table (e.g. `rg -n "response_type=code" src/auth/ | rg -v "code_challenge"` must return empty; `rg -n "grant_type=(implicit|password)" src/auth/` must return empty) and returns its `proof_trace` + status. A `CRITICAL` finding (e.g. `alg:none` accepted, refresh rotation absent on a public client) routes the fix back through `hatch3r-implementer` (max 1 regeneration pass), then re-gates. A persistent CRITICAL ends the run at `PARTIAL` and the scaffold is flagged not-merge-ready.
+---
+## Step 5: Verify + Iteration Summary
+Run the project verification gates and record exit codes: `npm test` (or the project equivalent) for the auth unit tests, `npx tsc --noEmit`, and the security agent's grep checks re-run as a final pass.
+### End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: `src/auth/oauth/<file>`, `src/auth/oidc/<file>`, `src/auth/pat/<file>` — all `via hatch3r-implementer`.
+### Iteration Summary (mandatory output)
+Emit the canonical iteration summary per `rules/hatch3r-iteration-summary.md`:
+```markdown
+## Iteration Summary
+**Status:** SUCCESS | PARTIAL | FAILED | BLOCKED
+**Outcome:** {one sentence — e.g., "Scaffolded OAuth 2.1 + OIDC + hashed-PAT auth for orders-api; security gate PASS."}
+**Done:**
+- src/auth/oauth/* → authorization-code + PKCE flow via hatch3r-implementer (proof: {id})
+- src/auth/oidc/* → ID-token validation (iss/aud/nonce/JWKS) via hatch3r-implementer (proof: {id})
+- src/auth/pat/* → Argon2id-hashed PAT issue/verify via hatch3r-implementer (proof: {id})
+**Not Done / Deferred / Unverified:**
+- `.env.example` placeholders — populate real issuer + client_secret before first run
+- (or: `None — full scope completed`)
+**Open Questions / Blockers:**
+- (or: `None`)
+**Fan-out + Cost:** sub_agents_spawned: { count, rationale } + cost_estimate / cost_actuals / delta
+**Pillar Impact Attribution:** progress_toward_pillar: content-quality.CQ3+{delta}
+**Confidence:** {high | medium | low} — {basis from implementer output + security gate verdict}
+**Suggested Next Action:** {one line — e.g., "Populate .env, then wire the OIDC callback route into the service router."}
+```
+Status decision rules:
+- **SUCCESS** — boilerplate generated, security gate PASS, auth unit tests + typecheck exit 0.
+- **PARTIAL** — generated but the security gate left a residual High/Critical finding, or a verification gate failed.
+- **FAILED** — the implementer returned BLOCKED on every mode; nothing written.
+- **BLOCKED** — provider/issuer or client type contradictory or undecided, or a security gate Critical the maintainer must rule on.
+---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping: `1` = spec parse + confirm, `2` = implementer boilerplate generation, `3` = security gate + verify + summary. Tier 1 single-mode runs are exempt per the Tier 1 exemption.
+---
+## Guardrails
+1. **One ASK gate.** Step 2 is the only user-facing checkpoint; after `accept`, the run proceeds through Step 5.
+2. **No commit or push.** Generated auth code is left staged for human review; git operations are out of scope.
+3. **No deprecated grants.** Never scaffold the implicit grant or ROPC — both are removed from OAuth 2.1; PKCE on the authorization-code flow is the only public-client path.
+4. **No inlined secrets.** Client secrets, signing keys, and issuer URLs are referenced via `${env:VAR}` and emitted to `.env.example` with placeholders — never written into source.
+5. **No plaintext long-lived tokens.** PATs are stored Argon2id/bcrypt-hashed and shown once at issue; a scaffold that persists a PAT in plaintext fails the Step 4 CQ3 gate.
+6. **Security gate is mandatory.** The `hatch3r-security` CQ3 gate runs on every generated auth mode — a scaffold is never declared SUCCESS without a PASS verdict.
+## Resumability (Decision 27/30)
+auth-scaffold fans out one implementer per auth mode, so checkpoint at the per-mode boundary — an interrupted run re-enters at the first un-generated mode rather than regenerating completed OAuth/OIDC/PAT modules.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.auth-scaffold-workspace/`; step range the Step 1 → Step 5 progression; `wave` = the per-mode index in Step 3/4; snapshot/rollback paths every `src/auth/**` file a Step 3 implementer or a Step 4 regeneration touches. Write points: after the Step 1 spec parse, after the Step 2 accept gate, after each Step 3 implementer return (per mode), and after each Step 4 security gate.
+## References
+- [OAuth 2.1 Authorization Framework (`draft-ietf-oauth-v2-1-15`)](https://datatracker.ietf.org/doc/draft-ietf-oauth-v2-1/) (accessed 2026-06-02, IETF OAuth WG, official-docs) — mandates PKCE on every client, exact-string redirect-URI matching, removal of the implicit + ROPC grants, the bearer-token-in-query prohibition, and sender-constrained-or-one-time refresh tokens for public clients; source for the Step 3 OAuth contract and the deprecated-grant guardrail.
+- [oauth.net — OAuth 2.1 specification index](https://oauth.net/2.1/) (accessed 2026-06-02, Aaron Parecki / OAuth.net, official-docs) — canonical clearinghouse summarizing the OAuth 2.1 normative changes (PKCE, exact redirect match, grant removals, refresh-token constraints); corroborating second source for the OAuth invariants.
+- [OWASP ASVS v5.0 — V10 OAuth and OIDC](https://github.com/OWASP/ASVS/blob/master/5.0/en/0x19-V10-OAuth-and-OIDC.md) (accessed 2026-06-02, OWASP Foundation, official-docs; v5.0 released May 2025) — verification requirements for exact redirect-URI allowlisting, sender-constrained tokens (mTLS / DPoP), and ID-token `nonce` replay mitigation; the CQ3 floor the Step 4 gate checks against.
+- [OpenID Connect Core 1.0 §3.1.3.7 — ID Token Validation](https://openid.net/specs/openid-connect-core-1_0.html#IDTokenValidation) (accessed 2026-06-02, OpenID Foundation, official-docs) — the `iss`/`aud`/`azp`/`exp`/`nonce`/signature checks required before session creation; source for the Step 3 OIDC validation contract.
+- `agents/hatch3r-security.md` -> Audit checklist items 1-4, Verification commands (accessed 2026-06-02, in-repo canonical, official-docs) — the CQ3 auth-depth floor and the grep-based verification the Step 4 gate runs.

package/dist/content/commands/hatch3r-benchmark.md CHANGED Viewed

@@ -2,22 +2,24 @@
 id: hatch3r-benchmark
 type: command
 orchestrator: true
-agentPipeline: [hatch3r-researcher, hatch3r-perf-profiler, hatch3r-docs-writer]
+agentPipeline: [hatch3r-researcher, hatch3r-performance, hatch3r-docs-writer]
 description: Run and analyze performance benchmarks. Compare results against baselines, identify regressions, and produce performance reports.
 tags: [review, performance]
 quality_charter: agents/shared/quality-charter.md
 efficiency_patterns: agents/shared/efficiency-patterns.md
 cache_friendly: true
 parallel_tool_default: true
+efficiency_tier: standard
 triage_tiers: [1, 2, 3]
+supports_resume: true
 sub_agents_spawned:
   count: 3
-  rationale: Three-stage pipeline per agentPipeline — researcher gathers prior baselines, perf-profiler executes the suite, docs-writer assembles the report; each receives the run cache and emits a structured slice.
+  rationale: Three-stage pipeline per agentPipeline — researcher gathers prior baselines, performance (CQ7) executes the suite, docs-writer assembles the report; each receives the run cache and emits a structured slice. Cost-dominance per CONSTITUTION §2 P8 — token cost never serializes independent work.
 ---
 ## §0 Detect Ambiguity (P8 B1)
-Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (contradictory inputs, missing target, unknown convention). If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → §0 Detect Ambiguity (P8 B1). Triggers: contradictory inputs, missing target, unknown convention.
 ## Agent Pipeline
@@ -25,9 +27,11 @@ Before any action, scan the user's request and provided context for unresolved q
 |-------|----------|----------|----------|
 | 1. Discovery | `hatch3r-researcher` (codebase-analysis mode) | No | Yes |
 | 2. Execution | Orchestrator (inline, runs benchmarks) | No | Yes |
-| 3. Analysis | `hatch3r-perf-profiler` | No | Yes |
+| 3. Analysis | `hatch3r-performance` | No | Yes |
 | 4. Reporting | `hatch3r-docs-writer` | No | If regressions found |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): every parallel fan-out above holds all three — read-only or disjoint writes, deterministic aggregation, no shared mutable state.
 # Performance Benchmark — Run, Compare, and Report on Performance Metrics
 Run performance benchmarks against a target (file, function, endpoint, or full suite), compare results against a baseline (previous run, git ref, or none), and produce a structured performance report. Discovers existing benchmark files or proposes new ones for critical paths. Executes with configurable iterations, performs statistical analysis on results, and flags regressions with root cause tracing. Persists results to `.benchmarks/results.json` for longitudinal tracking. AI proposes all actions; user confirms at every checkpoint.
@@ -45,6 +49,14 @@ Run performance benchmarks against a target (file, function, endpoint, or full s
 3. **Structured output only.** All sub-agent prompts and benchmark results require structured markdown output — no prose dumps.
 4. **Compress raw metrics.** Store full raw data in `.benchmarks/results.json` but present only summary statistics (mean, p50, p95, p99, stddev) in the report.
+## Confidence Propagation Contract
+Every sub-agent delegation prompt in this command MUST include the confidence expression requirement below (verbatim). Sub-agents are invoked with the `quality_charter: agents/shared/quality-charter.md` reference in their frontmatter, but the orchestrator repeats the directive to override runtime prompt defaults per the charter §1 rule.
+> Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+Downstream propagation: the Step 7 statistical-significance verdict (CV, t-test, reliability flag) and every Step 8 root-cause attribution MUST carry a high/medium/low confidence rating sourced from the hatch3r-performance sub-agent. A `noisy` classification (CV > 15%) maps to low confidence. Dropping the signal between stages is a gate failure.
 ---
 ## Workflow
@@ -55,12 +67,36 @@ Execute these steps in order. **Do not skip any step.** Ask the user at every ch
 Classify the benchmark request before delegating:
-- **Tier 1 (trivial)**: single benchmark with `none` baseline or quick re-run of an existing suite; inline execution, no `hatch3r-perf-profiler` fanout.
+- **Tier 1 (trivial)**: single benchmark with `none` baseline or quick re-run of an existing suite; inline execution, no `hatch3r-performance` fanout.
 - **Tier 2 (standard)**: standard suite with `previous-run` or git-ref baseline; standard pipeline including statistical analysis and reporting.
 - **Tier 3 (deep)**: full-suite cross-environment benchmark with regression triage and root-cause tracing; full pipeline with research and confirm scope with the user before saving results.
 If Tier 1, complete inline and skip the analysis fanout. If Tier 2, run the standard pipeline below. If Tier 3, run the full pipeline with research and confirm scope with the user before saving results.
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the first sub-agent dispatch (Step 2 discovery researcher), surface the cost preview so a full-suite benchmark run is never started blind. Emit the `cost_estimate` block per `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate, calibrated to the Step 0 triage tier:
+```yaml
+cost_estimate:
+  expected_sa_count: <triage tier → Tier 1 inline ~0, Tier 2 ~2 (performance + docs-writer when regressions), Tier 3 up to 3>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+The benchmark suite execution time (Step 5) is wall-clock measurement, not LLM cost — report it separately in `estimated_duration_min` so the cost delta is not skewed by iteration count. Post-execution actuals + delta land in the Step 9 report's Fan-out + Cost section per `rules/hatch3r-cost-visibility.md` Post-Execution Actuals. Token telemetry sources from `src/pipeline/observability.ts`.
+### Effort Override (Decision 17)
+Auto-tiering can misclassify — a quick re-run scored as Deep, or a full cross-environment suite scored as Light. The user override is the recovery path mandated by hatch3r's universal `--effort` override contract ("User overridable via `--effort` flag"):
+- `--effort=light|standard|deep` forces the named tier, bypassing the Step 0 auto-classification.
+- The override wins over the auto-detected tier; record both the auto-detected tier and the override in the run context so the Cost estimate block reports the budget delta.
+- The override does NOT lower the minimum-3-iterations statistical-validity floor (Guardrails) — measurement rigor is independent of effort tier.
+- No override passed → the Step 0 auto-classification stands.
 ---
 ### Step 1: Gather Benchmark Context
@@ -220,7 +256,7 @@ Comparison:
 ### Step 7: Statistical Analysis
-Delegate to `hatch3r-perf-profiler` for analysis of the collected metrics.
+Delegate to `hatch3r-performance` (CQ7) for analysis of the collected metrics.
 1. Calculate statistical significance for each delta:
    - Use coefficient of variation (CV) to assess measurement noise
@@ -399,6 +435,53 @@ The benchmark report follows this structure:
 ---
+## Resumability (Decision 27/30)
+benchmark is long-running — a Tier 3 full-suite run executes a multi-iteration benchmark sweep (Step 5), statistical analysis (Step 7), and regression root-cause delegation (Step 8) across the researcher → performance → docs-writer pipeline. Per hatch3r's workspace-checkpointed resumability contract, checkpoint progress so an interrupted run re-enters at the last completed step rather than re-running the suite from scratch — benchmark iterations are expensive wall-clock and the statistical-validity floor mandates a minimum of 3 iterations per Guardrails.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.benchmark-workspace/`; step range the Step 0 → Step 10 progression; `wave` = suite/iteration batch index; snapshot/rollback paths `.benchmarks/results.json` and any report files under `docs/performance/`. Write points: after Step 1 context discovery, after Step 2 benchmark inventory locks, after Step 4 environment preparation is confirmed, after every Step 5 iteration batch completes (so partial measurements survive a crash and are not re-collected), after Step 6 baseline comparison, after Step 7 statistical analysis, after Step 8 root-cause delegation returns, after Step 9 report assembly, and after Step 10 results are persisted to `.benchmarks/results.json`.
+---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping for benchmark: `1` = scope + tool selection, `2` = benchmark execution / sub-agent dispatch, `3` = result aggregation + regression detection, `4` = report + iteration-summary. Tier 1 runs are exempt per the Tier 1 exemption.
+## End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: benchmark reports, baseline updates, dashboard refreshes.
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output. The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+This command emits cost transparency per `rules/hatch3r-cost-visibility.md` and CONSTITUTION §6 Decision 24/29:
+- **Pre-execution `cost_estimate`** — emitted in Step 0.5 before the first researcher dispatch (Step 2 discovery).
+- **Post-execution `cost_actuals` + `delta`** — appended to the Step 9 report's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2.
+Per-tier `expected_sa_count` calibration (from frontmatter `sub_agents_spawned.count: 3` × tier heuristic in `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate): Tier 1 ≈ 0 (inline discovery + execution, no fan-out); Tier 2 ≈ 2 (performance for analysis + docs-writer when regressions found); Tier 3 up to 3 (researcher + performance + docs-writer). Benchmark wall-clock execution is reported separately and not counted as LLM token cost. Deltas beyond 25% absolute value carry `flagged_for_review: true`. Token telemetry sources from `src/pipeline/observability.ts`; estimation primitives from `src/pipeline/costEstimator.ts`.
+---
 ## Error Handling
 - **Benchmarks fail to run:** Capture the error output. Check for missing dependencies, syntax errors, or runner misconfiguration. Present the error and ASK: "Benchmark {name} failed to execute: {error}. Options: (a) skip and continue with remaining, (b) attempt to fix the benchmark file, (c) abort."
@@ -424,7 +507,7 @@ The benchmark report follows this structure:
 ## Related
-- **Agent:** `hatch3r-perf-profiler` — deep performance profiling and analysis
+- **Agent:** `hatch3r-performance` (CQ7) — deep performance profiling and analysis
 - **Check:** `checks/performance.md` — performance budget checks
 - **Rule:** `hatch3r-performance-budgets` — performance budget thresholds and enforcement
 - **Command:** `hatch3r-refactor-plan` — plan optimizations identified by benchmark regressions

package/dist/content/commands/hatch3r-board-fill.md CHANGED Viewed

@@ -2,22 +2,24 @@
 id: hatch3r-board-fill
 type: command
 orchestrator: true
-agentPipeline: [hatch3r-reviewer, hatch3r-fixer]
+agentPipeline: [hatch3r-reviewer, hatch3r-fixer, hatch3r-ui, hatch3r-ux, hatch3r-security, hatch3r-reliability, hatch3r-testability, hatch3r-scalability, hatch3r-performance, hatch3r-maintainability, hatch3r-enhancability]
 description: Create epics and issues/work items from todo.md, reorganize the board with dependency analysis, readiness assessment, and implementation ordering. Supports GitHub, Azure DevOps, and GitLab.
 tags: [board, ctx:team-only]
 quality_charter: agents/shared/quality-charter.md
 efficiency_patterns: agents/shared/efficiency-patterns.md
 cache_friendly: true
 parallel_tool_default: true
+efficiency_tier: standard
 triage_tiers: [1, 2, 3]
+supports_resume: true
 sub_agents_spawned:
-  count: 2
-  rationale: Two specialist agents per issue — Step 7.9 fans out one hatch3r-reviewer per issue plus one hatch3r-fixer per accepted finding; batch mode scales reviewer count to N issues with serialization only on the per-issue reviewer→fixer hand-off.
+  count: 11
+  rationale: Per-issue review cycle (reviewer + fixer) plus 9 CQ vector specialists (ui/ux/security/reliability/testability/scalability/performance/maintainability/enhancability) consulted on readiness assessment so that issue acceptance criteria encode the measurable floors the implementation must meet; batch mode scales reviewer count to N issues with serialization only on the per-issue reviewer→fixer hand-off. Cost-dominance per CONSTITUTION §2 P8 — token cost never serializes independent work.
 ---
 ## §0 Detect Ambiguity (P8 B1)
-Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (contradictory inputs, missing target, unknown convention). If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → §0 Detect Ambiguity (P8 B1). Triggers: contradictory inputs, missing target, unknown convention.
 ## Agent Pipeline
@@ -27,6 +29,8 @@ Before any action, scan the user's request and provided context for unresolved q
 | 2. Issue Creation | Orchestrator (inline, GitHub MCP) | No | Yes |
 | 3. Board Sync | Orchestrator (Projects v2 sync) | No | Yes |
+**Parallel-safety conditions** (per `rules/hatch3r-agent-orchestration.md` §Parallel Safety): every parallel fan-out (Step 1 explore sub-agents, the Step 7.9 per-issue reviewer/fixer loops where N issues = N parallel loops) holds all three — read-only or disjoint writes, deterministic aggregation, no shared mutable state.
 All issue operations MUST follow the Board Sync Enforcement rules defined in `hatch3r-board-shared`.
 # Board Fill -- Create Epics & Issues from todo.md + Board Reorganization
@@ -58,6 +62,14 @@ GitHub Agentic Workflows and hatch3r are complementary: use agentic workflows fo
 Follow the **Token-Saving Directives** in `hatch3r-board-shared`.
+## Confidence Propagation Contract
+Every sub-agent delegation prompt in this command (the Step 7.9 reviewer/fixer loops and any Step 1 explore sub-agents) MUST include the confidence expression requirement below (verbatim). Sub-agents are invoked with the `quality_charter: agents/shared/quality-charter.md` reference in their frontmatter, but the orchestrator repeats the directive to override runtime prompt defaults per the charter §1 rule.
+> Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+Downstream propagation: every readiness-assessment verdict (Step 5.6), every production-readiness reviewer verdict (Step 7.9b), and every flagged AI-inferred acceptance criterion MUST carry a high/medium/low confidence rating sourced from the upstream sub-agent. Dropping the signal between stages is a gate failure.
 ---
 ## Workflow
@@ -74,13 +86,36 @@ Classify the board-fill request before delegating:
 If Tier 1, run only Steps 1–3, 5–6, and 7 (skip the heavy production-readiness review). If Tier 2, run the standard pipeline below. If Tier 3, run the full pipeline including Steps 5.5, 5.6, and 7.9, and confirm batch composition with the user before executing.
+### Step 0.5: Emit Pre-Execution Cost Preview
+Before the first sub-agent dispatch (Step 7.9 reviewer/fixer loops, or Step 1 explore sub-agents when application context is needed), surface the cost preview so a large board-fill batch is never started blind. Emit the `cost_estimate` block per `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate, calibrated to the Step 0 triage tier and the item count selected in Step 1:
+```yaml
+cost_estimate:
+  expected_sa_count: <triage tier → Tier 1 ~0 (no Step 7.9 review), Tier 2 ~2 per issue, Tier 3 up to 11 × issue count>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+The Step 2.5 triage interview and the per-step ASK checkpoints are user-driven and excluded from the duration estimate. Post-execution actuals + delta land in the end-of-run iteration summary's Fan-out + Cost section per `rules/hatch3r-cost-visibility.md` Post-Execution Actuals. Token telemetry sources from `src/pipeline/observability.ts`.
+### Effort Override (Decision 17)
+Auto-tiering can misclassify — a 3-item todo scored as Tier 3, or a 20-item greenfield batch scored as Tier 1. The user override is the recovery path mandated by hatch3r's universal `--effort` override contract ("User overridable via `--effort` flag"):
+- `--effort=light|standard|deep` forces the named tier, bypassing the Step 0 auto-classification (which gates whether Steps 5.5/5.6/7.9 run).
+- The override wins over the auto-detected tier; record both the auto-detected tier and the override in the run context so the Cost estimate block reports the budget delta.
+- No override passed → the Step 0 auto-classification stands.
 ---
 ### Step 1: Read and Parse todo.md
 1. Read `todo.md` at the project root.
-2. Parse each non-empty line as a separate todo item. Strip leading `- ` or `* ` markers.
-3. Present the parsed list numbered.
+2. Parse the file with the **Todo Grammar** in `hatch3r-board-shared` (the single source of truth the `hatch3r-roadmap` and `hatch3r-project-spec` producers emit). Apply its parse rules in order: skip `## P{N}` / `## Future Ideas` headers (recording each as the priority default for items beneath it — not a todo item), strip the leading `- `/`* ` list marker plus any `[ ]`/`[x]` checkbox token, extract the `**[BIZ|TECH|BOTH]**` scope tag out of the title (feed it to Step 3 classification), split the bold title from the description, and capture a trailing `Ref: {path}.` as a documentation reference. Lines that do not match the todo-item form are skipped with a one-line note so the user can fix malformed entries.
+3. Present the parsed list numbered. Show each item's stripped title, its carried `priority:p{N}` default (from the enclosing header), and its `[BIZ|TECH|BOTH]` tag so the user can confirm the parse before processing.
 **ASK:** "Here are the items I found in todo.md. Which items should I process? (all / specific numbers / exclude specific numbers)"
@@ -169,7 +204,7 @@ Before classifying items, extract the information needed to produce genuinely re
 If a product vision was loaded in Step 1a, apply these modifications to the triage process:
 1. **Derive context from the vision first.** For each triage dimension (Scope, Value, Unknowns, Size, Stakeholder), check whether the product vision document provides clear answers. Extract relevant goals, user segments, success metrics, and stated priorities from the vision.
-2. **Reduce questioning.** Only ask triage questions for dimensions that the vision does not clearly address. If the vision states the target user, value proposition, and scope boundaries for an item's domain, mark those dimensions as "Clear (from vision)" and skip the corresponding questions.
+2. **Reduce questioning.** Only ask triage questions for dimensions that the vision does not address with a stated answer. If the vision states the target user, value proposition, and scope boundaries for an item's domain, mark those dimensions as "Clear (from vision)" and skip the corresponding questions.
 3. **Reference vision goals explicitly.** When classifying items or asking remaining triage questions, cite the specific vision goal or statement that informs the classification (e.g., "Per the product vision goal 'Zero-config onboarding', this item aligns with...").
 4. **Greenfield streamlining.** For greenfield projects where the vision covers all major work areas, batch all items into a single triage pass. Present the vision-derived context for each item and ask the user to confirm or override in one prompt rather than per-item questioning.
 5. **Tier-3 greenfield gate (P8 B2).** Vision-aware triage skips a readiness dimension ONLY when the product vision explicitly addresses it; otherwise mark the dimension `unresolved — needs user input` and ask via the platform-native question tool. If readiness gate (Step 5.6) reports unresolved external blockers, re-run triage per dimension.
@@ -255,7 +290,7 @@ If the user declines to answer a dimension ("skip", "not sure", "figure it out")
 ### Step 3: Issue Type & Executor Classification
-For each remaining item, classify across all dimensions using the mapping tables below **combined with Triage Context from Step 2.5**. Triage answers take precedence over keyword heuristics when they conflict.
+For each remaining item, classify across all dimensions using the mapping tables below **combined with Triage Context from Step 2.5** and the `[BIZ|TECH|BOTH]` scope tag extracted in Step 1. Triage answers take precedence over keyword heuristics when they conflict. The scope tag is a soft signal: `TECH` leans `executor:agent`/`executor:hybrid` and a technical `area:*`; `BIZ` leans `executor:human`/`executor:hybrid` and surfaces stakeholder/value context; `BOTH` flags cross-cutting work for decomposition review in Step 5c. Triage answers and the keyword tables override the tag when they conflict.
 **Type:**
@@ -322,7 +357,7 @@ If a product vision was loaded in Step 1a:
 1. Include the product vision as **primary context** for issue scoping. Vision-stated goals, success metrics, target users, and scope boundaries take precedence over inferred context when drafting issue bodies and acceptance criteria.
 2. Map each todo item to the specific vision goal(s) it supports. Include this mapping in the Context Summary output so that Steps 5 and 6 can reference it.
-3. Flag any todo items that do not clearly map to a stated vision goal — these may represent scope creep or infrastructure work that supports the vision indirectly.
+3. Flag any todo items that do not map to a stated vision goal — these may represent scope creep or infrastructure work that supports the vision indirectly.
 #### Output
@@ -360,7 +395,7 @@ Scan ALL existing standalone issues on the board (from the Step 1.5 board scan).
 4. **Singleton promotion.** Apply the same singleton promotion rule as 5a.3 — propose a thematic epic for isolated standalones.
 5. **Catch-all.** Remaining standalones go into the "General Improvements" epic per 5a.4.
-Present regrouping proposals clearly separated from new-item grouping, so the user can confirm or reject existing issue regrouping independently.
+Present regrouping proposals in a section separate from new-item grouping, so the user can confirm or reject existing issue regrouping independently.
 #### 5c. Decomposition Check
@@ -706,7 +741,9 @@ For issues where the reviewer returned `REQUEST CHANGES`:
 #### 7.9d. Loop Termination
-Per-issue loop, max 4 iterations. Loop semantics mirror `src/pipeline/reviewLoop.ts`:
+Per-issue loop, max 4 iterations (spec-class cap). Loop semantics mirror `src/pipeline/reviewLoop.ts`.
+> **Iteration-cap rationale (D10-SA10.7-F10.7.7).** This spec-class loop caps at 4 (matching `DEFAULT_MAX_REVIEW_ITERATIONS` in `src/pipeline/reviewLoop.ts`) because issue-spec reviews converge more slowly and deterministically — they refine text against the production-readiness checklist with no runtime regressions feeding back. Code-class loops (`hatch3r-workflow` Phase 4a, `hatch3r-board-pickup` Step 7a review loop, `hatch3r-quick-change` Step 6a) cap at 3 because code reviews diverge faster (a fix can spawn a regression the next iteration must catch). Expected convergence is 1–2 iterations; the cap is the divergence backstop, not the target.
 **Clean termination** (issue passes review):
 - Reviewer returns `APPROVE`, AND
@@ -774,6 +811,55 @@ If yes, edit `todo.md` to remove lines for created issues. Preserve skipped/excl
 ---
+## Per-Turn Pipeline-State Header (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Per-Turn Pipeline-State Header. Phase mapping for board-fill: `1` = parse + board scan + triage (Steps 0–2.5), `2` = classification + context + grouping + dependency + readiness (Steps 3–5.6), `3` = create/update + production-readiness review (Steps 6–7.9, reviewer/fixer sub-agent loops), `4` = reconciliation + dashboard + cleanup + summary (Steps 7.8–8). Tier 1 runs are exempt per the Tier 1 exemption.
+## End-of-Turn Delegation Attestation (Bypass Protection)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → End-of-Turn Delegation Attestation. Per-command mutated-file slot: `todo.md` edits in Step 8. Issue-body mutations applied after a reviewer/fixer verdict (Step 7.9c) cite the spawning sub-agent invocation.
+---
+## Resumability (Decision 27/30)
+board-fill is long-running — a Tier 3 batch can span 15+ items with per-issue reviewer/fixer loops (Step 7.9) and dozens of platform mutations (Step 7). Per hatch3r's workspace-checkpointed resumability contract, checkpoint progress so an interrupted run re-enters at the last completed phase rather than re-creating issues.
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Checkpoint Contract. Per-command slots: workspace `.board-fill-workspace/`; step range the command's step progression; `wave` = batch index when items are processed in dependency levels; snapshot/rollback paths the command's output paths. Write points: after each ASK checkpoint is confirmed and after each Step 7 mutation phase (epics created, sub-issues linked, board synced) so created-issue numbers and `link_results` survive a crash and are not re-created on resume.
+---
+## Iteration Summary (mandatory output)
+Emit the canonical 9-section iteration summary per `rules/hatch3r-iteration-summary.md` as the final user-facing output. The validation gate at `.claude/rules/capability-lifecycle.md` blocks SUCCESS declarations without this block (CONSTITUTION §6 Decision 23).
+The 9 sections:
+1. **Request** — verbatim restatement of the user's ask in one sentence.
+2. **Fan-out + Cost** — `sub_agents_spawned: { count, rationale }` plus the `cost_estimate` / `cost_actuals` / `delta` blocks (see Cost Visibility below).
+3. **Web Research** — every URL fetched with access date + trust tier per `agents/shared/rigor-contract.md` (0 acceptable when no research was needed).
+4. **Files Mutated** — list with diff summary (lines added / removed / files created).
+5. **Gates Passed / Failed** — explicit list per `.claude/rules/capability-lifecycle.md` Gate Checklist.
+6. **Pillar Impact Attribution** — `progress_toward_pillar: <axis>.<pillar_id>+<delta>` per CONSTITUTION §6 Decision 17.
+7. **Verification Commands** — exact commands run with exit codes plus key output lines (≤200 chars).
+8. **Open Questions / Blockers** — explicit `None` if fully closed.
+9. **Learnings Captured** — IDs of any learnings written to `.hatch3r/learnings/` this run per `rules/hatch3r-learning-system.md`.
+### Cost Visibility (Decision 24)
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → Cost Estimate for the 5-field `cost_estimate` schema and the post-execution `cost_actuals` + `delta` contract; both land in Section 2 above.
+## Cost estimate (Decision 24)
+This command emits cost transparency per `rules/hatch3r-cost-visibility.md` and CONSTITUTION §6 Decision 24/29:
+- **Pre-execution `cost_estimate`** — emitted in Step 0.5 before the first sub-agent dispatch (Step 7.9 reviewer/fixer loops or Step 1 explore sub-agents).
+- **Post-execution `cost_actuals` + `delta`** — appended to the end-of-run iteration summary's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2.
+Per-tier `expected_sa_count` calibration (from frontmatter `sub_agents_spawned.count: 11` × tier heuristic in `rules/hatch3r-cost-visibility.md` Pre-Execution Estimate): Tier 1 ≈ 0 (Step 7.9 production-readiness review skipped); Tier 2 ≈ 2 per issue (reviewer + fixer); Tier 3 up to 11 per issue (reviewer/fixer plus the 9 CQ vector specialists consulted on readiness), scaling with issue count. Deltas beyond 25% absolute value carry `flagged_for_review: true`. Token telemetry sources from `src/pipeline/observability.ts`; estimation primitives from `src/pipeline/costEstimator.ts`.
+---
 ## Error Handling
 - **Search failure** (GitHub `search_issues` / Azure DevOps `az boards query` / GitLab `glab issue list --search`): retry once, then warn and proceed without dedup.