npm - hatch3r - Versions diffs - 1.9.0 → 2.0.0 - Mend

hatch3r 1.9.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (288) hide show

package/dist/content/commands/shared/orchestration-frame.md ADDED Viewed

@@ -0,0 +1,119 @@
+---
+id: hatch3r-orchestration-frame
+type: shared-context
+description: Single source of truth for cross-cutting orchestrator-command boilerplate — checkpoint contract, cost-estimate block, and Per-Turn Pipeline-State Header. Cited by long-running commands via a one-line pointer instead of restating the blocks.
+tags: [orchestration, reference]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# Orchestration Frame (shared command boilerplate)
+> Last updated: 2026-06-09
+> Pillars: P4 (Lean Coverage, primary — kills the ~30-file restatement of these six blocks), P7 (Speed & Token Efficiency, supporting — static cacheable frame).
+Six cross-cutting blocks recur near-verbatim across the `commands/hatch3r-*.md` orchestrators (§0 Detect Ambiguity ×30, Confidence Propagation Contract ×26, checkpoint contract ×28, Per-Turn Pipeline-State Header ×29, End-of-Turn Delegation Attestation ×30, `cost_estimate` block ×30 at the D22-4 measurement). This file is their single source of truth. A command cites the block it needs with a one-line pointer and supplies only its per-command slots (ambiguity triggers, workspace directory, step range, doc directories, phase mapping, mutated-file list). The authoritative rule for each block is named in its section; this frame is the command-facing restatement, not a competing definition.
+Citation template (drop into the command where the block used to live):
+```
+> Orchestration boilerplate: see `commands/shared/orchestration-frame.md` → {§0 Detect Ambiguity | Confidence Propagation Contract | Checkpoint Contract | Cost Estimate | Per-Turn Pipeline-State Header | End-of-Turn Delegation Attestation}. Per-command slot: <the one varying detail — trigger list, workspace dir, phase mapping, mutated-file list, …>.
+```
+`<…>` slots below are the only text a command varies; everything outside them is invariant and lives here.
+---
+## §0 Detect Ambiguity (P8 B1)
+Authoritative rule: `rules/hatch3r-clarification-default.md` (B1 directive); framework-dev mirror `.claude/rules/clarification-default.md`. This is the orchestrator-context body — commands run in the main conversation, so they invoke the platform-native question tool directly (unlike Task-tool sub-agents, which return `BLOCKED_AMBIGUITY` per `agents/shared/clarification-default-block.md`).
+Before any action, scan the user's request and provided context for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts. If any are found, ask the user via the platform-native question tool per `agents/shared/user-question-protocol.md` — do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-target, single-concern, and the brief alone is testable. Any residual ambiguity discovered mid-workflow invokes the same protocol.
+Per-command slot: an optional one-line trigger list naming the command's domain-specific ambiguities (e.g. for `hatch3r-auth-scaffold`: "which flow(s) to scaffold, the OIDC issuer, public vs confidential client"). The inline trigger line at the citation site is the single source of truth for that command's triggers; this frame keeps no parallel table.
+---
+## Confidence Propagation Contract
+Authoritative rule: `agents/shared/quality-charter.md` §1 (confidence expression). Every sub-agent delegation prompt in a command MUST include the confidence expression requirement below verbatim. Sub-agents carry the `quality_charter` reference in frontmatter, but the orchestrator repeats the directive to override runtime prompt defaults per charter §1.
+> Confidence expression requirement: rate every recommendation and finding as high/medium/low confidence per the quality charter (`agents/shared/quality-charter.md`). High = verified against current code. Medium = pattern-based, not fully verified. Low = best judgment, recommend human review.
+Downstream propagation: every ASK checkpoint that reports verification quality, every gate that evaluates a sub-agent verdict, and every output block that surfaces `<readiness-kind>` readiness MUST carry a high/medium/low confidence rating sourced from the upstream sub-agent. Dropping the signal between stages is a gate failure.
+Per-command slot: `<readiness-kind>` (e.g. plan / spec / merge / map / fix readiness) plus any command-specific propagation points (statistical-significance verdicts, severity classifications, market-research caveats) that carry the signal.
+---
+## Effort Override (Decision 17)
+Authoritative contract: hatch3r's universal `--effort` override ("User overridable via `--effort` flag", CONSTITUTION §6 Decision 17). Auto-tiering can misclassify (a single-module change scored Deep, or a cross-cutting one scored Light); the override is the recovery path. The invariant body:
+- `--effort=light|standard|deep` forces the named tier, bypassing the command's Step 0 auto-classification.
+- The override wins over the auto-detected tier; record both the auto-detected tier and the override in the run context so the Cost Estimate block reports the budget delta.
+- No override passed → the auto-classification stands.
+Per-command slot: a one-line misclassification example in the command's own domain (e.g. "a single-endpoint doc tweak scored as Deep").
+---
+## Checkpoint Contract
+Authoritative module: `src/pipeline/checkpoint.ts`. Restated here for long-running planning commands (feature-plan, bug-plan, test-plan, migration-plan, refactor-plan, and peers) so an interrupted run re-enters at the last completed step instead of re-running its full fan-out.
+1. **Workspace + file:** write `<workspace-dir>/checkpoint.json` via `writeCheckpoint()` (atomic temp+rename through `src/merge/safeWrite.ts`; a SIGKILL mid-write leaves the prior checkpoint or no file, never a partial record). Schema (`schemaVersion: 1`): `phase` (the `<step-range>` progression), `wave` (`<wave-semantics>`, e.g. researcher-batch index across the parallel modes), `status` (`in-progress` | `passed` | `failed`), and `meta` `{ baselineSha, lastPassedGateN, registrySha, timestamp, <slug-or-version-fields> }`.
+2. **Write points:** after each milestone the command declares — context lock, scope ASK, each fan-out batch return, each synthesis ASK confirmation, each file write under `<doc-dirs>`, and the optional chain-to-`hatch3r-board-fill` handoff — so already-generated artifacts survive a crash and are not regenerated on resume. Commands list their own ordered write points.
+3. **`--resume` invocation:** `<command-name> --resume` calls `readCheckpoint()` then `verifyResumability(workspace, currentSha)`. Baseline drift fails closed (the repo or any path under `<doc-dirs>` / `todo.md` changed since the checkpoint) — re-run from scratch or rebase to the checkpoint baseline. A `failed` status halts for operator triage before resuming.
+4. **Snapshot rollback:** pre-mutation snapshots of every path under `<doc-dirs>` and `todo.md` land in `.hatch3r/snapshots/<session-id>/`; `hatch3r rollback --session=<id>` reverts this run's writes. Diff preview precedes every file write per Decision 30.
+If `--resume` is passed with no checkpoint, `verifyResumability` returns `drift: "no checkpoint found"` — treat as a cold start.
+---
+## Cost Estimate
+Authoritative rule: `rules/hatch3r-cost-visibility.md`; primitives: `src/pipeline/observability.ts::buildCostBlock` (actuals) and `src/pipeline/costEstimator.ts` (estimate). CONSTITUTION §6 Decision 24/29.
+**Pre-execution `cost_estimate`** — emit before the first sub-agent dispatch (5-field schema):
+```yaml
+cost_estimate:
+  expected_sa_count: <int>
+  estimated_input_tokens_static_frame: <int>
+  estimated_web_research_queries: <int>      # 0 when no research is needed
+  triage_tier: light | standard | deep
+  estimated_duration_min: <int>
+```
+`expected_sa_count` is calibrated from the command's frontmatter `sub_agents_spawned.count` × the tier heuristic in `rules/hatch3r-cost-visibility.md` → Pre-Execution Estimate. Each command supplies its own per-tier numbers (e.g. `<tier1-count>` / `<tier2-count>` / `<tier3-count>`).
+**Post-execution `cost_actuals` + `delta`** — call `buildCostBlock` again with actuals; both land in the iteration summary's Fan-out + Cost section per `rules/hatch3r-iteration-summary.md` §2. Deltas beyond 25% absolute value carry `flagged_for_review: true`. Field contract + delta semantics: `rules/hatch3r-cost-visibility.md`.
+---
+## Per-Turn Pipeline-State Header
+Authoritative rule: `rules/hatch3r-agent-orchestration.md` → Per-Turn Pipeline-State Header. For Tier 2 and Tier 3 runs, emit the header at the start of every assistant turn that touches the task. Tier 1, read-only, and chat-only turns are exempt.
+```
+[hatch3r-pipeline: phase {1|2|3|4} | last: {agent} → {SUCCESS|PARTIAL|FAILED|BLOCKED|n/a} | next: {agent or "user-confirmation" or "complete"}]
+```
+Phase mapping is per-command — each command maps phases `1`–`4` onto its own steps (e.g. `<phase-mapping>`: intake/decomposition → sub-agent dispatch → synthesis → write + iteration-summary). A missing header on a tracked Tier ≥ 2 task is a self-detectable drift signal; the user may halt and re-ground.
+---
+## End-of-Turn Delegation Attestation
+Authoritative rule: `rules/hatch3r-agent-orchestration.md` → End-of-Turn Delegation Attestation. Every turn that mutated files at Tier 2 or Tier 3 emits this block immediately before the Iteration Summary, quoting verbatim each spawned sub-agent's `delegation_proof_id`:
+```
+[hatch3r-delegation-attestation]
+files_mutated_this_turn:
+  - <relative path>: via <hatch3r-agent-name> (proof: <delegation_proof_id>)
+mutating_subagent_invocations: <integer>
+inline_edits_by_orchestrator: none
+```
+Unattributable rows are a self-declared P8 B2 violation — halt and queue re-delegation next turn. The block sits beside the Iteration Summary, not inside it, preserving the iteration-summary contract verbatim.

package/dist/content/github-agents/hatch3r-docs-agent.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: hatch3r-docs-agent
 type: github-agent
-description: Technical writer who maintains specs, ADRs, and documentation
+description: 'Technical writer who maintains specs, ADRs, and documentation'
 # Simplified agent for GitHub Copilot/Codex
 tags: [devops, ctx:team-only]
 quality_charter: agents/shared/quality-charter.md
@@ -11,6 +11,26 @@ cache_friendly: true
 You are an expert technical writer for the project.
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the brief for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (which docs, whether a spec section may be restructured, which stable IDs apply). If any are found, ask via the platform-native question surface per `agents/shared/user-question-protocol.md` — for GitHub Copilot/Codex cloud agents, that surface is a PR comment or issue clarification. Do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-file, single-concern, and the brief alone is testable.
+### Plain-Text Fallback Template (D5-M6)
+When the runtime has no platform-native question tool (GitHub Copilot/Codex cloud agents post to a PR comment or issue body — plain Markdown), emit the question using this exact shape:
+```
+**Question:** <one-sentence question stating the choice>
+1. <Option A> — <one-line rationale or trade-off>
+2. <Option B> — <one-line rationale or trade-off>
+3. <Option C> — <one-line rationale or trade-off>
+Default if no response: <option number, e.g., 2>
+```
+Rules: 2-4 numbered options, each with a one-line trade-off; the `Default if no response:` line is mandatory and names the safest reversible choice. Do not silent-pick — if no default was emitted with the question, return `BLOCKED_AMBIGUITY` in the structured result instead of guessing.
 ## Your Role
 - You read code from `src/` and backend directories and update documentation in `docs/`.

package/dist/content/github-agents/hatch3r-lint-agent.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: hatch3r-lint-agent
 type: github-agent
-description: Code quality enforcer who fixes style, formatting, and type issues
+description: 'Code quality enforcer who fixes style, formatting, and type issues'
 # Simplified agent for GitHub Copilot/Codex
 tags: [devops, ctx:team-only]
 quality_charter: agents/shared/quality-charter.md
@@ -11,6 +11,26 @@ cache_friendly: true
 You are a code quality engineer for the project.
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the brief for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (which files or lint rulesets are in scope, whether an exported symbol may be renamed, whether a style fix risks altering behavior). If any are found, ask via the platform-native question surface per `agents/shared/user-question-protocol.md` — for GitHub Copilot/Codex cloud agents, that surface is a PR comment or issue clarification. Do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-file, single-concern, and the brief alone is testable.
+### Plain-Text Fallback Template (D5-M6)
+When the runtime has no platform-native question tool (GitHub Copilot/Codex cloud agents post to a PR comment or issue body — plain Markdown), emit the question using this exact shape:
+```
+**Question:** <one-sentence question stating the choice>
+1. <Option A> — <one-line rationale or trade-off>
+2. <Option B> — <one-line rationale or trade-off>
+3. <Option C> — <one-line rationale or trade-off>
+Default if no response: <option number, e.g., 2>
+```
+Rules: 2-4 numbered options, each with a one-line trade-off; the `Default if no response:` line is mandatory and names the safest reversible choice. Do not silent-pick — if no default was emitted with the question, return `BLOCKED_AMBIGUITY` in the structured result instead of guessing.
 ## Your Role
 - You fix ESLint errors, Prettier formatting, TypeScript strict mode violations, and naming convention issues.

package/dist/content/github-agents/hatch3r-security-agent.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: hatch3r-security-agent
 type: github-agent
-description: Security analyst who audits code, rules, and data flows
+description: 'Security analyst who audits code, rules, and data flows'
 # Simplified agent for GitHub Copilot/Codex
 tags: [devops, floor:security, ctx:team-only]
 quality_charter: agents/shared/quality-charter.md
@@ -11,6 +11,26 @@ cache_friendly: true
 You are an expert security analyst for the project.
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the brief for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (which threat model or trust boundary applies, whether a security rule may be loosened, which collections or endpoints are in scope). If any are found, ask via the platform-native question surface per `agents/shared/user-question-protocol.md` — for GitHub Copilot/Codex cloud agents, that surface is a PR comment or issue clarification. Do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-file, single-concern, and the brief alone is testable.
+### Plain-Text Fallback Template (D5-M6)
+When the runtime has no platform-native question tool (GitHub Copilot/Codex cloud agents post to a PR comment or issue body — plain Markdown), emit the question using this exact shape:
+```
+**Question:** <one-sentence question stating the choice>
+1. <Option A> — <one-line rationale or trade-off>
+2. <Option B> — <one-line rationale or trade-off>
+3. <Option C> — <one-line rationale or trade-off>
+Default if no response: <option number, e.g., 2>
+```
+Rules: 2-4 numbered options, each with a one-line trade-off; the `Default if no response:` line is mandatory and names the safest reversible choice. Do not silent-pick — if no default was emitted with the question, return `BLOCKED_AMBIGUITY` in the structured result instead of guessing.
 ## Your Role
 - You audit database security rules, API endpoints, event metadata, and data flows.

package/dist/content/github-agents/hatch3r-test-agent.md CHANGED Viewed

@@ -1,7 +1,7 @@
 ---
 name: hatch3r-test-agent
 type: github-agent
-description: QA engineer who writes and maintains tests
+description: 'QA engineer who writes and maintains tests'
 # Simplified agent for GitHub Copilot/Codex
 tags: [review, ctx:team-only]
 quality_charter: agents/shared/quality-charter.md
@@ -11,6 +11,26 @@ cache_friendly: true
 You are an expert QA engineer for the project.
+## §0 Detect Ambiguity (P8 B1)
+Before any action, scan the brief for unresolved questions in scope, acceptance criteria, irreversibility, or constraint conflicts (which behavior is under test, whether an existing test may be modified or deleted, which coverage target applies). If any are found, ask via the platform-native question surface per `agents/shared/user-question-protocol.md` — for GitHub Copilot/Codex cloud agents, that surface is a PR comment or issue clarification. Do not proceed under silent assumption. This is the default path, not an exception. Acceptable to proceed without asking ONLY when scope is single-file, single-concern, and the brief alone is testable.
+### Plain-Text Fallback Template (D5-M6)
+When the runtime has no platform-native question tool (GitHub Copilot/Codex cloud agents post to a PR comment or issue body — plain Markdown), emit the question using this exact shape:
+```
+**Question:** <one-sentence question stating the choice>
+1. <Option A> — <one-line rationale or trade-off>
+2. <Option B> — <one-line rationale or trade-off>
+3. <Option C> — <one-line rationale or trade-off>
+Default if no response: <option number, e.g., 2>
+```
+Rules: 2-4 numbered options, each with a one-line trade-off; the `Default if no response:` line is mandatory and names the safest reversible choice. Do not silent-pick — if no default was emitted with the question, return `BLOCKED_AMBIGUITY` in the structured result instead of guessing.
 ## Your Role
 - You write unit tests, integration tests, contract tests, and E2E tests.

package/dist/content/hooks/hatch3r-file-save.md CHANGED Viewed

@@ -32,4 +32,4 @@ When this hook fires, the assigned agent should:
 - **Globs**: Controlled by the `globs` frontmatter field. Adjust to match your project's source file extensions.
 - **Rule sources**: Reads from `rules/`. Rules with matching `globs` or `scope: always` are activated.
-- **Debounce**: To avoid excessive processing during rapid saves, the agent debounces with a 2-second window (configurable via `debounceMs`).
+- **Debounce**: To avoid excessive processing during rapid saves, the agent debounces with a 2-second window (configurable via `debounceMs`). Coalescing is trailing-edge — within the debounce window the most-recent save wins and the hook fires once, `debounceMs` after the last save event; intermediate saves in the window do not fire.

package/dist/content/hooks/hatch3r-pre-push.md CHANGED Viewed

@@ -1,16 +1,16 @@
 ---
-id: pre-push-security-auditor
+id: pre-push-security
 type: hook
 event: pre-push
-agent: security-auditor
+agent: security
 description: Scan for secrets and security issues before push
 tags: [floor:security]
 quality_charter: agents/shared/quality-charter.md
 cache_friendly: true
 ---
-# Hook: pre-push → security-auditor
+# Hook: pre-push → security
-Activate the security-auditor agent before pushing to scan for accidentally committed secrets, API keys, credentials, and other security-sensitive content.
+Activate the security agent before pushing to scan for accidentally committed secrets, API keys, credentials, and other security-sensitive content.
 ## Agent Behavior

package/dist/content/hooks/hatch3r-review-loop-cap.md ADDED Viewed

@@ -0,0 +1,52 @@
+---
+id: hatch3r-review-loop-cap
+type: hook
+event: review-loop-cap
+agent: reviewer
+description: Block fixer-spawn past the configured review-loop iteration ceiling via a `.review-loop.json` checkpoint
+tags: [orchestration, floor:security]
+quality_charter: agents/shared/quality-charter.md
+cache_friendly: true
+---
+# Hook: review-loop-cap → review-loop-cap-enforcer
+Activate the review-loop-cap-enforcer when the orchestrator attempts to spawn `hatch3r-fixer` after `hatch3r-reviewer` returned non-clean. The hook reads, increments, and gates a per-issue iteration counter so the Phase 3 review-fix loop cannot run unbounded.
+This hook closes the F15.2-H1 gap surfaced in cycle 10: the runtime `src/pipeline/reviewLoop.ts` carries `DEFAULT_MAX_REVIEW_ITERATIONS = 4` and `HARD_MAX_REVIEW_ITERATIONS = 10`, but no canonical hook artifact instructs adapters to materialize the cap as an enforced gate. Without this hook the bound exists in code but does not propagate into generated end-user agent setups.
+## Event Mapping
+The neutral event name `review-loop-cap` is the canonical surface. Per-adapter mappings:
+- **Claude Code:** `Stop` event, OR `PostToolUse` with `matcher: "Task"` filtered to fixer-spawn invocations.
+- **Cursor:** no native runtime gate — advisory rule only (parity with GitHub Copilot below). The iteration-count gate needs the orchestrator's per-issue `.review-loop.json` counter context, which no Cursor hook payload carries; Cursor's real pre-tool event is `preToolUse` (NOT `pre-tool-call`), and its payload exposes no agent-identity field (cursor.com/docs/agent/hooks, accessed 2026-06-09), so it cannot bind to a fixer-spawn. Materialized as the `.cursor/rules/hatch3r-hook-hatch3r-review-loop-cap.mdc` advisory rule the Cursor adapter emits for every canonical hook.
+- **GitHub Copilot:** no native equivalent — emitted as an advisory rule comment instead of a runtime gate.
+## Agent Behavior
+When this hook fires, the assigned agent should:
+1. Identify the active issue or task ID from orchestrator context. If absent, derive a stable key from the current branch + first changed file path (deterministic hash).
+2. Read the per-issue checkpoint at `.hatch3r/review-loop/<issue-key>.review-loop.json`. If the file is absent, create it with `{ "iteration": 0, "createdAt": "<ISO 8601 UTC>", "maxIterations": <configured> }`.
+3. Increment `iteration` by 1. Persist via the safe-write temp+rename pattern (`src/merge/safeWrite.ts` semantics) so a crash mid-write leaves the prior counter intact.
+4. Compare incremented `iteration` against `maxIterations`:
+   - If `iteration <= maxIterations`: emit a structured pass-through with the new counter value and exit 0. Orchestrator proceeds with fixer-spawn.
+   - If `iteration > maxIterations`: emit a structured block. Exit 2 (non-zero halts the spawn). Include the issue key, the value of `maxIterations`, the iteration counter that triggered the block, and the next-step recommendation: "Reviewer findings remain after N iterations — escalate to maintainer review or accept current state. Reset via `rm .hatch3r/review-loop/<issue-key>.review-loop.json`."
+5. On block, write a one-line audit entry to `.hatch3r/review-loop/audit.log` with timestamp, issue key, hook outcome, and the reviewer's last non-clean verdict if available in context.
+## Expected Output
+- **Pass-through:** `{ "outcome": "pass", "iteration": <N>, "maxIterations": <M>, "remaining": <M-N> }` written to stdout. Exit 0.
+- **Block:** `{ "outcome": "block", "iteration": <N>, "maxIterations": <M>, "reason": "max_iterations_exceeded", "actionable_next_step": "<one sentence>" }` written to stdout. Exit 2.
+## Configuration
+- **maxIterations:** Default 4 — held in lockstep with `src/pipeline/reviewLoop.ts::DEFAULT_MAX_REVIEW_ITERATIONS`. Override via `maxIterations` in hook config. Clamped to `[MIN_MAX_REVIEW_ITERATIONS, HARD_MAX_REVIEW_ITERATIONS]` = `[1, 10]` per the same module.
+- **checkpointDir:** Default `.hatch3r/review-loop/`. Override via `checkpointDir` for projects that namespace `.hatch3r/` differently.
+- **resetOnCleanVerdict:** Default `true`. When the reviewer returns a clean verdict, the orchestrator deletes the checkpoint so the next regression-fix run starts fresh. Set `false` to retain historical counters across reviewer passes.
+## Failure-Boundary Semantics
+The hook itself is a circuit breaker scoped to the fixer-spawn boundary. It does not classify reviewer findings, terminate the review loop on its own authority, or invoke remediation. Its single contract: `iteration > maxIterations` MUST block fixer-spawn, no exception. Adapters that cannot emit a non-zero exit at the spawn site (Cursor and GitHub Copilot — see Event Mapping) render this hook as an advisory rule instead of a runtime gate; the README surface for those adapters declares the downgrade.
+Cross-reference: `src/pipeline/reviewLoop.ts` (canonical state machine), `agents/hatch3r-implementer.md` → Review Loop Awareness (Phase 3 contract), `rules/hatch3r-agent-orchestration.md` (orchestrator delegation protocol).

package/dist/content/mcp/mcp.json CHANGED Viewed

@@ -2,10 +2,12 @@
   "mcpServers": {
     "github": {
       "_description": "GitHub repository management, code review, issues, PRs, and project boards",
+      "_trust_bypass": true,
+      "_trust_bypass_reason": "github-first-party",
       "url": "https://api.githubcopilot.com/mcp/",
       "headers": {
         "Authorization": "Bearer ${env:GITHUB_PAT}",
-        "X-MCP-Toolsets": "all"
+        "X-MCP-Toolsets": "repos,issues,pull_requests"
       }
     },
     "context7": {
@@ -26,7 +28,7 @@
     "brave-search": {
       "_description": "Web research, fact-checking, and current information retrieval",
       "command": "npx",
-      "args": ["-y", "@modelcontextprotocol/server-brave-search@0.6.2"],
+      "args": ["-y", "@brave/brave-search-mcp-server@2.0.83", "--transport", "stdio"],
       "env": {
         "BRAVE_API_KEY": "${env:BRAVE_API_KEY}"
       }
@@ -44,7 +46,7 @@
       "_disabled": true,
       "_description": "PostgreSQL database queries and schema inspection (enable and configure with your connection string)",
       "command": "npx",
-      "args": ["-y", "@modelcontextprotocol/server-postgres@0.6.2"],
+      "args": ["-y", "@henkey/postgres-mcp-server@1.0.5"],
       "env": {
         "POSTGRES_URL": "${env:POSTGRES_URL}"
       }
@@ -71,8 +73,8 @@
     "gitlab": {
       "_disabled": true,
       "_description": "GitLab issues, merge requests, pipelines, and project management",
-      "command": "npx",
-      "args": ["-y", "glab", "mcp"],
+      "command": "glab",
+      "args": ["mcp", "serve"],
       "env": {
         "GITLAB_TOKEN": "${env:GITLAB_TOKEN}"
       }

package/dist/content/rules/hatch3r-accessibility-standards.md CHANGED Viewed

@@ -2,7 +2,8 @@
 id: hatch3r-accessibility-standards
 type: rule
 description: Accessibility standards covering WCAG 2.2 AA compliance, keyboard navigation, screen readers, and ARIA patterns
-scope: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/components/**,**/*.html,**/*a11y*,**/*accessibility*"
+scope: conditional
+globs: "**/*.vue,**/*.jsx,**/*.tsx,**/*.svelte,**/components/**,**/*.html,**/*a11y*,**/*accessibility*"
 tags: [floor:ui-ux, a11y]
 precedence: high
 quality_charter: agents/shared/quality-charter.md
@@ -77,7 +78,7 @@ All user-facing features must meet WCAG 2.2 Level AA conformance. This is the ba
 - Test with a screen reader on every feature branch that modifies UI.
 - Minimum testing matrix: VoiceOver (macOS/iOS) + Chrome, NVDA + Firefox (Windows).
 - Run automated accessibility checks (axe-core, Lighthouse) in CI.
-- Automated tools catch ~30% of accessibility issues. Manual testing is required for keyboard flows, screen reader experience, and cognitive accessibility.
+- Automated tools (axe-core) catch roughly 57% of WCAG issues by volume — the remaining ~43% require a human keyboard trace, screen-reader pass, and cognitive-accessibility review (Deque Systems automated-coverage finding; https://www.deque.com/automated-accessibility-testing-coverage/ , accessed 2026-06-06). This figure matches `skills/hatch3r-browser-verify/SKILL.md` per-cycle reminder; the two artifacts state the same denominator.
 - Maintain an accessibility test checklist per component type (form, modal, navigation, data table).
 ## WCAG 2.2 New Success Criteria (Mandatory Audit Items)
@@ -100,3 +101,14 @@ Touch surfaces have stricter target and spacing requirements than pointer-only s
 - Apply `env(safe-area-inset-*)` padding on full-bleed surfaces so content clears notches, home indicators, and rounded corners on iOS and Android edge devices.
 - Support Dynamic Type (iOS) and rem-based font scaling — declare body text in `rem` or `em` units, never `px`, so OS-level font size settings cascade.
 - Zoom to 200% and 400% (per WCAG 1.4.4 and 1.4.10 Reflow) must remain functional with no horizontal scroll trap. Audit for `width: 100vw` and fixed pixel widths that break reflow.
+### Mobile-native verification path
+The touch obligations above are mandated for native UI too, but native targets (React Native, Flutter, SwiftUI/UIKit, Android Compose/View) do not render to a DOM, so the browser gates (axe-core, Lighthouse) cannot run on them. The glob on this rule (`**/*.{tsx,jsx,vue,svelte}` plus `**/components/**`) matches React-Native `.tsx`/`.jsx` components, so a native surface can trigger this rule with no browser. Branch on the detected stack and run the framework-native accessibility check instead — a native obligation without one of these checks is unverified, not satisfied:
+- **React Native:** run `eslint-plugin-react-native-a11y` for static touch-target, label, and role lint; assert touch-target size and `accessibilityLabel` presence in a `@testing-library/react-native` render test.
+- **Flutter:** run the `flutter_test` semantics tester with `meets_guideline` matchers — `androidTapTargetGuideline()` (48dp), `iOSTapTargetGuideline()` (44pt), `labeledTapTargetGuideline()`, and `textContrastGuideline()` — which assert the same target and contrast bounds without a browser.
+- **Android (Compose/View):** enable Espresso `AccessibilityChecks.enable()` in instrumented tests so every interaction is audited for target size and content labels; run Accessibility Scanner once per release on the built APK for spacing and contrast.
+- **iOS (SwiftUI/UIKit):** run an XCUITest `app.performAccessibilityAudit()` (`for: [.dynamicType, .hitRegion, .contrast]`) so target size, Dynamic Type scaling, and contrast are checked on-device.
+When no native a11y harness is wired for the detected stack, the gate is `BLOCKED_MISSING_TOOL` (per `agents/hatch3r-implementer.md` Step 5c) — it never becomes an unmeasured `PASS`; the orchestrator wires the harness or downgrades scope. Stack-specific patterns and the test-harness setup live in `rules/hatch3r-android-patterns.md`, `rules/hatch3r-swiftui-patterns.md`, and `rules/hatch3r-flutter-patterns.md`.

package/dist/content/rules/hatch3r-accessibility-standards.mdc CHANGED Viewed

@@ -73,7 +73,7 @@ All user-facing features must meet WCAG 2.2 Level AA conformance. This is the ba
 - Test with a screen reader on every feature branch that modifies UI.
 - Minimum testing matrix: VoiceOver (macOS/iOS) + Chrome, NVDA + Firefox (Windows).
 - Run automated accessibility checks (axe-core, Lighthouse) in CI.
-- Automated tools catch ~30% of accessibility issues. Manual testing is required for keyboard flows, screen reader experience, and cognitive accessibility.
+- Automated tools (axe-core) catch roughly 57% of WCAG issues by volume — the remaining ~43% require a human keyboard trace, screen-reader pass, and cognitive-accessibility review (Deque Systems automated-coverage finding; https://www.deque.com/automated-accessibility-testing-coverage/ , accessed 2026-06-06). This figure matches `skills/hatch3r-browser-verify/SKILL.md` per-cycle reminder; the two artifacts state the same denominator.
 - Maintain an accessibility test checklist per component type (form, modal, navigation, data table).
 ## WCAG 2.2 New Success Criteria (Mandatory Audit Items)
@@ -96,3 +96,14 @@ Touch surfaces have stricter target and spacing requirements than pointer-only s
 - Apply `env(safe-area-inset-*)` padding on full-bleed surfaces so content clears notches, home indicators, and rounded corners on iOS and Android edge devices.
 - Support Dynamic Type (iOS) and rem-based font scaling — declare body text in `rem` or `em` units, never `px`, so OS-level font size settings cascade.
 - Zoom to 200% and 400% (per WCAG 1.4.4 and 1.4.10 Reflow) must remain functional with no horizontal scroll trap. Audit for `width: 100vw` and fixed pixel widths that break reflow.
+### Mobile-native verification path
+The touch obligations above are mandated for native UI too, but native targets (React Native, Flutter, SwiftUI/UIKit, Android Compose/View) do not render to a DOM, so the browser gates (axe-core, Lighthouse) cannot run on them. The glob on this rule (`**/*.{tsx,jsx,vue,svelte}` plus `**/components/**`) matches React-Native `.tsx`/`.jsx` components, so a native surface can trigger this rule with no browser. Branch on the detected stack and run the framework-native accessibility check instead — a native obligation without one of these checks is unverified, not satisfied:
+- **React Native:** run `eslint-plugin-react-native-a11y` for static touch-target, label, and role lint; assert touch-target size and `accessibilityLabel` presence in a `@testing-library/react-native` render test.
+- **Flutter:** run the `flutter_test` semantics tester with `meets_guideline` matchers — `androidTapTargetGuideline()` (48dp), `iOSTapTargetGuideline()` (44pt), `labeledTapTargetGuideline()`, and `textContrastGuideline()` — which assert the same target and contrast bounds without a browser.
+- **Android (Compose/View):** enable Espresso `AccessibilityChecks.enable()` in instrumented tests so every interaction is audited for target size and content labels; run Accessibility Scanner once per release on the built APK for spacing and contrast.
+- **iOS (SwiftUI/UIKit):** run an XCUITest `app.performAccessibilityAudit()` (`for: [.dynamicType, .hitRegion, .contrast]`) so target size, Dynamic Type scaling, and contrast are checked on-device.
+When no native a11y harness is wired for the detected stack, the gate is `BLOCKED_MISSING_TOOL` (per `agents/hatch3r-implementer.md` Step 5c) — it never becomes an unmeasured `PASS`; the orchestrator wires the harness or downgrades scope. Stack-specific patterns and the test-harness setup live in `rules/hatch3r-android-patterns.md`, `rules/hatch3r-swiftui-patterns.md`, and `rules/hatch3r-flutter-patterns.md`.

package/dist/content/rules/hatch3r-agent-orchestration-detail.md CHANGED Viewed

@@ -5,7 +5,7 @@ description: Extended orchestration reference — PipelineContext schemas, resil
 scope: conditional
 globs: "**/.hatch3r/**,**/pipeline/**,**/*orchestrat*,**/*agent*"
 tags: [orchestration, floor:protocol]
-precedence: normal
+precedence: high
 quality_charter: agents/shared/quality-charter.md
 cache_friendly: true
 detail_rule: true
@@ -24,7 +24,11 @@ PipelineContext {
   correlationId: string          // UUID v4, generated before Phase 1
   taskType: "bug" | "feature" | "refactor" | "qa"
   issueRef: string | null        // Issue number or null for plain chat
-  deepContextTier: 1 | 2 | 3    // From hatch3r-deep-context scoring
+  deepContextTier: 1 | 2 | 3    // Pre-Phase-1 baseline from hatch3r-deep-context scoring
+  // Mid-run tier upgrade (Finding D7-14): set when execution surfaces complexity
+  // the baseline missed (see Complexity-Driven Adaptation + Tier-upgrade propagation).
+  tierUpgrade?: { from: 1|2|3; to: 1|2|3; reason: string; atPhase: 1|2|3|4 }
   // Detected project type for specialist selection (Finding #56)
   projectType?: {
@@ -58,7 +62,7 @@ PipelineContext {
   // Phase 3 outputs (Review)
   reviewResult: {
-    iterations: number           // 1-3
+    iterations: number           // 1 to code-class cap (DEFAULT_MAX_REVIEW_ITERATIONS - 1 = 3)
     finalVerdict: "CLEAN" | "UNRESOLVED"
     findings: ReviewFinding[]
     confirmationPassResult: "PASS" | "FAIL"
@@ -94,24 +98,24 @@ The TypeScript implementation of this schema with runtime validation is in `src/
 | Phase 1 (Research) | No relevant findings | Surface to user; ask whether to proceed with implementation. |
 | Phase 2 (Implementation) | Build/test failure | Attempt self-fix (max 2 iterations). Escalate to user if unresolved. |
 | Phase 2 (Implementation) | Scope creep detected | Halt. Surface deviation to user. Resume only with approval. |
-| Phase 3 (Review) | Max iterations (3) | Surface unresolved findings to user. Do not merge. |
+| Phase 3 (Review) | Max iterations (3) (code-class cap = `DEFAULT_MAX_REVIEW_ITERATIONS` - 1) | Surface unresolved findings to user. Do not merge. |
 | Phase 3 (Review) | DESIGN_OBJECTION verdict | Terminate review loop immediately. Surface the objection and alternative approaches to the user for an architectural decision. Do not spawn fixer. |
 | Phase 3 (Review) | Fixer introduces regressions | Revert fixer changes. Surface original findings + regression to user. |
-| Phase 4 (Quality) | Specialist timeout | Log timeout. Continue with available results. Flag in output. |
+| Phase 4 (Quality) | Conditional-specialist timeout | Log timeout. Continue with available results. Flag in output. |
+| Phase 4 (Quality) | Mandatory CQ3/CQ5 specialist non-completion (TIMEOUT / crash / no-output) | Fail closed. Surface `BLOCKED`. A mandatory always-mode specialist (`hatch3r-security` CQ3, `hatch3r-testability` CQ5 per `hatch3r-agent-orchestration.md` Phase 4 Specialist Trigger Table) that does not return `COMPLETE` leaves its gate absent; absence-of-finding is NOT an implicit pass. This row is the prose face of the typed gate `evaluatePhase4Completion` (`src/pipeline/pipelineContext.ts`): a non-SUCCESS floor specialist yields `mandatoryFloorsSatisfied: false` → `complete: false`. Do not merge/release. Require an explicit operator decision (re-run, accept-risk, or abort) before advancing. |
 | Phase 4 (Quality) | Validation pass fails | Spawn fixer (max 2 attempts). Surface if unresolved. |
 ### Subagent Error Recovery
-1. **Timeout:** Forward partial output. Mark status `TIMEOUT`. Continue pipeline.
-2. **Crash/no output:** Mark status `FAILED`. Log reason. Continue if non-blocking.
+1. **Timeout:** Forward partial output. Mark status `TIMEOUT`. Continue pipeline ONLY for conditional specialists. A mandatory always-mode CQ3/CQ5 specialist on TIMEOUT fails closed — surface `BLOCKED`, never treat the missing gate as a pass.
+2. **Crash/no output:** Mark status `FAILED`. Log reason. Continue if non-blocking. A mandatory always-mode CQ3/CQ5 specialist is blocking — a crash or no-output return fails closed (surface `BLOCKED`, explicit operator decision before merge/release), it is never "non-blocking".
 3. **Conflicting outputs:** When two specialists disagree (e.g., security vs performance), escalate to user with both positions.
-4. **Resource exhaustion:** If context window is exhausted, summarize prior context and continue with summary.
+4. **Resource exhaustion:** Apply the Context-Degradation Policy (this file) — compress at `>50%` window, restart at `>75%`.
 ### Retry Policies
-- Subagent retries: 0 (spawn a new agent with adjusted prompt instead).
-- Phase retries: Phase 3 review loop retries up to 3 iterations. All other phases: 0 retries (escalate to user).
-- Never retry the same failed operation identically — adjust the prompt or approach.
+- Subagent retries: 0 — never retry the same failed operation identically; spawn a new agent with an adjusted prompt/approach instead.
+- Phase retries: Phase 3 review loop retries up to 3 iterations (code-class cap = `DEFAULT_MAX_REVIEW_ITERATIONS` - 1). All other phases: 0 retries (escalate to user).
 ## Observability Integration
@@ -169,7 +173,7 @@ After each phase, validate that the output conforms to the expected PipelineCont
 Auto-mode MUST halt and surface to user when:
 1. A CRITICAL finding is detected in Phase 3.
 2. Phase 4 validation pass fails after 2 fix attempts.
-3. Any specialist reports FAILED status.
+3. Any specialist reports FAILED status, OR a mandatory always-mode CQ3/CQ5 specialist (`hatch3r-security`, `hatch3r-testability`) returns any of {FAILED, TIMEOUT, no-output} — a mandatory gate that did not return `COMPLETE` is BLOCKED, not a silent pass.
 4. Scope containment violation detected.
 5. Implementation touches more than 20 files (may indicate scope creep).
@@ -186,26 +190,61 @@ The pipeline should adapt its behavior based on observed task complexity, not ju
 | Signal During Execution | Adaptation |
 |------------------------|------------|
-| Phase 1 research finds >10 affected files (initial estimate was <5) | Upgrade tier to 3 if currently 2. Re-run `codebase-impact` at `deep` depth before Phase 2. |
+| Phase 1 research finds >10 affected files (initial estimate was <5) | Upgrade tier to 3 if currently 2. Record the upgrade in `PipelineContext.tierUpgrade` (`src/pipeline/pipelineContext.ts::TierUpgrade`: `{from, to, reason, atPhase}`) and re-run `codebase-impact` at `deep` depth before Phase 2. |
 | Phase 2 implementer reports >3 research gaps | Pause Phase 2. Run targeted researcher with gap-specific modes before continuing. |
 | Phase 3 review loop reaches iteration 2 with increasing Critical count | Classify as complexity underestimate. Surface to user with recommendation to break the task into smaller sub-tasks. |
-| Phase 4 validation pass fails on first attempt | Check whether failure is in test-writer's new tests (expected -- fix test) or in pre-existing tests (regression -- fix implementation). Route to appropriate fixer. |
+| Phase 4 validation pass fails on first attempt | Check whether failure is in hatch3r-testability's new tests (expected -- fix test) or in pre-existing tests (regression -- fix implementation). Route to appropriate fixer. |
+**Tier-upgrade propagation (Finding D7-14).** A mid-run upgrade is not just a log line — it MUST change downstream behavior. After populating `PipelineContext.tierUpgrade`, the orchestrator reads the tier for every subsequent depth decision via `resolveEffectiveTier(context)` (returns `tierUpgrade.to` when an upgrade is recorded, else `deepContextTier`), so the Tier→Phase-4-specialist-depth mapping in `hatch3r-agent-orchestration.md` (Deep Context Integration) scales to the upgraded tier instead of staying pinned to the stale baseline. The carrier only ever raises the tier (a recorded `to <= from` is ignored). Surface the upgrade once in the iteration summary via `formatTierUpgradeNote(context)` (one line, returns `null` when no upgrade occurred) so the adaptation is visible, not silent.
 ### Post-Pipeline Learning
-After pipeline completion, the orchestrator should capture lessons for future runs:
+After pipeline completion, the orchestrator captures lessons for future runs:
-1. **Tier accuracy:** Was the initial tier correct? If the pipeline needed adaptation (above), log the mismatch for the learnings system.
+1. **Tier accuracy:** Was the initial tier correct? If the pipeline needed adaptation (above), persist a tier-accuracy record (`taskId`, `initialTier`, `finalTier`, `adjustmentReasons`, `correlationId`, `ts`) to `.hatch3r/telemetry/<session-id>-tier.json` via the atomic-write path in `src/pipeline/costEstimator.ts` (sibling of `CostTelemetryRecord`). Tier mismatch beyond ±10% across 50 tasks triggers a CL-3 signal-weight recalibration proposal.
 2. **Phase duration ratios:** Record time spent per phase. Anomalous ratios (e.g., Phase 3 taking 5x Phase 2) indicate systemic issues worth investigating.
 3. **Specialist value:** Record which Phase 4 specialists produced actionable findings vs. clean reports. Over time, this data informs smarter specialist dispatch.
-## Context Token Optimization
+## Multi-Task and Concurrent Pipeline Support
+Canonical schema for the one-sentence multi-task / epic / batch handling in the orchestration rule's `Task Context Protocols`, so pack integrators have a deterministic specification (Finding D7-M13 / D7-SA7.5-3).
+**Dependency-graph construction.** Multi-task input (epic, plain-chat multi-request, or board batch) is parsed into discrete units. Each unit carries its own `correlationId` (epic sub-issues get individual IDs sharing a parent epic ID; batch tasks share one ID with a sub-task index). The orchestrator builds a directed acyclic dependency graph from declared inter-unit constraints (e.g., "issue B depends on issue A's API changes"); units with no declared dependency form the root level.
+**Per-level parallelism.** At each dependency level, the orchestrator parallelizes researchers + implementers across all units in that level subject to the three Parallel Safety conditions in the canonical rule. The parallelism width per level is bounded by the same orchestrator-honored `max_phase4_parallel` width (default `8`) the Phase 4 specialists honor — LLM-honored guidance, not a code-enforced cap (no hatch3r module reads an env var; the host Task tool is the dispatcher and applies no platform fan-out limit, per the canonical rule's Phase 4 — Final Quality).
+**Concurrent primitive — `concurrent_pipeline_unit`.** Each unit in a level is a `concurrent_pipeline_unit` record: `{ unitId: string; correlationId: string; parentEpicId?: string; level: number; dependsOn: string[]; priority: "p0"|"p1"|"p2"|"p3"; status: "pending"|"running"|"complete"|"blocked"; }`. Within a level the orchestrator dispatches by priority descending (p0 first); when concurrency limits cap the level, the in-flight pool is filled with highest-priority units first and the rest queue for the next dispatch slot.
+**File-overlap reconciliation.** When two parallel implementers in the same level touch the same file: accept disjoint-region edits without conflict; merge overlapping regions using the larger-scope change as base (the smaller change replays onto the larger); halt on semantic conflicts for user resolution. Per Parallel Safety condition 3, NO mid-pipeline writes to shared mutable state (`.hatch3r/hatch.json`, `.hatch3r/learnings/INDEX.md`) — learnings consolidation happens at pipeline completion only.
+**Review loop coordination.** After all level-N implementers complete, the orchestrator runs ONE consolidated Phase 3 review loop covering the union diff produced by the level. Per-unit Phase 4 specialist dispatch then runs in parallel bounded by `max_phase4_parallel`. Level-N+1 begins only after Level-N reaches Phase 4 completion (validated by `evaluatePhase4Completion`). Cross-pipeline concurrent invocation (two `hatch3r` commands in two shells against the same repo) is deferred per the cross-command note below + the audit CL-2 spec.
+## Pipeline Pattern (Cross-Command Consistency)
+Finding D7-M12 / D7-SA7.5-2: implementation-flavored orchestrators (`workflow`, `board-pickup`, `revision`, `quick-change`, `board-fill`) MUST follow the canonical pattern below. Per-command deviations require an explicit rationale in the command body's "Pipeline Deviations" subsection.
+| Stage | Canonical agent | Required at Tier | Carve-out |
+|-------|-----------------|------------------|-----------|
+| Phase 1 Research | `hatch3r-researcher` | T2/T3 | T1 skip per Phase Skip Criteria |
+| Phase 2 Implement | `hatch3r-implementer` | All | T1 quick-change inline carve-out only |
+| Phase 3 Review Loop | `hatch3r-reviewer` ↔ `hatch3r-fixer` (max `DEFAULT_MAX_REVIEW_ITERATIONS`) | T2/T3 nontrivial | T1 all-trivial skip per Phase Skip Criteria |
+| Phase 4 Final Quality | CQ + SSOT specialists, batched by severity, bounded by `max_phase4_parallel` | T2/T3 | T1 — only always-mode floor (`security` + `testability`) |
+| Phase 4 Validation Pass | re-run tests/typecheck vs Phase-3 baseline; re-review on specialist code mutations | T2/T3 | — |
+Cross-command error-handling defaults: sub-agent failure → retry once then fall back to direct/inline implementation per command's carve-out; quality-check failure → max 2 retry loops then ASK; context degradation → the single Context-Degradation Policy below (window-fraction primary: compress `>50%`, restart `>75%`; turn counts a coarse fallback). Concurrent-invocation handling and lockfile semantics are deferred to a future cycle pending the Decision 27 resumability work.
+## Context-Degradation Policy
+Single canonical policy for every pipeline command (reconciles the per-command turn bullets, Finding D7-24). **Window-fraction is the authoritative axis**; the per-command turn count is a coarse fallback for the same threshold at that command's pace, used only when the host surfaces no context-window percentage. Commands cite this policy, not restated numbers.
-When pipeline context exceeds 50% of the available context window, apply these compression strategies in order:
+| Window fraction (primary) | Action | Turn-count fallback (coarse) |
+|---------------------------|--------|------------------------------|
+| `> 50%` | Compress: apply the numbered strategies below in order. | implementation/review ≈ 25 turns; quick-change ≈ 15 (fast-completion scope); debug ≈ 20 |
+| `> 75%` | Restart: suggest a fresh chat / batch split carrying a progress summary of completed + remaining work; a fresh-context command (`hatch3r-revision`) just re-runs. | ≈ 1.5× the compress turn count |
 1. **Summarize Phase 1 output.** Replace full research findings with a structured summary: affected files (list), blast radius (count + top 3), key conventions (bullet points). Keep raw data only for the fields the current phase needs.
 2. **Prune resolved findings.** After Phase 3 review loop, remove findings that were fixed and confirmed. Only carry forward unresolved findings.
 3. **Collapse specialist results.** In the final output, summarize specialist results as a single status table rather than including full specialist reports. Full reports are available on request.
 4. **Never truncate security findings.** Security auditor output is always included in full regardless of context pressure.
-These strategies preserve decision-critical information while reducing token overhead for long pipelines.
+**Handoff loss measurement.** Compression is lossy, so measure it. At each phase transition the orchestrator records a `PhaseHandoffMetrics` record (`src/pipeline/observability.ts::createPhaseHandoffMetrics`) capturing input bytes, output bytes, whether summarisation was applied, and an `informationLossEstimate` (0-1 fraction of input bytes dropped). When `informationLossEstimate` exceeds `0.3`, surface the `formatPhaseHandoffWarning` line in the iteration summary so downstream phases validate that critical context survived — closing the gap where a phase silently receives a summary when it needed the full upstream output.