npm - role-os - Versions diffs - 2.1.0 → 2.2.0 - Mend

role-os 2.1.0 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/CHANGELOG.md +33 -0
package/README.md +40 -16
package/package.json +2 -2
package/src/artifacts.mjs +43 -0
package/src/dispatch.mjs +6 -0
package/src/evidence.mjs +9 -9
package/src/mission-run.mjs +111 -13
package/src/mission.mjs +63 -0
package/src/packs.mjs +33 -0
package/src/route.mjs +30 -0
package/src/run.mjs +5 -2
package/starter-pack/agents/engineering/audit-synthesizer.md +56 -0
package/starter-pack/agents/engineering/component-auditor.md +46 -0
package/starter-pack/agents/engineering/seam-auditor.md +46 -0
package/starter-pack/agents/engineering/test-truth-auditor.md +48 -0

package/CHANGELOG.md CHANGED

@@ -1,5 +1,38 @@
 # Changelog
+## 2.2.0
+### Added
+#### Deep Audit Mission — Runner-Native Componentized Repo Audit
+- **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
+- **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
+- **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
+- **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
+- **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
+- **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
+### Fixed
+- **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
+- **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
+- **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
+### Changed
+- Mission count: 7 → 8
+- Role count: 50 → 54 (4 deep audit roles)
+- Pack count: 8 → 9
+- Artifact contract count: 30 → 34 (4 new audit role contracts)
+- Test count: 905 → 936
+### Evidence
+- Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
+- Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
+- Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
 ## 2.1.0
 ### Added
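The step counts quoted in the 2.2.0 entry follow from the `2N + K + 3` formula. A minimal sketch of that arithmetic, assuming one Component Auditor and one Test Truth Auditor per component, one Seam Auditor per boundary, and three fixed closing steps. `deepAuditStepCount` is a hypothetical helper for illustration, not part of role-os:

```javascript
// Hypothetical helper (not part of role-os): expected step count for a deep-audit run.
// Assumes N component audits + N test-truth audits + K seam audits + 3 fixed steps.
function deepAuditStepCount(components, boundaries) {
  const FIXED_STEPS = 3; // audit summary, action plan, critic review
  return 2 * components + boundaries + FIXED_STEPS;
}

console.log(deepAuditStepCount(6, 8));  // 23
console.log(deepAuditStepCount(10, 5)); // 28
```

Both results match the figures given in the changelog for the 6-component / 8-boundary and 10-component / 5-boundary cases.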

package/README.md CHANGED

@@ -13,7 +13,7 @@
   <a href="https://mcp-tool-shop-org.github.io/role-os/"><img src="https://img.shields.io/badge/Landing_Page-live-brightgreen" alt="Landing Page"></a>
 </p>
-A multi-Claude operating system that staffs, routes, validates, and runs work through 50 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict.
+A multi-Claude operating system that staffs, routes, validates, and runs work through 54 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict. Includes dynamic dispatch for manifest-scaled missions — a 10-component repo automatically becomes 28 auditor steps, not 6.
 ## What it does
@@ -44,9 +44,9 @@ roleos start "something completely novel"
 **The fallback ladder:**
-1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
-2. **Pack** — when the task is a known family but not a full mission shape. 7 calibrated team packs with auto-selection and mismatch guards.
-3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 31 roles against packet content and assembles a dynamic chain.
+1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research, brainstorm, deep-audit). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
+2. **Pack** — when the task is a known family but not a full mission shape. 9 calibrated team packs with auto-selection and mismatch guards.
+3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 54 roles against packet content and assembles a dynamic chain.
 The system never forces work through the wrong abstraction. It explains why it chose each level and offers alternatives.
@@ -103,7 +103,7 @@ Full treatment is a canonical 7-phase protocol defined in Claude project memory
 Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gates.
-## 50 roles across 8 packs
+## 54 roles across 9 packs
 | Pack | Roles |
 |------|-------|
@@ -115,6 +115,7 @@ Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gate
 | **Product** (3) | Feedback Synthesizer, Roadmap Prioritizer, Spec Writer |
 | **Research** (4) | UX Researcher, Competitive Analyst, Trend Researcher, User Interview Synthesizer |
 | **Growth** (4) | Launch Strategist, Content Strategist, Community Manager, Support Triage Lead |
+| **Deep Audit** (4) | Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer |
 Every role has a full contract: mission, use when, do not use when, expected inputs, required outputs, quality bar, and escalation triggers. Every role is routable — `roleos route` can recommend any of them based on packet content.
@@ -209,13 +210,13 @@ role-os/
     mission.mjs                ← 7 named mission types (feature, bugfix, treatment, docs, security, research, brainstorm)
     mission-run.mjs            ← Mission runner: create → step → complete → report
     mission-cmd.mjs            ← `roleos mission` CLI commands
-    route.mjs                  ← 31-role routing + dynamic chain builder
-    packs.mjs                  ← 7 calibrated team packs + auto-selection
+    route.mjs                  ← 54-role routing + dynamic chain builder
+    packs.mjs                  ← 9 calibrated team packs + auto-selection
     conflicts.mjs              ← 4-pass conflict detection
     escalation.mjs             ← Auto-routing for blocked/rejected/split
     evidence.mjs               ← Structured evidence + role-aware requirements
     dispatch.mjs               ← Runtime dispatch manifests for multi-claude
-    artifacts.mjs              ← 30 per-role artifact contracts + 7 pack handoffs
+    artifacts.mjs              ← Per-role artifact contracts + pack handoffs
     decompose.mjs              ← Composite task detection + splitting
     composite.mjs              ← Dependency-ordered execution + recovery
     replan.mjs                 ← Mid-run adaptive replanning
@@ -225,7 +226,7 @@ role-os/
     brainstorm.mjs             ← Evidence modes, request validation, finding/synthesis/judge schemas
     brainstorm-roles.mjs       ← Role-native schemas, input partitioning, blindspot enforcement, cross-exam
     brainstorm-render.mjs      ← Two-layer rendering: lexical bans, render schemas, debate transcript
-  test/                        ← 894 tests across 30 test files
+  test/                        ← 936 tests across 31 test files
   starter-pack/                ← Drop-in role contracts, policies, schemas, workflows
 ```
@@ -237,28 +238,29 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
 | Layer | What it does | Status |
 |-------|-------------|--------|
-| **Routing** | Scores all 31 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
+| **Routing** | Scores all 54 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
 | **Chain builder** | Assembles phase-ordered chains from scored roles, packet-type biased not template-locked | ✓ Shipped |
 | **Conflict detection** | 4-pass validation: hard conflicts, sequence, redundancy, coverage gaps. Repair suggestions. | ✓ Shipped |
 | **Escalation** | Auto-routes blocked/rejected/split work to the right resolver with reason + required artifact | ✓ Shipped |
 | **Evidence** | Role-aware structured evidence in verdicts. Sufficiency checks. 12 evidence kinds. | ✓ Shipped |
 | **Dispatch** | Generates execution manifests for multi-claude. Per-role tool profiles, system prompts, budgets. | ✓ Shipped |
 | **Trials** | Full roster proven: 30/30 gold-task + 5/5 negative trials. 7 pack trials complete. | ✓ Complete |
-| **Team Packs** | 7 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
+| **Team Packs** | 9 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
 | **Outcome calibration** | Records run outcomes, tunes pack/role weights from results, adjusts confidence thresholds. | ✓ Shipped |
 | **Mixed-task decomposition** | Detects composite work, splits into child packets, assigns packs, preserves dependencies. | ✓ Shipped |
 | **Composite execution** | Runs child packets in dependency order with artifact passing, branch recovery, and synthesis. | ✓ Shipped |
 | **Adaptive replanning** | Mid-run scope changes, findings, or new requirements update the plan without restarting. | ✓ Shipped |
 | **Session spine** | `roleos init claude` scaffolds CLAUDE.md, /roleos-route, /roleos-review, /roleos-status. `roleos doctor` verifies wiring. Route cards prove engagement. | ✓ Shipped |
 | **Hook spine** | 5 lifecycle hooks (SessionStart, PromptSubmit, PreToolUse, SubagentStart, Stop). Advisory enforcement: route card reminders, write-tool gating, subagent role injection, completion audit. | ✓ Shipped |
-| **Artifact spine** | 30 per-role artifact contracts. 7 pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
-| **Mission library** | 7 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. All 7 trial-proven. | ✓ Shipped |
+| **Artifact spine** | Per-role artifact contracts. Pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
+| **Mission library** | 8 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm, deep-audit). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. | ✓ Shipped |
 | **Mission runner** | Create runs, step through with tracked state, complete/fail with honest reporting. Blocked-step propagation, out-of-chain escalation warnings, last-step re-opening. | ✓ Shipped |
 | **Unified entry** | `roleos start` decides mission vs pack vs free routing automatically. Fallback ladder with confidence scores, alternatives, and composite detection. | ✓ Shipped |
 | **Persistent runs** | `roleos run` creates disk-backed runs. `resume`, `next`, `explain`, `complete`, `fail`. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance. Friction measurement. | ✓ Shipped |
-| **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run: 894 tests. | ✓ Shipped |
+| **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run proven. | ✓ Shipped |
+| **Deep Audit** | Manifest-scaled repo audit: decompose repo into components, dispatch N auditors + M test truth auditors + K seam auditors from dependency graph, synthesize into ranked verdict and action plan. Dynamic dispatch scales with repo size (2N + K + 3 formula). Runner-native with artifact validation at every step. | ✓ Shipped |
-## 7 missions
+## 8 missions
 | Mission | Pack | Roles | When to use |
 |---------|------|-------|-------------|
@@ -269,6 +271,7 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
 | `security-hardening` | security | 4 | Threat model, audit, fix vulnerabilities, re-audit, verify |
 | `research-launch` | research | 4 | Frame question, research, document findings, decide |
 | `brainstorm` | brainstorm | 9 | Structured multi-perspective inquiry with traceable disagreement and verdict |
+| `deep-audit` | deep-audit | 5 (scales) | Manifest-backed repo audit — worker count scales with repo graph via dynamic dispatch |
 Each mission includes honest-partial definitions — when work stalls, the system documents what was completed and what remains instead of bluffing completion.
@@ -290,7 +293,27 @@ roleos run "explore product directions for a developer tool discovery platform"
 - **Chain of custody:** Every rendered sentence traces back to a truth-layer atom. Synthesis directions cite atoms. Cross-exam targets real claim IDs. The dispute graph is the product, not the prose.
-**Proven:** v0.4 golden run — 894 tests, full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
+**Proven:** v0.4 golden run — full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
+### Deep audit mission
+Not a surface scan. The deep audit mission **decomposes a repo into bounded components and dispatches specialist auditors at a scale determined by the repo's own dependency graph.**
+```bash
+roleos run "deep audit this repo" --manifest=audit-manifest.json
+# → MISSION: Deep Audit (Manifest-Scaled)
+#   Steps: Component Auditor ×6 + Test Truth Auditor ×6 + Seam Auditor ×8 + Synthesizer + Action Plan + Critic = 23 steps
+```
+**What makes it different:**
+- **Dynamic dispatch** — worker count is not fixed. A 10-component repo with 5 boundary clusters produces 28 steps (2×10 + 5 + 3). A 3-component repo produces 12. The scaling formula is `2N + K + 3` where N = components, K = boundaries.
+- **Manifest-backed parcels** — an `audit-manifest.json` defines components (with file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). Each auditor receives only its parcel.
+- **Four role archetypes** — Component Auditor (code truth per module), Test Truth Auditor (tests that prove vs tests that exist), Seam Auditor (integration boundaries from the dependency graph), Audit Synthesizer (ranked verdict + action plan from all parcels).
+- **Artifact validation at every step** — `validateArtifact()` fires on every step completion in both execution paths. Results attached to step objects. The system knows whether each artifact met its contract.
+- **Honest partial** — when budget or scope blocks completion, per-component findings are individually valid. The system synthesizes from whatever completed, never bluffs full coverage.
+**Proven:** Runner-native proof run — 18 tests against real manifest, full lifecycle verified including escalation re-opening and partial failure. Scaling formula verified for 3/6/10/15-component manifests.
 ## Status
@@ -309,6 +332,7 @@ roleos run "explore product directions for a developer tool discovery platform"
 - **v2.0.0**: Operator friction pass (Phase U) — `roleos run` creates persistent disk-backed runs. Resume, next, explain, complete, fail. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance at every step. Friction measurement. 6 friction trials. 613 tests.
 - **v2.0.1**: Handbook audit, beginner docs, test count corrections. 617 tests.
 - **v2.1.0**: Brainstorm mission (v0.4) — specialized roles under law, traceable disagreement, verdict-bearing output. Two-layer architecture (truth + render), cross-exam permission matrix, dispute graph, golden run proof. 7 missions, 50 roles, 8 packs. 894 tests.
+- **v2.2.0**: Deep Audit mission — manifest-scaled repo audit with dynamic dispatch. 4 new audit roles (Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer). Worker count scales with repo graph (2N + K + 3 formula). Artifact validation wired at both execution boundaries. Runner-native proof run green. accept/approve truth fix in evidence layer. 8 missions, 54 roles, 9 packs. 936 tests.
 ## License

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "role-os",
-  "version": "2.1.0",
-  "description": "Role OS — a multi-Claude operating system where 50 specialized roles execute work through contracts, conflict detection, escalation, and structured evidence. 8 proven team packs, 7 missions including brainstorm with traceable disagreement and verdict-bearing output.",
+  "version": "2.2.0",
+  "description": "Role OS — a multi-Claude operating system where 54 specialized roles execute work through contracts, conflict detection, escalation, and structured evidence. 9 team packs, 8 missions including deep audit with manifest-scaled dynamic dispatch and brainstorm with traceable disagreement.",
   "homepage": "https://mcp-tool-shop-org.github.io/role-os/",
   "bugs": {
     "url": "https://github.com/mcp-tool-shop-org/role-os/issues"

package/src/artifacts.mjs CHANGED

@@ -256,6 +256,40 @@ export const ROLE_ARTIFACT_CONTRACTS = {
     consumedBy: [],
     completionRule: "Disposition is accept/revise_expand/revise_synthesize/reject. Verdicts: ready_to_advance/needs_incubation/not_active_now. Actions: build_now/hold_for_followon/archive_but_retain. Revise requires targets.",
   },
+  // ── Deep Audit ──
+  "Component Auditor": {
+    artifactType: "component-audit-report",
+    requiredSections: ["findings", "what-i-could-not-verify", "adjacent-parcel-risks", "parcel-statistics"],
+    optionalSections: [],
+    requiredEvidence: ["component-parcel-definition"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Every file in owned paths read. Findings use standardized schema with severity, confidence, category, file, evidence, impact. Adjacent parcel risks are specific, not generic.",
+  },
+  "Seam Auditor": {
+    artifactType: "seam-audit-report",
+    requiredSections: ["findings", "false-independence-risks", "content-code-drift", "dependency-direction-assessment"],
+    optionalSections: [],
+    requiredEvidence: ["boundary-cluster-definition", "component-graph"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Every declared boundary inspected. Findings reference both sides. Content-code drift quotes both content claim and code reality.",
+  },
+  "Test Truth Auditor": {
+    artifactType: "test-truth-report",
+    requiredSections: ["findings", "untested-but-risky", "ceremonial-tests", "integration-gaps", "test-suite-health-summary"],
+    optionalSections: [],
+    requiredEvidence: ["test-file-paths", "implementation-file-paths"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Distinguishes 'line executed' from 'behavior verified'. Lists source files with no test. Estimates real coverage with reasoning.",
+  },
+  "Audit Synthesizer": {
+    artifactType: "audit-summary",
+    requiredSections: ["verdict", "posture", "by-the-numbers", "structurally-sound", "fragile", "dangerous", "dead-weight", "cross-cutting-findings", "contradictions", "audit-gaps"],
+    optionalSections: [],
+    requiredEvidence: ["component-audit-report", "seam-audit-report", "test-truth-report"],
+    consumedBy: ["Critic Reviewer"],
+    completionRule: "Reconciles findings across parcels. Cross-cutting findings reference source parcels. Contradictions adjudicated. Action plan groups by root cause and leverage.",
+  },
 };
 // ── Artifact validation ───────────────────────────────────────────────────────
@@ -398,6 +432,15 @@ export const PACK_HANDOFF_CONTRACTS = {
       { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
     ],
   },
+  "deep-audit": {
+    flow: [
+      { role: "Component Auditor",  produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
+      { role: "Test Truth Auditor", produces: "test-truth-report",      consumedBy: "Audit Synthesizer" },
+      { role: "Seam Auditor",       produces: "seam-audit-report",      consumedBy: "Audit Synthesizer" },
+      { role: "Audit Synthesizer",  produces: "audit-summary",          consumedBy: "Critic Reviewer" },
+      { role: "Critic Reviewer",    produces: "verdict",                consumedBy: null },
+    ],
+  },
 };
 /**
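The new contracts list `requiredSections` per role, but the body of `validateArtifact()` is not part of this diff. The sketch below shows one way such a contract could be checked. It assumes an artifact is a plain object keyed by section name, which may not match the real implementation. `findMissingSections` and `draftReport` are hypothetical names:

```javascript
// Contract fields copied from the Component Auditor entry in the diff above.
const componentAuditorContract = {
  artifactType: "component-audit-report",
  requiredSections: ["findings", "what-i-could-not-verify", "adjacent-parcel-risks", "parcel-statistics"],
};

// Hypothetical check: assumes an artifact is a plain object keyed by section name.
// The real validateArtifact() in artifacts.mjs may work differently.
function findMissingSections(contract, artifact) {
  return contract.requiredSections.filter(
    (section) => !Object.prototype.hasOwnProperty.call(artifact, section)
  );
}

// A draft report that only fills in two of the four required sections.
const draftReport = { findings: [], "parcel-statistics": { files: 3 } };
const missing = findMissingSections(componentAuditorContract, draftReport);
console.log(missing.join(", ")); // what-i-could-not-verify, adjacent-parcel-risks
```

Returning the list of gaps instead of throwing fits the "warn, don't block" behavior described in the changelog: the caller can attach the result to the step and carry on.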

package/src/dispatch.mjs CHANGED

@@ -88,6 +88,12 @@ const TOOL_PROFILES = {
   "Mechanics Analyst":    ["Read", "Glob", "Grep"],
   "Positioning Analyst":  ["Read", "Glob", "Grep"],
   "Contrarian Analyst":   ["Read", "Glob", "Grep"],
+  // Deep Audit
+  "Component Auditor":    ["Read", "Glob", "Grep"],
+  "Seam Auditor":         ["Read", "Glob", "Grep"],
+  "Test Truth Auditor":   ["Read", "Glob", "Grep"],
+  "Audit Synthesizer":    ["Read", "Glob", "Grep", "Write"],
 };
 // ── Default role config ─────────────────────────────────────────────────────

package/src/evidence.mjs CHANGED

@@ -146,7 +146,7 @@ const DEFAULT_REQUIREMENTS = {
  * @property {EvidenceItem[]} evidence - Structured evidence items
  * @property {string[]} gaps - What's missing or weak
  * @property {string[]} risks - Identified risks
- * @property {string} [requiredNextArtifact] - What the next role must produce (for non-approve)
+ * @property {string} [requiredNextArtifact] - What the next role must produce (for non-accept)
  * @property {string} confidence - One of CONFIDENCE_LEVELS
  */
@@ -182,25 +182,25 @@ export function checkSufficiency(verdict) {
     .map(e => `${e.kind}: ${e.claim} (${e.reference})`);
   if (contradictions.length > 0 && verdict.verdict === "accept") {
-    warnings.push("Verdict is 'approve' but evidence contains contradictions — review carefully");
+    warnings.push("Verdict is 'accept' but evidence contains contradictions — review carefully");
   }
-  // Check for missing evidence items on non-approve verdicts
+  // Check for missing evidence items on accept verdicts
   const missingItems = verdict.evidence.filter(e => e.status === "missing");
   if (missingItems.length > 0 && verdict.verdict === "accept") {
-    warnings.push("Verdict is 'approve' but some evidence items are marked 'missing'");
+    warnings.push("Verdict is 'accept' but some evidence items are marked 'missing'");
   }
-  // Non-approve verdicts should have gaps or requiredNextArtifact
-  if (verdict.verdict !== "approve" && verdict.verdict !== "accept-with-notes") {
+  // Non-accept verdicts should have gaps or requiredNextArtifact
+  if (verdict.verdict !== "accept" && verdict.verdict !== "accept-with-notes") {
     if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
-      warnings.push("Non-approve verdict should specify gaps or requiredNextArtifact for recovery");
+      warnings.push("Non-accept verdict should specify gaps or requiredNextArtifact for recovery");
     }
   }
-  // Low confidence + approve is suspicious
+  // Low confidence + accept is suspicious
   if (verdict.confidence === "low" && verdict.verdict === "accept") {
-    warnings.push("Low confidence approve — consider whether evidence is actually sufficient");
+    warnings.push("Low confidence accept — consider whether evidence is actually sufficient");
   }
   const sufficient = missingRequired.length === 0 && contradictions.length === 0;
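The effect of the `"approve"` to `"accept"` change can be reproduced in isolation. The sketch below is a reduced copy of the recovery-hint condition from the hunk above. The `acceptLiteral` parameter is a hypothetical addition used only to compare the 2.1.0 and 2.2.0 behavior side by side:

```javascript
// Reduced version of the recovery-hint check shown in the checkSufficiency() hunk.
// `acceptLiteral` is a hypothetical parameter, added here to compare old and new behavior.
function recoveryWarnings(verdict, acceptLiteral) {
  const warnings = [];
  if (verdict.verdict !== acceptLiteral && verdict.verdict !== "accept-with-notes") {
    if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
      warnings.push("Non-accept verdict should specify gaps or requiredNextArtifact for recovery");
    }
  }
  return warnings;
}

// A clean accept: no gaps and no follow-up artifact requested.
const cleanAccept = { verdict: "accept", gaps: [] };

console.log(recoveryWarnings(cleanAccept, "approve").length); // 1 (old literal: spurious warning)
console.log(recoveryWarnings(cleanAccept, "accept").length);  // 0 (new literal)
```

As written in the hunk, the warning only fires when `gaps` is empty and `requiredNextArtifact` is unset, which is presumably the usual shape of a clean accept verdict.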

package/src/mission-run.mjs CHANGED

@@ -10,8 +10,7 @@
  */
 import { MISSIONS, getMission, validateMission } from "./mission.mjs";
-import { TEAM_PACKS } from "./packs.mjs";
-import { validateArtifact, ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
+import { validateArtifact } from "./artifacts.mjs";
 let _runCounter = 0;
@@ -59,7 +58,7 @@ let _runCounter = 0;
  * @param {string} taskDescription
  * @returns {MissionRun}
  */
-export function createRun(missionKey, taskDescription) {
+export function createRun(missionKey, taskDescription, options = {}) {
   const mission = getMission(missionKey);
   if (!mission) {
     throw new Error(`Mission "${missionKey}" not found. Available: ${Object.keys(MISSIONS).join(", ")}`);
@@ -72,16 +71,26 @@ export function createRun(missionKey, taskDescription) {
   const id = `${missionKey}-${Date.now()}-${++_runCounter}`;
-  const steps = mission.artifactFlow.map((step) => ({
-    role: step.role,
-    produces: step.produces,
-    consumedBy: step.consumedBy,
-    status: "pending",
-    artifact: null,
-    note: null,
-    startedAt: null,
-    completedAt: null,
-  }));
+  let steps;
+  const dd = mission.dynamicDispatch;
+  if (dd && options.manifest) {
+    // Dynamic dispatch — build steps from manifest
+    steps = buildDynamicSteps(mission, options.manifest);
+  } else {
+    // Static dispatch — use artifactFlow as-is
+    steps = mission.artifactFlow.map((step) => ({
+      role: step.role,
+      produces: step.produces,
+      consumedBy: step.consumedBy,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    }));
+  }
   return {
     id,
@@ -93,9 +102,94 @@ export function createRun(missionKey, taskDescription) {
     startedAt: new Date().toISOString(),
     completedAt: null,
     completionReport: null,
+    dynamicDispatch: dd && options.manifest ? true : false,
+    manifest: options.manifest || null,
   };
 }
+/**
+ * Build steps from manifest for dynamic dispatch missions.
+ * @param {Object} mission
+ * @param {Object} manifest - The audit-manifest.json content
+ * @returns {MissionStep[]}
+ */
+function buildDynamicSteps(mission, manifest) {
+  const dd = mission.dynamicDispatch;
+  const steps = [];
+  // Scaling roles: one step per manifest entry
+  const components = manifest[dd.componentAuditorPer] || [];
+  const boundaries = manifest[dd.seamAuditorPer] || manifest.boundaries || [];
+  // Component Auditor × N
+  for (const comp of components) {
+    steps.push({
+      role: "Component Auditor",
+      produces: "component-audit-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: comp.id || comp.name,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+  // Test Truth Auditor × M
+  for (const comp of components) {
+    steps.push({
+      role: "Test Truth Auditor",
+      produces: "test-truth-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: comp.id || comp.name,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+  // Seam Auditor × K
+  for (const boundary of boundaries) {
+    const label = boundary.id || `${boundary.from}-${boundary.to}`;
+    steps.push({
+      role: "Seam Auditor",
+      produces: "seam-audit-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: label,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+  // Non-scaling roles from artifactFlow (Audit Synthesizer, Critic Reviewer)
+  for (const step of mission.artifactFlow) {
+    if (!dd.scalingRoles.includes(step.role)) {
+      steps.push({
+        role: step.role,
+        produces: step.produces,
+        consumedBy: step.consumedBy,
+        status: "pending",
+        artifact: null,
+        artifactValidation: null,
+        note: null,
+        startedAt: null,
+        completedAt: null,
+      });
+    }
+  }
+  return steps;
+}
 // ── Step through a run ──────────────────────────────────────────────────────
 /**
@@ -127,6 +221,10 @@ export function completeStep(run, artifact, note) {
     throw new Error("No active step to complete");
   }
+  // Validate artifact against role contract (warn, don't block)
+  const validation = validateArtifact(active.role, artifact);
+  active.artifactValidation = validation;
   active.status = "completed";
   active.artifact = artifact;
   active.note = note || null;
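The expansion performed by `buildDynamicSteps()` can be condensed into a standalone sketch. It assumes a small hypothetical manifest (the component and boundary names are invented) and copies the three non-scaling steps from the `deep-audit` `artifactFlow` in `mission.mjs`. `expandSteps` is an illustrative name, not a role-os export:

```javascript
// Hypothetical manifest; only the fields read by buildDynamicSteps() are shown.
const manifest = {
  components: [{ id: "routing" }, { id: "evidence" }, { name: "runner" }],
  boundaries: [{ from: "routing", to: "runner" }, { id: "evidence-runner" }],
};

// Fixed tail of the deep-audit artifactFlow (roles not listed in scalingRoles).
const fixedSteps = [
  { role: "Audit Synthesizer", produces: "audit-summary" },
  { role: "Audit Synthesizer", produces: "audit-action-plan" },
  { role: "Critic Reviewer", produces: "review-verdict" },
];

function expandSteps(manifest) {
  const pending = (role, produces, parcel) => ({ role, produces, parcel, status: "pending" });
  const components = manifest.components || [];
  // Mirrors the fallback in the diff: boundary_clusters first, then boundaries.
  const boundaries = manifest.boundary_clusters || manifest.boundaries || [];
  return [
    ...components.map((c) => pending("Component Auditor", "component-audit-report", c.id || c.name)),
    ...components.map((c) => pending("Test Truth Auditor", "test-truth-report", c.id || c.name)),
    ...boundaries.map((b) => pending("Seam Auditor", "seam-audit-report", b.id || `${b.from}-${b.to}`)),
    ...fixedSteps.map((s) => pending(s.role, s.produces, undefined)),
  ];
}

const steps = expandSteps(manifest);
console.log(steps.length); // 11 = 2×3 + 2 + 3
console.log(steps.filter((s) => s.role === "Seam Auditor").map((s) => s.parcel).join(", "));
// routing-runner, evidence-runner
```

In the diff, `createRun()` only takes this path when the mission declares `dynamicDispatch` and `options.manifest` is provided. Otherwise it falls back to the static `artifactFlow`.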

package/src/mission.mjs CHANGED

@@ -268,6 +268,65 @@ export const MISSIONS = {
     dispatchDefaults: { model: "sonnet", maxTurns: 40, maxBudgetUsd: 6.0 },
     trialEvidence: "v0.4 golden run — 894 tests green. Full chain of custody proven: truth artifacts, provenance atoms, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts in 5 formats, debate transcript, 16+ trace links from rendered → truth. Architecture frozen 2026-03-27.",
   },
+  // ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
+  "deep-audit": {
+    name: "Deep Audit",
+    description: "Decompose a repo into bounded components, dispatch one auditor per component, inspect seams from the dependency graph, assess test truth, then synthesize into a ranked verdict and action plan. Worker count scales with the repo graph — not fixed.",
+    pack: "deep-audit",
+    entryPath: "Decompose repo → validate parcels → Component Auditor ×N (parallel) + Test Truth Auditor ×M → Seam Auditor ×K (from graph edges) → Audit Synthesizer → Critic reviews verdict",
+    // NOTE: This mission has a DYNAMIC role chain. The static chain below
+    // shows the role archetypes. At dispatch time, Component Auditor and
+    // Seam Auditor are instantiated once per component/boundary cluster.
+    // A 10-component repo with 4 risky boundaries = 10 + 4 + 2 + 1 + 1 = 18 tasks.
+    roleChain: [
+      "Component Auditor",   // ×N — one per component from audit-manifest
+      "Test Truth Auditor",  // ×M — one per component or one overlay pass
+      "Seam Auditor",        // ×K — one per risky boundary cluster from graph
+      "Audit Synthesizer",   // ×1 — consumes all outputs, produces verdict
+      "Critic Reviewer",     // ×1 — final acceptance
+    ],
+    // Dynamic dispatch contract:
+    // Step 1 produces audit-manifest.json with components[] and boundaries[].
+    // Steps 2-4 are instantiated from the manifest:
+    //   - One Component Auditor task per components[] entry
+    //   - One Test Truth Auditor task per component (or grouped by layer)
+    //   - One Seam Auditor task per boundary cluster
+    // Step 5 (Audit Synthesizer) runs after ALL step 2-4 tasks complete.
+    // Step 6 (Critic Reviewer) reviews the synthesis.
+    dynamicDispatch: {
+      scalingRoles: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
+      manifestSource: "audit-manifest.json",
+      componentAuditorPer: "components",
+      testTruthAuditorPer: "components",
+      seamAuditorPer: "boundary_clusters",
+      synthesisAfter: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
+    },
+    artifactFlow: [
+      // Step 1: Decomposition (done before mission dispatch — input artifact)
+      { role: "Component Auditor",   produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
+      { role: "Test Truth Auditor",  produces: "test-truth-report",      consumedBy: "Audit Synthesizer" },
+      { role: "Seam Auditor",        produces: "seam-audit-report",      consumedBy: "Audit Synthesizer" },
+      { role: "Audit Synthesizer",   produces: "audit-summary",          consumedBy: "Critic Reviewer" },
+      { role: "Audit Synthesizer",   produces: "audit-action-plan",      consumedBy: "Critic Reviewer" },
+      { role: "Critic Reviewer",     produces: "review-verdict",         consumedBy: null },
+    ],
+    escalationBranches: [
+      { trigger: "component exceeds 8K lines", from: "Component Auditor", to: "Component Auditor", action: "re-slice into sub-components, re-dispatch" },
+      { trigger: "circular dependency found", from: "Seam Auditor", to: "Audit Synthesizer", action: "elevate as architectural finding, do not attempt to resolve" },
+      { trigger: "parcel outputs inconsistent", from: "Audit Synthesizer", to: "Component Auditor", action: "re-audit the inconsistent component with narrower scope" },
+      { trigger: "critical finding spans 3+ components", from: "Audit Synthesizer", to: "Seam Auditor", action: "targeted cross-cut audit on the systemic issue" },
+      { trigger: "test suite is ceremonial", from: "Test Truth Auditor", to: "Audit Synthesizer", action: "flag as structural risk — false confidence in coverage" },
+    ],
+    honestPartial: "Component audits complete but seam inspection blocked or synthesis incomplete. Per-component findings are individually valid and actionable. Manifest and component reports exist even if synthesis does not.",
+    stopConditions: [
+      "Audit Synthesizer produces verdict + action plan, Critic accepts",
+      "Decomposition reveals repo is too tangled to slice — document why and abort",
+      "All component audits complete but seam audits blocked — synthesize with component-only truth",
+      "Budget exhausted — synthesize from whatever component audits completed",
+    ],
+    dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
+    trialEvidence: "New mission — no trial evidence yet. Architecture designed 2026-03-27.",
+  },
 };
 // ── Mission catalog ─────────────────────────────────────────────────────────
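The dynamic dispatch contract in the hunk above can be pictured as a small expansion function. This is a minimal sketch, not the runner in mission-run.mjs (which is not shown in this section). `expandAuditSteps` and the manifest shape (arrays of `{ id }` entries under the keys named by the `...Per` fields) are assumptions for illustration. It covers only the five archetypes listed in `roleChain`; the real runner may add further fixed steps.

```javascript
// Hypothetical sketch: expand a manifest into a flat task list.
// The config mirrors the dynamicDispatch block in the diff; the manifest
// shape ({ id } entries) is assumed for illustration.
const dynamicDispatch = {
  componentAuditorPer: "components",
  testTruthAuditorPer: "components",
  seamAuditorPer: "boundary_clusters",
};

function expandAuditSteps(manifest, dispatch = dynamicDispatch) {
  const entries = (key) => manifest[key] ?? [];
  const perEntry = (role, key) =>
    entries(key).map((entry) => ({ role, target: entry.id }));

  return [
    // Scaling roles: one task per manifest entry.
    ...perEntry("Component Auditor", dispatch.componentAuditorPer),
    ...perEntry("Test Truth Auditor", dispatch.testTruthAuditorPer),
    ...perEntry("Seam Auditor", dispatch.seamAuditorPer),
    // Fixed roles: run once, after every scaling task has completed.
    { role: "Audit Synthesizer", target: null },
    { role: "Critic Reviewer", target: null },
  ];
}

const manifest = {
  components: [{ id: "core" }, { id: "cli" }, { id: "store" }],
  boundary_clusters: [{ id: "cli->core" }, { id: "core->store" }],
};

const steps = expandAuditSteps(manifest);
console.log(steps.length); // 3 + 3 + 2 + 2 = 10
```

Reading the manifest key from the dispatch config, rather than hard-coding it, keeps the expansion driven by the mission definition.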
@@ -332,6 +391,10 @@ export function suggestMission(taskDescription) {
       signals: ["brainstorm", "explore ideas", "explore directions", "opportunity map", "creative directions", "concept exploration", "what could we build", "divergent thinking", "ideate"],
       weight: 1.1,
     },
+    "deep-audit": {
+      signals: ["deep audit", "component audit", "decompose and audit", "audit components", "structural audit", "deep review", "code audit", "repo deep dive"],
+      weight: 1.2,
+    },
   };
   let bestKey = null;
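The body of `suggestMission` is cut off by the hunk, so the scoring rule is not visible here. As a rough sketch of how `signals` and `weight` could combine, assume the score is the number of matched signals times the weight and the highest score wins. `suggestMissionSketch`, the abridged signal lists, and that scoring rule are all assumptions, not the package's implementation.

```javascript
// Hypothetical sketch of weighted signal matching. The real suggestMission
// body is not shown in the diff; this scoring rule is an assumption.
const MISSION_SIGNALS = {
  brainstorm: { signals: ["brainstorm", "explore ideas", "ideate"], weight: 1.1 },
  "deep-audit": { signals: ["deep audit", "component audit", "code audit"], weight: 1.2 },
};

function suggestMissionSketch(taskDescription) {
  const text = taskDescription.toLowerCase();
  let bestKey = null;
  let bestScore = 0;
  for (const [key, { signals, weight }] of Object.entries(MISSION_SIGNALS)) {
    // Count how many signal phrases appear in the task text.
    const hits = signals.filter((signal) => text.includes(signal)).length;
    const score = hits * weight;
    if (score > bestScore) {
      bestKey = key;
      bestScore = score;
    }
  }
  return bestKey; // null when no signal matches
}

console.log(suggestMissionSketch("Run a deep audit of the repo")); // "deep-audit"
console.log(suggestMissionSketch("Fix the login crash"));          // null
```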

package/src/packs.mjs CHANGED

@@ -255,6 +255,38 @@ export const TEAM_PACKS = {
     ],
   },
+  // ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
+  "deep-audit": {
+    name: "Deep Audit",
+    description: "Decompose repo into components, audit each deeply, inspect seams, synthesize verdict. Scales with repo graph.",
+    roles: [
+      "Component Auditor",
+      "Test Truth Auditor",
+      "Seam Auditor",
+      "Audit Synthesizer",
+      "Critic Reviewer",
+    ],
+    orchestratorRequired: false, // mission step sequence handles orchestration
+    optionalRoles: ["Security Reviewer", "Dependency Auditor"],
+    chainOrder: "Component Auditor (×N, parallel) + Test Truth Auditor (×M) → Seam Auditor (×K, from graph) → Audit Synthesizer",
+    requiredArtifacts: ["audit-manifest", "component-audit-report", "seam-audit-report", "test-truth-report", "audit-summary", "audit-action-plan"],
+    stopConditions: [
+      "All component parcels audited + seams inspected + synthesis complete",
+      "Critical finding in decomposition phase — repo too tangled to slice cleanly",
+      "Component auditor finds scope exceeds 8K lines — request re-slice",
+    ],
+    escalationOwner: "Audit Synthesizer",
+    dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
+    trialEvidence: "New mission — no trial evidence yet. First test: role-os self-audit.",
+    mismatchGuards: [
+      { notForSignals: ["fix bug", "crash", "broken", "regression"], suggestInstead: "bugfix", reason: "This is a bug to fix, not a deep audit" },
+      { notForSignals: ["implement", "build", "add command", "new feature"], suggestInstead: "feature", reason: "This is feature work, not an audit" },
+      { notForSignals: ["launch", "announce", "release notes", "messaging"], suggestInstead: "launch", reason: "This is launch work, not an audit" },
+      { notForSignals: ["treatment", "shipcheck", "polish"], suggestInstead: "treatment", reason: "This is repo treatment (surface polish), not a deep audit" },
+      { notForSignals: ["brainstorm", "explore ideas", "ideate"], suggestInstead: "brainstorm", reason: "This is brainstorming, not an audit" },
+    ],
+  },
   // ── Brainstorm (Structured Inquiry) ─────────────────────────────────────────
   brainstorm: {
     name: "Brainstorm (Structured Inquiry)",
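The `mismatchGuards` entries in the deep-audit pack read as "if the task mentions X, suggest pack Y instead". A minimal sketch of such a check follows. The guard data is copied (abridged) from the diff, but `checkMismatch` and its substring matching are assumptions about how guards might be evaluated.

```javascript
// Hypothetical sketch: guard data is abridged from the diff; checkMismatch
// and its substring rule are assumptions for illustration.
const mismatchGuards = [
  { notForSignals: ["fix bug", "crash", "broken", "regression"], suggestInstead: "bugfix", reason: "This is a bug to fix, not a deep audit" },
  { notForSignals: ["implement", "build", "add command", "new feature"], suggestInstead: "feature", reason: "This is feature work, not an audit" },
];

function checkMismatch(taskDescription, guards) {
  const text = taskDescription.toLowerCase();
  for (const guard of guards) {
    // The first guard with any matching signal wins.
    const matched = guard.notForSignals.find((signal) => text.includes(signal));
    if (matched) {
      return { matched, suggestInstead: guard.suggestInstead, reason: guard.reason };
    }
  }
  return null; // no guard fired, so the pack is a plausible fit
}

const hit = checkMismatch("Deep audit after the production crash", mismatchGuards);
console.log(hit.suggestInstead); // "bugfix"
console.log(checkMismatch("Deep audit of the payments module", mismatchGuards)); // null
```

Plain substring matching is deliberately naive: a task mentioning "rebuild" would also trip the "build" signal, so a real check would likely need word boundaries.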
@@ -304,6 +336,7 @@ const PACK_KEYWORDS = {
   research:  ["research", "competitive", "ux", "friction", "user", "strategy", "trend"],
   treatment: ["treatment", "polish", "cleanup", "repo audit", "shipcheck", "full treatment"],
   brainstorm: ["brainstorm", "explore", "ideate", "divergent", "opportunity", "creative directions", "concept exploration", "what could", "possibilities"],
+  "deep-audit": ["deep audit", "component audit", "repo audit deep", "decompose and audit", "audit components", "code audit", "structural audit", "deep review"],
 };
 /**

package/src/route.mjs CHANGED

@@ -344,6 +344,36 @@ export const ROLE_CATALOG = [
     triggers: ["targeted challenge", "claim attack", "contradiction exposure"],
     excludeWhen: [],
   },
+  // ── DEEP AUDIT ──
+  {
+    name: "Component Auditor", pack: "deep-audit", phase: 3,
+    keywords: ["audit", "component", "correctness", "dead code", "error handling", "state management"],
+    triggers: ["deep audit", "component audit", "code audit", "line-by-line audit"],
+    excludeWhen: ["test audit only", "boundary audit only"],
+    deliverableAffinity: ["Review"],
+  },
+  {
+    name: "Seam Auditor", pack: "deep-audit", phase: 4,
+    keywords: ["boundary", "seam", "interface", "contract", "integration", "dependency direction"],
+    triggers: ["boundary audit", "seam inspection", "interface mismatch", "cross-component"],
+    excludeWhen: ["single component only", "test audit only"],
+    deliverableAffinity: ["Review"],
+  },
+  {
+    name: "Test Truth Auditor", pack: "deep-audit", phase: 3,
+    keywords: ["test coverage", "test truth", "ceremonial test", "test gap", "mock fidelity"],
+    triggers: ["test truth audit", "coverage reality", "test quality assessment"],
+    excludeWhen: ["no tests exist", "implementation audit only"],
+    deliverableAffinity: ["Review"],
+  },
+  {
+    name: "Audit Synthesizer", pack: "deep-audit", phase: 5,
+    keywords: ["synthesis", "verdict", "action plan", "reconcile", "cross-cutting"],
+    triggers: ["audit synthesis", "repo verdict", "finding reconciliation"],
+    excludeWhen: ["component audit still running", "no findings to synthesize"],
+    deliverableAffinity: ["Review"],
+  },
 ];
 // ── Deliverable type → role affinity ──────────────────────────────────────────
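Each new catalog entry carries `keywords`, `triggers`, and `excludeWhen`. One way these fields could interact is sketched below, assuming that any `excludeWhen` match removes the role, that a trigger phrase counts double, and that a keyword counts once. `rankRoles`, the weights, and the abridged entries are assumptions; the router's actual scoring is outside this hunk.

```javascript
// Hypothetical sketch: entries abridged from the diff, scoring weights assumed.
const AUDIT_ROLES = [
  {
    name: "Component Auditor",
    keywords: ["audit", "component", "correctness", "dead code"],
    triggers: ["deep audit", "component audit", "code audit"],
    excludeWhen: ["test audit only", "boundary audit only"],
  },
  {
    name: "Seam Auditor",
    keywords: ["boundary", "seam", "interface", "contract"],
    triggers: ["boundary audit", "seam inspection", "cross-component"],
    excludeWhen: ["single component only", "test audit only"],
  },
];

function rankRoles(taskDescription, catalog) {
  const text = taskDescription.toLowerCase();
  const count = (phrases) => phrases.filter((p) => text.includes(p)).length;
  return catalog
    .filter((role) => count(role.excludeWhen) === 0) // hard exclusion first
    .map((role) => ({
      name: role.name,
      score: 2 * count(role.triggers) + count(role.keywords),
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
}

const ranked = rankRoles("Boundary audit only: check the interface contract", AUDIT_ROLES);
console.log(ranked.map((r) => r.name)); // ["Seam Auditor"]
```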

package/src/run.mjs CHANGED

@@ -18,8 +18,7 @@ import { decideEntry } from "./entry.mjs";
 import { getMission } from "./mission.mjs";
 import { TEAM_PACKS, getPack } from "./packs.mjs";
 import { ROLE_CATALOG } from "./route.mjs";
-import { ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
-import { getHandoffContract } from "./artifacts.mjs";
+import { ROLE_ARTIFACT_CONTRACTS, validateArtifact, getHandoffContract } from "./artifacts.mjs";
 // ── Run directory ────────────────────────────────────────────────────────────
@@ -309,6 +308,10 @@ export function completeCurrentStep(run, artifact, note, cwd) {
   const active = run.steps.find(s => s.status === "active");
   if (!active) throw new Error("No active step to complete");
+  // Validate artifact against role contract (warn, don't block)
+  const validation = validateArtifact(active.role, artifact);
+  active.artifactValidation = validation;
   active.status = "completed";
   active.artifact = artifact;
   active.note = note || null;
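The hunk above stores the validation result on the step and then completes it regardless. The sketch below illustrates that pattern in isolation. `validateArtifactSketch`, the `CONTRACTS` table, the `{ valid, missing }` return shape, and treating the artifact as text are all assumptions; the real `validateArtifact` in artifacts.mjs is not shown in this section.

```javascript
// Hypothetical contracts: the required section names are illustrative only.
const CONTRACTS = {
  "Audit Synthesizer": ["Verdict", "Posture", "Cross-Cutting Findings"],
};

function validateArtifactSketch(role, artifactText) {
  const required = CONTRACTS[role];
  if (!required) return { valid: true, missing: [] }; // no contract, nothing to check
  const missing = required.filter((section) => !artifactText.includes(section));
  return { valid: missing.length === 0, missing };
}

function completeStepSketch(step, artifactText) {
  // Record the validation result, but complete the step either way.
  step.artifactValidation = validateArtifactSketch(step.role, artifactText);
  step.status = "completed";
  step.artifact = artifactText;
  return step;
}

const step = completeStepSketch(
  { role: "Audit Synthesizer", status: "active" },
  "## Verdict\nFragile.\n## Posture\nfragile"
);
console.log(step.status);                     // "completed"
console.log(step.artifactValidation.missing); // ["Cross-Cutting Findings"]
```

Keeping the result on the step leaves the decision about what to do with an invalid artifact to later stages.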

package/starter-pack/agents/engineering/audit-synthesizer.md ADDED

@@ -0,0 +1,56 @@
+# Audit Synthesizer
+## Mission
+Consume all component, seam, and test audit outputs and produce one truthful repo-wide verdict with a ranked action plan.
+## Use When
+- All component auditors, seam auditors, and test truth auditors have completed their parcels
+- Structured findings exist in standardized format
+- The goal is a single authoritative repo assessment, not another audit pass
+## Do Not Use When
+- Component audits are still running — wait for all outputs
+- No structured findings exist — there's nothing to synthesize
+- The goal is to audit code directly (use Component Auditor)
+## Expected Inputs
+- All AUDIT-PARCEL-*.md files (component findings)
+- All AUDIT-SEAM-*.md files (boundary findings)
+- All AUDIT-TESTS-*.md files (test truth findings)
+- audit-manifest.json (component graph, for cross-referencing)
+## Required Output
+### AUDIT-SUMMARY.md
+- **Verdict** — one paragraph: structurally sound, fragile, dangerous, or dead weight, with specific reasoning
+- **Posture** — sound / fragile / dangerous / abandoned
+- **By the Numbers** — finding counts by severity across all lanes
+- **What Is Structurally Sound** — components/patterns that are correct (give specific credit)
+- **What Is Fragile** — things that work but break under change or edge cases
+- **What Is Dangerous** — active defects, security issues, data loss risks
+- **What Is Dead Weight** — unused code, vestigial features, abandoned modules
+- **Cross-Cutting Findings** — issues spanning multiple components, with source parcel references
+- **Contradictions Between Parcels** — where findings conflict, with adjudication
+- **Audit Gaps** — things no parcel was positioned to evaluate
+### AUDIT-ACTION-PLAN.md
+- **P0** — fix before next release (critical + high, grouped by root cause)
+- **P1** — fix this sprint (medium findings that compound)
+- **P2** — scheduled cleanup (low, dead code, naming)
+- **P3** — architectural (structural changes needing planning)
+- **Recommended Fix Order** — numbered sequence considering dependencies
+- **Estimated Effort** — per priority group (trivial/half-day/full-day/multi-day)
+## Quality Bar
+- Must reconcile — not just concatenate — findings across parcels
+- Cross-cutting findings must reference which parcel outputs informed them
+- Contradictions must be adjudicated, not just listed
+- Action plan must group by root cause and leverage, not by parcel
+- A root cause fix that resolves 5 findings ranks higher than 5 individual patches
+- Must identify gaps — things that fell between parcels
+## Escalation Triggers
+- Parcel outputs are missing or incomplete — cannot synthesize without full data
+- Parcel outputs use inconsistent finding formats — cannot reconcile
+- Critical findings span 3+ components — systemic issue, may need architectural rewrite
+- Component auditors and seam auditors contradict on the same boundary — needs investigation
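The quality bar's rule that one root-cause fix outranks several individual patches can be sketched as a grouping step. This is an illustration only: the `rootCause` field is hypothetical (the finding schema in the role files does not define one), and the ordering rule (worst severity first, then number of findings resolved) is an assumption.

```javascript
// Hypothetical sketch: rank fixes by root cause rather than by parcel.
const SEVERITY_RANK = { critical: 4, high: 3, medium: 2, low: 1, info: 0 };

function groupByRootCause(findings) {
  const groups = new Map();
  for (const finding of findings) {
    const group = groups.get(finding.rootCause) ??
      { rootCause: finding.rootCause, findings: [], maxSeverity: 0 };
    group.findings.push(finding);
    group.maxSeverity = Math.max(group.maxSeverity, SEVERITY_RANK[finding.severity]);
    groups.set(finding.rootCause, group);
  }
  // Worst severity first, then leverage: more findings resolved by one fix.
  return [...groups.values()].sort(
    (a, b) => b.maxSeverity - a.maxSeverity || b.findings.length - a.findings.length
  );
}

// Illustrative findings, not real audit output.
const findings = [
  { id: "F1", severity: "high", rootCause: "unvalidated-opts" },
  { id: "F2", severity: "high", rootCause: "missing-await" },
  { id: "F3", severity: "medium", rootCause: "unvalidated-opts" },
  { id: "F4", severity: "high", rootCause: "unvalidated-opts" },
  { id: "F5", severity: "low", rootCause: "stale-naming" },
];

console.log(groupByRootCause(findings).map((g) => g.rootCause));
// ["unvalidated-opts", "missing-await", "stale-naming"]
```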

package/starter-pack/agents/engineering/component-auditor.md ADDED

@@ -0,0 +1,46 @@
+# Component Auditor
+## Mission
+Read every line in an assigned code component and produce structured findings for every material issue.
+## Use When
+- A repo has been decomposed into bounded components for deep audit
+- This role receives a specific component parcel with owned files, forbidden files, and interfaces
+- The goal is truthful per-component understanding, not surface-level scanning
+## Do Not Use When
+- The work is a broad repo-level audit (use the deep-audit mission instead of dispatching this role directly)
+- The component is tests (use Test Truth Auditor)
+- The work is about interfaces between components (use Seam Auditor)
+## Expected Inputs
+- Component parcel definition: owned paths, forbidden paths, public interfaces, upstream/downstream dependencies, risk hints
+- Approximate line count and complexity assessment
+- Repo language and framework context
+## Required Output
+- Per-file findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (correctness/error-handling/security/state/performance/dead-code/naming/dependency/architecture)
+  - File and function/line reference
+  - Quoted evidence
+  - Impact assessment
+  - Recommended fix
+  - Blocking questions
+  - Adjacent parcel risks
+- "What I Could Not Verify" section — things outside this parcel's scope
+- "Adjacent Parcel Risks" section — concerns at boundaries with other components
+- Parcel statistics: files read, total lines, findings by severity
+## Quality Bar
+- Every file in owned paths must be read — no skipping
+- Findings must include quoted code evidence, not summaries
+- Adjacent parcel risks must be specific, not generic ("state might leak" is bad; "run.mjs L247 mutates the opts object passed from entry.mjs" is good)
+- "What I Could Not Verify" must be honest — if you can't see the caller, say so
+## Escalation Triggers
+- Component exceeds 8,000 lines — request split into sub-components
+- Owned paths reference files that don't exist — flag immediately
+- Component has zero tests — flag for Test Truth Auditor
+- Critical finding that affects multiple other components — flag for Seam Auditor
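The standardized finding schema described under Required Output lends itself to a mechanical check. Below is a minimal sketch: the enum values are taken from the role file, while the field names (`file`, `evidence`, `impact`, `recommendedFix`) and `checkFinding` itself are hypothetical, since the role file describes the schema in prose only.

```javascript
// Enum values from the Component Auditor role file; field names are assumed.
const SEVERITIES = ["critical", "high", "medium", "low", "info"];
const CONFIDENCES = ["certain", "likely", "possible", "speculative"];
const CATEGORIES = [
  "correctness", "error-handling", "security", "state", "performance",
  "dead-code", "naming", "dependency", "architecture",
];

// Returns the names of fields that are missing or invalid.
function checkFinding(finding) {
  const problems = [];
  if (!SEVERITIES.includes(finding.severity)) problems.push("severity");
  if (!CONFIDENCES.includes(finding.confidence)) problems.push("confidence");
  if (!CATEGORIES.includes(finding.category)) problems.push("category");
  for (const field of ["file", "evidence", "impact", "recommendedFix"]) {
    const value = finding[field];
    if (typeof value !== "string" || value.trim() === "") problems.push(field);
  }
  return problems;
}

// Illustrative finding, not real audit output.
const finding = {
  severity: "high",
  confidence: "likely",
  category: "error-handling",
  file: "src/example.mjs",
  evidence: "catch (err) {}",
  impact: "Failures are swallowed silently.",
  recommendedFix: "Log and rethrow, or return a typed error.",
};

console.log(checkFinding(finding));                                   // []
console.log(checkFinding({ ...finding, severity: "urgent", evidence: "" })); // ["severity", "evidence"]
```

Requiring non-empty `evidence` is one way to enforce the rule that findings quote code instead of summarizing it.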

package/starter-pack/agents/engineering/seam-auditor.md ADDED

@@ -0,0 +1,46 @@
+# Seam Auditor
+## Mission
+Inspect interfaces between components to verify they connect lawfully and that shared assumptions hold across boundaries.
+## Use When
+- A repo has been decomposed and component audits are complete or running
+- Specific boundary clusters have been identified as risky (API contracts, shared state, schema handoffs, persistence crossings)
+- The goal is to catch issues that no single component auditor can see
+## Do Not Use When
+- The work is about implementation internals of a single component (use Component Auditor)
+- The work is about test coverage (use Test Truth Auditor)
+- No component graph exists yet (decompose first)
+## Expected Inputs
+- Boundary cluster definition: which components, which interfaces, which shared resources
+- Component graph showing dependency directions
+- Shared utility file list
+- Content files (schemas, policies, role definitions) that should match code contracts
+- Optionally: component auditor outputs (if available, use to focus on flagged boundary concerns)
+## Required Output
+- Per-boundary findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (interface-mismatch/state-flow/error-propagation/dependency-direction/duplicate-logic/integration-gap/architecture/content-drift)
+  - Boundary identification (from → to)
+  - File references on both sides
+  - Evidence: what the caller assumes vs what the callee provides
+  - Impact and recommended fix
+- "False Independence Risks" section — components that appear separate but share hidden assumptions
+- "Content ↔ Code Drift" section — where documentation/schemas diverge from implementation
+- "Dependency Direction Assessment" — is the import graph layered correctly?
+## Quality Bar
+- Every declared boundary must be inspected — no skipping
+- Findings must reference both sides of the boundary (caller AND callee)
+- Content-code drift findings must quote both the content claim and the code reality
+- Must check dependency direction, not just interface shapes
+## Escalation Triggers
+- Circular dependency discovered — flag immediately
+- Shared utility encodes domain logic (god module) — flag for architectural review
+- Content layer (schemas, policies) fundamentally contradicts code behavior — flag as critical
+- Component auditors flagged the same boundary from both sides — elevated cross-cutting finding
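The first escalation trigger, a circular dependency, is something a Seam Auditor could detect mechanically from the component graph. The sketch below uses a depth-first search over an adjacency map. The graph shape (component name mapped to the components it imports) and the component names are assumptions for illustration, not the format of audit-manifest.json.

```javascript
// Hypothetical sketch: find one dependency cycle with a depth-first search.
// graph maps a component name to the list of components it depends on.
function findCycle(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const path = [];

  function visit(node) {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge: the cycle is the path from the first visit of node to here.
      return [...path.slice(path.indexOf(node)), node];
    }
    state.set(node, "visiting");
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

console.log(findCycle({ cli: ["core"], core: ["store"], store: ["core"] }));
// ["core", "store", "core"]
console.log(findCycle({ cli: ["core"], core: ["store"], store: [] })); // null
```

Returning the cycle path rather than a boolean gives the auditor the "both sides of the boundary" evidence the quality bar asks for.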

package/starter-pack/agents/engineering/test-truth-auditor.md ADDED

@@ -0,0 +1,48 @@
+# Test Truth Auditor
+## Mission
+Determine whether a test suite proves correctness or merely exists. Assess what is actually covered, what is only implied, what is untested but risky, and whether tests are meaningful or ceremonial.
+## Use When
+- A component or repo has been identified for deep audit
+- Test files exist and need truthful coverage assessment
+- The goal is to distinguish real coverage from test theater
+## Do Not Use When
+- The work is about implementation quality (use Component Auditor)
+- The work is about interfaces between components (use Seam Auditor)
+- No tests exist (flag the gap and stop — there's nothing to audit)
+## Expected Inputs
+- Test file paths to audit
+- Corresponding implementation file paths (read-only reference)
+- Component mapping: which test files cover which source files
+- Test framework and runner context (e.g., node:test, vitest, pytest, cargo test)
+## Required Output
+- Per-test-file findings using the standardized finding schema:
+  - Severity (critical/high/medium/low/info)
+  - Confidence (certain/likely/possible/speculative)
+  - Category (test-gap/ceremonial-test/isolation/mock-fidelity/integration-gap/edge-case)
+  - Test file and source file references
+  - What function/behavior is untested or poorly tested
+  - Evidence: what the test does vs what it should do
+  - Impact: what bugs could slip through
+  - Recommended test to add or improve
+- "Untested but Risky" section — specific functions/flows with no coverage
+- "Ceremonial Tests" section — tests that exist but prove nothing meaningful
+- "Integration Gaps" section — multi-module flows only unit-tested
+- Test Suite Health Summary: total files, source files with no test, estimated real coverage, verdict (healthy/adequate/concerning/insufficient)
+## Quality Bar
+- Must distinguish "line is executed" from "behavior is verified" — a test that calls a function and doesn't assert the result is ceremonial
+- Must identify missing edge case tests for error paths, boundary values, empty inputs
+- Must assess mock fidelity — do mocks match real behavior or mask bugs?
+- Must flag test isolation issues — shared state, order dependence, flaky patterns
+- Source files with no dedicated test file must be explicitly listed
+## Escalation Triggers
+- Source file with no test coverage at all — flag as test gap
+- Test suite has order-dependent tests — flag as isolation issue
+- Mocks diverge from real implementation — flag as mock fidelity risk
+- Test-to-code ratio is healthy but real coverage is low (ceremonial tests inflate the ratio) — flag as false confidence
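The distinction in the quality bar between "line is executed" and "behavior is verified" is easiest to see side by side. The sketch below is framework-free on purpose; `parsePort` and both test functions are made-up examples, not code from role-os.

```javascript
// Function under test (illustrative).
function parsePort(value) {
  const port = Number(value);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${value}`);
  }
  return port;
}

// Ceremonial: executes the code but asserts nothing. It would still pass
// if parsePort returned the wrong number.
function ceremonialTest() {
  parsePort("8080");
}

// Meaningful: verifies the returned value and the error path.
function meaningfulTest() {
  if (parsePort("8080") !== 8080) throw new Error("expected 8080");
  let threw = false;
  try {
    parsePort("70000");
  } catch (err) {
    threw = err instanceof RangeError;
  }
  if (!threw) throw new Error("expected RangeError for out-of-range port");
}

ceremonialTest();
meaningfulTest();
console.log("both tests passed");
```

Both tests count toward line coverage equally, which is why a coverage number alone cannot separate them.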