role-os 2.1.0 → 2.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -1,5 +1,38 @@
  # Changelog
 
+ ## 2.2.0
+
+ ### Added
+
+ #### Deep Audit Mission — Runner-Native Componentized Repo Audit
+
+ - **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
+ - **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
+ - **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
+ - **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
+ - **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
+ - **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
+
+ ### Fixed
+
+ - **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
+ - **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
+ - **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
+
+ ### Changed
+
+ - Mission count: 7 → 8
+ - Role count: 50 → 54 (4 deep audit roles)
+ - Pack count: 8 → 9
+ - Artifact contract count: 30 → 34 (4 new audit role contracts)
+ - Test count: 905 → 936
+
+ ### Evidence
+
+ - Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
+ - Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
+ - Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
+
  ## 2.1.0
 
  ### Added
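The step counts quoted in this release follow from how `buildDynamicSteps()` in `mission-run.mjs` expands a manifest: one Component Auditor and one Test Truth Auditor per component (so M = N), one Seam Auditor per boundary entry, and the three non-scaling `artifactFlow` entries (two Audit Synthesizer steps and one Critic Reviewer step). The minimal sketch below checks that arithmetic; `deepAuditStepCount` is a hypothetical helper, not part of role-os.

```javascript
// Hypothetical helper (not part of role-os): step count for a deep-audit run
// under the 2N + K + 3 expansion performed by buildDynamicSteps().
function deepAuditStepCount(componentCount, boundaryCount) {
  const componentAuditors = componentCount; // N: one per component
  const testTruthAuditors = componentCount; // M = N: also one per component
  const seamAuditors = boundaryCount;       // K: one per boundary entry
  const fixedSteps = 3;                     // Synthesizer (summary), Synthesizer (action plan), Critic
  return componentAuditors + testTruthAuditors + seamAuditors + fixedSteps;
}

console.log(deepAuditStepCount(6, 8));  // 23
console.log(deepAuditStepCount(10, 5)); // 28
```

Under the same formula, a 3-component repo yields 12 steps only when K = 3, and the 10-component, 4-boundary example tallied as 18 tasks in the `mission.mjs` comment comes to 27.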
package/README.md CHANGED
@@ -13,7 +13,7 @@
  <a href="https://mcp-tool-shop-org.github.io/role-os/"><img src="https://img.shields.io/badge/Landing_Page-live-brightgreen" alt="Landing Page"></a>
  </p>
 
- A multi-Claude operating system that staffs, routes, validates, and runs work through 50 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict.
+ A multi-Claude operating system that staffs, routes, validates, and runs work through 54 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict. Includes dynamic dispatch for manifest-scaled missions — a 10-component repo automatically becomes 28 auditor steps, not 6.
 
  ## What it does
 
@@ -44,9 +44,9 @@ roleos start "something completely novel"
 
  **The fallback ladder:**
 
- 1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
- 2. **Pack** — when the task is a known family but not a full mission shape. 7 calibrated team packs with auto-selection and mismatch guards.
- 3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 31 roles against packet content and assembles a dynamic chain.
+ 1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research, brainstorm, deep-audit). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
+ 2. **Pack** — when the task is a known family but not a full mission shape. 9 calibrated team packs with auto-selection and mismatch guards.
+ 3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 54 roles against packet content and assembles a dynamic chain.
 
  The system never forces work through the wrong abstraction. It explains why it chose each level and offers alternatives.
 
@@ -103,7 +103,7 @@ Full treatment is a canonical 7-phase protocol defined in Claude project memory
 
  Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gates.
 
- ## 50 roles across 8 packs
+ ## 54 roles across 9 packs
 
  | Pack | Roles |
  |------|-------|
@@ -115,6 +115,7 @@ Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gate
  | **Product** (3) | Feedback Synthesizer, Roadmap Prioritizer, Spec Writer |
  | **Research** (4) | UX Researcher, Competitive Analyst, Trend Researcher, User Interview Synthesizer |
  | **Growth** (4) | Launch Strategist, Content Strategist, Community Manager, Support Triage Lead |
+ | **Deep Audit** (4) | Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer |
 
  Every role has a full contract: mission, use when, do not use when, expected inputs, required outputs, quality bar, and escalation triggers. Every role is routable — `roleos route` can recommend any of them based on packet content.
 
@@ -209,13 +210,13 @@ role-os/
  mission.mjs ← 7 named mission types (feature, bugfix, treatment, docs, security, research, brainstorm)
  mission-run.mjs ← Mission runner: create → step → complete → report
  mission-cmd.mjs ← `roleos mission` CLI commands
- route.mjs ← 31-role routing + dynamic chain builder
- packs.mjs ← 7 calibrated team packs + auto-selection
+ route.mjs ← 54-role routing + dynamic chain builder
+ packs.mjs ← 9 calibrated team packs + auto-selection
  conflicts.mjs ← 4-pass conflict detection
  escalation.mjs ← Auto-routing for blocked/rejected/split
  evidence.mjs ← Structured evidence + role-aware requirements
  dispatch.mjs ← Runtime dispatch manifests for multi-claude
- artifacts.mjs ← 30 per-role artifact contracts + 7 pack handoffs
+ artifacts.mjs ← Per-role artifact contracts + pack handoffs
  decompose.mjs ← Composite task detection + splitting
  composite.mjs ← Dependency-ordered execution + recovery
  replan.mjs ← Mid-run adaptive replanning
@@ -225,7 +226,7 @@ role-os/
  brainstorm.mjs ← Evidence modes, request validation, finding/synthesis/judge schemas
  brainstorm-roles.mjs ← Role-native schemas, input partitioning, blindspot enforcement, cross-exam
  brainstorm-render.mjs ← Two-layer rendering: lexical bans, render schemas, debate transcript
- test/ ← 894 tests across 30 test files
+ test/ ← 936 tests across 31 test files
  starter-pack/ ← Drop-in role contracts, policies, schemas, workflows
  ```
 
@@ -237,28 +238,29 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
 
  | Layer | What it does | Status |
  |-------|-------------|--------|
- | **Routing** | Scores all 31 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
+ | **Routing** | Scores all 54 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
  | **Chain builder** | Assembles phase-ordered chains from scored roles, packet-type biased not template-locked | ✓ Shipped |
  | **Conflict detection** | 4-pass validation: hard conflicts, sequence, redundancy, coverage gaps. Repair suggestions. | ✓ Shipped |
  | **Escalation** | Auto-routes blocked/rejected/split work to the right resolver with reason + required artifact | ✓ Shipped |
  | **Evidence** | Role-aware structured evidence in verdicts. Sufficiency checks. 12 evidence kinds. | ✓ Shipped |
  | **Dispatch** | Generates execution manifests for multi-claude. Per-role tool profiles, system prompts, budgets. | ✓ Shipped |
  | **Trials** | Full roster proven: 30/30 gold-task + 5/5 negative trials. 7 pack trials complete. | ✓ Complete |
- | **Team Packs** | 7 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
+ | **Team Packs** | 9 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
  | **Outcome calibration** | Records run outcomes, tunes pack/role weights from results, adjusts confidence thresholds. | ✓ Shipped |
  | **Mixed-task decomposition** | Detects composite work, splits into child packets, assigns packs, preserves dependencies. | ✓ Shipped |
  | **Composite execution** | Runs child packets in dependency order with artifact passing, branch recovery, and synthesis. | ✓ Shipped |
  | **Adaptive replanning** | Mid-run scope changes, findings, or new requirements update the plan without restarting. | ✓ Shipped |
  | **Session spine** | `roleos init claude` scaffolds CLAUDE.md, /roleos-route, /roleos-review, /roleos-status. `roleos doctor` verifies wiring. Route cards prove engagement. | ✓ Shipped |
  | **Hook spine** | 5 lifecycle hooks (SessionStart, PromptSubmit, PreToolUse, SubagentStart, Stop). Advisory enforcement: route card reminders, write-tool gating, subagent role injection, completion audit. | ✓ Shipped |
- | **Artifact spine** | 30 per-role artifact contracts. 7 pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
- | **Mission library** | 7 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. All 7 trial-proven. | ✓ Shipped |
+ | **Artifact spine** | Per-role artifact contracts. Pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
+ | **Mission library** | 8 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm, deep-audit). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. | ✓ Shipped |
  | **Mission runner** | Create runs, step through with tracked state, complete/fail with honest reporting. Blocked-step propagation, out-of-chain escalation warnings, last-step re-opening. | ✓ Shipped |
  | **Unified entry** | `roleos start` decides mission vs pack vs free routing automatically. Fallback ladder with confidence scores, alternatives, and composite detection. | ✓ Shipped |
  | **Persistent runs** | `roleos run` creates disk-backed runs. `resume`, `next`, `explain`, `complete`, `fail`. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance. Friction measurement. | ✓ Shipped |
- | **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run: 894 tests. | ✓ Shipped |
+ | **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run proven. | ✓ Shipped |
+ | **Deep Audit** | Manifest-scaled repo audit: decompose repo into components, dispatch N auditors + M test truth auditors + K seam auditors from dependency graph, synthesize into ranked verdict and action plan. Dynamic dispatch scales with repo size (2N + K + 3 formula). Runner-native with artifact validation at every step. | ✓ Shipped |
 
- ## 7 missions
+ ## 8 missions
 
  | Mission | Pack | Roles | When to use |
  |---------|------|-------|-------------|
@@ -269,6 +271,7 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
  | `security-hardening` | security | 4 | Threat model, audit, fix vulnerabilities, re-audit, verify |
  | `research-launch` | research | 4 | Frame question, research, document findings, decide |
  | `brainstorm` | brainstorm | 9 | Structured multi-perspective inquiry with traceable disagreement and verdict |
+ | `deep-audit` | deep-audit | 5 (scales) | Manifest-backed repo audit — worker count scales with repo graph via dynamic dispatch |
 
  Each mission includes honest-partial definitions — when work stalls, the system documents what was completed and what remains instead of bluffing completion.
 
@@ -290,7 +293,27 @@ roleos run "explore product directions for a developer tool discovery platform"
 
  - **Chain of custody:** Every rendered sentence traces back to a truth-layer atom. Synthesis directions cite atoms. Cross-exam targets real claim IDs. The dispute graph is the product, not the prose.
 
- **Proven:** v0.4 golden run — 894 tests, full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
+ **Proven:** v0.4 golden run — full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
+
+ ### Deep audit mission
+
+ Not a surface scan. The deep audit mission **decomposes a repo into bounded components and dispatches specialist auditors at a scale determined by the repo's own dependency graph.**
+
+ ```bash
+ roleos run "deep audit this repo" --manifest=audit-manifest.json
+ # → MISSION: Deep Audit (Manifest-Scaled)
+ # Steps: Component Auditor ×6 + Test Truth Auditor ×6 + Seam Auditor ×8 + Synthesizer + Action Plan + Critic = 23 steps
+ ```
+
+ **What makes it different:**
+
+ - **Dynamic dispatch** — worker count is not fixed. A 10-component repo with 5 boundary clusters produces 28 steps (2×10 + 5 + 3). A 3-component repo produces 12. The scaling formula is `2N + K + 3` where N = components, K = boundaries.
+ - **Manifest-backed parcels** — an `audit-manifest.json` defines components (with file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). Each auditor receives only its parcel.
+ - **Four role archetypes** — Component Auditor (code truth per module), Test Truth Auditor (tests that prove vs tests that exist), Seam Auditor (integration boundaries from the dependency graph), Audit Synthesizer (ranked verdict + action plan from all parcels).
+ - **Artifact validation at every step** — `validateArtifact()` fires on every step completion in both execution paths. Results attached to step objects. The system knows whether each artifact met its contract.
+ - **Honest partial** — when budget or scope blocks completion, per-component findings are individually valid. The system synthesizes from whatever completed, never bluffs full coverage.
+
+ **Proven:** Runner-native proof run — 18 tests against real manifest, full lifecycle verified including escalation re-opening and partial failure. Scaling formula verified for 3/6/10/15-component manifests.
 
  ## Status
 
@@ -309,6 +332,7 @@ roleos run "explore product directions for a developer tool discovery platform"
  - **v2.0.0**: Operator friction pass (Phase U) — `roleos run` creates persistent disk-backed runs. Resume, next, explain, complete, fail. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance at every step. Friction measurement. 6 friction trials. 613 tests.
  - **v2.0.1**: Handbook audit, beginner docs, test count corrections. 617 tests.
  - **v2.1.0**: Brainstorm mission (v0.4) — specialized roles under law, traceable disagreement, verdict-bearing output. Two-layer architecture (truth + render), cross-exam permission matrix, dispute graph, golden run proof. 7 missions, 50 roles, 8 packs. 894 tests.
+ - **v2.2.0**: Deep Audit mission — manifest-scaled repo audit with dynamic dispatch. 4 new audit roles (Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer). Worker count scales with repo graph (2N + K + 3 formula). Artifact validation wired at both execution boundaries. Runner-native proof run green. accept/approve truth fix in evidence layer. 8 missions, 54 roles, 9 packs. 936 tests.
 
  ## License
 
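The README describes `audit-manifest.json` as a list of components (file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). The sketch below shows one plausible shape for a small hypothetical repo, plus a hypothetical pre-flight check. Only the `components` and `boundaries` keys and the `id`, `from`, and `to` fields are taken from the code in this diff; `paths`, `lines`, `description`, and `interface` are assumed field names, and `checkManifest` is not a role-os function. The 8,000-line limit mirrors the deep-audit mission's "component exceeds 8K lines" escalation trigger.

```javascript
// Hypothetical manifest for a three-component repo. `paths`, `lines`,
// `description`, and `interface` are assumed field names.
const manifest = {
  components: [
    { id: "cli", paths: ["src/cli.mjs"], lines: 420, description: "Argument parsing" },
    { id: "core", paths: ["src/core.mjs"], lines: 1800, description: "Domain logic" },
    { id: "store", paths: ["src/store.mjs"], lines: 650, description: "Disk persistence" },
  ],
  boundaries: [
    { from: "cli", to: "core", interface: "run(command, args)" },
    { from: "core", to: "store", interface: "load() / save(state)" },
  ],
};

// Hypothetical pre-flight check (not part of role-os): every boundary endpoint
// must be a declared component, and oversized components are flagged so they
// can be re-sliced before auditors are dispatched.
function checkManifest(m, maxLines = 8000) {
  const ids = new Set(m.components.map((c) => c.id));
  const problems = [];
  for (const b of m.boundaries) {
    for (const end of [b.from, b.to]) {
      if (!ids.has(end)) problems.push(`boundary ${b.from}-${b.to}: unknown component "${end}"`);
    }
  }
  for (const c of m.components) {
    if (c.lines > maxLines) problems.push(`component "${c.id}" has ${c.lines} lines (> ${maxLines})`);
  }
  return problems;
}

console.log(checkManifest(manifest)); // []
```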
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "role-os",
- "version": "2.1.0",
- "description": "Role OS — a multi-Claude operating system where 50 specialized roles execute work through contracts, conflict detection, escalation, and structured evidence. 8 proven team packs, 7 missions including brainstorm with traceable disagreement and verdict-bearing output.",
+ "version": "2.2.0",
+ "description": "Role OS — a multi-Claude operating system where 54 specialized roles execute work through contracts, conflict detection, escalation, and structured evidence. 9 team packs, 8 missions including deep audit with manifest-scaled dynamic dispatch and brainstorm with traceable disagreement.",
  "homepage": "https://mcp-tool-shop-org.github.io/role-os/",
  "bugs": {
  "url": "https://github.com/mcp-tool-shop-org/role-os/issues"
package/src/artifacts.mjs CHANGED
@@ -256,6 +256,40 @@ export const ROLE_ARTIFACT_CONTRACTS = {
  consumedBy: [],
  completionRule: "Disposition is accept/revise_expand/revise_synthesize/reject. Verdicts: ready_to_advance/needs_incubation/not_active_now. Actions: build_now/hold_for_followon/archive_but_retain. Revise requires targets.",
  },
+
+ // ── Deep Audit ──
+ "Component Auditor": {
+ artifactType: "component-audit-report",
+ requiredSections: ["findings", "what-i-could-not-verify", "adjacent-parcel-risks", "parcel-statistics"],
+ optionalSections: [],
+ requiredEvidence: ["component-parcel-definition"],
+ consumedBy: ["Audit Synthesizer"],
+ completionRule: "Every file in owned paths read. Findings use standardized schema with severity, confidence, category, file, evidence, impact. Adjacent parcel risks are specific, not generic.",
+ },
+ "Seam Auditor": {
+ artifactType: "seam-audit-report",
+ requiredSections: ["findings", "false-independence-risks", "content-code-drift", "dependency-direction-assessment"],
+ optionalSections: [],
+ requiredEvidence: ["boundary-cluster-definition", "component-graph"],
+ consumedBy: ["Audit Synthesizer"],
+ completionRule: "Every declared boundary inspected. Findings reference both sides. Content-code drift quotes both content claim and code reality.",
+ },
+ "Test Truth Auditor": {
+ artifactType: "test-truth-report",
+ requiredSections: ["findings", "untested-but-risky", "ceremonial-tests", "integration-gaps", "test-suite-health-summary"],
+ optionalSections: [],
+ requiredEvidence: ["test-file-paths", "implementation-file-paths"],
+ consumedBy: ["Audit Synthesizer"],
+ completionRule: "Distinguishes 'line executed' from 'behavior verified'. Lists source files with no test. Estimates real coverage with reasoning.",
+ },
+ "Audit Synthesizer": {
+ artifactType: "audit-summary",
+ requiredSections: ["verdict", "posture", "by-the-numbers", "structurally-sound", "fragile", "dangerous", "dead-weight", "cross-cutting-findings", "contradictions", "audit-gaps"],
+ optionalSections: [],
+ requiredEvidence: ["component-audit-report", "seam-audit-report", "test-truth-report"],
+ consumedBy: ["Critic Reviewer"],
+ completionRule: "Reconciles findings across parcels. Cross-cutting findings reference source parcels. Contradictions adjudicated. Action plan groups by root cause and leverage.",
+ },
  };
 
  // ── Artifact validation ───────────────────────────────────────────────────────
@@ -398,6 +432,15 @@ export const PACK_HANDOFF_CONTRACTS = {
  { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
  ],
  },
+ "deep-audit": {
+ flow: [
+ { role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
+ { role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
+ { role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
+ { role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
+ { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
+ ],
+ },
  };
 
  /**
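`PACK_HANDOFF_CONTRACTS` and `ROLE_ARTIFACT_CONTRACTS` together describe who produces what for whom. The sketch below shows the kind of chain-completeness check the README mentions, run against the `deep-audit` entries added in this release. `checkHandoffChain` and `missingEvidence` are hypothetical helpers; the actual checks in `artifacts.mjs` are not shown in this diff.

```javascript
// Data copied from the "deep-audit" entries added in this release.
const deepAuditFlow = [
  { role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
  { role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
  { role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
  { role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
  { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
];
// requiredEvidence from the Audit Synthesizer contract.
const synthesizerEvidence = ["component-audit-report", "seam-audit-report", "test-truth-report"];

// Hypothetical: every consumer named in the flow must itself be in the flow,
// and exactly one step (the final verdict) hands off to nobody.
function checkHandoffChain(flow) {
  const roles = new Set(flow.map((s) => s.role));
  const problems = [];
  for (const s of flow) {
    if (s.consumedBy !== null && !roles.has(s.consumedBy)) {
      problems.push(`${s.role} hands ${s.produces} to absent role ${s.consumedBy}`);
    }
  }
  const terminal = flow.filter((s) => s.consumedBy === null).length;
  if (terminal !== 1) problems.push(`expected 1 terminal step, found ${terminal}`);
  return problems;
}

// Hypothetical: which required evidence items does no upstream step deliver?
function missingEvidence(flow, role, required) {
  const delivered = new Set(flow.filter((s) => s.consumedBy === role).map((s) => s.produces));
  return required.filter((item) => !delivered.has(item));
}

console.log(checkHandoffChain(deepAuditFlow)); // []
console.log(missingEvidence(deepAuditFlow, "Audit Synthesizer", synthesizerEvidence)); // []
```

Dropping the Seam Auditor entry from the flow would make `missingEvidence` report `seam-audit-report` as undelivered.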
package/src/dispatch.mjs CHANGED
@@ -88,6 +88,12 @@ const TOOL_PROFILES = {
  "Mechanics Analyst": ["Read", "Glob", "Grep"],
  "Positioning Analyst": ["Read", "Glob", "Grep"],
  "Contrarian Analyst": ["Read", "Glob", "Grep"],
+
+ // Deep Audit
+ "Component Auditor": ["Read", "Glob", "Grep"],
+ "Seam Auditor": ["Read", "Glob", "Grep"],
+ "Test Truth Auditor": ["Read", "Glob", "Grep"],
+ "Audit Synthesizer": ["Read", "Glob", "Grep", "Write"],
  };
 
  // ── Default role config ─────────────────────────────────────────────────────
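The new `TOOL_PROFILES` entries keep the three scaling auditors read-only and give `Write` only to the Audit Synthesizer. How role-os applies these profiles at dispatch time is not part of this diff; the sketch below is a hypothetical allowlist lookup over the same entries, assuming that a role without a profile gets no tools.

```javascript
// The deep-audit entries added to TOOL_PROFILES in this release.
const TOOL_PROFILES = {
  "Component Auditor": ["Read", "Glob", "Grep"],
  "Seam Auditor": ["Read", "Glob", "Grep"],
  "Test Truth Auditor": ["Read", "Glob", "Grep"],
  "Audit Synthesizer": ["Read", "Glob", "Grep", "Write"],
};

// Hypothetical allowlist lookup (not a role-os function): a tool is permitted
// only if it appears in the role's profile; unknown roles get an empty list.
function isToolAllowed(role, tool) {
  const profile = TOOL_PROFILES[role] ?? [];
  return profile.includes(tool);
}

console.log(isToolAllowed("Component Auditor", "Write")); // false
console.log(isToolAllowed("Audit Synthesizer", "Write")); // true
```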
package/src/evidence.mjs CHANGED
@@ -146,7 +146,7 @@ const DEFAULT_REQUIREMENTS = {
  * @property {EvidenceItem[]} evidence - Structured evidence items
  * @property {string[]} gaps - What's missing or weak
  * @property {string[]} risks - Identified risks
- * @property {string} [requiredNextArtifact] - What the next role must produce (for non-approve)
+ * @property {string} [requiredNextArtifact] - What the next role must produce (for non-accept)
  * @property {string} confidence - One of CONFIDENCE_LEVELS
  */
 
@@ -182,25 +182,25 @@ export function checkSufficiency(verdict) {
  .map(e => `${e.kind}: ${e.claim} (${e.reference})`);
 
  if (contradictions.length > 0 && verdict.verdict === "accept") {
- warnings.push("Verdict is 'approve' but evidence contains contradictions — review carefully");
+ warnings.push("Verdict is 'accept' but evidence contains contradictions — review carefully");
  }
 
- // Check for missing evidence items on non-approve verdicts
+ // Check for missing evidence items on accept verdicts
  const missingItems = verdict.evidence.filter(e => e.status === "missing");
  if (missingItems.length > 0 && verdict.verdict === "accept") {
- warnings.push("Verdict is 'approve' but some evidence items are marked 'missing'");
+ warnings.push("Verdict is 'accept' but some evidence items are marked 'missing'");
  }
 
- // Non-approve verdicts should have gaps or requiredNextArtifact
- if (verdict.verdict !== "approve" && verdict.verdict !== "accept-with-notes") {
+ // Non-accept verdicts should have gaps or requiredNextArtifact
+ if (verdict.verdict !== "accept" && verdict.verdict !== "accept-with-notes") {
  if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
- warnings.push("Non-approve verdict should specify gaps or requiredNextArtifact for recovery");
+ warnings.push("Non-accept verdict should specify gaps or requiredNextArtifact for recovery");
  }
  }
 
- // Low confidence + approve is suspicious
+ // Low confidence + accept is suspicious
  if (verdict.confidence === "low" && verdict.verdict === "accept") {
- warnings.push("Low confidence approve — consider whether evidence is actually sufficient");
+ warnings.push("Low confidence accept — consider whether evidence is actually sufficient");
  }
 
  const sufficient = missingRequired.length === 0 && contradictions.length === 0;
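To see why the 2.1.0 check fired on every accept verdict, the non-accept branch of `checkSufficiency()` can be isolated and run against both comparison labels. This is a reduced sketch, not the full function: `nonAcceptWarnings` and its `acceptLabel` parameter are hypothetical, added only so the old (`"approve"`) and new (`"accept"`) comparisons can run side by side. `"reject"` is assumed here to be one of the non-accept verdict values.

```javascript
// Reduced sketch of the non-accept branch of checkSufficiency().
// `acceptLabel` is hypothetical: "approve" reproduces the 2.1.0 comparison,
// "accept" reproduces the 2.2.0 fix.
function nonAcceptWarnings(verdict, acceptLabel) {
  const warnings = [];
  if (verdict.verdict !== acceptLabel && verdict.verdict !== "accept-with-notes") {
    if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
      warnings.push(`Non-${acceptLabel} verdict should specify gaps or requiredNextArtifact for recovery`);
    }
  }
  return warnings;
}

const cleanAccept = { verdict: "accept", gaps: [] };
const bareReject = { verdict: "reject", gaps: [] }; // "reject" is an assumed verdict value

console.log(nonAcceptWarnings(cleanAccept, "approve").length); // 1: spurious warning (2.1.0)
console.log(nonAcceptWarnings(cleanAccept, "accept").length);  // 0: fixed (2.2.0)
console.log(nonAcceptWarnings(bareReject, "accept").length);   // 1: still warned, as intended
```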
package/src/mission-run.mjs CHANGED
@@ -10,8 +10,7 @@
  */
 
  import { MISSIONS, getMission, validateMission } from "./mission.mjs";
- import { TEAM_PACKS } from "./packs.mjs";
- import { validateArtifact, ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
+ import { validateArtifact } from "./artifacts.mjs";
 
  let _runCounter = 0;
 
@@ -59,7 +58,7 @@ let _runCounter = 0;
  * @param {string} taskDescription
  * @returns {MissionRun}
  */
- export function createRun(missionKey, taskDescription) {
+ export function createRun(missionKey, taskDescription, options = {}) {
  const mission = getMission(missionKey);
  if (!mission) {
  throw new Error(`Mission "${missionKey}" not found. Available: ${Object.keys(MISSIONS).join(", ")}`);
@@ -72,16 +71,26 @@ export function createRun(missionKey, taskDescription) {
 
  const id = `${missionKey}-${Date.now()}-${++_runCounter}`;
 
- const steps = mission.artifactFlow.map((step) => ({
- role: step.role,
- produces: step.produces,
- consumedBy: step.consumedBy,
- status: "pending",
- artifact: null,
- note: null,
- startedAt: null,
- completedAt: null,
- }));
+ let steps;
+ const dd = mission.dynamicDispatch;
+
+ if (dd && options.manifest) {
+ // Dynamic dispatch — build steps from manifest
+ steps = buildDynamicSteps(mission, options.manifest);
+ } else {
+ // Static dispatch — use artifactFlow as-is
+ steps = mission.artifactFlow.map((step) => ({
+ role: step.role,
+ produces: step.produces,
+ consumedBy: step.consumedBy,
+ status: "pending",
+ artifact: null,
+ artifactValidation: null,
+ note: null,
+ startedAt: null,
+ completedAt: null,
+ }));
+ }
 
  return {
  id,
@@ -93,9 +102,94 @@ export function createRun(missionKey, taskDescription) {
  startedAt: new Date().toISOString(),
  completedAt: null,
  completionReport: null,
+ dynamicDispatch: dd && options.manifest ? true : false,
+ manifest: options.manifest || null,
  };
  }
 
+ /**
+ * Build steps from manifest for dynamic dispatch missions.
+ * @param {Object} mission
+ * @param {Object} manifest - The audit-manifest.json content
+ * @returns {MissionStep[]}
+ */
+ function buildDynamicSteps(mission, manifest) {
+ const dd = mission.dynamicDispatch;
+ const steps = [];
+
+ // Scaling roles: one step per manifest entry
+ const components = manifest[dd.componentAuditorPer] || [];
+ const boundaries = manifest[dd.seamAuditorPer] || manifest.boundaries || [];
+
+ // Component Auditor × N
+ for (const comp of components) {
+ steps.push({
+ role: "Component Auditor",
+ produces: "component-audit-report",
+ consumedBy: "Audit Synthesizer",
+ parcel: comp.id || comp.name,
+ status: "pending",
+ artifact: null,
+ artifactValidation: null,
+ note: null,
+ startedAt: null,
+ completedAt: null,
+ });
+ }
+
+ // Test Truth Auditor × M
+ for (const comp of components) {
+ steps.push({
+ role: "Test Truth Auditor",
+ produces: "test-truth-report",
+ consumedBy: "Audit Synthesizer",
+ parcel: comp.id || comp.name,
+ status: "pending",
+ artifact: null,
+ artifactValidation: null,
+ note: null,
+ startedAt: null,
+ completedAt: null,
+ });
+ }
+
+ // Seam Auditor × K
+ for (const boundary of boundaries) {
+ const label = boundary.id || `${boundary.from}-${boundary.to}`;
+ steps.push({
+ role: "Seam Auditor",
+ produces: "seam-audit-report",
+ consumedBy: "Audit Synthesizer",
+ parcel: label,
+ status: "pending",
+ artifact: null,
+ artifactValidation: null,
+ note: null,
+ startedAt: null,
+ completedAt: null,
+ });
+ }
+
+ // Non-scaling roles from artifactFlow (Audit Synthesizer, Critic Reviewer)
+ for (const step of mission.artifactFlow) {
+ if (!dd.scalingRoles.includes(step.role)) {
+ steps.push({
+ role: step.role,
+ produces: step.produces,
+ consumedBy: step.consumedBy,
+ status: "pending",
+ artifact: null,
+ artifactValidation: null,
+ note: null,
+ startedAt: null,
+ completedAt: null,
+ });
+ }
+ }
+
+ return steps;
+ }
+
  // ── Step through a run ──────────────────────────────────────────────────────
 
  /**
@@ -127,6 +221,10 @@ export function completeStep(run, artifact, note) {
  throw new Error("No active step to complete");
  }
 
+ // Validate artifact against role contract (warn, don't block)
+ const validation = validateArtifact(active.role, artifact);
+ active.artifactValidation = validation;
+
  active.status = "completed";
  active.artifact = artifact;
  active.note = note || null;
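`createRun()` only takes the dynamic path when the mission declares `dynamicDispatch` and `options.manifest` is supplied; otherwise it maps `artifactFlow` one-to-one as before. The condensed sketch below reproduces the ordering and manifest lookups of `buildDynamicSteps()` with each step reduced to `role` and `parcel`. `sketchDynamicSteps` is a hypothetical name and the manifest is made up. Because the deep-audit mission sets `seamAuditorPer` to `"boundary_clusters"`, a manifest that only has a `boundaries` key is picked up by the fallback.

```javascript
// Mirrors the deep-audit mission's dynamicDispatch and artifactFlow (roles only).
const dynamicDispatch = {
  scalingRoles: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
  componentAuditorPer: "components",
  seamAuditorPer: "boundary_clusters",
};
const artifactFlow = [
  { role: "Component Auditor" },
  { role: "Test Truth Auditor" },
  { role: "Seam Auditor" },
  { role: "Audit Synthesizer" }, // produces audit-summary
  { role: "Audit Synthesizer" }, // produces audit-action-plan
  { role: "Critic Reviewer" },
];

// Condensed sketch of buildDynamicSteps(): N component steps, N test-truth
// steps, K seam steps, then the non-scaling artifactFlow entries in order.
function sketchDynamicSteps(dd, flow, manifest) {
  const components = manifest[dd.componentAuditorPer] || [];
  const boundaries = manifest[dd.seamAuditorPer] || manifest.boundaries || [];
  const steps = [];
  for (const c of components) steps.push({ role: "Component Auditor", parcel: c.id || c.name });
  for (const c of components) steps.push({ role: "Test Truth Auditor", parcel: c.id || c.name });
  for (const b of boundaries) steps.push({ role: "Seam Auditor", parcel: b.id || `${b.from}-${b.to}` });
  for (const s of flow) {
    if (!dd.scalingRoles.includes(s.role)) steps.push({ role: s.role, parcel: null });
  }
  return steps;
}

const steps = sketchDynamicSteps(dynamicDispatch, artifactFlow, {
  components: [{ id: "cli" }, { name: "store" }],
  boundaries: [{ from: "cli", to: "store" }], // no boundary_clusters key, so the fallback applies
});
console.log(steps.length); // 8 (2 * 2 + 1 + 3)
console.log(steps.map((s) => s.parcel).slice(0, 5)); // [ 'cli', 'store', 'cli', 'store', 'cli-store' ]
```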
package/src/mission.mjs CHANGED
@@ -268,6 +268,65 @@ export const MISSIONS = {
  dispatchDefaults: { model: "sonnet", maxTurns: 40, maxBudgetUsd: 6.0 },
  trialEvidence: "v0.4 golden run — 894 tests green. Full chain of custody proven: truth artifacts, provenance atoms, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts in 5 formats, debate transcript, 16+ trace links from rendered → truth. Architecture frozen 2026-03-27.",
  },
+ // ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
+ "deep-audit": {
+ name: "Deep Audit",
+ description: "Decompose a repo into bounded components, dispatch one auditor per component, inspect seams from the dependency graph, assess test truth, then synthesize into a ranked verdict and action plan. Worker count scales with the repo graph — not fixed.",
+ pack: "deep-audit",
+ entryPath: "Decompose repo → validate parcels → Component Auditor ×N (parallel) + Test Truth Auditor ×M → Seam Auditor ×K (from graph edges) → Audit Synthesizer → Critic reviews verdict",
+ // NOTE: This mission has a DYNAMIC role chain. The static chain below
+ // shows the role archetypes. At dispatch time, Component Auditor and
+ // Seam Auditor are instantiated once per component/boundary cluster.
+ // A 10-component repo with 4 risky boundaries = 10 + 4 + 2 + 1 + 1 = 18 tasks.
+ roleChain: [
+ "Component Auditor", // ×N — one per component from audit-manifest
+ "Test Truth Auditor", // ×M — one per component or one overlay pass
+ "Seam Auditor", // ×K — one per risky boundary cluster from graph
+ "Audit Synthesizer", // ×1 — consumes all outputs, produces verdict
+ "Critic Reviewer", // ×1 — final acceptance
+ ],
+ // Dynamic dispatch contract:
+ // Step 1 produces audit-manifest.json with components[] and boundaries[].
+ // Steps 2-4 are instantiated from the manifest:
+ // - One Component Auditor task per components[] entry
+ // - One Test Truth Auditor task per component (or grouped by layer)
+ // - One Seam Auditor task per boundary cluster
+ // Step 5 (Audit Synthesizer) runs after ALL step 2-4 tasks complete.
+ // Step 6 (Critic Reviewer) reviews the synthesis.
+ dynamicDispatch: {
+ scalingRoles: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
+ manifestSource: "audit-manifest.json",
+ componentAuditorPer: "components",
+ testTruthAuditorPer: "components",
+ seamAuditorPer: "boundary_clusters",
+ synthesisAfter: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
+ },
+ artifactFlow: [
+ // Step 1: Decomposition (done before mission dispatch — input artifact)
+ { role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
+ { role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
+ { role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
+ { role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
+ { role: "Audit Synthesizer", produces: "audit-action-plan", consumedBy: "Critic Reviewer" },
+ { role: "Critic Reviewer", produces: "review-verdict", consumedBy: null },
+ ],
+ escalationBranches: [
+ { trigger: "component exceeds 8K lines", from: "Component Auditor", to: "Component Auditor", action: "re-slice into sub-components, re-dispatch" },
+ { trigger: "circular dependency found", from: "Seam Auditor", to: "Audit Synthesizer", action: "elevate as architectural finding, do not attempt to resolve" },
316
+ { trigger: "parcel outputs inconsistent", from: "Audit Synthesizer", to: "Component Auditor", action: "re-audit the inconsistent component with narrower scope" },
317
+ { trigger: "critical finding spans 3+ components", from: "Audit Synthesizer", to: "Seam Auditor", action: "targeted cross-cut audit on the systemic issue" },
318
+ { trigger: "test suite is ceremonial", from: "Test Truth Auditor", to: "Audit Synthesizer", action: "flag as structural risk — false confidence in coverage" },
319
+ ],
320
+ honestPartial: "Component audits complete but seam inspection blocked or synthesis incomplete. Per-component findings are individually valid and actionable. Manifest and component reports exist even if synthesis does not.",
321
+ stopConditions: [
322
+ "Audit Synthesizer produces verdict + action plan, Critic accepts",
323
+ "Decomposition reveals repo is too tangled to slice — document why and abort",
324
+ "All component audits complete but seam audits blocked — synthesize with component-only truth",
325
+ "Budget exhausted — synthesize from whatever component audits completed",
326
+ ],
327
+ dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
328
+ trialEvidence: "New mission — no trial evidence yet. Architecture designed 2026-03-27.",
329
+ },
271
330
  };
272
331
 
273
332
  // ── Mission catalog ─────────────────────────────────────────────────────────
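The `dynamicDispatch` contract above says the scaling roles are instantiated from audit-manifest.json at run time. Below is a minimal sketch of that expansion. It rests on several assumptions. Each `components[]` and `boundaries[]` entry is assumed to carry an `id`. Test Truth Auditor is assumed to run once per component (M = N, matching `testTruthAuditorPer: "components"`). The three fixed steps in the CHANGELOG's N + M + K + 3 formula are assumed to be decomposition, Audit Synthesizer, and Critic Reviewer. `expandDeepAudit` and `makeManifest` are hypothetical names, not role-os exports.

```javascript
// Hypothetical sketch, not role-os source: expand a deep-audit manifest into steps.
const SCALING_ROLES = ["Component Auditor", "Test Truth Auditor", "Seam Auditor"];

function expandDeepAudit(manifest) {
  // Assumed fixed first step: the decomposition that produced the manifest.
  const steps = [{ role: "Decomposition", parcel: "audit-manifest.json" }];

  // One Component Auditor and one Test Truth Auditor per component (M = N here).
  for (const c of manifest.components) steps.push({ role: "Component Auditor", parcel: c.id });
  for (const c of manifest.components) steps.push({ role: "Test Truth Auditor", parcel: c.id });

  // One Seam Auditor per boundary entry.
  for (const b of manifest.boundaries) steps.push({ role: "Seam Auditor", parcel: b.id });

  // Synthesis waits on every scaling step (mirrors synthesisAfter).
  const dependsOn = steps
    .map((s, i) => (SCALING_ROLES.includes(s.role) ? i : -1))
    .filter((i) => i >= 0);
  steps.push({ role: "Audit Synthesizer", parcel: null, dependsOn });

  // The critic reviews only the synthesis, which is now the last step.
  steps.push({ role: "Critic Reviewer", parcel: null, dependsOn: [steps.length - 1] });
  return steps;
}

// Hypothetical helper that builds a manifest with N components and K boundaries.
function makeManifest(nComponents, nBoundaries) {
  return {
    components: Array.from({ length: nComponents }, (_, i) => ({ id: `C${i + 1}` })),
    boundaries: Array.from({ length: nBoundaries }, (_, i) => ({ id: `B${i + 1}` })),
  };
}

console.log(expandDeepAudit(makeManifest(6, 8)).length); // 23
console.log(expandDeepAudit(makeManifest(10, 5)).length); // 28
```

Under these assumptions, the counts match the CHANGELOG examples. Grouping Test Truth Auditor tasks by layer, which the mission comments also allow, would shrink M and the total.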
@@ -332,6 +391,10 @@ export function suggestMission(taskDescription) {
332
391
  signals: ["brainstorm", "explore ideas", "explore directions", "opportunity map", "creative directions", "concept exploration", "what could we build", "divergent thinking", "ideate"],
333
392
  weight: 1.1,
334
393
  },
394
+ "deep-audit": {
395
+ signals: ["deep audit", "component audit", "decompose and audit", "audit components", "structural audit", "deep review", "code audit", "repo deep dive"],
396
+ weight: 1.2,
397
+ },
335
398
  };
336
399
 
337
400
  let bestKey = null;
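The new `"deep-audit"` entry adds signals with a weight of 1.2, slightly above brainstorm's 1.1. The scoring body of `suggestMission()` is not shown in this diff. The sketch below assumes that the score is the number of substring hits times the weight. `suggestMissionSketch` is a hypothetical name, and the signal lists are abbreviated copies of the ones above.

```javascript
// Hypothetical sketch of weighted signal matching (scoring rule assumed, not from role-os).
const MISSION_SIGNALS = {
  brainstorm: { signals: ["brainstorm", "explore ideas", "ideate"], weight: 1.1 },
  "deep-audit": { signals: ["deep audit", "component audit", "code audit", "structural audit"], weight: 1.2 },
};

function suggestMissionSketch(taskDescription) {
  const text = taskDescription.toLowerCase();
  let bestKey = null;
  let bestScore = 0;
  for (const [key, { signals, weight }] of Object.entries(MISSION_SIGNALS)) {
    const hits = signals.filter((s) => text.includes(s)).length;
    const score = hits * weight;
    if (score > bestScore) {
      bestKey = key;
      bestScore = score;
    }
  }
  return bestKey; // null when no signal matches
}

console.log(suggestMissionSketch("Run a deep audit of the repo")); // deep-audit
console.log(suggestMissionSketch("brainstorm a structural audit")); // deep-audit (1.2 beats 1.1)
```

With one hit each, the higher weight decides. This illustrates why a small weight bump matters when a task description mixes vocabularies.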
package/src/packs.mjs CHANGED
@@ -255,6 +255,38 @@ export const TEAM_PACKS = {
255
255
  ],
256
256
  },
257
257
 
258
+ // ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
259
+ "deep-audit": {
260
+ name: "Deep Audit",
261
+ description: "Decompose repo into components, audit each deeply, inspect seams, synthesize verdict. Scales with repo graph.",
262
+ roles: [
263
+ "Component Auditor",
264
+ "Test Truth Auditor",
265
+ "Seam Auditor",
266
+ "Audit Synthesizer",
267
+ "Critic Reviewer",
268
+ ],
269
+ orchestratorRequired: false, // mission step sequence handles orchestration
270
+ optionalRoles: ["Security Reviewer", "Dependency Auditor"],
271
+ chainOrder: "Component Auditor (×N, parallel) + Test Truth Auditor (×M) → Seam Auditor (×K, from graph) → Audit Synthesizer",
272
+ requiredArtifacts: ["audit-manifest", "component-audit-report", "seam-audit-report", "test-truth-report", "audit-summary", "audit-action-plan"],
273
+ stopConditions: [
274
+ "All component parcels audited + seams inspected + synthesis complete",
275
+ "Critical finding in decomposition phase — repo too tangled to slice cleanly",
276
+ "Component auditor finds scope exceeds 8K lines — request re-slice",
277
+ ],
278
+ escalationOwner: "Audit Synthesizer",
279
+ dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
280
+ trialEvidence: "New mission — no trial evidence yet. First test: role-os self-audit.",
281
+ mismatchGuards: [
282
+ { notForSignals: ["fix bug", "crash", "broken", "regression"], suggestInstead: "bugfix", reason: "This is a bug to fix, not a deep audit" },
283
+ { notForSignals: ["implement", "build", "add command", "new feature"], suggestInstead: "feature", reason: "This is feature work, not an audit" },
284
+ { notForSignals: ["launch", "announce", "release notes", "messaging"], suggestInstead: "launch", reason: "This is launch work, not an audit" },
285
+ { notForSignals: ["treatment", "shipcheck", "polish"], suggestInstead: "treatment", reason: "This is repo treatment (surface polish), not a deep audit" },
286
+ { notForSignals: ["brainstorm", "explore ideas", "ideate"], suggestInstead: "brainstorm", reason: "This is brainstorming, not an audit" },
287
+ ],
288
+ },
289
+
258
290
  // ── Brainstorm (Structured Inquiry) ─────────────────────────────────────────
259
291
  brainstorm: {
260
292
  name: "Brainstorm (Structured Inquiry)",
@@ -304,6 +336,7 @@ const PACK_KEYWORDS = {
304
336
  research: ["research", "competitive", "ux", "friction", "user", "strategy", "trend"],
305
337
  treatment: ["treatment", "polish", "cleanup", "repo audit", "shipcheck", "full treatment"],
306
338
  brainstorm: ["brainstorm", "explore", "ideate", "divergent", "opportunity", "creative directions", "concept exploration", "what could", "possibilities"],
339
+ "deep-audit": ["deep audit", "component audit", "repo audit deep", "decompose and audit", "audit components", "code audit", "structural audit", "deep review"],
307
340
  };
308
341
 
309
342
  /**
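`mismatchGuards` lets the pack decline tasks that only look like a fit. The matching logic in packs.mjs is not part of this diff, so the code below is a hypothetical sketch of one way to evaluate the guards. The guard list is copied from the pack above. `checkMismatch` is an invented name.

```javascript
// Guards copied from the deep-audit pack; the evaluation logic is a hypothetical sketch.
const DEEP_AUDIT_GUARDS = [
  { notForSignals: ["fix bug", "crash", "broken", "regression"], suggestInstead: "bugfix", reason: "This is a bug to fix, not a deep audit" },
  { notForSignals: ["implement", "build", "add command", "new feature"], suggestInstead: "feature", reason: "This is feature work, not an audit" },
  { notForSignals: ["launch", "announce", "release notes", "messaging"], suggestInstead: "launch", reason: "This is launch work, not an audit" },
  { notForSignals: ["treatment", "shipcheck", "polish"], suggestInstead: "treatment", reason: "This is repo treatment (surface polish), not a deep audit" },
  { notForSignals: ["brainstorm", "explore ideas", "ideate"], suggestInstead: "brainstorm", reason: "This is brainstorming, not an audit" },
];

// Returns the first guard that fires, or null if the pack is a plausible fit.
function checkMismatch(guards, taskDescription) {
  const text = taskDescription.toLowerCase();
  for (const guard of guards) {
    const matched = guard.notForSignals.find((s) => text.includes(s));
    if (matched) return { suggestInstead: guard.suggestInstead, reason: guard.reason, matched };
  }
  return null;
}

console.log(checkMismatch(DEEP_AUDIT_GUARDS, "Deep audit after the crash")?.suggestInstead); // bugfix
console.log(checkMismatch(DEEP_AUDIT_GUARDS, "deep audit of src/")); // null
```

Plain substring matching is the simplest reading of `notForSignals`, but it is coarse. For example, "build" also matches "rebuild", so a stricter matcher might check word boundaries.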
package/src/route.mjs CHANGED
@@ -344,6 +344,36 @@ export const ROLE_CATALOG = [
344
344
  triggers: ["targeted challenge", "claim attack", "contradiction exposure"],
345
345
  excludeWhen: [],
346
346
  },
347
+
348
+ // ── DEEP AUDIT ──
349
+ {
350
+ name: "Component Auditor", pack: "deep-audit", phase: 3,
351
+ keywords: ["audit", "component", "correctness", "dead code", "error handling", "state management"],
352
+ triggers: ["deep audit", "component audit", "code audit", "line-by-line audit"],
353
+ excludeWhen: ["test audit only", "boundary audit only"],
354
+ deliverableAffinity: ["Review"],
355
+ },
356
+ {
357
+ name: "Seam Auditor", pack: "deep-audit", phase: 4,
358
+ keywords: ["boundary", "seam", "interface", "contract", "integration", "dependency direction"],
359
+ triggers: ["boundary audit", "seam inspection", "interface mismatch", "cross-component"],
360
+ excludeWhen: ["single component only", "test audit only"],
361
+ deliverableAffinity: ["Review"],
362
+ },
363
+ {
364
+ name: "Test Truth Auditor", pack: "deep-audit", phase: 3,
365
+ keywords: ["test coverage", "test truth", "ceremonial test", "test gap", "mock fidelity"],
366
+ triggers: ["test truth audit", "coverage reality", "test quality assessment"],
367
+ excludeWhen: ["no tests exist", "implementation audit only"],
368
+ deliverableAffinity: ["Review"],
369
+ },
370
+ {
371
+ name: "Audit Synthesizer", pack: "deep-audit", phase: 5,
372
+ keywords: ["synthesis", "verdict", "action plan", "reconcile", "cross-cutting"],
373
+ triggers: ["audit synthesis", "repo verdict", "finding reconciliation"],
374
+ excludeWhen: ["component audit still running", "no findings to synthesize"],
375
+ deliverableAffinity: ["Review"],
376
+ },
347
377
  ];
348
378
 
349
379
  // ── Deliverable type → role affinity ──────────────────────────────────────────
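Each new ROLE_CATALOG entry pairs broad `keywords` and stronger `triggers` with `excludeWhen` phrases. The router's scoring is not shown in this diff. Below is a hypothetical sketch that ranks the four deep-audit roles. It assumes a keyword hit scores 1, a trigger hit scores 2, and any exclusion phrase drops the role. Ties go to the earlier phase.

```javascript
// Lists copied from the ROLE_CATALOG entries above; weights and rankRoles are hypothetical.
const DEEP_AUDIT_ROLES = [
  {
    name: "Component Auditor", phase: 3,
    keywords: ["audit", "component", "correctness", "dead code", "error handling", "state management"],
    triggers: ["deep audit", "component audit", "code audit", "line-by-line audit"],
    excludeWhen: ["test audit only", "boundary audit only"],
  },
  {
    name: "Seam Auditor", phase: 4,
    keywords: ["boundary", "seam", "interface", "contract", "integration", "dependency direction"],
    triggers: ["boundary audit", "seam inspection", "interface mismatch", "cross-component"],
    excludeWhen: ["single component only", "test audit only"],
  },
  {
    name: "Test Truth Auditor", phase: 3,
    keywords: ["test coverage", "test truth", "ceremonial test", "test gap", "mock fidelity"],
    triggers: ["test truth audit", "coverage reality", "test quality assessment"],
    excludeWhen: ["no tests exist", "implementation audit only"],
  },
  {
    name: "Audit Synthesizer", phase: 5,
    keywords: ["synthesis", "verdict", "action plan", "reconcile", "cross-cutting"],
    triggers: ["audit synthesis", "repo verdict", "finding reconciliation"],
    excludeWhen: ["component audit still running", "no findings to synthesize"],
  },
];

function rankRoles(taskDescription) {
  const text = taskDescription.toLowerCase();
  return DEEP_AUDIT_ROLES
    .filter((role) => !role.excludeWhen.some((phrase) => text.includes(phrase)))
    .map((role) => ({
      name: role.name,
      phase: role.phase,
      score:
        role.keywords.filter((k) => text.includes(k)).length +
        2 * role.triggers.filter((t) => text.includes(t)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.phase - b.phase)
    .map((r) => r.name);
}

console.log(rankRoles("Seam inspection of the interface between run.mjs and entry.mjs")); // [ 'Seam Auditor' ]
```

Exclusions act as hard vetoes in this sketch. "code audit, boundary audit only" drops Component Auditor even though "code audit" is one of its triggers.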
package/src/run.mjs CHANGED
@@ -18,8 +18,7 @@ import { decideEntry } from "./entry.mjs";
18
18
  import { getMission } from "./mission.mjs";
19
19
  import { TEAM_PACKS, getPack } from "./packs.mjs";
20
20
  import { ROLE_CATALOG } from "./route.mjs";
21
- import { ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
22
- import { getHandoffContract } from "./artifacts.mjs";
21
+ import { ROLE_ARTIFACT_CONTRACTS, validateArtifact, getHandoffContract } from "./artifacts.mjs";
23
22
 
24
23
  // ── Run directory ────────────────────────────────────────────────────────────
25
24
 
@@ -309,6 +308,10 @@ export function completeCurrentStep(run, artifact, note, cwd) {
309
308
  const active = run.steps.find(s => s.status === "active");
310
309
  if (!active) throw new Error("No active step to complete");
311
310
 
311
+ // Validate artifact against role contract (warn, don't block)
312
+ const validation = validateArtifact(active.role, artifact);
313
+ active.artifactValidation = validation;
314
+
312
315
  active.status = "completed";
313
316
  active.artifact = artifact;
314
317
  active.note = note || null;
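Because `completeCurrentStep()` now records the validation result on the step instead of throwing, warnings surface only if something reads them back. Below is a hypothetical helper that collects them. It assumes `artifactValidation` has the shape `{ ok, missing }`, which this diff does not confirm.

```javascript
// Hypothetical helper: list contract warnings recorded on completed steps.
// Assumes artifactValidation looks like { ok: boolean, missing: string[] }.
function validationWarnings(run) {
  return run.steps
    .filter((s) => s.status === "completed" && s.artifactValidation && !s.artifactValidation.ok)
    .map((s) => `${s.role}: missing ${s.artifactValidation.missing.join(", ")}`);
}

const run = {
  steps: [
    { role: "Component Auditor", status: "completed", artifactValidation: { ok: true, missing: [] } },
    { role: "Seam Auditor", status: "completed", artifactValidation: { ok: false, missing: ["boundary"] } },
    { role: "Audit Synthesizer", status: "active" }, // not yet validated
  ],
};

console.log(validationWarnings(run)); // [ 'Seam Auditor: missing boundary' ]
```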
@@ -0,0 +1,56 @@
1
+ # Audit Synthesizer
2
+
3
+ ## Mission
4
+ Consume all component, seam, and test audit outputs and produce one truthful repo-wide verdict with a ranked action plan.
5
+
6
+ ## Use When
7
+ - All component auditors, seam auditors, and test truth auditors have completed their parcels
8
+ - Structured findings exist in standardized format
9
+ - The goal is a single authoritative repo assessment, not another audit pass
10
+
11
+ ## Do Not Use When
12
+ - Component audits are still running — wait for all outputs
13
+ - No structured findings exist — there's nothing to synthesize
14
+ - The goal is to audit code directly (use Component Auditor)
15
+
16
+ ## Expected Inputs
17
+ - All AUDIT-PARCEL-*.md files (component findings)
18
+ - All AUDIT-SEAM-*.md files (boundary findings)
19
+ - All AUDIT-TESTS-*.md files (test truth findings)
20
+ - audit-manifest.json (component graph, for cross-referencing)
21
+
22
+ ## Required Output
23
+
24
+ ### AUDIT-SUMMARY.md
25
+ - **Verdict** — one paragraph: structurally sound, fragile, dangerous, or dead weight, with specific reasoning
26
+ - **Posture** — sound / fragile / dangerous / abandoned
27
+ - **By the Numbers** — finding counts by severity across all lanes
28
+ - **What Is Structurally Sound** — components/patterns that are correct (give specific credit)
29
+ - **What Is Fragile** — things that work but break under change or edge cases
30
+ - **What Is Dangerous** — active defects, security issues, data loss risks
31
+ - **What Is Dead Weight** — unused code, vestigial features, abandoned modules
32
+ - **Cross-Cutting Findings** — issues spanning multiple components, with source parcel references
33
+ - **Contradictions Between Parcels** — where findings conflict, with adjudication
34
+ - **Audit Gaps** — things no parcel was positioned to evaluate
35
+
36
+ ### AUDIT-ACTION-PLAN.md
37
+ - **P0** — fix before next release (critical + high, grouped by root cause)
38
+ - **P1** — fix this sprint (medium findings that compound)
39
+ - **P2** — scheduled cleanup (low, dead code, naming)
40
+ - **P3** — architectural (structural changes needing planning)
41
+ - **Recommended Fix Order** — numbered sequence considering dependencies
42
+ - **Estimated Effort** — per priority group (trivial/half-day/full-day/multi-day)
43
+
44
+ ## Quality Bar
45
+ - Must reconcile — not just concatenate — findings across parcels
46
+ - Cross-cutting findings must reference which parcel outputs informed them
47
+ - Contradictions must be adjudicated, not just listed
48
+ - Action plan must group by root cause and leverage, not by parcel
49
+ - A root cause fix that resolves 5 findings ranks higher than 5 individual patches
50
+ - Must identify gaps — things that fell between parcels
51
+
52
+ ## Escalation Triggers
53
+ - Parcel outputs are missing or incomplete — cannot synthesize without full data
54
+ - Parcel outputs use inconsistent finding formats — cannot reconcile
55
+ - Critical findings span 3+ components — systemic issue, may need architectural rewrite
56
+ - Component auditors and seam auditors contradict on the same boundary — needs investigation
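The Quality Bar asks the synthesizer to group findings by root cause and rank them by leverage. Below is one possible ordering, sketched under several assumptions. Findings are assumed to carry a hypothetical `rootCause` field; the standardized finding schema in this diff does not define one. Groups are ranked by worst severity, then by how many findings a single fix resolves. Severity maps to P0/P1/P2 roughly as the action-plan section describes. `planByRootCause` is an invented name.

```javascript
// Hypothetical sketch: group findings by root cause and rank groups by leverage.
const SEVERITY_RANK = { critical: 4, high: 3, medium: 2, low: 1, info: 0 };

function planByRootCause(findings) {
  const groups = new Map();
  for (const f of findings) {
    const group = groups.get(f.rootCause) ?? { rootCause: f.rootCause, ids: [], maxSeverity: 0 };
    group.ids.push(f.id);
    group.maxSeverity = Math.max(group.maxSeverity, SEVERITY_RANK[f.severity]);
    groups.set(f.rootCause, group);
  }
  return [...groups.values()]
    // critical/high -> P0, medium -> P1, low/info -> P2 (P3 needs human judgment)
    .map((g) => ({ ...g, priority: g.maxSeverity >= 3 ? "P0" : g.maxSeverity === 2 ? "P1" : "P2" }))
    // Worst severity first; within a severity, the fix that resolves more findings wins.
    .sort((a, b) => b.maxSeverity - a.maxSeverity || b.ids.length - a.ids.length);
}

// Illustrative findings, not real audit output.
const findings = [
  { id: "F1", severity: "medium", rootCause: "opts mutated across modules" },
  { id: "F2", severity: "medium", rootCause: "opts mutated across modules" },
  { id: "F3", severity: "medium", rootCause: "opts mutated across modules" },
  { id: "F4", severity: "medium", rootCause: "missing input check" },
  { id: "F5", severity: "high", rootCause: "verdict enum mismatch" },
];

for (const g of planByRootCause(findings)) {
  console.log(`${g.priority} ${g.rootCause} (${g.ids.length} findings)`);
}
```

P3 (architectural) cannot be derived from severity alone, so this sketch never assigns it. That call stays with the synthesizer.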
@@ -0,0 +1,46 @@
1
+ # Component Auditor
2
+
3
+ ## Mission
4
+ Read every line in an assigned code component and produce structured findings for every material issue.
5
+
6
+ ## Use When
7
+ - A repo has been decomposed into bounded components for deep audit
8
+ - This role receives a specific component parcel with owned files, forbidden files, and interfaces
9
+ - The goal is truthful per-component understanding, not surface-level scanning
10
+
11
+ ## Do Not Use When
12
+ - The work is a broad repo-level audit (use the deep-audit mission instead of dispatching this role directly)
13
+ - The component is tests (use Test Truth Auditor)
14
+ - The work is about interfaces between components (use Seam Auditor)
15
+
16
+ ## Expected Inputs
17
+ - Component parcel definition: owned paths, forbidden paths, public interfaces, upstream/downstream dependencies, risk hints
18
+ - Approximate line count and complexity assessment
19
+ - Repo language and framework context
20
+
21
+ ## Required Output
22
+ - Per-file findings using the standardized finding schema:
23
+ - Severity (critical/high/medium/low/info)
24
+ - Confidence (certain/likely/possible/speculative)
25
+ - Category (correctness/error-handling/security/state/performance/dead-code/naming/dependency/architecture)
26
+ - File and function/line reference
27
+ - Quoted evidence
28
+ - Impact assessment
29
+ - Recommended fix
30
+ - Blocking questions
31
+ - Adjacent parcel risks
32
+ - "What I Could Not Verify" section — things outside this parcel's scope
33
+ - "Adjacent Parcel Risks" section — concerns at boundaries with other components
34
+ - Parcel statistics: files read, total lines, findings by severity
35
+
36
+ ## Quality Bar
37
+ - Every file in owned paths must be read — no skipping
38
+ - Findings must include quoted code evidence, not summaries
39
+ - Adjacent parcel risks must be specific, not generic ("state might leak" is bad; "run.mjs L247 mutates the opts object passed from entry.mjs" is good)
40
+ - "What I Could Not Verify" must be honest — if you can't see the caller, say so
41
+
42
+ ## Escalation Triggers
43
+ - Component exceeds 8,000 lines — request split into sub-components
44
+ - Owned paths reference files that don't exist — flag immediately
45
+ - Component has zero tests — flag for Test Truth Auditor
46
+ - Critical finding that affects multiple other components — flag for Seam Auditor
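The Required Output describes a standardized finding schema. Below is a hypothetical checker for it. The enum values are copied from the list above. The field names (`location`, `evidence`, `impact`, `fix`) and the sample finding are invented for illustration.

```javascript
// Enum values from the Component Auditor schema; field names are hypothetical.
const SEVERITIES = ["critical", "high", "medium", "low", "info"];
const CONFIDENCES = ["certain", "likely", "possible", "speculative"];
const CATEGORIES = ["correctness", "error-handling", "security", "state", "performance", "dead-code", "naming", "dependency", "architecture"];

// Returns a list of problems; an empty list means the finding is well-formed.
function checkFinding(f) {
  const problems = [];
  if (!SEVERITIES.includes(f.severity)) problems.push(`unknown severity: ${f.severity}`);
  if (!CONFIDENCES.includes(f.confidence)) problems.push(`unknown confidence: ${f.confidence}`);
  if (!CATEGORIES.includes(f.category)) problems.push(`unknown category: ${f.category}`);
  if (!f.location) problems.push("missing file and function/line reference");
  if (!f.evidence) problems.push("missing quoted evidence");
  if (!f.impact) problems.push("missing impact assessment");
  if (!f.fix) problems.push("missing recommended fix");
  return problems;
}

// Hypothetical finding against an invented file.
const finding = {
  severity: "medium",
  confidence: "likely",
  category: "state",
  location: "src/example.mjs L12",
  evidence: "opts.retries = 0;",
  impact: "Callers that reuse opts silently lose their retry setting.",
  fix: "Copy opts before mutating it.",
};

console.log(checkFinding(finding)); // []
```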
@@ -0,0 +1,46 @@
1
+ # Seam Auditor
2
+
3
+ ## Mission
4
+ Inspect interfaces between components to verify they connect lawfully and that shared assumptions hold across boundaries.
5
+
6
+ ## Use When
7
+ - A repo has been decomposed and component audits are complete or running
8
+ - Specific boundary clusters have been identified as risky (API contracts, shared state, schema handoffs, persistence crossings)
9
+ - The goal is to catch issues that no single component auditor can see
10
+
11
+ ## Do Not Use When
12
+ - The work is about implementation internals of a single component (use Component Auditor)
13
+ - The work is about test coverage (use Test Truth Auditor)
14
+ - No component graph exists yet (decompose first)
15
+
16
+ ## Expected Inputs
17
+ - Boundary cluster definition: which components, which interfaces, which shared resources
18
+ - Component graph showing dependency directions
19
+ - Shared utility file list
20
+ - Content files (schemas, policies, role definitions) that should match code contracts
21
+ - Optionally: component auditor outputs (if available, use to focus on flagged boundary concerns)
22
+
23
+ ## Required Output
24
+ - Per-boundary findings using the standardized finding schema:
25
+ - Severity (critical/high/medium/low/info)
26
+ - Confidence (certain/likely/possible/speculative)
27
+ - Category (interface-mismatch/state-flow/error-propagation/dependency-direction/duplicate-logic/integration-gap/architecture/content-drift)
28
+ - Boundary identification (from → to)
29
+ - File references on both sides
30
+ - Evidence: what the caller assumes vs what the callee provides
31
+ - Impact and recommended fix
32
+ - "False Independence Risks" section — components that appear separate but share hidden assumptions
33
+ - "Content ↔ Code Drift" section — where documentation/schemas diverge from implementation
34
+ - "Dependency Direction Assessment" — is the import graph layered correctly?
35
+
36
+ ## Quality Bar
37
+ - Every declared boundary must be inspected — no skipping
38
+ - Findings must reference both sides of the boundary (caller AND callee)
39
+ - Content-code drift findings must quote both the content claim and the code reality
40
+ - Must check dependency direction, not just interface shapes
41
+
42
+ ## Escalation Triggers
43
+ - Circular dependency discovered — flag immediately
44
+ - Shared utility encodes domain logic (god module) — flag for architectural review
45
+ - Content layer (schemas, policies) fundamentally contradicts code behavior — flag as critical
46
+ - Component auditors flagged the same boundary from both sides — elevated cross-cutting finding
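Checking dependency direction, and catching the circular-dependency escalation trigger, comes down to cycle detection on the component graph. Below is a standard depth-first search sketch. It assumes the graph is given as `[from, to]` import edges. `findCycle` is a hypothetical helper, not part of role-os.

```javascript
// Depth-first cycle detection over [from, to] edges; returns one cycle path or null.
function findCycle(edges) {
  const graph = new Map();
  for (const [from, to] of edges) {
    if (!graph.has(from)) graph.set(from, []);
    if (!graph.has(to)) graph.set(to, []);
    graph.get(from).push(to);
  }

  const state = new Map(); // missing = unvisited, 1 = on current path, 2 = finished
  const path = [];

  function visit(node) {
    state.set(node, 1);
    path.push(node);
    for (const next of graph.get(node)) {
      const s = state.get(next);
      if (s === 1) return [...path.slice(path.indexOf(next)), next]; // back edge closes a cycle
      if (s === undefined) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(node, 2);
    return null;
  }

  for (const node of graph.keys()) {
    if (state.get(node) === undefined) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

// Edges taken from run.mjs's imports in this diff (acyclic).
const runImports = [["run.mjs", "entry.mjs"], ["run.mjs", "mission.mjs"], ["run.mjs", "packs.mjs"], ["run.mjs", "route.mjs"], ["run.mjs", "artifacts.mjs"]];
console.log(findCycle(runImports)); // null
console.log(findCycle([["a", "b"], ["b", "c"], ["c", "a"]]).join(" -> ")); // a -> b -> c -> a
```

Returning the full path, rather than a boolean, gives the Seam Auditor something concrete to quote when it elevates the cycle as an architectural finding.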
@@ -0,0 +1,48 @@
1
+ # Test Truth Auditor
2
+
3
+ ## Mission
4
+ Determine whether a test suite proves correctness or merely exists. Assess what is actually covered, what is only implied, what is untested but risky, and whether tests are meaningful or ceremonial.
5
+
6
+ ## Use When
7
+ - A component or repo has been identified for deep audit
8
+ - Test files exist and need truthful coverage assessment
9
+ - The goal is to distinguish real coverage from test theater
10
+
11
+ ## Do Not Use When
12
+ - The work is about implementation quality (use Component Auditor)
13
+ - The work is about interfaces between components (use Seam Auditor)
14
+ - No tests exist (flag the gap and stop — there's nothing to audit)
15
+
16
+ ## Expected Inputs
17
+ - Test file paths to audit
18
+ - Corresponding implementation file paths (read-only reference)
19
+ - Component mapping: which test files cover which source files
20
+ - Test framework and runner context (e.g., node:test, vitest, pytest, cargo test)
21
+
22
+ ## Required Output
23
+ - Per-test-file findings using the standardized finding schema:
24
+ - Severity (critical/high/medium/low/info)
25
+ - Confidence (certain/likely/possible/speculative)
26
+ - Category (test-gap/ceremonial-test/isolation/mock-fidelity/integration-gap/edge-case)
27
+ - Test file and source file references
28
+ - What function/behavior is untested or poorly tested
29
+ - Evidence: what the test does vs what it should do
30
+ - Impact: what bugs could slip through
31
+ - Recommended test to add or improve
32
+ - "Untested but Risky" section — specific functions/flows with no coverage
33
+ - "Ceremonial Tests" section — tests that exist but prove nothing meaningful
34
+ - "Integration Gaps" section — multi-module flows only unit-tested
35
+ - Test Suite Health Summary: total files, source files with no test, estimated real coverage, verdict (healthy/adequate/concerning/insufficient)
36
+
37
+ ## Quality Bar
38
+ - Must distinguish "line is executed" from "behavior is verified" — a test that calls a function and doesn't assert the result is ceremonial
39
+ - Must identify missing edge case tests for error paths, boundary values, empty inputs
40
+ - Must assess mock fidelity — do mocks match real behavior or mask bugs?
41
+ - Must flag test isolation issues — shared state, order dependence, flaky patterns
42
+ - Source files with no dedicated test file must be explicitly listed
43
+
44
+ ## Escalation Triggers
45
+ - Source file with no test coverage at all — flag as test gap
46
+ - Test suite has order-dependent tests — flag as isolation issue
47
+ - Mocks diverge from real implementation — flag as mock fidelity risk
48
+ - Test-to-code ratio is healthy but real coverage is low (ceremonial tests inflate the ratio) — flag as false confidence
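The Quality Bar distinguishes "line is executed" from "behavior is verified". The difference is easiest to see side by side. The self-contained illustration below uses a hypothetical `parseBudget` function and a deliberately buggy variant. The ceremonial test passes against the bug, and the behavioral one does not. A real suite would use the project's runner, such as node:test; plain functions keep the sketch dependency-free.

```javascript
// Hypothetical function under test, plus a deliberately buggy variant.
function parseBudget(input) {
  const value = Number(input);
  if (!Number.isFinite(value)) throw new Error(`invalid budget: ${input}`);
  return value;
}
function parseBudgetBuggy(input) {
  return input; // bug: never converts or validates
}

// Ceremonial: executes the code path but asserts nothing.
function ceremonialTest(fn) {
  fn("3.0");
}

// Behavioral: checks the return value and the error path.
function behavioralTest(fn) {
  if (fn("3.0") !== 3) throw new Error("expected numeric 3");
  let threw = false;
  try {
    fn("abc");
  } catch {
    threw = true;
  }
  if (!threw) throw new Error("expected invalid input to throw");
}

// Runs a test against an implementation and reports pass/fail.
function passes(testFn, impl) {
  try {
    testFn(impl);
    return true;
  } catch {
    return false;
  }
}

console.log(passes(ceremonialTest, parseBudgetBuggy)); // true  (bug slips through)
console.log(passes(behavioralTest, parseBudgetBuggy)); // false (bug caught)
console.log(passes(behavioralTest, parseBudget)); // true
```

Both tests would count toward a test-to-code ratio. Only the second one constrains behavior, which is the false-confidence pattern the last escalation trigger describes.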