role-os 2.1.0 → 2.2.0
- package/CHANGELOG.md +33 -0
- package/README.md +40 -16
- package/package.json +2 -2
- package/src/artifacts.mjs +43 -0
- package/src/dispatch.mjs +6 -0
- package/src/evidence.mjs +9 -9
- package/src/mission-run.mjs +111 -13
- package/src/mission.mjs +63 -0
- package/src/packs.mjs +33 -0
- package/src/route.mjs +30 -0
- package/src/run.mjs +5 -2
- package/starter-pack/agents/engineering/audit-synthesizer.md +56 -0
- package/starter-pack/agents/engineering/component-auditor.md +46 -0
- package/starter-pack/agents/engineering/seam-auditor.md +46 -0
- package/starter-pack/agents/engineering/test-truth-auditor.md +48 -0
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,38 @@
 # Changelog
 
+## 2.2.0
+
+### Added
+
+#### Deep Audit Mission — Runner-Native Componentized Repo Audit
+
+- **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
+- **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
+- **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
+- **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
+- **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
+- **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
+
+### Fixed
+
+- **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
+- **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
+- **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
+
+### Changed
+
+- Mission count: 7 → 8
+- Role count: 50 → 54 (4 deep audit roles)
+- Pack count: 8 → 9
+- Artifact contract count: 30 → 34 (4 new audit role contracts)
+- Test count: 905 → 936
+
+### Evidence
+
+- Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
+- Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
+- Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
+
 ## 2.1.0
 
 ### Added
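The step counts quoted in the 2.2.0 entry follow directly from the stated formula. A minimal sketch, assuming one Test Truth Auditor per component (so M = N) and that the trailing constant 3 stands for the fixed, non-scaling steps; `expectedStepCount` is a hypothetical helper for illustration, not part of role-os:

```javascript
// Hypothetical helper (not part of role-os): expected number of steps for a
// deep-audit run, using the changelog formula N + M + K + 3 with M = N.
function expectedStepCount(componentCount, boundaryCount) {
  const FIXED_STEPS = 3; // assumption: the non-scaling tail of the chain
  return 2 * componentCount + boundaryCount + FIXED_STEPS;
}

console.log(expectedStepCount(6, 8));  // 23, the 6-component / 8-boundary case
console.log(expectedStepCount(10, 5)); // 28, the 10-component / 5-boundary case
```

Both results match the figures given in the changelog entry.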
package/README.md
CHANGED
@@ -13,7 +13,7 @@
 <a href="https://mcp-tool-shop-org.github.io/role-os/"><img src="https://img.shields.io/badge/Landing_Page-live-brightgreen" alt="Landing Page"></a>
 </p>
 
-A multi-Claude operating system that staffs, routes, validates, and runs work through
+A multi-Claude operating system that staffs, routes, validates, and runs work through 54 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict. Includes dynamic dispatch for manifest-scaled missions — a 10-component repo automatically becomes 28 auditor steps, not 6.
 
 ## What it does
 
@@ -44,9 +44,9 @@ roleos start "something completely novel"
 
 **The fallback ladder:**
 
-1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
-2. **Pack** — when the task is a known family but not a full mission shape.
-3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all
+1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research, brainstorm, deep-audit). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
+2. **Pack** — when the task is a known family but not a full mission shape. 9 calibrated team packs with auto-selection and mismatch guards.
+3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 54 roles against packet content and assembles a dynamic chain.
 
 The system never forces work through the wrong abstraction. It explains why it chose each level and offers alternatives.
 
@@ -103,7 +103,7 @@ Full treatment is a canonical 7-phase protocol defined in Claude project memory
 
 Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gates.
 
-##
+## 54 roles across 9 packs
 
 | Pack | Roles |
 |------|-------|
@@ -115,6 +115,7 @@ Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gate
 | **Product** (3) | Feedback Synthesizer, Roadmap Prioritizer, Spec Writer |
 | **Research** (4) | UX Researcher, Competitive Analyst, Trend Researcher, User Interview Synthesizer |
 | **Growth** (4) | Launch Strategist, Content Strategist, Community Manager, Support Triage Lead |
+| **Deep Audit** (4) | Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer |
 
 Every role has a full contract: mission, use when, do not use when, expected inputs, required outputs, quality bar, and escalation triggers. Every role is routable — `roleos route` can recommend any of them based on packet content.
 
@@ -209,13 +210,13 @@ role-os/
 mission.mjs ← 7 named mission types (feature, bugfix, treatment, docs, security, research, brainstorm)
 mission-run.mjs ← Mission runner: create → step → complete → report
 mission-cmd.mjs ← `roleos mission` CLI commands
-route.mjs ←
-packs.mjs ←
+route.mjs ← 54-role routing + dynamic chain builder
+packs.mjs ← 9 calibrated team packs + auto-selection
 conflicts.mjs ← 4-pass conflict detection
 escalation.mjs ← Auto-routing for blocked/rejected/split
 evidence.mjs ← Structured evidence + role-aware requirements
 dispatch.mjs ← Runtime dispatch manifests for multi-claude
-artifacts.mjs ←
+artifacts.mjs ← Per-role artifact contracts + pack handoffs
 decompose.mjs ← Composite task detection + splitting
 composite.mjs ← Dependency-ordered execution + recovery
 replan.mjs ← Mid-run adaptive replanning
@@ -225,7 +226,7 @@ role-os/
 brainstorm.mjs ← Evidence modes, request validation, finding/synthesis/judge schemas
 brainstorm-roles.mjs ← Role-native schemas, input partitioning, blindspot enforcement, cross-exam
 brainstorm-render.mjs ← Two-layer rendering: lexical bans, render schemas, debate transcript
-test/ ←
+test/ ← 936 tests across 31 test files
 starter-pack/ ← Drop-in role contracts, policies, schemas, workflows
 ```
 
@@ -237,28 +238,29 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
 
 | Layer | What it does | Status |
 |-------|-------------|--------|
-| **Routing** | Scores all
+| **Routing** | Scores all 54 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
 | **Chain builder** | Assembles phase-ordered chains from scored roles, packet-type biased not template-locked | ✓ Shipped |
 | **Conflict detection** | 4-pass validation: hard conflicts, sequence, redundancy, coverage gaps. Repair suggestions. | ✓ Shipped |
 | **Escalation** | Auto-routes blocked/rejected/split work to the right resolver with reason + required artifact | ✓ Shipped |
 | **Evidence** | Role-aware structured evidence in verdicts. Sufficiency checks. 12 evidence kinds. | ✓ Shipped |
 | **Dispatch** | Generates execution manifests for multi-claude. Per-role tool profiles, system prompts, budgets. | ✓ Shipped |
 | **Trials** | Full roster proven: 30/30 gold-task + 5/5 negative trials. 7 pack trials complete. | ✓ Complete |
-| **Team Packs** |
+| **Team Packs** | 9 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
 | **Outcome calibration** | Records run outcomes, tunes pack/role weights from results, adjusts confidence thresholds. | ✓ Shipped |
 | **Mixed-task decomposition** | Detects composite work, splits into child packets, assigns packs, preserves dependencies. | ✓ Shipped |
 | **Composite execution** | Runs child packets in dependency order with artifact passing, branch recovery, and synthesis. | ✓ Shipped |
 | **Adaptive replanning** | Mid-run scope changes, findings, or new requirements update the plan without restarting. | ✓ Shipped |
 | **Session spine** | `roleos init claude` scaffolds CLAUDE.md, /roleos-route, /roleos-review, /roleos-status. `roleos doctor` verifies wiring. Route cards prove engagement. | ✓ Shipped |
 | **Hook spine** | 5 lifecycle hooks (SessionStart, PromptSubmit, PreToolUse, SubagentStart, Stop). Advisory enforcement: route card reminders, write-tool gating, subagent role injection, completion audit. | ✓ Shipped |
-| **Artifact spine** |
-| **Mission library** |
+| **Artifact spine** | Per-role artifact contracts. Pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
+| **Mission library** | 8 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm, deep-audit). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. | ✓ Shipped |
 | **Mission runner** | Create runs, step through with tracked state, complete/fail with honest reporting. Blocked-step propagation, out-of-chain escalation warnings, last-step re-opening. | ✓ Shipped |
 | **Unified entry** | `roleos start` decides mission vs pack vs free routing automatically. Fallback ladder with confidence scores, alternatives, and composite detection. | ✓ Shipped |
 | **Persistent runs** | `roleos run` creates disk-backed runs. `resume`, `next`, `explain`, `complete`, `fail`. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance. Friction measurement. | ✓ Shipped |
-| **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run
+| **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run proven. | ✓ Shipped |
+| **Deep Audit** | Manifest-scaled repo audit: decompose repo into components, dispatch N auditors + M test truth auditors + K seam auditors from dependency graph, synthesize into ranked verdict and action plan. Dynamic dispatch scales with repo size (2N + K + 3 formula). Runner-native with artifact validation at every step. | ✓ Shipped |
 
-##
+## 8 missions
 
 | Mission | Pack | Roles | When to use |
 |---------|------|-------|-------------|
@@ -269,6 +271,7 @@ Role OS operates **locally only**. It copies markdown templates and writes packe
 | `security-hardening` | security | 4 | Threat model, audit, fix vulnerabilities, re-audit, verify |
 | `research-launch` | research | 4 | Frame question, research, document findings, decide |
 | `brainstorm` | brainstorm | 9 | Structured multi-perspective inquiry with traceable disagreement and verdict |
+| `deep-audit` | deep-audit | 5 (scales) | Manifest-backed repo audit — worker count scales with repo graph via dynamic dispatch |
 
 Each mission includes honest-partial definitions — when work stalls, the system documents what was completed and what remains instead of bluffing completion.
 
@@ -290,7 +293,27 @@ roleos run "explore product directions for a developer tool discovery platform"
 
 - **Chain of custody:** Every rendered sentence traces back to a truth-layer atom. Synthesis directions cite atoms. Cross-exam targets real claim IDs. The dispute graph is the product, not the prose.
 
-**Proven:** v0.4 golden run —
+**Proven:** v0.4 golden run — full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
+
+### Deep audit mission
+
+Not a surface scan. The deep audit mission **decomposes a repo into bounded components and dispatches specialist auditors at a scale determined by the repo's own dependency graph.**
+
+```bash
+roleos run "deep audit this repo" --manifest=audit-manifest.json
+# → MISSION: Deep Audit (Manifest-Scaled)
+# Steps: Component Auditor ×6 + Test Truth Auditor ×6 + Seam Auditor ×8 + Synthesizer + Action Plan + Critic = 23 steps
+```
+
+**What makes it different:**
+
+- **Dynamic dispatch** — worker count is not fixed. A 10-component repo with 5 boundary clusters produces 28 steps (2×10 + 5 + 3). A 3-component repo produces 12. The scaling formula is `2N + K + 3` where N = components, K = boundaries.
+- **Manifest-backed parcels** — an `audit-manifest.json` defines components (with file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). Each auditor receives only its parcel.
+- **Four role archetypes** — Component Auditor (code truth per module), Test Truth Auditor (tests that prove vs tests that exist), Seam Auditor (integration boundaries from the dependency graph), Audit Synthesizer (ranked verdict + action plan from all parcels).
+- **Artifact validation at every step** — `validateArtifact()` fires on every step completion in both execution paths. Results attached to step objects. The system knows whether each artifact met its contract.
+- **Honest partial** — when budget or scope blocks completion, per-component findings are individually valid. The system synthesizes from whatever completed, never bluffs full coverage.
+
+**Proven:** Runner-native proof run — 18 tests against real manifest, full lifecycle verified including escalation re-opening and partial failure. Scaling formula verified for 3/6/10/15-component manifests.
 
 ## Status
 
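For orientation, a manifest of the kind the README describes might look like the object below. Only `components`, `boundaries`, `id`, `name`, `from`, and `to` appear in this diff (they are read in `mission-run.mjs`). The field names `paths`, `lines`, `description`, and `interface` are hypothetical, chosen to match the README prose, and all values are made up. `parcelLabels` is an illustrative helper, not package code.

```javascript
// Illustrative manifest. Field names other than id/name/from/to are
// assumptions, and every value here is invented for the example.
const manifest = {
  components: [
    { id: "routing", paths: ["src/route.mjs"], lines: 400, description: "Role scoring" },
    { name: "evidence", paths: ["src/evidence.mjs"], lines: 250, description: "Verdict evidence" },
  ],
  boundaries: [
    { from: "routing", to: "evidence", interface: "verdict objects" },
  ],
};

// Mirrors the labelling expressions visible in the diff:
// components use `id || name`, boundaries use `id || from-to`.
function parcelLabels(m) {
  const components = (m.components || []).map((c) => c.id || c.name);
  const boundaries = (m.boundaries || []).map((b) => b.id || `${b.from}-${b.to}`);
  return { components, boundaries };
}

console.log(parcelLabels(manifest));
```

Each label would identify the parcel handed to exactly one auditor step, which is what lets an auditor see only its own slice of the repo.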
@@ -309,6 +332,7 @@ roleos run "explore product directions for a developer tool discovery platform"
 - **v2.0.0**: Operator friction pass (Phase U) — `roleos run` creates persistent disk-backed runs. Resume, next, explain, complete, fail. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance at every step. Friction measurement. 6 friction trials. 613 tests.
 - **v2.0.1**: Handbook audit, beginner docs, test count corrections. 617 tests.
 - **v2.1.0**: Brainstorm mission (v0.4) — specialized roles under law, traceable disagreement, verdict-bearing output. Two-layer architecture (truth + render), cross-exam permission matrix, dispute graph, golden run proof. 7 missions, 50 roles, 8 packs. 894 tests.
+- **v2.2.0**: Deep Audit mission — manifest-scaled repo audit with dynamic dispatch. 4 new audit roles (Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer). Worker count scales with repo graph (2N + K + 3 formula). Artifact validation wired at both execution boundaries. Runner-native proof run green. accept/approve truth fix in evidence layer. 8 missions, 54 roles, 9 packs. 936 tests.
 
 ## License
 
package/package.json
CHANGED
@@ -1,7 +1,7 @@
 {
   "name": "role-os",
-  "version": "2.
-  "description": "Role OS — a multi-Claude operating system where
+  "version": "2.2.0",
+  "description": "Role OS — a multi-Claude operating system where 54 specialized roles execute work through contracts, conflict detection, escalation, and structured evidence. 9 team packs, 8 missions including deep audit with manifest-scaled dynamic dispatch and brainstorm with traceable disagreement.",
   "homepage": "https://mcp-tool-shop-org.github.io/role-os/",
   "bugs": {
     "url": "https://github.com/mcp-tool-shop-org/role-os/issues"
package/src/artifacts.mjs
CHANGED
@@ -256,6 +256,40 @@ export const ROLE_ARTIFACT_CONTRACTS = {
     consumedBy: [],
     completionRule: "Disposition is accept/revise_expand/revise_synthesize/reject. Verdicts: ready_to_advance/needs_incubation/not_active_now. Actions: build_now/hold_for_followon/archive_but_retain. Revise requires targets.",
   },
+
+  // ── Deep Audit ──
+  "Component Auditor": {
+    artifactType: "component-audit-report",
+    requiredSections: ["findings", "what-i-could-not-verify", "adjacent-parcel-risks", "parcel-statistics"],
+    optionalSections: [],
+    requiredEvidence: ["component-parcel-definition"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Every file in owned paths read. Findings use standardized schema with severity, confidence, category, file, evidence, impact. Adjacent parcel risks are specific, not generic.",
+  },
+  "Seam Auditor": {
+    artifactType: "seam-audit-report",
+    requiredSections: ["findings", "false-independence-risks", "content-code-drift", "dependency-direction-assessment"],
+    optionalSections: [],
+    requiredEvidence: ["boundary-cluster-definition", "component-graph"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Every declared boundary inspected. Findings reference both sides. Content-code drift quotes both content claim and code reality.",
+  },
+  "Test Truth Auditor": {
+    artifactType: "test-truth-report",
+    requiredSections: ["findings", "untested-but-risky", "ceremonial-tests", "integration-gaps", "test-suite-health-summary"],
+    optionalSections: [],
+    requiredEvidence: ["test-file-paths", "implementation-file-paths"],
+    consumedBy: ["Audit Synthesizer"],
+    completionRule: "Distinguishes 'line executed' from 'behavior verified'. Lists source files with no test. Estimates real coverage with reasoning.",
+  },
+  "Audit Synthesizer": {
+    artifactType: "audit-summary",
+    requiredSections: ["verdict", "posture", "by-the-numbers", "structurally-sound", "fragile", "dangerous", "dead-weight", "cross-cutting-findings", "contradictions", "audit-gaps"],
+    optionalSections: [],
+    requiredEvidence: ["component-audit-report", "seam-audit-report", "test-truth-report"],
+    consumedBy: ["Critic Reviewer"],
+    completionRule: "Reconciles findings across parcels. Cross-cutting findings reference source parcels. Contradictions adjudicated. Action plan groups by root cause and leverage.",
+  },
 };
 
 // ── Artifact validation ───────────────────────────────────────────────────────
@@ -398,6 +432,15 @@ export const PACK_HANDOFF_CONTRACTS = {
       { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
     ],
   },
+  "deep-audit": {
+    flow: [
+      { role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
+      { role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
+      { role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
+      { role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
+      { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
+    ],
+  },
 };
 
 /**
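The contracts added in this file list the sections each audit artifact must contain. The sketch below shows one way such a check could work, assuming artifacts are markdown strings whose headings slugify to the section names. `slugify` and `missingSections` are hypothetical; the body of the package's own `validateArtifact()` is not part of this diff and may work differently.

```javascript
// Hypothetical check, not the package's validateArtifact().
// Turns a heading such as "What I could not verify" into "what-i-could-not-verify".
function slugify(heading) {
  return heading
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Returns the required section names that have no matching markdown heading.
function missingSections(requiredSections, markdown) {
  const present = new Set(
    markdown
      .split("\n")
      .filter((line) => /^#{1,6}\s/.test(line))
      .map((line) => slugify(line.replace(/^#{1,6}\s+/, "")))
  );
  return requiredSections.filter((section) => !present.has(section));
}

// Section list copied from the Component Auditor contract in this diff.
const required = ["findings", "what-i-could-not-verify", "adjacent-parcel-risks", "parcel-statistics"];
const report = "# Findings\n...\n## What I could not verify\n...\n## Parcel statistics\n...";

console.log(missingSections(required, report)); // [ 'adjacent-parcel-risks' ]
```

Returning the list of missing sections, rather than a bare boolean, fits the changelog's "warn, don't block" policy: the caller can attach the list to the step and carry on.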
package/src/dispatch.mjs
CHANGED
@@ -88,6 +88,12 @@ const TOOL_PROFILES = {
   "Mechanics Analyst": ["Read", "Glob", "Grep"],
   "Positioning Analyst": ["Read", "Glob", "Grep"],
   "Contrarian Analyst": ["Read", "Glob", "Grep"],
+
+  // Deep Audit
+  "Component Auditor": ["Read", "Glob", "Grep"],
+  "Seam Auditor": ["Read", "Glob", "Grep"],
+  "Test Truth Auditor": ["Read", "Glob", "Grep"],
+  "Audit Synthesizer": ["Read", "Glob", "Grep", "Write"],
 };
 
 // ── Default role config ─────────────────────────────────────────────────────
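The new profiles give the three auditor roles read-only tools and reserve `Write` for the Audit Synthesizer. A small sketch of how a caller might confirm that property. The `TOOL_PROFILES` object below copies only the four entries added in this diff, and `canWrite` is a hypothetical helper, not a function exported by `dispatch.mjs`.

```javascript
// Copy of the four entries added in this diff (the real table has more roles).
const TOOL_PROFILES = {
  "Component Auditor": ["Read", "Glob", "Grep"],
  "Seam Auditor": ["Read", "Glob", "Grep"],
  "Test Truth Auditor": ["Read", "Glob", "Grep"],
  "Audit Synthesizer": ["Read", "Glob", "Grep", "Write"],
};

// Hypothetical helper: true when a role's profile includes the Write tool.
// Unknown roles are treated as having no tools.
function canWrite(role) {
  return (TOOL_PROFILES[role] || []).includes("Write");
}

const writers = Object.keys(TOOL_PROFILES).filter(canWrite);
console.log(writers); // [ 'Audit Synthesizer' ]
```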
package/src/evidence.mjs
CHANGED
@@ -146,7 +146,7 @@ const DEFAULT_REQUIREMENTS = {
  * @property {EvidenceItem[]} evidence - Structured evidence items
  * @property {string[]} gaps - What's missing or weak
  * @property {string[]} risks - Identified risks
- * @property {string} [requiredNextArtifact] - What the next role must produce (for non-
+ * @property {string} [requiredNextArtifact] - What the next role must produce (for non-accept)
  * @property {string} confidence - One of CONFIDENCE_LEVELS
  */
 
@@ -182,25 +182,25 @@ export function checkSufficiency(verdict) {
     .map(e => `${e.kind}: ${e.claim} (${e.reference})`);
 
   if (contradictions.length > 0 && verdict.verdict === "accept") {
-    warnings.push("Verdict is '
+    warnings.push("Verdict is 'accept' but evidence contains contradictions — review carefully");
   }
 
-  // Check for missing evidence items on
+  // Check for missing evidence items on accept verdicts
   const missingItems = verdict.evidence.filter(e => e.status === "missing");
   if (missingItems.length > 0 && verdict.verdict === "accept") {
-    warnings.push("Verdict is '
+    warnings.push("Verdict is 'accept' but some evidence items are marked 'missing'");
   }
 
-  // Non-
-  if (verdict.verdict !== "
+  // Non-accept verdicts should have gaps or requiredNextArtifact
+  if (verdict.verdict !== "accept" && verdict.verdict !== "accept-with-notes") {
     if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
-      warnings.push("Non-
+      warnings.push("Non-accept verdict should specify gaps or requiredNextArtifact for recovery");
     }
   }
 
-  // Low confidence +
+  // Low confidence + accept is suspicious
   if (verdict.confidence === "low" && verdict.verdict === "accept") {
-    warnings.push("Low confidence
+    warnings.push("Low confidence accept — consider whether evidence is actually sufficient");
   }
 
   const sufficient = missingRequired.length === 0 && contradictions.length === 0;
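The changelog states that the old comparison against `"approve"` raised a spurious warning on every accept verdict. A minimal sketch of the corrected condition, with the old literal simulated through a parameter. `recoveryWarnings` is a hypothetical reduction of `checkSufficiency` that keeps only the non-accept branch shown in this diff; the `acceptValue` parameter exists purely for the demonstration.

```javascript
// Hypothetical reduction of checkSufficiency(): only the non-accept branch.
// `acceptValue` is a demonstration knob, not a parameter of the real function.
function recoveryWarnings(verdict, acceptValue = "accept") {
  const warnings = [];
  if (verdict.verdict !== acceptValue && verdict.verdict !== "accept-with-notes") {
    if (verdict.gaps.length === 0 && !verdict.requiredNextArtifact) {
      warnings.push("Non-accept verdict should specify gaps or requiredNextArtifact for recovery");
    }
  }
  return warnings;
}

const accepted = { verdict: "accept", gaps: [], requiredNextArtifact: null };

// Fixed comparison: an accept verdict yields no warning.
console.log(recoveryWarnings(accepted).length);            // 0
// Simulating the old "approve" literal: the same verdict is flagged.
console.log(recoveryWarnings(accepted, "approve").length); // 1
```

Asserting on the exact warning list, as the two calls above do, fails immediately under the old literal, which is the kind of exact assertion the changelog says the hardened tests now use.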
package/src/mission-run.mjs
CHANGED
@@ -10,8 +10,7 @@
  */
 
 import { MISSIONS, getMission, validateMission } from "./mission.mjs";
-import {
-import { validateArtifact, ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
+import { validateArtifact } from "./artifacts.mjs";
 
 let _runCounter = 0;
 
@@ -59,7 +58,7 @@ let _runCounter = 0;
  * @param {string} taskDescription
  * @returns {MissionRun}
  */
-export function createRun(missionKey, taskDescription) {
+export function createRun(missionKey, taskDescription, options = {}) {
   const mission = getMission(missionKey);
   if (!mission) {
     throw new Error(`Mission "${missionKey}" not found. Available: ${Object.keys(MISSIONS).join(", ")}`);
@@ -72,16 +71,26 @@ export function createRun(missionKey, taskDescription) {
 
   const id = `${missionKey}-${Date.now()}-${++_runCounter}`;
 
-
-
-
-
-
-
-
-
-
-
+  let steps;
+  const dd = mission.dynamicDispatch;
+
+  if (dd && options.manifest) {
+    // Dynamic dispatch — build steps from manifest
+    steps = buildDynamicSteps(mission, options.manifest);
+  } else {
+    // Static dispatch — use artifactFlow as-is
+    steps = mission.artifactFlow.map((step) => ({
+      role: step.role,
+      produces: step.produces,
+      consumedBy: step.consumedBy,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    }));
+  }
 
   return {
     id,
@@ -93,9 +102,94 @@ export function createRun(missionKey, taskDescription) {
     startedAt: new Date().toISOString(),
     completedAt: null,
     completionReport: null,
+    dynamicDispatch: dd && options.manifest ? true : false,
+    manifest: options.manifest || null,
   };
 }
 
+/**
+ * Build steps from manifest for dynamic dispatch missions.
+ * @param {Object} mission
+ * @param {Object} manifest - The audit-manifest.json content
+ * @returns {MissionStep[]}
+ */
+function buildDynamicSteps(mission, manifest) {
+  const dd = mission.dynamicDispatch;
+  const steps = [];
+
+  // Scaling roles: one step per manifest entry
+  const components = manifest[dd.componentAuditorPer] || [];
+  const boundaries = manifest[dd.seamAuditorPer] || manifest.boundaries || [];
+
+  // Component Auditor × N
+  for (const comp of components) {
+    steps.push({
+      role: "Component Auditor",
+      produces: "component-audit-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: comp.id || comp.name,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+
+  // Test Truth Auditor × M
+  for (const comp of components) {
+    steps.push({
+      role: "Test Truth Auditor",
+      produces: "test-truth-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: comp.id || comp.name,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+
+  // Seam Auditor × K
+  for (const boundary of boundaries) {
+    const label = boundary.id || `${boundary.from}-${boundary.to}`;
+    steps.push({
+      role: "Seam Auditor",
+      produces: "seam-audit-report",
+      consumedBy: "Audit Synthesizer",
+      parcel: label,
+      status: "pending",
+      artifact: null,
+      artifactValidation: null,
+      note: null,
+      startedAt: null,
+      completedAt: null,
+    });
+  }
+
+  // Non-scaling roles from artifactFlow (Audit Synthesizer, Critic Reviewer)
+  for (const step of mission.artifactFlow) {
+    if (!dd.scalingRoles.includes(step.role)) {
+      steps.push({
+        role: step.role,
+        produces: step.produces,
+        consumedBy: step.consumedBy,
+        status: "pending",
+        artifact: null,
+        artifactValidation: null,
+        note: null,
+        startedAt: null,
+        completedAt: null,
+      });
+    }
+  }
+
+  return steps;
+}
+
 // ── Step through a run ──────────────────────────────────────────────────────
 
 /**
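The loops in `buildDynamicSteps` repeat the same step literal four times. The sketch below condenses that shape with a hypothetical `makeStep` helper and runs it against a stand-in mission, to show how the resulting step list is ordered and counted. The `mission` object is an assumption made for illustration: the `seamAuditorPer` value is a guess, and the three-entry tail of `artifactFlow` (including an `action-plan` step) is inferred from the README's "Synthesizer + Action Plan + Critic" wording rather than taken from `mission.mjs`.

```javascript
// Stand-in mission for illustration only; not the real mission.mjs entry.
const mission = {
  dynamicDispatch: {
    scalingRoles: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
    componentAuditorPer: "components",
    seamAuditorPer: "boundaries", // assumed value
  },
  artifactFlow: [
    { role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
    { role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
    { role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
    { role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
    { role: "Audit Synthesizer", produces: "action-plan", consumedBy: "Critic Reviewer" }, // assumed step
    { role: "Critic Reviewer", produces: "verdict", consumedBy: null },
  ],
};

// Hypothetical helper: one pending step, optionally bound to a parcel.
function makeStep(role, produces, consumedBy, parcel) {
  const step = {
    role, produces, consumedBy,
    status: "pending", artifact: null, artifactValidation: null,
    note: null, startedAt: null, completedAt: null,
  };
  if (parcel !== undefined) step.parcel = parcel;
  return step;
}

// Same ordering as the diff: component auditors, test truth auditors,
// seam auditors, then every non-scaling step from artifactFlow.
function buildSteps(m, manifest) {
  const dd = m.dynamicDispatch;
  const components = manifest[dd.componentAuditorPer] || [];
  const boundaries = manifest[dd.seamAuditorPer] || manifest.boundaries || [];
  const steps = [];
  for (const c of components) {
    steps.push(makeStep("Component Auditor", "component-audit-report", "Audit Synthesizer", c.id || c.name));
  }
  for (const c of components) {
    steps.push(makeStep("Test Truth Auditor", "test-truth-report", "Audit Synthesizer", c.id || c.name));
  }
  for (const b of boundaries) {
    steps.push(makeStep("Seam Auditor", "seam-audit-report", "Audit Synthesizer", b.id || `${b.from}-${b.to}`));
  }
  for (const s of m.artifactFlow) {
    if (!dd.scalingRoles.includes(s.role)) steps.push(makeStep(s.role, s.produces, s.consumedBy));
  }
  return steps;
}

const manifest = {
  components: [{ id: "core" }, { name: "cli" }],
  boundaries: [{ from: "cli", to: "core" }],
};
const steps = buildSteps(mission, manifest);
console.log(steps.length); // 8, i.e. 2*2 + 1 + 3
console.log(steps.map((s) => s.parcel ?? s.role));
```

Note that the fixed part of the count depends entirely on how many non-scaling entries `artifactFlow` contains, so the "+ 3" in the scaling formula holds only as long as that tail has three steps.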
@@ -127,6 +221,10 @@ export function completeStep(run, artifact, note) {
     throw new Error("No active step to complete");
   }
 
+  // Validate artifact against role contract (warn, don't block)
+  const validation = validateArtifact(active.role, artifact);
+  active.artifactValidation = validation;
+
   active.status = "completed";
   active.artifact = artifact;
   active.note = note || null;
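The added lines attach a validation result to the step and then complete it regardless of the outcome. The sketch below isolates that warn-but-do-not-block behaviour. `stubValidate` and its `{ valid, missing }` result shape are stand-ins invented for this example, since the return type of `validateArtifact()` is not shown in this diff, and `completeActiveStep` is a hypothetical cut-down of `completeStep`.

```javascript
// Stand-in validator with an assumed result shape; not the real validateArtifact().
function stubValidate(role, artifact) {
  const missing = artifact.includes("## Findings") ? [] : ["findings"];
  return { valid: missing.length === 0, missing };
}

// Same order of operations as the added lines: validate, record, then complete.
// A failed validation is stored on the step but never prevents completion.
function completeActiveStep(active, artifact, note, validate = stubValidate) {
  active.artifactValidation = validate(active.role, artifact);
  active.status = "completed";
  active.artifact = artifact;
  active.note = note || null;
  return active;
}

const step = {
  role: "Component Auditor",
  status: "active",
  artifact: null,
  artifactValidation: null,
  note: null,
};
completeActiveStep(step, "## Summary only");

console.log(step.status);                   // completed
console.log(step.artifactValidation.valid); // false
```

Keeping the result on the step object means a later report can still surface which artifacts fell short of their contract, even though the run itself was never halted.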
package/src/mission.mjs
CHANGED
@@ -268,6 +268,65 @@ export const MISSIONS = {
     dispatchDefaults: { model: "sonnet", maxTurns: 40, maxBudgetUsd: 6.0 },
     trialEvidence: "v0.4 golden run — 894 tests green. Full chain of custody proven: truth artifacts, provenance atoms, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts in 5 formats, debate transcript, 16+ trace links from rendered → truth. Architecture frozen 2026-03-27.",
   },
+  // ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
+  "deep-audit": {
+    name: "Deep Audit",
+    description: "Decompose a repo into bounded components, dispatch one auditor per component, inspect seams from the dependency graph, assess test truth, then synthesize into a ranked verdict and action plan. Worker count scales with the repo graph — not fixed.",
+    pack: "deep-audit",
+    entryPath: "Decompose repo → validate parcels → Component Auditor ×N (parallel) + Test Truth Auditor ×M → Seam Auditor ×K (from graph edges) → Audit Synthesizer → Critic reviews verdict",
+    // NOTE: This mission has a DYNAMIC role chain. The static chain below
+    // shows the role archetypes. At dispatch time, Component Auditor and
+    // Seam Auditor are instantiated once per component/boundary cluster.
+    // A 10-component repo with 4 risky boundaries = 10 + 4 + 2 + 1 + 1 = 18 tasks.
+    roleChain: [
+      "Component Auditor", // ×N — one per component from audit-manifest
+      "Test Truth Auditor", // ×M — one per component or one overlay pass
+      "Seam Auditor", // ×K — one per risky boundary cluster from graph
+      "Audit Synthesizer", // ×1 — consumes all outputs, produces verdict
+      "Critic Reviewer", // ×1 — final acceptance
+    ],
+    // Dynamic dispatch contract:
+    // Step 1 produces audit-manifest.json with components[] and boundaries[].
+    // Steps 2-4 are instantiated from the manifest:
+    // - One Component Auditor task per components[] entry
+    // - One Test Truth Auditor task per component (or grouped by layer)
+    // - One Seam Auditor task per boundary cluster
+    // Step 5 (Audit Synthesizer) runs after ALL step 2-4 tasks complete.
+    // Step 6 (Critic Reviewer) reviews the synthesis.
+    dynamicDispatch: {
+      scalingRoles: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
+      manifestSource: "audit-manifest.json",
+      componentAuditorPer: "components",
|
|
300
|
+
testTruthAuditorPer: "components",
|
|
301
|
+
seamAuditorPer: "boundary_clusters",
|
|
302
|
+
synthesisAfter: ["Component Auditor", "Test Truth Auditor", "Seam Auditor"],
|
|
303
|
+
},
|
|
304
|
+
artifactFlow: [
|
|
305
|
+
// Step 1: Decomposition (done before mission dispatch — input artifact)
|
|
306
|
+
{ role: "Component Auditor", produces: "component-audit-report", consumedBy: "Audit Synthesizer" },
|
|
307
|
+
{ role: "Test Truth Auditor", produces: "test-truth-report", consumedBy: "Audit Synthesizer" },
|
|
308
|
+
{ role: "Seam Auditor", produces: "seam-audit-report", consumedBy: "Audit Synthesizer" },
|
|
309
|
+
{ role: "Audit Synthesizer", produces: "audit-summary", consumedBy: "Critic Reviewer" },
|
|
310
|
+
{ role: "Audit Synthesizer", produces: "audit-action-plan", consumedBy: "Critic Reviewer" },
|
|
311
|
+
{ role: "Critic Reviewer", produces: "review-verdict", consumedBy: null },
|
|
312
|
+
],
|
|
313
|
+
escalationBranches: [
|
|
314
|
+
{ trigger: "component exceeds 8K lines", from: "Component Auditor", to: "Component Auditor", action: "re-slice into sub-components, re-dispatch" },
|
|
315
|
+
{ trigger: "circular dependency found", from: "Seam Auditor", to: "Audit Synthesizer", action: "elevate as architectural finding, do not attempt to resolve" },
|
|
316
|
+
{ trigger: "parcel outputs inconsistent", from: "Audit Synthesizer", to: "Component Auditor", action: "re-audit the inconsistent component with narrower scope" },
|
|
317
|
+
{ trigger: "critical finding spans 3+ components", from: "Audit Synthesizer", to: "Seam Auditor", action: "targeted cross-cut audit on the systemic issue" },
|
|
318
|
+
{ trigger: "test suite is ceremonial", from: "Test Truth Auditor", to: "Audit Synthesizer", action: "flag as structural risk — false confidence in coverage" },
|
|
319
|
+
],
|
|
320
|
+
honestPartial: "Component audits complete but seam inspection blocked or synthesis incomplete. Per-component findings are individually valid and actionable. Manifest and component reports exist even if synthesis does not.",
|
|
321
|
+
stopConditions: [
|
|
322
|
+
"Audit Synthesizer produces verdict + action plan, Critic accepts",
|
|
323
|
+
"Decomposition reveals repo is too tangled to slice — document why and abort",
|
|
324
|
+
"All component audits complete but seam audits blocked — synthesize with component-only truth",
|
|
325
|
+
"Budget exhausted — synthesize from whatever component audits completed",
|
|
326
|
+
],
|
|
327
|
+
dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
|
|
328
|
+
trialEvidence: "New mission — no trial evidence yet. Architecture designed 2026-03-27.",
|
|
329
|
+
},
|
|
271
330
|
};
|
|
272
331
|
|
|
273
332
|
// ── Mission catalog ─────────────────────────────────────────────────────────
|
|
@@ -332,6 +391,10 @@ export function suggestMission(taskDescription) {
|
|
|
332
391
|
signals: ["brainstorm", "explore ideas", "explore directions", "opportunity map", "creative directions", "concept exploration", "what could we build", "divergent thinking", "ideate"],
|
|
333
392
|
weight: 1.1,
|
|
334
393
|
},
|
|
394
|
+
"deep-audit": {
|
|
395
|
+
signals: ["deep audit", "component audit", "decompose and audit", "audit components", "structural audit", "deep review", "code audit", "repo deep dive"],
|
|
396
|
+
weight: 1.2,
|
|
397
|
+
},
|
|
335
398
|
};
|
|
336
399
|
|
|
337
400
|
let bestKey = null;
|
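The diff shows only the new `signals` and `weight` entry and the first line of the selection loop, so the scoring rule itself is not visible. One plausible reading, offered purely as an assumption, is that each mission scores the number of matched signals multiplied by its weight. `suggestMissionSketch` and the trimmed signal table below are illustrative names, not the package's API.

```javascript
// Hypothetical scoring sketch; the real suggestMission logic is not shown in this diff.
const SIGNAL_TABLE = {
  brainstorm: {
    signals: ["brainstorm", "explore ideas", "ideate"],
    weight: 1.1,
  },
  "deep-audit": {
    signals: ["deep audit", "component audit", "code audit", "structural audit"],
    weight: 1.2,
  },
};

function suggestMissionSketch(taskDescription) {
  const text = taskDescription.toLowerCase();
  let bestKey = null;
  let bestScore = 0;
  for (const [key, { signals, weight }] of Object.entries(SIGNAL_TABLE)) {
    // Assumed rule: count substring hits, then scale by the mission weight.
    const hits = signals.filter((s) => text.includes(s)).length;
    const score = hits * weight;
    if (score > bestScore) {
      bestScore = score;
      bestKey = key;
    }
  }
  return bestKey;
}

console.log(suggestMissionSketch("Run a deep audit of this repo")); // deep-audit
```

A description that matches no signal returns `null` in this sketch, leaving the caller to fall back to some other routing path.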
package/src/packs.mjs
CHANGED
|
@@ -255,6 +255,38 @@ export const TEAM_PACKS = {
|
|
|
255
255
|
],
|
|
256
256
|
},
|
|
257
257
|
|
|
258
|
+
// ── Deep Audit (Componentized Repo Understanding) ──────────────────────────
|
|
259
|
+
"deep-audit": {
|
|
260
|
+
name: "Deep Audit",
|
|
261
|
+
description: "Decompose repo into components, audit each deeply, inspect seams, synthesize verdict. Scales with repo graph.",
|
|
262
|
+
roles: [
|
|
263
|
+
"Component Auditor",
|
|
264
|
+
"Test Truth Auditor",
|
|
265
|
+
"Seam Auditor",
|
|
266
|
+
"Audit Synthesizer",
|
|
267
|
+
"Critic Reviewer",
|
|
268
|
+
],
|
|
269
|
+
orchestratorRequired: false, // mission step sequence handles orchestration
|
|
270
|
+
optionalRoles: ["Security Reviewer", "Dependency Auditor"],
|
|
271
|
+
chainOrder: "Component Auditor (×N, parallel) + Test Truth Auditor (×M) → Seam Auditor (×K, from graph) → Audit Synthesizer",
|
|
272
|
+
requiredArtifacts: ["audit-manifest", "component-audit-report", "seam-audit-report", "test-truth-report", "audit-summary", "audit-action-plan"],
|
|
273
|
+
stopConditions: [
|
|
274
|
+
"All component parcels audited + seams inspected + synthesis complete",
|
|
275
|
+
"Critical finding in decomposition phase — repo too tangled to slice cleanly",
|
|
276
|
+
"Component auditor finds scope exceeds 8K lines — request re-slice",
|
|
277
|
+
],
|
|
278
|
+
escalationOwner: "Audit Synthesizer",
|
|
279
|
+
dispatchDefaults: { model: "sonnet", maxTurns: 25, maxBudgetUsd: 3.0 },
|
|
280
|
+
trialEvidence: "New mission — no trial evidence yet. First test: role-os self-audit.",
|
|
281
|
+
mismatchGuards: [
|
|
282
|
+
{ notForSignals: ["fix bug", "crash", "broken", "regression"], suggestInstead: "bugfix", reason: "This is a bug to fix, not a deep audit" },
|
|
283
|
+
{ notForSignals: ["implement", "build", "add command", "new feature"], suggestInstead: "feature", reason: "This is feature work, not an audit" },
|
|
284
|
+
{ notForSignals: ["launch", "announce", "release notes", "messaging"], suggestInstead: "launch", reason: "This is launch work, not an audit" },
|
|
285
|
+
{ notForSignals: ["treatment", "shipcheck", "polish"], suggestInstead: "treatment", reason: "This is repo treatment (surface polish), not a deep audit" },
|
|
286
|
+
{ notForSignals: ["brainstorm", "explore ideas", "ideate"], suggestInstead: "brainstorm", reason: "This is brainstorming, not an audit" },
|
|
287
|
+
],
|
|
288
|
+
},
|
|
289
|
+
|
|
258
290
|
// ── Brainstorm (Structured Inquiry) ─────────────────────────────────────────
|
|
259
291
|
brainstorm: {
|
|
260
292
|
name: "Brainstorm (Structured Inquiry)",
|
|
@@ -304,6 +336,7 @@ const PACK_KEYWORDS = {
|
|
|
304
336
|
research: ["research", "competitive", "ux", "friction", "user", "strategy", "trend"],
|
|
305
337
|
treatment: ["treatment", "polish", "cleanup", "repo audit", "shipcheck", "full treatment"],
|
|
306
338
|
brainstorm: ["brainstorm", "explore", "ideate", "divergent", "opportunity", "creative directions", "concept exploration", "what could", "possibilities"],
|
|
339
|
+
"deep-audit": ["deep audit", "component audit", "repo audit deep", "decompose and audit", "audit components", "code audit", "structural audit", "deep review"],
|
|
307
340
|
};
|
|
308
341
|
|
|
309
342
|
/**
|
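The `mismatchGuards` entries in the deep-audit pack read as data for a pre-dispatch check. The sketch below shows one way such a check could work, assuming simple case-insensitive substring matching. `checkMismatch` is a hypothetical name, and the guard list is a two-entry excerpt of the one in the diff.

```javascript
// Hypothetical guard check; checkMismatch is illustrative, not a packs.mjs export.
const DEEP_AUDIT_GUARDS = [
  {
    notForSignals: ["fix bug", "crash", "broken", "regression"],
    suggestInstead: "bugfix",
    reason: "This is a bug to fix, not a deep audit",
  },
  {
    notForSignals: ["brainstorm", "explore ideas", "ideate"],
    suggestInstead: "brainstorm",
    reason: "This is brainstorming, not an audit",
  },
];

function checkMismatch(guards, taskDescription) {
  const text = taskDescription.toLowerCase();
  for (const guard of guards) {
    // First guard with any matching signal wins.
    const hit = guard.notForSignals.find((s) => text.includes(s));
    if (hit) {
      return {
        mismatch: true,
        signal: hit,
        suggestInstead: guard.suggestInstead,
        reason: guard.reason,
      };
    }
  }
  return { mismatch: false };
}

console.log(checkMismatch(DEEP_AUDIT_GUARDS, "The login page crash needs a fix").suggestInstead); // bugfix
```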
package/src/route.mjs
CHANGED
|
@@ -344,6 +344,36 @@ export const ROLE_CATALOG = [
|
|
|
344
344
|
triggers: ["targeted challenge", "claim attack", "contradiction exposure"],
|
|
345
345
|
excludeWhen: [],
|
|
346
346
|
},
|
|
347
|
+
|
|
348
|
+
// ── DEEP AUDIT ──
|
|
349
|
+
{
|
|
350
|
+
name: "Component Auditor", pack: "deep-audit", phase: 3,
|
|
351
|
+
keywords: ["audit", "component", "correctness", "dead code", "error handling", "state management"],
|
|
352
|
+
triggers: ["deep audit", "component audit", "code audit", "line-by-line audit"],
|
|
353
|
+
excludeWhen: ["test audit only", "boundary audit only"],
|
|
354
|
+
deliverableAffinity: ["Review"],
|
|
355
|
+
},
|
|
356
|
+
{
|
|
357
|
+
name: "Seam Auditor", pack: "deep-audit", phase: 4,
|
|
358
|
+
keywords: ["boundary", "seam", "interface", "contract", "integration", "dependency direction"],
|
|
359
|
+
triggers: ["boundary audit", "seam inspection", "interface mismatch", "cross-component"],
|
|
360
|
+
excludeWhen: ["single component only", "test audit only"],
|
|
361
|
+
deliverableAffinity: ["Review"],
|
|
362
|
+
},
|
|
363
|
+
{
|
|
364
|
+
name: "Test Truth Auditor", pack: "deep-audit", phase: 3,
|
|
365
|
+
keywords: ["test coverage", "test truth", "ceremonial test", "test gap", "mock fidelity"],
|
|
366
|
+
triggers: ["test truth audit", "coverage reality", "test quality assessment"],
|
|
367
|
+
excludeWhen: ["no tests exist", "implementation audit only"],
|
|
368
|
+
deliverableAffinity: ["Review"],
|
|
369
|
+
},
|
|
370
|
+
{
|
|
371
|
+
name: "Audit Synthesizer", pack: "deep-audit", phase: 5,
|
|
372
|
+
keywords: ["synthesis", "verdict", "action plan", "reconcile", "cross-cutting"],
|
|
373
|
+
triggers: ["audit synthesis", "repo verdict", "finding reconciliation"],
|
|
374
|
+
excludeWhen: ["component audit still running", "no findings to synthesize"],
|
|
375
|
+
deliverableAffinity: ["Review"],
|
|
376
|
+
},
|
|
347
377
|
];
|
|
348
378
|
|
|
349
379
|
// ── Deliverable type → role affinity ──────────────────────────────────────────
|
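The four new catalog entries carry `phase` values of 3, 3, 4, and 5. Assuming a lower phase runs earlier and roles sharing a phase can run side by side (the diff does not state this), grouping by phase reproduces the pack's `chainOrder`. `groupByPhase` is a hypothetical helper, not part of route.mjs.

```javascript
// Entries copied from the diff, reduced to name and phase.
const DEEP_AUDIT_ROLES = [
  { name: "Component Auditor", phase: 3 },
  { name: "Seam Auditor", phase: 4 },
  { name: "Test Truth Auditor", phase: 3 },
  { name: "Audit Synthesizer", phase: 5 },
];

// Assumption: lower phase runs earlier; equal phases may run in parallel.
function groupByPhase(roles) {
  const groups = new Map();
  for (const role of roles) {
    if (!groups.has(role.phase)) groups.set(role.phase, []);
    groups.get(role.phase).push(role.name);
  }
  return [...groups.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([phase, names]) => ({ phase, names }));
}

for (const group of groupByPhase(DEEP_AUDIT_ROLES)) {
  console.log(group.phase, group.names.join(" + "));
}
// 3 Component Auditor + Test Truth Auditor
// 4 Seam Auditor
// 5 Audit Synthesizer
```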
package/src/run.mjs
CHANGED
|
@@ -18,8 +18,7 @@ import { decideEntry } from "./entry.mjs";
|
|
|
18
18
|
import { getMission } from "./mission.mjs";
|
|
19
19
|
import { TEAM_PACKS, getPack } from "./packs.mjs";
|
|
20
20
|
import { ROLE_CATALOG } from "./route.mjs";
|
|
21
|
-
import { ROLE_ARTIFACT_CONTRACTS } from "./artifacts.mjs";
|
|
22
|
-
import { getHandoffContract } from "./artifacts.mjs";
|
|
21
|
+
import { ROLE_ARTIFACT_CONTRACTS, validateArtifact, getHandoffContract } from "./artifacts.mjs";
|
|
23
22
|
|
|
24
23
|
// ── Run directory ────────────────────────────────────────────────────────────
|
|
25
24
|
|
|
@@ -309,6 +308,10 @@ export function completeCurrentStep(run, artifact, note, cwd) {
|
|
|
309
308
|
const active = run.steps.find(s => s.status === "active");
|
|
310
309
|
if (!active) throw new Error("No active step to complete");
|
|
311
310
|
|
|
311
|
+
// Validate artifact against role contract (warn, don't block)
|
|
312
|
+
const validation = validateArtifact(active.role, artifact);
|
|
313
|
+
active.artifactValidation = validation;
|
|
314
|
+
|
|
312
315
|
active.status = "completed";
|
|
313
316
|
active.artifact = artifact;
|
|
314
317
|
active.note = note || null;
|
|
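Both `completeStep` and `completeCurrentStep` now record a validation result without letting it stop completion. A small sketch of that warn-don't-block pattern follows. `validateArtifactSketch` is a hypothetical stand-in with an invented return shape and invented required sections; the real `validateArtifact` lives in artifacts.mjs and is not shown in this section.

```javascript
// Hypothetical stand-in for validateArtifact; return shape and rules are assumptions.
function validateArtifactSketch(role, artifact) {
  const requiredSections = { "Audit Synthesizer": ["Verdict", "Posture"] }; // illustrative only
  const missing = (requiredSections[role] || []).filter(
    (section) => !String(artifact).includes(section)
  );
  return { valid: missing.length === 0, missing };
}

function completeStepSketch(run, artifact, note) {
  const active = run.steps.find((s) => s.status === "active");
  if (!active) throw new Error("No active step to complete");

  // Record the validation result, then complete the step either way.
  active.artifactValidation = validateArtifactSketch(active.role, artifact);
  active.status = "completed";
  active.artifact = artifact;
  active.note = note || null;
  return active;
}

const run = { steps: [{ role: "Audit Synthesizer", status: "active" }] };
const done = completeStepSketch(run, "Verdict: fragile", null);
console.log(done.status, done.artifactValidation.valid); // completed false
```

The design choice this illustrates is that an incomplete artifact is surfaced on the step record for later review rather than rejected at completion time.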
package/starter-pack/agents/engineering/audit-synthesizer.md
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# Audit Synthesizer
|
|
2
|
+
|
|
3
|
+
## Mission
|
|
4
|
+
Consume all component, seam, and test audit outputs and produce one truthful repo-wide verdict with a ranked action plan.
|
|
5
|
+
|
|
6
|
+
## Use When
|
|
7
|
+
- All component auditors, seam auditors, and test truth auditors have completed their parcels
|
|
8
|
+
- Structured findings exist in standardized format
|
|
9
|
+
- The goal is a single authoritative repo assessment, not another audit pass
|
|
10
|
+
|
|
11
|
+
## Do Not Use When
|
|
12
|
+
- Component audits are still running — wait for all outputs
|
|
13
|
+
- No structured findings exist — there's nothing to synthesize
|
|
14
|
+
- The goal is to audit code directly (use Component Auditor)
|
|
15
|
+
|
|
16
|
+
## Expected Inputs
|
|
17
|
+
- All AUDIT-PARCEL-*.md files (component findings)
|
|
18
|
+
- All AUDIT-SEAM-*.md files (boundary findings)
|
|
19
|
+
- All AUDIT-TESTS-*.md files (test truth findings)
|
|
20
|
+
- audit-manifest.json (component graph, for cross-referencing)
|
|
21
|
+
|
|
22
|
+
## Required Output
|
|
23
|
+
|
|
24
|
+
### AUDIT-SUMMARY.md
|
|
25
|
+
- **Verdict** — one paragraph: structurally sound, fragile, dangerous, or dead weight, with specific reasoning
|
|
26
|
+
- **Posture** — sound / fragile / dangerous / abandoned
|
|
27
|
+
- **By the Numbers** — finding counts by severity across all lanes
|
|
28
|
+
- **What Is Structurally Sound** — components/patterns that are correct (give specific credit)
|
|
29
|
+
- **What Is Fragile** — things that work but break under change or edge cases
|
|
30
|
+
- **What Is Dangerous** — active defects, security issues, data loss risks
|
|
31
|
+
- **What Is Dead Weight** — unused code, vestigial features, abandoned modules
|
|
32
|
+
- **Cross-Cutting Findings** — issues spanning multiple components, with source parcel references
|
|
33
|
+
- **Contradictions Between Parcels** — where findings conflict, with adjudication
|
|
34
|
+
- **Audit Gaps** — things no parcel was positioned to evaluate
|
|
35
|
+
|
|
36
|
+
### AUDIT-ACTION-PLAN.md
|
|
37
|
+
- **P0** — fix before next release (critical + high, grouped by root cause)
|
|
38
|
+
- **P1** — fix this sprint (medium findings that compound)
|
|
39
|
+
- **P2** — scheduled cleanup (low, dead code, naming)
|
|
40
|
+
- **P3** — architectural (structural changes needing planning)
|
|
41
|
+
- **Recommended Fix Order** — numbered sequence considering dependencies
|
|
42
|
+
- **Estimated Effort** — per priority group (trivial/half-day/full-day/multi-day)
|
|
43
|
+
|
|
44
|
+
## Quality Bar
|
|
45
|
+
- Must reconcile — not just concatenate — findings across parcels
|
|
46
|
+
- Cross-cutting findings must reference which parcel outputs informed them
|
|
47
|
+
- Contradictions must be adjudicated, not just listed
|
|
48
|
+
- Action plan must group by root cause and leverage, not by parcel
|
|
49
|
+
- A root cause fix that resolves 5 findings ranks higher than 5 individual patches
|
|
50
|
+
- Must identify gaps — things that fell between parcels
|
|
51
|
+
|
|
52
|
+
## Escalation Triggers
|
|
53
|
+
- Parcel outputs are missing or incomplete — cannot synthesize without full data
|
|
54
|
+
- Parcel outputs use inconsistent finding formats — cannot reconcile
|
|
55
|
+
- Critical findings span 3+ components — systemic issue, may need architectural rewrite
|
|
56
|
+
- Component auditors and seam auditors contradict on the same boundary — needs investigation
|
|
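The quality bar above asks the action plan to group by root cause, with one fix that resolves five findings outranking five separate patches. A minimal sketch of that ranking, assuming a hypothetical finding shape of `{ id, rootCause }` (the role definition does not prescribe one):

```javascript
// Hypothetical finding shape: { id, rootCause }. rankByRootCause is illustrative.
function rankByRootCause(findings) {
  const groups = new Map();
  for (const finding of findings) {
    if (!groups.has(finding.rootCause)) groups.set(finding.rootCause, []);
    groups.get(finding.rootCause).push(finding.id);
  }
  // Root causes that resolve more findings rank first.
  return [...groups.entries()]
    .map(([rootCause, ids]) => ({ rootCause, resolves: ids.length, ids }))
    .sort((a, b) => b.resolves - a.resolves);
}

const sampleFindings = [
  { id: "F1", rootCause: "naming drift" },
  { id: "F2", rootCause: "unvalidated input" },
  { id: "F3", rootCause: "unvalidated input" },
  { id: "F4", rootCause: "unvalidated input" },
];
console.log(rankByRootCause(sampleFindings)[0].rootCause); // unvalidated input
```

A real plan would also weigh severity and fix dependencies; this sketch covers only the leverage count.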
package/starter-pack/agents/engineering/component-auditor.md
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# Component Auditor
|
|
2
|
+
|
|
3
|
+
## Mission
|
|
4
|
+
Read every line in an assigned code component and produce structured findings for every material issue.
|
|
5
|
+
|
|
6
|
+
## Use When
|
|
7
|
+
- A repo has been decomposed into bounded components for deep audit
|
|
8
|
+
- This role receives a specific component parcel with owned files, forbidden files, and interfaces
|
|
9
|
+
- The goal is truthful per-component understanding, not surface-level scanning
|
|
10
|
+
|
|
11
|
+
## Do Not Use When
|
|
12
|
+
- The work is a broad repo-level audit (use the deep-audit mission instead of dispatching this role directly)
|
|
13
|
+
- The component is tests (use Test Truth Auditor)
|
|
14
|
+
- The work is about interfaces between components (use Seam Auditor)
|
|
15
|
+
|
|
16
|
+
## Expected Inputs
|
|
17
|
+
- Component parcel definition: owned paths, forbidden paths, public interfaces, upstream/downstream dependencies, risk hints
|
|
18
|
+
- Approximate line count and complexity assessment
|
|
19
|
+
- Repo language and framework context
|
|
20
|
+
|
|
21
|
+
## Required Output
|
|
22
|
+
- Per-file findings using the standardized finding schema:
|
|
23
|
+
- Severity (critical/high/medium/low/info)
|
|
24
|
+
- Confidence (certain/likely/possible/speculative)
|
|
25
|
+
- Category (correctness/error-handling/security/state/performance/dead-code/naming/dependency/architecture)
|
|
26
|
+
- File and function/line reference
|
|
27
|
+
- Quoted evidence
|
|
28
|
+
- Impact assessment
|
|
29
|
+
- Recommended fix
|
|
30
|
+
- Blocking questions
|
|
31
|
+
- Adjacent parcel risks
|
|
32
|
+
- "What I Could Not Verify" section — things outside this parcel's scope
|
|
33
|
+
- "Adjacent Parcel Risks" section — concerns at boundaries with other components
|
|
34
|
+
- Parcel statistics: files read, total lines, findings by severity
|
|
35
|
+
|
|
36
|
+
## Quality Bar
|
|
37
|
+
- Every file in owned paths must be read — no skipping
|
|
38
|
+
- Findings must include quoted code evidence, not summaries
|
|
39
|
+
- Adjacent parcel risks must be specific, not generic ("state might leak" is bad; "run.mjs L247 mutates the opts object passed from entry.mjs" is good)
|
|
40
|
+
- "What I Could Not Verify" must be honest — if you can't see the caller, say so
|
|
41
|
+
|
|
42
|
+
## Escalation Triggers
|
|
43
|
+
- Component exceeds 8,000 lines — request split into sub-components
|
|
44
|
+
- Owned paths reference files that don't exist — flag immediately
|
|
45
|
+
- Component has zero tests — flag for Test Truth Auditor
|
|
46
|
+
- Critical finding that affects multiple other components — flag for Seam Auditor
|
|
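The finding schema above lists fixed value sets for severity, confidence, and category. As an illustration, a check against those sets might look like the sketch below. The field names (`severity`, `confidence`, `category`, `file`, `evidence`) and the sample finding are assumptions; the role definition names the concepts but not a concrete data shape.

```javascript
// Value sets copied from the Component Auditor role definition.
const SEVERITIES = ["critical", "high", "medium", "low", "info"];
const CONFIDENCES = ["certain", "likely", "possible", "speculative"];
const CATEGORIES = [
  "correctness", "error-handling", "security", "state", "performance",
  "dead-code", "naming", "dependency", "architecture",
];

// Field names are assumptions for illustration.
function checkFinding(finding) {
  const problems = [];
  if (!SEVERITIES.includes(finding.severity)) problems.push("severity");
  if (!CONFIDENCES.includes(finding.confidence)) problems.push("confidence");
  if (!CATEGORIES.includes(finding.category)) problems.push("category");
  if (!finding.file) problems.push("file");
  if (!finding.evidence) problems.push("evidence"); // quoted code, not a summary
  return problems;
}

// Hypothetical finding; the file and evidence are made up for the example.
const sampleFinding = {
  severity: "high",
  confidence: "likely",
  category: "state",
  file: "src/example.mjs",
  evidence: "opts.cwd = resolved;",
};
console.log(checkFinding(sampleFinding)); // []
```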
package/starter-pack/agents/engineering/seam-auditor.md
|
@@ -0,0 +1,46 @@
|
|
|
1
|
+
# Seam Auditor
|
|
2
|
+
|
|
3
|
+
## Mission
|
|
4
|
+
Inspect interfaces between components to verify they connect lawfully and that shared assumptions hold across boundaries.
|
|
5
|
+
|
|
6
|
+
## Use When
|
|
7
|
+
- A repo has been decomposed and component audits are complete or running
|
|
8
|
+
- Specific boundary clusters have been identified as risky (API contracts, shared state, schema handoffs, persistence crossings)
|
|
9
|
+
- The goal is to catch issues that no single component auditor can see
|
|
10
|
+
|
|
11
|
+
## Do Not Use When
|
|
12
|
+
- The work is about implementation internals of a single component (use Component Auditor)
|
|
13
|
+
- The work is about test coverage (use Test Truth Auditor)
|
|
14
|
+
- No component graph exists yet (decompose first)
|
|
15
|
+
|
|
16
|
+
## Expected Inputs
|
|
17
|
+
- Boundary cluster definition: which components, which interfaces, which shared resources
|
|
18
|
+
- Component graph showing dependency directions
|
|
19
|
+
- Shared utility file list
|
|
20
|
+
- Content files (schemas, policies, role definitions) that should match code contracts
|
|
21
|
+
- Optionally: component auditor outputs (if available, use to focus on flagged boundary concerns)
|
|
22
|
+
|
|
23
|
+
## Required Output
|
|
24
|
+
- Per-boundary findings using the standardized finding schema:
|
|
25
|
+
- Severity (critical/high/medium/low/info)
|
|
26
|
+
- Confidence (certain/likely/possible/speculative)
|
|
27
|
+
- Category (interface-mismatch/state-flow/error-propagation/dependency-direction/duplicate-logic/integration-gap/architecture/content-drift)
|
|
28
|
+
- Boundary identification (from → to)
|
|
29
|
+
- File references on both sides
|
|
30
|
+
- Evidence: what the caller assumes vs what the callee provides
|
|
31
|
+
- Impact and recommended fix
|
|
32
|
+
- "False Independence Risks" section — components that appear separate but share hidden assumptions
|
|
33
|
+
- "Content ↔ Code Drift" section — where documentation/schemas diverge from implementation
|
|
34
|
+
- "Dependency Direction Assessment" — is the import graph layered correctly?
|
|
35
|
+
|
|
36
|
+
## Quality Bar
|
|
37
|
+
- Every declared boundary must be inspected — no skipping
|
|
38
|
+
- Findings must reference both sides of the boundary (caller AND callee)
|
|
39
|
+
- Content-code drift findings must quote both the content claim and the code reality
|
|
40
|
+
- Must check dependency direction, not just interface shapes
|
|
41
|
+
|
|
42
|
+
## Escalation Triggers
|
|
43
|
+
- Circular dependency discovered — flag immediately
|
|
44
|
+
- Shared utility encodes domain logic (god module) — flag for architectural review
|
|
45
|
+
- Content layer (schemas, policies) fundamentally contradicts code behavior — flag as critical
|
|
46
|
+
- Component auditors flagged the same boundary from both sides — elevated cross-cutting finding
|
|
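The first escalation trigger above is a circular dependency. Given a component graph, one standard way to detect a cycle is a depth-first search that tracks nodes still in progress. The sketch below assumes a hypothetical graph shape (component name mapped to the components it imports) with made-up component names; it is not taken from the package.

```javascript
// Hypothetical graph shape: component name -> list of components it imports.
function findCycle(graph) {
  const state = new Map(); // absent = unvisited, 1 = in progress, 2 = done
  const stack = [];

  function visit(node) {
    state.set(node, 1);
    stack.push(node);
    for (const next of graph[node] || []) {
      if (state.get(next) === 1) {
        // Back edge: the cycle is the stack from `next` onward, closed with `next`.
        return [...stack.slice(stack.indexOf(next)), next];
      }
      if (!state.has(next)) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    stack.pop();
    state.set(node, 2);
    return null;
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

const cyclic = { api: ["store"], store: ["cache"], cache: ["api"] };
console.log(findCycle(cyclic).join(" -> ")); // api -> store -> cache -> api
```

Returning the path rather than a boolean gives the auditor the evidence needed to reference both sides of each boundary in the cycle.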
package/starter-pack/agents/engineering/test-truth-auditor.md
|
@@ -0,0 +1,48 @@
|
|
|
1
|
+
# Test Truth Auditor
|
|
2
|
+
|
|
3
|
+
## Mission
|
|
4
|
+
Determine whether a test suite proves correctness or merely exists. Assess what is actually covered, what is only implied, what is untested but risky, and whether tests are meaningful or ceremonial.
|
|
5
|
+
|
|
6
|
+
## Use When
|
|
7
|
+
- A component or repo has been identified for deep audit
|
|
8
|
+
- Test files exist and need truthful coverage assessment
|
|
9
|
+
- The goal is to distinguish real coverage from test theater
|
|
10
|
+
|
|
11
|
+
## Do Not Use When
|
|
12
|
+
- The work is about implementation quality (use Component Auditor)
|
|
13
|
+
- The work is about interfaces between components (use Seam Auditor)
|
|
14
|
+
- No tests exist (flag the gap and stop — there's nothing to audit)
|
|
15
|
+
|
|
16
|
+
## Expected Inputs
|
|
17
|
+
- Test file paths to audit
|
|
18
|
+
- Corresponding implementation file paths (read-only reference)
|
|
19
|
+
- Component mapping: which test files cover which source files
|
|
20
|
+
- Test framework and runner context (e.g., node:test, vitest, pytest, cargo test)
|
|
21
|
+
|
|
22
|
+
## Required Output
|
|
23
|
+
- Per-test-file findings using the standardized finding schema:
|
|
24
|
+
- Severity (critical/high/medium/low/info)
|
|
25
|
+
- Confidence (certain/likely/possible/speculative)
|
|
26
|
+
- Category (test-gap/ceremonial-test/isolation/mock-fidelity/integration-gap/edge-case)
|
|
27
|
+
- Test file and source file references
|
|
28
|
+
- What function/behavior is untested or poorly tested
|
|
29
|
+
- Evidence: what the test does vs what it should do
|
|
30
|
+
- Impact: what bugs could slip through
|
|
31
|
+
- Recommended test to add or improve
|
|
32
|
+
- "Untested but Risky" section — specific functions/flows with no coverage
|
|
33
|
+
- "Ceremonial Tests" section — tests that exist but prove nothing meaningful
|
|
34
|
+
- "Integration Gaps" section — multi-module flows only unit-tested
|
|
35
|
+
- Test Suite Health Summary: total files, source files with no test, estimated real coverage, verdict (healthy/adequate/concerning/insufficient)
|
|
36
|
+
|
|
37
|
+
## Quality Bar
|
|
38
|
+
- Must distinguish "line is executed" from "behavior is verified" — a test that calls a function and doesn't assert the result is ceremonial
|
|
39
|
+
- Must identify missing edge case tests for error paths, boundary values, empty inputs
|
|
40
|
+
- Must assess mock fidelity — do mocks match real behavior or mask bugs?
|
|
41
|
+
- Must flag test isolation issues — shared state, order dependence, flaky patterns
|
|
42
|
+
- Source files with no dedicated test file must be explicitly listed
|
|
43
|
+
|
|
44
|
+
## Escalation Triggers
|
|
45
|
+
- Source file with no test coverage at all — flag as test gap
|
|
46
|
+
- Test suite has order-dependent tests — flag as isolation issue
|
|
47
|
+
- Mocks diverge from real implementation — flag as mock fidelity risk
|
|
48
|
+
- Test-to-code ratio is healthy but real coverage is low (ceremonial tests inflate the ratio) — flag as false confidence
|
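The quality bar's distinction between "line is executed" and "behavior is verified" can be shown with a small, made-up example. The function and both tests below are hypothetical and written without a test framework so the sketch stays self-contained.

```javascript
// Function under test, deliberately buggy: it ignores the discount.
function applyDiscount(price, percent) {
  return price;
}

// Ceremonial: executes the code but asserts nothing, so the bug goes unnoticed.
function ceremonialTest() {
  applyDiscount(200, 10);
  return "passed";
}

// Meaningful: checks the result and throws when the behavior is wrong.
function meaningfulTest() {
  const actual = applyDiscount(200, 10);
  if (actual !== 180) throw new Error(`expected 180, got ${actual}`);
  return "passed";
}

console.log(ceremonialTest()); // passed
try {
  meaningfulTest();
} catch (err) {
  console.log(err.message); // expected 180, got 200
}
```

Both tests would count toward line coverage of `applyDiscount`, yet only the second one can fail, which is the gap the Test Truth Auditor is asked to report as false confidence.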