role-os 2.3.1 → 2.5.0

package/README.md CHANGED
@@ -1,387 +1,387 @@
- <p align="center">
- <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
- </p>
-
- <p align="center">
- <img src="https://raw.githubusercontent.com/mcp-tool-shop-org/brand/main/logos/role-os/readme.png" alt="Role OS" width="600">
- </p>
-
- <p align="center">
- <a href="https://github.com/mcp-tool-shop-org/role-os/actions"><img src="https://github.com/mcp-tool-shop-org/role-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
- <a href="https://www.npmjs.com/package/role-os"><img src="https://img.shields.io/npm/v/role-os" alt="npm"></a>
- <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License"></a>
- <a href="https://mcp-tool-shop-org.github.io/role-os/"><img src="https://img.shields.io/badge/Landing_Page-live-brightgreen" alt="Landing Page"></a>
- </p>
-
- A multi-Claude operating system that staffs, routes, validates, and runs work through 61 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict. Includes dynamic dispatch for manifest-scaled missions — a 10-component repo automatically becomes 28 auditor steps, not 6. The dogfood swarm mission runs multi-pass convergence: three health stages then iterative feature delivery with exclusive file ownership and build gates.
-
- ## What it does
-
- Role OS is the professional way to use multi-Claude. It prevents the specific failures that generic AI workflows produce:
-
- - **Drift** — roles stay in lane. Product doesn't redesign. Frontend doesn't redefine scope. Backend doesn't invent product direction.
- - **False completion** — the done definition is concrete. Work that hides gaps, skips verification, or solves a different problem gets rejected.
- - **Contamination** — forked or inherited projects carry identity residue. Role OS detects and rejects cross-project drift in terminology, visuals, and mental models.
- - **Vibes-based progress** — every handoff is structured. Every verdict ties to evidence. "It feels done" is not a valid state.
-
- ## How it works
-
- Describe your task. Role OS decides the right level of orchestration automatically.
-
- ```bash
- roleos start "fix the crash in save handler"
- # → MISSION: Bugfix & Diagnosis (70% confidence)
- # Chain: Repo Researcher → Backend Engineer → Test Engineer → Critic Reviewer
-
- roleos start "add a new export command"
- # → PACK: Feature Build (50% confidence)
- # Roles: Orchestrator, Product Strategist, Spec Writer, Backend Engineer, Test Engineer, Critic Reviewer
-
- roleos start "something completely novel"
- # → FREE-ROUTING (10% confidence)
- # Hint: Create a packet and run `roleos route` for role-level routing
- ```
-
- **The fallback ladder:**
-
- 1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research, brainstorm, deep-audit, dogfood-swarm). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
- 2. **Pack** — when the task is a known family but not a full mission shape. 10 calibrated team packs with auto-selection and mismatch guards.
- 3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 61 roles against packet content and assembles a dynamic chain.
-
- The system never forces work through the wrong abstraction. It explains why it chose each level and offers alternatives.
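Review note on the fallback ladder in the README text above: the mission, pack, free-routing order can be read as a threshold walk over confidence scores. The sketch below is only an illustration of that reading. The function name `chooseEntryLevel`, the input shape, and the threshold values (0.6 and 0.4) are assumptions picked so that the README's sample confidences (70%, 50%, 10%) land on the levels shown; they are not taken from Role OS itself.

```javascript
// Hypothetical sketch of a mission -> pack -> free-routing ladder.
// Thresholds and field names are illustrative assumptions, not Role OS internals.
const THRESHOLDS = { mission: 0.6, pack: 0.4 };

function chooseEntryLevel({ mission, pack }) {
  // Prefer the most structured level whose best match is confident enough.
  if (mission && mission.confidence >= THRESHOLDS.mission) {
    return { level: "mission", name: mission.name, confidence: mission.confidence };
  }
  if (pack && pack.confidence >= THRESHOLDS.pack) {
    return { level: "pack", name: pack.name, confidence: pack.confidence };
  }
  // Nothing matched well enough: fall through to role-level routing.
  const best = Math.max(mission ? mission.confidence : 0, pack ? pack.confidence : 0);
  return { level: "free-routing", name: null, confidence: best };
}

const decision = chooseEntryLevel({
  mission: { name: "bugfix", confidence: 0.7 },
  pack: { name: "bugfix", confidence: 0.5 },
});
console.log(decision.level, decision.name);
```

Checking the most structured level first mirrors the README's statement that work is never forced through the wrong abstraction: a weak mission match drops to a pack, and a weak pack match drops to free routing.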
-
- **One command to active execution:**
-
- ```bash
- roleos run "fix the crash in save handler"
- # → Created run: run-1234
- # → Entry: MISSION (bugfix)
- # → Started step 0: Repo Researcher → diagnosis-report
- # → Guidance: Required sections: entrypoints, module-map, build-test-commands
-
- roleos next # Start the next step
- roleos complete diagnosis.md # Complete the active step with artifact
- roleos explain # Show full run state and guidance
- roleos resume # Continue an interrupted run
- roleos report # Generate completion report
- roleos friction # Measure operator touches
- ```
-
- **Interventions when things go wrong:**
-
- ```bash
- roleos retry 0 # Retry a failed step
- roleos reroute 1 "Frontend Developer" "UI bug" # Swap a role
- roleos escalate "Test Engineer" "Repo Researcher" "missed edge case" "re-diagnose"
- roleos block 2 "waiting for API spec"
- roleos reopen 0 "found issue in review"
- ```
-
- Runs persist to disk (`.claude/runs/`), so interrupted sessions resume cleanly. Every step includes operator guidance: what to produce, required sections, and stop conditions.
-
- **Once routed:**
-
- 1. **Each role produces a handoff** — structured output with evidence items that reduce ambiguity for the next role
- 2. **Critic reviews against contract** — accepts, rejects, or blocks based on structured evidence, not impression
- 3. **Recovery routes automatically** — blocked or rejected work gets routed to the right resolver with a reason, recovery type, and required artifact
-
- ## Org rollout state
-
- Org-wide rollout state (queue, decisions, audit records, per-repo lock packets) lives in a separate private repo: [`role-os-rollout`](https://github.com/mcp-tool-shop-org/role-os-rollout). This repo is the product; that repo is operational state.
-
- ## Memory and continuity
-
- Role OS does not own or duplicate the memory layer. Where Claude project memory exists, it is the canonical continuity system — repo facts, decisions, open loops, and treatment history live there.
-
- Role OS integrates with Claude project memory. It does not replace it.
-
- ## Full treatment and shipcheck
-
- Full treatment is a canonical 7-phase protocol defined in Claude project memory (`memory/full-treatment.md`). Role OS routes and reviews treatments using role contracts, handoffs, and critic gates — it does not redefine the protocol.
-
- **Shipcheck** is the 31-item quality gate that runs before full treatment. Hard gates A-D must pass before any treatment begins. Canonical reference: `memory/shipcheck.md`.
-
- Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gates.
-
- ## 61 roles across 10 packs
-
- | Pack | Roles |
- |------|-------|
- | **Core** (3) | Orchestrator, Product Strategist, Critic Reviewer |
- | **Engineering** (7) | Frontend Developer, Backend Engineer, Test Engineer, Refactor Engineer, Performance Engineer, Dependency Auditor, Security Reviewer |
- | **Design** (2) | UI Designer, Brand Guardian |
- | **Marketing** (1) | Launch Copywriter |
- | **Treatment** (7) | Repo Researcher, Repo Translator, Docs Architect, Metadata Curator, Coverage Auditor, Deployment Verifier, Release Engineer |
- | **Product** (3) | Feedback Synthesizer, Roadmap Prioritizer, Spec Writer |
- | **Research** (4) | UX Researcher, Competitive Analyst, Trend Researcher, User Interview Synthesizer |
- | **Growth** (4) | Launch Strategist, Content Strategist, Community Manager, Support Triage Lead |
- | **Deep Audit** (4) | Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer |
- | **Swarm** (7) | Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer |
-
- Every role has a full contract: mission, use when, do not use when, expected inputs, required outputs, quality bar, and escalation triggers. Every role is routable — `roleos route` can recommend any of them based on packet content.
-
- ## Quick start
-
- ```bash
- npx role-os init
-
- # Describe what you need — Role OS picks the right level:
- roleos run "fix the crash in save handler"
- # → Creates run, picks bugfix mission, starts first step with guidance
-
- # Step through:
- roleos next # Start next step
- roleos complete artifact.md # Complete with artifact
- roleos explain # Show full state
- roleos report # Completion report
-
- # Deep audit:
- roleos audit manifest --generate # Create audit-manifest.json
- roleos audit # Start component-level deep audit
- roleos audit status # Check audit progress
- roleos audit verify # Verify manifest and outputs
-
- # Dogfood swarm:
- roleos swarm manifest --generate # Auto-detect domains from repo structure
- roleos swarm # Start multi-pass convergence swarm
- roleos swarm status # Check swarm progress by stage
- roleos swarm findings # List findings by severity
- roleos swarm approve # Approve feature gate
-
- # Or go manual:
- roleos start "fix the crash" # Entry decision only (no run)
- roleos packet new feature
- roleos route .claude/packets/my-feature.md
- roleos review .claude/packets/my-feature.md accept
-
- # Explore missions and packs:
- roleos mission list
- roleos packs list
- ```
-
- ## When not to use Role OS
-
- - Single-line fixes, typos, or obvious bugs
- - Exploratory research with no defined output
- - Work that fits in one person's head in 5 minutes
- - Emergency hotfixes that need to ship before a review chain completes
- - Projects where you want speed over structure
-
- ## Evidence
-
- Role OS was proven across three trial shapes in two structurally different repos:
-
- **Trial 001 — Feature work** (Crew Screen, Star Freight)
- - 7-role chain, 45 test scenarios, 0 role collisions
- - Prevented contamination from fork ancestor, caught inline invention, surfaced honest blockers
-
- **Trial 002 — Integration work** (CampaignState wiring, Star Freight)
- - 5-role chain, resolved architectural seam without fallback lies
- - Anti-fallback tests proved the live path is real, not placeholder
-
- **Trial 003 — Identity work** (Contamination purge, Star Freight)
- - 6-role chain, 51 test scenarios including durable CI contamination defense
- - Repaired inherited fiction drift without collapsing into broad redesign
-
- **Portability trial** (Persona consistency, sensor-humor)
- - Same spine, different language/domain/stack
- - Adopted with context changes only — no core contract modifications
-
- **Full treatment FT-001** (portlight-desktop)
- - 7-phase staffed treatment with Treatment Pack roles
- - Shipcheck gating proven, zero role collisions
-
- **Full treatment FT-002** (studioflow)
- - Same treatment pack, structurally different repo (creative workspace vs game)
- - Treatment Pack portable — no contract modifications needed
-
- **Brainstorm golden run** (MCP server marketplace topic)
- - 9-role chain, 4 analysts in parallel, cross-examine + rebut dispute graph
- - 4 challenges issued, 3 claims narrowed, 1 unresolved — healthy pressure, not deadlock
- - 16+ trace links from rendered artifacts back to truth-layer atoms
- - Full chain of custody proven: truth → atoms → dispute → synthesis → expand → judge → render → trace
-
- ## Core properties
-
- These are non-negotiable. If a change weakens any of them, reject it.
-
- - Role boundaries hold
- - Review has teeth
- - Escalation stays honest
- - Packets stay testable
- - Portability requires context adaptation, not core surgery
-
- ## Project structure
-
- ```
- role-os/
- bin/roleos.mjs ← CLI entrypoint
- src/
- entry.mjs ← Unified entry: mission → pack → free routing
- entry-cmd.mjs ← `roleos start` CLI command
- run.mjs ← Persistent run engine: create → step → pause → resume → report
- run-cmd.mjs ← `roleos run/resume/next/explain/complete/fail` + interventions
- mission.mjs ← 9 named mission types (feature, bugfix, treatment, docs, security, research, brainstorm, deep-audit, dogfood-swarm)
- mission-run.mjs ← Mission runner: create → step → complete → report
- mission-cmd.mjs ← `roleos mission` CLI commands
- audit-cmd.mjs ← `roleos audit` — deep audit entry point with manifest generation
- swarm-cmd.mjs ← `roleos swarm` — dogfood swarm entry point with domain detection
- swarm/ ← Domain detection, build gate, evidence persistence bridge
- route.mjs ← 61-role routing + dynamic chain builder
- packs.mjs ← 10 calibrated team packs + auto-selection
- conflicts.mjs ← 4-pass conflict detection
- escalation.mjs ← Auto-routing for blocked/rejected/split
- evidence.mjs ← Structured evidence + role-aware requirements
- dispatch.mjs ← Runtime dispatch manifests for multi-claude
- tool-profiles.mjs ← Per-role tool sandboxing (shared by dispatch + trial)
- state-machine.mjs ← Canonical step/run transition maps
- artifacts.mjs ← Per-role artifact contracts + pack handoffs
- decompose.mjs ← Composite task detection + splitting
- composite.mjs ← Dependency-ordered execution + recovery + cycle detection
- replan.mjs ← Mid-run adaptive replanning
- calibration.mjs ← Outcome recording + weight tuning
- hooks.mjs ← 5 lifecycle hooks for runtime enforcement
- session.mjs ← Session scaffolding + doctor
- brainstorm.mjs ← Evidence modes, request validation, finding/synthesis/judge schemas
- brainstorm-roles.mjs ← Role-native schemas, input partitioning, blindspot enforcement, cross-exam
- brainstorm-render.mjs ← Two-layer rendering: lexical bans, render schemas, debate transcript
- test/ ← 1150 tests across 37 test files
- starter-pack/ ← Drop-in role contracts, policies, schemas, workflows
- ```
-
- ## Security
-
- Role OS operates **locally only**. It copies markdown templates and writes packet/verdict files to your repository's `.claude/` directory. It does not access the network, handle secrets, or collect telemetry. No dangerous operations — all file writes use skip-if-exists by default. See [SECURITY.md](SECURITY.md) for the full policy.
-
- ## The operating system
-
- | Layer | What it does | Status |
- |-------|-------------|--------|
- | **Routing** | Scores all 61 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
- | **Chain builder** | Assembles phase-ordered chains from scored roles, packet-type biased not template-locked | ✓ Shipped |
- | **Conflict detection** | 4-pass validation: hard conflicts, sequence, redundancy, coverage gaps. Repair suggestions. | ✓ Shipped |
- | **Escalation** | Auto-routes blocked/rejected/split work to the right resolver with reason + required artifact | ✓ Shipped |
- | **Evidence** | Role-aware structured evidence in verdicts. Sufficiency checks. 12 evidence kinds. | ✓ Shipped |
- | **Dispatch** | Generates execution manifests for multi-claude. Per-role tool profiles, system prompts, budgets. | ✓ Shipped |
- | **Trials** | Full roster proven: 30/30 gold-task + 5/5 negative trials. 7 pack trials complete. | ✓ Complete |
- | **Team Packs** | 10 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
- | **Outcome calibration** | Records run outcomes, tunes pack/role weights from results, adjusts confidence thresholds. | ✓ Shipped |
- | **Mixed-task decomposition** | Detects composite work, splits into child packets, assigns packs, preserves dependencies. | ✓ Shipped |
- | **Composite execution** | Runs child packets in dependency order with artifact passing, branch recovery, and synthesis. | ✓ Shipped |
- | **Adaptive replanning** | Mid-run scope changes, findings, or new requirements update the plan without restarting. | ✓ Shipped |
- | **Session spine** | `roleos init claude` scaffolds CLAUDE.md, /roleos-route, /roleos-review, /roleos-status. `roleos doctor` verifies wiring. Route cards prove engagement. | ✓ Shipped |
- | **Hook spine** | 5 lifecycle hooks (SessionStart, PromptSubmit, PreToolUse, SubagentStart, Stop). Advisory enforcement: route card reminders, write-tool gating, subagent role injection, completion audit. | ✓ Shipped |
- | **Artifact spine** | Per-role artifact contracts. Pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
- | **Mission library** | 9 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm, deep-audit, dogfood-swarm). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. | ✓ Shipped |
- | **Mission runner** | Create runs, step through with tracked state, complete/fail with honest reporting. Blocked-step propagation, out-of-chain escalation warnings, last-step re-opening. | ✓ Shipped |
- | **Unified entry** | `roleos start` decides mission vs pack vs free routing automatically. Fallback ladder with confidence scores, alternatives, and composite detection. | ✓ Shipped |
- | **Persistent runs** | `roleos run` creates disk-backed runs. `resume`, `next`, `explain`, `complete`, `fail`. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance. Friction measurement. | ✓ Shipped |
- | **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run proven. | ✓ Shipped |
- | **Deep Audit** | Manifest-scaled repo audit: decompose repo into components, dispatch N auditors + M test truth auditors + K seam auditors from dependency graph, synthesize into ranked verdict and action plan. Dynamic dispatch scales with repo size (2N + K + 3 formula). Runner-native with artifact validation at every step. | ✓ Shipped |
- | **Dogfood Swarm** | Multi-pass convergence: three health stages (bug/security → proactive → humanization) then feature pass. Exclusive file ownership, build gates after every wave, user checkpoints. Domain auto-detection generates manifests. Evidence bridge to dogfood-labs. | ✓ Shipped |
-
- ## 9 missions
-
- | Mission | Pack | Roles | When to use |
- |---------|------|-------|-------------|
- | `feature-ship` | feature | 5 | Full feature delivery: scope → spec → implement → test → review |
- | `bugfix` | bugfix | 4 | Diagnose root cause, fix, test, verify |
- | `treatment` | treatment | 4 | Shipcheck + polish + docs + CI verify + review |
- | `docs-release` | docs | 2 | Write/update documentation, release notes |
- | `security-hardening` | security | 4 | Threat model, audit, fix vulnerabilities, re-audit, verify |
- | `research-launch` | research | 4 | Frame question, research, document findings, decide |
- | `brainstorm` | brainstorm | 9 | Structured multi-perspective inquiry with traceable disagreement and verdict |
- | `deep-audit` | deep-audit | 5 (scales) | Manifest-backed repo audit — worker count scales with repo graph via dynamic dispatch |
- | `dogfood-swarm` | swarm | 8 (scales) | Multi-pass convergence: health-a → health-b → health-c → feature → final synthesis |
-
- Each mission includes honest-partial definitions — when work stalls, the system documents what was completed and what remains instead of bluffing completion.
-
- ### Brainstorm mission
-
- Not "AI brainstorming." The brainstorm mission is **specialized roles under law, with traceable disagreement and verdict-bearing output.**
-
- ```bash
- roleos run "explore product directions for a developer tool discovery platform"
- # → MISSION: Brainstorm (Structured Inquiry)
- # Chain: 4 Analysts (parallel) → Normalize → Cross-Examine → Rebut → Synthesize → Expand → Judge
- ```
-
- **What makes it different:**
-
- - **Layer 1 (truth):** Four analysts emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap) — not shared prose. Each role is blindspot-enforced: forbidden phrases, forbidden claim kinds, filtered input partitions. Atoms carry provenance. A directed cross-examination graph produces targeted challenges. Original analysts defend, narrow, or retract under pressure.
-
- - **Layer 2 (render):** Five distinct human voices (Boundary Memo, Field Notes, System Sketch, Claim Brief, Cross-Exam Transcript) with lexical bans preventing voice convergence. Synthesis consumes truth, never rendered prose. Both layers always available.
-
- - **Chain of custody:** Every rendered sentence traces back to a truth-layer atom. Synthesis directions cite atoms. Cross-exam targets real claim IDs. The dispute graph is the product, not the prose.
-
- **Proven:** v0.4 golden run — full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
-
- ### Deep audit mission
-
- Not a surface scan. The deep audit mission **decomposes a repo into bounded components and dispatches specialist auditors at a scale determined by the repo's own dependency graph.**
-
- ```bash
- roleos run "deep audit this repo" --manifest=audit-manifest.json
- # → MISSION: Deep Audit (Manifest-Scaled)
- # Steps: Component Auditor ×6 + Test Truth Auditor ×6 + Seam Auditor ×8 + Synthesizer + Action Plan + Critic = 23 steps
- ```
-
- **What makes it different:**
-
- - **Dynamic dispatch** — worker count is not fixed. A 10-component repo with 5 boundary clusters produces 28 steps (2×10 + 5 + 3). A 3-component repo produces 12. The scaling formula is `2N + K + 3` where N = components, K = boundaries.
- - **Manifest-backed parcels** — an `audit-manifest.json` defines components (with file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). Each auditor receives only its parcel.
- - **Four role archetypes** — Component Auditor (code truth per module), Test Truth Auditor (tests that prove vs tests that exist), Seam Auditor (integration boundaries from the dependency graph), Audit Synthesizer (ranked verdict + action plan from all parcels).
- - **Artifact validation at every step** — `validateArtifact()` fires on every step completion in both execution paths. Results attached to step objects. The system knows whether each artifact met its contract.
- - **Honest partial** — when budget or scope blocks completion, per-component findings are individually valid. The system synthesizes from whatever completed, never bluffs full coverage.
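Review note on the `2N + K + 3` scaling formula quoted in the README text above: the arithmetic can be checked with a few lines of JavaScript. This is a sketch for verifying the documented numbers, and the function name `deepAuditStepCount` is made up for illustration; it is not claimed to exist in the package. The three fixed steps correspond to the Synthesizer, Action Plan, and Critic entries in the sample run output.

```javascript
// Step count for a manifest-scaled deep audit, per the README's 2N + K + 3 formula:
// N component auditors + N test truth auditors + K seam auditors + 3 fixed closing steps.
// The function name is hypothetical; only the formula comes from the README.
function deepAuditStepCount(components, boundaries) {
  const valid = (n) => Number.isInteger(n) && n >= 0;
  if (!valid(components) || !valid(boundaries)) {
    throw new RangeError("components and boundaries must be non-negative integers");
  }
  return 2 * components + boundaries + 3;
}

console.log(deepAuditStepCount(10, 5)); // 28, the README's 10-component example
console.log(deepAuditStepCount(6, 8)); // 23, matching the sample run output
```

The README's "3-component repo produces 12" figure matches the formula only when K = 3, a boundary count the text does not state explicitly.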
-
- **Proven:** Runner-native proof run — 18 tests against real manifest, full lifecycle verified including escalation re-opening and partial failure. Scaling formula verified for 3/6/10/15-component manifests.
-
- ### Dogfood swarm mission
-
- Not a one-pass linter. The dogfood swarm mission **runs a multi-pass convergence protocol that moves a repo from "works" to "production-ready" through three health stages and iterative feature delivery.**
-
- ```bash
- roleos swarm
- # → MISSION: Dogfood Swarm (Multi-Pass Convergence)
- # Stages: Health-A → Health-B → Health-C → Feature → Final
- # Domain agents: 3-5 parallel per wave (exclusive file ownership)
- ```
-
- **What makes it different:**
-
- - **Three-stage health pass** — Stage A fixes bugs and security issues (loop until 0 CRITICAL + 0 HIGH). Stage B applies proactive hardening (user reviews findings). Stage C humanizes the codebase — error messages that help users, reconnection feedback, loading states, accessibility. Each stage is a distinct lens, not the same scan repeated.
- - **Exclusive file ownership** — every domain agent owns specific files via `swarm-manifest.json`. No two agents edit the same file. No merge conflicts. No coordination overhead.
- - **Build gates** — lint + typecheck + test must pass after every wave. The system auto-detects the build system (Node, Rust, Python, Go) and runs the right commands.
- - **User checkpoints** — Health-B and the feature pass require explicit user approval before execution. The system presents findings, the user decides what to build.
- - **Iterative convergence** — stages loop with wave loops until exit conditions are met or max iterations reached. Each wave re-audits from scratch to catch regressions introduced by previous fixes.
- - **Domain auto-detection** — `roleos swarm manifest --generate` detects repo type (CLI, web, desktop, MCP, monorepo) and generates non-overlapping domain assignments.
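Review note on the exclusive file ownership rule in the README text above: "no two agents edit the same file" is a property that can be checked mechanically before a wave starts. The sketch below shows one way to do that. The README does not show the real `swarm-manifest.json` schema, so the manifest shape used here (`domains`, `agent`, `files`) and the function name `findOwnershipConflicts` are assumptions made purely for illustration.

```javascript
// Hypothetical manifest shape: { domains: [{ agent: string, files: string[] }] }.
// The actual swarm-manifest.json schema is not shown in the README.
function findOwnershipConflicts(manifest) {
  const ownerByFile = new Map(); // file path -> first agent that claimed it
  const conflicts = [];
  for (const { agent, files } of manifest.domains) {
    for (const file of files) {
      const existing = ownerByFile.get(file);
      if (existing !== undefined && existing !== agent) {
        // A second agent claims a file that is already owned.
        conflicts.push({ file, owners: [existing, agent] });
      } else {
        ownerByFile.set(file, agent);
      }
    }
  }
  return conflicts;
}

const sample = {
  domains: [
    { agent: "backend", files: ["src/run.mjs", "src/route.mjs"] },
    { agent: "tests", files: ["test/run.test.mjs", "src/route.mjs"] },
  ],
};
console.log(findOwnershipConflicts(sample));
```

An empty result means the assignments are non-overlapping; any entry names the contested file and both claimants, which is enough to reject the manifest before agents start editing.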
-
- **Proven:** claude-collaborate (2026-03-28) — 35→129 tests, 106 health findings fixed, v1.1.0 shipped. Protocol v2.0 with 9 phases.
-
- ## Status
-
- - v0.1–v0.4: Foundation — trials, adoption, treatment pack, starter pack
- - v1.0.0: 32 roles, full CLI, proven treatment, multi-repo portability
- - v1.0.2: Role OS lockdown (bootstrap truth fixes, init --force)
- - v1.1.0: 31 roles, full routing spine, conflict detection, escalation, evidence, dispatch, 7 proven team packs. 35 execution trials. 212 tests.
- - v1.2.0: Calibrated packs promoted to default entry. Auto-selection, mismatch detection, alternative suggestion, free-routing fallback. 246 tests.
- - v1.3.0: Outcome calibration, mixed-task decomposition, composite execution, adaptive replanning. 317 tests.
- - v1.4.0: Session spine — `roleos init claude`, `roleos doctor`, route cards, /roleos-route + /roleos-review + /roleos-status commands. 335 tests.
- - v1.5.0: Hook spine — 5 lifecycle hooks for runtime enforcement. 358 tests.
- - v1.6.0: Artifact spine — 20 per-role artifact contracts, 7 pack handoff contracts, structural validation. 385 tests.
- - v1.7.0: Completion proof — real tasks run through the full stack. `roleos artifacts` CLI. Honest escalation on structural fixes. 398 tests.
- - v1.8.0: Mission library (Phase S) — 6 named missions, runner engine, completion reports. Hardened from 6 real trial runs. 481 tests.
- - v1.9.0: Unified entry path (Phase T) — `roleos start` auto-decides mission vs pack vs free routing. Fallback ladder, composite detection, entry-path comparison trials. 527 tests.
- - **v2.0.0**: Operator friction pass (Phase U) — `roleos run` creates persistent disk-backed runs. Resume, next, explain, complete, fail. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance at every step. Friction measurement. 6 friction trials. 613 tests.
- - **v2.0.1**: Handbook audit, beginner docs, test count corrections. 617 tests.
- - **v2.1.0**: Brainstorm mission (v0.4) — specialized roles under law, traceable disagreement, verdict-bearing output. Two-layer architecture (truth + render), cross-exam permission matrix, dispute graph, golden run proof. 7 missions, 50 roles, 8 packs. 894 tests.
- - **v2.2.0**: Deep Audit mission — manifest-scaled repo audit with dynamic dispatch. 4 new audit roles (Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer). Worker count scales with repo graph (2N + K + 3 formula). Artifact validation wired at both execution boundaries. Runner-native proof run green. accept/approve truth fix in evidence layer. 8 missions, 54 roles, 9 packs. 936 tests.
- - **v2.3.0**: Dogfood Swarm mission — multi-pass convergence (health-a → health-b → health-c → feature → final). 7 new swarm roles (Swarm Coordinator, 5 domain agents, Swarm Synthesizer). Two new mission primitives: waveLoops (iterative convergence) and exclusiveOwnership (domain file boundaries). Dynamic domain dispatch, build gates, `roleos swarm` CLI, domain auto-detection, evidence persistence bridge. 9 missions, 61 roles, 10 packs. 1150 tests.
-
- ## License
-
- MIT
-
- ---
-
- Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>
1
+ <p align="center">
2
+ <a href="README.ja.md">日本語</a> | <a href="README.zh.md">中文</a> | <a href="README.es.md">Español</a> | <a href="README.fr.md">Français</a> | <a href="README.hi.md">हिन्दी</a> | <a href="README.it.md">Italiano</a> | <a href="README.pt-BR.md">Português (BR)</a>
3
+ </p>
4
+
5
+ <p align="center">
6
+ <img src="https://raw.githubusercontent.com/mcp-tool-shop-org/brand/main/logos/role-os/readme.png" alt="Role OS" width="600">
7
+ </p>
8
+
9
+ <p align="center">
10
+ <a href="https://github.com/mcp-tool-shop-org/role-os/actions"><img src="https://github.com/mcp-tool-shop-org/role-os/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
11
+ <a href="https://www.npmjs.com/package/role-os"><img src="https://img.shields.io/npm/v/role-os" alt="npm"></a>
12
+ <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue" alt="MIT License"></a>
13
+ <a href="https://mcp-tool-shop-org.github.io/role-os/"><img src="https://img.shields.io/badge/Landing_Page-live-brightgreen" alt="Landing Page"></a>
14
+ </p>
15
+
16
+ A multi-Claude operating system that staffs, routes, validates, and runs work through 61 specialized role contracts. Creates task packets, assembles the right team from scored role matching, detects broken chains before execution, auto-routes recovery when work is blocked or rejected, and requires structured evidence in every verdict. Includes dynamic dispatch for manifest-scaled missions — a 10-component repo automatically becomes 28 auditor steps, not 6. The dogfood swarm mission runs multi-pass convergence: three health stages then iterative feature delivery with exclusive file ownership and build gates.
17
+
18
+ ## What it does
19
+
20
+ Role OS is the professional way to use multi-Claude. It prevents the specific failures that generic AI workflows produce:
21
+
22
+ - **Drift** — roles stay in lane. Product doesn't redesign. Frontend doesn't redefine scope. Backend doesn't invent product direction.
23
+ - **False completion** — the done definition is concrete. Work that hides gaps, skips verification, or solves a different problem gets rejected.
24
+ - **Contamination** — forked or inherited projects carry identity residue. Role OS detects and rejects cross-project drift in terminology, visuals, and mental models.
25
+ - **Vibes-based progress** — every handoff is structured. Every verdict ties to evidence. "It feels done" is not a valid state.
26
+
27
+ ## How it works
28
+
29
+ Describe your task. Role OS decides the right level of orchestration automatically.
30
+
31
+ ```bash
32
+ roleos start "fix the crash in save handler"
33
+ # → MISSION: Bugfix & Diagnosis (70% confidence)
34
+ # Chain: Repo Researcher → Backend Engineer → Test Engineer → Critic Reviewer
35
+
36
+ roleos start "add a new export command"
37
+ # → PACK: Feature Build (50% confidence)
38
+ # Roles: Orchestrator, Product Strategist, Spec Writer, Backend Engineer, Test Engineer, Critic Reviewer
39
+
40
+ roleos start "something completely novel"
41
+ # → FREE-ROUTING (10% confidence)
42
+ # Hint: Create a packet and run `roleos route` for role-level routing
43
+ ```
44
+
45
+ **The fallback ladder:**
46
+
47
+ 1. **Mission** — when the task matches a proven recurring workflow (bugfix, treatment, feature-ship, docs, security, research, brainstorm, deep-audit, dogfood-swarm). Known role chain, artifact flow, escalation branches, and honest-partial definitions.
48
+ 2. **Pack** — when the task is a known family but not a full mission shape. 10 calibrated team packs with auto-selection and mismatch guards.
49
+ 3. **Free routing** — when the task is novel, mixed, or uncertain. Scores all 61 roles against packet content and assembles a dynamic chain.
50
+
51
+ The system never forces work through the wrong abstraction. It explains why it chose each level and offers alternatives.
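
The ladder above can be sketched as a simple threshold check. This is a minimal illustration, not the logic in `src/entry.mjs`: the function name `chooseEntryLevel`, the input shape, and both cutoff values are assumptions, since the README does not state the real thresholds.

```javascript
// Hypothetical cutoffs, chosen only for illustration.
// The real thresholds used by Role OS are not documented here.
const MISSION_THRESHOLD = 0.6;
const PACK_THRESHOLD = 0.4;

// Walk the ladder top-down: mission first, then pack, then free routing.
function chooseEntryLevel({ missionConfidence, packConfidence }) {
  if (missionConfidence >= MISSION_THRESHOLD) {
    return { level: "mission", confidence: missionConfidence };
  }
  if (packConfidence >= PACK_THRESHOLD) {
    return { level: "pack", confidence: packConfidence };
  }
  // Neither a mission nor a pack matched well enough.
  return {
    level: "free-routing",
    confidence: Math.max(missionConfidence, packConfidence),
  };
}
```

With these assumed cutoffs, a mission score of 0.7 selects the mission level, and low scores on both rungs fall through to free routing.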
52
+
53
+ **One command to active execution:**
54
+
55
+ ```bash
56
+ roleos run "fix the crash in save handler"
57
+ # → Created run: run-1234
58
+ # → Entry: MISSION (bugfix)
59
+ # → Started step 0: Repo Researcher → diagnosis-report
60
+ # → Guidance: Required sections: entrypoints, module-map, build-test-commands
61
+
62
+ roleos next # Start the next step
63
+ roleos complete diagnosis.md # Complete the active step with artifact
64
+ roleos explain # Show full run state and guidance
65
+ roleos resume # Continue an interrupted run
66
+ roleos report # Generate completion report
67
+ roleos friction # Measure operator touches
68
+ ```
69
+
70
+ **Interventions when things go wrong:**
71
+
72
+ ```bash
73
+ roleos retry 0 # Retry a failed step
74
+ roleos reroute 1 "Frontend Developer" "UI bug" # Swap a role
75
+ roleos escalate "Test Engineer" "Repo Researcher" "missed edge case" "re-diagnose"
76
+ roleos block 2 "waiting for API spec"
77
+ roleos reopen 0 "found issue in review"
78
+ ```
79
+
80
+ Runs persist to disk (`.claude/runs/`), so interrupted sessions resume cleanly. Every step includes operator guidance: what to produce, required sections, and stop conditions.
81
+
82
+ **Once routed:**
83
+
84
+ 1. **Each role produces a handoff** — structured output with evidence items that reduce ambiguity for the next role
85
+ 2. **Critic reviews against contract** — accepts, rejects, or blocks based on structured evidence, not impression
86
+ 3. **Recovery routes automatically** — blocked or rejected work gets routed to the right resolver with a reason, recovery type, and required artifact
87
+
88
+ ## Org rollout state
89
+
90
+ Org-wide rollout state (queue, decisions, audit records, per-repo lock packets) lives in a separate private repo: [`role-os-rollout`](https://github.com/mcp-tool-shop-org/role-os-rollout). This repo is the product; that repo is operational state.
91
+
92
+ ## Memory and continuity
93
+
94
+ Role OS does not own or duplicate the memory layer. Where Claude project memory exists, it is the canonical continuity system — repo facts, decisions, open loops, and treatment history live there.
95
+
96
+ Role OS integrates with Claude project memory. It does not replace it.
97
+
98
+ ## Full treatment and shipcheck
99
+
100
+ Full treatment is a canonical 7-phase protocol defined in Claude project memory (`memory/full-treatment.md`). Role OS routes and reviews treatments using role contracts, handoffs, and critic gates — it does not redefine the protocol.
101
+
102
+ **Shipcheck** is the 31-item quality gate that runs before full treatment. Hard gates A-D must pass before any treatment begins. Canonical reference: `memory/shipcheck.md`.
103
+
104
+ Order: Shipcheck first, then full treatment. No v1.0.0 without passing hard gates.
105
+
106
+ ## 61 roles across 10 packs
107
+
108
+ | Pack | Roles |
109
+ |------|-------|
110
+ | **Core** (3) | Orchestrator, Product Strategist, Critic Reviewer |
111
+ | **Engineering** (7) | Frontend Developer, Backend Engineer, Test Engineer, Refactor Engineer, Performance Engineer, Dependency Auditor, Security Reviewer |
112
+ | **Design** (2) | UI Designer, Brand Guardian |
113
+ | **Marketing** (1) | Launch Copywriter |
114
+ | **Treatment** (7) | Repo Researcher, Repo Translator, Docs Architect, Metadata Curator, Coverage Auditor, Deployment Verifier, Release Engineer |
115
+ | **Product** (3) | Feedback Synthesizer, Roadmap Prioritizer, Spec Writer |
116
+ | **Research** (4) | UX Researcher, Competitive Analyst, Trend Researcher, User Interview Synthesizer |
117
+ | **Growth** (4) | Launch Strategist, Content Strategist, Community Manager, Support Triage Lead |
118
+ | **Deep Audit** (4) | Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer |
119
+ | **Swarm** (7) | Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer |
120
+
121
+ Every role has a full contract: mission, use when, do not use when, expected inputs, required outputs, quality bar, and escalation triggers. Every role is routable — `roleos route` can recommend any of them based on packet content.
122
+
123
+ ## Quick start
124
+
125
+ ```bash
126
+ npx role-os init
127
+
128
+ # Describe what you need — Role OS picks the right level:
129
+ roleos run "fix the crash in save handler"
130
+ # → Creates run, picks bugfix mission, starts first step with guidance
131
+
132
+ # Step through:
133
+ roleos next # Start next step
134
+ roleos complete artifact.md # Complete with artifact
135
+ roleos explain # Show full state
136
+ roleos report # Completion report
137
+
138
+ # Deep audit:
139
+ roleos audit manifest --generate # Create audit-manifest.json
140
+ roleos audit # Start component-level deep audit
141
+ roleos audit status # Check audit progress
142
+ roleos audit verify # Verify manifest and outputs
143
+
144
+ # Dogfood swarm:
145
+ roleos swarm manifest --generate # Auto-detect domains from repo structure
146
+ roleos swarm # Start multi-pass convergence swarm
147
+ roleos swarm status # Check swarm progress by stage
148
+ roleos swarm findings # List findings by severity
149
+ roleos swarm approve # Approve feature gate
150
+
151
+ # Or go manual:
152
+ roleos start "fix the crash" # Entry decision only (no run)
153
+ roleos packet new feature
154
+ roleos route .claude/packets/my-feature.md
155
+ roleos review .claude/packets/my-feature.md accept
156
+
157
+ # Explore missions and packs:
158
+ roleos mission list
159
+ roleos packs list
160
+ ```
161
+
162
+ ## When not to use Role OS
163
+
164
+ - Single-line fixes, typos, or obvious bugs
165
+ - Exploratory research with no defined output
166
+ - Work that fits in one person's head in 5 minutes
167
+ - Emergency hotfixes that need to ship before a review chain completes
168
+ - Projects where you want speed over structure
169
+
170
+ ## Evidence
171
+
172
+ Role OS was proven across three trial shapes, a portability trial, two full treatments, and a brainstorm golden run:
173
+
174
+ **Trial 001 — Feature work** (Crew Screen, Star Freight)
175
+ - 7-role chain, 45 test scenarios, 0 role collisions
176
+ - Prevented contamination from fork ancestor, caught inline invention, surfaced honest blockers
177
+
178
+ **Trial 002 — Integration work** (CampaignState wiring, Star Freight)
179
+ - 5-role chain, resolved architectural seam without fallback lies
180
+ - Anti-fallback tests proved the live path is real, not placeholder
181
+
182
+ **Trial 003 — Identity work** (Contamination purge, Star Freight)
183
+ - 6-role chain, 51 test scenarios including durable CI contamination defense
184
+ - Repaired inherited fiction drift without collapsing into broad redesign
185
+
186
+ **Portability trial** (Persona consistency, sensor-humor)
187
+ - Same spine, different language/domain/stack
188
+ - Adopted with context changes only — no core contract modifications
189
+
190
+ **Full treatment FT-001** (portlight-desktop)
191
+ - 7-phase staffed treatment with Treatment Pack roles
192
+ - Shipcheck gating proven, zero role collisions
193
+
194
+ **Full treatment FT-002** (studioflow)
195
+ - Same treatment pack, structurally different repo (creative workspace vs game)
196
+ - Treatment Pack portable — no contract modifications needed
197
+
198
+ **Brainstorm golden run** (MCP server marketplace topic)
199
+ - 9-role chain, 4 analysts in parallel, cross-examine + rebut dispute graph
200
+ - 4 challenges issued, 3 claims narrowed, 1 unresolved — healthy pressure, not deadlock
201
+ - 16+ trace links from rendered artifacts back to truth-layer atoms
202
+ - Full chain of custody proven: truth → atoms → dispute → synthesis → expand → judge → render → trace
203
+
204
+ ## Core properties
205
+
206
+ These are non-negotiable. If a change weakens any of them, reject it.
207
+
208
+ - Role boundaries hold
209
+ - Review has teeth
210
+ - Escalation stays honest
211
+ - Packets stay testable
212
+ - Portability requires context adaptation, not core surgery
213
+
214
+ ## Project structure
215
+
216
+ ```
217
+ role-os/
218
+ bin/roleos.mjs ← CLI entrypoint
219
+ src/
220
+ entry.mjs ← Unified entry: mission → pack → free routing
221
+ entry-cmd.mjs ← `roleos start` CLI command
222
+ run.mjs ← Persistent run engine: create → step → pause → resume → report
223
+ run-cmd.mjs ← `roleos run/resume/next/explain/complete/fail` + interventions
224
+ mission.mjs ← 9 named mission types (feature, bugfix, treatment, docs, security, research, brainstorm, deep-audit, dogfood-swarm)
225
+ mission-run.mjs ← Mission runner: create → step → complete → report
226
+ mission-cmd.mjs ← `roleos mission` CLI commands
227
+ audit-cmd.mjs ← `roleos audit` — deep audit entry point with manifest generation
228
+ swarm-cmd.mjs ← `roleos swarm` — dogfood swarm entry point with domain detection
229
+ swarm/ ← Domain detection, build gate, evidence persistence bridge
230
+ route.mjs ← 61-role routing + dynamic chain builder
231
+ packs.mjs ← 10 calibrated team packs + auto-selection
232
+ conflicts.mjs ← 4-pass conflict detection
233
+ escalation.mjs ← Auto-routing for blocked/rejected/split
234
+ evidence.mjs ← Structured evidence + role-aware requirements
235
+ dispatch.mjs ← Runtime dispatch manifests for multi-claude
236
+ tool-profiles.mjs ← Per-role tool sandboxing (shared by dispatch + trial)
237
+ state-machine.mjs ← Canonical step/run transition maps
238
+ artifacts.mjs ← Per-role artifact contracts + pack handoffs
239
+ decompose.mjs ← Composite task detection + splitting
240
+ composite.mjs ← Dependency-ordered execution + recovery + cycle detection
241
+ replan.mjs ← Mid-run adaptive replanning
242
+ calibration.mjs ← Outcome recording + weight tuning
243
+ hooks.mjs ← 5 lifecycle hooks for runtime enforcement
244
+ session.mjs ← Session scaffolding + doctor
245
+ brainstorm.mjs ← Evidence modes, request validation, finding/synthesis/judge schemas
246
+ brainstorm-roles.mjs ← Role-native schemas, input partitioning, blindspot enforcement, cross-exam
247
+ brainstorm-render.mjs ← Two-layer rendering: lexical bans, render schemas, debate transcript
248
+ test/ ← 1150 tests across 37 test files
249
+ starter-pack/ ← Drop-in role contracts, policies, schemas, workflows
250
+ ```
251
+
252
+ ## Security
253
+
254
+ Role OS operates **locally only**. It copies markdown templates and writes packet/verdict files to your repository's `.claude/` directory. It does not access the network, handle secrets, or collect telemetry. No dangerous operations — all file writes use skip-if-exists by default. See [SECURITY.md](SECURITY.md) for the full policy.
255
+
256
+ ## The operating system
257
+
258
+ | Layer | What it does | Status |
259
+ |-------|-------------|--------|
260
+ | **Routing** | Scores all 61 roles against packet content, explains recommendations, assesses confidence | ✓ Shipped |
261
+ | **Chain builder** | Assembles phase-ordered chains from scored roles, packet-type biased not template-locked | ✓ Shipped |
262
+ | **Conflict detection** | 4-pass validation: hard conflicts, sequence, redundancy, coverage gaps. Repair suggestions. | ✓ Shipped |
263
+ | **Escalation** | Auto-routes blocked/rejected/split work to the right resolver with reason + required artifact | ✓ Shipped |
264
+ | **Evidence** | Role-aware structured evidence in verdicts. Sufficiency checks. 12 evidence kinds. | ✓ Shipped |
265
+ | **Dispatch** | Generates execution manifests for multi-claude. Per-role tool profiles, system prompts, budgets. | ✓ Shipped |
266
+ | **Trials** | Full roster proven: 30/30 gold-task + 5/5 negative trials. 7 pack trials complete. | ✓ Complete |
267
+ | **Team Packs** | 10 calibrated packs with auto-selection, mismatch guards, and free-routing fallback. | ✓ Shipped |
268
+ | **Outcome calibration** | Records run outcomes, tunes pack/role weights from results, adjusts confidence thresholds. | ✓ Shipped |
269
+ | **Mixed-task decomposition** | Detects composite work, splits into child packets, assigns packs, preserves dependencies. | ✓ Shipped |
270
+ | **Composite execution** | Runs child packets in dependency order with artifact passing, branch recovery, and synthesis. | ✓ Shipped |
271
+ | **Adaptive replanning** | Mid-run scope changes, findings, or new requirements update the plan without restarting. | ✓ Shipped |
272
+ | **Session spine** | `roleos init claude` scaffolds CLAUDE.md, /roleos-route, /roleos-review, /roleos-status. `roleos doctor` verifies wiring. Route cards prove engagement. | ✓ Shipped |
273
+ | **Hook spine** | 5 lifecycle hooks (SessionStart, PromptSubmit, PreToolUse, SubagentStart, Stop). Advisory enforcement: route card reminders, write-tool gating, subagent role injection, completion audit. | ✓ Shipped |
274
+ | **Artifact spine** | Per-role artifact contracts. Pack handoff contracts. Structural validation. Chain completeness checks. Downstream roles never guess what they received. | ✓ Shipped |
275
+ | **Mission library** | 9 named missions (feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch, brainstorm, deep-audit, dogfood-swarm). Each declares pack, role chain, artifact flow, escalation branches, honest-partial definition. | ✓ Shipped |
276
+ | **Mission runner** | Create runs, step through with tracked state, complete/fail with honest reporting. Blocked-step propagation, out-of-chain escalation warnings, last-step re-opening. | ✓ Shipped |
277
+ | **Unified entry** | `roleos start` decides mission vs pack vs free routing automatically. Fallback ladder with confidence scores, alternatives, and composite detection. | ✓ Shipped |
278
+ | **Persistent runs** | `roleos run` creates disk-backed runs. `resume`, `next`, `explain`, `complete`, `fail`. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance. Friction measurement. | ✓ Shipped |
279
+ | **Brainstorm** | Two-layer architecture: truth (role-native schemas, provenance atoms, cross-exam dispute graph) + render (5 distinct voices, lexical bans, debate transcript). Trace links prove every rendered claim maps to a truth atom. Golden run proven. | ✓ Shipped |
280
+ | **Deep Audit** | Manifest-scaled repo audit: decompose repo into components, dispatch N auditors + M test truth auditors + K seam auditors from dependency graph, synthesize into ranked verdict and action plan. Dynamic dispatch scales with repo size (2N + K + 3 formula). Runner-native with artifact validation at every step. | ✓ Shipped |
281
+ | **Dogfood Swarm** | Multi-pass convergence: three health stages (bug/security → proactive → humanization) then feature pass. Exclusive file ownership, build gates after every wave, user checkpoints. Domain auto-detection generates manifests. Evidence bridge to dogfood-labs. | ✓ Shipped |
282
+
283
+ ## 9 missions
284
+
285
+ | Mission | Pack | Roles | When to use |
286
+ |---------|------|-------|-------------|
287
+ | `feature-ship` | feature | 5 | Full feature delivery: scope → spec → implement → test → review |
288
+ | `bugfix` | bugfix | 4 | Diagnose root cause, fix, test, verify |
289
+ | `treatment` | treatment | 4 | Shipcheck + polish + docs + CI verify + review |
290
+ | `docs-release` | docs | 2 | Write/update documentation, release notes |
291
+ | `security-hardening` | security | 4 | Threat model, audit, fix vulnerabilities, re-audit, verify |
292
+ | `research-launch` | research | 4 | Frame question, research, document findings, decide |
293
+ | `brainstorm` | brainstorm | 9 | Structured multi-perspective inquiry with traceable disagreement and verdict |
294
+ | `deep-audit` | deep-audit | 5 (scales) | Manifest-backed repo audit — worker count scales with repo graph via dynamic dispatch |
295
+ | `dogfood-swarm` | swarm | 8 (scales) | Multi-pass convergence: health-a → health-b → health-c → feature → final synthesis |
296
+
297
+ Each mission includes honest-partial definitions — when work stalls, the system documents what was completed and what remains instead of bluffing completion.
298
+
299
+ ### Brainstorm mission
300
+
301
+ Not "AI brainstorming." The brainstorm mission is **specialized roles under law, with traceable disagreement and verdict-bearing output.**
302
+
303
+ ```bash
304
+ roleos run "explore product directions for a developer tool discovery platform"
305
+ # → MISSION: Brainstorm (Structured Inquiry)
306
+ # Chain: 4 Analysts (parallel) → Normalize → Cross-Examine → Rebut → Synthesize → Expand → Judge
307
+ ```
308
+
309
+ **What makes it different:**
310
+
311
+ - **Layer 1 (truth):** Four analysts emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap) — not shared prose. Each role is blindspot-enforced: forbidden phrases, forbidden claim kinds, filtered input partitions. Atoms carry provenance. A directed cross-examination graph produces targeted challenges. Original analysts defend, narrow, or retract under pressure.
312
+
313
+ - **Layer 2 (render):** Five distinct human voices (Boundary Memo, Field Notes, System Sketch, Claim Brief, Cross-Exam Transcript) with lexical bans preventing voice convergence. Synthesis consumes truth, never rendered prose. Both layers always available.
314
+
315
+ - **Chain of custody:** Every rendered sentence traces back to a truth-layer atom. Synthesis directions cite atoms. Cross-exam targets real claim IDs. The dispute graph is the product, not the prose.
316
+
317
+ **Proven:** v0.4 golden run — full chain of custody verified. See [`examples/golden-run.md`](examples/golden-run.md) for the complete artifact chain.
318
+
319
+ ### Deep audit mission
320
+
321
+ Not a surface scan. The deep audit mission **decomposes a repo into bounded components and dispatches specialist auditors at a scale determined by the repo's own dependency graph.**
322
+
323
+ ```bash
324
+ roleos run "deep audit this repo" --manifest=audit-manifest.json
325
+ # → MISSION: Deep Audit (Manifest-Scaled)
326
+ # Steps: Component Auditor ×6 + Test Truth Auditor ×6 + Seam Auditor ×8 + Synthesizer + Action Plan + Critic = 23 steps
327
+ ```
328
+
329
+ **What makes it different:**
330
+
331
+ - **Dynamic dispatch** — worker count is not fixed. A 10-component repo with 5 boundary clusters produces 28 steps (2×10 + 5 + 3). A 3-component repo produces 12. The scaling formula is `2N + K + 3` where N = components, K = boundaries.
332
+ - **Manifest-backed parcels** — an `audit-manifest.json` defines components (with file paths, line counts, descriptions) and boundaries (from/to with interface descriptions). Each auditor receives only its parcel.
333
+ - **Four role archetypes** — Component Auditor (code truth per module), Test Truth Auditor (tests that prove vs tests that exist), Seam Auditor (integration boundaries from the dependency graph), Audit Synthesizer (ranked verdict + action plan from all parcels).
334
+ - **Artifact validation at every step** — `validateArtifact()` fires on every step completion in both execution paths. Results attached to step objects. The system knows whether each artifact met its contract.
335
+ - **Honest partial** — when budget or scope blocks completion, per-component findings are individually valid. The system synthesizes from whatever completed, never bluffs full coverage.
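
The scaling formula can be checked with a few lines of JavaScript. This is a minimal sketch: `auditStepCount` is a hypothetical helper, not an export of Role OS, and it assumes the three fixed steps are the Synthesizer, Action Plan, and Critic steps shown in the example output above.

```javascript
// Hypothetical helper illustrating the 2N + K + 3 formula from the README.
// N = components, K = boundaries.
function auditStepCount(components, boundaries) {
  for (const value of [components, boundaries]) {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError("counts must be non-negative integers");
    }
  }
  const perComponent = 2 * components; // Component Auditor + Test Truth Auditor
  const perBoundary = boundaries;      // one Seam Auditor per boundary
  const fixed = 3;                     // Synthesizer + Action Plan + Critic (assumed)
  return perComponent + perBoundary + fixed;
}

console.log(auditStepCount(10, 5)); // 28
console.log(auditStepCount(6, 8));  // 23
```

The second call matches the `roleos run` example above: 6 components and 8 boundaries give 23 steps.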
336
+
337
+ **Proven:** Runner-native proof run — 18 tests against real manifest, full lifecycle verified including escalation re-opening and partial failure. Scaling formula verified for 3/6/10/15-component manifests.
338
+
339
+ ### Dogfood swarm mission
340
+
341
+ Not a one-pass linter. The dogfood swarm mission **runs a multi-pass convergence protocol that moves a repo from "works" to "production-ready" through three health stages and iterative feature delivery.**
342
+
343
+ ```bash
344
+ roleos swarm
345
+ # → MISSION: Dogfood Swarm (Multi-Pass Convergence)
346
+ # Stages: Health-A → Health-B → Health-C → Feature → Final
347
+ # Domain agents: 3-5 parallel per wave (exclusive file ownership)
348
+ ```
349
+
350
+ **What makes it different:**
351
+
352
+ - **Three-stage health pass** — Stage A fixes bugs and security issues (loop until 0 CRITICAL + 0 HIGH). Stage B applies proactive hardening (user reviews findings). Stage C humanizes the codebase — error messages that help users, reconnection feedback, loading states, accessibility. Each stage is a distinct lens, not the same scan repeated.
353
+ - **Exclusive file ownership** — every domain agent owns specific files via `swarm-manifest.json`. No two agents edit the same file. No merge conflicts. No coordination overhead.
354
+ - **Build gates** — lint + typecheck + test must pass after every wave. The system auto-detects the build system (Node, Rust, Python, Go) and runs the right commands.
355
+ - **User checkpoints** — Health-B and the feature pass require explicit user approval before execution. The system presents findings, the user decides what to build.
356
+ - **Iterative convergence** — stages loop with wave loops until exit conditions are met or max iterations reached. Each wave re-audits from scratch to catch regressions introduced by previous fixes.
357
+ - **Domain auto-detection** — `roleos swarm manifest --generate` detects repo type (CLI, web, desktop, MCP, monorepo) and generates non-overlapping domain assignments.
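
The exclusive-ownership rule amounts to checking that no file appears under two domains. The sketch below assumes a simplified manifest shape: an object mapping each domain name to a list of file paths. The actual `swarm-manifest.json` schema is not shown in this README, and `findOwnershipConflicts` is a hypothetical name.

```javascript
// Assumed manifest shape: { [domainName]: string[] of file paths }.
// Returns every file claimed by more than one domain.
function findOwnershipConflicts(domains) {
  const ownersByFile = new Map();
  for (const [domain, files] of Object.entries(domains)) {
    for (const file of files) {
      if (!ownersByFile.has(file)) ownersByFile.set(file, new Set());
      ownersByFile.get(file).add(domain);
    }
  }
  const conflicts = [];
  for (const [file, owners] of ownersByFile) {
    if (owners.size > 1) conflicts.push({ file, domains: [...owners] });
  }
  return conflicts;
}

// Example with made-up paths: "src/shared.mjs" is claimed twice.
const conflicts = findOwnershipConflicts({
  backend: ["src/run.mjs", "src/shared.mjs"],
  tests: ["test/run.test.mjs", "src/shared.mjs"],
});
console.log(conflicts);
```

An empty result means the assignment is non-overlapping, so no two agents would edit the same file.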
358
+
359
+ **Proven:** claude-collaborate (2026-03-28) — 35→129 tests, 106 health findings fixed, v1.1.0 shipped. Protocol v2.0 with 9 phases.
360
+
361
+ ## Status
362
+
363
+ - v0.1–v0.4: Foundation — trials, adoption, treatment pack, starter pack
364
+ - v1.0.0: 32 roles, full CLI, proven treatment, multi-repo portability
365
+ - v1.0.2: Role OS lockdown (bootstrap truth fixes, `init --force`)
366
+ - v1.1.0: 31 roles, full routing spine, conflict detection, escalation, evidence, dispatch, 7 proven team packs. 35 execution trials. 212 tests.
367
+ - v1.2.0: Calibrated packs promoted to default entry. Auto-selection, mismatch detection, alternative suggestion, free-routing fallback. 246 tests.
368
+ - v1.3.0: Outcome calibration, mixed-task decomposition, composite execution, adaptive replanning. 317 tests.
369
+ - v1.4.0: Session spine — `roleos init claude`, `roleos doctor`, route cards, /roleos-route + /roleos-review + /roleos-status commands. 335 tests.
370
+ - v1.5.0: Hook spine — 5 lifecycle hooks for runtime enforcement. 358 tests.
371
+ - v1.6.0: Artifact spine — 20 per-role artifact contracts, 7 pack handoff contracts, structural validation. 385 tests.
372
+ - v1.7.0: Completion proof — real tasks run through the full stack. `roleos artifacts` CLI. Honest escalation on structural fixes. 398 tests.
373
+ - v1.8.0: Mission library (Phase S) — 6 named missions, runner engine, completion reports. Hardened from 6 real trial runs. 481 tests.
374
+ - v1.9.0: Unified entry path (Phase T) — `roleos start` auto-decides mission vs pack vs free routing. Fallback ladder, composite detection, entry-path comparison trials. 527 tests.
375
+ - **v2.0.0**: Operator friction pass (Phase U) — `roleos run` creates persistent disk-backed runs. Resume, next, explain, complete, fail. Interventions: reroute, escalate, retry, block, reopen. Step-local guidance at every step. Friction measurement. 6 friction trials. 613 tests.
376
+ - **v2.0.1**: Handbook audit, beginner docs, test count corrections. 617 tests.
377
+ - **v2.1.0**: Brainstorm mission (v0.4) — specialized roles under law, traceable disagreement, verdict-bearing output. Two-layer architecture (truth + render), cross-exam permission matrix, dispute graph, golden run proof. 7 missions, 50 roles, 8 packs. 894 tests.
378
+ - **v2.2.0**: Deep Audit mission — manifest-scaled repo audit with dynamic dispatch. 4 new audit roles (Component Auditor, Test Truth Auditor, Seam Auditor, Audit Synthesizer). Worker count scales with repo graph (2N + K + 3 formula). Artifact validation wired at both execution boundaries. Runner-native proof run green. accept/approve truth fix in evidence layer. 8 missions, 54 roles, 9 packs. 936 tests.
379
+ - **v2.3.0**: Dogfood Swarm mission — multi-pass convergence (health-a → health-b → health-c → feature → final). 7 new swarm roles (Swarm Coordinator, 5 domain agents, Swarm Synthesizer). Two new mission primitives: waveLoops (iterative convergence) and exclusiveOwnership (domain file boundaries). Dynamic domain dispatch, build gates, `roleos swarm` CLI, domain auto-detection, evidence persistence bridge. 9 missions, 61 roles, 10 packs. 1150 tests.
380
+
381
+ ## License
382
+
383
+ MIT
384
+
385
+ ---
386
+
387
+ Built by <a href="https://mcp-tool-shop.github.io/">MCP Tool Shop</a>