role-os 2.3.1 → 2.5.0

package/CHANGELOG.md CHANGED
@@ -1,437 +1,472 @@
1
- # Changelog
2
-
3
- ## 2.3.0
4
-
5
- ### Added
6
-
7
- #### Dogfood Swarm Mission — Multi-Pass Health + Feature Convergence
8
-
9
- - **Dogfood swarm mission** — 9th mission in the library. Three-stage health pass (bug/security → proactive → humanization) then iterative feature pass with exclusive file ownership, build gates, and user checkpoints. Moves a repo from "works" to "production-ready." Proven on claude-collaborate (35→129 tests, 106 findings fixed, v1.1.0 shipped).
10
- - **7 new roles** — Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer (61 total roles)
11
- - **Swarm team pack** — 10th pack, 8 roles (7 swarm + Critic Reviewer), with mismatch guards and trial evidence
12
- - **Two new mission primitives**:
13
- - `waveLoops` — iterative convergence with exit conditions, max iterations, build gates, and user approval flags
14
- - `exclusiveOwnership` — strict domain file boundaries enforced by manifest
15
- - **Dynamic domain dispatch** — scales agent count based on repo structure via `swarm-manifest.json`
16
- - **`roleos swarm` CLI** — first-class entry point with subcommands: `swarm`, `swarm manifest`, `swarm manifest --generate`, `swarm status`, `swarm findings`, `swarm approve`, `swarm verify`
17
- - **Domain detection** (`src/swarm/domain-detect.mjs`) — auto-detects repo type (CLI, web, desktop, MCP, monorepo) and generates domain manifests with non-overlapping file ownership
18
- - **Build gate** (`src/swarm/build-gate.mjs`) — auto-detects build system (Node, Rust, Python, Go) and runs lint → typecheck → test verification after every wave
19
- - **Evidence persistence bridge** (`src/swarm/persist-bridge.mjs`) — optional connection back to dogfood-labs, converts wave results to dogfood submission + audit DB payloads
20
- - **7 artifact contracts** — `swarm-gate`, `wave-report` (×5 with domain-specific sections), `swarm-final-report`
21
- - **Pack handoff contract** for swarm flow
22
-
23
- ### Tests
24
- - 97 new tests (swarm core, domain detection, build gate, persist bridge) — total: 1150
25
-
26
- ## 2.2.1
27
-
28
- ### Added
29
- - **`roleos audit` CLI** — first-class entry point for deep audit with subcommands: `audit`, `audit manifest`, `audit manifest --generate`, `audit status`, `audit verify`
30
- - **Shared state machine** (`src/state-machine.mjs`) — canonical step/run transitions shared by both runners
31
- - **Shared tool profiles** (`src/tool-profiles.mjs`) — extracted from dispatch.mjs to break trial→dispatch coupling
32
-
33
- ### Fixed
34
- - **P3-1:** Cycle detection in composite execution (`detectCycles` + visited-set guard in `findUnreachable`)
35
- - **P3-2:** Dual-active guard in `startNext`/`startNextStep` prevents two steps active simultaneously
36
- - **P3-3:** Atomic persistence — `saveRun` writes to temp file then renames
37
- - **P4-1:** Dependency Auditor has own artifact contract (`dependency-audit`), pack handoff corrected
38
- - **P4-2:** `partitionBrief` returns topic-only for unknown roles instead of full brief
39
- - **P4-3:** Atom kind normalization layer bridges scout `.kind` and atom `.claim_kind`
40
- - **P4-4:** `/dev/stdin` → `readFileSync(0)` for Windows compatibility in all 5 hooks
41
- - **P4-5:** TOOL_PROFILES extracted to shared module, eliminating trial→dispatch coupling
42
- - Node 18 compatibility fix for `import.meta.dirname` in deep-audit-proof test
43
-
44
- ### Tests
45
- - 18 new tests (audit-cmd, audit-p5, deep-audit-proof) — total: 954
46
-
47
- ## 2.2.0
48
-
49
- ### Added
50
-
51
- #### Deep Audit Mission — Runner-Native Componentized Repo Audit
52
-
53
- - **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
54
- - **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
55
- - **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
56
- - **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
57
- - **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
58
- - **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
59
-
60
- ### Fixed
61
-
62
- - **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
63
- - **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
64
- - **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
65
-
66
- ### Changed
67
-
68
- - Mission count: 7 → 8
69
- - Role count: 50 → 54 (4 deep audit roles)
70
- - Pack count: 8 → 9
71
- - Artifact contract count: 30 → 34 (4 new audit role contracts)
72
- - Test count: 905 → 936
73
-
74
- ### Evidence
75
-
76
- - Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
77
- - Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
78
- - Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
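The two worked counts quoted for dynamic dispatch (23 and 28 steps) agree with the 2N + K + 3 formula if N is read as the number of components and K as the number of boundary seams. A minimal sketch of that arithmetic follows. `deepAuditStepCount` is a hypothetical helper written for illustration; it is not a function from the package, and the meaning of the three fixed steps is not assumed here.

```javascript
// Hypothetical helper (not part of role-os): step count implied by the
// changelog's 2N + K + 3 formula, assuming N = components, K = boundaries.
function deepAuditStepCount(components, boundaries) {
  const FIXED_STEPS = 3; // constant term of the formula
  return 2 * components + boundaries + FIXED_STEPS;
}

console.log(deepAuditStepCount(6, 8));  // 23, the 6-component / 8-boundary case
console.log(deepAuditStepCount(10, 5)); // 28, the 10-component / 5-boundary case
```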
79
-
80
- ## 2.1.0
81
-
82
- ### Added
83
-
84
- #### Brainstorm Mission (v0.4) — Structured Inquiry with Traceable Disagreement
85
-
86
- - **Brainstorm mission** — 7th mission in the library, 9-role chain with two-layer architecture
87
- - **Layer 1 (truth):** 4 analyst roles emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap), not shared prose. Blindspot enforcement: forbidden phrases, forbidden claim kinds, filtered input partitions per role. Provenance-preserving atoms carry source_role, claim_kind, allowed_challengers. Cross-examination permission matrix (directed graph). Rebut phase: original analysts defend, narrow, or retract under pressure.
88
- - **Layer 2 (render):** 5 distinct voices — Boundary Memo (taxonomist), Field Notes (ethnographer), System Sketch (whiteboard), Claim Brief (strategist), Cross-Exam Transcript (litigator). Lexical bans prevent voice convergence. Debate transcript generator. Both layers always available.
89
- - **Trace links:** Every rendered sentence maps to a truth-layer atom. Synthesis cites atoms, never prose.
90
- - **Golden run proof:** Full artifact chain for MCP server marketplace topic — truth artifacts, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts, trace map (16+ links). Published as `examples/golden-run.md`.
91
- - **Result formatter:** `formatBrainstormResult()` produces saveable markdown with verdict, directions, dispute, tensions, rendered artifacts (opt-in), and evidence trail. Layer parameter controls truth-only vs both.
92
- - **Artifact contracts:** 9 brainstorm role contracts (replacing 3 v0.1 scout contracts) with completion rules, required evidence, and consumer mapping.
93
- - **Pack update:** Brainstorm pack updated from v0.1 scouts to v0.3/v0.4 analysts with correct chain order and required artifacts.
94
-
95
- ### Changed
96
-
97
- - Mission count: 6 → 7
98
- - Role count: 31 → 50 (brainstorm analysts, contrarian, plus existing)
99
- - Artifact contract count: 20 → 30
100
- - Test count: 617 → 905
101
-
102
- ## 2.0.1
103
-
104
- ### Added
105
-
106
- - 4 version consistency tests (semver, >= 1.0.0, CHANGELOG, help output)
107
-
108
- ## 2.0.0
109
-
110
- ### Added
111
-
112
- #### Operator Friction Pass (Phase U)
113
- - `roleos run "<task>"` — one command from task description to active execution
114
- - Persistent disk-backed runs in `.claude/runs/` — survives session interruptions
115
- - Entry level auto-selection: mission, pack, or free routing with force overrides (`--mission=`, `--pack=`)
116
- - Step-local operator guidance at every step: role, artifact, required sections, completion rule, stop conditions
117
- - `roleos resume [id]` — continue interrupted runs from disk
118
- - `roleos next` — start the next step or show what's active
119
- - `roleos explain [id]` — full run state with guidance, escalations, interventions
120
- - `roleos complete <artifact> [note]` — complete the active step with artifact reference
121
- - `roleos fail <partial|failed> <reason>` — fail with honest downstream blocking
122
- - `roleos run list` — list all runs with status icons
123
- - `roleos run show <id>` — full run detail
124
-
125
- #### Intervention Shortcuts
126
- - `roleos retry <step>` — retry a failed/partial step, unblock downstream
127
- - `roleos reroute <step> <role> <reason>` — swap a step to a different role
128
- - `roleos escalate <from> <to> <trigger> <action>` — escalate between roles with step re-opening
129
- - `roleos block <step> <reason>` — manually block a step
130
- - `roleos reopen <step> <reason>` — reopen a completed step for re-execution
131
-
132
- #### Friction Measurement
133
- - `roleos report [id]` — generate completion report with honest-partial
134
- - `roleos friction [id]` — measure operator touches: interventions, escalations, manual steps
135
- - Friction score: low/medium/high based on touch count vs step count
136
-
137
- ### Evidence
138
- - 613 tests, zero failures (86 new)
139
- - 6 friction trials validated: clean run, reroute, retry, pack-level, free-routing, disk resume
140
- - All entry levels produce low/medium friction scores
141
- - Disk round-trip verified: create → pause → load → resume → complete
142
-
143
- ## 1.9.0
144
-
145
- ### Added
146
-
147
- #### Unified Entry Path (Phase T)
148
- - `roleos start <task>` — auto-decides mission vs pack vs free routing
149
- - Three-level fallback ladder with confidence scores and alternatives
150
- - Composite task detection warns when a task should be decomposed
151
- - `--json` flag for machine-readable entry decisions
152
- - 46 new tests: entry engine, comparison trials, CLI integration
153
-
154
- #### Handbook Updates
155
- - New Missions handbook page with full mission documentation
156
- - Updated Getting Started to lead with `roleos start`
157
- - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
158
- - Updated handbook index with entry levels and 9 operating layers
159
-
160
- #### README Overhaul
161
- - "How it works" section leads with `roleos start` examples
162
- - Quick Start updated with mission and start commands
163
- - Added 6 Missions table
164
- - Updated project structure with all 18 source modules
165
- - Updated status history through v1.9.0
166
-
167
- ### Evidence
168
- - 527 tests, zero failures (46 new)
169
- - Entry path trials validated against 20+ real task descriptions
170
- - Fallback ladder tested: mission, pack, free-routing, composite, empty input
171
-
172
- ## 1.8.0
173
-
174
- ### Added
175
-
176
- #### Mission Library (Phase S — Mission Hardening)
177
- - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
178
- - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
179
- - Mission runner: create → step through → complete/fail → generate completion report
180
- - Completion proof reporter with honest-partial and formatted text output
181
- - `roleos mission list` — list all missions
182
- - `roleos mission show <key>` — full mission detail
183
- - `roleos mission suggest <text>` — signal-based mission suggestion
184
- - `roleos mission validate [key]` — validate mission wiring against packs/roles
185
-
186
- #### Mission Runner Engine
187
- - `createRun()` — instantiate a mission with tracked steps
188
- - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
189
- - `recordEscalation()` — re-opens completed steps on escalation loops
190
- - `getRunPosition()` / `getArtifactChain()` — run introspection
191
- - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
192
-
193
- ### Evidence
194
- - 465 tests, zero failures (67 new)
195
- - All 6 missions validate against live pack/role catalog
196
- - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
197
-
198
- ## 1.7.0
199
-
200
- ### Added
201
-
202
- #### Completion Proof (Phase R)
203
- - `roleos artifacts` CLI command: list, show, validate, chain subcommands
204
- - 13 new CLI integration tests for artifact inspection
205
- - Real task completion missions through the full stack
206
-
207
- #### Completion Proof Evidence
208
- - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
209
- - Pack: feature (high confidence, correct)
210
- - Chain: 5 roles, 0 escalations, 1 minor correction
211
- - Artifact contracts: all 4 used and valid
212
- - R1-2 Bugfix mission: README.zh.md npm anomaly
213
- - Diagnosed correctly: npm auto-includes README* regardless of files field
214
- - Escalated honestly: fix requires structural decision (translation file organization)
215
- - Not force-closed: deferred to treatment pass
216
-
217
- ### Evidence
218
- - 398 tests, zero failures
219
- - 3 missions run through the full stack
220
- - Completion metrics recorded per mission
221
-
222
- ## 1.6.0
223
-
224
- ### Added
225
-
226
- #### Artifact Spine (Phase Q)
227
- - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
228
- - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
229
- - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
230
- - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
231
- - `getArtifactContract(role)` / `getHandoffContract(pack)` — lookup APIs
232
- - `formatArtifactValidation()` / `formatPackChain()` — display formatters
233
-
234
- #### Artifact contract coverage
235
- - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
236
- - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
237
- - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
238
- - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
239
- - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
240
- - Critic Reviewer → verdict (verdict, evidence, required-corrections)
241
- - And 14 more roles with full contracts
242
-
243
- ### Evidence
244
- - 385 tests, zero failures
245
- - 27 new artifact tests
246
-
247
- ## 1.5.0
248
-
249
- ### Added
250
-
251
- #### Hook Spine / Runtime Enforcement (Phase R)
252
- - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
253
- - `scaffoldHooks()` generates all 5 hook scripts in .claude/hooks/
254
- - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
255
- - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
256
-
257
- #### SessionStart hook
258
- - Establishes session contract on every new session
259
- - Records session ID, timestamp, initializes state tracking
260
- - Adds context reminding Claude to use /roleos-route for non-trivial tasks
261
-
262
- #### UserPromptSubmit hook
263
- - Classifies prompts as substantial (>50 chars + action verbs)
264
- - After 2+ substantial prompts without a route card, adds context reminder
265
- - Does not block — advisory enforcement
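The UserPromptSubmit rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the hook's actual code: the verb list, the state shape, the reminder text, and the names `isSubstantial` and `routeReminder` are all hypothetical. Only the 50-character threshold, the "2+ substantial prompts" rule, and the advisory (non-blocking) behavior come from the changelog.

```javascript
// Assumed verb list; the changelog only says "action verbs".
const ACTION_VERBS = ["add", "build", "fix", "implement", "refactor", "write"];

// A prompt counts as substantial when it is longer than 50 characters
// and contains at least one action verb as a whole word.
function isSubstantial(prompt) {
  const hasVerb = ACTION_VERBS.some((verb) =>
    new RegExp(`\\b${verb}\\b`, "i").test(prompt)
  );
  return prompt.length > 50 && hasVerb;
}

// Advisory only: returns a reminder string or null, and never blocks the prompt.
function routeReminder(state, prompt) {
  if (isSubstantial(prompt)) state.substantialPrompts += 1;
  const needsReminder = state.substantialPrompts >= 2 && !state.hasRouteCard;
  return needsReminder ? "Consider running /roleos-route for this task." : null;
}

const state = { substantialPrompts: 0, hasRouteCard: false };
const prompt = "Please implement retry handling for the upload worker and add tests.";
console.log(routeReminder(state, prompt)); // null (first substantial prompt)
console.log(routeReminder(state, prompt)); // reminder text (second substantial prompt)
```

Returning a value instead of throwing keeps the sketch in line with the advisory design: the caller decides whether to surface the reminder.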
266
-
267
- #### PreToolUse hook
268
- - Records all tool usage in session state
269
- - Flags write tools (Bash, Write, Edit) used without route card after substantial work
270
- - Advisory, not blocking — preserves operator control
271
-
272
- #### SubagentStart hook
273
- - Injects active role contract into delegated agents
274
- - Ensures subagents inherit the Role OS session context
275
-
276
- #### Stop hook
277
- - Warns when substantial sessions end without route card or outcome artifact
278
- - Advisory — does not block session exit
279
- - Trivial sessions (< 2 substantial prompts) are exempt
280
-
281
- ### Evidence
282
- - 358 tests, zero failures
283
- - 23 new hook tests covering all 5 lifecycle hooks
284
-
285
- ## 1.4.0
286
-
287
- ### Added
288
-
289
- #### Session Spine (Phase Q)
290
- - `roleos init claude` scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
291
- - `roleos doctor` — verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
292
- - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
293
- - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
294
- - /roleos-route command produces structured route cards
295
- - /roleos-review command guides structured verdict production
296
- - /roleos-status command shows active work and context health
297
- - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
298
- - --force flag overwrites existing command files
299
-
300
- ### Evidence
301
- - 335 tests, zero failures
302
-
303
- ## 1.3.0
304
-
305
- ### Added
306
-
307
- #### Outcome Calibration (Phase M)
308
- - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
309
- - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
310
- - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
311
- - `computeConfidenceAdjustment()` — raises threshold when high-confidence is often overridden, lowers when medium is often accepted
312
- - Auto-generated calibration suggestions when metrics drift
313
- - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
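The boost rule quoted for `computePackBoosts()` (+0.5 per clean completed run, capped at 2.0) comes down to a clamped multiplication. The sketch below shows only that arithmetic. `packBoost`, the constants' names, and the per-pack run counts are hypothetical, and the real function's signature and inputs are not documented in this changelog.

```javascript
// Hypothetical sketch of the boost rule: +0.5 per clean completed run, capped at 2.0.
const BOOST_PER_RUN = 0.5;
const BOOST_CAP = 2.0;

function packBoost(cleanCompletedRuns) {
  return Math.min(cleanCompletedRuns * BOOST_PER_RUN, BOOST_CAP);
}

// Run counts are illustrative, not measured data.
const cleanRunsByPack = { feature: 3, bugfix: 10, docs: 0 };
const boosts = Object.fromEntries(
  Object.entries(cleanRunsByPack).map(([pack, runs]) => [pack, packBoost(runs)])
);
console.log(boosts); // { feature: 1.5, bugfix: 2, docs: 0 }
```

With these constants the cap is reached after four clean runs, so later runs do not raise a pack's weight further.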
314
-
315
- #### Mixed-Task Decomposition (Phase N)
316
- - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
317
- - Structural connector detection ("and then", "after that", "plus", "also")
318
- - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
319
- - `decompose()` — generates linked child packets sorted by phase order
320
- - `createRunPlan()` — dependency-aware parent plan with child tracking
321
- - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
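The confidence rule for composite detection can be sketched as follows. Only the "high" condition (3+ categories, or 2+ with a connector) and the connector phrases come from the changelog. The split between "medium" and "low" is an assumption made for this example, as are the function name and its inputs.

```javascript
// Connector phrases as listed in the changelog.
const CONNECTORS = ["and then", "after that", "plus", "also"];

// categoryCount: number of subtask categories already detected in the task text.
function compositeConfidence(categoryCount, taskText) {
  const text = taskText.toLowerCase();
  // Plain substring match; a real detector would likely respect word boundaries.
  const hasConnector = CONNECTORS.some((phrase) => text.includes(phrase));
  if (categoryCount >= 3 || (categoryCount >= 2 && hasConnector)) return "high";
  // Assumption: the changelog does not define medium vs low.
  return categoryCount === 2 ? "medium" : "low";
}

console.log(compositeConfidence(2, "Fix the login bug and then update the docs")); // high
console.log(compositeConfidence(2, "Fix the login bug, update the docs"));         // medium
console.log(compositeConfidence(1, "Fix the login bug"));                          // low
```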
322
-
323
- #### Composite Execution (Phase O)
324
- - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
325
- - 7 artifact contracts defining what each category produces and expects
326
- - Artifact ledger tracking all cross-packet handoffs
327
- - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
328
- - `invalidateDownstream()` resets stale children when upstream changes, removes stale artifacts
329
- - `synthesize()` — truthful parent-level completion report
330
- - Independent branches continue unaffected when a sibling fails
331
-
332
- #### Adaptive Replanning (Phase P)
333
- - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
334
- - `analyzeImpact()` — identifies valid/stale children, stale artifacts, whether new children or reorder needed
335
- - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
336
- - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
337
- - Execution resumes from next valid child after replan — no restart required
338
-
339
- ### Evidence
340
- - 317 tests, zero failures
341
- - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
342
-
343
- ## 1.2.0
344
-
345
- ### Added
346
- - Pack auto-selection in `roleos route` suggests best pack when confidence is high
347
- - `roleos route --pack=<name>` — use a specific pack for routing
348
- - Pack mismatch detection warns when a pack doesn't fit the task, suggests the correct alternative
349
- - Pack fallback — mismatched or unknown packs fall back to free routing automatically
350
- - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
351
- - `getPackRoles()` API with conditional Orchestrator support
352
-
353
- ### Changed
354
- - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
355
- - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
356
-
357
- ### Evidence
358
- - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
359
- - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
360
- - 230 tests, zero failures
361
-
362
- ## 1.1.0
363
-
364
- ### Added
365
-
366
- #### Routing
367
- - Full 31-role catalog — all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
368
- - Dynamic chain builder — phase-ordered assembly replacing static templates
369
- - Routing confidence assessment (high/medium/low)
370
- - `excludeWhen` enforcement — roles suppressed when exclusion patterns match packet content
371
- - `detectType` false-positive prevention — "integration testing" no longer triggers integration type
372
- - `--verbose` flag for `roleos route` hides scoring noise by default
373
-
374
- #### Conflict Detection
375
- - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
376
- - Per-role constraint registry: lateOnly, requiresBeforePacks
377
- - Overlap pair detection
378
- - Repair suggestions on every finding
379
-
380
- #### Escalation Auto-Routing
381
- - Blocked/rejected/conflict/split work auto-routes to named resolver
382
- - Every escalation includes: target role, recovery type, required artifact, handoff context
383
-
384
- #### Structured Evidence
385
- - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
386
- - Role-aware evidence requirements for 15 roles
387
- - Sufficiency checks with contradiction detection
388
-
389
- #### Runtime Dispatch
390
- - Execution manifests for multi-claude with per-role tool profiles and budgets
391
- - 8 execution states with auto-advance
392
- - Escalation packet generation for blocked/rejected steps
393
-
394
- #### Proven Team Packs
395
- - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
396
- - `roleos packs list` — show all packs with role counts
397
- - `roleos packs suggest <packet>` — suggest best pack for a packet
398
- - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
399
- - Pack suggestion engine with confidence levels
400
-
401
- #### Trials
402
- - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
403
- - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
404
- - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
405
-
406
- ### Changed
407
- - 32 → 31 roles: Information Architect merged into Docs Architect
408
- - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
409
- - "worker" terminology replaced with "role" in dispatch.mjs
410
-
411
- ### Fixed
412
- - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
413
- - `detectType` false-positived on "integration testing" — now uses word-boundary regex
414
- - "Not triggered: N roles" noise hidden by default (shown with --verbose)
415
- - Handbook: Team Packs page added, reference sidebar reordered
416
-
417
- ## 1.0.2
418
-
419
- ### Fixed
420
- - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
421
- - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
422
-
423
- ### Added
424
- - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
425
- - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
426
-
427
- ## 1.0.0
428
-
429
- ### Added
430
- - `roleos init` — scaffold Role OS starter pack into `.claude/`
431
- - `roleos packet new <type>` — create feature, integration, or identity packets
432
- - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
433
- - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
434
- - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
435
- - Guided context templates with inline prompts
436
- - 3 canonical example packets (feature, integration, identity)
437
- - Adoption handbook
1
+ # Changelog
2
+
3
+ ## 2.5.0
4
+
5
+ ### Added
6
+
7
+ #### Local-panel seat — a second, family-different verifier for `verify-citations`, runnable locally for free
8
+
9
+ - **`roleos verify-citations --local-panel`** — adds a local grounded-entailment PANEL (the `offload` CLI on **Qwen3-4B + Qwen3-14B + Mistral-Nemo-12B** via llama-swap) as a SECOND verifier seat, decorrelated from the Claude generator by construction (no Anthropic model in the panel) and from prism's single groundedness model (3 seats, ≥2 families, conservative majority). It re-checks each citation prism marked `supported` against prism's own retrieved evidence (`source_title` + `supporting_span`).
10
+ - **Monotone-tightening** — the panel can only downgrade a passing gate to **escalate** (`local_panel_disagreement`), never loosen one; the deterministic existence floor (`fabricated` → blocking) always dominates, and a non-passing gate is left untouched. A panel that is requested but unreachable **escalates** (`local_panel_unreachable`) — the same closed-gate invariant prism uses.
11
+ - **Why it earns its seat (EXTERNAL_VERIFIER, now local + zero-cost):** the panel's measured property is **zero false-confirms** — a 3-seat conservative majority never stamps a false claim "supported." On a 16-case real-arXiv citation set, `mistral-nemo-12b` solo false-confirmed a refuted claim (inverting arXiv:2404.13076's finding); the panel held it at `insufficient`. Receipt + dataset: `tensor-engine-knowledge/verifier/citation-panel-receipt.json` (study-swarm wave-6, recipe #156).
12
+ - **Receipt** gains a `local_panel` block (PIN_PER_STEP): the exact seat models used, per-citation panel verdicts, and any disagreement that downgraded the gate — folded into the receipt's hash chain via the verdict + a panel digest.
13
+ - New module `src/citation-panel.mjs` (`runOffloadPanel`, `applyLocalPanel`, `buildEvidence`); injectable `offloadExec` for tests. **Off by default** — opt in with `--local-panel` (needs llama-swap up + `offload.py` on the rig; `OFFLOAD_PYTHON` / `OFFLOAD_SCRIPT` / `--llamaswap-base` configurable).
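The monotone-tightening rule and the conservative majority described in this entry can be sketched as below. This is an illustration under assumptions, not the code in `src/citation-panel.mjs`: `panelVerdict` and `applyPanelSketch` are hypothetical names, the object shapes are invented for the example, and the exact majority rule is a guess (a strict majority of seats must say `supported`). The reason codes `local_panel_disagreement` and `local_panel_unreachable` are the ones named in the changelog.

```javascript
// Assumed rule for a "conservative majority": report "supported" only when a
// strict majority of seats says so; otherwise return a non-confirming verdict.
function panelVerdict(seatVerdicts) {
  const count = (v) => seatVerdicts.filter((s) => s === v).length;
  const majority = Math.floor(seatVerdicts.length / 2) + 1;
  if (count("supported") >= majority) return "supported";
  if (count("refuted") >= majority) return "refuted";
  return "insufficient";
}

// Monotone tightening: the panel may turn an accept into an escalate, nothing else.
function applyPanelSketch(gate, panel) {
  if (gate.verdict !== "accept") return gate; // non-passing gate is left untouched
  if (!panel.reachable) {
    return { verdict: "escalate", reason: "local_panel_unreachable" };
  }
  const disagrees = panel.citations.some(
    (citation) => panelVerdict(citation.seats) !== "supported"
  );
  return disagrees
    ? { verdict: "escalate", reason: "local_panel_disagreement" }
    : gate;
}

const gate = { verdict: "accept" };
const panel = {
  reachable: true,
  // One seat confirms, the other two do not: the panel holds at "insufficient".
  citations: [{ seats: ["supported", "insufficient", "refuted"] }],
};
console.log(applyPanelSketch(gate, panel).reason); // local_panel_disagreement
```

Because every branch either returns the gate unchanged or returns an escalate, no input can make the result more permissive than the gate it started from.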
14
+
15
+ ### Tests
16
+ - 16 new tests (evidence building; the panel runner agree / refuted / insufficient / no-evidence / unreachable / garbage; monotone `applyLocalPanel`; end-to-end `runCitationGate --local-panel` incl. the PIN'd receipt + blocking-skips-panel). **1196 total, all green.**
17
+
18
+ ## 2.4.0
19
+
20
+ ### Added
21
+
22
+ #### Citation-Verification Gate — defers citation truthfulness to prism (an external, family-different verifier)
23
+
24
+ - **`roleos verify-citations <dispatch.md|.json>` CLI** — extracts a research dispatch's citations, shells the external `prism verify` CLI (a family-different, reasoning-stripped citation verifier), and gates on the verdict. Exit `0` = accept, `20` = blocking (a cited paper did not resolve in arXiv/Crossref — likely fabricated), `10` = advisory (revise / escalate), `2` = no resolvable citations found.
25
+ - **Citation gate module** (`src/verify-citations.mjs`, peer to `build-gate.mjs`) — deterministic, copy-only extraction (`extractCitations` — never invents an identifier); a three-tier gate keyed to the failure source (`gateCitations`): existence `fabricated` → **BLOCKING** hard halt, soft groundedness `contradicted` → advisory revise, low-confidence → advisory escalate. An unreachable verifier **escalates, never default-accepts** ("an unreachable gate is a closed gate"). Emits a receipt chained to prism's HMAC receipt (per-citation `source_sha256` pins → drift-detectable on re-run).
26
+ - **Critic Reviewer** gains a citation-verification clause — for a research dispatch it runs the gate, treats blocking as reject and advisory as accept-with-notes / escalate, and never grades the citations itself.
27
+ - **Design doc** (`design/citation-verification-runner.md`) — research-grounded by a 4-question study-swarm (`wf_20651368-297`), with a Standards-compliance section scoring 15/15 on the applicable standards (NAMED_COMPENSATORS — a documented read-only skip): EXTERNAL_VERIFIER, ANDON_AUTHORITY, PIN_PER_STEP, DECOMPOSE_BY_SECRETS, and UNCERTAINTY_GATED_HUMANS (the contrastive escalate-to-human path).
28
+ - Pairs with prism **v0.3.2**'s `prism verify --gate` (verdict-coded exit status).
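The three-tier gate and its exit codes can be pictured with the sketch below. It is a reading of the changelog entry, not the contents of `src/verify-citations.mjs`: `gateSketch`, the `{ status }` result shape, the `low-confidence` status string, and the verdict labels are assumptions made for illustration. The exit codes and the `fabricated` / `contradicted` tiers are taken from the entry above.

```javascript
// Exit codes as listed in the changelog entry.
const EXIT = { accept: 0, noCitations: 2, advisory: 10, blocking: 20 };

// `results` is an assumed shape: one { status } object per extracted citation.
function gateSketch(results, verifierReachable = true) {
  if (results.length === 0) {
    return { verdict: "no-citations", exit: EXIT.noCitations };
  }
  // An unreachable gate is a closed gate: escalate rather than default-accept.
  if (!verifierReachable) return { verdict: "escalate", exit: EXIT.advisory };

  const has = (status) => results.some((r) => r.status === status);
  if (has("fabricated")) return { verdict: "blocking", exit: EXIT.blocking };
  if (has("contradicted")) return { verdict: "revise", exit: EXIT.advisory };
  if (has("low-confidence")) return { verdict: "escalate", exit: EXIT.advisory };
  return { verdict: "accept", exit: EXIT.accept };
}

const outcome = gateSketch([{ status: "supported" }, { status: "fabricated" }]);
console.log(outcome.verdict, outcome.exit); // blocking 20
```

Checking `fabricated` first reflects the statement that the existence tier is a hard halt: a single unresolvable citation blocks even when every other citation is supported.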
29
+
30
+ ### Tests
31
+ - 14 new tests (extraction; three-tier gate; runner with injected prism accept / block / escalate / unreachable). Module is testable with no real prism shell-out.
32
+
33
+ ## 2.3.1
34
+
35
+ ### Changed
36
+ - Version bump for dogfood swarm mission release
37
+
38
+ ## 2.3.0
39
+
40
+ ### Added
41
+
42
+ #### Dogfood Swarm Mission — Multi-Pass Health + Feature Convergence
43
+
44
+ - **Dogfood swarm mission** — 9th mission in the library. Three-stage health pass (bug/security → proactive → humanization) then iterative feature pass with exclusive file ownership, build gates, and user checkpoints. Moves a repo from "works" to "production-ready." Proven on claude-collaborate (35→129 tests, 106 findings fixed, v1.1.0 shipped).
45
+ - **7 new roles** — Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer (61 total roles)
46
+ - **Swarm team pack** — 10th pack, 8 roles (7 swarm + Critic Reviewer), with mismatch guards and trial evidence
47
+ - **Two new mission primitives**:
48
+ - `waveLoops` — iterative convergence with exit conditions, max iterations, build gates, and user approval flags
49
+ - `exclusiveOwnership` — strict domain file boundaries enforced by manifest
50
+ - **Dynamic domain dispatch** — scales agent count based on repo structure via `swarm-manifest.json`
51
+ - **`roleos swarm` CLI** — first-class entry point with subcommands: `swarm`, `swarm manifest`, `swarm manifest --generate`, `swarm status`, `swarm findings`, `swarm approve`, `swarm verify`
52
+ - **Domain detection** (`src/swarm/domain-detect.mjs`) — auto-detects repo type (CLI, web, desktop, MCP, monorepo) and generates domain manifests with non-overlapping file ownership
53
+ - **Build gate** (`src/swarm/build-gate.mjs`) — auto-detects build system (Node, Rust, Python, Go) and runs lint → typecheck → test verification after every wave
54
+ - **Evidence persistence bridge** (`src/swarm/persist-bridge.mjs`) — optional connection back to dogfood-labs, converts wave results to dogfood submission + audit DB payloads
55
+ - **7 artifact contracts** — `swarm-gate`, `wave-report` (×5 with domain-specific sections), `swarm-final-report`
56
+ - **Pack handoff contract** for swarm flow
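The `exclusiveOwnership` primitive above requires that no file belongs to two domains. A minimal sketch of such a check follows, assuming a manifest shaped as `{ domains: [{ name, files }] }`; that shape and the name `findOwnershipConflicts` are hypothetical and not taken from `swarm-manifest.json`.

```javascript
// Hypothetical sketch: detect files claimed by more than one domain.
// The manifest shape used here is an assumption for illustration.
function findOwnershipConflicts(manifest) {
  const ownerByFile = new Map(); // file path -> first domain that claimed it
  const conflicts = [];
  for (const domain of manifest.domains) {
    for (const file of domain.files) {
      const existing = ownerByFile.get(file);
      if (existing !== undefined && existing !== domain.name) {
        conflicts.push({ file, owners: [existing, domain.name] });
      } else {
        ownerByFile.set(file, domain.name);
      }
    }
  }
  return conflicts;
}

const sampleManifest = {
  domains: [
    { name: "backend", files: ["src/run.mjs", "src/dispatch.mjs"] },
    { name: "tests", files: ["test/run.test.mjs", "src/run.mjs"] }
  ]
};
const ownershipConflicts = findOwnershipConflicts(sampleManifest);
```

An empty result means the boundaries are exclusive; any entry names the file and the two domains that both claim it.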
57
+
58
+ ### Tests
59
+ - 97 new tests (swarm core, domain detection, build gate, persist bridge) — total: 1150
60
+
61
+ ## 2.2.1
62
+
63
+ ### Added
64
+ - **`roleos audit` CLI** — first-class entry point for deep audit with subcommands: `audit`, `audit manifest`, `audit manifest --generate`, `audit status`, `audit verify`
65
+ - **Shared state machine** (`src/state-machine.mjs`) — canonical step/run transitions shared by both runners
66
+ - **Shared tool profiles** (`src/tool-profiles.mjs`) — extracted from dispatch.mjs to break trial→dispatch coupling
67
+
68
+ ### Fixed
69
+ - **P3-1:** Cycle detection in composite execution (`detectCycles` + visited-set guard in `findUnreachable`)
70
+ - **P3-2:** Dual-active guard in `startNext`/`startNextStep` prevents two steps active simultaneously
71
+ - **P3-3:** Atomic persistence — `saveRun` writes to a temp file, then renames
72
+ - **P4-1:** Dependency Auditor has own artifact contract (`dependency-audit`), pack handoff corrected
73
+ - **P4-2:** `partitionBrief` returns topic-only for unknown roles instead of full brief
74
+ - **P4-3:** Atom kind normalization layer bridges scout `.kind` and atom `.claim_kind`
75
+ - **P4-4:** `/dev/stdin` → `readFileSync(0)` for Windows compatibility in all 5 hooks
76
+ - **P4-5:** TOOL_PROFILES extracted to shared module, eliminating trial→dispatch coupling
77
+ - Node 18 compatibility fix for `import.meta.dirname` in deep-audit-proof test
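The cycle detection mentioned in P3-1 can be illustrated with a depth-first search that tracks in-progress nodes. This is a generic sketch, not the project's `detectCycles`; the function name `findCycles` and the graph shape (`{ node: [dependencies] }`) are assumptions.

```javascript
// Hypothetical sketch: depth-first cycle detection over a dependency graph.
// graph maps each node id to the list of node ids it depends on.
function findCycles(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const cycles = [];

  function visit(node, path) {
    if (state.get(node) === "done") return;
    if (state.get(node) === "visiting") {
      // Reached a node that is still on the current path: record the loop.
      cycles.push([...path.slice(path.indexOf(node)), node]);
      return;
    }
    state.set(node, "visiting");
    for (const dep of graph[node] ?? []) visit(dep, [...path, node]);
    state.set(node, "done");
  }

  for (const node of Object.keys(graph)) visit(node, []);
  return cycles;
}
```

The `"done"` marker plays the role of a visited-set guard: a node that has been fully explored is never walked again, so the traversal terminates even on cyclic input.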
78
+
79
+ ### Tests
80
+ - 18 new tests (audit-cmd, audit-p5, deep-audit-proof) — total: 954
81
+
82
+ ## 2.2.0
83
+
84
+ ### Added
85
+
86
+ #### Deep Audit Mission — Runner-Native Componentized Repo Audit
87
+
88
+ - **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
89
+ - **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
90
+ - **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
91
+ - **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
92
+ - **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
93
+ - **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
94
+
95
+ ### Fixed
96
+
97
+ - **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
98
+ - **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
99
+ - **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
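The verdict mismatch above is the kind of bug an exact check against a closed enum catches. The sketch below uses the four verdicts this changelog lists under 1.1.0; `isKnownVerdict` and `verdictWarnings` are hypothetical names and do not reproduce the contents of `evidence.mjs`.

```javascript
// Hypothetical sketch: exact matching against a closed verdict enum.
const VERDICTS = Object.freeze(["accept", "accept-with-notes", "reject", "blocked"]);

function isKnownVerdict(verdict) {
  return VERDICTS.includes(verdict);
}

function verdictWarnings(verdict) {
  // A value outside the enum (for example "approve") is reported explicitly
  // instead of being compared against a string the enum never contains.
  if (!isKnownVerdict(verdict)) return [`unknown verdict: ${verdict}`];
  // Only a plain "accept" is warning-free in this sketch (assumption).
  return verdict === "accept" ? [] : [`verdict is ${verdict}, not accept`];
}
```

Tests for a function like this are safest when they compare the whole warning list, since a substring match on `"accept"` would also pass for `"accept-with-notes"`.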
100
+
101
+ ### Changed
102
+
103
+ - Mission count: 7 → 8
104
+ - Role count: 50 → 54 (4 deep audit roles)
105
+ - Pack count: 8 → 9
106
+ - Artifact contract count: 30 → 34 (4 new audit role contracts)
107
+ - Test count: 905 → 936
108
+
109
+ ### Evidence
110
+
111
+ - Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
112
+ - Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
113
+ - Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
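The step counts quoted in this release can be checked against the 2N + K + 3 formula with a one-line function, where N is the number of components and K the number of boundaries. The function name is hypothetical; the "N + M + K + 3" wording used earlier in this release gives the same totals if M equals N.

```javascript
// Hypothetical helper: step count for a deep-audit run under the 2N + K + 3 formula.
function deepAuditStepCount(components, boundaries) {
  return 2 * components + boundaries + 3;
}

console.log(deepAuditStepCount(6, 8));  // 23, the 6-component / 8-boundary example
console.log(deepAuditStepCount(10, 5)); // 28, the 10-component / 5-boundary example
```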
114
+
115
+ ## 2.1.0
116
+
117
+ ### Added
118
+
119
+ #### Brainstorm Mission (v0.4) — Structured Inquiry with Traceable Disagreement
120
+
121
+ - **Brainstorm mission** — 7th mission in the library, 9-role chain with two-layer architecture
122
+ - **Layer 1 (truth):** 4 analyst roles emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap), not shared prose. Blindspot enforcement: forbidden phrases, forbidden claim kinds, filtered input partitions per role. Provenance-preserving atoms carry source_role, claim_kind, allowed_challengers. Cross-examination permission matrix (directed graph). Rebut phase: original analysts defend, narrow, or retract under pressure.
123
+ - **Layer 2 (render):** 5 distinct voices — Boundary Memo (taxonomist), Field Notes (ethnographer), System Sketch (whiteboard), Claim Brief (strategist), Cross-Exam Transcript (litigator). Lexical bans prevent voice convergence. Debate transcript generator. Both layers always available.
124
+ - **Trace links:** Every rendered sentence maps to a truth-layer atom. Synthesis cites atoms, never prose.
125
+ - **Golden run proof:** Full artifact chain for MCP server marketplace topic — truth artifacts, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts, trace map (16+ links). Published as `examples/golden-run.md`.
126
+ - **Result formatter:** `formatBrainstormResult()` produces saveable markdown with verdict, directions, dispute, tensions, rendered artifacts (opt-in), and evidence trail. Layer parameter controls truth-only vs both.
127
+ - **Artifact contracts:** 9 brainstorm role contracts (replacing 3 v0.1 scout contracts) with completion rules, required evidence, and consumer mapping.
128
+ - **Pack update:** Brainstorm pack updated from v0.1 scouts to v0.3/v0.4 analysts with correct chain order and required artifacts.
129
+
130
+ ### Changed
131
+
132
+ - Mission count: 6 → 7
133
+ - Role count: 31 → 50 (brainstorm analysts, contrarian, plus existing)
134
+ - Artifact contract count: 20 → 30
135
+ - Test count: 617 → 905
136
+
137
+ ## 2.0.1
138
+
139
+ ### Added
140
+
141
+ - 4 version consistency tests (semver, >= 1.0.0, CHANGELOG, help output)
142
+
143
+ ## 2.0.0
144
+
145
+ ### Added
146
+
147
+ #### Operator Friction Pass (Phase U)
148
+ - `roleos run "<task>"` — one command from task description to active execution
149
+ - Persistent disk-backed runs in `.claude/runs/` — survive session interruptions
150
+ - Entry level auto-selection: mission, pack, or free routing with force overrides (`--mission=`, `--pack=`)
151
+ - Step-local operator guidance at every step: role, artifact, required sections, completion rule, stop conditions
152
+ - `roleos resume [id]` — continue interrupted runs from disk
153
+ - `roleos next` — start the next step or show what's active
154
+ - `roleos explain [id]` — full run state with guidance, escalations, interventions
155
+ - `roleos complete <artifact> [note]` — complete the active step with artifact reference
156
+ - `roleos fail <partial|failed> <reason>` — fail with honest downstream blocking
157
+ - `roleos run list` — list all runs with status icons
158
+ - `roleos run show <id>` — full run detail
159
+
160
+ #### Intervention Shortcuts
161
+ - `roleos retry <step>` — retry a failed/partial step, unblock downstream
162
+ - `roleos reroute <step> <role> <reason>` — swap a step to a different role
163
+ - `roleos escalate <from> <to> <trigger> <action>` — escalate between roles with step re-opening
164
+ - `roleos block <step> <reason>` — manually block a step
165
+ - `roleos reopen <step> <reason>` — reopen a completed step for re-execution
166
+
167
+ #### Friction Measurement
168
+ - `roleos report [id]` — generate completion report with honest-partial
169
+ - `roleos friction [id]` — measure operator touches: interventions, escalations, manual steps
170
+ - Friction score: low/medium/high based on touch count vs step count
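A friction score of this kind can be sketched as a ratio of operator touches to steps. The changelog does not state the cut-offs, so the thresholds below (0.25 and 0.75) and all field names are purely illustrative assumptions.

```javascript
// Hypothetical sketch: classify friction from touch count vs step count.
// Thresholds and field names are assumptions, not values from Role OS.
function frictionScore({ interventions, escalations, manualSteps, stepCount }) {
  const touches = interventions + escalations + manualSteps;
  const ratio = stepCount > 0 ? touches / stepCount : 0;
  if (ratio <= 0.25) return "low";
  if (ratio <= 0.75) return "medium";
  return "high";
}
```

Normalizing by step count keeps a long mission with a few interventions from scoring worse than a short one with the same number of touches.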
171
+
172
+ ### Evidence
173
+ - 613 tests, zero failures (86 new)
174
+ - 6 friction trials validated: clean run, reroute, retry, pack-level, free-routing, disk resume
175
+ - All entry levels produce low/medium friction scores
176
+ - Disk round-trip verified: create → pause → load → resume → complete
177
+
178
+ ## 1.9.0
179
+
180
+ ### Added
181
+
182
+ #### Unified Entry Path (Phase T)
183
+ - `roleos start <task>` — auto-decides mission vs pack vs free routing
184
+ - Three-level fallback ladder with confidence scores and alternatives
185
+ - Composite task detection warns when a task should be decomposed
186
+ - `--json` flag for machine-readable entry decisions
187
+ - 46 new tests: entry engine, comparison trials, CLI integration
188
+
189
+ #### Handbook Updates
190
+ - New Missions handbook page with full mission documentation
191
+ - Updated Getting Started to lead with `roleos start`
192
+ - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
193
+ - Updated handbook index with entry levels and 9 operating layers
194
+
195
+ #### README Overhaul
196
+ - "How it works" section leads with `roleos start` examples
197
+ - Quick Start updated with mission and start commands
198
+ - Added 6 Missions table
199
+ - Updated project structure with all 18 source modules
200
+ - Updated status history through v1.9.0
201
+
202
+ ### Evidence
203
+ - 527 tests, zero failures (46 new)
204
+ - Entry path trials validated against 20+ real task descriptions
205
+ - Fallback ladder tested: mission, pack, free-routing, composite, empty input
206
+
207
+ ## 1.8.0
208
+
209
+ ### Added
210
+
211
+ #### Mission Library (Phase S — Mission Hardening)
212
+ - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
213
+ - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
214
+ - Mission runner: create → step through complete/fail → generate completion report
215
+ - Completion proof reporter with honest-partial and formatted text output
216
+ - `roleos mission list` — list all missions
217
+ - `roleos mission show <key>` — full mission detail
218
+ - `roleos mission suggest <text>` — signal-based mission suggestion
219
+ - `roleos mission validate [key]` — validate mission wiring against packs/roles
220
+
221
+ #### Mission Runner Engine
222
+ - `createRun()` — instantiate a mission with tracked steps
223
+ - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
224
+ - `recordEscalation()` — re-opens completed steps on escalation loops
225
+ - `getRunPosition()` / `getArtifactChain()` — run introspection
226
+ - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
227
+
228
+ ### Evidence
229
+ - 465 tests, zero failures (67 new)
230
+ - All 6 missions validate against live pack/role catalog
231
+ - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
232
+
233
+ ## 1.7.0
234
+
235
+ ### Added
236
+
237
+ #### Completion Proof (Phase R)
238
+ - `roleos artifacts` CLI command: list, show, validate, chain subcommands
239
+ - 13 new CLI integration tests for artifact inspection
240
+ - Real task completion missions through the full stack
241
+
242
+ #### Completion Proof Evidence
243
+ - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
244
+ - Pack: feature (high confidence, correct)
245
+ - Chain: 5 roles, 0 escalations, 1 minor correction
246
+ - Artifact contracts: all 4 used and valid
247
+ - R1-2 Bugfix mission: README.zh.md npm anomaly
248
+ - Diagnosed correctly: npm auto-includes README* regardless of files field
249
+ - Escalated honestly: fix requires structural decision (translation file organization)
250
+ - Not force-closed: deferred to treatment pass
251
+
252
+ ### Evidence
253
+ - 398 tests, zero failures
254
+ - 3 missions run through the full stack
255
+ - Completion metrics recorded per mission
256
+
257
+ ## 1.6.0
258
+
259
+ ### Added
260
+
261
+ #### Artifact Spine (Phase Q)
262
+ - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
263
+ - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
264
+ - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
265
+ - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
266
+ - `getArtifactContract(role)` / `getHandoffContract(pack)` — lookup APIs
267
+ - `formatArtifactValidation()` / `formatPackChain()` — display formatters
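Structural validation against a role contract can be sketched as a required-sections check. The contract below uses the Product Strategist sections listed in this release; the function name `checkRequiredSections`, the contract shape, and the assumption that sections appear as Markdown headings are all hypothetical.

```javascript
// Hypothetical sketch: report which required sections an artifact is missing.
const CONTRACTS = new Map([
  ["Product Strategist", {
    artifact: "strategy-brief",
    requiredSections: ["problem-framing", "scope", "non-goals", "tradeoffs"]
  }]
]);

function checkRequiredSections(role, content) {
  const contract = CONTRACTS.get(role);
  if (!contract) return { valid: false, missing: [], error: `no contract for ${role}` };
  // Assumption: each section is a Markdown heading, normalized to kebab-case.
  const headings = new Set(
    content
      .split("\n")
      .filter((line) => /^#{1,6}\s+/.test(line))
      .map((line) => line.replace(/^#{1,6}\s+/, "").trim().toLowerCase().replace(/\s+/g, "-"))
  );
  const missing = contract.requiredSections.filter((section) => !headings.has(section));
  return { valid: missing.length === 0, missing };
}
```

Returning the list of missing sections, instead of a bare boolean, is what lets a caller warn without blocking.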
268
+
269
+ #### Artifact contract coverage
270
+ - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
271
+ - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
272
+ - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
273
+ - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
274
+ - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
275
+ - Critic Reviewer → verdict (verdict, evidence, required-corrections)
276
+ - And 14 more roles with full contracts
277
+
278
+ ### Evidence
279
+ - 385 tests, zero failures
280
+ - 27 new artifact tests
281
+
282
+ ## 1.5.0
283
+
284
+ ### Added
285
+
286
+ #### Hook Spine / Runtime Enforcement (Phase R)
287
+ - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
288
+ - `scaffoldHooks()` generates all 5 hook scripts in .claude/hooks/
289
+ - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
290
+ - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
291
+
292
+ #### SessionStart hook
293
+ - Establishes session contract on every new session
294
+ - Records session ID, timestamp, initializes state tracking
295
+ - Adds context reminding Claude to use /roleos-route for non-trivial tasks
296
+
297
+ #### UserPromptSubmit hook
298
+ - Classifies prompts as substantial (>50 chars + action verbs)
299
+ - After 2+ substantial prompts without a route card, adds context reminder
300
+ - Does not block — advisory enforcement
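The classification rule above (more than 50 characters plus an action verb, reminder after two or more substantial prompts without a route card) can be sketched as follows. The verb list and the session field names are assumptions; the actual hook may use a different vocabulary.

```javascript
// Hypothetical sketch of the UserPromptSubmit classification rule.
// The verb list is an illustrative assumption.
const ACTION_VERBS = /\b(add|build|fix|implement|refactor|write|update|remove)\b/i;

function isSubstantial(prompt) {
  return prompt.length > 50 && ACTION_VERBS.test(prompt);
}

function shouldRemind(session) {
  // Advisory only: the caller adds a context reminder and never blocks the prompt.
  return session.substantialPrompts >= 2 && !session.hasRouteCard;
}
```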
301
+
302
+ #### PreToolUse hook
303
+ - Records all tool usage in session state
304
+ - Flags write tools (Bash, Write, Edit) used without route card after substantial work
305
+ - Advisory, not blocking — preserves operator control
306
+
307
+ #### SubagentStart hook
308
+ - Injects active role contract into delegated agents
309
+ - Ensures subagents inherit the Role OS session context
310
+
311
+ #### Stop hook
312
+ - Warns when substantial sessions end without route card or outcome artifact
313
+ - Advisory — does not block session exit
314
+ - Trivial sessions (< 2 substantial prompts) are exempt
315
+
316
+ ### Evidence
317
+ - 358 tests, zero failures
318
+ - 23 new hook tests covering all 5 lifecycle hooks
319
+
320
+ ## 1.4.0
321
+
322
+ ### Added
323
+
324
+ #### Session Spine (Phase Q)
325
+ - `roleos init claude` scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
326
+ - `roleos doctor` verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
327
+ - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
328
+ - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
329
+ - /roleos-route command produces structured route cards
330
+ - /roleos-review command guides structured verdict production
331
+ - /roleos-status command shows active work and context health
332
+ - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
333
+ - --force flag overwrites existing command files
334
+
335
+ ### Evidence
336
+ - 335 tests, zero failures
337
+
338
+ ## 1.3.0
339
+
340
+ ### Added
341
+
342
+ #### Outcome Calibration (Phase M)
343
+ - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
344
+ - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
345
+ - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
346
+ - `computeConfidenceAdjustment()` raises threshold when high-confidence is often overridden, lowers when medium is often accepted
347
+ - Auto-generated calibration suggestions when metrics drift
348
+ - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
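The boost rule for `computePackBoosts()` (+0.5 per clean completed run, capped at 2.0) can be illustrated with a short accumulator. The ledger entry fields (`pack`, `status`, `clean`) and the function name `packBoosts` are assumptions for this sketch, as is treating "clean" as a precomputed flag.

```javascript
// Hypothetical sketch: accumulate per-pack boosts from an outcome ledger.
// Entry fields are assumptions; the +0.5 step and 2.0 cap come from the changelog.
function packBoosts(ledger) {
  const boosts = {};
  for (const run of ledger) {
    if (run.status !== "completed" || !run.clean) continue;
    boosts[run.pack] = Math.min(2.0, (boosts[run.pack] ?? 0) + 0.5);
  }
  return boosts;
}
```

With this rule a pack reaches the cap after four clean runs, so a long winning streak cannot keep inflating its routing weight.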
349
+
350
+ #### Mixed-Task Decomposition (Phase N)
351
+ - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
352
+ - Structural connector detection ("and then", "after that", "plus", "also")
353
+ - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
354
+ - `decompose()` generates linked child packets sorted by phase order
355
+ - `createRunPlan()` — dependency-aware parent plan with child tracking
356
+ - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
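The confidence rule above (high for 3+ categories, or 2+ with a structural connector) can be sketched like this. The changelog does not define the medium/low boundary, so treating exactly two categories without a connector as medium is an assumption, as is the function name.

```javascript
// Hypothetical sketch of composite-task confidence.
// Connector phrases come from the changelog; the medium/low split is assumed.
const CONNECTORS = /\b(?:and then|after that|plus|also)\b/i;

function compositeConfidence(categories, text) {
  const count = new Set(categories).size; // distinct subtask categories detected
  const hasConnector = CONNECTORS.test(text);
  if (count >= 3 || (count >= 2 && hasConnector)) return "high";
  if (count === 2) return "medium"; // assumption
  return "low";
}
```

Word boundaries in the connector pattern keep a word such as "surplus" from counting as the connector "plus".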
357
+
358
+ #### Composite Execution (Phase O)
359
+ - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
360
+ - 7 artifact contracts defining what each category produces and expects
361
+ - Artifact ledger tracking all cross-packet handoffs
362
+ - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
363
+ - `invalidateDownstream()` — resets stale children when upstream changes, removes stale artifacts
364
+ - `synthesize()` — truthful parent-level completion report
365
+ - Independent branches continue unaffected when a sibling fails
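The transitive cascade described for `blockChild()` can be sketched as a fixed-point walk over the dependency map: a child is blocked if anything it depends on is blocked, and unrelated branches are left alone. The data shape and the name `cascadeBlock` are assumptions for illustration.

```javascript
// Hypothetical sketch: compute the set of children blocked by one failure.
// children maps each child id to { dependsOn: [ids] }.
function cascadeBlock(children, blockedId) {
  const blocked = new Set([blockedId]);
  let changed = true;
  while (changed) {
    changed = false;
    for (const [id, child] of Object.entries(children)) {
      if (!blocked.has(id) && child.dependsOn.some((dep) => blocked.has(dep))) {
        blocked.add(id); // a dependency is blocked, so this child is too
        changed = true;
      }
    }
  }
  return blocked;
}

const plan = {
  build: { dependsOn: [] },
  security: { dependsOn: ["build"] },
  launch: { dependsOn: ["security"] },
  docs: { dependsOn: [] }
};
const blockedChildren = cascadeBlock(plan, "build");
```

In this sample plan, blocking `build` also blocks `security` and `launch`, while the independent `docs` branch stays runnable.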
366
+
367
+ #### Adaptive Replanning (Phase P)
368
+ - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
369
+ - `analyzeImpact()` identifies valid/stale children, stale artifacts, whether new children or reorder needed
370
+ - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
371
+ - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
372
+ - Execution resumes from next valid child after replan — no restart required
373
+
374
+ ### Evidence
375
+ - 317 tests, zero failures
376
+ - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
377
+
378
+ ## 1.2.0
379
+
380
+ ### Added
381
+ - Pack auto-selection in `roleos route` — suggests best pack when confidence is high
382
+ - `roleos route --pack=<name>` — use a specific pack for routing
383
+ - Pack mismatch detection — warns when a pack doesn't fit the task, suggests the correct alternative
384
+ - Pack fallback — mismatched or unknown packs fall back to free routing automatically
385
+ - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
386
+ - `getPackRoles()` API with conditional Orchestrator support
387
+
388
+ ### Changed
389
+ - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
390
+ - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
391
+
392
+ ### Evidence
393
+ - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
394
+ - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
395
+ - 230 tests, zero failures
396
+
397
+ ## 1.1.0
398
+
399
+ ### Added
400
+
401
+ #### Routing
402
+ - Full 31-role catalog — all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
403
+ - Dynamic chain builder — phase-ordered assembly replacing static templates
404
+ - Routing confidence assessment (high/medium/low)
405
+ - `excludeWhen` enforcement — roles suppressed when exclusion patterns match packet content
406
+ - `detectType` false-positive prevention — "integration testing" no longer triggers integration type
407
+ - `--verbose` flag for `roleos route` — hides scoring noise by default
408
+
409
+ #### Conflict Detection
410
+ - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
411
+ - Per-role constraint registry: lateOnly, requiresBeforePacks
412
+ - Overlap pair detection
413
+ - Repair suggestions on every finding
414
+
415
+ #### Escalation Auto-Routing
416
+ - Blocked/rejected/conflict/split work auto-routes to named resolver
417
+ - Every escalation includes: target role, recovery type, required artifact, handoff context
418
+
419
+ #### Structured Evidence
420
+ - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
421
+ - Role-aware evidence requirements for 15 roles
422
+ - Sufficiency checks with contradiction detection
423
+
424
+ #### Runtime Dispatch
425
+ - Execution manifests for multi-claude with per-role tool profiles and budgets
426
+ - 8 execution states with auto-advance
427
+ - Escalation packet generation for blocked/rejected steps
428
+
429
+ #### Proven Team Packs
430
+ - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
431
+ - `roleos packs list` — show all packs with role counts
432
+ - `roleos packs suggest <packet>` — suggest best pack for a packet
433
+ - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
434
+ - Pack suggestion engine with confidence levels
435
+
436
+ #### Trials
437
+ - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
438
+ - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
439
+ - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
440
+
441
+ ### Changed
442
+ - 32 → 31 roles: Information Architect merged into Docs Architect
443
+ - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
444
+ - "worker" terminology replaced with "role" in dispatch.mjs
445
+
446
+ ### Fixed
447
+ - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
448
+ - `detectType` false-positived on "integration testing" — now uses word-boundary regex
449
+ - "Not triggered: N roles" noise hidden by default (shown with --verbose)
450
+ - Handbook: Team Packs page added, reference sidebar reordered
451
+
452
+ ## 1.0.2
453
+
454
+ ### Fixed
455
+ - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
456
+ - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
457
+
458
+ ### Added
459
+ - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
460
+ - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
461
+
462
+ ## 1.0.0
463
+
464
+ ### Added
465
+ - `roleos init` — scaffold Role OS starter pack into `.claude/`
466
+ - `roleos packet new <type>` — create feature, integration, or identity packets
467
+ - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
468
+ - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
469
+ - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
470
+ - Guided context templates with inline prompts
471
+ - 3 canonical example packets (feature, integration, identity)
472
+ - Adoption handbook