role-os 2.3.1 → 2.6.0

package/CHANGELOG.md CHANGED
@@ -1,437 +1,484 @@
1
- # Changelog
2
-
3
- ## 2.3.0
4
-
5
- ### Added
6
-
7
- #### Dogfood Swarm Mission — Multi-Pass Health + Feature Convergence
8
-
9
- - **Dogfood swarm mission** — 9th mission in the library. Three-stage health pass (bug/security → proactive → humanization) then iterative feature pass with exclusive file ownership, build gates, and user checkpoints. Moves a repo from "works" to "production-ready." Proven on claude-collaborate (35→129 tests, 106 findings fixed, v1.1.0 shipped).
10
- - **7 new roles** — Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer (61 total roles)
11
- - **Swarm team pack** — 10th pack, 8 roles (7 swarm + Critic Reviewer), with mismatch guards and trial evidence
12
- - **Two new mission primitives**:
13
- - `waveLoops` — iterative convergence with exit conditions, max iterations, build gates, and user approval flags
14
- - `exclusiveOwnership` — strict domain file boundaries enforced by manifest
15
- - **Dynamic domain dispatch** — scales agent count based on repo structure via `swarm-manifest.json`
16
- - **`roleos swarm` CLI** — first-class entry point with subcommands: `swarm`, `swarm manifest`, `swarm manifest --generate`, `swarm status`, `swarm findings`, `swarm approve`, `swarm verify`
17
- - **Domain detection** (`src/swarm/domain-detect.mjs`) — auto-detects repo type (CLI, web, desktop, MCP, monorepo) and generates domain manifests with non-overlapping file ownership
18
- - **Build gate** (`src/swarm/build-gate.mjs`) — auto-detects build system (Node, Rust, Python, Go) and runs lint → typecheck → test verification after every wave
19
- - **Evidence persistence bridge** (`src/swarm/persist-bridge.mjs`) — optional connection back to dogfood-labs, converts wave results to dogfood submission + audit DB payloads
20
- - **7 artifact contracts** — `swarm-gate`, `wave-report` (×5 with domain-specific sections), `swarm-final-report`
21
- - **Pack handoff contract** for swarm flow
22
-
23
- ### Tests
24
- - 97 new tests (swarm core, domain detection, build gate, persist bridge) — total: 1150
25
-
26
- ## 2.2.1
27
-
28
- ### Added
29
- - **`roleos audit` CLI** — first-class entry point for deep audit with subcommands: `audit`, `audit manifest`, `audit manifest --generate`, `audit status`, `audit verify`
30
- - **Shared state machine** (`src/state-machine.mjs`) — canonical step/run transitions shared by both runners
31
- - **Shared tool profiles** (`src/tool-profiles.mjs`) — extracted from dispatch.mjs to break trial→dispatch coupling
32
-
33
- ### Fixed
34
- - **P3-1:** Cycle detection in composite execution (`detectCycles` + visited-set guard in `findUnreachable`)
35
- - **P3-2:** Dual-active guard in `startNext`/`startNextStep` prevents two steps active simultaneously
36
- - **P3-3:** Atomic persistence — `saveRun` writes to temp file then renames
37
- - **P4-1:** Dependency Auditor has own artifact contract (`dependency-audit`), pack handoff corrected
38
- - **P4-2:** `partitionBrief` returns topic-only for unknown roles instead of full brief
39
- - **P4-3:** Atom kind normalization layer bridges scout `.kind` and atom `.claim_kind`
40
- - **P4-4:** `/dev/stdin` → `readFileSync(0)` for Windows compatibility in all 5 hooks
41
- - **P4-5:** TOOL_PROFILES extracted to shared module, eliminating trial→dispatch coupling
42
- - Node 18 compatibility fix for `import.meta.dirname` in deep-audit-proof test
43
-
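The write-then-rename pattern named in P3-3 can be sketched as below. This is a minimal illustration, not the actual `saveRun`: the function name, the temp-file naming, and the injected `fs` parameter are assumptions made so the sketch can be exercised without touching disk. In real use one would pass Node's `node:fs` module.

```javascript
// Hypothetical sketch of atomic persistence; not the real saveRun.
// `fs` is injected (pass Node's node:fs module in real use).
function saveRunAtomicSketch(fs, filePath, run) {
  // Write the full payload to a sibling temp file first.
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(run, null, 2));
  // Then rename over the target. On the same filesystem the rename replaces
  // the file in one step, so a reader sees the old or the new content,
  // not a half-written file.
  fs.renameSync(tmpPath, filePath);
}
```

If the process dies between the two calls, the target file is still the previous complete version; only a stray temp file is left behind.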
44
- ### Tests
45
- - 18 new tests (audit-cmd, audit-p5, deep-audit-proof) — total: 954
46
-
47
- ## 2.2.0
48
-
49
- ### Added
50
-
51
- #### Deep Audit Mission — Runner-Native Componentized Repo Audit
52
-
53
- - **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
54
- - **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
55
- - **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
56
- - **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
57
- - **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
58
- - **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
59
-
60
- ### Fixed
61
-
62
- - **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
63
- - **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
64
- - **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
65
-
66
- ### Changed
67
-
68
- - Mission count: 7 → 8
69
- - Role count: 50 → 54 (4 deep audit roles)
70
- - Pack count: 8 → 9
71
- - Artifact contract count: 30 → 34 (4 new audit role contracts)
72
- - Test count: 905 → 936
73
-
74
- ### Evidence
75
-
76
- - Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
77
- - Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
78
- - Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
79
-
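The step counts quoted for dynamic dispatch can be checked with a few lines of arithmetic. A minimal sketch, assuming the verified form 2N + K + 3 with N the number of components and K the number of boundary seams; the function name is hypothetical and not part of the package.

```javascript
// Hypothetical helper: step count for a deep-audit run under 2N + K + 3.
// N = components, K = boundary seams (both read from the manifest).
function deepAuditStepCount(components, boundaries) {
  return 2 * components + boundaries + 3;
}

console.log(deepAuditStepCount(6, 8));  // 23
console.log(deepAuditStepCount(10, 5)); // 28
```

Both results match the 23-step and 28-step figures given for the 6-component / 8-boundary and 10-component / 5-boundary repos.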
80
- ## 2.1.0
81
-
82
- ### Added
83
-
84
- #### Brainstorm Mission (v0.4) — Structured Inquiry with Traceable Disagreement
85
-
86
- - **Brainstorm mission** — 7th mission in the library, 9-role chain with two-layer architecture
87
- - **Layer 1 (truth):** 4 analyst roles emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap), not shared prose. Blindspot enforcement: forbidden phrases, forbidden claim kinds, filtered input partitions per role. Provenance-preserving atoms carry source_role, claim_kind, allowed_challengers. Cross-examination permission matrix (directed graph). Rebut phase: original analysts defend, narrow, or retract under pressure.
88
- - **Layer 2 (render):** 5 distinct voices — Boundary Memo (taxonomist), Field Notes (ethnographer), System Sketch (whiteboard), Claim Brief (strategist), Cross-Exam Transcript (litigator). Lexical bans prevent voice convergence. Debate transcript generator. Both layers always available.
89
- - **Trace links:** Every rendered sentence maps to a truth-layer atom. Synthesis cites atoms, never prose.
90
- - **Golden run proof:** Full artifact chain for MCP server marketplace topic — truth artifacts, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts, trace map (16+ links). Published as `examples/golden-run.md`.
91
- - **Result formatter:** `formatBrainstormResult()` produces saveable markdown with verdict, directions, dispute, tensions, rendered artifacts (opt-in), and evidence trail. Layer parameter controls truth-only vs both.
92
- - **Artifact contracts:** 9 brainstorm role contracts (replacing 3 v0.1 scout contracts) with completion rules, required evidence, and consumer mapping.
93
- - **Pack update:** Brainstorm pack updated from v0.1 scouts to v0.3/v0.4 analysts with correct chain order and required artifacts.
94
-
95
- ### Changed
96
-
97
- - Mission count: 6 → 7
98
- - Role count: 31 → 50 (brainstorm analysts, contrarian, plus existing)
99
- - Artifact contract count: 20 → 30
100
- - Test count: 617 → 905
101
-
102
- ## 2.0.1
103
-
104
- ### Added
105
-
106
- - 4 version consistency tests (semver, >= 1.0.0, CHANGELOG, help output)
107
-
108
- ## 2.0.0
109
-
110
- ### Added
111
-
112
- #### Operator Friction Pass (Phase U)
113
- - `roleos run "<task>"` — one command from task description to active execution
114
- - Persistent disk-backed runs in `.claude/runs/` — survives session interruptions
115
- - Entry level auto-selection: mission, pack, or free routing with force overrides (`--mission=`, `--pack=`)
116
- - Step-local operator guidance at every step: role, artifact, required sections, completion rule, stop conditions
117
- - `roleos resume [id]` — continue interrupted runs from disk
118
- - `roleos next` — start the next step or show what's active
119
- - `roleos explain [id]` — full run state with guidance, escalations, interventions
120
- - `roleos complete <artifact> [note]` — complete the active step with artifact reference
121
- - `roleos fail <partial|failed> <reason>` — fail with honest downstream blocking
122
- - `roleos run list` — list all runs with status icons
123
- - `roleos run show <id>` — full run detail
124
-
125
- #### Intervention Shortcuts
126
- - `roleos retry <step>` — retry a failed/partial step, unblock downstream
127
- - `roleos reroute <step> <role> <reason>` — swap a step to a different role
128
- - `roleos escalate <from> <to> <trigger> <action>` — escalate between roles with step re-opening
129
- - `roleos block <step> <reason>` — manually block a step
130
- - `roleos reopen <step> <reason>` — reopen a completed step for re-execution
131
-
132
- #### Friction Measurement
133
- - `roleos report [id]` — generate completion report with honest-partial
134
- - `roleos friction [id]` — measure operator touches: interventions, escalations, manual steps
135
- - Friction score: low/medium/high based on touch count vs step count
136
-
137
- ### Evidence
138
- - 613 tests, zero failures (86 new)
139
- - 6 friction trials validated: clean run, reroute, retry, pack-level, free-routing, disk resume
140
- - All entry levels produce low/medium friction scores
141
- - Disk round-trip verified: create → pause → load → resume → complete
142
-
143
- ## 1.9.0
144
-
145
- ### Added
146
-
147
- #### Unified Entry Path (Phase T)
148
- - `roleos start <task>` — auto-decides mission vs pack vs free routing
149
- - Three-level fallback ladder with confidence scores and alternatives
150
- - Composite task detection warns when a task should be decomposed
151
- - `--json` flag for machine-readable entry decisions
152
- - 46 new tests: entry engine, comparison trials, CLI integration
153
-
154
- #### Handbook Updates
155
- - New Missions handbook page with full mission documentation
156
- - Updated Getting Started to lead with `roleos start`
157
- - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
158
- - Updated handbook index with entry levels and 9 operating layers
159
-
160
- #### README Overhaul
161
- - "How it works" section leads with `roleos start` examples
162
- - Quick Start updated with mission and start commands
163
- - Added 6 Missions table
164
- - Updated project structure with all 18 source modules
165
- - Updated status history through v1.9.0
166
-
167
- ### Evidence
168
- - 527 tests, zero failures (46 new)
169
- - Entry path trials validated against 20+ real task descriptions
170
- - Fallback ladder tested: mission, pack, free-routing, composite, empty input
171
-
172
- ## 1.8.0
173
-
174
- ### Added
175
-
176
- #### Mission Library (Phase S — Mission Hardening)
177
- - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
178
- - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
179
- - Mission runner: create → step through → complete/fail → generate completion report
180
- - Completion proof reporter with honest-partial and formatted text output
181
- - `roleos mission list` — list all missions
182
- - `roleos mission show <key>` — full mission detail
183
- - `roleos mission suggest <text>` — signal-based mission suggestion
184
- - `roleos mission validate [key]` — validate mission wiring against packs/roles
185
-
186
- #### Mission Runner Engine
187
- - `createRun()` — instantiate a mission with tracked steps
188
- - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
189
- - `recordEscalation()` — re-opens completed steps on escalation loops
190
- - `getRunPosition()` / `getArtifactChain()` — run introspection
191
- - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
192
-
193
- ### Evidence
194
- - 465 tests, zero failures (67 new)
195
- - All 6 missions validate against live pack/role catalog
196
- - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
197
-
198
- ## 1.7.0
199
-
200
- ### Added
201
-
202
- #### Completion Proof (Phase R)
203
- - `roleos artifacts` CLI command: list, show, validate, chain subcommands
204
- - 13 new CLI integration tests for artifact inspection
205
- - Real task completion missions through the full stack
206
-
207
- #### Completion Proof Evidence
208
- - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
209
- - Pack: feature (high confidence, correct)
210
- - Chain: 5 roles, 0 escalations, 1 minor correction
211
- - Artifact contracts: all 4 used and valid
212
- - R1-2 Bugfix mission: README.zh.md npm anomaly
213
- - Diagnosed correctly: npm auto-includes README* regardless of files field
214
- - Escalated honestly: fix requires structural decision (translation file organization)
215
- - Not force-closed: deferred to treatment pass
216
-
217
- ### Evidence
218
- - 398 tests, zero failures
219
- - 3 missions run through the full stack
220
- - Completion metrics recorded per mission
221
-
222
- ## 1.6.0
223
-
224
- ### Added
225
-
226
- #### Artifact Spine (Phase Q)
227
- - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
228
- - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
229
- - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
230
- - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
231
- - `getArtifactContract(role)` / `getHandoffContract(pack)` — lookup APIs
232
- - `formatArtifactValidation()` / `formatPackChain()` — display formatters
233
-
234
- #### Artifact contract coverage
235
- - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
236
- - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
237
- - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
238
- - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
239
- - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
240
- - Critic Reviewer → verdict (verdict, evidence, required-corrections)
241
- - And 14 more roles with full contracts
242
-
243
- ### Evidence
244
- - 385 tests, zero failures
245
- - 27 new artifact tests
246
-
247
- ## 1.5.0
248
-
249
- ### Added
250
-
251
- #### Hook Spine / Runtime Enforcement (Phase R)
252
- - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
253
- - `scaffoldHooks()` — generates all 5 hook scripts in .claude/hooks/
254
- - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
255
- - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
256
-
257
- #### SessionStart hook
258
- - Establishes session contract on every new session
259
- - Records session ID, timestamp, initializes state tracking
260
- - Adds context reminding Claude to use /roleos-route for non-trivial tasks
261
-
262
- #### UserPromptSubmit hook
263
- - Classifies prompts as substantial (>50 chars + action verbs)
264
- - After 2+ substantial prompts without a route card, adds context reminder
265
- - Does not block — advisory enforcement
266
-
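The advisory rule described for the UserPromptSubmit hook can be sketched as follows. This is a minimal illustration, not the hook's actual code: the verb list, the function names, and the way prompts are passed in are all assumptions. Only the ">50 chars + action verbs" test and the "2+ substantial prompts without a route card" trigger come from the changelog.

```javascript
// Hypothetical sketch of the UserPromptSubmit classification.
// The verb list is an assumed placeholder; the real hook's list is not shown here.
const ACTION_VERB_RE = /\b(add|build|fix|implement|refactor|write)\b/i;

function isSubstantial(prompt) {
  // Substantial = longer than 50 characters AND contains an action verb.
  return prompt.length > 50 && ACTION_VERB_RE.test(prompt);
}

function shouldRemind(prompts, hasRouteCard) {
  // Advisory only: the caller adds a context reminder, it never blocks.
  if (hasRouteCard) return false;
  return prompts.filter(isSubstantial).length >= 2;
}
```

Keeping the check a pure function of the prompt history makes the advisory behaviour easy to test without a live session.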
267
- #### PreToolUse hook
268
- - Records all tool usage in session state
269
- - Flags write tools (Bash, Write, Edit) used without route card after substantial work
270
- - Advisory, not blocking — preserves operator control
271
-
272
- #### SubagentStart hook
273
- - Injects active role contract into delegated agents
274
- - Ensures subagents inherit the Role OS session context
275
-
276
- #### Stop hook
277
- - Warns when substantial sessions end without route card or outcome artifact
278
- - Advisory — does not block session exit
279
- - Trivial sessions (< 2 substantial prompts) are exempt
280
-
281
- ### Evidence
282
- - 358 tests, zero failures
283
- - 23 new hook tests covering all 5 lifecycle hooks
284
-
285
- ## 1.4.0
286
-
287
- ### Added
288
-
289
- #### Session Spine (Phase Q)
290
- - `roleos init claude` — scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
291
- - `roleos doctor` — verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
292
- - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
293
- - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
294
- - /roleos-route command produces structured route cards
295
- - /roleos-review command guides structured verdict production
296
- - /roleos-status command shows active work and context health
297
- - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
298
- - --force flag overwrites existing command files
299
-
300
- ### Evidence
301
- - 335 tests, zero failures
302
-
303
- ## 1.3.0
304
-
305
- ### Added
306
-
307
- #### Outcome Calibration (Phase M)
308
- - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
309
- - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
310
- - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
311
- - `computeConfidenceAdjustment()` — raises threshold when high-confidence is often overridden, lowers when medium is often accepted
312
- - Auto-generated calibration suggestions when metrics drift
313
- - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
314
-
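The boost rule for clean completed runs (+0.5 per run, capped at 2.0) can be sketched as below. This is a minimal illustration, not the actual `computePackBoosts`: the function name and the ledger entry shape `{ pack, clean }` are assumptions.

```javascript
// Hypothetical sketch of the boost rule; not the real computePackBoosts.
// The ledger entry shape { pack, clean } is an assumption.
const BOOST_PER_RUN = 0.5;
const BOOST_CAP = 2.0;

function packBoostsSketch(ledger) {
  const boosts = {};
  for (const entry of ledger) {
    if (!entry.clean) continue; // only clean completed runs earn a boost
    const next = (boosts[entry.pack] ?? 0) + BOOST_PER_RUN;
    boosts[entry.pack] = Math.min(next, BOOST_CAP); // cap at 2.0
  }
  return boosts;
}
```

Under these assumptions the cap is reached after four clean runs, and further clean runs leave the boost unchanged.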
315
- #### Mixed-Task Decomposition (Phase N)
316
- - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
317
- - Structural connector detection ("and then", "after that", "plus", "also")
318
- - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
319
- - `decompose()` — generates linked child packets sorted by phase order
320
- - `createRunPlan()` — dependency-aware parent plan with child tracking
321
- - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
322
-
323
- #### Composite Execution (Phase O)
324
- - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
325
- - 7 artifact contracts defining what each category produces and expects
326
- - Artifact ledger tracking all cross-packet handoffs
327
- - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
328
- - `invalidateDownstream()` — resets stale children when upstream changes, removes stale artifacts
329
- - `synthesize()` — truthful parent-level completion report
330
- - Independent branches continue unaffected when a sibling fails
331
-
332
- #### Adaptive Replanning (Phase P)
333
- - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
334
- - `analyzeImpact()` — identifies valid/stale children, stale artifacts, whether new children or reorder needed
335
- - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
336
- - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
337
- - Execution resumes from next valid child after replan — no restart required
338
-
339
- ### Evidence
340
- - 317 tests, zero failures
341
- - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
342
-
343
- ## 1.2.0
344
-
345
- ### Added
346
- - Pack auto-selection in `roleos route` — suggests best pack when confidence is high
347
- - `roleos route --pack=<name>` — use a specific pack for routing
348
- - Pack mismatch detection — warns when a pack doesn't fit the task, suggests the correct alternative
349
- - Pack fallback — mismatched or unknown packs fall back to free routing automatically
350
- - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
351
- - `getPackRoles()` API with conditional Orchestrator support
352
-
353
- ### Changed
354
- - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
355
- - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
356
-
357
- ### Evidence
358
- - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
359
- - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
360
- - 230 tests, zero failures
361
-
362
- ## 1.1.0
363
-
364
- ### Added
365
-
366
- #### Routing
367
- - Full 31-role catalog — all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
368
- - Dynamic chain builder — phase-ordered assembly replacing static templates
369
- - Routing confidence assessment (high/medium/low)
370
- - `excludeWhen` enforcement — roles suppressed when exclusion patterns match packet content
371
- - `detectType` false-positive prevention — "integration testing" no longer triggers integration type
372
- - `--verbose` flag for `roleos route` — hides scoring noise by default
373
-
374
- #### Conflict Detection
375
- - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
376
- - Per-role constraint registry: lateOnly, requiresBeforePacks
377
- - Overlap pair detection
378
- - Repair suggestions on every finding
379
-
380
- #### Escalation Auto-Routing
381
- - Blocked/rejected/conflict/split work auto-routes to named resolver
382
- - Every escalation includes: target role, recovery type, required artifact, handoff context
383
-
384
- #### Structured Evidence
385
- - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
386
- - Role-aware evidence requirements for 15 roles
387
- - Sufficiency checks with contradiction detection
388
-
389
- #### Runtime Dispatch
390
- - Execution manifests for multi-claude with per-role tool profiles and budgets
391
- - 8 execution states with auto-advance
392
- - Escalation packet generation for blocked/rejected steps
393
-
394
- #### Proven Team Packs
395
- - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
396
- - `roleos packs list` — show all packs with role counts
397
- - `roleos packs suggest <packet>` — suggest best pack for a packet
398
- - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
399
- - Pack suggestion engine with confidence levels
400
-
401
- #### Trials
402
- - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
403
- - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
404
- - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
405
-
406
- ### Changed
407
- - 32 → 31 roles: Information Architect merged into Docs Architect
408
- - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
409
- - "worker" terminology replaced with "role" in dispatch.mjs
410
-
411
- ### Fixed
412
- - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
413
- - `detectType` false-positived on "integration testing" — now uses word-boundary regex
414
- - "Not triggered: N roles" noise hidden by default (shown with --verbose)
415
- - Handbook: Team Packs page added, reference sidebar reordered
416
-
417
- ## 1.0.2
418
-
419
- ### Fixed
420
- - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
421
- - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
422
-
423
- ### Added
424
- - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
425
- - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
426
-
427
- ## 1.0.0
428
-
429
- ### Added
430
- - `roleos init` — scaffold Role OS starter pack into `.claude/`
431
- - `roleos packet new <type>` — create feature, integration, or identity packets
432
- - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
433
- - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
434
- - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
435
- - Guided context templates with inline prompts
436
- - 3 canonical example packets (feature, integration, identity)
437
- - Adoption handbook
1
+ # Changelog
2
+
3
+ ## 2.6.0
4
+
5
+ ### Changed
6
+
7
+ #### `verify-citations --local-panel` now judges against prism's FULL abstract (not just the span)
8
+
9
+ - The local panel previously re-checked each `supported` citation against only prism's `source_title` + the single `supporting_span` the groundedness lens surfaced. A faithful claim that the *full* abstract entails but no single span does was escalated as a panel disagreement (the wave-6 end-to-end Kambhampati false-escalation). `buildEvidence` now prefers prism's full `source_abstract` (surfaced by prism **v1.0+**), falling back to the span on older prism builds — so faithful claims land cleanly while genuine false-confirms are still caught.
10
+ - `gateCitations` threads `source_abstract` through from prism's `citation_results`. Requires prism 1.0 to take effect; older prism builds omit the field and the span fallback preserves prior behavior. Pairs with `tensor-engine-knowledge` wave-9 (the 3rd verifier family + the full-abstract e2e).
11
+
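The abstract-preference rule described for 2.6.0 can be sketched as follows. This is a minimal illustration, not the actual `buildEvidence` from `src/citation-panel.mjs`: the field names `source_title`, `source_abstract`, and `supporting_span` come from the changelog, while the function name and the returned object shape are assumptions.

```javascript
// Hypothetical sketch; not the real buildEvidence.
// Field names are from the changelog; the return shape is assumed.
function buildEvidenceSketch(citationResult) {
  const title = citationResult.source_title ?? "";
  const abstract = citationResult.source_abstract;
  const span = citationResult.supporting_span;
  // Prefer the full abstract when prism supplies it (v1.0+);
  // older prism builds omit the field, so fall back to the single span.
  const hasAbstract = typeof abstract === "string" && abstract.trim() !== "";
  const body = hasAbstract ? abstract : (span ?? "");
  return { title, body, basis: hasAbstract ? "abstract" : "span" };
}
```

Recording which basis was used (the assumed `basis` field) would make it visible in a receipt whether a verdict was judged on the full abstract or on the span fallback.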
12
+ ### Tests
13
+ - 3 new tests (`buildEvidence` abstract-preference + span fallback; an end-to-end assertion that the panel's evidence carries the full abstract). **1199 total, all green.**
14
+
15
+ ## 2.5.0
16
+
17
+ ### Added
18
+
19
+ #### Local-panel seat — a second, family-different verifier for `verify-citations`, runnable locally for free
20
+
21
+ - **`roleos verify-citations --local-panel`** — adds a local grounded-entailment PANEL (the `offload` CLI on **Qwen3-4B + Qwen3-14B + Mistral-Nemo-12B** via llama-swap) as a SECOND verifier seat, decorrelated from the Claude generator by construction (no Anthropic model in the panel) and from prism's single groundedness model (3 seats, ≥2 families, conservative majority). It re-checks each citation prism marked `supported` against prism's own retrieved evidence (`source_title` + `supporting_span`).
22
+ - **Monotone-tightening** — the panel can only downgrade a passing gate to **escalate** (`local_panel_disagreement`), never loosen one; the deterministic existence floor (`fabricated` → blocking) always dominates, and a non-passing gate is left untouched. A panel that is requested but unreachable **escalates** (`local_panel_unreachable`) — the same closed-gate invariant prism uses.
23
+ - **Why it earns its seat (EXTERNAL_VERIFIER, now local + zero-cost):** the panel's measured property is **zero false-confirms** — a 3-seat conservative majority never stamps a false claim "supported." On a 16-case real-arXiv citation set, `mistral-nemo-12b` solo false-confirmed a refuted claim (inverting arXiv:2404.13076's finding); the panel held it at `insufficient`. Receipt + dataset: `tensor-engine-knowledge/verifier/citation-panel-receipt.json` (study-swarm wave-6, recipe #156).
24
+ - **Receipt** gains a `local_panel` block (PIN_PER_STEP): the exact seat models used, per-citation panel verdicts, and any disagreement that downgraded the gate — folded into the receipt's hash chain via the verdict + a panel digest.
25
+ - New module `src/citation-panel.mjs` (`runOffloadPanel`, `applyLocalPanel`, `buildEvidence`); injectable `offloadExec` for tests. **Off by default** — opt in with `--local-panel` (needs llama-swap up + `offload.py` on the rig; `OFFLOAD_PYTHON` / `OFFLOAD_SCRIPT` / `--llamaswap-base` configurable).
26
+
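The monotone-tightening rule above can be sketched as follows. This is a minimal illustration, not the actual `applyLocalPanel`: the function name, the gate and panel object shapes, and the `accept` verdict label are assumptions. The reason codes `local_panel_unreachable` and `local_panel_disagreement` are taken from the changelog.

```javascript
// Hypothetical sketch of monotone tightening; not the real applyLocalPanel.
// Gate/panel shapes and the "accept" label are assumptions.
function applyPanelSketch(gate, panel) {
  // A non-passing gate is left untouched: the panel never loosens a verdict.
  if (gate.verdict !== "accept") return gate;
  // A requested but unreachable panel closes the gate.
  if (!panel.reachable) {
    return { ...gate, verdict: "escalate", reason: "local_panel_unreachable" };
  }
  // Any citation the panel does not confirm downgrades the passing gate.
  const disagreements = panel.verdicts.filter((v) => v.verdict !== "supported");
  if (disagreements.length > 0) {
    return { ...gate, verdict: "escalate", reason: "local_panel_disagreement", disagreements };
  }
  return gate;
}
```

Because every branch either returns the gate unchanged or moves a passing gate to `escalate`, the sketch cannot turn a blocking or advisory result into an accept.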
27
+ ### Tests
28
+ - 16 new tests (evidence building; the panel runner — agree / refuted / insufficient / no-evidence / unreachable / garbage; monotone `applyLocalPanel`; end-to-end `runCitationGate --local-panel` incl. the PIN'd receipt + blocking-skips-panel). **1196 total, all green.**
29
+
30
+ ## 2.4.0
31
+
32
+ ### Added
33
+
34
+ #### Citation-Verification Gate — defers citation truthfulness to prism (an external, family-different verifier)
35
+
36
+ - **`roleos verify-citations <dispatch.md|.json>` CLI** — extracts a research dispatch's citations, shells the external `prism verify` CLI (a family-different, reasoning-stripped citation verifier), and gates on the verdict. Exit `0` = accept, `20` = blocking (a cited paper did not resolve in arXiv/Crossref — likely fabricated), `10` = advisory (revise / escalate), `2` = no resolvable citations found.
37
+ - **Citation gate module** (`src/verify-citations.mjs`, peer to `build-gate.mjs`) — deterministic, copy-only extraction (`extractCitations` — never invents an identifier); a three-tier gate keyed to the failure source (`gateCitations`): existence `fabricated` → **BLOCKING** hard halt, soft groundedness `contradicted` → advisory revise, low-confidence → advisory escalate. An unreachable verifier **escalates, never default-accepts** ("an unreachable gate is a closed gate"). Emits a receipt chained to prism's HMAC receipt (per-citation `source_sha256` pins → drift-detectable on re-run).
38
+ - **Critic Reviewer** gains a citation-verification clause — for a research dispatch it runs the gate, treats blocking as reject and advisory as accept-with-notes / escalate, and never grades the citations itself.
39
+ - **Design doc** (`design/citation-verification-runner.md`) — research-grounded by a 4-question study-swarm (`wf_20651368-297`), with a Standards-compliance section scoring 15/15 on the applicable standards (NAMED_COMPENSATORS — a documented read-only skip): EXTERNAL_VERIFIER, ANDON_AUTHORITY, PIN_PER_STEP, DECOMPOSE_BY_SECRETS, and UNCERTAINTY_GATED_HUMANS (the contrastive escalate-to-human path).
40
+ - Pairs with prism **v0.3.2**'s `prism verify --gate` (verdict-coded exit status).
41
+
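The three-tier gate and its exit codes can be sketched roughly as below. This is an illustrative sketch, not the code in `src/verify-citations.mjs`: the function name `gateSketch`, the `{ status }` result shape, and the choice to map an unreachable verifier to exit `10` are all assumptions. Only the exit codes and the tier labels come from the entry above.

```javascript
// Exit codes as listed in the changelog entry above.
const EXIT = { ACCEPT: 0, NO_CITATIONS: 2, ADVISORY: 10, BLOCKING: 20 };

// Hypothetical sketch of the three-tier decision. `results` is an assumed
// shape: one { status } object per extracted citation.
function gateSketch(results, verifierReachable = true) {
  // An unreachable verifier escalates; it never default-accepts.
  // (Mapping this case to exit 10 is an assumption.)
  if (!verifierReachable) {
    return { verdict: "advisory", action: "escalate", exit: EXIT.ADVISORY };
  }
  if (results.length === 0) {
    return { verdict: "no-citations", action: "none", exit: EXIT.NO_CITATIONS };
  }
  // Tier 1: an existence failure is a hard halt.
  if (results.some((r) => r.status === "fabricated")) {
    return { verdict: "blocking", action: "halt", exit: EXIT.BLOCKING };
  }
  // Tier 2: a soft groundedness failure asks for a revision.
  if (results.some((r) => r.status === "contradicted")) {
    return { verdict: "advisory", action: "revise", exit: EXIT.ADVISORY };
  }
  // Tier 3: low confidence goes to a human.
  if (results.some((r) => r.status === "low-confidence")) {
    return { verdict: "advisory", action: "escalate", exit: EXIT.ADVISORY };
  }
  return { verdict: "accept", action: "none", exit: EXIT.ACCEPT };
}
```

Checking the tiers in order of severity means a single fabricated citation blocks the dispatch even when every other citation verifies.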
42
+ ### Tests
43
+ - 14 new tests (extraction; three-tier gate; runner with injected prism — accept / block / escalate / unreachable). Module is testable with no real prism shell-out.
44
+
45
+ ## 2.3.1
46
+
47
+ ### Changed
48
+ - Version bump for dogfood swarm mission release
49
+
50
+ ## 2.3.0
51
+
52
+ ### Added
53
+
54
+ #### Dogfood Swarm Mission — Multi-Pass Health + Feature Convergence
55
+
56
+ - **Dogfood swarm mission** — 9th mission in the library. Three-stage health pass (bug/security → proactive → humanization) then iterative feature pass with exclusive file ownership, build gates, and user checkpoints. Moves a repo from "works" to "production-ready." Proven on claude-collaborate (35→129 tests, 106 findings fixed, v1.1.0 shipped).
57
+ - **7 new roles** — Swarm Coordinator, Swarm Backend Agent, Swarm Bridge Agent, Swarm Tests Agent, Swarm Infra Agent, Swarm Frontend Agent, Swarm Synthesizer (61 total roles)
58
+ - **Swarm team pack** — 10th pack, 8 roles (7 swarm + Critic Reviewer), with mismatch guards and trial evidence
59
+ - **Two new mission primitives**:
60
+ - `waveLoops` — iterative convergence with exit conditions, max iterations, build gates, and user approval flags
61
+ - `exclusiveOwnership` — strict domain file boundaries enforced by manifest
62
+ - **Dynamic domain dispatch** — scales agent count based on repo structure via `swarm-manifest.json`
63
+ - **`roleos swarm` CLI** — first-class entry point with subcommands: `swarm`, `swarm manifest`, `swarm manifest --generate`, `swarm status`, `swarm findings`, `swarm approve`, `swarm verify`
64
+ - **Domain detection** (`src/swarm/domain-detect.mjs`) — auto-detects repo type (CLI, web, desktop, MCP, monorepo) and generates domain manifests with non-overlapping file ownership
65
+ - **Build gate** (`src/swarm/build-gate.mjs`) — auto-detects build system (Node, Rust, Python, Go) and runs lint → typecheck → test verification after every wave
66
+ - **Evidence persistence bridge** (`src/swarm/persist-bridge.mjs`) — optional connection back to dogfood-labs, converts wave results to dogfood submission + audit DB payloads
67
+ - **7 artifact contracts** — `swarm-gate`, `wave-report` (×5 with domain-specific sections), `swarm-final-report`
68
+ - **Pack handoff contract** for swarm flow
69
+
70
+ ### Tests
71
+ - 97 new tests (swarm core, domain detection, build gate, persist bridge) — total: 1150
72
+
73
+ ## 2.2.1
74
+
75
+ ### Added
76
+ - **`roleos audit` CLI** — first-class entry point for deep audit with subcommands: `audit`, `audit manifest`, `audit manifest --generate`, `audit status`, `audit verify`
77
+ - **Shared state machine** (`src/state-machine.mjs`) — canonical step/run transitions shared by both runners
78
+ - **Shared tool profiles** (`src/tool-profiles.mjs`) — extracted from dispatch.mjs to break trial→dispatch coupling
79
+
80
+ ### Fixed
81
+ - **P3-1:** Cycle detection in composite execution (`detectCycles` + visited-set guard in `findUnreachable`)
82
+ - **P3-2:** Dual-active guard in `startNext`/`startNextStep` prevents two steps active simultaneously
83
+ - **P3-3:** Atomic persistence — `saveRun` writes to temp file then renames
84
+ - **P4-1:** Dependency Auditor has own artifact contract (`dependency-audit`), pack handoff corrected
85
+ - **P4-2:** `partitionBrief` returns topic-only for unknown roles instead of full brief
86
+ - **P4-3:** Atom kind normalization layer bridges scout `.kind` and atom `.claim_kind`
87
+ - **P4-4:** `/dev/stdin` → `readFileSync(0)` for Windows compatibility in all 5 hooks
88
+ - **P4-5:** TOOL_PROFILES extracted to shared module, eliminating trial→dispatch coupling
89
+ - Node 18 compatibility fix for `import.meta.dirname` in deep-audit-proof test
90
+
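For P3-1, cycle detection over a dependency graph is commonly done with a depth-first walk that tracks which nodes are currently on the stack. The sketch below illustrates that general technique only; it is not the project's `detectCycles`, and the graph shape (a plain object mapping each node to an array of dependencies) is an assumption.

```javascript
// Hypothetical sketch: returns one path per cycle found, e.g. ["a", "b", "a"].
// `graph` is assumed to map node id -> array of node ids it depends on.
function detectCyclesSketch(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const cycles = [];

  function visit(node, path) {
    if (state.get(node) === "done") return; // already fully explored
    if (state.get(node) === "visiting") {
      // Node is on the current DFS stack, so the path loops back to it.
      cycles.push([...path.slice(path.indexOf(node)), node]);
      return;
    }
    state.set(node, "visiting");
    for (const dep of graph[node] ?? []) visit(dep, [...path, node]);
    state.set(node, "done");
  }

  for (const node of Object.keys(graph)) visit(node, []);
  return cycles;
}
```

The "done" marker plays the role of a visited set: each node is expanded once, so the walk terminates even when the graph contains cycles.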
91
+ ### Tests
92
+ - 18 new tests (audit-cmd, audit-p5, deep-audit-proof) — total: 954
93
+
94
+ ## 2.2.0
95
+
96
+ ### Added
97
+
98
+ #### Deep Audit Mission — Runner-Native Componentized Repo Audit
99
+
100
+ - **Deep audit mission** — 8th mission in the library. Decomposes a repo into bounded components, dispatches one auditor per component, inspects seams from the dependency graph, assesses test truth, then synthesizes into a ranked verdict and action plan.
101
+ - **Dynamic dispatch** — missions with `dynamicDispatch` field now expand from a manifest at runtime. `createRun("deep-audit", task, { manifest })` creates N + M + K + 3 steps from the repo graph instead of a fixed static chain. A 6-component / 8-boundary repo produces 23 steps; a 10-component / 5-boundary repo produces 28.
102
+ - **4 new audit roles** — Component Auditor, Seam Auditor, Test Truth Auditor, Audit Synthesizer. Each with full artifact contracts, tool profiles, and role definitions in starter-pack.
103
+ - **Deep-audit pack** — 9th team pack with scaling chain order, dispatch defaults, and mismatch guards.
104
+ - **Artifact validation at execution boundaries** — `validateArtifact()` now runs on every step completion in both `run.mjs` and `mission-run.mjs`. Validation results are attached to the step object. Warn, don't block.
105
+ - **Proof run test suite** — `test/deep-audit-proof.test.mjs` proves the full runner-native lifecycle against the real audit-manifest.json: step creation, parcel identity, validation, escalation, partial failure, scaling formula, and report generation.
106
+
107
+ ### Fixed
108
+
109
+ - **Critical: "approve" vs "accept" verdict mismatch** — `evidence.mjs:195` checked `!== "approve"` but the enum defines `"accept"`. Every accept verdict generated a spurious warning. Tests masked it via substring matching. Fixed to `"accept"` with hardened exact-assertion tests.
110
+ - **Dead imports removed** — `TEAM_PACKS` and `ROLE_ARTIFACT_CONTRACTS` in mission-run.mjs, `TEAM_PACKS` in run.mjs, `scoreRole` and `MIN_SCORE_THRESHOLD` in trial.mjs were imported but never used.
111
+ - **Warning message terminology** — all evidence warning messages now use "accept" instead of "approve" consistently.
112
+
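The verdict mismatch can be illustrated with the closed verdict enum from 1.1.0. This is a minimal sketch, not the code at `evidence.mjs:195`; the function names are hypothetical and the warning logic is reduced to a single comparison.

```javascript
// Closed 4-verdict enum as described in the 1.1.0 entry.
const VERDICTS = ["accept", "accept-with-notes", "reject", "blocked"];

// Pre-fix behavior as described: compares against a value outside the enum,
// so the comparison is true for every valid verdict, including "accept".
function warnsBeforeFix(verdict) {
  return verdict !== "approve";
}

// Post-fix behavior: compares against the enum member.
function warnsAfterFix(verdict) {
  return verdict !== "accept";
}

// Which verdicts trigger a warning under each check.
const before = VERDICTS.filter(warnsBeforeFix);
const after = VERDICTS.filter(warnsAfterFix);
```

Under the pre-fix check all four verdicts warn. After the fix only the three non-accept verdicts do, which an exact-equality assertion can pin down and a substring match cannot.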
113
+ ### Changed
114
+
115
+ - Mission count: 7 → 8
116
+ - Role count: 50 → 54 (4 deep audit roles)
117
+ - Pack count: 8 → 9
118
+ - Artifact contract count: 30 → 34 (4 new audit role contracts)
119
+ - Test count: 905 → 936
120
+
121
+ ### Evidence
122
+
123
+ - Self-audit dogfood: 128 findings (1 critical, 11 high, 39 medium) across 6 component parcels, 8 boundary seams, and 31 test files
124
+ - Runner-native proof run: 23 dynamic steps from real manifest, full lifecycle, all green
125
+ - Scaling formula verified: 2N + K + 3 holds for manifests of 3, 6, 10, and 15 components
126
+
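The scaling formula can be checked against the two step counts quoted under Dynamic dispatch. In the sketch below, reading N as the component count and K as the boundary count is an inference from those two figures, and the function name is hypothetical.

```javascript
// Hypothetical helper: 2N + K + 3, with N = components and K = boundaries
// (an interpretation inferred from the worked figures in this changelog).
function expectedStepCount(components, boundaries) {
  return 2 * components + boundaries + 3;
}

const sixByEight = expectedStepCount(6, 8);  // 6 components, 8 boundaries
const tenByFive = expectedStepCount(10, 5);  // 10 components, 5 boundaries
```

Both values match the changelog: 23 steps for the 6-component / 8-boundary repo and 28 for the 10-component / 5-boundary repo.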
127
+ ## 2.1.0
128
+
129
+ ### Added
130
+
131
+ #### Brainstorm Mission (v0.4) — Structured Inquiry with Traceable Disagreement
132
+
133
+ - **Brainstorm mission** — 7th mission in the library, 9-role chain with two-layer architecture
134
+ - **Layer 1 (truth):** 4 analyst roles emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap), not shared prose. Blindspot enforcement: forbidden phrases, forbidden claim kinds, filtered input partitions per role. Provenance-preserving atoms carry source_role, claim_kind, allowed_challengers. Cross-examination permission matrix (directed graph). Rebut phase: original analysts defend, narrow, or retract under pressure.
135
+ - **Layer 2 (render):** 5 distinct voices — Boundary Memo (taxonomist), Field Notes (ethnographer), System Sketch (whiteboard), Claim Brief (strategist), Cross-Exam Transcript (litigator). Lexical bans prevent voice convergence. Debate transcript generator. Both layers always available.
136
+ - **Trace links:** Every rendered sentence maps to a truth-layer atom. Synthesis cites atoms, never prose.
137
+ - **Golden run proof:** Full artifact chain for MCP server marketplace topic — truth artifacts, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts, trace map (16+ links). Published as `examples/golden-run.md`.
138
+ - **Result formatter:** `formatBrainstormResult()` produces saveable markdown with verdict, directions, dispute, tensions, rendered artifacts (opt-in), and evidence trail. Layer parameter controls truth-only vs both.
139
+ - **Artifact contracts:** 9 brainstorm role contracts (replacing 3 v0.1 scout contracts) with completion rules, required evidence, and consumer mapping.
140
+ - **Pack update:** Brainstorm pack updated from v0.1 scouts to v0.3/v0.4 analysts with correct chain order and required artifacts.
141
+
142
+ ### Changed
143
+
144
+ - Mission count: 6 → 7
145
+ - Role count: 31 → 50 (brainstorm analysts, contrarian, plus existing)
146
+ - Artifact contract count: 20 → 30
147
+ - Test count: 617 → 905
148
+
149
+ ## 2.0.1
150
+
151
+ ### Added
152
+
153
+ - 4 version consistency tests (semver, >= 1.0.0, CHANGELOG, help output)
154
+
155
+ ## 2.0.0
156
+
157
+ ### Added
158
+
159
+ #### Operator Friction Pass (Phase U)
160
+ - `roleos run "<task>"` — one command from task description to active execution
161
+ - Persistent disk-backed runs in `.claude/runs/` — survives session interruptions
162
+ - Entry level auto-selection: mission, pack, or free routing with force overrides (`--mission=`, `--pack=`)
163
+ - Step-local operator guidance at every step: role, artifact, required sections, completion rule, stop conditions
164
+ - `roleos resume [id]` — continue interrupted runs from disk
165
+ - `roleos next` — start the next step or show what's active
166
+ - `roleos explain [id]` — full run state with guidance, escalations, interventions
167
+ - `roleos complete <artifact> [note]` — complete the active step with artifact reference
168
+ - `roleos fail <partial|failed> <reason>` — fail with honest downstream blocking
169
+ - `roleos run list` — list all runs with status icons
170
+ - `roleos run show <id>` — full run detail
171
+
172
+ #### Intervention Shortcuts
173
+ - `roleos retry <step>` — retry a failed/partial step, unblock downstream
174
+ - `roleos reroute <step> <role> <reason>` — swap a step to a different role
175
+ - `roleos escalate <from> <to> <trigger> <action>` — escalate between roles with step re-opening
176
+ - `roleos block <step> <reason>` — manually block a step
177
+ - `roleos reopen <step> <reason>` — reopen a completed step for re-execution
178
+
179
+ #### Friction Measurement
180
+ - `roleos report [id]` — generate completion report with honest-partial
181
+ - `roleos friction [id]` — measure operator touches: interventions, escalations, manual steps
182
+ - Friction score: low/medium/high based on touch count vs step count
183
+
184
+ ### Evidence
185
+ - 613 tests, zero failures (86 new)
186
+ - 6 friction trials validated: clean run, reroute, retry, pack-level, free-routing, disk resume
187
+ - All entry levels produce low/medium friction scores
188
+ - Disk round-trip verified: create → pause → load → resume → complete
189
+
190
+ ## 1.9.0
191
+
192
+ ### Added
193
+
194
+ #### Unified Entry Path (Phase T)
195
+ - `roleos start <task>` — auto-decides mission vs pack vs free routing
196
+ - Three-level fallback ladder with confidence scores and alternatives
197
+ - Composite task detection warns when a task should be decomposed
198
+ - `--json` flag for machine-readable entry decisions
199
+ - 46 new tests: entry engine, comparison trials, CLI integration
200
+
201
+ #### Handbook Updates
202
+ - New Missions handbook page with full mission documentation
203
+ - Updated Getting Started to lead with `roleos start`
204
+ - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
205
+ - Updated handbook index with entry levels and 9 operating layers
206
+
207
+ #### README Overhaul
208
+ - "How it works" section leads with `roleos start` examples
209
+ - Quick Start updated with mission and start commands
210
+ - Added 6 Missions table
211
+ - Updated project structure with all 18 source modules
212
+ - Updated status history through v1.9.0
213
+
214
+ ### Evidence
215
+ - 527 tests, zero failures (46 new)
216
+ - Entry path trials validated against 20+ real task descriptions
217
+ - Fallback ladder tested: mission, pack, free-routing, composite, empty input
218
+
219
+ ## 1.8.0
220
+
221
+ ### Added
222
+
223
+ #### Mission Library (Phase S — Mission Hardening)
224
+ - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
225
+ - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
226
+ - Mission runner: create → step through → complete/fail → generate completion report
227
+ - Completion proof reporter with honest-partial and formatted text output
228
+ - `roleos mission list` — list all missions
229
+ - `roleos mission show <key>` — full mission detail
230
+ - `roleos mission suggest <text>` — signal-based mission suggestion
231
+ - `roleos mission validate [key]` — validate mission wiring against packs/roles
232
+
233
+ #### Mission Runner Engine
234
+ - `createRun()` — instantiate a mission with tracked steps
235
+ - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
236
+ - `recordEscalation()` — re-opens completed steps on escalation loops
237
+ - `getRunPosition()` / `getArtifactChain()` — run introspection
238
+ - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
239
+
240
+ ### Evidence
241
+ - 465 tests, zero failures (67 new)
242
+ - All 6 missions validate against live pack/role catalog
243
+ - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
244
+
245
+ ## 1.7.0
246
+
247
+ ### Added
248
+
249
+ #### Completion Proof (Phase R)
250
+ - `roleos artifacts` CLI command: list, show, validate, chain subcommands
251
+ - 13 new CLI integration tests for artifact inspection
252
+ - Real task completion missions through the full stack
253
+
254
+ #### Completion Proof Evidence
255
+ - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
256
+ - Pack: feature (high confidence, correct)
257
+ - Chain: 5 roles, 0 escalations, 1 minor correction
258
+ - Artifact contracts: all 4 used and valid
259
+ - R1-2 Bugfix mission: README.zh.md npm anomaly
260
+ - Diagnosed correctly: npm auto-includes README* regardless of files field
261
+ - Escalated honestly: fix requires structural decision (translation file organization)
262
+ - Not force-closed: deferred to treatment pass
263
+
264
+ ### Evidence
265
+ - 398 tests, zero failures
266
+ - 3 missions run through the full stack
267
+ - Completion metrics recorded per mission
268
+
269
+ ## 1.6.0
270
+
271
+ ### Added
272
+
273
+ #### Artifact Spine (Phase Q)
274
+ - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
275
+ - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
276
+ - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
277
+ - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
278
+ - `getArtifactContract(role)` / `getHandoffContract(pack)` — lookup APIs
279
+ - `formatArtifactValidation()` / `formatPackChain()` — display formatters
280
+
281
+ #### Artifact contract coverage
282
+ - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
283
+ - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
284
+ - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
285
+ - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
286
+ - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
287
+ - Critic Reviewer → verdict (verdict, evidence, required-corrections)
288
+ - And 14 more roles with full contracts
289
+
290
+ ### Evidence
291
+ - 385 tests, zero failures
292
+ - 27 new artifact tests
293
+
294
+ ## 1.5.0
295
+
296
+ ### Added
297
+
298
+ #### Hook Spine / Runtime Enforcement (Phase R)
299
+ - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
300
+ - `scaffoldHooks()` generates all 5 hook scripts in .claude/hooks/
301
+ - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
302
+ - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
303
+
304
+ #### SessionStart hook
305
+ - Establishes session contract on every new session
306
+ - Records session ID, timestamp, initializes state tracking
307
+ - Adds context reminding Claude to use /roleos-route for non-trivial tasks
308
+
309
+ #### UserPromptSubmit hook
310
+ - Classifies prompts as substantial (>50 chars + action verbs)
311
+ - After 2+ substantial prompts without a route card, adds context reminder
312
+ - Does not block — advisory enforcement
313
+
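The UserPromptSubmit rule can be sketched as a length check combined with an action-verb check, followed by a count of substantial prompts. The verb list and function names below are hypothetical; only the 50-character threshold and the "2+ substantial prompts without a route card" trigger come from the entry above.

```javascript
// Hypothetical verb list; the real hook's list is not shown in this changelog.
const ACTION_VERBS = ["add", "build", "fix", "implement", "refactor", "write"];

// A prompt is treated as substantial when it is longer than 50 characters
// and contains at least one action verb as a whole word.
function isSubstantial(prompt) {
  if (prompt.length <= 50) return false;
  const words = prompt.toLowerCase().split(/\W+/);
  return ACTION_VERBS.some((verb) => words.includes(verb));
}

// Advisory only: returns whether a reminder should be added to the context.
function needsReminder(prompts, hasRouteCard) {
  const substantialCount = prompts.filter(isSubstantial).length;
  return !hasRouteCard && substantialCount >= 2;
}
```

Because the result is only a flag for adding context, nothing in this path blocks the prompt, which matches the advisory stance described for the hook.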
314
+ #### PreToolUse hook
315
+ - Records all tool usage in session state
316
+ - Flags write tools (Bash, Write, Edit) used without route card after substantial work
317
+ - Advisory, not blocking — preserves operator control
318
+
319
+ #### SubagentStart hook
320
+ - Injects active role contract into delegated agents
321
+ - Ensures subagents inherit the Role OS session context
322
+
323
+ #### Stop hook
324
+ - Warns when substantial sessions end without route card or outcome artifact
325
+ - Advisory — does not block session exit
326
+ - Trivial sessions (< 2 substantial prompts) are exempt
327
+
328
+ ### Evidence
329
+ - 358 tests, zero failures
330
+ - 23 new hook tests covering all 5 lifecycle hooks
331
+
332
+ ## 1.4.0
333
+
334
+ ### Added
335
+
336
+ #### Session Spine (Phase Q)
337
+ - `roleos init claude` scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
338
+ - `roleos doctor` — verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
339
+ - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
340
+ - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
341
+ - /roleos-route command produces structured route cards
342
+ - /roleos-review command guides structured verdict production
343
+ - /roleos-status command shows active work and context health
344
+ - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
345
+ - --force flag overwrites existing command files
346
+
347
+ ### Evidence
348
+ - 335 tests, zero failures
349
+
350
+ ## 1.3.0
351
+
352
+ ### Added
353
+
354
+ #### Outcome Calibration (Phase M)
355
+ - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
356
+ - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
357
+ - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
358
+ - `computeConfidenceAdjustment()` — raises threshold when high-confidence is often overridden, lowers when medium is often accepted
359
+ - Auto-generated calibration suggestions when metrics drift
360
+ - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
361
+
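The pack boost rule (+0.5 per clean completed run, capped at 2.0) reduces to a clamped linear function. The sketch below shows the arithmetic only; the function name and the single-number input are assumptions, and the real `computePackBoosts()` presumably works from ledger records per pack.

```javascript
const BOOST_PER_RUN = 0.5; // from the changelog entry
const BOOST_CAP = 2.0;     // from the changelog entry

// Hypothetical helper: boost earned by one pack from its clean completed runs.
function packBoostSketch(cleanCompletedRuns) {
  return Math.min(BOOST_CAP, BOOST_PER_RUN * cleanCompletedRuns);
}
```

With these constants the cap is reached at four clean runs, so further clean runs leave the weight unchanged.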
362
+ #### Mixed-Task Decomposition (Phase N)
363
+ - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
364
+ - Structural connector detection ("and then", "after that", "plus", "also")
365
+ - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
366
+ - `decompose()` — generates linked child packets sorted by phase order
367
+ - `createRunPlan()` — dependency-aware parent plan with child tracking
368
+ - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
369
+
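The confidence rule for composite detection can be sketched as follows. The "high" condition and the connector phrases come from the entries above. The split between "medium" and "low" is not specified in this changelog, so the thresholds used here for those two levels are assumptions, as is the function name.

```javascript
// Connector phrases listed in the changelog entry.
const CONNECTORS = ["and then", "after that", "plus", "also"];

// Hypothetical sketch. `categoryCount` is the number of distinct subtask
// categories detected in the task text.
function compositeConfidence(categoryCount, text) {
  const lower = text.toLowerCase();
  // Simplification: plain substring search, so "plus" also matches "surplus".
  const hasConnector = CONNECTORS.some((phrase) => lower.includes(phrase));

  if (categoryCount >= 3 || (categoryCount >= 2 && hasConnector)) return "high";
  if (categoryCount === 2) return "medium"; // assumption: two categories, no connector
  return "low";                             // assumption: fewer than two categories
}
```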
370
+ #### Composite Execution (Phase O)
371
+ - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
372
+ - 7 artifact contracts defining what each category produces and expects
373
+ - Artifact ledger tracking all cross-packet handoffs
374
+ - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
375
+ - `invalidateDownstream()` — resets stale children when upstream changes, removes stale artifacts
376
+ - `synthesize()` — truthful parent-level completion report
377
+ - Independent branches continue unaffected when a sibling fails
378
+
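The transitive cascade behind downstream invalidation can be sketched as a breadth-first walk over the dependency map. This is an illustration of the idea, not the project's `invalidateDownstream()`: the function name and the `deps` shape (child id mapped to the upstream ids it depends on) are assumptions.

```javascript
// Hypothetical sketch: returns every child that depends, directly or
// transitively, on `changedId`. `deps` maps child id -> upstream child ids.
function downstreamOf(changedId, deps) {
  const stale = new Set();
  const queue = [changedId];

  while (queue.length > 0) {
    const current = queue.shift();
    for (const [child, upstream] of Object.entries(deps)) {
      if (upstream.includes(current) && !stale.has(child)) {
        stale.add(child);  // mark once, so shared dependents are not revisited
        queue.push(child); // then look for children that depend on this one
      }
    }
  }
  return [...stale];
}

const exampleDeps = { build: [], test: ["build"], docs: ["test"], security: [] };
const staleAfterBuildChange = downstreamOf("build", exampleDeps);
```

In this example a change to `build` marks `test` and `docs` as stale while `security`, which shares no dependency with `build`, is left alone. That is the same property the entry describes for independent branches.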
379
+ #### Adaptive Replanning (Phase P)
380
+ - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
381
+ - `analyzeImpact()` — identifies valid/stale children, stale artifacts, whether new children or reorder needed
382
+ - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
383
+ - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
384
+ - Execution resumes from next valid child after replan — no restart required
385
+
386
+ ### Evidence
387
+ - 317 tests, zero failures
388
+ - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
389
+
390
+ ## 1.2.0
391
+
392
+ ### Added
393
+ - Pack auto-selection in `roleos route` — suggests best pack when confidence is high
394
+ - `roleos route --pack=<name>` — use a specific pack for routing
395
+ - Pack mismatch detection warns when a pack doesn't fit the task, suggests the correct alternative
396
+ - Pack fallback — mismatched or unknown packs fall back to free routing automatically
397
+ - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
398
+ - `getPackRoles()` API with conditional Orchestrator support
399
+
400
+ ### Changed
401
+ - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
402
+ - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
403
+
404
+ ### Evidence
405
+ - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
406
+ - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
407
+ - 230 tests, zero failures
408
+
409
+ ## 1.1.0
410
+
411
+ ### Added
412
+
413
+ #### Routing
414
+ - Full 31-role catalog — all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
415
+ - Dynamic chain builder — phase-ordered assembly replacing static templates
416
+ - Routing confidence assessment (high/medium/low)
417
+ - `excludeWhen` enforcement — roles suppressed when exclusion patterns match packet content
418
+ - `detectType` false-positive prevention — "integration testing" no longer triggers integration type
419
+ - `--verbose` flag for `roleos route` — hides scoring noise by default
420
+
421
+ #### Conflict Detection
422
+ - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
423
+ - Per-role constraint registry: lateOnly, requiresBeforePacks
424
+ - Overlap pair detection
425
+ - Repair suggestions on every finding
426
+
427
+ #### Escalation Auto-Routing
428
+ - Blocked/rejected/conflict/split work auto-routes to named resolver
429
+ - Every escalation includes: target role, recovery type, required artifact, handoff context
430
+
431
+ #### Structured Evidence
432
+ - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
433
+ - Role-aware evidence requirements for 15 roles
434
+ - Sufficiency checks with contradiction detection
435
+
436
+ #### Runtime Dispatch
437
+ - Execution manifests for multi-claude with per-role tool profiles and budgets
438
+ - 8 execution states with auto-advance
439
+ - Escalation packet generation for blocked/rejected steps
440
+
441
+ #### Proven Team Packs
442
+ - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
443
+ - `roleos packs list` — show all packs with role counts
444
+ - `roleos packs suggest <packet>` — suggest best pack for a packet
445
+ - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
446
+ - Pack suggestion engine with confidence levels
447
+
448
+ #### Trials
449
+ - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
450
+ - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
451
+ - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
452
+
453
+ ### Changed
454
+ - 32 → 31 roles: Information Architect merged into Docs Architect
455
+ - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
456
+ - "worker" terminology replaced with "role" in dispatch.mjs
457
+
458
+ ### Fixed
459
+ - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
460
+ - `detectType` false-positived on "integration testing" — now uses word-boundary regex
461
+ - "Not triggered: N roles" noise hidden by default (shown with --verbose)
462
+ - Handbook: Team Packs page added, reference sidebar reordered
463
+
464
+ ## 1.0.2
465
+
466
+ ### Fixed
467
+ - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
468
+ - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
469
+
470
+ ### Added
471
+ - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
472
+ - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
473
+
474
+ ## 1.0.0
475
+
476
+ ### Added
477
+ - `roleos init` — scaffold Role OS starter pack into `.claude/`
478
+ - `roleos packet new <type>` — create feature, integration, or identity packets
479
+ - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
480
+ - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
481
+ - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
482
+ - Guided context templates with inline prompts
483
+ - 3 canonical example packets (feature, integration, identity)
484
+ - Adoption handbook