role-os 1.9.0 → 2.1.0

package/CHANGELOG.md CHANGED
@@ -1,297 +1,360 @@
1
- # Changelog
2
-
3
- ## 1.9.0
4
-
5
- ### Added
6
-
7
- #### Unified Entry Path (Phase T)
8
- - `roleos start <task>` — auto-decides mission vs pack vs free routing
9
- - Three-level fallback ladder with confidence scores and alternatives
10
- - Composite task detection warns when a task should be decomposed
11
- - `--json` flag for machine-readable entry decisions
12
- - 46 new tests: entry engine, comparison trials, CLI integration
13
-
14
- #### Handbook Updates
15
- - New Missions handbook page with full mission documentation
16
- - Updated Getting Started to lead with `roleos start`
17
- - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
18
- - Updated handbook index with entry levels and 9 operating layers
19
-
20
- #### README Overhaul
21
- - "How it works" section leads with `roleos start` examples
22
- - Quick Start updated with mission and start commands
23
- - Added 6 Missions table
24
- - Updated project structure with all 18 source modules
25
- - Updated status history through v1.9.0
26
-
27
- ### Evidence
28
- - 527 tests, zero failures (46 new)
29
- - Entry path trials validated against 20+ real task descriptions
30
- - Fallback ladder tested: mission, pack, free-routing, composite, empty input
31
-
32
- ## 1.8.0
33
-
34
- ### Added
35
-
36
- #### Mission Library (Phase S — Mission Hardening)
37
- - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
38
- - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
39
- - Mission runner: create → step through → complete/fail → generate completion report
40
- - Completion proof reporter with honest-partial and formatted text output
41
- - `roleos mission list` — list all missions
42
- - `roleos mission show <key>` — full mission detail
43
- - `roleos mission suggest <text>` — signal-based mission suggestion
44
- - `roleos mission validate [key]` — validate mission wiring against packs/roles
45
-
46
- #### Mission Runner Engine
47
- - `createRun()` — instantiate a mission with tracked steps
48
- - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
49
- - `recordEscalation()` — re-opens completed steps on escalation loops
50
- - `getRunPosition()` / `getArtifactChain()` — run introspection
51
- - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
52
-
53
- ### Evidence
54
- - 465 tests, zero failures (67 new)
55
- - All 6 missions validate against live pack/role catalog
56
- - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
57
-
58
- ## 1.7.0
59
-
60
- ### Added
61
-
62
- #### Completion Proof (Phase R)
63
- - `roleos artifacts` CLI command: list, show, validate, chain subcommands
64
- - 13 new CLI integration tests for artifact inspection
65
- - Real task completion missions through the full stack
66
-
67
- #### Completion Proof Evidence
68
- - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
69
- - Pack: feature (high confidence, correct)
70
- - Chain: 5 roles, 0 escalations, 1 minor correction
71
- - Artifact contracts: all 4 used and valid
72
- - R1-2 Bugfix mission: README.zh.md npm anomaly
73
- - Diagnosed correctly: npm auto-includes README* regardless of files field
74
- - Escalated honestly: fix requires structural decision (translation file organization)
75
- - Not force-closed: deferred to treatment pass
76
-
77
- ### Evidence
78
- - 398 tests, zero failures
79
- - 3 missions run through the full stack
80
- - Completion metrics recorded per mission
81
-
82
- ## 1.6.0
83
-
84
- ### Added
85
-
86
- #### Artifact Spine (Phase Q)
87
- - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
88
- - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
89
- - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
90
- - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
91
- - `getArtifactContract(role)` / `getHandoffContract(pack)` lookup APIs
92
- - `formatArtifactValidation()` / `formatPackChain()` display formatters
93
-
94
- #### Artifact contract coverage
95
- - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
96
- - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
97
- - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
98
- - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
99
- - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
100
- - Critic Reviewer → verdict (verdict, evidence, required-corrections)
101
- - And 14 more roles with full contracts
102
-
103
- ### Evidence
104
- - 385 tests, zero failures
105
- - 27 new artifact tests
106
-
107
- ## 1.5.0
108
-
109
- ### Added
110
-
111
- #### Hook Spine / Runtime Enforcement (Phase R)
112
- - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
113
- - `scaffoldHooks()` generates all 5 hook scripts in .claude/hooks/
114
- - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
115
- - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
116
-
117
- #### SessionStart hook
118
- - Establishes session contract on every new session
119
- - Records session ID, timestamp, initializes state tracking
120
- - Adds context reminding Claude to use /roleos-route for non-trivial tasks
121
-
122
- #### UserPromptSubmit hook
123
- - Classifies prompts as substantial (>50 chars + action verbs)
124
- - After 2+ substantial prompts without a route card, adds context reminder
125
- - Does not block — advisory enforcement
126
-
127
- #### PreToolUse hook
128
- - Records all tool usage in session state
129
- - Flags write tools (Bash, Write, Edit) used without route card after substantial work
130
- - Advisory, not blocking — preserves operator control
131
-
132
- #### SubagentStart hook
133
- - Injects active role contract into delegated agents
134
- - Ensures subagents inherit the Role OS session context
135
-
136
- #### Stop hook
137
- - Warns when substantial sessions end without route card or outcome artifact
138
- - Advisory — does not block session exit
139
- - Trivial sessions (< 2 substantial prompts) are exempt
140
-
141
- ### Evidence
142
- - 358 tests, zero failures
143
- - 23 new hook tests covering all 5 lifecycle hooks
144
-
145
- ## 1.4.0
146
-
147
- ### Added
148
-
149
- #### Session Spine (Phase Q)
150
- - `roleos init claude` — scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
151
- - `roleos doctor` — verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
152
- - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
153
- - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
154
- - /roleos-route command produces structured route cards
155
- - /roleos-review command guides structured verdict production
156
- - /roleos-status command shows active work and context health
157
- - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
158
- - --force flag overwrites existing command files
159
-
160
- ### Evidence
161
- - 335 tests, zero failures
162
-
163
- ## 1.3.0
164
-
165
- ### Added
166
-
167
- #### Outcome Calibration (Phase M)
168
- - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
169
- - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
170
- - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
171
- - `computeConfidenceAdjustment()` — raises threshold when high-confidence is often overridden, lowers when medium is often accepted
172
- - Auto-generated calibration suggestions when metrics drift
173
- - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
174
-
175
- #### Mixed-Task Decomposition (Phase N)
176
- - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
177
- - Structural connector detection ("and then", "after that", "plus", "also")
178
- - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
179
- - `decompose()` — generates linked child packets sorted by phase order
180
- - `createRunPlan()` — dependency-aware parent plan with child tracking
181
- - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
182
-
183
- #### Composite Execution (Phase O)
184
- - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
185
- - 7 artifact contracts defining what each category produces and expects
186
- - Artifact ledger tracking all cross-packet handoffs
187
- - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
188
- - `invalidateDownstream()` resets stale children when upstream changes, removes stale artifacts
189
- - `synthesize()` — truthful parent-level completion report
190
- - Independent branches continue unaffected when a sibling fails
191
-
192
- #### Adaptive Replanning (Phase P)
193
- - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
194
- - `analyzeImpact()` — identifies valid/stale children, stale artifacts, whether new children or reorder needed
195
- - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
196
- - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
197
- - Execution resumes from next valid child after replan — no restart required
198
-
199
- ### Evidence
200
- - 317 tests, zero failures
201
- - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
202
-
203
- ## 1.2.0
204
-
205
- ### Added
206
- - Pack auto-selection in `roleos route` suggests best pack when confidence is high
207
- - `roleos route --pack=<name>` — use a specific pack for routing
208
- - Pack mismatch detection — warns when a pack doesn't fit the task, suggests the correct alternative
209
- - Pack fallback — mismatched or unknown packs fall back to free routing automatically
210
- - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
211
- - `getPackRoles()` API with conditional Orchestrator support
212
-
213
- ### Changed
214
- - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
215
- - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
216
-
217
- ### Evidence
218
- - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
219
- - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
220
- - 230 tests, zero failures
221
-
222
- ## 1.1.0
223
-
224
- ### Added
225
-
226
- #### Routing
227
- - Full 31-role catalog — all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
228
- - Dynamic chain builder — phase-ordered assembly replacing static templates
229
- - Routing confidence assessment (high/medium/low)
230
- - `excludeWhen` enforcement — roles suppressed when exclusion patterns match packet content
231
- - `detectType` false-positive prevention — "integration testing" no longer triggers integration type
232
- - `--verbose` flag for `roleos route` — hides scoring noise by default
233
-
234
- #### Conflict Detection
235
- - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
236
- - Per-role constraint registry: lateOnly, requiresBeforePacks
237
- - Overlap pair detection
238
- - Repair suggestions on every finding
239
-
240
- #### Escalation Auto-Routing
241
- - Blocked/rejected/conflict/split work auto-routes to named resolver
242
- - Every escalation includes: target role, recovery type, required artifact, handoff context
243
-
244
- #### Structured Evidence
245
- - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
246
- - Role-aware evidence requirements for 15 roles
247
- - Sufficiency checks with contradiction detection
248
-
249
- #### Runtime Dispatch
250
- - Execution manifests for multi-claude with per-role tool profiles and budgets
251
- - 8 execution states with auto-advance
252
- - Escalation packet generation for blocked/rejected steps
253
-
254
- #### Proven Team Packs
255
- - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
256
- - `roleos packs list` — show all packs with role counts
257
- - `roleos packs suggest <packet>` — suggest best pack for a packet
258
- - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
259
- - Pack suggestion engine with confidence levels
260
-
261
- #### Trials
262
- - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
263
- - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
264
- - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
265
-
266
- ### Changed
267
- - 32 → 31 roles: Information Architect merged into Docs Architect
268
- - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
269
- - "worker" terminology replaced with "role" in dispatch.mjs
270
-
271
- ### Fixed
272
- - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
273
- - `detectType` false-positived on "integration testing" — now uses word-boundary regex
274
- - "Not triggered: N roles" noise hidden by default (shown with --verbose)
275
- - Handbook: Team Packs page added, reference sidebar reordered
276
-
277
- ## 1.0.2
278
-
279
- ### Fixed
280
- - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
281
- - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
282
-
283
- ### Added
284
- - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
285
- - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
286
-
287
- ## 1.0.0
288
-
289
- ### Added
290
- - `roleos init` — scaffold Role OS starter pack into `.claude/`
291
- - `roleos packet new <type>` — create feature, integration, or identity packets
292
- - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
293
- - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
294
- - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
295
- - Guided context templates with inline prompts
296
- - 3 canonical example packets (feature, integration, identity)
297
- - Adoption handbook
1
+ # Changelog
2
+
3
+ ## 2.1.0
4
+
5
+ ### Added
6
+
7
+ #### Brainstorm Mission (v0.4) — Structured Inquiry with Traceable Disagreement
8
+
9
+ - **Brainstorm mission** — 7th mission in the library, 9-role chain with two-layer architecture
10
+ - **Layer 1 (truth):** 4 analyst roles emit role-native schemas (ContextMap, UserValueMap, MechanicsMap, PositioningMap), not shared prose. Blindspot enforcement: forbidden phrases, forbidden claim kinds, filtered input partitions per role. Provenance-preserving atoms carry source_role, claim_kind, allowed_challengers. Cross-examination permission matrix (directed graph). Rebut phase: original analysts defend, narrow, or retract under pressure.
11
+ - **Layer 2 (render):** 5 distinct voices — Boundary Memo (taxonomist), Field Notes (ethnographer), System Sketch (whiteboard), Claim Brief (strategist), Cross-Exam Transcript (litigator). Lexical bans prevent voice convergence. Debate transcript generator. Both layers always available.
12
+ - **Trace links:** Every rendered sentence maps to a truth-layer atom. Synthesis cites atoms, never prose.
13
+ - **Golden run proof:** Full artifact chain for MCP server marketplace topic — truth artifacts, dispute graph (4 challenges, 3 narrowed, 1 unresolved), rendered artifacts, trace map (16+ links). Published as `examples/golden-run.md`.
14
+ - **Result formatter:** `formatBrainstormResult()` produces saveable markdown with verdict, directions, dispute, tensions, rendered artifacts (opt-in), and evidence trail. Layer parameter controls truth-only vs both.
15
+ - **Artifact contracts:** 9 brainstorm role contracts (replacing 3 v0.1 scout contracts) with completion rules, required evidence, and consumer mapping.
16
+ - **Pack update:** Brainstorm pack updated from v0.1 scouts to v0.3/v0.4 analysts with correct chain order and required artifacts.
17
+
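The provenance-preserving atoms and trace links described for the Brainstorm mission can be pictured with a minimal Python sketch. Only the field names `source_role`, `claim_kind`, and `allowed_challengers` come from the changelog; the `Atom` class, both helper functions, and the role names are hypothetical illustrations, not the package's JavaScript API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One truth-layer claim plus its provenance (hypothetical shape)."""
    atom_id: str
    source_role: str
    claim_kind: str
    allowed_challengers: tuple
    text: str


def may_challenge(challenger: str, atom: Atom) -> bool:
    # Assumption: a role never cross-examines its own atom; otherwise the
    # atom's own allowed_challengers list decides (one edge of the directed graph).
    return challenger != atom.source_role and challenger in atom.allowed_challengers


def untraced_sentences(rendered: dict, atoms: dict) -> list:
    # rendered maps each rendered sentence to the atom id it cites.
    # Any sentence pointing at an unknown atom breaks the trace-link rule.
    return [sentence for sentence, atom_id in rendered.items() if atom_id not in atoms]


atom = Atom("a1", "context-analyst", "observation", ("mechanics-analyst",), "Example claim")
print(may_challenge("mechanics-analyst", atom))
print(untraced_sentences({"Sentence one.": "a1", "Sentence two.": "a9"}, {"a1": atom}))
```

Keeping the challenger list on the atom itself, rather than in a separate table, is one way to make the permission check travel with the claim.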
18
+ ### Changed
19
+
20
+ - Mission count: 6 → 7
21
+ - Role count: 31 → 50 (brainstorm analysts, contrarian, plus existing)
22
+ - Artifact contract count: 20 → 30
23
+ - Test count: 617 → 905
24
+
25
+ ## 2.0.1
26
+
27
+ ### Added
28
+
29
+ - 4 version consistency tests (semver, >= 1.0.0, CHANGELOG, help output)
30
+
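A rough idea of what such version consistency checks might assert, as a Python sketch. The helper names and the simplified semver pattern (no pre-release or build tags) are assumptions; the package's actual tests are not shown in this diff.

```python
import re

# Simplified semver: MAJOR.MINOR.PATCH only (assumption for this sketch).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> tuple:
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a plain semver string: {version!r}")
    return tuple(int(part) for part in match.groups())


def changelog_mentions(changelog_text: str, version: str) -> bool:
    # Assumes each release has its own "## <version>" heading, as in this changelog.
    return any(line.strip() == f"## {version}" for line in changelog_text.splitlines())


version = "2.0.1"
print(parse_version(version) >= (1, 0, 0))
print(changelog_mentions("# Changelog\n\n## 2.0.1\n", version))
```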
31
+ ## 2.0.0
32
+
33
+ ### Added
34
+
35
+ #### Operator Friction Pass (Phase U)
36
+ - `roleos run "<task>"` — one command from task description to active execution
37
+ - Persistent disk-backed runs in `.claude/runs/` — survive session interruptions
38
+ - Entry level auto-selection: mission, pack, or free routing with force overrides (`--mission=`, `--pack=`)
39
+ - Step-local operator guidance at every step: role, artifact, required sections, completion rule, stop conditions
40
+ - `roleos resume [id]` — continue interrupted runs from disk
41
+ - `roleos next` — start the next step or show what's active
42
+ - `roleos explain [id]` — full run state with guidance, escalations, interventions
43
+ - `roleos complete <artifact> [note]` — complete the active step with artifact reference
44
+ - `roleos fail <partial|failed> <reason>` — fail with honest downstream blocking
45
+ - `roleos run list` — list all runs with status icons
46
+ - `roleos run show <id>` — full run detail
47
+
48
+ #### Intervention Shortcuts
49
+ - `roleos retry <step>` — retry a failed/partial step, unblock downstream
50
+ - `roleos reroute <step> <role> <reason>` — swap a step to a different role
51
+ - `roleos escalate <from> <to> <trigger> <action>` — escalate between roles with step re-opening
52
+ - `roleos block <step> <reason>` — manually block a step
53
+ - `roleos reopen <step> <reason>` — reopen a completed step for re-execution
54
+
55
+ #### Friction Measurement
56
+ - `roleos report [id]` — generate completion report with honest-partial
57
+ - `roleos friction [id]` — measure operator touches: interventions, escalations, manual steps
58
+ - Friction score: low/medium/high based on touch count vs step count
59
+
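The friction score compares operator touches to step count. The changelog does not state the cut-offs, so the thresholds in this Python sketch (0.25 and 0.75 touches per step) and the function name are purely illustrative assumptions.

```python
def friction_score(touches: int, steps: int) -> str:
    """Bucket operator touches per step into low/medium/high.

    touches: interventions + escalations + manual steps (as listed in the changelog).
    The 0.25 / 0.75 thresholds are assumed for illustration only.
    """
    if steps <= 0:
        raise ValueError("a run needs at least one step")
    ratio = touches / steps
    if ratio <= 0.25:
        return "low"
    if ratio <= 0.75:
        return "medium"
    return "high"


print(friction_score(touches=1, steps=5))
```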
60
+ ### Evidence
61
+ - 613 tests, zero failures (86 new)
62
+ - 6 friction trials validated: clean run, reroute, retry, pack-level, free-routing, disk resume
63
+ - All entry levels produce low/medium friction scores
64
+ - Disk round-trip verified: create → pause → load → resume → complete
65
+
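The disk round-trip in the evidence list can be sketched in Python as follows. The one-JSON-file-per-run layout, the field names, and both helpers are assumptions; the changelog only says that runs persist under `.claude/runs/`.

```python
import json
import tempfile
from pathlib import Path


def save_run(run_dir: Path, run: dict) -> Path:
    # Assumed layout: one JSON file per run, named after the run id.
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{run['id']}.json"
    path.write_text(json.dumps(run, indent=2), encoding="utf-8")
    return path


def load_run(run_dir: Path, run_id: str) -> dict:
    return json.loads((run_dir / f"{run_id}.json").read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp) / ".claude" / "runs"
    save_run(runs, {"id": "run-001", "status": "paused", "next_step": 2})  # create + pause
    restored = load_run(runs, "run-001")                                   # load in a later session
    restored["status"] = "active"                                          # resume
    save_run(runs, restored)
    print(load_run(runs, "run-001")["status"])
```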
66
+ ## 1.9.0
67
+
68
+ ### Added
69
+
70
+ #### Unified Entry Path (Phase T)
71
+ - `roleos start <task>` — auto-decides mission vs pack vs free routing
72
+ - Three-level fallback ladder with confidence scores and alternatives
73
+ - Composite task detection warns when a task should be decomposed
74
+ - `--json` flag for machine-readable entry decisions
75
+ - 46 new tests: entry engine, comparison trials, CLI integration
76
+
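One way to read the three-level fallback ladder is as an ordered search over entry levels. In this Python sketch the function name, the 0.7 threshold, and the shape of the returned decision are assumptions; only the mission, pack, free-routing order and the idea of reporting confidence plus alternatives come from the changelog.

```python
def choose_entry(mission_score: float, pack_score: float, threshold: float = 0.7) -> dict:
    """Pick the first level whose confidence clears the (assumed) threshold."""
    ladder = [("mission", mission_score), ("pack", pack_score)]
    for index, (level, score) in enumerate(ladder):
        if score >= threshold:
            # Lower rungs stay available as alternatives the operator can force.
            alternatives = [name for name, _ in ladder[index + 1:]] + ["free"]
            return {"level": level, "confidence": score, "alternatives": alternatives}
    # No candidate cleared the bar: fall through to free routing.
    return {"level": "free", "confidence": None, "alternatives": []}


print(choose_entry(mission_score=0.4, pack_score=0.85))
```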
77
+ #### Handbook Updates
78
+ - New Missions handbook page with full mission documentation
79
+ - Updated Getting Started to lead with `roleos start`
80
+ - Updated Reference with all CLI commands (start, mission, packs, artifacts, status, doctor)
81
+ - Updated handbook index with entry levels and 9 operating layers
82
+
83
+ #### README Overhaul
84
+ - "How it works" section leads with `roleos start` examples
85
+ - Quick Start updated with mission and start commands
86
+ - Added 6 Missions table
87
+ - Updated project structure with all 18 source modules
88
+ - Updated status history through v1.9.0
89
+
90
+ ### Evidence
91
+ - 527 tests, zero failures (46 new)
92
+ - Entry path trials validated against 20+ real task descriptions
93
+ - Fallback ladder tested: mission, pack, free-routing, composite, empty input
94
+
95
+ ## 1.8.0
96
+
97
+ ### Added
98
+
99
+ #### Mission Library (Phase S — Mission Hardening)
100
+ - 6 named, repeatable mission types: feature-ship, bugfix, treatment, docs-release, security-hardening, research-launch
101
+ - Each mission declares: pack, role chain, artifact flow, escalation branches, honest-partial definition, stop conditions, dispatch defaults, trial evidence
102
+ - Mission runner: create → step through → complete/fail → generate completion report
103
+ - Completion proof reporter with honest-partial and formatted text output
104
+ - `roleos mission list` — list all missions
105
+ - `roleos mission show <key>` — full mission detail
106
+ - `roleos mission suggest <text>` — signal-based mission suggestion
107
+ - `roleos mission validate [key]` — validate mission wiring against packs/roles
108
+
109
+ #### Mission Runner Engine
110
+ - `createRun()` — instantiate a mission with tracked steps
111
+ - `startNextStep()` / `completeStep()` / `failStep()` — step lifecycle
112
+ - `recordEscalation()` — re-opens completed steps on escalation loops
113
+ - `getRunPosition()` / `getArtifactChain()` — run introspection
114
+ - `generateCompletionReport()` / `formatCompletionReport()` — honest outcome reporting
115
+
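The step lifecycle and the escalation loop can be modelled as a small state machine. This Python sketch mirrors the changelog's function names in snake_case, but the `Run` class, the status strings, and the rule that an escalation parks the escalating step are assumptions rather than the package's implementation.

```python
class Run:
    """Toy mission run: an ordered list of steps with a status each."""

    def __init__(self, roles):
        self.steps = [{"role": r, "status": "pending", "artifact": None} for r in roles]

    def _active(self):
        return next((s for s in self.steps if s["status"] == "active"), None)

    def start_next_step(self):
        # At most one step is active; otherwise activate the earliest pending one.
        active = self._active()
        if active is not None:
            return active
        for step in self.steps:
            if step["status"] == "pending":
                step["status"] = "active"
                return step
        return None

    def complete_step(self, artifact):
        step = self._active()
        if step is None:
            raise RuntimeError("no active step to complete")
        step["status"] = "completed"
        step["artifact"] = artifact

    def record_escalation(self, from_role, to_role):
        # Re-open the completed target step and park the escalating step (assumed rule).
        for step in self.steps:
            if step["role"] == to_role and step["status"] == "completed":
                step["status"] = "pending"
            elif step["role"] == from_role and step["status"] == "active":
                step["status"] = "pending"


run = Run(["spec", "build", "review"])
run.start_next_step(); run.complete_step("spec.md")
run.start_next_step(); run.complete_step("change-plan.md")
run.start_next_step()                      # review is now active
run.record_escalation("review", "build")   # reviewer sends work back
print(run.start_next_step()["role"])
```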
116
+ ### Evidence
117
+ - 465 tests, zero failures (67 new)
118
+ - All 6 missions validate against live pack/role catalog
119
+ - Full lifecycle tests: end-to-end runs, escalation loops, partial completions, failure reporting
120
+
121
+ ## 1.7.0
122
+
123
+ ### Added
124
+
125
+ #### Completion Proof (Phase R)
126
+ - `roleos artifacts` CLI command: list, show, validate, chain subcommands
127
+ - 13 new CLI integration tests for artifact inspection
128
+ - Real task completion missions through the full stack
129
+
130
+ #### Completion Proof Evidence
131
+ - R1-1 Feature mission: `roleos artifacts` command shipped through feature pack
132
+ - Pack: feature (high confidence, correct)
133
+ - Chain: 5 roles, 0 escalations, 1 minor correction
134
+ - Artifact contracts: all 4 used and valid
135
+ - R1-2 Bugfix mission: README.zh.md npm anomaly
136
+ - Diagnosed correctly: npm auto-includes README* regardless of files field
137
+ - Escalated honestly: fix requires structural decision (translation file organization)
138
+ - Not force-closed: deferred to treatment pass
139
+
140
+ ### Evidence
141
+ - 398 tests, zero failures
142
+ - 3 missions run through the full stack
143
+ - Completion metrics recorded per mission
144
+
145
+ ## 1.6.0
146
+
147
+ ### Added
148
+
149
+ #### Artifact Spine (Phase Q)
150
+ - 20 per-role artifact contracts: each defines artifact type, required sections, evidence references, downstream consumers, and completion rules
151
+ - `validateArtifact(role, content)` — structural validation against role contracts (missing sections, evidence references, content depth)
152
+ - 7 pack-level handoff contracts: define the expected artifact flow between steps for each pack (e.g., strategy-brief → implementation-spec → change-plan → test-package → verdict)
153
+ - `validatePackChain(pack, artifacts)` — validates an entire pack's artifact chain for completeness
154
+ - `getArtifactContract(role)` / `getHandoffContract(pack)` lookup APIs
155
+ - `formatArtifactValidation()` / `formatPackChain()` display formatters
156
+
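Structural validation against a contract amounts to checking an artifact for its required sections. In this Python sketch the required sections for `strategy-brief` are taken from the coverage list in this changelog, while the function name and the assumption that sections appear as Markdown headings are illustrative only.

```python
# Required sections per artifact type; only strategy-brief is filled in here.
REQUIRED_SECTIONS = {
    "strategy-brief": ("problem-framing", "scope", "non-goals", "tradeoffs"),
}


def missing_sections(artifact_type: str, content: str) -> list:
    # Assumption: each section is a Markdown heading whose text is the section key.
    headings = {
        line.lstrip("#").strip().lower()
        for line in content.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS[artifact_type] if s not in headings]


draft = "## problem-framing\nUsers cannot inspect artifacts.\n\n## scope\nCLI only.\n"
print(missing_sections("strategy-brief", draft))
```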
157
+ #### Artifact contract coverage
158
+ - Product Strategist → strategy-brief (problem-framing, scope, non-goals, tradeoffs)
159
+ - Spec Writer → implementation-spec (acceptance-criteria, edge-cases, interface-spec)
160
+ - Backend/Frontend Engineer → change-plan (files-to-change, implementation-approach, risk-notes)
161
+ - Test Engineer → test-package (test-plan, test-cases, false-confidence-assessment)
162
+ - Security Reviewer → security-findings (findings, severity-assessment, recommendations)
163
+ - Critic Reviewer → verdict (verdict, evidence, required-corrections)
164
+ - And 14 more roles with full contracts
165
+
166
+ ### Evidence
167
+ - 385 tests, zero failures
168
+ - 27 new artifact tests
169
+
170
+ ## 1.5.0
171
+
172
+ ### Added
173
+
174
+ #### Hook Spine / Runtime Enforcement (Phase R)
175
+ - 5 lifecycle hooks: SessionStart, UserPromptSubmit, PreToolUse, SubagentStart, Stop
176
+ - `scaffoldHooks()` generates all 5 hook scripts in .claude/hooks/
177
+ - `roleos init claude` now scaffolds hooks + settings.local.json with hook config
178
+ - `roleos doctor` now checks for hook scripts (check 7) and settings hooks (check 8)
179
+
180
+ #### SessionStart hook
181
+ - Establishes session contract on every new session
182
+ - Records session ID, timestamp, initializes state tracking
183
+ - Adds context reminding Claude to use /roleos-route for non-trivial tasks
184
+
185
+ #### UserPromptSubmit hook
186
+ - Classifies prompts as substantial (>50 chars + action verbs)
187
+ - After 2+ substantial prompts without a route card, adds context reminder
188
+ - Does not block — advisory enforcement
189
+
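The advisory rule of the UserPromptSubmit hook (more than 50 characters plus an action verb, then a reminder after two such prompts with no route card) might look like this in Python. The verb list and function names are assumptions; the real hook is a generated script whose contents are not shown in this diff.

```python
# Assumed verb list; the changelog does not enumerate the action verbs.
ACTION_VERBS = {"add", "build", "fix", "implement", "refactor", "write"}


def is_substantial(prompt: str) -> bool:
    words = {word.strip(".,!?:;").lower() for word in prompt.split()}
    return len(prompt) > 50 and bool(words & ACTION_VERBS)


def needs_reminder(prompts: list, has_route_card: bool) -> bool:
    # Advisory only: the caller adds context, it never blocks the prompt.
    if has_route_card:
        return False
    return sum(is_substantial(p) for p in prompts) >= 2


task = "Please implement the new artifact validation command for the CLI."
print(needs_reminder([task, task], has_route_card=False))
```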
190
+ #### PreToolUse hook
191
+ - Records all tool usage in session state
192
+ - Flags write tools (Bash, Write, Edit) used without route card after substantial work
193
+ - Advisory, not blocking — preserves operator control
194
+
195
+ #### SubagentStart hook
196
+ - Injects active role contract into delegated agents
197
+ - Ensures subagents inherit the Role OS session context
198
+
199
+ #### Stop hook
200
+ - Warns when substantial sessions end without route card or outcome artifact
201
+ - Advisory — does not block session exit
202
+ - Trivial sessions (< 2 substantial prompts) are exempt
203
+
204
+ ### Evidence
205
+ - 358 tests, zero failures
206
+ - 23 new hook tests covering all 5 lifecycle hooks
207
+
208
+ ## 1.4.0
209
+
210
+ ### Added
211
+
212
+ #### Session Spine (Phase Q)
213
+ - `roleos init claude` — scaffolds Claude Code integration: CLAUDE.md instructions, /roleos-route + /roleos-review + /roleos-status slash commands
214
+ - `roleos doctor` — verifies repo is correctly wired for Role OS sessions (6 checks: .claude/ dir, CLAUDE.md section, /roleos-route command, context files, role contracts, packets)
215
+ - Route card generation — session header artifact proving Role OS was engaged (task type, pack, confidence, composite status, success artifact)
216
+ - CLAUDE.md template instructs Claude to route through Role OS before non-trivial work
217
+ - /roleos-route command produces structured route cards
218
+ - /roleos-review command guides structured verdict production
219
+ - /roleos-status command shows active work and context health
220
+ - Appends to existing CLAUDE.md without overwriting (detects Role OS section)
221
+ - --force flag overwrites existing command files
222
+
223
+ ### Evidence
224
+ - 335 tests, zero failures
225
+
226
+ ## 1.3.0
227
+
228
+ ### Added
229
+
230
+ #### Outcome Calibration (Phase M)
231
+ - Run outcome ledger — append-only JSONL recording pack selection, confidence, overrides, escalations, corrections, completion status
232
+ - `computeCalibration()` — pack usage rates, high-confidence accuracy, operator override rates, per-pack performance
233
+ - `computePackBoosts()` — weight tuning from clean completed runs (+0.5/run, capped at 2.0)
234
+ - `computeConfidenceAdjustment()` — raises threshold when high-confidence is often overridden, lowers when medium is often accepted
235
+ - Auto-generated calibration suggestions when metrics drift
236
+ - Safety constraint: calibration never overrides mismatch guards, conflict rules, escalation honesty, or evidence requirements
237
+
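The boost rule (+0.5 per clean completed run, capped at 2.0) is simple enough to sketch in Python. The ledger field names and the definition of "clean" (completed, with no escalations and no corrections) are assumptions made for the example.

```python
from collections import Counter


def compute_pack_boosts(ledger: list, per_run: float = 0.5, cap: float = 2.0) -> dict:
    # Count clean completed runs per pack, then scale and cap the boost.
    clean_runs = Counter(
        entry["pack"]
        for entry in ledger
        if entry["status"] == "completed"
        and not entry.get("escalations")
        and not entry.get("corrections")
    )
    return {pack: min(count * per_run, cap) for pack, count in clean_runs.items()}


ledger = [{"pack": "feature", "status": "completed"} for _ in range(5)]
ledger.append({"pack": "bugfix", "status": "completed"})
ledger.append({"pack": "docs", "status": "completed", "corrections": 1})
print(compute_pack_boosts(ledger))
```

With five clean feature runs the raw boost would be 2.5, so the cap is what keeps a frequently used pack from dominating selection.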
238
+ #### Mixed-Task Decomposition (Phase N)
239
+ - `detectComposite()` — 7 subtask categories (build, bugfix, security, docs, research, launch, treatment) with signal-based detection
240
+ - Structural connector detection ("and then", "after that", "plus", "also")
241
+ - Confidence levels: high (3+ categories or 2+ with connectors), medium, low
242
+ - `decompose()` — generates linked child packets sorted by phase order
243
+ - `createRunPlan()` — dependency-aware parent plan with child tracking
244
+ - Honest fallback: medium/low confidence shows uncertainty warning with `--no-split` override
245
+
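The confidence rule for composite detection can be sketched in Python. The connector phrases and the "high" condition come from the list above; the keyword signals, the medium/low split (two categories without a connector counts as medium), and the naive substring matching are assumptions for illustration.

```python
CONNECTORS = ("and then", "after that", "plus", "also")

# Assumed keyword signals for a few of the 7 categories.
SIGNALS = {
    "build": ("build", "implement"),
    "bugfix": ("fix", "bug"),
    "docs": ("docs", "document"),
    "security": ("security", "audit"),
}


def detect_composite(task: str) -> tuple:
    text = task.lower()
    # Naive substring matching; a real detector would need word boundaries.
    categories = sorted(c for c, words in SIGNALS.items() if any(w in text for w in words))
    has_connector = any(c in text for c in CONNECTORS)
    count = len(categories)
    if count >= 3 or (count >= 2 and has_connector):
        confidence = "high"
    elif count == 2:
        confidence = "medium"
    else:
        confidence = "low"
    return categories, confidence


print(detect_composite("Fix the login bug and then document the API"))
```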
246
+ #### Composite Execution (Phase O)
247
+ - `initExecution()` / `advance()` — dependency-driven child execution with artifact passing
248
+ - 7 artifact contracts defining what each category produces and expects
249
+ - Artifact ledger tracking all cross-packet handoffs
250
+ - `blockChild()` / `recoverChild()` / `failChild()` — branch recovery with transitive cascade
251
+ - `invalidateDownstream()` resets stale children when upstream changes, removes stale artifacts
252
+ - `synthesize()` — truthful parent-level completion report
253
+ - Independent branches continue unaffected when a sibling fails
254
+
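The transitive cascade, and the guarantee that independent branches keep running, can be shown with a small dependency walk. In this Python sketch the function name and the `deps` mapping shape are hypothetical; it illustrates the behaviour described above rather than the package's code.

```python
def blocked_by_failure(deps: dict, failed: str) -> set:
    """Return every child that transitively depends on the failed child.

    deps maps each child to the list of children it depends on.
    """
    blocked = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for child, parents in deps.items():
            if current in parents and child not in blocked:
                blocked.add(child)
                frontier.append(child)
    return blocked


deps = {"build": [], "test": ["build"], "docs": [], "release": ["test", "docs"]}
print(sorted(blocked_by_failure(deps, "build")))
```

Here `docs` has no path to `build`, so it is never added to the blocked set and can proceed while the failed branch is recovered.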
255
+ #### Adaptive Replanning (Phase P)
256
+ - 6 structured change event types: scope-change, artifact-changed, new-requirement, review-finding, dependency-discovered, priority-change
257
+ - `analyzeImpact()` — identifies valid/stale children, stale artifacts, whether new children or reorder needed
258
+ - `replan()` — selective replanning: invalidates only affected branches, inserts new children, updates dependencies
259
+ - Plan diff: shows what changed, what stayed valid, what reopened, what was inserted
260
+ - Execution resumes from next valid child after replan — no restart required
261
+
262
+ ### Evidence
263
+ - 317 tests, zero failures
264
+ - Calibration, decomposition, composite execution, and replanning each have dedicated test suites
265
+
266
+ ## 1.2.0
267
+
268
+ ### Added
269
+ - Pack auto-selection in `roleos route` suggests best pack when confidence is high
270
+ - `roleos route --pack=<name>` — use a specific pack for routing
271
+ - Pack mismatch detection — warns when a pack doesn't fit the task, suggests the correct alternative
272
+ - Pack fallback — mismatched or unknown packs fall back to free routing automatically
273
+ - `checkPackMismatch()` API with 7 guard sets covering all pack×task-type combinations
274
+ - `getPackRoles()` API with conditional Orchestrator support
275
+
276
+ ### Changed
277
+ - Docs pack: Support Triage Lead now opens (was Feedback Synthesizer). Feedback Synthesizer is second. Release Engineer + Deployment Verifier moved to optional (overhead for docs-only tasks).
278
+ - Pack calibration applied from comparison evidence: conditional Orchestrator, Security Reviewer in Treatment, Product Strategist opens Research, mismatch guards on all 7 packs.
279
+
280
+ ### Evidence
281
+ - Pack comparison: calibrated packs now win or tie 6/7 (was 2/7 pre-calibration)
282
+ - Misfit honesty: 0 full bluffs, 0 undetected partial bluffs (was 1 + 3)
283
+ - 230 tests, zero failures
284
+
285
+ ## 1.1.0
286
+
287
+ ### Added
288
+
289
+ #### Routing
290
+ - Full 31-role catalog: all roles scored by keyword, trigger phrase, packet type bias, and deliverable affinity
291
+ - Dynamic chain builder: phase-ordered assembly replacing static templates
292
+ - Routing confidence assessment (high/medium/low)
293
+ - `excludeWhen` enforcement: roles suppressed when exclusion patterns match packet content
294
+ - `detectType` false-positive prevention: "integration testing" no longer triggers integration type
295
+ - `--verbose` flag for `roleos route`: scoring noise is hidden by default and shown only when the flag is passed
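As a rough illustration of keyword scoring combined with `excludeWhen` suppression, consider the sketch below. The role names appear in this changelog, but the keyword lists, the exclusion pattern, and the function name `scoreRoleSketch` are hypothetical; the package's actual `scoreRole` also weighs trigger phrases, packet type bias, and deliverable affinity, none of which are modelled here.

```javascript
// Hypothetical catalog entries; only the `excludeWhen` field name comes from the changelog.
const sampleRoles = [
  { name: "Security Reviewer", keywords: ["auth", "vulnerability"], excludeWhen: [/docs[- ]only/i] },
  { name: "Docs Architect", keywords: ["docs", "handbook"], excludeWhen: [] },
];

// Score a role against packet text: 0 if any exclusion pattern matches,
// otherwise the number of keywords found as whole words.
function scoreRoleSketch(role, packetText) {
  if (role.excludeWhen.some((pattern) => pattern.test(packetText))) {
    return 0; // suppressed by an exclusion pattern
  }
  const text = packetText.toLowerCase();
  return role.keywords.filter((k) => new RegExp(`\\b${k}\\b`).test(text)).length;
}

const docsPacket = "Docs-only change: update the auth handbook page";
const securityPacket = "Fix auth vulnerability in login";

for (const role of sampleRoles) {
  console.log(role.name, scoreRoleSketch(role, docsPacket), scoreRoleSketch(role, securityPacket));
}
```

Note that the Security Reviewer scores zero on the docs-only packet even though "auth" appears in it; the exclusion takes priority over keyword hits.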
296
+
297
+ #### Conflict Detection
298
+ - 4-pass conflict engine: hard conflicts, sequence, redundancy, coverage gaps
299
+ - Per-role constraint registry: lateOnly, requiresBeforePacks
300
+ - Overlap pair detection
301
+ - Repair suggestions on every finding
302
+
303
+ #### Escalation Auto-Routing
304
+ - Blocked/rejected/conflict/split work auto-routes to named resolver
305
+ - Every escalation includes: target role, recovery type, required artifact, handoff context
306
+
307
+ #### Structured Evidence
308
+ - 12 evidence kinds, 4 statuses, closed 4-verdict enum (accept/accept-with-notes/reject/blocked)
309
+ - Role-aware evidence requirements for 15 roles
310
+ - Sufficiency checks with contradiction detection
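A closed verdict enum like the one described above can be enforced with a small validator. This is a sketch under assumptions: the four verdict strings are taken from the changelog, but the function name `parseVerdict` and its normalisation rules (trimming, lower-casing) are hypothetical and not taken from `evidence.mjs`.

```javascript
// The four verdicts named in the changelog, frozen so the set stays closed.
const VERDICTS = Object.freeze(["accept", "accept-with-notes", "reject", "blocked"]);

// Normalise user input and reject anything outside the closed set.
function parseVerdict(input) {
  const verdict = String(input).trim().toLowerCase();
  if (!VERDICTS.includes(verdict)) {
    throw new Error(`unknown verdict "${input}"; expected one of: ${VERDICTS.join(", ")}`);
  }
  return verdict;
}

console.log(parseVerdict(" Accept "));
console.log(parseVerdict("accept-with-notes"));
```

Throwing on unknown values, rather than defaulting to some verdict, keeps a typo from being silently recorded as a review outcome.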
311
+
312
+ #### Runtime Dispatch
313
+ - Execution manifests for multi-claude with per-role tool profiles and budgets
314
+ - 8 execution states with auto-advance
315
+ - Escalation packet generation for blocked/rejected steps
316
+
317
+ #### Proven Team Packs
318
+ - 7 battle-tested packs: feature, bugfix, security, docs, launch, research, treatment
319
+ - `roleos packs list` — show all packs with role counts
320
+ - `roleos packs suggest <packet>` — suggest best pack for a packet
321
+ - `roleos packs show <name>` — show pack details (roles, artifacts, stop conditions)
322
+ - Pack suggestion engine with confidence levels
323
+
324
+ #### Trials
325
+ - Full roster proven: 30/30 gold-task trials + 5/5 negative (wrong-task honesty) trials
326
+ - 7 pack execution trials — all packs ran full chains with honest Critic verdicts
327
+ - Trial framework: buildClusterTrials, evaluateTrialOutput, formatTrialReport
328
+
329
+ ### Changed
330
+ - 32 → 31 roles: Information Architect merged into Docs Architect
331
+ - Verdict vocabulary unified: evidence.mjs now uses accept/reject/blocked (matching review.mjs)
332
+ - "worker" terminology replaced with "role" in dispatch.mjs
333
+
334
+ ### Fixed
335
+ - `excludeWhen` was declared on 14 roles but never enforced — now active in scoreRole
336
+ - `detectType` false-positived on "integration testing" — now uses word-boundary regex
337
+ - "Not triggered: N roles" noise hidden by default (shown with --verbose)
338
+ - Handbook: Team Packs page added, reference sidebar reordered
339
+
340
+ ## 1.0.2
341
+
342
+ ### Fixed
343
+ - Fix double-nested `.claude/.claude/` directory created by `roleos init` — `starter-pack/.claude/workflows/full-treatment.md` moved to `starter-pack/workflows/`
344
+ - Read VERSION from `package.json` at runtime instead of hardcoded constant — prevents version drift between CLI and package metadata
345
+
346
+ ### Added
347
+ - `roleos init --force` — update canonical scaffolded files while always protecting user-filled `context/` files
348
+ - 4 regression tests: no double-nesting, correct workflow placement, version sync, --force context protection
349
+
350
+ ## 1.0.0
351
+
352
+ ### Added
353
+ - `roleos init` — scaffold Role OS starter pack into `.claude/`
354
+ - `roleos packet new <type>` — create feature, integration, or identity packets
355
+ - `roleos route <packet-file>` — recommend smallest valid role chain with dependency verification
356
+ - `roleos review <packet-file> <verdict>` — record accept/reject/blocked verdicts
357
+ - Full starter pack: 8 role contracts, 3 schemas, 4 policies, 3 workflows
358
+ - Guided context templates with inline prompts
359
+ - 3 canonical example packets (feature, integration, identity)
360
+ - Adoption handbook