create-claude-cabinet 0.6.0

Files changed (135)
  1. package/LICENSE +21 -0
  2. package/README.md +196 -0
  3. package/bin/create-claude-cabinet.js +8 -0
  4. package/lib/cli.js +624 -0
  5. package/lib/copy.js +152 -0
  6. package/lib/db-setup.js +51 -0
  7. package/lib/metadata.js +42 -0
  8. package/lib/reset.js +193 -0
  9. package/lib/settings-merge.js +93 -0
  10. package/package.json +29 -0
  11. package/templates/EXTENSIONS.md +311 -0
  12. package/templates/README.md +485 -0
  13. package/templates/briefing/_briefing-api-template.md +21 -0
  14. package/templates/briefing/_briefing-architecture-template.md +16 -0
  15. package/templates/briefing/_briefing-cabinet-template.md +20 -0
  16. package/templates/briefing/_briefing-identity-template.md +18 -0
  17. package/templates/briefing/_briefing-scopes-template.md +39 -0
  18. package/templates/briefing/_briefing-template.md +148 -0
  19. package/templates/briefing/_briefing-work-tracking-template.md +18 -0
  20. package/templates/cabinet/committees-template.yaml +49 -0
  21. package/templates/cabinet/composition-patterns.md +240 -0
  22. package/templates/cabinet/eval-protocol.md +208 -0
  23. package/templates/cabinet/lifecycle.md +93 -0
  24. package/templates/cabinet/output-contract.md +148 -0
  25. package/templates/cabinet/prompt-guide.md +266 -0
  26. package/templates/hooks/cor-upstream-guard.sh +79 -0
  27. package/templates/hooks/git-guardrails.sh +67 -0
  28. package/templates/hooks/skill-telemetry.sh +66 -0
  29. package/templates/hooks/skill-tool-telemetry.sh +54 -0
  30. package/templates/hooks/stop-hook.md +56 -0
  31. package/templates/memory/patterns/_pattern-template.md +119 -0
  32. package/templates/memory/patterns/pattern-intelligence-first.md +41 -0
  33. package/templates/rules/enforcement-pipeline.md +151 -0
  34. package/templates/scripts/cor-drift-check.cjs +84 -0
  35. package/templates/scripts/finding-schema.json +94 -0
  36. package/templates/scripts/load-triage-history.js +151 -0
  37. package/templates/scripts/merge-findings.js +126 -0
  38. package/templates/scripts/pib-db-schema.sql +68 -0
  39. package/templates/scripts/pib-db.js +365 -0
  40. package/templates/scripts/triage-server.mjs +98 -0
  41. package/templates/scripts/triage-ui.html +536 -0
  42. package/templates/skills/audit/SKILL.md +273 -0
  43. package/templates/skills/audit/phases/finding-output.md +56 -0
  44. package/templates/skills/audit/phases/member-execution.md +83 -0
  45. package/templates/skills/audit/phases/member-selection.md +44 -0
  46. package/templates/skills/audit/phases/structural-checks.md +54 -0
  47. package/templates/skills/audit/phases/triage-history.md +45 -0
  48. package/templates/skills/cabinet-accessibility/SKILL.md +180 -0
  49. package/templates/skills/cabinet-anti-confirmation/SKILL.md +172 -0
  50. package/templates/skills/cabinet-architecture/SKILL.md +279 -0
  51. package/templates/skills/cabinet-boundary-man/SKILL.md +265 -0
  52. package/templates/skills/cabinet-cor-health/SKILL.md +342 -0
  53. package/templates/skills/cabinet-data-integrity/SKILL.md +157 -0
  54. package/templates/skills/cabinet-debugger/SKILL.md +221 -0
  55. package/templates/skills/cabinet-historian/SKILL.md +253 -0
  56. package/templates/skills/cabinet-organized-mind/SKILL.md +338 -0
  57. package/templates/skills/cabinet-process-therapist/SKILL.md +261 -0
  58. package/templates/skills/cabinet-qa/SKILL.md +205 -0
  59. package/templates/skills/cabinet-record-keeper/SKILL.md +168 -0
  60. package/templates/skills/cabinet-roster-check/SKILL.md +297 -0
  61. package/templates/skills/cabinet-security/SKILL.md +181 -0
  62. package/templates/skills/cabinet-small-screen/SKILL.md +154 -0
  63. package/templates/skills/cabinet-speed-freak/SKILL.md +169 -0
  64. package/templates/skills/cabinet-system-advocate/SKILL.md +194 -0
  65. package/templates/skills/cabinet-technical-debt/SKILL.md +115 -0
  66. package/templates/skills/cabinet-usability/SKILL.md +189 -0
  67. package/templates/skills/cabinet-workflow-cop/SKILL.md +238 -0
  68. package/templates/skills/cor-upgrade/SKILL.md +302 -0
  69. package/templates/skills/debrief/SKILL.md +409 -0
  70. package/templates/skills/debrief/phases/auto-maintenance.md +48 -0
  71. package/templates/skills/debrief/phases/close-work.md +88 -0
  72. package/templates/skills/debrief/phases/health-checks.md +54 -0
  73. package/templates/skills/debrief/phases/inventory.md +40 -0
  74. package/templates/skills/debrief/phases/loose-ends.md +52 -0
  75. package/templates/skills/debrief/phases/record-lessons.md +67 -0
  76. package/templates/skills/debrief/phases/report.md +59 -0
  77. package/templates/skills/debrief/phases/update-state.md +48 -0
  78. package/templates/skills/debrief/phases/upstream-feedback.md +129 -0
  79. package/templates/skills/debrief-quick/SKILL.md +12 -0
  80. package/templates/skills/execute/SKILL.md +293 -0
  81. package/templates/skills/execute/phases/cabinet.md +49 -0
  82. package/templates/skills/execute/phases/commit-and-deploy.md +66 -0
  83. package/templates/skills/execute/phases/load-plan.md +49 -0
  84. package/templates/skills/execute/phases/validators.md +50 -0
  85. package/templates/skills/execute/phases/verification-tools.md +67 -0
  86. package/templates/skills/extract/SKILL.md +168 -0
  87. package/templates/skills/investigate/SKILL.md +160 -0
  88. package/templates/skills/link/SKILL.md +52 -0
  89. package/templates/skills/menu/SKILL.md +61 -0
  90. package/templates/skills/onboard/SKILL.md +356 -0
  91. package/templates/skills/onboard/phases/detect-state.md +79 -0
  92. package/templates/skills/onboard/phases/generate-briefing.md +127 -0
  93. package/templates/skills/onboard/phases/generate-session-loop.md +87 -0
  94. package/templates/skills/onboard/phases/interview.md +233 -0
  95. package/templates/skills/onboard/phases/modularity-menu.md +162 -0
  96. package/templates/skills/onboard/phases/options.md +98 -0
  97. package/templates/skills/onboard/phases/post-onboard-audit.md +121 -0
  98. package/templates/skills/onboard/phases/summary.md +122 -0
  99. package/templates/skills/onboard/phases/work-tracking.md +231 -0
  100. package/templates/skills/orient/SKILL.md +251 -0
  101. package/templates/skills/orient/phases/auto-maintenance.md +48 -0
  102. package/templates/skills/orient/phases/briefing.md +53 -0
  103. package/templates/skills/orient/phases/cabinet.md +46 -0
  104. package/templates/skills/orient/phases/context.md +63 -0
  105. package/templates/skills/orient/phases/data-sync.md +35 -0
  106. package/templates/skills/orient/phases/health-checks.md +50 -0
  107. package/templates/skills/orient/phases/work-scan.md +69 -0
  108. package/templates/skills/orient-quick/SKILL.md +12 -0
  109. package/templates/skills/plan/SKILL.md +358 -0
  110. package/templates/skills/plan/phases/cabinet-critique.md +47 -0
  111. package/templates/skills/plan/phases/calibration-examples.md +75 -0
  112. package/templates/skills/plan/phases/completeness-check.md +44 -0
  113. package/templates/skills/plan/phases/composition-check.md +36 -0
  114. package/templates/skills/plan/phases/overlap-check.md +62 -0
  115. package/templates/skills/plan/phases/plan-template.md +69 -0
  116. package/templates/skills/plan/phases/present.md +60 -0
  117. package/templates/skills/plan/phases/research.md +43 -0
  118. package/templates/skills/plan/phases/work-tracker.md +95 -0
  119. package/templates/skills/publish/SKILL.md +74 -0
  120. package/templates/skills/pulse/SKILL.md +242 -0
  121. package/templates/skills/pulse/phases/auto-fix-scope.md +40 -0
  122. package/templates/skills/pulse/phases/checks.md +58 -0
  123. package/templates/skills/pulse/phases/output.md +54 -0
  124. package/templates/skills/seed/SKILL.md +257 -0
  125. package/templates/skills/seed/phases/build-member.md +93 -0
  126. package/templates/skills/seed/phases/evaluate-existing.md +61 -0
  127. package/templates/skills/seed/phases/maintain.md +92 -0
  128. package/templates/skills/seed/phases/scan-signals.md +86 -0
  129. package/templates/skills/triage-audit/SKILL.md +251 -0
  130. package/templates/skills/triage-audit/phases/apply-verdicts.md +90 -0
  131. package/templates/skills/triage-audit/phases/load-findings.md +38 -0
  132. package/templates/skills/triage-audit/phases/triage-ui.md +66 -0
  133. package/templates/skills/unlink/SKILL.md +35 -0
  134. package/templates/skills/validate/SKILL.md +116 -0
  135. package/templates/skills/validate/phases/validators.md +53 -0
@@ -0,0 +1,148 @@
1
+ # Briefing File System — Guide
2
+
3
+ The cabinet system uses **split briefing files** instead of a single
4
+ monolithic `_briefing.md`. Each file focuses on one domain of project
5
+ knowledge, and cabinet members declare which files they need in their
6
+ frontmatter. This keeps briefing loading focused — a cabinet member that
7
+ only needs identity and paths doesn't load API configuration or work
8
+ tracking details.
9
+
10
+ ## Architecture
11
+
12
+ A **hub file** (`_briefing.md`) indexes the focused briefing files that
13
+ exist for this project. Cabinet members read the specific files they need
14
+ rather than parsing one large document.
15
+
16
+ ```
17
+ _briefing.md ← Hub/index (always exists)
18
+ _briefing-identity.md ← What the project is (always exists)
19
+ _briefing-architecture.md ← System structure, codebase layout
20
+ _briefing-scopes.md ← Where to look (paths)
21
+ _briefing-cabinet.md ← Active cabinet members, portfolio rules
22
+ _briefing-work-tracking.md ← Work item storage and interfaces
23
+ _briefing-api.md ← API config, entity types
24
+ _briefing-{domain}.md ← Domain extensions (see below)
25
+ ```
26
+
27
+ ## File Descriptions
28
+
29
+ ### `_briefing.md` — Hub/Index
30
+ **Always created.** Lists which briefing files exist and a one-line
31
+ summary of each. This is what cabinet members fall back to if they don't
32
+ declare specific briefing needs.
33
+
34
+ ### `_briefing-identity.md` — Project Identity
35
+ **Always created.** What the project is, core principles, user context.
36
+ Every cabinet member needs this — it calibrates all findings. Template:
37
+ `_briefing-identity-template.md`.
38
+
39
+ ### `_briefing-architecture.md` — Architecture
40
+ System structure, codebase layout, technology stack. Needed by
41
+ cabinet members that evaluate code structure or need to understand where
42
+ things live. Template: `_briefing-architecture-template.md`.
43
+
44
+ ### `_briefing-scopes.md` — Paths
45
+ Where to look for different kinds of code and configuration. Sections
46
+ are referenced by name (e.g., "App Source", "Data Store"). Only fill in
47
+ sections relevant to the cabinet members you adopt. Template:
48
+ `_briefing-scopes-template.md`.
49
+
50
+ ### `_briefing-cabinet.md` — Cabinet
51
+ Which cabinet members are active, portfolio rules, invocation patterns.
52
+ Needed by meta cabinet members that evaluate the cabinet system itself.
53
+ Template: `_briefing-cabinet-template.md`.
54
+
55
+ ### `_briefing-work-tracking.md` — Work Tracking
56
+ How the project tracks planned work — storage, query interface,
57
+ mutation interface. Referenced by /plan, /execute, /orient, /debrief.
58
+ Template: `_briefing-work-tracking-template.md`.
59
+
60
+ ### `_briefing-api.md` — API Configuration
61
+ Endpoints, auth, entity types. Only create this if the project has an
62
+ API. Template: `_briefing-api-template.md`.
63
+
64
+ ## Cabinet Member-to-Briefing Mapping
65
+
66
+ Which cabinet members need which briefing files (identity is always loaded):
67
+
68
+ | Cabinet Member | architecture | scopes | cabinet | work-tracking | api |
69
+ |-----------------------|:---:|:---:|:---:|:---:|:---:|
70
+ | accessibility | | x | | | |
71
+ | anti-confirmation | | | | | |
72
+ | architecture | x | x | | | |
73
+ | boundary-man | x | x | | | |
74
+ | cor-health | | x | x | | |
75
+ | data-integrity | x | x | | | x |
76
+ | debugger | x | | | | |
77
+ | record-keeper | | x | | | |
78
+ | historian | x | | | | |
79
+ | process-therapist | | x | x | | |
80
+ | small-screen | | x | | | |
81
+ | organized-mind | x | | | | |
82
+ | speed-freak | x | x | | | |
83
+ | workflow-cop | | x | | | |
84
+ | qa | x | x | | | |
85
+ | security | x | x | | | x |
86
+ | roster-check | | | x | | |
87
+ | system-advocate | | | x | | |
88
+ | technical-debt | x | | | | |
89
+ | usability | | x | | | |
90
+
91
+ ## Domain Extension Files
92
+
93
+ Specialized cabinet members may need domain-specific briefing that doesn't
94
+ fit the standard files. These are created by `/seed` when a specialized
95
+ cabinet member is adopted:
96
+
97
+ - **`_briefing-methodology.md`** — For methodology-compliance or GTD
98
+ cabinet members. Contains methodology rules, review cadences, horizon
99
+ definitions.
100
+ - **`_briefing-design-system.md`** — For framework-quality or
101
+ information-design cabinet members. Contains design tokens, component
102
+ conventions, layout patterns.
103
+ - **Any `_briefing-{domain}.md`** — A cabinet member can declare any briefing
104
+ file it needs in its frontmatter. If the file doesn't exist, the
105
+ cabinet member falls back to the hub.
106
+
107
+ ## Files Are Optional
108
+
109
+ Only create briefing files relevant to your project. A CLI tool with no
110
+ UI doesn't need the App Source section of `_briefing-scopes.md`. A project without an
111
+ API skips `_briefing-api.md` entirely. The hub `_briefing.md` lists what
112
+ exists so cabinet members know what's available.
113
+
114
+ ## How These Files Get Created
115
+
116
+ - **/onboard** generates the initial set from interview answers. It
117
+ always creates the hub and identity file. Other files are created
118
+ only if the interview produced content for them.
119
+ - **/seed** adds domain extension files when specialized cabinet members
120
+ are adopted.
121
+ - **/cor-upgrade** can migrate a monolithic `_briefing.md` into the
122
+ split format.
123
+
124
+ ## Backward Compatibility
125
+
126
+ The old monolithic `_briefing.md` format still works. If a cabinet member
127
+ declares briefing files in its frontmatter but those files don't exist,
128
+ or if no `briefing` field is present, the system falls back to reading
129
+ `_briefing.md` directly. This means:
130
+
131
+ - Existing projects with a monolithic `_briefing.md` continue to work
132
+ without changes.
133
+ - Projects can migrate incrementally — split out one file at a time.
134
+ - `/cor-upgrade` handles the full migration when the project is ready.
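
The fallback rule can be sketched as a small resolver. This is an illustration only, not the package's loader: `resolve_briefing_files`, `declared`, and `existing` are hypothetical names, and the handling of a partially missing declaration (load what exists, then add the hub) is an assumption, since the text does not spell out that case.

```python
HUB = "_briefing.md"

def resolve_briefing_files(declared, existing):
    """Return the briefing files to load for one cabinet member.

    declared: file names from the member's frontmatter (None or empty
              when no `briefing` field is present).
    existing: set of briefing file names present in the project.
    """
    if not declared:
        # No `briefing` field: read the hub directly.
        return [HUB]
    found = [name for name in declared if name in existing]
    missing = [name for name in declared if name not in existing]
    if missing and HUB not in found:
        # Assumption: any missing declared file triggers the hub fallback,
        # while declared files that do exist are still loaded.
        found.append(HUB)
    return found

# Example: the API briefing was never created, so the hub is added.
existing = {"_briefing.md", "_briefing-identity.md", "_briefing-scopes.md"}
files = resolve_briefing_files(["_briefing-identity.md", "_briefing-api.md"], existing)
```

A monolithic project, where only `_briefing.md` exists, always resolves to the hub under this sketch, which matches the "continue to work without changes" behavior described above.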
135
+
136
+ ## Finding Format
137
+
138
+ When producing audit findings, use this structure:
139
+
140
+ ```yaml
141
+ finding:
142
+   cabinet-member: member-name
142
+   severity: critical | significant | minor | informational
143
+   category: what domain this falls under
144
+   description: what was found
145
+   evidence: specific file:line or observation
146
+   recommendation: what to do about it
148
+ ```
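
A finding in this shape can be checked mechanically before it enters triage. The sketch below is a hypothetical helper, not part of the package: it checks only the six fields and four severity values listed in the structure above, and the package's own `finding-schema.json` (not shown here) may require more.

```python
SEVERITIES = {"critical", "significant", "minor", "informational"}
REQUIRED_FIELDS = (
    "cabinet-member", "severity", "category",
    "description", "evidence", "recommendation",
)

def validate_finding(finding):
    """Return a list of problems; an empty list means the finding is well formed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not finding.get(name)
    ]
    severity = finding.get("severity")
    if severity and severity not in SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see everything wrong with a finding in a single pass.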
@@ -0,0 +1,18 @@
1
+ # Work Tracking — [Your Project Name]
2
+
3
+ How this project tracks planned work. Skills that manage work items
4
+ (/plan, /execute, /orient, /debrief) reference this file.
5
+
6
+ ## Work Item Storage
7
+ *Where work items live.*
8
+ *Example: SQLite `tasks` table, or `backlog.md`, or GitHub Issues*
9
+
10
+ ## Query Interface
11
+ *How to search open items.*
12
+ *Example: `sqlite3 project.db "SELECT * FROM tasks WHERE status != 'done'"`*
13
+ *Example: `gh issue list --state open --json number,title`*
14
+
15
+ ## Mutation Interface
16
+ *How to create, update, and close items.*
17
+ *Example: `POST /api/tasks` with JSON body*
18
+ *Example: `gh issue create --title "..." --body "..."`*
@@ -0,0 +1,49 @@
1
+ # Cabinet member committees — canonical mapping of committee slugs to cabinet member lists.
2
+ #
3
+ # Copy this file as committees.yaml and uncomment/customize the committees that
4
+ # match your project. Technology choices imply expertise needs: if your
5
+ # project has a UI, you need a UI committee. If it has an API, you need
6
+ # a system health committee.
7
+ #
8
+ # Consumed by:
9
+ # - /audit skill (committee selection menu)
10
+ # - scripts (--committee flag, resolve-committees helpers)
11
+ #
12
+ # Cross-portfolio cabinet members are NOT in any committee. They activate via
13
+ # standing-mandate in their SKILL.md frontmatter:
14
+ # anti-confirmation, qa, debugger, organized-mind
15
+
16
+ committees:
17
+ # -- If your project has a user interface --
18
+ # ux:
19
+ # name: "UX & Design"
20
+ # members:
21
+ # - accessibility
22
+ # - small-screen
23
+ # # Add framework-specific cabinet members (e.g., mantine-quality, tailwind-quality)
24
+
25
+ # -- If your project has code (most projects) --
26
+ # code:
27
+ # name: "Code Quality"
28
+ # members:
29
+ # - technical-debt
30
+ # - boundary-man
31
+ # # Add: architecture (if multi-layer system)
32
+
33
+ # -- If your project has an API, database, or infrastructure --
34
+ # health:
35
+ # name: "System Health"
36
+ # members:
37
+ # - security
38
+ # - data-integrity
39
+ # - speed-freak
40
+ # # Add: sync-health (if remote sync), deployment (if CI/CD)
41
+
42
+ # -- If your project has established process/methodology --
43
+ # process:
44
+ # name: "Process & Meta"
45
+ # members:
46
+ # - workflow-cop
47
+ # - record-keeper
48
+ # - process-therapist
49
+ # - cor-health # CoR adoption health, configuration drift, anti-bloat
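
Once committees are uncommented, a consumer needs to expand slugs into member lists. The following is a minimal sketch under stated assumptions: `COMMITTEES` is a hand-written in-memory stand-in for a parsed `committees.yaml`, and `resolve_committees` is a hypothetical name, not the helper the header comment refers to.

```python
# Hypothetical parsed form of committees.yaml with two committees enabled.
COMMITTEES = {
    "ux": {"name": "UX & Design", "members": ["accessibility", "small-screen"]},
    "health": {
        "name": "System Health",
        "members": ["security", "data-integrity", "speed-freak"],
    },
}

def resolve_committees(slugs, committees=COMMITTEES):
    """Expand committee slugs into a de-duplicated, order-preserving member list."""
    members = []
    for slug in slugs:
        if slug not in committees:
            raise KeyError(f"unknown committee: {slug}")
        for member in committees[slug]["members"]:
            if member not in members:
                members.append(member)
    return members
```

Cross-portfolio members such as anti-confirmation never appear in the result, because they are not listed in any committee.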
@@ -0,0 +1,240 @@
1
+ # Cabinet Member Composition Patterns
2
+
3
+ Shared reference for how cabinet members combine during skill execution.
4
+ Adapted from cc-thinking-skills' 5 combination patterns. This isn't
5
+ theory — every pattern should have at least one working example in
6
+ your system before you consider it proven.
7
+
8
+ ## The Five Patterns
9
+
10
+ ### 1. Parallel
11
+
12
+ **When:** Independent evaluations that should not influence each other.
13
+ Each cabinet member gets a clean context window with the same input data.
14
+ Results are collected and synthesized by the consuming skill.
15
+
16
+ **How it works:** The orchestrating skill (e.g., audit or plan) spawns
17
+ one agent per cabinet member in a single message. Each cabinet member analyzes
18
+ independently. Findings are merged. No cabinet member sees another's output.
19
+
20
+ **Risk:** Contradictory findings. Two cabinet members may flag the same
21
+ area with opposite recommendations (e.g., architecture says "split
22
+ this component" while usability says "keep it unified for simplicity").
23
+
24
+ **Mitigation:** The consuming skill synthesizes contradictions and
25
+ presents them to the user as a tension to resolve, not a bug. The
26
+ synthesis should name both cabinet members and their reasoning.
27
+
28
+ **Implementation:** Use the Agent tool with multiple agents in a
29
+ single message. Each agent receives: shared briefing (`_briefing.md`) +
30
+ cabinet member SKILL.md + output contract + input data.
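
The parallel pattern and its contradiction check can be sketched with plain functions standing in for agents. Everything here is illustrative: `run_parallel` and `find_tensions` are hypothetical names, the `(area, recommendation)` tuples are an assumed finding shape, and a thread pool replaces the Agent tool, which this sketch does not call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(members, input_data):
    """Run every member on the same input; no member sees another's output."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(evaluate, input_data)
            for name, evaluate in members.items()
        }
        return {name: future.result() for name, future in futures.items()}

def find_tensions(results):
    """Group recommendations by area and keep areas where members disagree."""
    by_area = {}
    for member, findings in results.items():
        for area, recommendation in findings:
            by_area.setdefault(area, []).append((member, recommendation))
    return {
        area: recs
        for area, recs in by_area.items()
        if len({rec for _, rec in recs}) > 1
    }

# Stand-in members that disagree about the same component.
members = {
    "architecture": lambda data: [("sidebar component", "split")],
    "usability": lambda data: [("sidebar component", "keep unified")],
}
tensions = find_tensions(run_parallel(members, "diff under review"))
```

Each entry in `tensions` names both members and their recommendations, which is the information the synthesis step needs to present a tension to the user.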
31
+
32
+ ---
33
+
34
+ ### 2. Sequential
35
+
36
+ **When:** Ordered evaluation where later steps depend on earlier results.
37
+ One cabinet member's output becomes input (or gating condition) for the next.
38
+
39
+ **How it works:** The orchestrating skill runs cabinet members in order.
40
+ If the first returns "block," execution stops. Otherwise, its output
41
+ feeds into the next cabinet member's prompt as context.
42
+
43
+ **Example:** Execution checkpoints. Pre-implementation review runs first.
44
+ If all continue, implementation proceeds. Per-file-group review runs
45
+ during implementation. Pre-commit sweep runs after all implementation,
46
+ re-checking earlier concerns in aggregate.
47
+
48
+ **Example:** System diagnosis — debugger maps the dependency chain first,
49
+ then technical-debt evaluates the code quality of the dependencies the
50
+ debugger identified, then historian checks whether any of these
51
+ dependencies have caused issues before.
52
+
53
+ **Risk:** Anchoring. The first cabinet member's framing can bias later
54
+ cabinet members. If the debugger says "the problem is in the database
55
+ layer," technical-debt may focus exclusively on database code and
56
+ miss the real issue in the API layer.
57
+
58
+ **Mitigation:** Later cabinet members should receive the prior output as
59
+ context but with an explicit instruction: "The previous analysis found
60
+ X. You may agree, disagree, or identify issues the prior analysis missed.
61
+ Do not limit your scope to what was already identified."
62
+
63
+ **Implementation:** Sequential Agent calls — launch the first, wait for
64
+ result, include result in the second agent's prompt, etc.
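
A minimal sketch of the sequential flow, including the gating rule and the anti-anchoring instruction quoted above. The names are hypothetical, each member is a plain function returning `(verdict, summary)`, and stopping on a "block" from any member (not only the first) is an assumption that generalizes the description.

```python
ANTI_ANCHORING = (
    "The previous analysis found: {prior}. You may agree, disagree, or "
    "identify issues the prior analysis missed. Do not limit your scope "
    "to what was already identified."
)

def run_sequential(members, input_data):
    """Run members in order, stopping early when one returns a 'block' verdict."""
    outputs = []
    prior = None
    for name, evaluate in members:
        # The first member gets no prior context; later ones get the
        # previous summary wrapped in the anti-anchoring instruction.
        context = ANTI_ANCHORING.format(prior=prior) if prior else ""
        verdict, summary = evaluate(input_data, context)
        outputs.append((name, verdict, summary))
        if verdict == "block":
            break
        prior = summary
    return outputs
```

Passing the prior summary inside the instruction, rather than as bare context, keeps the mitigation attached to the data that causes the anchoring risk.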
65
+
66
+ ---
67
+
68
+ ### 3. Adversarial
69
+
70
+ **When:** High-stakes decisions where confirmation bias is likely. One
71
+ cabinet member is explicitly tasked with challenging another's conclusions.
72
+
73
+ **How it works:** Anti-confirmation (or a similar meta-cognitive
74
+ cabinet member) activates alongside domain cabinet members. It challenges the
75
+ reasoning quality of the plan itself AND of the other cabinet members'
76
+ critiques — asking what would make this wrong, what alternatives were
77
+ dismissed too quickly, where consensus formed before dissent was heard.
78
+
79
+ **When to use vs. not use:**
80
+ - USE when: redesigning a core system, choosing between approaches,
81
+ making an irreversible architectural decision, deferring significant
82
+ work (is the deferral justified or avoidant?)
83
+ - SKIP when: routine implementation, bug fixes, documentation updates,
84
+ trivial configuration changes
85
+
86
+ **Risk:** Slowness. Adversarial composition takes longer and can feel
87
+ obstructive when the decision is actually straightforward.
88
+
89
+ **Mitigation:** Topic-based activation (anti-confirmation only fires on
90
+ high-stakes topics). The adversarial cabinet member should have a hard
91
+ boundary: it challenges reasoning quality, not domain conclusions.
92
+
93
+ **Implementation:** Include the adversarial cabinet member in the parallel
94
+ agent batch alongside domain cabinet members. Its prompt explicitly states:
95
+ "Your job is to challenge the reasoning, not the domain conclusions.
96
+ Focus on: premature consensus, dismissed alternatives, unstated
97
+ assumptions, confirmation bias in the plan or in other critiques."
98
+
99
+ ---
100
+
101
+ ### 4. Nested
102
+
103
+ **When:** A cabinet member needs another cabinet member's analysis as input
104
+ to do its own work. One cabinet member consults another mid-evaluation.
105
+
106
+ **How it works:** A cabinet member running in the main context (session-modal,
107
+ needs full conversation history) references another cabinet member's known
108
+ findings — from memory, from audit history, or from prior session output.
109
+
110
+ **Example:** During debrief, a historian cabinet member is activated to check:
111
+ "Has this kind of change been done before? What happened? Are there
112
+ lessons from prior sessions relevant to what was just completed?"
113
+
114
+ **Example:** During planning, the organized-mind cabinet member might need
115
+ the historian's input: "Has this kind of information architecture been
116
+ tried before? What was the user's reaction?"
117
+
118
+ **Risk:** Deep nesting creates long dependency chains that are slow and
119
+ fragile. A three-level nest (A calls B calls C) means C's context
120
+ must include A's original input plus B's intermediate output — context
121
+ window pressure.
122
+
123
+ **Mitigation:** Limit to one level of nesting. If cabinet member A needs B's
124
+ output, A can reference B's known findings (from memory, from audit
125
+ history). It should NOT spawn B as a sub-agent. The consuming skill is
126
+ responsible for orchestrating multi-cabinet-member flows, not individual
127
+ cabinet members.
128
+
129
+ **Implementation:** The nested cabinet member runs in the main context
130
+ (session-modal) rather than as a parallel agent. This is the exception,
131
+ not the rule — most cabinet members run in clean parallel contexts.
132
+
133
+ ---
134
+
135
+ ### 5. Temporal
136
+
137
+ **When:** The same domain needs different evaluation at different lifecycle
138
+ stages. A cabinet member that applies during planning applies differently
139
+ during execution and differently again during audit.
140
+
141
+ **How it works:** Same cabinet member, different output contracts at different
142
+ lifecycle stages. The orchestrating skill passes the appropriate contract.
143
+
144
+ **Example:** QA cabinet member across the lifecycle:
145
+ - During planning: evaluates acceptance criteria quality (are they testable?
146
+ do they have [auto]/[manual]/[deferred] tags? are edge cases covered?)
147
+ - During execution: active testing (runs [auto] criteria, verifies
148
+ [manual] criteria via preview tools, documents [deferred])
149
+ - During debrief: produces QA report summarizing what was verified,
150
+ what failed, what's still unverified
151
+
152
+ Same cabinet member, three different output contracts, three different
153
+ points in the lifecycle.
154
+
155
+ **Example:** Security cabinet member:
156
+ - During planning: evaluates whether the plan introduces attack surface
157
+ (new endpoints, auth changes, input handling)
158
+ - During execution: reviews the actual code for OWASP vulnerabilities
159
+ before implementation proceeds
160
+ - During audit: scans deployed code for security issues
161
+
162
+ **Risk:** Criteria drift between stages. If the QA cabinet member defines
163
+ "testable AC" differently during planning than execution expects, the
164
+ execute phase will struggle with criteria that looked good during
165
+ planning but are actually unverifiable.
166
+
167
+ **Mitigation:** Output contracts define what each cabinet member produces
168
+ at each stage. The contracts are explicit — a cabinet member reading its
169
+ contract knows exactly what's expected.
170
+
171
+ **Implementation:** Same cabinet member SKILL.md, different output contract
172
+ per consuming skill. The consuming skill passes the appropriate contract
173
+ to the agent prompt.
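
One way to picture the temporal pattern is a lookup keyed by member and consuming skill. This is a sketch with hypothetical names (`CONTRACTS`, `build_prompt`) and abbreviated contract text; the real contracts are described in the package's output-contract document, which is not reproduced here.

```python
# Hypothetical contract table: (cabinet member, consuming skill) -> contract text.
CONTRACTS = {
    ("qa", "plan"): "Evaluate acceptance criteria quality.",
    ("qa", "execute"): "Run [auto] criteria and verify [manual] criteria.",
    ("qa", "debrief"): "Summarize what was verified, failed, or is unverified.",
}

def build_prompt(member, skill, skill_md, input_data):
    """Assemble an agent prompt from one SKILL.md plus a stage-specific contract."""
    contract = CONTRACTS[(member, skill)]
    return "\n\n".join([skill_md, f"Output contract: {contract}", input_data])
```

The SKILL.md text is identical across stages; only the contract line changes, which is what keeps the member's identity stable while its output varies.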
174
+
175
+ ---
176
+
177
+ ## Pre-Built Recipes
178
+
179
+ Recipes are named combinations for common situations. The consuming
180
+ skill selects a recipe based on context, then activates the listed
181
+ cabinet members using the appropriate pattern.
182
+
183
+ ### Committee Audit
184
+
185
+ **When:** Scoped audit of a specific domain (UX, code quality, etc.).
186
+ **Pattern:** Parallel
187
+ **Cabinet Members:** All cabinet members in the selected committee(s) from your
188
+ project's committee configuration.
189
+ **Why this combination:** Committees are pre-curated sets of related
190
+ cabinet members. Running a committee audit gives thorough coverage of one
191
+ domain without the cost/time of a full audit.
192
+
193
+ ### High-Stakes Decision
194
+
195
+ **When:** Architectural redesign, technology choice, significant deferral.
196
+ **Pattern:** Parallel + Adversarial
197
+ **Cabinet Members:** anti-confirmation + architecture + historian + goal-alignment
198
+ **Why this combination:** Architecture evaluates technical fitness.
199
+ Goal-alignment checks strategic fit. Historian surfaces past precedent.
200
+ Anti-confirmation stress-tests the reasoning behind all three.
201
+
202
+ ### New Feature Planning
203
+
204
+ **When:** Adding user-visible functionality with UI + API changes.
205
+ **Pattern:** Parallel (with the UX committee for UI work)
206
+ **Cabinet Members:** security + architecture + organized-mind + qa +
207
+ any domain-specific UI cabinet members
208
+ **Why this combination:** Security catches attack surface. Architecture
209
+ evaluates system fit. Organized-mind checks cognitive load. QA evaluates
210
+ AC quality. UI cabinet members critique the interaction model.
211
+
212
+ ### System Diagnosis
213
+
214
+ **When:** Something is broken or degrading and the root cause is unclear.
215
+ **Pattern:** Sequential (debugger first, then technical-debt, then historian)
216
+ **Cabinet Members:** debugger → technical-debt → historian
217
+ **Why this combination:** Debugger maps the dependency chain and identifies
218
+ the failure point. Technical-debt evaluates whether the failure point
219
+ is symptomatic of deeper code quality issues. Historian checks whether
220
+ this failure pattern has occurred before and what fixed it last time.
221
+
222
+ ### Prompt Refinement
223
+
224
+ **When:** Improving skill definitions or cabinet member definitions.
225
+ **Pattern:** Parallel
226
+ **Cabinet Members:** roster-check + process-therapist + organized-mind
227
+ **Why this combination:** Roster-check evaluates whether the skill
228
+ covers its full scope. Process-therapist checks whether the skill follows
229
+ established patterns. Organized-mind evaluates whether the skill's
230
+ structure is cognitively navigable for a fresh session.
231
+
232
+ ### Post-Execution Review
233
+
234
+ **When:** Debrief after completing implementation work.
235
+ **Pattern:** Nested (session-modal)
236
+ **Cabinet Members:** historian + any lifecycle-tracking cabinet members + qa
237
+ **Why this combination:** Historian records what was done and checks for
238
+ lessons learned. Lifecycle cabinet members capture relevant non-dev items.
239
+ QA produces the final verification report. All need session context,
240
+ so they run in the main context.
@@ -0,0 +1,208 @@
1
+ # Skill Effectiveness Assessment Protocol
2
+
3
+ Shared reference for evaluating whether skills and cabinet members are doing
4
+ their job. Adapted from Anthropic's skill-creator eval framework for manual
5
+ assessment. This is not an automated test suite — it's a structured way to
6
+ ask "is this skill working?" and get an evidence-based answer.
7
+
8
+ ## When This Runs
9
+
10
+ Two trigger mechanisms keep this protocol from sitting unused as a document:
11
+
12
+ 1. **Pull (via prompt refinement):** When reviewing a skill's definition,
13
+ run the assessment before proposing changes. The assessment grounds the
14
+ refinement in evidence rather than intuition.
15
+
16
+ 2. **Push (via process-therapist):** During audits, process-therapist checks whether
17
+ any skill's last assessment is older than 30 days. If so, it surfaces
18
+ an "eval overdue" finding, which enters the normal triage flow. The
19
+ user decides whether to act on it.
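
The 30-day push check amounts to a date comparison. The sketch below uses hypothetical names and an assumed input shape (a mapping from skill name to last assessment date); treating a skill with no recorded assessment as overdue is also an assumption.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)

def overdue_skills(last_assessed, today):
    """Return skills whose last assessment is older than 30 days or missing."""
    overdue = []
    for skill, assessed_on in sorted(last_assessed.items()):
        # Assumption: a skill that was never assessed counts as overdue.
        if assessed_on is None or today - assessed_on > MAX_AGE:
            overdue.append(skill)
    return overdue

today = date(2026, 3, 22)
flagged = overdue_skills(
    {"plan": date(2026, 3, 1), "orient": date(2026, 2, 1), "audit": None},
    today,
)
```

Each name in `flagged` would become one "eval overdue" finding in the normal triage flow.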
20
+
21
+ ## The Assessment Framework
22
+
23
+ ### 1. Define Assertions
24
+
25
+ An assertion is a testable claim about what a skill should produce.
26
+ Each assertion has three fields:
27
+
28
+ ```
29
+ { "text": "what is being tested",
30
+ "passed": true/false,
31
+ "evidence": "supporting data or reasoning" }
32
+ ```
33
+
34
+ **Types of assertions:**
35
+
36
+ - **Behavioral:** Does the skill produce the right actions?
37
+ - Example: "The planning skill produces AC with [auto]/[manual]/[deferred] tags"
38
+ - Example: "The orient skill surfaces overdue items before suggesting focus"
39
+
40
+ - **Quality:** Is the output at the right level?
41
+ - Example: "The inbox skill asks before routing ambiguous items"
42
+ - Example: "QA cabinet member catches [auto] AC failures before commit"
43
+
44
+ - **Coverage:** Does the skill handle its full scope?
45
+ - Example: "The execute skill runs all checkpoint types (pre, per-group, pre-commit)"
46
+ - Example: "Roster-check detects drift between skill definition and documentation"
47
+
48
+ - **Boundary:** Does the skill stay in its portfolio?
49
+ - Example: "Organized-mind flags cognitive load but does not suggest UI framework components"
+ - Example: "Anti-confirmation challenges reasoning quality without developing domain arguments"
+
+ **How many assertions per skill:** 5-8 for core skills (planning,
+ execution, orientation, inbox processing). 3-5 for cabinet members. More
+ isn't better — each assertion should test something meaningfully different.
+
+ ### 2. Sample Past Executions
+
+ Use conversation history and memory to find evidence:
+
+ - Find recent sessions where a skill was invoked
+ - Find sessions where a cabinet member activated
+ - Find sessions where something went wrong
+
+ Also check:
+ - Memory files for feedback corrections (these are failed assertions)
+ - Git history for reverted changes (execution failures)
+ - Audit triage history for rejected findings (cabinet member miscalibration)
+
+ **Sample size:** 3-5 recent executions are sufficient for manual assessment.
+ If a skill hasn't been invoked 3 times in the last month, that itself is
+ a finding (coverage gap or trigger problem).
+
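The sample-size rule can be sketched as a small selection helper. This assumes invocation dates have already been collected by hand from conversation history; `select_sample` is a hypothetical name, and reading "the last month" as a 30-day window is an assumption of this sketch:

```python
from datetime import date, timedelta
from typing import List, Tuple


def select_sample(invocations: List[date], today: date) -> Tuple[List[date], bool]:
    """Return (up to 5 most recent invocation dates, low_usage flag)."""
    recent = sorted(invocations, reverse=True)[:5]
    # Assumption: "the last month" is treated as the last 30 days.
    cutoff = today - timedelta(days=30)
    in_window = [d for d in invocations if d >= cutoff]
    # Fewer than 3 invocations in the window is itself a finding.
    return recent, len(in_window) < 3
```

When the flag is set, the low usage is recorded as a finding (coverage gap or trigger problem) rather than silently scoring a thin sample.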
+ ### 3. Score Each Assertion
+
+ For each assertion, review the sampled executions and score:
+
+ | Score | Meaning |
+ |-------|---------|
+ | **pass** | Assertion holds in all sampled executions |
+ | **partial** | Assertion holds in some but not all executions |
+ | **fail** | Assertion does not hold in any sampled execution |
+ | **untestable** | Not enough data to evaluate (note why) |
+
+ Record the evidence for each score — a pass without evidence is
+ an assumption, not a finding.
+
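The scoring table can be read as a mapping from per-execution outcomes to one of four labels. A minimal sketch, assuming each sampled execution has been judged by hand as held or not held; `score_assertion` and `MIN_SAMPLES` are hypothetical names, and using 3 as the cutoff for "not enough data" is an assumption drawn from the 3-5 sample size given earlier:

```python
from typing import Sequence

# Assumption: below the minimum sample size there is not enough data.
MIN_SAMPLES = 3


def score_assertion(results: Sequence[bool]) -> str:
    """Map per-execution outcomes (True = assertion held) to a score."""
    if len(results) < MIN_SAMPLES:
        return "untestable"  # record why alongside the score
    held = sum(results)
    if held == len(results):
        return "pass"      # holds in all sampled executions
    if held == 0:
        return "fail"      # holds in no sampled execution
    return "partial"       # holds in some but not all
```

The function only assigns the label; the evidence for each score still has to be written down separately, since a pass without evidence is an assumption.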
+ ### 4. Aggregate and Interpret
+
+ ```
+ ## Assessment: /plan — 2026-03-22
+
+ Assertions: 6 total
+ - Pass: 4 (67%)
+ - Partial: 1 (17%)
+ - Fail: 1 (17%)
+ - Untestable: 0
+
+ Pass rate: 67% (4/6 testable)
+
+ ### Findings
+ - PASS: Produces AC with [auto]/[manual]/[deferred] tags (5/5 sampled plans)
+ - PASS: Surface area includes all implementation files (4/5 — one missed shared entry point)
+ - PASS: Presents plan for user approval before creating action (5/5)
+ - PASS: Runs cabinet member critique before presenting (4/5 — skipped once for trivial plan)
+ - PARTIAL: Plans deliver complete features (3/5 — two plans had infrastructure-only steps)
+ - FAIL: Plan notes persist reasoning (1/5 — four plans had thin "why" sections)
+ ```
+
+ **Interpretation guide:**
+
+ | Pass rate | Health | Action |
+ |-----------|--------|--------|
+ | 80-100% | Healthy | Monitor. Note partial assertions for refinement. |
+ | 60-79% | Degrading | Investigate failing assertions. Are they skill design issues or execution drift? Propose targeted refinements. |
+ | Below 60% | Unhealthy | The skill needs significant revision. Root-cause analysis before patching. |
+
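The examples in this protocol report the pass rate as passes over testable assertions (for instance "4/6 testable" and "4/5 testable"), which suggests that untestable assertions are excluded from the denominator. A sketch under that assumption, with the health bands taken from the interpretation guide; `summarize` is a hypothetical name, and comparing the unrounded rate against the band edges is a choice of this sketch:

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(scores: Iterable[str]) -> Tuple[float, str]:
    """Return (pass rate in percent, health label) for a list of scores."""
    counts = Counter(scores)
    # Assumption: untestable assertions do not count toward the denominator.
    testable = counts["pass"] + counts["partial"] + counts["fail"]
    if testable == 0:
        raise ValueError("no testable assertions")
    rate = 100 * counts["pass"] / testable
    if rate >= 80:
        health = "Healthy"
    elif rate >= 60:
        health = "Degrading"
    else:
        health = "Unhealthy"
    return rate, health
```

Applied to the /plan assessment above (4 pass, 1 partial, 1 fail), this gives a rate of about 67% and the label "Degrading".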
+ ### 5. Track Over Time
+
+ Append each assessment to a tracking section at the bottom of
+ the skill's SKILL.md (or to a separate ASSESSMENTS.md if the skill
+ is approaching the 500-line limit):
+
+ ```markdown
+ ## Assessment Log
+
+ ### 2026-03-22 — Pass rate: 67% (4/6)
+ Key finding: Plan notes don't persist reasoning. "Why" sections thin.
+ Action taken: Added emphasis in Step 2 template + calibration example.
+
+ ### 2026-04-15 — Pass rate: 83% (5/6)
+ Improvement: Reasoning persistence now at 4/5. Remaining partial: feature
+ completeness — one plan delivered infrastructure step without wiring.
+ ```
+
+ This creates a trend line. A declining pass rate across assessments
+ signals systemic drift. An improving rate confirms refinements are working.
+
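The trend line can be read back out of an Assessment Log mechanically. A rough sketch, assuming entry headings follow the `### <date> ... Pass rate: NN%` shape shown above; `pass_rates` and `trend` are hypothetical helpers, and comparing only the two most recent entries is a simplification of "across assessments":

```python
import re
from typing import List, Tuple

# Matches headings such as "### 2026-03-22 <dash> Pass rate: 67% (4/6)".
ENTRY = re.compile(r"^### (\d{4}-\d{2}-\d{2}) .*?Pass rate: (\d+)%", re.MULTILINE)


def pass_rates(log: str) -> List[Tuple[str, int]]:
    """Extract (date, pass rate) pairs from a log, oldest first."""
    # ISO dates sort correctly as plain strings.
    return sorted((d, int(r)) for d, r in ENTRY.findall(log))


def trend(log: str) -> str:
    """Compare the newest pass rate with the one before it."""
    rates = [r for _, r in pass_rates(log)]
    if len(rates) < 2:
        return "insufficient history"
    if rates[-1] > rates[-2]:
        return "improving"
    if rates[-1] < rates[-2]:
        return "declining"
    return "flat"


# The two entries from the example log above (\u2014 is the dash they use).
sample_log = (
    "## Assessment Log\n\n"
    "### 2026-03-22 \u2014 Pass rate: 67% (4/6)\n"
    "### 2026-04-15 \u2014 Pass rate: 83% (5/6)\n"
)
```

For `sample_log`, `trend` returns `"improving"`, matching the reading that the refinement after the first assessment worked.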
+ ## Example: Complete Assessment
+
+ *(Adapted from a reference implementation's orient skill assessment)*
+
+ ```
+ ## Assessment: /orient — 2026-03-22
+ Sampled: 5 recent sessions (3 standard, 1 morning, 1 user-requested)
+
+ ### Assertions
+
+ 1. BEHAVIORAL: Pulls fresh data before presenting briefing
+ Score: pass
+ Evidence: All 5 sessions ran data sync in Step 2.
+
+ 2. BEHAVIORAL: Surfaces overdue items with severity indication
+ Score: pass
+ Evidence: 4/5 sessions had overdue items; all were surfaced with
+ due dates. One session had no overdue items (correctly omitted).
+
+ 3. BEHAVIORAL: Mentions inbox counts (main + sub-inboxes)
+ Score: partial
+ Evidence: 4/5 sessions showed inbox counts. One session showed main
+ inbox count but omitted sub-inbox counts (had 2 items in a sub-inbox
+ that went unmentioned).
+
+ 4. QUALITY: Suggested focus grounded in data, not defaults
+ Score: pass
+ Evidence: All 5 suggestions referenced specific items (deadlines,
+ project momentum, inbox counts). None defaulted to "continue last
+ session's work" without evidence.
+
+ 5. COVERAGE: Morning mode includes calendar + completions
+ Score: untestable
+ Evidence: Only 1 morning-mode session in sample. Calendar was shown.
+ Completions section was present but sparse. Need more morning-mode
+ samples to evaluate reliably.
+
+ 6. BOUNDARY: Does not prescribe what to work on
+ Score: pass
+ Evidence: All 5 sessions ended with "What feels right?" or similar.
+ None said "You should work on X."
+
+ ### Summary
+ Assertions: 6 total
+ - Pass: 4 (67%), Partial: 1 (17%), Fail: 0, Untestable: 1 (17%)
+ - Pass rate: 80% (4/5 testable)
+ - Health: Healthy
+
+ ### Actions
+ - Sub-inbox counts: add explicit check to orient phase (address partial)
+ - Morning mode: defer re-assessment until 3+ morning sessions available
+ ```
+
+ ## For Cabinet Members (Compressed Format)
+
+ Cabinet members are simpler — they produce findings in a structured format.
+ Assessment focuses on:
+
+ 1. **Signal quality:** Are findings actionable or noise?
+ - Check audit triage history: what percentage were accepted vs. rejected?
+ - A >50% rejection rate means the cabinet member is miscalibrated.
+
+ 2. **Portfolio discipline:** Does the cabinet member stay in its domain?
+ - Check findings for cross-portfolio observations that duplicate other cabinet members.
+
+ 3. **Activation accuracy:** Does it fire when relevant and stay quiet when not?
+ - Check: did it activate for files/topics outside its declared scope?
+ - Check: were there situations where it should have activated but didn't?
+
+ Three assertions per cabinet member is sufficient. Use the same
+ pass/partial/fail/untestable scoring.
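
The signal-quality check comes down to one ratio over the audit triage history. A minimal sketch, assuming accepted and rejected findings have already been counted by hand; `rejection_rate` and `is_miscalibrated` are hypothetical names:

```python
def rejection_rate(accepted: int, rejected: int) -> float:
    """Fraction of triaged findings that were rejected."""
    total = accepted + rejected
    if total == 0:
        raise ValueError("no triaged findings to evaluate")
    return rejected / total


def is_miscalibrated(accepted: int, rejected: int) -> bool:
    """Apply the >50% rejection threshold from the signal-quality check."""
    return rejection_rate(accepted, rejected) > 0.5
```

A cabinet member with no triaged findings raises an error here instead of returning a rate; under this protocol that case would more naturally be scored as untestable.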