create-claude-cabinet 0.6.0
- package/LICENSE +21 -0
- package/README.md +196 -0
- package/bin/create-claude-cabinet.js +8 -0
- package/lib/cli.js +624 -0
- package/lib/copy.js +152 -0
- package/lib/db-setup.js +51 -0
- package/lib/metadata.js +42 -0
- package/lib/reset.js +193 -0
- package/lib/settings-merge.js +93 -0
- package/package.json +29 -0
- package/templates/EXTENSIONS.md +311 -0
- package/templates/README.md +485 -0
- package/templates/briefing/_briefing-api-template.md +21 -0
- package/templates/briefing/_briefing-architecture-template.md +16 -0
- package/templates/briefing/_briefing-cabinet-template.md +20 -0
- package/templates/briefing/_briefing-identity-template.md +18 -0
- package/templates/briefing/_briefing-scopes-template.md +39 -0
- package/templates/briefing/_briefing-template.md +148 -0
- package/templates/briefing/_briefing-work-tracking-template.md +18 -0
- package/templates/cabinet/committees-template.yaml +49 -0
- package/templates/cabinet/composition-patterns.md +240 -0
- package/templates/cabinet/eval-protocol.md +208 -0
- package/templates/cabinet/lifecycle.md +93 -0
- package/templates/cabinet/output-contract.md +148 -0
- package/templates/cabinet/prompt-guide.md +266 -0
- package/templates/hooks/cor-upstream-guard.sh +79 -0
- package/templates/hooks/git-guardrails.sh +67 -0
- package/templates/hooks/skill-telemetry.sh +66 -0
- package/templates/hooks/skill-tool-telemetry.sh +54 -0
- package/templates/hooks/stop-hook.md +56 -0
- package/templates/memory/patterns/_pattern-template.md +119 -0
- package/templates/memory/patterns/pattern-intelligence-first.md +41 -0
- package/templates/rules/enforcement-pipeline.md +151 -0
- package/templates/scripts/cor-drift-check.cjs +84 -0
- package/templates/scripts/finding-schema.json +94 -0
- package/templates/scripts/load-triage-history.js +151 -0
- package/templates/scripts/merge-findings.js +126 -0
- package/templates/scripts/pib-db-schema.sql +68 -0
- package/templates/scripts/pib-db.js +365 -0
- package/templates/scripts/triage-server.mjs +98 -0
- package/templates/scripts/triage-ui.html +536 -0
- package/templates/skills/audit/SKILL.md +273 -0
- package/templates/skills/audit/phases/finding-output.md +56 -0
- package/templates/skills/audit/phases/member-execution.md +83 -0
- package/templates/skills/audit/phases/member-selection.md +44 -0
- package/templates/skills/audit/phases/structural-checks.md +54 -0
- package/templates/skills/audit/phases/triage-history.md +45 -0
- package/templates/skills/cabinet-accessibility/SKILL.md +180 -0
- package/templates/skills/cabinet-anti-confirmation/SKILL.md +172 -0
- package/templates/skills/cabinet-architecture/SKILL.md +279 -0
- package/templates/skills/cabinet-boundary-man/SKILL.md +265 -0
- package/templates/skills/cabinet-cor-health/SKILL.md +342 -0
- package/templates/skills/cabinet-data-integrity/SKILL.md +157 -0
- package/templates/skills/cabinet-debugger/SKILL.md +221 -0
- package/templates/skills/cabinet-historian/SKILL.md +253 -0
- package/templates/skills/cabinet-organized-mind/SKILL.md +338 -0
- package/templates/skills/cabinet-process-therapist/SKILL.md +261 -0
- package/templates/skills/cabinet-qa/SKILL.md +205 -0
- package/templates/skills/cabinet-record-keeper/SKILL.md +168 -0
- package/templates/skills/cabinet-roster-check/SKILL.md +297 -0
- package/templates/skills/cabinet-security/SKILL.md +181 -0
- package/templates/skills/cabinet-small-screen/SKILL.md +154 -0
- package/templates/skills/cabinet-speed-freak/SKILL.md +169 -0
- package/templates/skills/cabinet-system-advocate/SKILL.md +194 -0
- package/templates/skills/cabinet-technical-debt/SKILL.md +115 -0
- package/templates/skills/cabinet-usability/SKILL.md +189 -0
- package/templates/skills/cabinet-workflow-cop/SKILL.md +238 -0
- package/templates/skills/cor-upgrade/SKILL.md +302 -0
- package/templates/skills/debrief/SKILL.md +409 -0
- package/templates/skills/debrief/phases/auto-maintenance.md +48 -0
- package/templates/skills/debrief/phases/close-work.md +88 -0
- package/templates/skills/debrief/phases/health-checks.md +54 -0
- package/templates/skills/debrief/phases/inventory.md +40 -0
- package/templates/skills/debrief/phases/loose-ends.md +52 -0
- package/templates/skills/debrief/phases/record-lessons.md +67 -0
- package/templates/skills/debrief/phases/report.md +59 -0
- package/templates/skills/debrief/phases/update-state.md +48 -0
- package/templates/skills/debrief/phases/upstream-feedback.md +129 -0
- package/templates/skills/debrief-quick/SKILL.md +12 -0
- package/templates/skills/execute/SKILL.md +293 -0
- package/templates/skills/execute/phases/cabinet.md +49 -0
- package/templates/skills/execute/phases/commit-and-deploy.md +66 -0
- package/templates/skills/execute/phases/load-plan.md +49 -0
- package/templates/skills/execute/phases/validators.md +50 -0
- package/templates/skills/execute/phases/verification-tools.md +67 -0
- package/templates/skills/extract/SKILL.md +168 -0
- package/templates/skills/investigate/SKILL.md +160 -0
- package/templates/skills/link/SKILL.md +52 -0
- package/templates/skills/menu/SKILL.md +61 -0
- package/templates/skills/onboard/SKILL.md +356 -0
- package/templates/skills/onboard/phases/detect-state.md +79 -0
- package/templates/skills/onboard/phases/generate-briefing.md +127 -0
- package/templates/skills/onboard/phases/generate-session-loop.md +87 -0
- package/templates/skills/onboard/phases/interview.md +233 -0
- package/templates/skills/onboard/phases/modularity-menu.md +162 -0
- package/templates/skills/onboard/phases/options.md +98 -0
- package/templates/skills/onboard/phases/post-onboard-audit.md +121 -0
- package/templates/skills/onboard/phases/summary.md +122 -0
- package/templates/skills/onboard/phases/work-tracking.md +231 -0
- package/templates/skills/orient/SKILL.md +251 -0
- package/templates/skills/orient/phases/auto-maintenance.md +48 -0
- package/templates/skills/orient/phases/briefing.md +53 -0
- package/templates/skills/orient/phases/cabinet.md +46 -0
- package/templates/skills/orient/phases/context.md +63 -0
- package/templates/skills/orient/phases/data-sync.md +35 -0
- package/templates/skills/orient/phases/health-checks.md +50 -0
- package/templates/skills/orient/phases/work-scan.md +69 -0
- package/templates/skills/orient-quick/SKILL.md +12 -0
- package/templates/skills/plan/SKILL.md +358 -0
- package/templates/skills/plan/phases/cabinet-critique.md +47 -0
- package/templates/skills/plan/phases/calibration-examples.md +75 -0
- package/templates/skills/plan/phases/completeness-check.md +44 -0
- package/templates/skills/plan/phases/composition-check.md +36 -0
- package/templates/skills/plan/phases/overlap-check.md +62 -0
- package/templates/skills/plan/phases/plan-template.md +69 -0
- package/templates/skills/plan/phases/present.md +60 -0
- package/templates/skills/plan/phases/research.md +43 -0
- package/templates/skills/plan/phases/work-tracker.md +95 -0
- package/templates/skills/publish/SKILL.md +74 -0
- package/templates/skills/pulse/SKILL.md +242 -0
- package/templates/skills/pulse/phases/auto-fix-scope.md +40 -0
- package/templates/skills/pulse/phases/checks.md +58 -0
- package/templates/skills/pulse/phases/output.md +54 -0
- package/templates/skills/seed/SKILL.md +257 -0
- package/templates/skills/seed/phases/build-member.md +93 -0
- package/templates/skills/seed/phases/evaluate-existing.md +61 -0
- package/templates/skills/seed/phases/maintain.md +92 -0
- package/templates/skills/seed/phases/scan-signals.md +86 -0
- package/templates/skills/triage-audit/SKILL.md +251 -0
- package/templates/skills/triage-audit/phases/apply-verdicts.md +90 -0
- package/templates/skills/triage-audit/phases/load-findings.md +38 -0
- package/templates/skills/triage-audit/phases/triage-ui.md +66 -0
- package/templates/skills/unlink/SKILL.md +35 -0
- package/templates/skills/validate/SKILL.md +116 -0
- package/templates/skills/validate/phases/validators.md +53 -0
|
@@ -0,0 +1,148 @@
|
|
|
1
|
+
# Briefing File System — Guide
|
|
2
|
+
|
|
3
|
+
The cabinet system uses **split briefing files** instead of a single
|
|
4
|
+
monolithic `_briefing.md`. Each file focuses on one domain of project
|
|
5
|
+
knowledge, and cabinet members declare which files they need in their
|
|
6
|
+
frontmatter. This keeps briefing loading focused — a cabinet member that
|
|
7
|
+
only needs identity and paths doesn't load API configuration or work
|
|
8
|
+
tracking details.
|
|
9
|
+
|
|
10
|
+
## Architecture
|
|
11
|
+
|
|
12
|
+
A **hub file** (`_briefing.md`) indexes the focused briefing files that
|
|
13
|
+
exist for this project. Cabinet members read the specific files they need
|
|
14
|
+
rather than parsing one large document.
|
|
15
|
+
|
|
16
|
+
```
_briefing.md                  ← Hub/index (always exists)
_briefing-identity.md         ← What the project is (always exists)
_briefing-architecture.md     ← System structure, codebase layout
_briefing-scopes.md           ← Where to look (paths)
_briefing-cabinet.md          ← Active cabinet members, portfolio rules
_briefing-work-tracking.md    ← Work item storage and interfaces
_briefing-api.md              ← API config, entity types
_briefing-{domain}.md         ← Domain extensions (see below)
```
|
|
26
|
+
|
|
27
|
+
## File Descriptions
|
|
28
|
+
|
|
29
|
+
### `_briefing.md` — Hub/Index
|
|
30
|
+
**Always created.** Lists which briefing files exist and a one-line
|
|
31
|
+
summary of each. This is what cabinet members fall back to if they don't
|
|
32
|
+
declare specific briefing needs.
|
|
33
|
+
|
|
34
|
+
### `_briefing-identity.md` — Project Identity
|
|
35
|
+
**Always created.** What the project is, core principles, user context.
|
|
36
|
+
Every cabinet member needs this — it calibrates all findings. Template:
|
|
37
|
+
`_briefing-identity-template.md`.
|
|
38
|
+
|
|
39
|
+
### `_briefing-architecture.md` — Architecture
|
|
40
|
+
System structure, codebase layout, technology stack. Needed by
|
|
41
|
+
cabinet members that evaluate code structure or need to understand where
|
|
42
|
+
things live. Template: `_briefing-architecture-template.md`.
|
|
43
|
+
|
|
44
|
+
### `_briefing-scopes.md` — Paths
|
|
45
|
+
Where to look for different kinds of code and configuration. Sections
|
|
46
|
+
are referenced by name (e.g., "App Source", "Data Store"). Only fill in
|
|
47
|
+
sections relevant to the cabinet members you adopt. Template:
|
|
48
|
+
`_briefing-scopes-template.md`.
|
|
49
|
+
|
|
50
|
+
### `_briefing-cabinet.md` — Cabinet
|
|
51
|
+
Which cabinet members are active, portfolio rules, invocation patterns.
|
|
52
|
+
Needed by meta cabinet members that evaluate the cabinet system itself.
|
|
53
|
+
Template: `_briefing-cabinet-template.md`.
|
|
54
|
+
|
|
55
|
+
### `_briefing-work-tracking.md` — Work Tracking
|
|
56
|
+
How the project tracks planned work — storage, query interface,
|
|
57
|
+
mutation interface. Referenced by /plan, /execute, /orient, /debrief.
|
|
58
|
+
Template: `_briefing-work-tracking-template.md`.
|
|
59
|
+
|
|
60
|
+
### `_briefing-api.md` — API Configuration
|
|
61
|
+
Endpoints, auth, entity types. Only create this if the project has an
|
|
62
|
+
API. Template: `_briefing-api-template.md`.
|
|
63
|
+
|
|
64
|
+
## Cabinet Member-to-Briefing Mapping
|
|
65
|
+
|
|
66
|
+
Which cabinet members need which briefing files (identity is always loaded):
|
|
67
|
+
|
|
68
|
+
| Cabinet Member        | architecture | scopes | cabinet | work-tracking | api |
|-----------------------|:---:|:---:|:---:|:---:|:---:|
| accessibility         |     |  x  |     |     |     |
| anti-confirmation     |     |     |     |     |     |
| architecture          |  x  |  x  |     |     |     |
| boundary-man          |  x  |  x  |     |     |     |
| cor-health            |     |  x  |  x  |     |     |
| data-integrity        |  x  |  x  |     |     |  x  |
| debugger              |  x  |     |     |     |     |
| record-keeper         |     |  x  |     |     |     |
| historian             |  x  |     |     |     |     |
| process-therapist     |     |  x  |  x  |     |     |
| small-screen          |     |  x  |     |     |     |
| organized-mind        |  x  |     |     |     |     |
| speed-freak           |  x  |  x  |     |     |     |
| workflow-cop          |     |  x  |     |     |     |
| qa                    |  x  |  x  |     |     |     |
| security              |  x  |  x  |     |     |  x  |
| roster-check          |     |     |  x  |     |     |
| system-advocate       |     |     |  x  |     |     |
| technical-debt        |  x  |     |     |     |     |
| usability             |     |  x  |     |     |     |
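
One way to read the table is as a lookup from member name to briefing files. The sketch below encodes a handful of rows as a Python dictionary and expands a member into the filenames it would load. The names `BRIEFING_NEEDS` and `briefing_files_for` are hypothetical and exist only for illustration; in the real system each cabinet member declares these needs in its own frontmatter.

```python
# Hypothetical encoding of a few rows from the mapping table.
# In the real system these needs live in each member's frontmatter.
BRIEFING_NEEDS = {
    "accessibility": ["scopes"],
    "anti-confirmation": [],
    "cor-health": ["scopes", "cabinet"],
    "data-integrity": ["architecture", "scopes", "api"],
    "security": ["architecture", "scopes", "api"],
}


def briefing_files_for(member: str) -> list[str]:
    """Return the briefing filenames a member loads; identity always comes first."""
    domains = ["identity"] + BRIEFING_NEEDS[member]
    return [f"_briefing-{domain}.md" for domain in domains]


print(briefing_files_for("data-integrity"))
```

A member with an empty row, such as anti-confirmation, still receives the identity file because identity is always loaded.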
|
|
90
|
+
|
|
91
|
+
## Domain Extension Files
|
|
92
|
+
|
|
93
|
+
Specialized cabinet members may need domain-specific briefing that doesn't
|
|
94
|
+
fit the standard files. These are created by `/seed` when a specialized
|
|
95
|
+
cabinet member is adopted:
|
|
96
|
+
|
|
97
|
+
- **`_briefing-methodology.md`** — For methodology-compliance or GTD
|
|
98
|
+
cabinet members. Contains methodology rules, review cadences, horizon
|
|
99
|
+
definitions.
|
|
100
|
+
- **`_briefing-design-system.md`** — For framework-quality or
|
|
101
|
+
information-design cabinet members. Contains design tokens, component
|
|
102
|
+
conventions, layout patterns.
|
|
103
|
+
- **Any `_briefing-{domain}.md`** — A cabinet member can declare any briefing
|
|
104
|
+
file it needs in its frontmatter. If the file doesn't exist, the
|
|
105
|
+
cabinet member falls back to the hub.
|
|
106
|
+
|
|
107
|
+
## Files Are Optional
|
|
108
|
+
|
|
109
|
+
Only create briefing files relevant to your project. A CLI tool with no
UI doesn't need the "App Source" section of `_briefing-scopes.md`. A
project without an API skips `_briefing-api.md` entirely. The hub
`_briefing.md` lists what exists so cabinet members know what's available.
|
|
113
|
+
|
|
114
|
+
## How These Files Get Created
|
|
115
|
+
|
|
116
|
+
- **/onboard** generates the initial set from interview answers. It
|
|
117
|
+
always creates the hub and identity file. Other files are created
|
|
118
|
+
only if the interview produced content for them.
|
|
119
|
+
- **/seed** adds domain extension files when specialized cabinet members
|
|
120
|
+
are adopted.
|
|
121
|
+
- **/cor-upgrade** can migrate a monolithic `_briefing.md` into the
|
|
122
|
+
split format.
|
|
123
|
+
|
|
124
|
+
## Backward Compatibility
|
|
125
|
+
|
|
126
|
+
The old monolithic `_briefing.md` format still works. If a cabinet member
|
|
127
|
+
declares briefing files in its frontmatter but those files don't exist,
|
|
128
|
+
or if no `briefing` field is present, the system falls back to reading
|
|
129
|
+
`_briefing.md` directly. This means:
|
|
130
|
+
|
|
131
|
+
- Existing projects with a monolithic `_briefing.md` continue to work
|
|
132
|
+
without changes.
|
|
133
|
+
- Projects can migrate incrementally — split out one file at a time.
|
|
134
|
+
- `/cor-upgrade` handles the full migration when the project is ready.
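
The fallback rule can be sketched as a small resolver. This is an illustration only, not the package's implementation: the function name `resolve_briefing` is hypothetical, and the handling of a partially migrated project (some declared files present, some missing) is an assumption, since the text above only says the system falls back to `_briefing.md`.

```python
from pathlib import Path

HUB = "_briefing.md"


def resolve_briefing(declared, briefing_dir):
    """Pick the briefing files to read for one cabinet member.

    declared: filenames from the member's `briefing` frontmatter field,
              or None when the field is absent.
    """
    root = Path(briefing_dir)
    if not declared:
        # No `briefing` field: read the hub directly.
        return [root / HUB]
    found = [root / name for name in declared if (root / name).is_file()]
    if len(found) < len(declared):
        # Assumption: any missing declared file triggers the hub fallback,
        # while split files that do exist are still read.
        found.append(root / HUB)
    return found
```

Under this reading, a project that has split out only one file keeps working: the member reads the split file it can find and the hub for everything else.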
|
|
135
|
+
|
|
136
|
+
## Finding Format
|
|
137
|
+
|
|
138
|
+
When producing audit findings, use this structure:
|
|
139
|
+
|
|
140
|
+
```yaml
finding:
  cabinet-member: member-name
  severity: critical | significant | minor | informational
  category: what domain this falls under
  description: what was found
  evidence: specific file:line or observation
  recommendation: what to do about it
```
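
A finding in this shape can be checked mechanically before it enters triage. The sketch below is a minimal, hypothetical validator over the six fields and four severity values listed above, applied to the mapping under the `finding:` key. It does not reproduce the packaged `finding-schema.json`, which remains the authoritative definition.

```python
SEVERITIES = {"critical", "significant", "minor", "informational"}
REQUIRED_FIELDS = (
    "cabinet-member", "severity", "category",
    "description", "evidence", "recommendation",
)


def validate_finding(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means the finding is well-formed."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not finding.get(name)
    ]
    severity = finding.get("severity")
    if severity and severity not in SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a reviewer see every defect in a finding at once.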
|
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
# Work Tracking — [Your Project Name]
|
|
2
|
+
|
|
3
|
+
How this project tracks planned work. Skills that manage work items
|
|
4
|
+
(/plan, /execute, /orient, /debrief) reference this file.
|
|
5
|
+
|
|
6
|
+
## Work Item Storage
|
|
7
|
+
*Where work items live.*
|
|
8
|
+
*Example: SQLite `tasks` table, or `backlog.md`, or GitHub Issues*
|
|
9
|
+
|
|
10
|
+
## Query Interface
|
|
11
|
+
*How to search open items.*
|
|
12
|
+
*Example: `sqlite3 project.db "SELECT * FROM tasks WHERE status != 'done'"`*
|
|
13
|
+
*Example: `gh issue list --state open --json number,title`*
|
|
14
|
+
|
|
15
|
+
## Mutation Interface
|
|
16
|
+
*How to create, update, and close items.*
|
|
17
|
+
*Example: `POST /api/tasks` with JSON body*
|
|
18
|
+
*Example: `gh issue create --title "..." --body "..."`*
|
|
@@ -0,0 +1,49 @@
|
|
|
1
|
+
# Cabinet member committees — canonical mapping of committee slugs to cabinet member lists.
|
|
2
|
+
#
|
|
3
|
+
# Copy this file as committees.yaml and uncomment/customize the committees that
|
|
4
|
+
# match your project. Technology choices imply expertise needs: if your
|
|
5
|
+
# project has a UI, you need a UI committee. If it has an API, you need
|
|
6
|
+
# a system health committee.
|
|
7
|
+
#
|
|
8
|
+
# Consumed by:
|
|
9
|
+
# - /audit skill (committee selection menu)
|
|
10
|
+
# - scripts (--committee flag, resolve-committees helpers)
|
|
11
|
+
#
|
|
12
|
+
# Cross-portfolio cabinet members are NOT in any committee. They activate via
|
|
13
|
+
# standing-mandate in their SKILL.md frontmatter:
|
|
14
|
+
# anti-confirmation, qa, debugger, organized-mind
|
|
15
|
+
|
|
16
|
+
committees:
|
|
17
|
+
# -- If your project has a user interface --
|
|
18
|
+
# ux:
|
|
19
|
+
# name: "UX & Design"
|
|
20
|
+
# members:
|
|
21
|
+
# - accessibility
|
|
22
|
+
# - small-screen
|
|
23
|
+
# # Add framework-specific cabinet members (e.g., mantine-quality, tailwind-quality)
|
|
24
|
+
|
|
25
|
+
# -- If your project has code (most projects) --
|
|
26
|
+
# code:
|
|
27
|
+
# name: "Code Quality"
|
|
28
|
+
# members:
|
|
29
|
+
# - technical-debt
|
|
30
|
+
# - boundary-man
|
|
31
|
+
# # Add: architecture (if multi-layer system)
|
|
32
|
+
|
|
33
|
+
# -- If your project has an API, database, or infrastructure --
|
|
34
|
+
# health:
|
|
35
|
+
# name: "System Health"
|
|
36
|
+
# members:
|
|
37
|
+
# - security
|
|
38
|
+
# - data-integrity
|
|
39
|
+
# - speed-freak
|
|
40
|
+
# # Add: sync-health (if remote sync), deployment (if CI/CD)
|
|
41
|
+
|
|
42
|
+
# -- If your project has established process/methodology --
|
|
43
|
+
# process:
|
|
44
|
+
# name: "Process & Meta"
|
|
45
|
+
# members:
|
|
46
|
+
# - workflow-cop
|
|
47
|
+
# - record-keeper
|
|
48
|
+
# - process-therapist
|
|
49
|
+
# - cor-health # CoR adoption health, configuration drift, anti-bloat
|
|
@@ -0,0 +1,240 @@
|
|
|
1
|
+
# Cabinet Member Composition Patterns
|
|
2
|
+
|
|
3
|
+
Shared reference for how cabinet members combine during skill execution.
|
|
4
|
+
Adapted from cc-thinking-skills' 5 combination patterns. This isn't
|
|
5
|
+
theory — every pattern should have at least one working example in
|
|
6
|
+
your system before you consider it proven.
|
|
7
|
+
|
|
8
|
+
## The Five Patterns
|
|
9
|
+
|
|
10
|
+
### 1. Parallel
|
|
11
|
+
|
|
12
|
+
**When:** Independent evaluations that should not influence each other.
|
|
13
|
+
Each cabinet member gets a clean context window with the same input data.
|
|
14
|
+
Results are collected and synthesized by the consuming skill.
|
|
15
|
+
|
|
16
|
+
**How it works:** The orchestrating skill (e.g., audit or plan) spawns
|
|
17
|
+
one agent per cabinet member in a single message. Each cabinet member analyzes
|
|
18
|
+
independently. Findings are merged. No cabinet member sees another's output.
|
|
19
|
+
|
|
20
|
+
**Risk:** Contradictory findings. Two cabinet members may flag the same
|
|
21
|
+
area with opposite recommendations (e.g., architecture says "split
|
|
22
|
+
this component" while usability says "keep it unified for simplicity").
|
|
23
|
+
|
|
24
|
+
**Mitigation:** The consuming skill synthesizes contradictions and
|
|
25
|
+
presents them to the user as a tension to resolve, not a bug. The
|
|
26
|
+
synthesis should name both cabinet members and their reasoning.
|
|
27
|
+
|
|
28
|
+
**Implementation:** Use the Agent tool with multiple agents in a
|
|
29
|
+
single message. Each agent receives: shared briefing (`_briefing.md`) +
|
|
30
|
+
cabinet member SKILL.md + output contract + input data.
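
The fan-out and merge can be pictured with a short `asyncio` sketch. It is an analogy only: `run_member` is a hypothetical stand-in for spawning an agent, and the real mechanism is the Agent tool described above, not Python coroutines.

```python
import asyncio


async def run_member(member: str, shared_input: str) -> dict:
    """Hypothetical stand-in for spawning one agent with a clean context."""
    await asyncio.sleep(0)  # placeholder for the real agent call
    return {
        "cabinet-member": member,
        "description": f"{member} reviewed {shared_input}",
    }


async def run_parallel(members: list[str], shared_input: str) -> list[dict]:
    # Every member receives the same input and none sees another's output.
    tasks = [run_member(member, shared_input) for member in members]
    return list(await asyncio.gather(*tasks))


findings = asyncio.run(run_parallel(["architecture", "usability"], "plan.md"))
print([finding["cabinet-member"] for finding in findings])
```

Because the members share no state, contradictions can only be detected afterwards, which is why synthesis belongs to the consuming skill.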
|
|
31
|
+
|
|
32
|
+
---
|
|
33
|
+
|
|
34
|
+
### 2. Sequential
|
|
35
|
+
|
|
36
|
+
**When:** Ordered evaluation where later steps depend on earlier results.
|
|
37
|
+
One cabinet member's output becomes input (or gating condition) for the next.
|
|
38
|
+
|
|
39
|
+
**How it works:** The orchestrating skill runs cabinet members in order.
|
|
40
|
+
If the first returns "block," execution stops. Otherwise, its output
|
|
41
|
+
feeds into the next cabinet member's prompt as context.
|
|
42
|
+
|
|
43
|
+
**Example:** Execution checkpoints. Pre-implementation review runs first.
If every cabinet member says to continue, implementation proceeds.
Per-file-group review runs during implementation. A pre-commit sweep runs
after all implementation is complete, re-checking earlier concerns in
aggregate.
|
|
47
|
+
|
|
48
|
+
**Example:** System diagnosis — debugger maps the dependency chain first,
|
|
49
|
+
then technical-debt evaluates the code quality of the dependencies the
|
|
50
|
+
debugger identified, then historian checks whether any of these
|
|
51
|
+
dependencies have caused issues before.
|
|
52
|
+
|
|
53
|
+
**Risk:** Anchoring. The first cabinet member's framing can bias later
|
|
54
|
+
cabinet members. If the debugger says "the problem is in the database
|
|
55
|
+
layer," technical-debt may focus exclusively on database code and
|
|
56
|
+
miss the real issue in the API layer.
|
|
57
|
+
|
|
58
|
+
**Mitigation:** Later cabinet members should receive the prior output as
|
|
59
|
+
context but with an explicit instruction: "The previous analysis found
|
|
60
|
+
X. You may agree, disagree, or identify issues the prior analysis missed.
|
|
61
|
+
Do not limit your scope to what was already identified."
|
|
62
|
+
|
|
63
|
+
**Implementation:** Sequential Agent calls — launch the first, wait for
|
|
64
|
+
result, include result in the second agent's prompt, etc.
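
The gating and hand-off can be sketched as a loop. This is a hedged illustration: `run_member` is a caller-supplied, hypothetical function, the `verdict` and `output` keys are assumed names, and stopping on a block from any member (not only the first) is a generalization of the rule stated above.

```python
ANTI_ANCHORING = (
    "The previous analysis found: {prior}. You may agree, disagree, or identify "
    "issues the prior analysis missed. Do not limit your scope to what was "
    "already identified."
)


def run_sequential(members, task, run_member):
    """Run members in order, stopping as soon as one returns a "block" verdict.

    run_member(member, prompt) is a caller-supplied function (hypothetical here)
    that returns a dict with "verdict" and "output" keys.
    """
    results = []
    prompt = task
    for member in members:
        result = run_member(member, prompt)
        results.append((member, result))
        if result["verdict"] == "block":
            break
        # Feed the output forward with the explicit anti-anchoring instruction.
        prompt = task + "\n\n" + ANTI_ANCHORING.format(prior=result["output"])
    return results
```

Each later member sees the original task plus only the immediately preceding output, which keeps the prompt from growing with every step.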
|
|
65
|
+
|
|
66
|
+
---
|
|
67
|
+
|
|
68
|
+
### 3. Adversarial
|
|
69
|
+
|
|
70
|
+
**When:** High-stakes decisions where confirmation bias is likely. One
|
|
71
|
+
cabinet member is explicitly tasked with challenging another's conclusions.
|
|
72
|
+
|
|
73
|
+
**How it works:** Anti-confirmation (or a similar meta-cognitive
|
|
74
|
+
cabinet member) activates alongside domain cabinet members. It challenges the
|
|
75
|
+
reasoning quality of the plan itself AND of the other cabinet members'
|
|
76
|
+
critiques — asking what would make this wrong, what alternatives were
|
|
77
|
+
dismissed too quickly, where consensus formed before dissent was heard.
|
|
78
|
+
|
|
79
|
+
**When to use vs. not use:**
|
|
80
|
+
- USE when: redesigning a core system, choosing between approaches,
|
|
81
|
+
making an irreversible architectural decision, deferring significant
|
|
82
|
+
work (is the deferral justified or avoidant?)
|
|
83
|
+
- SKIP when: routine implementation, bug fixes, documentation updates,
|
|
84
|
+
trivial configuration changes
|
|
85
|
+
|
|
86
|
+
**Risk:** Slowness. Adversarial composition takes longer and can feel
|
|
87
|
+
obstructive when the decision is actually straightforward.
|
|
88
|
+
|
|
89
|
+
**Mitigation:** Topic-based activation (anti-confirmation only fires on
|
|
90
|
+
high-stakes topics). The adversarial cabinet member should have a hard
|
|
91
|
+
boundary: it challenges reasoning quality, not domain conclusions.
|
|
92
|
+
|
|
93
|
+
**Implementation:** Include the adversarial cabinet member in the parallel
|
|
94
|
+
agent batch alongside domain cabinet members. Its prompt explicitly states:
|
|
95
|
+
"Your job is to challenge the reasoning, not the domain conclusions.
|
|
96
|
+
Focus on: premature consensus, dismissed alternatives, unstated
|
|
97
|
+
assumptions, confirmation bias in the plan or in other critiques."
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
101
|
+
### 4. Nested
|
|
102
|
+
|
|
103
|
+
**When:** A cabinet member needs another cabinet member's analysis as input
|
|
104
|
+
to do its own work. One cabinet member consults another mid-evaluation.
|
|
105
|
+
|
|
106
|
+
**How it works:** A cabinet member running in the main context (session-modal,
|
|
107
|
+
needs full conversation history) references another cabinet member's known
|
|
108
|
+
findings — from memory, from audit history, or from prior session output.
|
|
109
|
+
|
|
110
|
+
**Example:** During debrief, a historian cabinet member is activated to check:
|
|
111
|
+
"Has this kind of change been done before? What happened? Are there
|
|
112
|
+
lessons from prior sessions relevant to what was just completed?"
|
|
113
|
+
|
|
114
|
+
**Example:** During planning, the organized-mind cabinet member might need
|
|
115
|
+
the historian's input: "Has this kind of information architecture been
|
|
116
|
+
tried before? What was the user's reaction?"
|
|
117
|
+
|
|
118
|
+
**Risk:** Deep nesting creates long dependency chains that are slow and
|
|
119
|
+
fragile. A three-level nest (A calls B calls C) means C's context
|
|
120
|
+
must include A's original input plus B's intermediate output — context
|
|
121
|
+
window pressure.
|
|
122
|
+
|
|
123
|
+
**Mitigation:** Limit to one level of nesting. If cabinet member A needs B's
|
|
124
|
+
output, A can reference B's known findings (from memory, from audit
|
|
125
|
+
history). It should NOT spawn B as a sub-agent. The consuming skill is
|
|
126
|
+
responsible for orchestrating multi-cabinet-member flows, not individual
|
|
127
|
+
cabinet members.
|
|
128
|
+
|
|
129
|
+
**Implementation:** The nested cabinet member runs in the main context
|
|
130
|
+
(session-modal) rather than as a parallel agent. This is the exception,
|
|
131
|
+
not the rule — most cabinet members run in clean parallel contexts.
|
|
132
|
+
|
|
133
|
+
---
|
|
134
|
+
|
|
135
|
+
### 5. Temporal
|
|
136
|
+
|
|
137
|
+
**When:** The same domain needs different evaluation at different lifecycle
|
|
138
|
+
stages. A cabinet member that applies during planning applies differently
|
|
139
|
+
during execution and differently again during audit.
|
|
140
|
+
|
|
141
|
+
**How it works:** Same cabinet member, different output contracts at different
|
|
142
|
+
lifecycle stages. The orchestrating skill passes the appropriate contract.
|
|
143
|
+
|
|
144
|
+
**Example:** QA cabinet member across the lifecycle:
|
|
145
|
+
- During planning: evaluates acceptance criteria quality (are they testable?
|
|
146
|
+
do they have [auto]/[manual]/[deferred] tags? are edge cases covered?)
|
|
147
|
+
- During execution: active testing (runs [auto] criteria, verifies
|
|
148
|
+
[manual] criteria via preview tools, documents [deferred])
|
|
149
|
+
- During debrief: produces QA report summarizing what was verified,
|
|
150
|
+
what failed, what's still unverified
|
|
151
|
+
|
|
152
|
+
Same cabinet member, three different output contracts, three different
|
|
153
|
+
points in the lifecycle.
|
|
154
|
+
|
|
155
|
+
**Example:** Security cabinet member:
|
|
156
|
+
- During planning: evaluates whether the plan introduces attack surface
|
|
157
|
+
(new endpoints, auth changes, input handling)
|
|
158
|
+
- During execution: reviews the actual code for OWASP vulnerabilities
|
|
159
|
+
before implementation proceeds
|
|
160
|
+
- During audit: scans deployed code for security issues
|
|
161
|
+
|
|
162
|
+
**Risk:** Criteria drift between stages. If the QA cabinet member defines
|
|
163
|
+
"testable AC" differently during planning than execution expects, the
|
|
164
|
+
execute phase will struggle with criteria that looked good during
|
|
165
|
+
planning but are actually unverifiable.
|
|
166
|
+
|
|
167
|
+
**Mitigation:** Output contracts define what each cabinet member produces
|
|
168
|
+
at each stage. The contracts are explicit — a cabinet member reading its
|
|
169
|
+
contract knows exactly what's expected.
|
|
170
|
+
|
|
171
|
+
**Implementation:** Same cabinet member SKILL.md, different output contract
|
|
172
|
+
per consuming skill. The consuming skill passes the appropriate contract
|
|
173
|
+
to the agent prompt.
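
A minimal sketch of the idea, assuming contracts are plain text keyed by lifecycle stage. The contract wording below is paraphrased from the QA example, and `QA_CONTRACTS` and `build_prompt` are hypothetical names.

```python
# Hypothetical contract texts, paraphrased from the QA example.
QA_CONTRACTS = {
    "plan": "Evaluate acceptance criteria quality: testable, tagged, edge cases covered.",
    "execute": "Run [auto] criteria, verify [manual] criteria, document [deferred].",
    "debrief": "Report what was verified, what failed, and what is still unverified.",
}


def build_prompt(skill_md: str, stage: str, contracts: dict[str, str]) -> str:
    """Combine one unchanging SKILL.md with the contract for the current stage."""
    if stage not in contracts:
        raise ValueError(f"no output contract defined for stage: {stage}")
    return f"{skill_md}\n\n## Output contract ({stage})\n{contracts[stage]}"


print(build_prompt("QA cabinet member definition", "execute", QA_CONTRACTS))
```

Failing loudly on a stage without a contract is a deliberate choice here: a silent default would reintroduce the criteria drift the contracts are meant to prevent.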
|
|
174
|
+
|
|
175
|
+
---
|
|
176
|
+
|
|
177
|
+
## Pre-Built Recipes
|
|
178
|
+
|
|
179
|
+
Recipes are named combinations for common situations. The consuming
|
|
180
|
+
skill selects a recipe based on context, then activates the listed
|
|
181
|
+
cabinet members using the appropriate pattern.
|
|
182
|
+
|
|
183
|
+
### Committee Audit
|
|
184
|
+
|
|
185
|
+
**When:** Scoped audit of a specific domain (UX, code quality, etc.).
|
|
186
|
+
**Pattern:** Parallel
|
|
187
|
+
**Cabinet Members:** All cabinet members in the selected committee(s) from your
|
|
188
|
+
project's committee configuration.
|
|
189
|
+
**Why this combination:** Committees are pre-curated sets of related
|
|
190
|
+
cabinet members. Running a committee audit gives thorough coverage of one
|
|
191
|
+
domain without the cost/time of a full audit.
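
Selecting the members for a committee audit amounts to expanding committee slugs into a member list. The sketch below assumes a configuration shaped as a mapping from slug to `name` and `members`; the `COMMITTEES` data and the `members_for` function are illustrative and are not the package's own resolver.

```python
# Illustrative configuration with two committees enabled.
COMMITTEES = {
    "ux": {"name": "UX & Design", "members": ["accessibility", "small-screen"]},
    "health": {
        "name": "System Health",
        "members": ["security", "data-integrity", "speed-freak"],
    },
}


def members_for(slugs: list[str], committees: dict) -> list[str]:
    """Expand committee slugs into a de-duplicated, order-preserving member list."""
    members: list[str] = []
    for slug in slugs:
        for member in committees[slug]["members"]:
            if member not in members:
                members.append(member)
    return members


print(members_for(["ux", "health"], COMMITTEES))
```

De-duplication matters when two selected committees share a member, so that the parallel batch spawns each member only once.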
|
|
192
|
+
|
|
193
|
+
### High-Stakes Decision
|
|
194
|
+
|
|
195
|
+
**When:** Architectural redesign, technology choice, significant deferral.
|
|
196
|
+
**Pattern:** Parallel + Adversarial
|
|
197
|
+
**Cabinet Members:** anti-confirmation + architecture + historian + goal-alignment
|
|
198
|
+
**Why this combination:** Architecture evaluates technical fitness.
|
|
199
|
+
Goal-alignment checks strategic fit. Historian surfaces past precedent.
|
|
200
|
+
Anti-confirmation stress-tests the reasoning behind all three.
|
|
201
|
+
|
|
202
|
+
### New Feature Planning
|
|
203
|
+
|
|
204
|
+
**When:** Adding user-visible functionality with UI + API changes.
|
|
205
|
+
**Pattern:** Parallel (with design committee for UI)
|
|
206
|
+
**Cabinet Members:** security + architecture + organized-mind + qa +
|
|
207
|
+
any domain-specific UI cabinet members
|
|
208
|
+
**Why this combination:** Security catches attack surface. Architecture
|
|
209
|
+
evaluates system fit. Organized-mind checks cognitive load. QA evaluates
|
|
210
|
+
AC quality. UI cabinet members critique the interaction model.
|
|
211
|
+
|
|
212
|
+
### System Diagnosis
|
|
213
|
+
|
|
214
|
+
**When:** Something is broken or degrading and the root cause is unclear.
|
|
215
|
+
**Pattern:** Sequential (debugger first, then technical-debt, then historian)
|
|
216
|
+
**Cabinet Members:** debugger → technical-debt → historian
|
|
217
|
+
**Why this combination:** Debugger maps the dependency chain and identifies
|
|
218
|
+
the failure point. Technical-debt evaluates whether the failure point
|
|
219
|
+
is symptomatic of deeper code quality issues. Historian checks whether
|
|
220
|
+
this failure pattern has occurred before and what fixed it last time.
|
|
221
|
+
|
|
222
|
+
### Prompt Refinement
|
|
223
|
+
|
|
224
|
+
**When:** Improving skill definitions or cabinet member definitions.
|
|
225
|
+
**Pattern:** Parallel
|
|
226
|
+
**Cabinet Members:** roster-check + process-therapist + organized-mind
|
|
227
|
+
**Why this combination:** Roster-check evaluates whether the skill
|
|
228
|
+
covers its full scope. Process-therapist checks whether the skill follows
|
|
229
|
+
established patterns. Organized-mind evaluates whether the skill's
|
|
230
|
+
structure is cognitively navigable for a fresh session.
|
|
231
|
+
|
|
232
|
+
### Post-Execution Review
|
|
233
|
+
|
|
234
|
+
**When:** Debrief after completing implementation work.
|
|
235
|
+
**Pattern:** Nested (session-modal)
|
|
236
|
+
**Cabinet Members:** historian + any lifecycle-tracking cabinet members + qa
|
|
237
|
+
**Why this combination:** Historian records what was done and checks for
|
|
238
|
+
lessons learned. Lifecycle cabinet members capture relevant non-dev items.
|
|
239
|
+
QA produces the final verification report. All need session context,
|
|
240
|
+
so they run in the main context.
|
|
@@ -0,0 +1,208 @@
# Skill Effectiveness Assessment Protocol

Shared reference for evaluating whether skills and cabinet members are doing
their job. Adapted from Anthropic's skill-creator eval framework for manual
assessment. This is not an automated test suite — it's a structured way to
ask "is this skill working?" and get an evidence-based answer.

## When This Runs

Two trigger mechanisms ensure this protocol doesn't just sit as a document:

1. **Pull (via prompt refinement):** When reviewing a skill's definition,
   run the assessment before proposing changes. The assessment grounds the
   refinement in evidence rather than intuition.

2. **Push (via process-therapist):** During audits, process-therapist checks whether
   any skill's last assessment is older than 30 days. If so, it surfaces
   an "eval overdue" finding, which enters the normal triage flow. The
   user decides whether to act on it.
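
The 30-day push trigger can be expressed as a small date check. The sketch below is illustrative only: `isEvalOverdue` and its parameters are hypothetical names, not part of the package, and treating a never-assessed skill as overdue is an assumption.

```javascript
// Hypothetical helper, not a create-claude-cabinet API.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isEvalOverdue(lastAssessedISO, now = new Date(), maxAgeDays = 30) {
  // Assumption: a skill with no recorded assessment counts as overdue.
  if (!lastAssessedISO) return true;
  const last = new Date(lastAssessedISO);
  if (Number.isNaN(last.getTime())) {
    throw new Error(`Unparseable assessment date: ${lastAssessedISO}`);
  }
  const ageDays = (now.getTime() - last.getTime()) / MS_PER_DAY;
  // "Older than 30 days" is read as strictly greater than the limit.
  return ageDays > maxAgeDays;
}

console.log(isEvalOverdue("2026-03-22", new Date("2026-04-15"))); // false (24 days)
console.log(isEvalOverdue("2026-03-22", new Date("2026-05-01"))); // true (40 days)
```

An overdue result would only surface a finding; per the protocol, the user still decides whether to act on it.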

## The Assessment Framework

### 1. Define Assertions

An assertion is a testable claim about what a skill should produce.
Each assertion has three fields:

```
{ "text": "what is being tested",
  "passed": true/false,
  "evidence": "supporting data or reasoning" }
```

**Types of assertions:**

- **Behavioral:** Does the skill produce the right actions?
  - Example: "The planning skill produces AC with [auto]/[manual]/[deferred] tags"
  - Example: "The orient skill surfaces overdue items before suggesting focus"

- **Quality:** Is the output at the right level?
  - Example: "The inbox skill asks before routing ambiguous items"
  - Example: "QA cabinet member catches [auto] AC failures before commit"

- **Coverage:** Does the skill handle its full scope?
  - Example: "The execute skill runs all checkpoint types (pre, per-group, pre-commit)"
  - Example: "Roster-check detects drift between skill definition and documentation"

- **Boundary:** Does the skill stay in its portfolio?
  - Example: "Organized-mind flags cognitive load but does not suggest UI framework components"
  - Example: "Anti-confirmation challenges reasoning quality without developing domain arguments"

**How many assertions per skill:** 5-8 for core skills (planning,
execution, orientation, inbox processing). 3-5 for cabinet members. More
isn't better — each assertion should test something meaningfully different.
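
As a rough illustration of the three-field shape and the count guidance above, a checker might look like the following. The field names come from the schema in this section. The helper names `validateAssertion` and `checkAssertionCount`, and the `core`/`cabinet` keys, are assumptions made for the sketch.

```javascript
// Illustrative only: returns a list of problems, empty when the record is well formed.
function validateAssertion(assertion) {
  const problems = [];
  if (typeof assertion.text !== "string" || assertion.text.trim() === "") {
    problems.push("text must be a non-empty string");
  }
  if (typeof assertion.passed !== "boolean") {
    problems.push("passed must be true or false");
  }
  if (typeof assertion.evidence !== "string" || assertion.evidence.trim() === "") {
    problems.push("evidence must be a non-empty string");
  }
  return problems;
}

// Suggested counts from the guidance above: 5-8 for core skills, 3-5 for cabinet members.
const ASSERTION_RANGES = { core: [5, 8], cabinet: [3, 5] };

function checkAssertionCount(kind, count) {
  const [min, max] = ASSERTION_RANGES[kind];
  if (count < min) return `too few (${count} < ${min})`;
  if (count > max) return `too many (${count} > ${max})`;
  return "ok";
}

console.log(validateAssertion({ text: "Asks before routing", passed: true, evidence: "5/5 sessions" }));
console.log(checkAssertionCount("cabinet", 6));
```

Requiring non-empty `evidence` mirrors the protocol's point that a pass without evidence is an assumption, not a finding.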

### 2. Sample Past Executions

Use conversation history and memory to find evidence:

- Find recent sessions where a skill was invoked
- Find sessions where a cabinet member activated
- Find sessions where something went wrong

Also check:
- Memory files for feedback corrections (these are failed assertions)
- Git history for reverted changes (execution failures)
- Audit triage history for rejected findings (cabinet member miscalibration)

**Sample size:** 3-5 recent executions is sufficient for manual assessment.
If a skill hasn't been invoked 3 times in the last month, that itself is
a finding (coverage gap or trigger problem).
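
The sample-size rule can be sketched as a count over recent invocation dates. This is a hypothetical helper: `checkInvocationVolume` is an invented name, and "the last month" is approximated as a 30-day window, which is an assumption.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: flags skills invoked fewer than `minRuns` times in the window.
function checkInvocationVolume(invocationDates, now = new Date(), windowDays = 30, minRuns = 3) {
  const cutoff = now.getTime() - windowDays * DAY_MS;
  const recent = invocationDates
    .map((d) => new Date(d).getTime())
    // Unparseable dates become NaN and fail both comparisons, so they are dropped.
    .filter((t) => t >= cutoff && t <= now.getTime());
  if (recent.length < minRuns) {
    return { sufficient: false, recentRuns: recent.length, finding: "coverage gap or trigger problem" };
  }
  return { sufficient: true, recentRuns: recent.length, finding: null };
}

console.log(checkInvocationVolume(["2026-03-01", "2026-04-01", "2026-04-10"], new Date("2026-04-15")));
```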

### 3. Score Each Assertion

For each assertion, review the sampled executions and score:

| Score | Meaning |
|-------|---------|
| **pass** | Assertion holds in all sampled executions |
| **partial** | Assertion holds in some but not all executions |
| **fail** | Assertion does not hold in any sampled execution |
| **untestable** | Not enough data to evaluate (note why) |

Record the evidence for each score — a pass without evidence is
an assumption, not a finding.
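
Read literally, the scoring table maps a list of per-execution observations to one of four labels. A minimal sketch, assuming each sampled execution is reduced to a boolean and that `minSamples` (an invented parameter) defines "not enough data":

```javascript
// observations: one boolean per sampled execution (true = the assertion held).
function scoreAssertion(observations, minSamples = 1) {
  if (observations.length < minSamples) return "untestable";
  const held = observations.filter(Boolean).length;
  if (held === observations.length) return "pass"; // holds in all
  if (held === 0) return "fail"; // holds in none
  return "partial"; // holds in some but not all
}

console.log(scoreAssertion([true, true, true])); // pass
console.log(scoreAssertion([true, false, true])); // partial
console.log(scoreAssertion([false, false])); // fail
console.log(scoreAssertion([])); // untestable
```

The function returns only the label. The evidence behind each observation still has to be recorded alongside it.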

### 4. Aggregate and Interpret

```
## Assessment: /plan — 2026-03-22

Assertions: 6 total
- Pass: 4 (67%)
- Partial: 1 (17%)
- Fail: 1 (17%)
- Untestable: 0

Pass rate: 67% (4/6 testable)

### Findings
- PASS: Produces AC with [auto]/[manual]/[deferred] tags (5/5 sampled plans)
- PASS: Surface area includes all implementation files (4/5 — one missed shared entry point)
- PASS: Presents plan for user approval before creating action (5/5)
- PASS: Runs cabinet member critique before presenting (4/5 — skipped once for trivial plan)
- PARTIAL: Plans deliver complete features (3/5 — two plans had infrastructure-only steps)
- FAIL: Plan notes persist reasoning (1/5 — four plans had thin "why" sections)
```

**Interpretation guide:**

| Pass rate | Health | Action |
|-----------|--------|--------|
| 80-100% | Healthy | Monitor. Note partial assertions for refinement. |
| 60-79% | Degrading | Investigate failing assertions. Are they skill design issues or execution drift? Propose targeted refinements. |
| Below 60% | Unhealthy | The skill needs significant revision. Root-cause analysis before patching. |
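
The aggregation and the interpretation table can be sketched together. The pass rate below is computed over testable assertions only, matching the "(4/6 testable)" notation in the example. `summarize` is a hypothetical name, and rounding to a whole percent before applying the thresholds is an assumption, since the table does not say how to treat values between 79% and 80%.

```javascript
function summarize(scores) {
  const counts = { pass: 0, partial: 0, fail: 0, untestable: 0 };
  for (const s of scores) {
    if (!Object.prototype.hasOwnProperty.call(counts, s)) {
      throw new Error(`Unknown score: ${s}`);
    }
    counts[s] += 1;
  }
  // Untestable assertions are excluded from the denominator.
  const testable = scores.length - counts.untestable;
  const passRate = testable === 0 ? null : Math.round((counts.pass / testable) * 100);
  let health = "Unknown";
  if (passRate !== null) {
    health = passRate >= 80 ? "Healthy" : passRate >= 60 ? "Degrading" : "Unhealthy";
  }
  return { counts, testable, passRate, health };
}

// The /plan example: 4 pass, 1 partial, 1 fail.
console.log(summarize(["pass", "pass", "pass", "pass", "partial", "fail"]).health); // Degrading
```

Under this reading, partial scores count against the pass rate exactly as fails do. They differ only in how the follow-up action is framed.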

### 5. Track Over Time

Append each assessment to a tracking section at the bottom of
the skill's SKILL.md (or in a separate ASSESSMENTS.md if the skill
is approaching the 500-line limit):

```markdown
## Assessment Log

### 2026-03-22 — Pass rate: 67% (4/6)
Key finding: Plan notes don't persist reasoning. "Why" sections thin.
Action taken: Added emphasis in Step 2 template + calibration example.

### 2026-04-15 — Pass rate: 83% (5/6)
Improvement: Reasoning persistence now at 4/5. Remaining partial: feature
completeness — one plan delivered infrastructure step without wiring.
```

This creates a trend line. A declining pass rate across assessments
signals systemic drift. An improving rate confirms refinements are working.
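
If the log headings follow the sample format, the trend line could be read back out mechanically. The sketch below is an assumption-laden illustration: the heading regex is inferred from the sample log, `parseAssessmentLog` and `trend` are invented names, and comparing only the first and last entries is a deliberately crude definition of "trend".

```javascript
// Extract { date, passRate } from headings such as "### 2026-03-22 <dash> Pass rate: 67% (4/6)".
function parseAssessmentLog(markdown) {
  const heading = /^###\s+(\d{4}-\d{2}-\d{2})\s+\S+\s+Pass rate:\s+(\d+)%/;
  const entries = [];
  for (const line of markdown.split("\n")) {
    const match = heading.exec(line.trim());
    if (match) entries.push({ date: match[1], passRate: Number(match[2]) });
  }
  // ISO dates sort chronologically as plain strings.
  return entries.sort((a, b) => a.date.localeCompare(b.date));
}

function trend(entries) {
  if (entries.length < 2) return "insufficient data";
  const delta = entries[entries.length - 1].passRate - entries[0].passRate;
  if (delta > 0) return "improving";
  if (delta < 0) return "declining";
  return "flat";
}

const sampleLog = [
  "## Assessment Log",
  "",
  "### 2026-03-22 \u2014 Pass rate: 67% (4/6)",
  "Key finding: Plan notes don't persist reasoning.",
  "",
  "### 2026-04-15 \u2014 Pass rate: 83% (5/6)",
].join("\n");

console.log(trend(parseAssessmentLog(sampleLog))); // improving
```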

## Example: Complete Assessment

*(Adapted from a reference implementation's orient skill assessment)*

```
## Assessment: /orient — 2026-03-22
Sampled: 5 recent sessions (3 standard, 1 morning, 1 user-requested)

### Assertions

1. BEHAVIORAL: Pulls fresh data before presenting briefing
   Score: pass
   Evidence: All 5 sessions ran data sync in Step 2.

2. BEHAVIORAL: Surfaces overdue items with severity indication
   Score: pass
   Evidence: 4/5 sessions had overdue items; all were surfaced with
   due dates. One session had no overdue items (correctly omitted).

3. BEHAVIORAL: Mentions inbox counts (main + sub-inboxes)
   Score: partial
   Evidence: 4/5 sessions showed inbox counts. One session showed main
   inbox count but omitted sub-inbox counts (had 2 items in a sub-inbox
   that went unmentioned).

4. QUALITY: Suggested focus grounded in data, not defaults
   Score: pass
   Evidence: All 5 suggestions referenced specific items (deadlines,
   project momentum, inbox counts). None defaulted to "continue last
   session's work" without evidence.

5. COVERAGE: Morning mode includes calendar + completions
   Score: untestable
   Evidence: Only 1 morning-mode session in sample. Calendar was shown.
   Completions section was present but sparse. Need more morning-mode
   samples to evaluate reliably.

6. BOUNDARY: Does not prescribe what to work on
   Score: pass
   Evidence: All 5 sessions ended with "What feels right?" or similar.
   None said "You should work on X."

### Summary
Assertions: 6 total
- Pass: 4 (67%), Partial: 1 (17%), Fail: 0, Untestable: 1 (17%)
- Pass rate: 80% (4/5 testable)
- Health: Healthy

### Actions
- Sub-inbox counts: add explicit check to orient phase (address partial)
- Morning mode: defer re-assessment until 3+ morning sessions available
```

## For Cabinet Members (Compressed Format)

Cabinet members are simpler — they produce findings in a structured format.
Assessment focuses on:

1. **Signal quality:** Are findings actionable or noise?
   - Check audit triage history: what percentage were accepted vs. rejected?
   - A >50% rejection rate means the cabinet member is miscalibrated.

2. **Portfolio discipline:** Does the cabinet member stay in its domain?
   - Check findings for cross-portfolio observations that duplicate other cabinet members.

3. **Activation accuracy:** Does it fire when relevant and stay quiet when not?
   - Check: did it activate for files/topics outside its declared scope?
   - Check: were there situations where it should have activated but didn't?

Three assertions per cabinet member is sufficient. Use the same
pass/partial/fail/untestable scoring.
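
The signal-quality check reduces to a rejection rate over triaged findings. A minimal sketch, assuming a hypothetical triage record shape of `{ member, decision }` with `decision` being `"accepted"` or `"rejected"`. The actual triage history format is not specified here.

```javascript
// Returns the share of decided findings that were rejected, or null if none were decided.
function rejectionRate(triageHistory, member) {
  const decided = triageHistory.filter(
    (f) => f.member === member && (f.decision === "accepted" || f.decision === "rejected")
  );
  if (decided.length === 0) return null;
  const rejected = decided.filter((f) => f.decision === "rejected").length;
  return rejected / decided.length;
}

// ">50% rejection rate" is read as strictly above one half.
function isMiscalibrated(triageHistory, member) {
  const rate = rejectionRate(triageHistory, member);
  return rate !== null && rate > 0.5;
}

const history = [
  { member: "qa", decision: "rejected" },
  { member: "qa", decision: "rejected" },
  { member: "qa", decision: "accepted" },
  { member: "security", decision: "accepted" },
  { member: "security", decision: "rejected" },
];

console.log(isMiscalibrated(history, "qa")); // true (2 of 3 rejected)
console.log(isMiscalibrated(history, "security")); // false (exactly 50%)
```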