@nst173/superpowers-ccg 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agent/skills/brainstorming/SKILL.md +26 -0
- package/.agent/skills/coordinating-multi-model-work/SKILL.md +29 -0
- package/.agent/skills/executing-plans/SKILL.md +27 -0
- package/.agent/skills/using-superpowers/SKILL.md +29 -0
- package/.agent/skills/verifying-before-completion/SKILL.md +20 -0
- package/.agent/skills/writing-plans/SKILL.md +29 -0
- package/.cursor/agents/code-reviewer.md +22 -0
- package/.cursor/commands/brainstorm.md +11 -0
- package/.cursor/commands/execute-plan.md +12 -0
- package/.cursor/commands/write-plan.md +11 -0
- package/.cursor/hook-scripts/after-file-edit.mjs +3 -0
- package/.cursor/hook-scripts/before-shell-execution.mjs +3 -0
- package/.cursor/hook-scripts/session-end.mjs +3 -0
- package/.cursor/hooks.json +21 -0
- package/.cursor/mcp.json +20 -0
- package/.cursor/rules/checkpoint-protocol.mdc +11 -0
- package/.cursor/rules/orchestrator-routing.mdc +12 -0
- package/.cursor/rules/token-discipline.mdc +12 -0
- package/.cursor/skills/brainstorming/SKILL.md +26 -0
- package/.cursor/skills/coordinating-multi-model-work/SKILL.md +29 -0
- package/.cursor/skills/executing-plans/SKILL.md +27 -0
- package/.cursor/skills/using-superpowers/SKILL.md +29 -0
- package/.cursor/skills/verifying-before-completion/SKILL.md +20 -0
- package/.cursor/skills/writing-plans/SKILL.md +29 -0
- package/AGENTS.md +23 -0
- package/CLAUDE.md +78 -0
- package/GEMINI.md +27 -0
- package/LICENSE +21 -0
- package/README.md +171 -0
- package/agents/code-reviewer.md +54 -0
- package/cli/superpowers-ccg.mjs +8 -0
- package/commands/brainstorm.md +6 -0
- package/commands/execute-plan.md +6 -0
- package/commands/write-plan.md +6 -0
- package/config/antigravity/mcp_config.example.json +26 -0
- package/hooks/hooks.json +37 -0
- package/hooks/pre-tool-use-task.sh +4 -0
- package/hooks/run-hook.cmd +19 -0
- package/hooks/session-start.sh +72 -0
- package/hooks/user-prompt-submit.sh +31 -0
- package/package.json +56 -0
- package/skills/EVALUATION.md +201 -0
- package/skills/brainstorming/SKILL.md +120 -0
- package/skills/coordinating-multi-model-work/GATE.md +36 -0
- package/skills/coordinating-multi-model-work/INTEGRATION.md +51 -0
- package/skills/coordinating-multi-model-work/SKILL.md +51 -0
- package/skills/coordinating-multi-model-work/checkpoints.md +31 -0
- package/skills/coordinating-multi-model-work/cross-validation.md +37 -0
- package/skills/coordinating-multi-model-work/prompts/codex-base.md +40 -0
- package/skills/coordinating-multi-model-work/prompts/gemini-base.md +41 -0
- package/skills/coordinating-multi-model-work/review-chain.md +25 -0
- package/skills/coordinating-multi-model-work/routing-decision.md +50 -0
- package/skills/debugging-systematically/CREATION-LOG.md +119 -0
- package/skills/debugging-systematically/SKILL.md +325 -0
- package/skills/debugging-systematically/condition-based-waiting-example.ts +158 -0
- package/skills/debugging-systematically/condition-based-waiting.md +115 -0
- package/skills/debugging-systematically/defense-in-depth.md +122 -0
- package/skills/debugging-systematically/find-polluter.sh +63 -0
- package/skills/debugging-systematically/root-cause-tracing.md +169 -0
- package/skills/debugging-systematically/test-academic.md +14 -0
- package/skills/debugging-systematically/test-pressure-1.md +58 -0
- package/skills/debugging-systematically/test-pressure-2.md +68 -0
- package/skills/debugging-systematically/test-pressure-3.md +69 -0
- package/skills/developing-with-subagents/SKILL.md +51 -0
- package/skills/developing-with-subagents/code-quality-reviewer-prompt.md +30 -0
- package/skills/developing-with-subagents/implementer-prompt.md +41 -0
- package/skills/developing-with-subagents/spec-reviewer-prompt.md +25 -0
- package/skills/dispatching-parallel-agents/SKILL.md +195 -0
- package/skills/executing-plans/SKILL.md +67 -0
- package/skills/finishing-development-branches/SKILL.md +208 -0
- package/skills/practicing-test-driven-development/SKILL.md +346 -0
- package/skills/practicing-test-driven-development/testing-anti-patterns.md +299 -0
- package/skills/receiving-code-review/SKILL.md +221 -0
- package/skills/requesting-code-review/SKILL.md +127 -0
- package/skills/requesting-code-review/code-reviewer.md +146 -0
- package/skills/shared/multi-model-integration-section.md +32 -0
- package/skills/shared/protocol-threshold.md +46 -0
- package/skills/shared/supplementary-tools.md +132 -0
- package/skills/shared/task-format-reference.md +83 -0
- package/skills/using-git-worktrees/SKILL.md +225 -0
- package/skills/using-superpowers/SKILL.md +101 -0
- package/skills/verifying-before-completion/SKILL.md +159 -0
- package/skills/writing-plans/SKILL.md +55 -0
- package/skills/writing-skills/CHECKLIST.md +92 -0
- package/skills/writing-skills/SKILL.md +111 -0
- package/skills/writing-skills/STRUCTURE.md +208 -0
- package/skills/writing-skills/TESTING.md +155 -0
- package/skills/writing-skills/anthropic-best-practices.md +1150 -0
- package/skills/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
- package/skills/writing-skills/graphviz-conventions.dot +172 -0
- package/skills/writing-skills/persuasion-principles.md +187 -0
- package/skills/writing-skills/render-graphs.js +168 -0
- package/skills/writing-skills/testing-skills-with-subagents.md +384 -0
- package/src/cli.mjs +165 -0
- package/src/constants.mjs +7 -0
- package/src/install.mjs +186 -0
- package/src/io.mjs +81 -0
@@ -0,0 +1,201 @@
# Skills Evaluation Scenarios

Use evaluation scenarios (per Anthropic best practices) to validate skill effectiveness.

## Evaluation Format

```json
{
  "skill": "skill-name",
  "query": "User request",
  "context": "Optional context file",
  "expected_behavior": [
    "Expected behavior 1",
    "Expected behavior 2"
  ]
}
```
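
A scenario in this format could be sanity-checked before use. The sketch below is a hypothetical helper, not part of the package: `validate_scenario` and the rule that `context` is the only optional key are assumptions read off the example above.

```python
# Hypothetical validator for the scenario format shown above (not shipped with the package).
REQUIRED_KEYS = {"skill", "query", "expected_behavior"}
OPTIONAL_KEYS = {"context"}  # assumption: "context" is the only optional field


def validate_scenario(scenario: dict) -> list[str]:
    """Return a list of problems; an empty list means the scenario looks well-formed."""
    problems = []
    keys = set(scenario)
    missing = REQUIRED_KEYS - keys
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        problems.append(f"unknown keys: {sorted(unknown)}")
    if "expected_behavior" in scenario:
        behaviors = scenario["expected_behavior"]
        if not isinstance(behaviors, list) or not behaviors:
            problems.append("expected_behavior must be a non-empty list")
        elif not all(isinstance(b, str) and b.strip() for b in behaviors):
            problems.append("expected_behavior entries must be non-empty strings")
    return problems
```

An empty return value means the scenario has the required keys and at least one expected behavior; it says nothing about whether the behaviors are the right ones.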

## Core Workflow Skills

### test-driven-development

```json
{
  "skill": "test-driven-development",
  "query": "Add a function to validate email format",
  "expected_behavior": [
    "Write a failing test before implementation",
    "Run tests to confirm failure and show the reason",
    "Write minimal code to make the test pass",
    "Run tests to confirm passing",
    "Do not add functionality beyond the test scope"
  ]
}
```

```json
{
  "skill": "test-driven-development",
  "query": "This function has a bug, please fix it",
  "context": "src/utils/parser.ts",
  "expected_behavior": [
    "Write a failing test that reproduces the bug",
    "Confirm the test fails for the correct reason",
    "Fix the code",
    "Confirm the test passes"
  ]
}
```

### debugging-systematically

```json
{
  "skill": "debugging-systematically",
  "query": "Tests are failing, please take a look",
  "context": "npm test output shows 3 test failures",
  "expected_behavior": [
    "Phase 1: Carefully read the error messages before proposing fixes",
    "Try to reproduce the issue",
    "Check recent code changes",
    "Phase 2: Compare with similar working code",
    "Phase 3: Form a single hypothesis and minimize the test",
    "Phase 4: Write a failing test before fixing"
  ]
}
```

```json
{
  "skill": "debugging-systematically",
  "query": "Build failed and I can't understand the error",
  "expected_behavior": [
    "Do not guess a fix immediately",
    "Carefully read the full error message and stack trace",
    "If multiple components are involved, add diagnostic logs to isolate the layer",
    "Form a hypothesis before proposing a fix"
  ]
}
```

### verifying-before-completion

```json
{
  "skill": "verifying-before-completion",
  "query": "Fix this bug and then commit",
  "expected_behavior": [
    "Run tests after the fix",
    "Show test output proving success",
    "Only claim 'fixed' after seeing passing evidence",
    "Avoid vague wording like 'should work' or 'should pass'"
  ]
}
```

### brainstorming

```json
{
  "skill": "brainstorming",
  "query": "I want to add user authentication",
  "expected_behavior": [
    "Understand the current project state first (files, docs, recent commits)",
    "Ask one question at a time",
    "Prefer multiple-choice questions",
    "Propose 2-3 different approaches with trade-offs",
    "Present the design in sections and confirm after each"
  ]
}
```

### executing-plans

```json
{
  "skill": "executing-plans",
  "query": "Execute the plan in docs/plans/feature.md",
  "expected_behavior": [
    "Read the plan first and review it critically",
    "Ask questions before starting if anything is unclear",
    "Use a checklist to track progress",
    "Report after each batch and wait for feedback",
    "Do not skip verification steps in the plan"
  ]
}
```

### developing-with-subagents

```json
{
  "skill": "developing-with-subagents",
  "query": "Execute this plan using subagents",
  "context": "docs/plans/feature.md",
  "expected_behavior": [
    "Read the plan once and extract all tasks",
    "Dispatch a separate subagent for each task",
    "Answer subagent questions before proceeding",
    "Run spec review before code quality review",
    "Iterate on fixes until the reviewer passes"
  ]
}
```

## Supporting Skills

### using-git-worktrees

```json
{
  "skill": "using-git-worktrees",
  "query": "Create an isolated workspace to develop a new feature",
  "expected_behavior": [
    "Check existing .worktrees or worktrees directories",
    "Verify the directory is gitignored",
    "If not ignored, add it to .gitignore and commit",
    "Run project setup after creating the worktree",
    "Run tests to verify a clean baseline"
  ]
}
```

### finishing-development-branches

```json
{
  "skill": "finishing-development-branches",
  "query": "Development is complete, help me handle the branch",
  "expected_behavior": [
    "Run tests and verify they pass",
    "Do not offer completion options if tests fail",
    "Provide exactly 4 options",
    "Require confirmation when discarding",
    "Clean up the worktree correctly"
  ]
}
```

### dispatching-parallel-agents

```json
{
  "skill": "dispatching-parallel-agents",
  "query": "Five test files are failing, help me fix them",
  "expected_behavior": [
    "Identify whether failures are independent",
    "If independent, group them by problem domain",
    "Dispatch multiple agents in parallel",
    "Give each agent a clear scope and constraints",
    "Summarize results and verify no conflicts"
  ]
}
```

## Run the Evaluation

1. Run the query without the skill and record behavior
2. Enable the skill and run the same query
3. Compare behavior against expected_behavior
4. Record differences and iterate on the skill
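
The comparison in steps 1 to 4 can be kept as a small table per scenario. This is a minimal sketch under one assumption: a reviewer records, for each run, which of the `expected_behavior` strings were actually observed. `compare_runs` and `unmet` are hypothetical names, not tools from the package.

```python
# Hypothetical bookkeeping for the four evaluation steps above.
def compare_runs(expected, baseline_observed, skill_observed):
    """For each expected behavior, note whether it appeared without and with the skill."""
    return [
        {
            "behavior": behavior,
            "without_skill": behavior in baseline_observed,
            "with_skill": behavior in skill_observed,
        }
        for behavior in expected
    ]


def unmet(report):
    """Behaviors still missing with the skill enabled; these drive the next iteration."""
    return [row["behavior"] for row in report if not row["with_skill"]]
```

Rows where `without_skill` is already true suggest the skill adds little for that behavior; rows returned by `unmet` are the differences to record in step 4.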
@@ -0,0 +1,120 @@
---
name: brainstorming
description: "Explores user intent, requirements and design through collaborative dialogue before implementation. Use when: creating features, building components, adding functionality, modifying behavior, or starting any creative work. Keywords: design, requirements, spec, ideation, planning"
---

# Brainstorming Ideas Into Designs

## Overview

Help turn ideas into fully formed designs and specs through natural collaborative dialogue.

Start by understanding the current project context, then ask questions one at a time to refine the idea. Once you understand what you're building, present the design in small sections (200-300 words), checking after each section whether it looks right so far.

## Protocol Threshold (Required)

Follow `skills/shared/protocol-threshold.md`. The hook injects CP reminders automatically.

## The Process

**Understanding the idea:**

- Check out the current project state first (files, docs, recent commits)
- Ask questions one at a time to refine the idea
- Prefer multiple choice questions when possible, but open-ended is fine too
- Only one question per message - if a topic needs more exploration, break it into multiple questions
- Focus on understanding: purpose, constraints, success criteria

**Model tip for exploration:** When dispatching subagents to explore the codebase, use `model: haiku` for fast, cost-effective searches. Haiku excels at file pattern matching and quick lookups.

**Supplementary tools (optional, enhance research):**
- **Grok Search (Tavily):** If the idea involves unfamiliar tech, current trends, or competitive analysis — use `mcp__grok-search__web_search` to gather real-time information before proposing approaches. Especially useful when the user references a library, service, or pattern you're uncertain about.
- **Serena:** If the project is large (>10 files involved) — use Serena for semantic codebase exploration to understand existing architecture and symbol relationships.
- See `skills/shared/supplementary-tools.md` for full reference.

**► CP1 (Task Analysis):** After understanding the idea, apply `coordinating-multi-model-work/checkpoints.md`.

**Exploring approaches:**

- Propose 2-3 different approaches with trade-offs
- Present options conversationally with your recommendation and reasoning
- Lead with your recommended option and explain why
- **Sequential-Thinking (optional):** For complex designs with 3+ interacting components, use Sequential-Thinking MCP to systematically decompose trade-offs and validate reasoning chains before presenting options.

**► CP2 (Mid-Review):** When multiple approaches have significant trade-offs, apply `coordinating-multi-model-work/checkpoints.md`.

**Presenting the design:**

- Once you believe you understand what you're building, present the design
- Break it into sections of 200-300 words
- Ask after each section whether it looks right so far
- Cover: architecture, components, data flow, error handling, testing
- Be ready to go back and clarify if something doesn't make sense

## After the Design

**Documentation (must not be skipped):**

Once the user confirms the design looks right, do ALL of the following:

1. Ensure the output directory exists (create `docs/plans/` if missing)
2. Write the final design content to `docs/plans/YYYY-MM-DD-<topic>-design.md`
3. Then tell the user the file path you wrote

Only commit if the user explicitly asks you to commit.
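
The three documentation steps above could look roughly like the sketch below. The slug rule (lowercase, non-alphanumerics collapsed to hyphens) is an assumption, since the skill only gives the `YYYY-MM-DD-<topic>-design.md` pattern; `design_doc_path` and `write_design` are hypothetical names.

```python
import re
from datetime import date
from pathlib import Path


def design_doc_path(topic: str, today: date, root: Path = Path("docs/plans")) -> Path:
    """Build docs/plans/YYYY-MM-DD-<topic>-design.md; the slug rule is an assumption."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return root / f"{today.isoformat()}-{slug}-design.md"


def write_design(topic: str, content: str, today: date, root: Path = Path("docs/plans")) -> Path:
    path = design_doc_path(topic, today, root)
    path.parent.mkdir(parents=True, exist_ok=True)  # step 1: ensure the directory exists
    path.write_text(content, encoding="utf-8")      # step 2: write the final design
    return path                                     # step 3: report this path to the user
```

Nothing here commits the file, which matches the rule that a commit happens only on explicit request.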

**Native Task Integration (must not be skipped):**

After each design section is confirmed by the user, create a native task using Claude Code's TaskCreate tool. Follow the format in `skills/shared/task-format-reference.md`.

~~~yaml
TaskCreate:
  subject: "Implement [Component Name]"
  description: |
    **Goal:** [What this component produces — one sentence]

    **Files:**
    - Create/Modify: [paths identified during design]

    **Acceptance Criteria:**
    - [ ] [Criterion from design validation]
    - [ ] [Criterion from design validation]

    **Verify:** [How to test this component works]

    ```json:metadata
    {"files": ["path/from/design"], "verifyCommand": "command to verify", "acceptanceCriteria": ["criterion 1", "criterion 2"]}
    ```
  activeForm: "Implementing [Component Name]"
~~~

Track all returned task IDs.

After **all** components are validated, wire dependency relationships:

~~~yaml
TaskUpdate:
  taskId: [dependent-task-id]
  addBlockedBy: [prerequisite-task-ids]
~~~

Before handing off to writing-plans, run `TaskList` to display the complete task tree with dependency status so the user can confirm it looks right.
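
Before wiring `addBlockedBy` relationships, it may help to confirm that the planned dependencies form no cycle. A minimal sketch using Python's standard `graphlib`; the task IDs and the `execution_order` helper are hypothetical and unrelated to the TaskCreate/TaskUpdate tools themselves.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(blocked_by: dict[str, list[str]]) -> list[str]:
    """Map each task ID to its prerequisites and return an order with prerequisites first.

    Raises ValueError if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(blocked_by).static_order())
    except CycleError as exc:
        # graphlib stores the detected cycle as the second element of args
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For example, `execution_order({"task-2": ["task-1"], "task-3": ["task-1", "task-2"]})` places `task-1` first, because both other tasks are blocked by it.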

**Implementation (if continuing):**

- Ask: "Ready to set up for implementation?"
- Use superpowers:using-git-worktrees to create isolated workspace
- Use superpowers:writing-plans to create detailed implementation plan

## Key Principles

- **One question at a time** - Don't overwhelm with multiple questions
- **Multiple choice preferred** - Easier to answer than open-ended when possible
- **YAGNI ruthlessly** - Remove unnecessary features from all designs
- **Explore alternatives** - Always propose 2-3 approaches before settling
- **Incremental validation** - Present design in sections, validate each
- **Be flexible** - Go back and clarify when something doesn't make sense

## Multi-Model Design Validation

See `skills/shared/multi-model-integration-section.md` for routing, invocation, and fallback rules.
@@ -0,0 +1,36 @@
# Multi-Model Gate (Fail-Closed)

Use this gate whenever a skill decides **Routing != CLAUDE** (`CODEX`, `GEMINI`, or `CROSS_VALIDATION`).

## Core Rule

- If `Routing != CLAUDE`, you must obtain external model output via MCP tools (`mcp__codex__codex`, `mcp__gemini__gemini`).
- If code changed, you must run the review chain per `coordinating-multi-model-work/review-chain.md`.
- If you cannot obtain required external output, stop in `BLOCKED`.

## Evidence Requirement

```text
[Multi-Model Gate]
Routing: CODEX | GEMINI | CROSS_VALIDATION
Why: <one sentence>

Evidence (Implementation):
- Tool: mcp__codex__codex | mcp__gemini__gemini
- Params: <key MCP parameters used>
- Result: <3-6 bullets>

Evidence (Opus Review):
- Reviewer: Opus
- Artifact: <commit SHA>
- Result: <3-6 bullets>
```

## Failure Handling

```text
[Multi-Model Gate]
Routing: CODEX | GEMINI | CROSS_VALIDATION
Status: BLOCKED
Reason: timeout | tool-unavailable | permission-blocked | other
```
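
The fail-closed behavior described above can be sketched as a small decision function. This is an illustration, not the package's implementation: the `gate` function, its parameters, and the `PASSED` and `NOT_APPLICABLE` labels are assumptions. Treating a missing review on a code-changing path as `BLOCKED` follows the review-chain rule referenced in the Core Rule.

```python
# Illustrative only: the gate in the package is a prompt-level protocol, not code.
EXTERNAL_ROUTES = {"CODEX", "GEMINI", "CROSS_VALIDATION"}


def gate(routing, external_output=None, code_changed=False,
         review_result=None, failure_reason="other"):
    """Fail closed: anything short of full evidence yields BLOCKED."""
    if routing not in EXTERNAL_ROUTES:
        return {"status": "NOT_APPLICABLE"}  # assumed label; the gate only covers Routing != CLAUDE
    if not external_output:
        return {"routing": routing, "status": "BLOCKED", "reason": failure_reason}
    if code_changed and not review_result:
        return {"routing": routing, "status": "BLOCKED", "reason": "other"}
    return {"routing": routing, "status": "PASSED"}  # assumed label for "continue with evidence"
```

The key design point is the default: absence of evidence never falls through to a pass.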
@@ -0,0 +1,51 @@
# Multi-Model Integration Guide

Claude is orchestrator-only. All implementation code goes through external models.

## Standard Pattern

1. Define one bounded task.
2. Include only the minimum context needed to complete that task.
3. Ask the worker for one of two outcomes:
   - changed hunks / patch-ready diff
   - blocking questions
4. Reuse `SESSION_ID` only for follow-up fixes on the same task.
5. Run Opus review on the resulting artifact.

## Hard Rules

- Do not ask for draft code that the orchestrator will later re-implement.
- Do not ask for design prose on an implementation task.
- Do not restate the whole PRD, plan, or prior conversation in every prompt.
- Do not send multiple workers the same bounded implementation task.

## Prompt Structure

Every implementation prompt should contain:

```text
## Task
[single bounded task]

## Files
[explicit file set]

## Acceptance
[2-5 concrete checks]

## Verify
[exact command]

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Output only one of:
1. ## DIFF → ## VERIFY → ## ISSUES
2. ## QUESTIONS
```

## When To Cross-Validate

Use `CROSS_VALIDATION` only for design arbitration or unresolved multi-domain ambiguity. When you do:
- ask both models the same narrow question
- compare only divergences
- do not ask both to generate full implementations
@@ -0,0 +1,51 @@
---
name: coordinating-multi-model-work
description: "Routes bounded implementation tasks to Codex (backend and systems) or Gemini (frontend) via MCP tools. Claude is orchestrator-only and should stay out of the implementation hot path. Use when: implementation, debugging, refactoring, UI work, APIs, databases, scripts, CI/CD, or cross-model arbitration."
---

# Coordinating Multi-Model Work

## Overview

Claude is the orchestrator. It routes tasks, coordinates workers, and integrates results, but never writes implementation code.

Use this module to route one bounded task at a time:
- **Codex** — backend and systems
- **Gemini** — frontend

## Core Rules

1. Reduce the current work to one bounded task with a clear file set and verification command.
2. Route that bounded task to exactly one worker unless there is real architectural uncertainty.
3. Reuse the same worker `SESSION_ID` for follow-up fixes on that task.
4. Ask for `diff-or-questions`, not prototypes, essays, or full rewrites.
5. After the worker completes, review the artifact with Opus.

## Cross-Validation

`CROSS_VALIDATION` is rare. Use it only when:
- the task genuinely spans frontend and backend at the same time, or
- two viable designs remain after scope reduction, or
- the failure mode is still ambiguous after one worker pass.

Do not use cross-validation as the default for ordinary implementation work.

## Checkpoint Workflow

At CP1, CP2, and CP3:
1. Decide routing
2. Apply `GATE.md`
3. Continue only with evidence

## Response Protocol

All external model prompts must reference Serena memory `global/response_protocol`.

## Reference Files

- `coordinating-multi-model-work/checkpoints.md`
- `coordinating-multi-model-work/routing-decision.md`
- `coordinating-multi-model-work/GATE.md`
- `coordinating-multi-model-work/INTEGRATION.md`
- `coordinating-multi-model-work/review-chain.md`
- `coordinating-multi-model-work/cross-validation.md`
@@ -0,0 +1,31 @@
# Collaboration Checkpoints

## Overview

Checkpoints exist to control routing and keep the orchestrator thread small.

## CP1: Task Analysis

- Reduce the work to one bounded task.
- Prefer one worker.
- Record routing in one short block.

## CP2: Mid-Review

Trigger only when:
- 2 or more attempts have failed on the same bounded task
- the worker returns blocking questions
- the task still has unresolved cross-domain ambiguity

Before escalating to `CROSS_VALIDATION`, first narrow the task further or split it.
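
The CP2 trigger conditions amount to a simple predicate over the state of one bounded task. A minimal sketch, assuming the orchestrator tracks these three facts; `TaskState` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task bookkeeping; field names are illustrative."""
    failed_attempts: int = 0
    has_blocking_questions: bool = False
    cross_domain_ambiguity: bool = False


def should_trigger_cp2(state: TaskState) -> bool:
    """True when any one of the three CP2 conditions listed above holds."""
    return (
        state.failed_attempts >= 2
        or state.has_blocking_questions
        or state.cross_domain_ambiguity
    )
```

A true result only opens the mid-review; narrowing or splitting the task still comes before any escalation to `CROSS_VALIDATION`.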

## CP3: Quality Gate

- Review the artifact, not the whole session narrative.
- If code changed, run the Opus review chain.
- If no code changed, skip quality review.

## User Override

- "Use Codex" / "Use Gemini" / "Cross-validate" force corresponding routing.
- "Do not use external models" forces `CLAUDE` for docs and coordination only.
@@ -0,0 +1,37 @@
# Cross-Validation Mechanism

Cross-validation is for arbitration, not for routine implementation.

## Use It Only When

- the task has unavoidable frontend and backend coupling
- two competing designs remain after scope reduction
- one worker pass did not remove the ambiguity

## Do Not Use It When

- one worker can own the task cleanly
- you only want a prototype or a second opinion
- the implementation can be split into smaller bounded tasks

## Pattern

1. Ask Codex and Gemini the same narrow question.
2. Collect concise answers.
3. Compare only the disagreements.
4. Choose a direction.
5. Route implementation to one worker.

## Output

```markdown
## Cross-Validation Summary

**Agreement:** [shared conclusions]

**Divergences:**
| Aspect | Codex | Gemini | Resolution |
|--------|-------|--------|------------|

**Next worker:** [CODEX or GEMINI]
```
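
Step 3 of the pattern ("compare only the disagreements") can be illustrated with a small diff over the two answers. This sketch assumes each model's answer has already been reduced to one short position per aspect; the dictionary shape, `divergences`, and `render_table` are all hypothetical.

```python
def divergences(codex: dict[str, str], gemini: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (aspect, codex_answer, gemini_answer) only where the two answers differ."""
    rows = []
    for aspect in sorted(set(codex) | set(gemini)):
        c = codex.get(aspect, "(no answer)")
        g = gemini.get(aspect, "(no answer)")
        if c != g:
            rows.append((aspect, c, g))
    return rows


def render_table(rows: list[tuple[str, str, str]]) -> str:
    """Fill the Divergences table; the Resolution column is left for the orchestrator."""
    lines = [
        "| Aspect | Codex | Gemini | Resolution |",
        "|--------|-------|--------|------------|",
    ]
    lines += [f"| {aspect} | {c} | {g} | |" for aspect, c, g in rows]
    return "\n".join(lines)
```

Aspects on which both models agree drop out of the table and belong under **Agreement** instead.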
@@ -0,0 +1,40 @@
# Codex Base Prompt Templates

> Invoke via `mcp__codex__codex`.
> All prompts in English.

## Bounded Implementation Template

```text
## Task
{task_description}

## Files
{file_list}

## Acceptance
{acceptance_criteria}

## Verify
{verify_command}

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Return exactly one of:
1. ## DIFF → ## VERIFY → ## ISSUES
2. ## QUESTIONS
```
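
Filling the four placeholders of the bounded implementation template might look like the sketch below. It is a hypothetical helper, not how the package renders prompts; rendering lists as `- ` bullets is an assumption, and the 2-5 acceptance check is borrowed from the Prompt Structure section of `INTEGRATION.md`.

```python
# Hypothetical renderer for the template above; placeholder names match the template.
TEMPLATE = """## Task
{task_description}

## Files
{file_list}

## Acceptance
{acceptance_criteria}

## Verify
{verify_command}

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Return exactly one of:
1. ## DIFF → ## VERIFY → ## ISSUES
2. ## QUESTIONS
"""


def render_prompt(task: str, files: list[str], acceptance: list[str], verify_command: str) -> str:
    """Fill the template; reject prompts outside the 2-5 acceptance-check range."""
    if not files:
        raise ValueError("a bounded task needs an explicit file set")
    if not 2 <= len(acceptance) <= 5:
        raise ValueError("expected 2-5 concrete acceptance checks")
    return TEMPLATE.format(
        task_description=task,
        file_list="\n".join(f"- {path}" for path in files),
        acceptance_criteria="\n".join(f"- {check}" for check in acceptance),
        verify_command=verify_command,
    )
```

The resulting string would then be passed as the prompt to `mcp__codex__codex`; the exact parameter name for that call is not shown in this file and is left out here.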

## Narrow Analysis Template

```text
## Question
{narrow_question}

## Files
{file_list}

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Output ## ANALYSIS (<=120 words) → ## ISSUES (<=3) → ## VERDICT (one line).
```
@@ -0,0 +1,41 @@
# Gemini Base Prompt Templates

> Invoke via `mcp__gemini__gemini`.
> All prompts in English.
> Keep prompts narrow.

## Bounded Implementation Template

```text
## Task
{task_description}

## Files
{file_list}

## Acceptance
{acceptance_criteria}

## Verify
{verify_command}

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Return exactly one of:
1. ## DIFF → ## VERIFY → ## ISSUES
2. ## QUESTIONS
```

## Narrow Analysis Template

```text
## Question
{narrow_question}

## Files
{file_list}

## Response Protocol
FIRST: Read Serena memory 'global/response_protocol' for full format rules.
FALLBACK: Output ## ANALYSIS (<=120 words) → ## ISSUES (<=3) → ## VERDICT (one line).
```
@@ -0,0 +1,25 @@
# Review Chain (Canonical Reference)

This is the single source of truth for the review chain rule.

## The Rule

```text
FinalArbiter = Opus
```

## What This Means

| Implementer | Reviewer |
|-------------|----------|
| Codex | Opus |
| Gemini | Opus |
| Docs-only | Skip |

## Key Rules

- Opus has final say on every code-changing path.
- All implementers are reviewed directly by Opus.
- If Opus is unavailable for a code-changing path: `BLOCKED`.
- Max 3 fix-review loops before escalating to the user.
- Docs-only changes are exempt from quality review.
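
One way to read the key rules above is as a bounded loop. The sketch below is an interpretation, not the package's code: it counts a "fix-review loop" as one fix followed by one re-review, models an unavailable reviewer as `review()` returning `None`, and uses hypothetical status labels other than `BLOCKED`.

```python
MAX_FIX_REVIEW_LOOPS = 3  # from the key rules; how a "loop" is counted is an assumption


def run_review_chain(review, fix, docs_only=False):
    """review() returns "approve", feedback text, or None if the reviewer is unavailable.

    fix(feedback) asks the implementer to address the feedback.
    """
    if docs_only:
        return "SKIPPED"  # docs-only changes are exempt
    loops = 0
    while True:
        verdict = review()
        if verdict is None:
            return "BLOCKED"  # no reviewer on a code-changing path
        if verdict == "approve":
            return "APPROVED"
        if loops == MAX_FIX_REVIEW_LOOPS:
            return "ESCALATE_TO_USER"
        fix(verdict)
        loops += 1
```

Under this reading the reviewer is consulted at most four times: one initial review plus one after each of the three permitted fixes.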
@@ -0,0 +1,50 @@
# Routing Decision Framework

## Overview

This framework guides Claude in making semantic routing decisions for multi-model task distribution.

## When to Use

Invoke this framework when a skill needs to call external models (Codex/Gemini) via the MCP tools (`mcp__codex__codex`, `mcp__gemini__gemini`).

## Decision Output

```text
**Routing Decision:** [CODEX | GEMINI | CROSS_VALIDATION | CLAUDE]
**Rationale:** [One sentence]
```

## Routing Targets

- **CODEX** - Backend and systems expert for APIs, databases, algorithms, server-side logic, CI/CD, scripts, Dockerfiles, infrastructure, and repo tooling
- **GEMINI** - Frontend expert for UI, components, styles, interactions
- **CROSS_VALIDATION** - Multiple models for full-stack tasks, architectural decisions, or high uncertainty (Codex + Gemini)
- **CLAUDE** - Orchestrator only: routing decisions, coordination, documentation edits

## Decision Guidelines

- Strong backend or systems signals and weak/no frontend signals → **CODEX**
- Strong frontend signals and weak/no backend signals → **GEMINI**
- Strong signals in both domains or high uncertainty → **CROSS_VALIDATION**
- Documentation-only or pure coordination → **CLAUDE**

## File Extension Heuristics

| File Pattern | Default Routing |
|-------------|----------------|
| `**/*.go`, `**/*.py`, `**/*.sql` | CODEX |
| `**/*.sh`, `**/*.yml`, `Dockerfile`, `Makefile`, `**/*.tf` | CODEX |
| `**/*.tsx`, `**/*.css`, `**/*.html` | GEMINI |
| Mixed frontend + backend | CROSS_VALIDATION |
| `**/*.md` (docs only, no code) | CLAUDE |
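
The table above can be read as a default lookup by file suffix or name. A minimal sketch, assuming suffix matching is an acceptable stand-in for the glob patterns; `route_file` and `default_routing` are hypothetical, and the framework itself describes routing as a semantic decision, so this would only supply a starting default.

```python
from pathlib import PurePosixPath

# Suffixes and names taken from the table above.
CODEX_SUFFIXES = {".go", ".py", ".sql", ".sh", ".yml", ".tf"}
CODEX_NAMES = {"Dockerfile", "Makefile"}
GEMINI_SUFFIXES = {".tsx", ".css", ".html"}


def route_file(path: str):
    """Default routing for one file, or None if the table has no row for it."""
    p = PurePosixPath(path)
    if p.name in CODEX_NAMES or p.suffix in CODEX_SUFFIXES:
        return "CODEX"
    if p.suffix in GEMINI_SUFFIXES:
        return "GEMINI"
    if p.suffix == ".md":
        return "CLAUDE"
    return None


def default_routing(paths: list[str]):
    """Combine per-file defaults into one routing for the whole file set."""
    routes = {route_file(p) for p in paths}
    if {"CODEX", "GEMINI"} <= routes:
        return "CROSS_VALIDATION"  # mixed frontend + backend
    if "CODEX" in routes:
        return "CODEX"
    if "GEMINI" in routes:
        return "GEMINI"
    if routes == {"CLAUDE"}:
        return "CLAUDE"  # docs only, no code
    return None  # no heuristic applies; fall back to the semantic guidelines
```

Note that the checkpoint and cross-validation documents in the same package advise narrowing or splitting a mixed task before escalating, so a `CROSS_VALIDATION` default here is a prompt to look closer rather than a final decision.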

## Example

**Input:** "Fix the flaky test in CI pipeline"

**Output:**
```text
**Routing Decision:** CODEX
**Rationale:** CI/CD pipeline task with clear systems and automation signals
```