claude-raid 0.2.7 → 0.2.9
- package/README.md +84 -23
- package/bin/cli.js +4 -2
- package/package.json +1 -1
- package/src/descriptions.js +10 -7
- package/src/init.js +36 -5
- package/src/merge-settings.js +53 -2
- package/src/remove.js +1 -1
- package/src/setup.js +32 -0
- package/src/ui.js +1 -0
- package/src/update.js +26 -3
- package/template/.claude/agents/archer.md +18 -4
- package/template/.claude/agents/rogue.md +18 -4
- package/template/.claude/agents/warrior.md +18 -4
- package/template/.claude/agents/wizard.md +32 -5
- package/template/.claude/dungeon-master-rules.md +120 -31
- package/template/.claude/hooks/raid-lib.sh +45 -4
- package/template/.claude/hooks/raid-pre-compact.sh +8 -4
- package/template/.claude/hooks/raid-session-end.sh +2 -2
- package/template/.claude/hooks/raid-session-start.sh +2 -0
- package/template/.claude/hooks/rtk-bridge.sh +46 -0
- package/template/.claude/hooks/validate-dungeon.sh +11 -3
- package/template/.claude/hooks/validate-file-naming.sh +6 -1
- package/template/.claude/hooks/validate-no-placeholders.sh +13 -2
- package/template/.claude/hooks/validate-write-gate.sh +7 -2
- package/template/.claude/party-rules.md +91 -65
- package/template/.claude/skills/raid-browser/SKILL.md +3 -5
- package/template/.claude/skills/raid-browser-chrome/SKILL.md +1 -1
- package/template/.claude/skills/raid-canonical-design/SKILL.md +309 -162
- package/template/.claude/skills/raid-canonical-implementation/SKILL.md +157 -132
- package/template/.claude/skills/raid-canonical-implementation-plan/SKILL.md +196 -141
- package/template/.claude/skills/raid-canonical-prd/SKILL.md +92 -89
- package/template/.claude/skills/raid-canonical-protocol/SKILL.md +29 -123
- package/template/.claude/skills/raid-canonical-review/SKILL.md +292 -148
- package/template/.claude/skills/raid-debugging/SKILL.md +1 -7
- package/template/.claude/skills/raid-init/SKILL.md +7 -5
- package/template/.claude/skills/raid-tdd/SKILL.md +5 -5
- package/template/.claude/skills/raid-teambuff/SKILL.md +6 -24
- package/template/.claude/skills/raid-verification/SKILL.md +0 -6
- package/template/.claude/skills/raid-wrap-up/SKILL.md +30 -29
|
package/template/.claude/skills/raid-canonical-review/SKILL.md
@@ -5,232 +5,376 @@ description: "Use when Phase 5 (Review) begins in a Canonical Quest, after imple
|
|
|
5
5
|
|
|
6
6
|
# Raid Review — Phase 5 (Optional)
|
|
7
7
|
|
|
8
|
-
|
|
8
|
+
Two sub-phases: **Review** (find issues, build fix plan) then **Fix Session** (execute fixes). The review digests all prior deliverables — PRD, Design, Plan, Implementation — and verifies the implementation is correct, complete, and coherent.
|
|
9
9
|
|
|
10
10
|
<HARD-GATE>
|
|
11
|
-
This phase is OPTIONAL — the Wizard asks the human before entering. All
|
|
11
|
+
This phase is OPTIONAL — the Wizard asks the human before entering. All agents review the ENTIRE implementation. Use `raid-verification` before any completion claims.
|
|
12
12
|
</HARD-GATE>
|
|
13
13
|
|
|
14
|
-
## Mode Behavior
|
|
15
|
-
|
|
16
|
-
- **Full Raid**: 3 independent reviews, then agents fight directly over findings. All severity levels enforced.
|
|
17
|
-
- **Skirmish**: 1 agent reviews + Wizard. Cross-testing between reviewer and Wizard.
|
|
18
|
-
- **Scout**: Wizard reviews alone. Checks against requirements and runs tests.
|
|
19
|
-
|
|
20
14
|
## Process Flow
|
|
21
15
|
|
|
22
16
|
```dot
|
|
23
17
|
digraph review {
|
|
24
|
-
"Wizard reads
|
|
25
|
-
"
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
18
|
+
"Wizard reads all prior deliverables" -> "Phase recap (PRD + Design + Plan + Implementation)";
|
|
19
|
+
"Phase recap (PRD + Design + Plan + Implementation)" -> "SUB-PHASE A: REVIEW";
|
|
20
|
+
|
|
21
|
+
subgraph cluster_review {
|
|
22
|
+
label="Sub-phase A: Review";
|
|
23
|
+
"Roll dice for review turn order" -> "ROUND 1: Agent 1 reviews, pins findings";
|
|
24
|
+
"ROUND 1: Agent 1 reviews, pins findings" -> "Agent 2 adversary-tests findings + adds own";
|
|
25
|
+
"Agent 2 adversary-tests findings + adds own" -> "Agent 3 reviews + adds own";
|
|
26
|
+
"Agent 3 reviews + adds own" -> "Wizard evaluates Round 1";
|
|
27
|
+
"Wizard evaluates Round 1" -> "ROUND 2: Agent 1 converges findings, proposes fix plan";
|
|
28
|
+
"ROUND 2: Agent 1 converges findings, proposes fix plan" -> "Agents 2+3 attack fix plan";
|
|
29
|
+
"Agents 2+3 attack fix plan" -> "Wizard evaluates — Round 3?" [shape=diamond];
|
|
30
|
+
"Wizard evaluates — Round 3?" -> "ROUND 3 (FINAL)" [label="critical gaps"];
|
|
31
|
+
"Wizard evaluates — Round 3?" -> "Extract review.md fix plan" [label="solid"];
|
|
32
|
+
"ROUND 3 (FINAL)" -> "Extract review.md fix plan";
|
|
33
|
+
}
|
|
34
|
+
|
|
35
|
+
"SUB-PHASE A: REVIEW" -> "Roll dice for review turn order";
|
|
36
|
+
"Extract review.md fix plan" -> "Human approves fix plan?" [shape=diamond];
|
|
37
|
+
"Human approves fix plan?" -> "Ask why, revise" [label="no"];
|
|
38
|
+
"Ask why, revise" -> "Extract review.md fix plan";
|
|
39
|
+
"Human approves fix plan?" -> "Fixes needed?" [shape=diamond];
|
|
40
|
+
"Fixes needed?" -> "SUB-PHASE B: FIX SESSION" [label="yes"];
|
|
41
|
+
"Fixes needed?" -> "Commit + load wrap-up" [label="no fixes", shape=doublecircle];
|
|
42
|
+
|
|
43
|
+
subgraph cluster_fix {
|
|
44
|
+
label="Sub-phase B: Fix Session";
|
|
45
|
+
"Fresh dice roll for fix order" -> "Agent 1 makes fixes from review.md, reports";
|
|
46
|
+
"Agent 1 makes fixes from review.md, reports" -> "Agent 2 reviews fixes, reports";
|
|
47
|
+
"Agent 2 reviews fixes, reports" -> "Agent 3 reviews fixes, reports";
|
|
48
|
+
"Agent 3 reviews fixes, reports" -> "Wizard evaluates fixes";
|
|
49
|
+
"Wizard evaluates fixes" -> "More fix rounds?" [shape=diamond];
|
|
50
|
+
"More fix rounds?" -> "Next fix round" [label="yes"];
|
|
51
|
+
"More fix rounds?" -> "Wizard extracts results" [label="done"];
|
|
52
|
+
}
|
|
53
|
+
|
|
54
|
+
"SUB-PHASE B: FIX SESSION" -> "Fresh dice roll for fix order";
|
|
55
|
+
"Wizard extracts results" -> "Present to human";
|
|
56
|
+
"Present to human" -> "Commit + load wrap-up" [shape=doublecircle];
|
|
36
57
|
}
|
|
37
58
|
```
|
|
38
59
|
|
|
39
|
-
##
|
|
60
|
+
## Sub-phase A: Review
|
|
61
|
+
|
|
62
|
+
### Wizard Checklist (Review)
|
|
63
|
+
|
|
64
|
+
1. **Prepare** — gather all prior deliverables: PRD, design.md, task files, phase-4-implementation.md, git diff range
|
|
65
|
+
2. **Phase recap** — summarize all prior phases. Present to agents and human.
|
|
66
|
+
3. **Roll dice** — randomly shuffle `["warrior", "archer", "rogue"]` for the review turn order. Update raid-session via Bash using the jq command from protocol "Dice Roll Reference". Announce: *"The dice have spoken. Review turn order: {agent1} → {agent2} → {agent3}."*
|
|
67
|
+
4. **Create evolution log** — `{questDir}/phases/phase-5-review.md`
|
|
68
|
+
5. **Run rounds** — see Round Protocol below
|
|
69
|
+
6. **Extract fix plan** — polish into `{questDir}/spoils/review.md`
|
|
70
|
+
7. **Present to human** for approval
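Step 3 of the checklist above can be sketched in POSIX shell. This is an illustration only: the real command is the jq one-liner in the protocol's "Dice Roll Reference", which this diff does not show, and `roll_dice` is a hypothetical helper name. The sketch covers the shuffle and the announcement; writing the order into `.claude/raid-session` is left out because the field name is not visible here.

```shell
#!/bin/sh
# Sketch only. roll_dice is a hypothetical helper, not part of claude-raid.
# It prints the three agent names, one per line, in random order.
roll_dice() {
  # Tag each name with a random integer key, sort on the key, drop the key.
  # Caveat: many awk builds seed srand() from the clock in whole seconds,
  # so two rolls inside the same second can produce the same order.
  printf '%s\n' warrior archer rogue |
    awk 'BEGIN { srand() } { printf "%d\t%s\n", int(rand() * 1000000), $0 }' |
    sort -n |
    cut -f2
}

# Split the shuffled names into $1 $2 $3 and announce the turn order.
set -- $(roll_dice)
echo "The dice have spoken. Review turn order: $1 → $2 → $3."
```

The same helper could serve the fresh roll at the start of the fix session, since each call reshuffles independently of the previous order.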
|
|
71
|
+
|
|
72
|
+
### Dispatch Templates (Review)
|
|
73
|
+
|
|
74
|
+
Dispatch carries only dynamic context. Detailed instructions (severity format, checklist, finding structure) are embedded in the scaffolded phase file.
|
|
75
|
+
|
|
76
|
+
**Reviewer (Round 1):**
|
|
77
|
+
```
|
|
78
|
+
TURN_DISPATCH: Phase 5 Review, Round 1, Turn {T}.
|
|
79
|
+
Quest: {description}
|
|
80
|
+
Phase recap: {summary of all prior phases — what was built, key decisions}
|
|
81
|
+
Your role: REVIEWER. Your section: "@{name} [R1]"
|
|
82
|
+
|
|
83
|
+
FIRST: Read the FULL document at {questDir}/phases/phase-5-review.md to understand the structure.
|
|
84
|
+
Read the embedded instructions in your section. Then read the code changes (git diff),
|
|
85
|
+
{questDir}/spoils/design.md, and task files.
|
|
86
|
+
THEN: Write your review in your designated section following the embedded instructions.
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
**Fix Plan Writer (Round 2, Turn 1):**
|
|
90
|
+
```
|
|
91
|
+
TURN_DISPATCH: Phase 5 Review, Round 2, Turn 1.
|
|
92
|
+
Quest: {description}
|
|
93
|
+
All Round 1 findings are in.
|
|
94
|
+
Your role: converge findings into fix plan. Your section: "@{name} [R2] — Converged Fix Plan"
|
|
95
|
+
|
|
96
|
+
FIRST: Read the FULL document at {questDir}/phases/phase-5-review.md.
|
|
97
|
+
Read all Round 1 findings. Read the embedded instructions in your section.
|
|
98
|
+
THEN: Write the converged fix plan following the embedded instructions.
|
|
99
|
+
```
|
|
40
100
|
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
45
|
-
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
12. **Transition** — invoke `raid-wrap-up`
|
|
101
|
+
**Fix Session dispatch:**
|
|
102
|
+
```
|
|
103
|
+
TURN_DISPATCH: Phase 5 Fix Session, Round 1, Turn {T}.
|
|
104
|
+
Quest: {description}
|
|
105
|
+
Fix plan: {questDir}/spoils/review.md
|
|
106
|
+
|
|
107
|
+
FIRST: Read the FULL document at {questDir}/phases/phase-5-review.md to find
|
|
108
|
+
the Fix Session section and your embedded instructions. Then read the fix plan.
|
|
109
|
+
THEN: Execute your role following the embedded instructions.
|
|
110
|
+
TDD enforced — load raid-tdd. Signal TURN_COMPLETE with status when done.
|
|
111
|
+
```
|
|
53
112
|
|
|
54
|
-
|
|
113
|
+
### Evolution Log Template (Sub-phase A)
|
|
55
114
|
|
|
56
|
-
|
|
115
|
+
Scaffold `{questDir}/phases/phase-5-review.md`. Replace agent name placeholders with actual names from dice roll:
|
|
57
116
|
|
|
58
117
|
```markdown
|
|
59
|
-
# Phase 5: Review
|
|
60
|
-
## Quest: Full adversarial review of <feature> implementation
|
|
61
|
-
## Mode: <Full Raid | Skirmish>
|
|
118
|
+
# Phase 5: Review — Evolution Log
|
|
62
119
|
|
|
63
|
-
|
|
120
|
+
## Quest: [quest description]
|
|
121
|
+
## Quest Type: Canonical Quest
|
|
122
|
+
## Turn Order (Review): @{agent1} → @{agent2} → @{agent3}
|
|
64
123
|
|
|
65
|
-
|
|
124
|
+
## References
|
|
125
|
+
- PRD: `{questDir}/spoils/prd.md` (if exists)
|
|
126
|
+
- Design: `{questDir}/spoils/design.md`
|
|
127
|
+
- Tasks: `{questDir}/spoils/tasks/phase-3-plan-task-*.md`
|
|
128
|
+
- Implementation: `{questDir}/phases/phase-4-implementation.md`
|
|
66
129
|
|
|
67
|
-
|
|
130
|
+
## Quest Goal
|
|
131
|
+
<!-- Wizard writes 2-3 lines: what this review must verify,
|
|
132
|
+
total file count from implementation, key risk areas to focus on -->
|
|
68
133
|
|
|
69
|
-
|
|
134
|
+
---
|
|
70
135
|
|
|
71
|
-
|
|
72
|
-
```
|
|
136
|
+
## Sub-phase A: Review
|
|
73
137
|
|
|
74
|
-
|
|
138
|
+
### @{agent1} [R1] — Full Implementation Review
|
|
75
139
|
|
|
76
|
-
|
|
140
|
+
<!-- @{agent1}: FIRST REVIEWER. Read ACTUAL CODE — not reports.
|
|
141
|
+
For each finding: [Severity] `file:line` — what, why, proposed fix.
|
|
142
|
+
Example: [Critical] `src/auth/handler.ts:23` — missing validation. Fix: add zod schema.
|
|
143
|
+
Checklist: requirements, code quality, testing, architecture, naming, production. -->
|
|
77
144
|
|
|
78
|
-
|
|
79
|
-
>
|
|
80
|
-
> **@Archer**: Review full implementation. Does it match the design doc exactly? Patterns consistent? Interfaces correct? Types sound? Naming conventions followed? File structure clean? Find the bugs that silently produce wrong results. Then fight @Warrior and @Rogue.
|
|
81
|
-
>
|
|
82
|
-
> **@Rogue**: Review full implementation. Think like an attacker. What inputs break it? What timing causes races? What happens when dependencies fail? Find the bugs nobody else will find. Then fight @Warrior and @Archer.
|
|
83
|
-
>
|
|
84
|
-
> **All**: Review independently first, then fight directly. Challenge each other's findings AND each other's blind spots. Pin severity-classified issues to Dungeon with `DUNGEON:`. Reference the Phase 3 Dungeon for context.
|
|
145
|
+
### @{agent2} [R1] — Adversarial Review
|
|
85
146
|
|
|
86
|
-
|
|
147
|
+
<!-- @{agent2}: ADVERSARIAL REVIEWER. Verify @{agent1}'s findings against actual code.
|
|
148
|
+
Challenge severity if overblown. Add findings @{agent1} missed. Don't repeat.
|
|
149
|
+
Same format: [Severity] `file:line` — what, why, fix. -->
|
|
87
150
|
|
|
88
|
-
|
|
151
|
+
### @{agent3} [R1] — Final Review Pass
|
|
89
152
|
|
|
90
|
-
|
|
153
|
+
<!-- @{agent3}: FINAL REVIEWER. Read all prior findings. Challenge what you disagree with.
|
|
154
|
+
Find what BOTH reviewers missed. Same format: [Severity] `file:line` — what, why, fix. -->
|
|
91
155
|
|
|
92
|
-
|
|
156
|
+
### Wizard [R1] Synthesis
|
|
157
|
+
<!-- Wizard categorizes all surviving findings by severity.
|
|
158
|
+
Counts: N Critical, N Important, N Minor.
|
|
159
|
+
Direction for Round 2. -->
|
|
93
160
|
|
|
94
|
-
|
|
161
|
+
---
|
|
95
162
|
|
|
96
|
-
|
|
163
|
+
### @{agent1} [R2] — Converged Fix Plan
|
|
97
164
|
|
|
98
|
-
|
|
165
|
+
<!-- @{agent1}: Read EVERY finding from all reviewers (R1).
|
|
166
|
+
Your job is to produce a SINGLE converged fix plan.
|
|
99
167
|
|
|
100
|
-
|
|
168
|
+
1. Group all findings by severity (Critical → Important → Minor)
|
|
169
|
+
2. Within each group, order by domain/file for efficient fixing
|
|
170
|
+
3. For each finding: confirm, mark as false positive (with evidence), or merge duplicates
|
|
171
|
+
4. Propose concrete fix for each confirmed finding
|
|
172
|
+
5. Note execution order (dependencies between fixes)
|
|
101
173
|
|
|
102
|
-
|
|
103
|
-
1
|
|
104
|
-
|
|
105
|
-
|
|
106
|
-
|
|
174
|
+
Format per finding:
|
|
175
|
+
**[Critical-1]** `src/auth/handler.ts:23` — Missing input validation
|
|
176
|
+
- Found by: @{agent2} [R1], confirmed by @{agent3} [R1]
|
|
177
|
+
- Fix: Add zod schema validation in validateToken() before line 23
|
|
178
|
+
- Blocked by: none -->
|
|
107
179
|
|
|
108
|
-
|
|
180
|
+
### @{agent2} [R2] — Fix Plan Review
|
|
109
181
|
|
|
110
|
-
|
|
111
|
-
|
|
112
|
-
Actions speak. Fix and show — don't compliment.
|
|
182
|
+
<!-- @{agent2}: Review fix plan. Are fixes correct? Execution order right?
|
|
183
|
+
Challenge false positive designations. Flag dropped findings. -->
|
|
113
184
|
|
|
114
|
-
|
|
185
|
+
### @{agent3} [R2] — Fix Plan Review
|
|
115
186
|
|
|
116
|
-
|
|
187
|
+
<!-- @{agent3}: Same focus. Challenge what @{agent2} missed.
|
|
188
|
+
Confirm or dispute false positive designations. -->
|
|
117
189
|
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
- If used: implement properly
|
|
122
|
-
- Don't gold-plate during review
|
|
190
|
+
### Wizard [R2] Synthesis
|
|
191
|
+
<!-- Wizard evaluates the fix plan. If solid → extract to review.md.
|
|
192
|
+
If critical gaps → announce Round 3 as FINAL. -->
|
|
123
193
|
|
|
124
|
-
|
|
194
|
+
---
|
|
125
195
|
|
|
126
|
-
|
|
196
|
+
## Final Extraction Notes — Wizard
|
|
197
|
+
<!-- What was incorporated into review.md.
|
|
198
|
+
False positives excluded and why.
|
|
199
|
+
Total findings: N confirmed, N false positives, N deferred. -->
|
|
127
200
|
|
|
128
|
-
|
|
129
|
-
- `BUILDING: @Warrior, your finding about the missing error handler — the impact is worse than you stated because...`
|
|
130
|
-
- `CHALLENGE: @Rogue, your "Critical" severity on the naming inconsistency is overblown — here's why it's actually Minor...`
|
|
131
|
-
- `DUNGEON: [Critical] handler.js:23 — missing input validation allows injection. Verified by @Warrior and @Rogue.`
|
|
201
|
+
---
|
|
132
202
|
|
|
133
|
-
|
|
203
|
+
## Writing Guidance
|
|
204
|
+
- Sign all work: `@{name} [R{N}]`
|
|
205
|
+
- Read ACTUAL CODE — not summaries, not reports, not commit messages
|
|
206
|
+
- Every finding needs: severity, location, what, why, proposed fix
|
|
207
|
+
- No performative agreement — no "Great catch!" Just evidence or pushback.
|
|
208
|
+
- Reviewers: challenge severity classifications, not just content
|
|
209
|
+
- Fix plan must be actionable — concrete fixes, not "improve error handling"
|
|
210
|
+
```
|
|
134
211
|
|
|
135
|
-
|
|
136
|
-
|----------|------------|--------|
|
|
137
|
-
| **Critical** | Bugs, security holes, data loss, crashes | Must fix. No exceptions. |
|
|
138
|
-
| **Important** | Missing features, poor error handling, test gaps, naming inconsistencies | Must fix. |
|
|
139
|
-
| **Minor** | Style, docs, optimization | Note for future. |
|
|
212
|
+
**Round 3:** If needed, wizard appends Round 3 sections before dispatching. Do NOT pre-scaffold.
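The Wizard [R1] Synthesis section of the template asks for counts per severity. A minimal sketch of that tally, assuming each finding starts its own line with the `[Severity]` tag as in the template's example. `count_findings` is a hypothetical helper, not something the package ships, and the second sample line is invented for illustration.

```shell
#!/bin/sh
# Sketch only. count_findings is a hypothetical helper.
# Reads finding lines on stdin and prints "N Critical, N Important, N Minor".
count_findings() {
  awk '
    /^\[Critical\]/  { critical++ }
    /^\[Important\]/ { important++ }
    /^\[Minor\]/     { minor++ }
    END { printf "%d Critical, %d Important, %d Minor\n", critical, important, minor }
  '
}

# Only the leading tag is matched, so the separator after the location
# does not affect the count.
count_findings <<'EOF'
[Critical] `src/auth/handler.ts:23` - missing validation. Fix: add zod schema.
[Minor] `src/auth/handler.ts:40` - unclear variable name. Fix: rename.
EOF
```

For the sample input this prints `1 Critical, 0 Important, 1 Minor`. Lines that do not begin with a severity tag are ignored, so the whole phase file could be piped in, though numbered tags such as `[Critical-1]` from the Round 2 fix plan would need a looser pattern.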
|
|
140
213
|
|
|
141
|
-
|
|
214
|
+
### Browser Inspection (when `browser.enabled`)
|
|
142
215
|
|
|
143
|
-
After code review findings are pinned,
|
|
216
|
+
After code review findings are pinned, agents inspect the live application:
|
|
217
|
+
1. Each reviewer boots their own instance on separate ports (invoke `raid-browser`)
|
|
218
|
+
2. Pre-flight: state test subject, check auth, discover routes
|
|
219
|
+
3. Inspect from angle (invoke `raid-browser-chrome`): Warrior=stress, Archer=visual, Rogue=security
|
|
220
|
+
4. Cross-verify others' findings on own instance
|
|
221
|
+
5. Pin browser findings alongside code findings
|
|
222
|
+
6. Cleanup instances
|
|
144
223
|
|
|
145
|
-
|
|
224
|
+
Browser bugs block merge the same way code bugs do.
|
|
146
225
|
|
|
147
|
-
|
|
148
|
-
2. **Each reviewer BOOTs** their own app instance on separate ports (invoke `raid-browser`)
|
|
149
|
-
3. **Each reviewer runs PRE-FLIGHT** — state test subject, check auth, discover routes
|
|
150
|
-
4. **Each reviewer LOGINs** if auth is required (credentials from `.env.raid`)
|
|
151
|
-
5. **Each reviewer inspects** from their angle (invoke `raid-browser-chrome`):
|
|
152
|
-
- Minimum gates first (console, network, page loads)
|
|
153
|
-
- Then angle-driven exploration (Warrior: stress, Archer: visual/precision, Rogue: security)
|
|
154
|
-
- Evidence captured for every finding (GIF, screenshot, console/network)
|
|
155
|
-
6. **Cross-verification** — each reviewer reproduces others' findings on their own instance
|
|
156
|
-
7. **Pin browser findings** to Dungeon alongside code review findings
|
|
157
|
-
8. **Each reviewer CLEANUPs** their instance
|
|
158
|
-
9. **Wizard rules** on ALL findings (code + browser) together
|
|
226
|
+
## Sub-phase B: Fix Session
|
|
159
227
|
|
|
160
|
-
|
|
228
|
+
Only entered if `review.md` contains fixes to make. This is different from the Implementation phase — the source is `review.md`, not numbered plan tasks.
|
|
161
229
|
|
|
162
|
-
|
|
163
|
-
- **Important** (broken feature, visual inconsistency, responsive breakage) — must fix
|
|
164
|
-
- **Minor** (polish, console warnings) — note for future
|
|
230
|
+
### Wizard Checklist (Fix Session)
|
|
165
231
|
|
|
166
|
-
**
|
|
232
|
+
1. **Fresh dice roll** — a new turn order for the fix session. Update raid-session via Bash using the jq command from protocol "Dice Roll Reference". Announce: *"Fresh dice for the fix session: {agent1} → {agent2} → {agent3}."*
|
|
233
|
+
2. **Dispatch fixes** — round-based, sequential
|
|
167
234
|
|
|
168
|
-
|
|
235
|
+
### Fix Session Evolution Log (Appended Dynamically)
|
|
169
236
|
|
|
170
|
-
|
|
237
|
+
When Sub-phase B begins, the wizard appends these sections to `phase-5-review.md` with fresh agent names from the new dice roll:
|
|
171
238
|
|
|
239
|
+
```markdown
|
|
240
|
+
---
|
|
241
|
+
|
|
242
|
+
## Sub-phase B: Fix Session
|
|
243
|
+
|
|
244
|
+
## Turn Order (Fix Session): @{agent1} → @{agent2} → @{agent3}
|
|
245
|
+
<!-- Fresh dice roll — may be different order from review sub-phase -->
|
|
246
|
+
|
|
247
|
+
### @{agent1} [R1] — Implementing Fixes
|
|
248
|
+
|
|
249
|
+
<!-- @{agent1}: Work through review.md fix plan in order.
|
|
250
|
+
For each fix:
|
|
251
|
+
1. Implement the fix following TDD (write test → verify fail → fix → verify pass)
|
|
252
|
+
2. Report what was fixed and how
|
|
253
|
+
|
|
254
|
+
Format per fix:
|
|
255
|
+
**[Critical-1]** `src/auth/handler.ts:23` — FIXED
|
|
256
|
+
- Change: Added zod schema validation in validateToken()
|
|
257
|
+
- Test: `tests/auth/handler.test.ts` — added "rejects malformed tokens" test
|
|
258
|
+
- Commit: `fix(auth): add input validation to token handler`
|
|
259
|
+
|
|
260
|
+
Prioritize: blocking issues first, then simple fixes, then complex fixes. -->
|
|
261
|
+
|
|
262
|
+
### @{agent2} [R1] — Fix Verification
|
|
263
|
+
|
|
264
|
+
<!-- @{agent2}: Read the ACTUAL CODE changes for each fix above.
|
|
265
|
+
- Does each fix address the original finding?
|
|
266
|
+
- Does any fix introduce new issues?
|
|
267
|
+
- Run the full test suite — any regressions?
|
|
268
|
+
Report per fix: VERIFIED or ISSUE: [what's wrong] -->
|
|
269
|
+
|
|
270
|
+
### @{agent3} [R1] — Fix Verification
|
|
271
|
+
|
|
272
|
+
<!-- @{agent3}: Same focus. Verify fixes AND @{agent2}'s verification.
|
|
273
|
+
Final pass — anything missed? -->
|
|
274
|
+
|
|
275
|
+
### Wizard [R1] Synthesis
|
|
276
|
+
<!-- All fixes verified? If issues remain → another round.
|
|
277
|
+
If clean → extract results, present to human. -->
|
|
172
278
|
```
|
|
173
|
-
BLACKCARD: [description of breaking concern]
|
|
174
|
-
Evidence: [file paths, scenarios, why this is unfixable within current design]
|
|
175
|
-
Impact: [what breaks, how deep the damage goes]
|
|
176
|
-
```
|
|
177
279
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
|
|
184
|
-
|
|
185
|
-
|
|
186
|
-
|
|
187
|
-
|
|
280
|
+
2-3 rounds until the Wizard is satisfied all fixes are sound.
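Whether another fix round is needed comes down to whether any verifier reported an issue. A rough sketch, assuming each verdict sits on its own line beginning with `VERIFIED` or `ISSUE:`. The template only says "Report per fix: VERIFIED or ISSUE", so the exact line shape is an assumption, and `needs_another_round` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only. needs_another_round is a hypothetical helper.
# Reads verification verdicts on stdin; exits 0 if any line starts with
# "ISSUE:", which would mean another fix round is warranted.
needs_another_round() {
  grep -q '^ISSUE:'
}

if printf '%s\n' 'VERIFIED' 'ISSUE: regression in token refresh test' | needs_another_round; then
  echo "another fix round needed"
else
  echo "all fixes verified"
fi
```

The final call still belongs to the Wizard; a check like this only flags that at least one verifier objected.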
|
|
281
|
+
|
|
282
|
+
### Review Deliverable Template
|
|
283
|
+
|
|
284
|
+
Wizard extracts into `{questDir}/spoils/review.md` — issue-centric, grouped by severity:
|
|
285
|
+
|
|
286
|
+
```markdown
|
|
287
|
+
# [Feature Name] — Review Report
|
|
288
|
+
|
|
289
|
+
## Quest: [quest description]
|
|
290
|
+
## Date: YYYY-MM-DD
|
|
291
|
+
## Author: Wizard (extracted from phase-5-review.md)
|
|
292
|
+
|
|
293
|
+
---
|
|
294
|
+
|
|
295
|
+
## Summary
|
|
296
|
+
<!-- Total findings, breakdown by severity, fix session outcome -->
|
|
188
297
|
|
|
189
|
-
|
|
298
|
+
## Critical Issues
|
|
190
299
|
|
|
191
|
-
|
|
300
|
+
### [Critical-1] `file:line` — Short description
|
|
301
|
+
- **Found by:** @agent [R1], confirmed by @agent [R1]
|
|
302
|
+
- **Description:** What is wrong and why it matters
|
|
303
|
+
- **Fix:** What was done to resolve it
|
|
304
|
+
- **Status:** Fixed | Deferred — [reason]
|
|
305
|
+
- **Verification:** Test name or evidence that the fix works
|
|
192
306
|
|
|
193
|
-
|
|
194
|
-
1. **Blocking issues** — crashes, security holes, data loss
|
|
195
|
-
2. **Simple fixes** — typos, imports, naming inconsistencies
|
|
196
|
-
3. **Complex fixes** — refactoring, logic changes, architectural adjustments
|
|
307
|
+
## Important Issues
|
|
197
308
|
|
|
198
|
-
|
|
309
|
+
### [Important-1] `file:line` — Short description
|
|
310
|
+
<!-- Same structure -->
|
|
199
311
|
|
|
200
|
-
##
|
|
312
|
+
## Minor Issues (Noted for Future)
|
|
201
313
|
|
|
202
|
-
|
|
314
|
+
### [Minor-1] `file:line` — Short description
|
|
315
|
+
- **Found by:** @agent [R1]
|
|
316
|
+
- **Description:** What and why
|
|
317
|
+
- **Status:** Deferred — not blocking
|
|
203
318
|
|
|
204
|
-
##
|
|
319
|
+
## False Positives
|
|
205
320
|
|
|
206
|
-
|
|
321
|
+
### [FP-1] `file:line` — Short description
|
|
322
|
+
- **Raised by:** @agent [R1]
|
|
323
|
+
- **Dismissed by:** @agent [R2] — [evidence why it's not an issue]
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
## Black Card System
|
|
327
|
+
|
|
328
|
+
If any agent finds something that fundamentally breaks the architecture — unfixable within current design:
|
|
207
329
|
|
|
208
|
-
|
|
330
|
+
```
|
|
331
|
+
BLACKCARD: [description]
|
|
332
|
+
Evidence: [file paths, scenarios, why unfixable]
|
|
333
|
+
Impact: [what breaks, how deep]
|
|
334
|
+
```
|
|
209
335
|
|
|
210
|
-
**
|
|
336
|
+
**Flow:** Agent plays → 2+ agents verify → Wizard escalates to human → Options: (a) rollback to earlier phase, (b) accept limitation.
|
|
337
|
+
|
|
338
|
+
Black cards are RARE. Most issues are Critical or Important, not black cards.
|
|
339
|
+
|
|
340
|
+
## No Performative Agreement
|
|
341
|
+
|
|
342
|
+
NEVER respond with "Great catch!" or "You're absolutely right!" Instead: state the finding, show evidence, or push back. If a finding IS correct: fix it and move on.
|
|
343
|
+
|
|
344
|
+
## Verification Protocol
|
|
345
|
+
|
|
346
|
+
Before acting on ANY finding:
|
|
347
|
+
1. **READ:** Complete the finding without reacting
|
|
348
|
+
2. **VERIFY:** Check against actual code at the referenced location
|
|
349
|
+
3. **EVALUATE:** Is this technically sound for THIS codebase?
|
|
350
|
+
4. **RESPOND:** Technical evidence or reasoned pushback
|
|
211
351
|
|
|
212
352
|
## Red Flags
|
|
213
353
|
|
|
214
354
|
| Thought | Reality |
|
|
215
355
|
|---------|---------|
|
|
216
|
-
| "The implementation looks fine
|
|
217
|
-
| "
|
|
218
|
-
| "This is a Minor issue" (when it causes wrong behavior) | Wrong results = Important or Critical. |
|
|
356
|
+
| "The implementation looks fine" | Every review finds at least one issue. Look harder. |
|
|
357
|
+
| "This is Minor" (when it causes wrong behavior) | Wrong results = Important or Critical. |
|
|
219
358
|
| "The tests pass, so it works" | Tests prove what they test. What DON'T they test? |
|
|
220
|
-
| "Let
|
|
359
|
+
| "Let me silently ignore that finding" | Every finding gets addressed in the fix plan. |
|
|
360
|
+
| "Fixes are simple, skip re-review" | Fixes introduce new bugs. Always re-verify. |
|
|
221
361
|
|
|
222
362
|
---
|
|
223
363
|
|
|
224
364
|
## Phase Transition
|
|
225
365
|
|
|
226
|
-
When the
|
|
366
|
+
When the review is complete and all fixes verified:
|
|
227
367
|
|
|
228
|
-
1. Update
|
|
368
|
+
1. Update raid-session phase via Bash:
|
|
229
369
|
```bash
|
|
230
370
|
jq '.phase="wrap-up"' .claude/raid-session > .claude/raid-session.tmp && mv .claude/raid-session.tmp .claude/raid-session
|
|
231
371
|
```
|
|
232
|
-
2. **Commit
|
|
233
|
-
3. **
|
|
234
|
-
4. **Load
|
|
372
|
+
2. **Commit:** `fix(quest-{slug}): phase 5 review — {N} findings resolved`
|
|
373
|
+
3. **Report:** Link `review.md` and `phase-5-review.md` file paths to the human.
|
|
374
|
+
4. **Load `raid-wrap-up` and begin Phase 6.**
|
|
375
|
+
|
|
376
|
+
## Phase Spoils
|
|
235
377
|
|
|
236
|
-
|
|
378
|
+
**Two outputs:**
|
|
379
|
+
- `{questDir}/phases/phase-5-review.md` — Full evolution (findings, challenges, fix plan debate, fix session)
|
|
380
|
+
- `{questDir}/spoils/review.md` — Clean fix plan deliverable (what was found, what was fixed, what was deferred)
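Before moving on to Phase 6 it can help to confirm that both outputs exist. A small sketch, assuming the quest directory is passed as the first argument; `check_spoils` is a hypothetical helper and is not part of the package.

```shell
#!/bin/sh
# Sketch only. check_spoils is a hypothetical helper.
# Returns 0 when both Phase 5 outputs exist and are non-empty, 1 otherwise.
check_spoils() {
  quest_dir=$1
  rc=0
  for f in "$quest_dir/phases/phase-5-review.md" "$quest_dir/spoils/review.md"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A call such as `check_spoils ".claude/dungeon/{questId}"` (with the real quest id substituted) could run just before the jq phase update, so the transition is skipped when a deliverable is missing.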
|
|
package/template/.claude/skills/raid-debugging/SKILL.md
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: raid-debugging
|
|
3
|
-
description: "Use when encountering any bug, test failure, or unexpected behavior. Agents investigate competing hypotheses in
|
|
3
|
+
description: "Use when encountering any bug, test failure, or unexpected behavior. Agents investigate competing hypotheses in sequential turns. No fixes without root cause. No subagents."
|
|
4
4
|
---
|
|
5
5
|
|
|
6
6
|
# Raid Debugging — Adversarial Root Cause Analysis
|
|
@@ -19,12 +19,6 @@ NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
|
|
|
19
19
|
|
|
20
20
|
If you haven't completed Phase 1, you cannot propose fixes.
|
|
21
21
|
|
|
22
|
-
## Mode Behavior
|
|
23
|
-
|
|
24
|
-
- **Full Raid**: 3 agents investigate competing hypotheses in parallel.
|
|
25
|
-
- **Skirmish**: 2 agents with different hypotheses.
|
|
26
|
-
- **Scout**: 1 agent investigates + Wizard challenges the hypothesis.
|
|
27
|
-
|
|
28
22
|
## Process Flow
|
|
29
23
|
|
|
30
24
|
```dot
|
|
package/template/.claude/skills/raid-init/SKILL.md
@@ -26,7 +26,7 @@ digraph init {
|
|
|
26
26
|
"Coming soon message" -> "Present quest menu";
|
|
27
27
|
"Ask: PRD needed?" -> "Human describes task";
|
|
28
28
|
"Human describes task" -> "Spawn full team + create quest dir";
|
|
29
|
-
"Spawn full team + create quest dir" -> "
|
|
29
|
+
"Spawn full team + create quest dir" -> "Announce quest + begin first phase" [shape=doublecircle];
|
|
30
30
|
}
|
|
31
31
|
```
|
|
32
32
|
|
|
@@ -90,12 +90,12 @@ Ask the human to describe the task/feature they want to build. Listen carefully.
|
|
|
90
90
|
|
|
91
91
|
### 4c. Spawn Team & Setup
|
|
92
92
|
|
|
93
|
-
The Canonical Quest always runs
|
|
93
|
+
The Canonical Quest always runs with the full party (Wizard + Warrior + Archer + Rogue). 4 agents, no reduced configurations.
|
|
94
94
|
|
|
95
95
|
1. Update `.claude/raid-session` (created by the session-start hook) via **Bash with jq** — the write gate blocks Write/Edit on this file, so always use Bash:
|
|
96
96
|
```bash
|
|
97
97
|
jq --arg qt "canonical" --arg qid "{questId}" --arg qdir ".claude/dungeon/{questId}" \
|
|
98
|
-
'.questType=$qt | .questId=$qid | .questDir=$qdir
|
|
98
|
+
'.questType=$qt | .questId=$qid | .questDir=$qdir' \
|
|
99
99
|
.claude/raid-session > .claude/raid-session.tmp && mv .claude/raid-session.tmp .claude/raid-session
|
|
100
100
|
```
|
|
101
101
|
2. Create quest directory if not already created by hook:
|
|
@@ -116,13 +116,15 @@ The Canonical Quest always runs as Full Raid (Warrior, Archer, Rogue). Do NOT as
|
|
|
116
116
|
- If PRD skipped → Load `raid-canonical-design` skill, begin Phase 2
|
|
117
117
|
|
|
118
118
|
**Announce the quest to the party and the human:**
|
|
119
|
-
> "The quest begins: **{task description}**.
|
|
119
|
+
> "The quest begins: **{task description}**. 4 brave souls answer the call. The dice will roll at each phase to determine turn order."
|
|
120
|
+
|
|
121
|
+
Dice rolls happen **per phase**, not at quest start. The first dice roll happens when Phase 2 (Design) opens — or whenever the first agent phase begins. Phase 1 (PRD) is wizard+human only, so no dice needed there.
|
|
120
122
|
|
|
121
123
|
## Red Flags
|
|
122
124
|
|
|
123
125
|
| Thought | Reality |
|
|
124
126
|
|---------|---------|
|
|
125
127
|
| "Skip the greeting, get to work" | The greeting sets the tone. It takes 5 seconds. Do it. |
|
|
126
|
-
| "Let me ask which mode to use" | Canonical Quest =
|
|
128
|
+
| "Let me ask which mode to use" | Canonical Quest = full party, always. Don't ask. |
|
|
127
129
|
| "Let me start exploring the codebase" | You are the Wizard. You don't explore. You dispatch. |
|
|
128
130
|
| "I'll figure out the quest type later" | Quest type determines the phase flow. Choose now. |
|
|
package/template/.claude/skills/raid-tdd/SKILL.md
@@ -11,7 +11,7 @@ Write the test first. Watch it fail. Write minimal code to pass. Then the others
|
|
|
11
11
|
|
|
12
12
|
**Violating the letter of these rules is violating their spirit.**
|
|
13
13
|
|
|
14
|
-
**TDD is enforced
|
|
14
|
+
**TDD is enforced. No exceptions.**
|
|
15
15
|
|
|
16
16
|
## The Iron Law
|
|
17
17
|
|
|
@@ -116,7 +116,7 @@ When claiming tests pass, both must pass:
|
|
|
116
116
|
|
|
117
117
|
## Adversarial Test Review
|
|
118
118
|
|
|
119
|
-
After TDD cycle, challengers attack the TESTS
|
|
119
|
+
After TDD cycle, challengers attack the TESTS in their sequential turns — each building on prior challengers' findings via Dungeon pins:
|
|
120
120
|
|
|
121
121
|
1. **Does this test prove the behavior, or just confirm the implementation?** If you renamed an internal method, would the test break? It shouldn't.
|
|
122
122
|
2. **What input would make this test pass even with a broken implementation?** (e.g., a test that only checks the happy path passes for any implementation that doesn't crash)
|
|
@@ -124,9 +124,9 @@ After TDD cycle, challengers attack the TESTS directly — and build on each oth
|
|
|
124
124
|
4. **Is it testing real code or mock behavior?** Mocks that don't match real behavior = false confidence.
|
|
125
125
|
5. **Would this catch a regression?** If someone changes the implementation next month, does this test catch the break?
|
|
126
126
|
|
|
127
|
-
**Challengers
|
|
128
|
-
-
|
|
129
|
-
-
|
|
127
|
+
**Challengers pin findings to the Dungeon on their turns:**
|
|
128
|
+
- `@archer [R1] CHALLENGE: @warrior's test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
|
|
129
|
+
- `@rogue [R1] BUILDING: @archer's edge case finding — the same gap exists in the error path test at line 32...`
|
|
130
130
|
- `CHALLENGE: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
|
|
131
131
|
|
|
132
132
|
**Browser-specific attacks (when `browser.enabled`):**
|