npm - claude-raid - Versions diffs - 0.2.7 → 0.2.9 - Mend

claude-raid 0.2.7 → 0.2.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

package/README.md +84 -23
package/bin/cli.js +4 -2
package/package.json +1 -1
package/src/descriptions.js +10 -7
package/src/init.js +36 -5
package/src/merge-settings.js +53 -2
package/src/remove.js +1 -1
package/src/setup.js +32 -0
package/src/ui.js +1 -0
package/src/update.js +26 -3
package/template/.claude/agents/archer.md +18 -4
package/template/.claude/agents/rogue.md +18 -4
package/template/.claude/agents/warrior.md +18 -4
package/template/.claude/agents/wizard.md +32 -5
package/template/.claude/dungeon-master-rules.md +120 -31
package/template/.claude/hooks/raid-lib.sh +45 -4
package/template/.claude/hooks/raid-pre-compact.sh +8 -4
package/template/.claude/hooks/raid-session-end.sh +2 -2
package/template/.claude/hooks/raid-session-start.sh +2 -0
package/template/.claude/hooks/rtk-bridge.sh +46 -0
package/template/.claude/hooks/validate-dungeon.sh +11 -3
package/template/.claude/hooks/validate-file-naming.sh +6 -1
package/template/.claude/hooks/validate-no-placeholders.sh +13 -2
package/template/.claude/hooks/validate-write-gate.sh +7 -2
package/template/.claude/party-rules.md +91 -65
package/template/.claude/skills/raid-browser/SKILL.md +3 -5
package/template/.claude/skills/raid-browser-chrome/SKILL.md +1 -1
package/template/.claude/skills/raid-canonical-design/SKILL.md +309 -162
package/template/.claude/skills/raid-canonical-implementation/SKILL.md +157 -132
package/template/.claude/skills/raid-canonical-implementation-plan/SKILL.md +196 -141
package/template/.claude/skills/raid-canonical-prd/SKILL.md +92 -89
package/template/.claude/skills/raid-canonical-protocol/SKILL.md +29 -123
package/template/.claude/skills/raid-canonical-review/SKILL.md +292 -148
package/template/.claude/skills/raid-debugging/SKILL.md +1 -7
package/template/.claude/skills/raid-init/SKILL.md +7 -5
package/template/.claude/skills/raid-tdd/SKILL.md +5 -5
package/template/.claude/skills/raid-teambuff/SKILL.md +6 -24
package/template/.claude/skills/raid-verification/SKILL.md +0 -6
package/template/.claude/skills/raid-wrap-up/SKILL.md +30 -29

package/template/.claude/skills/raid-canonical-design/SKILL.md CHANGED

@@ -5,248 +5,395 @@ description: "Use when Phase 2 (Design) begins in a Canonical Quest, after PRD i
 # Raid Design — Phase 2
-Turn ideas into battle-tested designs through agent-driven adversarial exploration.
+Turn ideas into battle-tested designs through the writer/reviewer/defend-concede protocol.
 <HARD-GATE>
-Do NOT write any code, scaffold any project, or take any implementation action until the Wizard has approved the design and it is committed to git. All assigned agents participate. Agents communicate via SendMessage — do not spawn subagents.
+Do NOT write any code, scaffold any project, or take any implementation action until the design is approved and committed.
 </HARD-GATE>
 ## Scope Check
-Before asking detailed questions, assess scope. If the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag this immediately. Don't spend rounds refining details of a project that needs decomposition first.
+Before dispatching agents, assess scope. If the request describes multiple independent subsystems (e.g., "build a platform with chat, file storage, billing, and analytics"), flag it immediately. Don't spend rounds refining a project that needs decomposition first.
-If too large for a single design: help the human decompose into sub-quests. Each sub-quest gets its own design → plan → implementation cycle. Design the first sub-quest through the normal flow.
-## Mode Behavior
-- **Full Raid**: All 3 agents explore from different angles, fight directly, pin findings to Dungeon. Full design doc required.
-- **Skirmish**: 2 agents explore and interact, produce a lightweight design+plan combined doc.
-- **Scout**: Wizard assesses inline, no design doc required. Skip this skill entirely.
+If too large for a single design: decompose into sub-quests with the human. Each sub-quest gets its own design → plan → implementation cycle.
 ## Process Flow
 ```dot
 digraph design {
-  "Wizard comprehends request (reads 3x)" -> "Scope check";
-  "Scope check" -> "Too large?" [shape=diamond];
-  "Too large?" -> "Decompose into sub-projects" [label="yes"];
-  "Decompose into sub-projects" -> "Brainstorm first sub-project";
-  "Too large?" -> "Explore project context" [label="no"];
-  "Explore project context" -> "Research dependencies";
-  "Research dependencies" -> "Ask clarifying questions (one at a time)";
-  "Ask clarifying questions (one at a time)" -> "Wizard opens Dungeon + dispatches";
-  "Wizard opens Dungeon + dispatches" -> "Agents explore, challenge, build freely";
-  "Agents explore, challenge, build freely" -> "Agents pin verified findings to Dungeon";
-  "Agents pin verified findings to Dungeon" -> "Dungeon sufficient?" [shape=diamond];
-  "Dungeon sufficient?" -> "Agents explore, challenge, build freely" [label="no"];
-  "Dungeon sufficient?" -> "Wizard closes: synthesizes 2-3 approaches from Dungeon" [label="yes"];
-  "Wizard closes: synthesizes 2-3 approaches from Dungeon" -> "Present design (section by section)";
-  "Present design (section by section)" -> "Human approves?" [shape=diamond];
-  "Human approves?" -> "Present design (section by section)" [label="revise"];
-  "Human approves?" -> "Write design doc" [label="yes"];
-  "Write design doc" -> "Adversarial spec review (agents attack directly)";
-  "Adversarial spec review (agents attack directly)" -> "Spec self-review (fix inline)";
-  "Spec self-review (fix inline)" -> "Human reviews written spec";
-  "Human reviews written spec" -> "Commit + invoke raid-canonical-implementation-plan" [shape=doublecircle];
+  "Wizard comprehends request + scope check" -> "Explore codebase, ask human questions";
+  "Explore codebase, ask human questions" -> "Phase recap (PRD if exists)";
+  "Phase recap (PRD if exists)" -> "Roll dice for phase turn order";
+  "Roll dice for phase turn order" -> "Scaffold design.md template + create phase-2-design.md";
+  "Scaffold design.md template + create phase-2-design.md" -> "ROUND 1: Agent 1 WRITES initial design";
+  "ROUND 1: Agent 1 WRITES initial design" -> "Agent 2 REVIEWS, pins findings";
+  "Agent 2 REVIEWS, pins findings" -> "Agent 3 REVIEWS, pins findings";
+  "Agent 3 REVIEWS, pins findings" -> "Wizard evaluates, optionally intervenes";
+  "Wizard evaluates, optionally intervenes" -> "ROUND 2: Agent 1 DEFEND/CONCEDE, writes V2";
+  "ROUND 2: Agent 1 DEFEND/CONCEDE, writes V2" -> "Agents 2+3 review V2";
+  "Agents 2+3 review V2" -> "Wizard evaluates — Round 3 needed?" [shape=diamond];
+  "Wizard evaluates — Round 3 needed?" -> "ROUND 3 (FINAL): same cycle" [label="critical findings"];
+  "Wizard evaluates — Round 3 needed?" -> "Drift check: design.md vs prd.md" [label="solid"];
+  "ROUND 3 (FINAL): same cycle" -> "Drift check: design.md vs prd.md";
+  "Drift check: design.md vs prd.md" -> "Extract final design.md";
+  "Extract final design.md" -> "Present to human" -> "Approved?" [shape=diamond];
+  "Approved?" -> "Ask why, explain to agents, more rounds" [label="no"];
+  "Ask why, explain to agents, more rounds" -> "ROUND 2: Agent 1 DEFEND/CONCEDE, writes V2";
+  "Approved?" -> "Commit + report with file links" [label="yes"];
+  "Commit + report with file links" -> "Load raid-canonical-implementation-plan" [shape=doublecircle];
 }
 ```
 ## Wizard Checklist
-Complete in order:
 1. **Comprehend the request** — read 3 times, identify the real problem beneath the stated one
-2. **Scope check** — if the request describes multiple independent subsystems, flag it immediately
+2. **Scope check** — if multiple independent subsystems, decompose first
 3. **Explore project context** — files, docs, recent commits, dependencies, conventions, patterns
-4. **Research dependencies** — API surface, versioning, compatibility, known issues. Read docs COMPLETELY.
-5. **Ask clarifying questions** — one at a time to the human, eliminate every ambiguity
-6. **Open the Dungeon** — create `{questDir}/phase-2-design.md` (scoreboard) with Phase 2 header, quest, mode. Read `{questDir}/prd.md` if it exists.
-7. **Dispatch with angles** — send each agent their angle via SendMessage, then go silent:
-   ```
-   SendMessage(to="warrior", message="DISPATCH: [quest]. Your angle: [X]...")
-   SendMessage(to="archer", message="DISPATCH: [quest]. Your angle: [Y]...")
-   SendMessage(to="rogue", message="DISPATCH: [quest]. Your angle: [Z]...")
-   ```
-8. **Round 1: Research** — agents explore their angles independently in their own panes. Pin findings to Dungeon. Signal `ROUND_COMPLETE:`. **Stop.** Agents do NOT self-initiate cross-testing. You receive messages automatically. Intervene only on protocol violations.
-9. **Round 2: Cross-testing** — when ALL agents have flagged `ROUND_COMPLETE:`, dispatch explicit cross-verification assignments. Each agent challenges specific findings from the others. Signal `ROUND_COMPLETE:` when done. **Stop.**
-10. **Repeat if needed** — if more exploration is needed, dispatch a new research round with refined angles
-11. **Close the phase** — broadcast `HOLD`. Close when Dungeon has sufficient verified findings to form 2-3 approaches
-12. **Synthesize approaches** — propose 2-3 approaches from Dungeon evidence, with trade-offs and recommendation
-13. **Present design section by section** — scale each section to its complexity (a few sentences if straightforward, up to 200-300 words if nuanced). Ask the human after each section: "Does this look right so far?" Be ready to revise before moving on. Cover: architecture, components, data flow, error handling, testing.
-14. **Write design doc** — save to `{questDir}/design.md` (separate from the phase scoreboard). May also create `{questDir}/design-diagrams.md` for mermaid charts.
-15. **Adversarial spec review** — agents attack the written spec directly, challenging each other
-16. **Spec self-review** — fix issues inline (see checklist below)
-17. **Human reviews written spec** — human approves before proceeding
-18. **Commit** — `docs(quest-{slug}): phase 2 design — {summary}`
-19. **Transition** — invoke `raid-canonical-implementation-plan`
-## Opening the Dungeon (Phase Scoreboard)
-Create `{questDir}/phase-2-design.md` — this is the **dungeon scoreboard**, not the deliverable. It tracks discoveries, battles, and shared knowledge from agent exploration. Every line in Discoveries/Active Battles must use a recognized prefix (`DUNGEON:`, `UNRESOLVED:`, `BLACKCARD:`, `RESOLVED:`, `TASK:`). Freeform content is only allowed in Resolved, Shared Knowledge, and Escalations sections.
+4. **Ask clarifying questions** — one at a time to the human, eliminate every ambiguity
+5. **Phase recap** — summarize PRD findings and deliverable (read `{questDir}/spoils/prd.md` if it exists). Or summarize the exploration context if PRD was skipped. Present to agents and human.
+6. **Roll dice** — randomly shuffle `["warrior", "archer", "rogue"]` for this phase's turn order. Update raid-session via Bash using the jq command from protocol "Dice Roll Reference". Announce: *"The dice have spoken. Turn order for this phase: {agent1} → {agent2} → {agent3}."*
+7. **Scaffold documents** — create `{questDir}/spoils/design.md` (template) and `{questDir}/phases/phase-2-design.md` (evolution log)
+8. **Run rounds** — see Round Protocol below
+9. **Drift check** — compare final design with `prd.md` (if exists). See Drift Detection below.
+10. **Extract final** — polish the final version into clean `design.md` from the evolution in `phase-2-design.md`
+11. **Present to human** — walk through the design. If not approved: ask why, understand, explain feedback to agents, run more rounds, re-extract. Repeat until approved.
+12. **Commit** — `docs(quest-{slug}): phase 2 design — {summary}`
+13. **Report** — link both `design.md` and `phase-2-design.md` file paths
+14. **Transition** — load `raid-canonical-implementation-plan`
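Reviewer note: the dice roll in step 6 of the new checklist can be sketched as follows. This is a minimal illustration, not code from the package. The function names `roll_turn_order` and `announce` are hypothetical, and the raid-session update is left out because the jq command in the protocol's "Dice Roll Reference" is not part of this diff.

```python
import random

# Agent names exactly as listed in the checklist's dice-roll step.
AGENTS = ["warrior", "archer", "rogue"]


def roll_turn_order(agents=AGENTS, rng=None):
    """Return a new, randomly shuffled copy of the agent list.

    `rng` is an optional random.Random instance (hypothetical parameter),
    useful for reproducible runs; the input list is never mutated.
    """
    if rng is None:
        rng = random.Random()
    order = list(agents)
    rng.shuffle(order)
    return order


def announce(order):
    """Format the announcement sentence quoted in the checklist."""
    return (
        "The dice have spoken. Turn order for this phase: "
        + " → ".join(order)
        + "."
    )


if __name__ == "__main__":
    print(announce(roll_turn_order()))
```

The first name in the returned list would be the writer for the phase, and the other two the reviewers, in order.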
+## Dispatch Templates
+Dispatch carries only dynamic context the agent can't get from party-rules or the phase file's embedded comments. Keep dispatch lean — detailed instructions are in the scaffolded document sections.
+**Writer (Round 1, Turn 1):**
+```
+TURN_DISPATCH: Phase 2 Design, Round 1, Turn 1.
+Quest: {description}
+Phase recap: {summary of PRD/prior findings}
+Your role: WRITER. Your section: "Version 1 — @{name} [R1]"
+FIRST: Read the FULL document at {questDir}/phases/phase-2-design.md before writing anything.
+  Understand the structure, read the embedded instructions in your section, and read the
+  Writing Guidance at the bottom. Then read {questDir}/spoils/prd.md (if exists) + codebase.
+THEN: Write in your designated section following the embedded instructions.
+```
+**Reviewer (Round 1, Turns 2-3):**
+```
+TURN_DISPATCH: Phase 2 Design, Round 1, Turn {T}.
+Quest: {description}
+{prior agent} just wrote Version 1.
+Your role: REVIEWER. Your section: "@{name} [R1] Review"
+FIRST: Read the FULL document at {questDir}/phases/phase-2-design.md before writing anything.
+  Understand the structure, read Version 1, read the embedded instructions in your review section.
+THEN: Write your review in your designated section following the embedded instructions.
+```
+**Writer (Round 2+, Defend/Concede):**
+```
+TURN_DISPATCH: Phase 2 Design, Round {N}, Turn 1.
+Quest: {description}
+Round {N-1} reviews are in from @{reviewer1} and @{reviewer2}.
+Your role: WRITER. Sections: "Defend/Concede — @{name} [R{N}]" then "Version {N} — @{name} [R{N}]"
+FIRST: Read the FULL document at {questDir}/phases/phase-2-design.md.
+  Read every finding from Round {N-1}. Read the embedded instructions in your sections.
+THEN: Respond to each finding, then write Version {N}.
+```
+**Reviewer (Round 2+, Turns 2-3):**
+```
+TURN_DISPATCH: Phase 2 Design, Round {N}, Turn {T}.
+Quest: {description}
+{writer} responded with DEFEND/CONCEDE and wrote Version {N}.
+Your role: REVIEWER. Your section: "@{name} [R{N}] Review"
+FIRST: Read the FULL document at {questDir}/phases/phase-2-design.md.
+  Read Version {N}, the defend/concede responses, and your embedded instructions.
+THEN: Write your review in your designated section.
+```
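Reviewer note: the four dispatch templates above differ mainly in the header line and in which evolution-log sections the agent is pointed at. The sketch below derives both from role, agent name, and round number. The helper names `dispatch_header` and `sections_for` are hypothetical; the strings they produce are copied from the templates in this diff.

```python
def dispatch_header(round_no, turn):
    """First line of every TURN_DISPATCH message in the templates above."""
    return f"TURN_DISPATCH: Phase 2 Design, Round {round_no}, Turn {turn}."


def sections_for(role, name, round_no):
    """Return the section titles an agent writes in during this turn.

    role is "writer" or "reviewer"; name is the agent name without "@".
    """
    tag = f"@{name} [R{round_no}]"
    if role == "reviewer":
        return [f"{tag} Review"]
    if role == "writer":
        version = f"Version {round_no} — {tag}"
        if round_no == 1:
            # Round 1: the writer only produces the initial design.
            return [version]
        # Round 2+: respond to findings first, then write the new version.
        return [f"Defend/Concede — {tag}", version]
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    print(dispatch_header(2, 1))
    for title in sections_for("writer", "archer", 2):
        print(title)
```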
+## Round Protocol
+### Round 1: Write + Review
+**Agent 1 (dice-first) — WRITES the initial design:**
+- Receives the PRD (or exploration context), codebase findings, and the `design.md` template
+- Writes the complete initial design applying their unique lens
+- Signs all work: `@{name} [R1]`
+- Output goes to the "Version 1" section of `phase-2-design.md`
+- Signals `TURN_COMPLETE:`
+**Agent 2 — REVIEWS Agent 1's work:**
+- Reads Agent 1's design in `phase-2-design.md`
+- Writes review in the "Review — Round 1" section, pins findings
+- Challenges gaps, weak assumptions, missing edge cases — from their unique lens
+- Signs all findings: `@{name} [R1]`
+- Signals `TURN_COMPLETE:`
+**Agent 3 — REVIEWS both prior works:**
+- Reads Agent 1's design AND Agent 2's review
+- Writes their own review section, building on or challenging Agent 2's findings
+- Signs all findings: `@{name} [R1]`
+- Signals `TURN_COMPLETE:`
+**Wizard evaluates Round 1:**
+- Reads all work. Ultrathink synthesis.
+- Optionally intervenes on the document — with human approval, explaining why. But only if needed; if the document is in good shape, move to Round 2.
+### Round 2: Defend/Concede + Review
+**Agent 1 — DEFEND or CONCEDE each finding, write Version 2:**
+- Reads every finding from Agents 2 and 3
+- Responds to **each one** explicitly:
+  - `DEFEND:` — counter-evidence showing the approach is correct
+  - `CONCEDE:` — acknowledge the issue, commit to addressing it
+- Writes Version 2 incorporating all conceded findings
+- May intentionally mark specific findings as false positives (with explanation)
+- Signs: `@{name} [R2]`
+- Signals `TURN_COMPLETE:`
+**Agents 2+3 — Review Version 2:**
+- Same review pattern as Round 1, but now evaluating the V2 and the defend/concede responses
+- Sign: `@{name} [R2]`
+**Wizard evaluates Round 2:**
+- If no critical or high-relevance findings remain → close
+- If breaking concerns exist → announce Round 3 as FINAL: *"This is the final round. Make every move count."*
+### Round 3 (if needed): Final Round
+Same cycle. Wizard makes clear this is the FINAL round — agents have limited moves, so every one must count. After Round 3, the Wizard closes regardless.
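Reviewer note: two rules in the Round Protocol above are mechanical enough to sketch: every finding needs a `DEFEND:` or `CONCEDE:` response, and the Wizard's close/continue decision depends only on the round number and whether critical findings remain. The code below is an illustration under assumptions. It assumes findings carry short references such as `F1` written in square brackets after the signal; the diff shows `[ref]` only as a placeholder, so that format is a guess. The names `unanswered` and `next_step` are hypothetical.

```python
import re

# Assumed response format: "DEFEND: [F1] ..." or "CONCEDE: [F1] ...",
# optionally preceded by a list marker.
RESPONSE = re.compile(r"^[-*\s]*(?:DEFEND|CONCEDE):\s*\[(?P<ref>[^\]]+)\]")


def unanswered(finding_refs, response_lines):
    """Return the finding refs that have no DEFEND:/CONCEDE: response."""
    answered = set()
    for line in response_lines:
        match = RESPONSE.match(line)
        if match:
            answered.add(match.group("ref").strip())
    return [ref for ref in finding_refs if ref not in answered]


def next_step(round_no, has_critical_findings):
    """Wizard decision after a round, following the protocol text above."""
    if round_no >= 3:
        return "close"  # after Round 3 the Wizard closes regardless
    if round_no == 2:
        return "round-3-final" if has_critical_findings else "close"
    return "round-2"  # Round 1 is always followed by Round 2


if __name__ == "__main__":
    lines = ["DEFEND: [F1] cache is already bounded", "- CONCEDE: [F3] add retry"]
    print(unanswered(["F1", "F2", "F3"], lines))
    print(next_step(2, has_critical_findings=True))
```

A non-empty result from `unanswered` corresponds to the "silently ignore that finding" red flag listed later in the file.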
+## Evolution Log Template
+Scaffold `{questDir}/phases/phase-2-design.md`. Replace `{writer}`, `{reviewer1}`, `{reviewer2}` with actual agent names from the dice roll:
 ```markdown
-# Phase 2: Design
-## Quest: <task description from human>
-## Mode: <Full Raid | Skirmish>
-## PRD: <link to prd.md if it exists>
+# Phase 2: Design — Evolution Log
+## Quest: [quest description]
+## Quest Type: Canonical Quest
+## Turn Order: @{agent1} → @{agent2} → @{agent3}
-### Discoveries
+## References
+- PRD: `{questDir}/spoils/prd.md` (if exists)
-### Active Battles
+## Quest Goal
+<!-- Wizard writes 2-3 lines: what the design phase aims to produce,
+     key constraints from the PRD, and the main architectural question to answer -->
-### Resolved
+---
-### Shared Knowledge
+## Version 1 — @{writer} [R1]
+<!-- @{writer}: WRITER for this phase. Read references above first.
+     Fill EVERY section. Scale depth to complexity (simple→bullets, complex→full detail).
+     Make reasoning explicit — reviewers will challenge everything. -->
+### Problem Restatement
+<!-- Restate the problem in technical terms. How does it manifest in the codebase?
+     What specific code, systems, or flows are affected? -->
+### Requirements Summary
+<!-- Numbered list extracted from PRD. Each requirement that this design must satisfy.
+     If no PRD exists, derive from the wizard's context. -->
+### Constraints
+<!-- Technical: language, framework, infrastructure, backwards compatibility.
+     Business: timeline, compliance, dependencies on other teams.
+     Only constraints that affect design decisions. -->
+### Architecture
+<!-- Scale depth to complexity:
+     - Describe the main components/modules and how they connect
+     - Show data flow: what enters, what's processed, what exits
+     - Define key interfaces between components
+     - For complex features: include sequence of operations, state transitions
+     - Reference existing code patterns in the codebase where you're extending them
+     - Call out what's NEW vs what EXTENDS existing code -->
+### File Structure
+<!-- Map of files to create or modify:
+     | File | Action | Purpose |
+     |------|--------|---------|
+     Use the project's existing structure as the guide. -->
+### Error Handling Strategy
+<!-- What errors can occur at each boundary?
+     How is each error surfaced to the user or calling code?
+     What's the recovery path? What's unrecoverable? -->
+### Testing Strategy
+<!-- What types of tests? (unit, integration, e2e)
+     What's the mocking strategy?
+     What are the critical paths that MUST have test coverage?
+     When browser.enabled: which flows need Playwright tests? -->
+### Edge Cases
+<!-- Catalog by category:
+     - Boundary: empty input, max values, zero, negative
+     - State: concurrent access, partial failure, interrupted operations
+     - Input: malformed data, unicode, unexpected types
+     - Environment: network failure, timeout, missing dependencies
+     Only include edge cases relevant to THIS feature. -->
+### Alternatives Considered
+<!-- At least 2 alternatives to your chosen approach.
+     For each: what it is, why it was rejected (specific technical reason). -->
-### Escalations
-```
+---
-## Question Chain
+## Review — Round 1
-**Agents NEVER ask the human directly.** The question flow is:
-1. Agent discovers they need clarification → sends `WIZARD:` with the question
-2. Wizard reasons: can I answer this confidently from the PRD, codebase, or prior context?
-3. If yes → answer the agent directly via SendMessage
-4. If unsure → digest the question, formulate it clearly for the human, ask human
-5. Wizard passes human's answer back to agents with his own interpretation added
-6. Goal: minimize questions to human, batch related questions
+### @{reviewer1} [R1] Review
-## Dispatch Pattern
+<!-- @{reviewer1}: REVIEWER. Read Version 1, then verify claims against actual code.
+     For each finding: 1) WHAT is wrong 2) WHY it matters 3) WHAT should change.
+     Use FINDING:/CHALLENGE:/BUILDING: signals. Sign @{reviewer1} [R1]. -->
-Each agent gets the same objective but a different starting angle. After dispatch, the Wizard goes silent.
+### @{reviewer2} [R1] Review
-**DISPATCH:**
+<!-- @{reviewer2}: REVIEWER. Read Version 1 + @{reviewer1}'s review.
+     Find what was missed. Challenge with evidence. Don't repeat — add new value. -->
-> **@Warrior**: Explore from the data/infrastructure side. What are the hard technical constraints? What schemas, migrations, APIs are needed? What breaks if we get this wrong? Find the structural load-bearing walls. Challenge @Archer and @Rogue's findings directly. Pin verified findings to the Dungeon.
->
-> **@Archer**: Explore from the integration/consistency side. How does this fit with existing patterns? What implicit contracts exist? What ripple effects? Trace the dependency chain. Check naming and file structure conventions. Challenge @Warrior and @Rogue's findings directly. Pin verified findings to the Dungeon.
->
-> **@Rogue**: Explore from the failure/adversarial side. What assumptions about inputs, state, timing, availability? Build failure scenarios. What does a malicious user do? What does a slow network do? What does concurrent access do? Challenge @Warrior and @Archer's findings directly. Pin verified findings to the Dungeon.
->
-> **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `WIZARD:` only when genuinely stuck.
+### Wizard [R1] Synthesis
+<!-- Wizard evaluates the round. Key findings, open questions,
+     direction for Round 2. Optional interventions (with human approval). -->
-## Design Principles
+---
-- **Isolation:** Break into units with one clear purpose, well-defined interfaces, testable independently. For each unit: what does it do, how do you use it, what does it depend on?
-- **Encapsulation:** Can someone understand a unit without reading its internals? Can you change internals without breaking consumers? If not, the boundaries need work.
-- **Size:** Smaller, well-bounded units are easier to reason about. When a file grows large, that's a signal it's doing too much.
-- **Existing codebases:** Explore current structure first. Follow existing patterns. Only include targeted improvements where they serve the current goal — no unrelated refactoring.
+## Defend/Concede — @{writer} [R2]
-## What Agents Must Cover
+<!-- @{writer}: Respond to EACH finding from both reviewers.
+     DEFEND: [ref] — counter-evidence. CONCEDE: [ref] — what you'll fix in V2.
+     No silent ignoring. Every finding gets a response. -->
-Every agent addresses ALL of these from their assigned angle:
+## Version 2 — @{writer} [R2]
-- **Performance** — scale, bottlenecks, complexity
-- **Robustness** — retries, fallbacks, graceful degradation
-- **Reliability** — blast radius of failure, production-readiness
-- **Testability** — meaningful tests, mock strategy, test-friendly design. When `browser.enabled`: can this feature be E2E tested with Playwright? What user flows need browser verification? Are there loading states, client-side routing, or visual states that unit tests can't catch?
-- **Error handling** — what errors occur, how surfaced, UX of failure
-- **Edge cases** — empty, null, boundary, Unicode, timezones, large payloads
-- **Cascading effects** — blast radius, what else changes
-- **Clean architecture** — separation of concerns, single responsibility, dependency inversion
-- **Modularity & composability** — replaceable, extensible, composable
-- **DRY** — duplicating logic? reuse existing code?
-- **Dependencies** — version compatibility, security, maintenance, licensing
+<!-- @{writer}: Incorporate all conceded findings into a revised design.
+     Mark what changed from V1 and why.
+     Defended items remain as-is — state why they survived challenge. -->
-## The Fight — What Makes It Productive
+[Same sections as Version 1]
-```
-Agents interact DIRECTLY — @Name addressing, building, challenging, roasting:
-1. Present findings with EVIDENCE (file paths, docs, concrete examples)
-2. Challenge other agents DIRECTLY with COUNTER-EVIDENCE (not opinions)
-3. Build on each other's discoveries — BUILDING: with independent verification
-4. Go to the EDGES — push every finding to its extreme
-5. LEARN from each other — incorporate discoveries into your model
-6. Pin verified findings — DUNGEON: only after surviving challenge
-7. Challenge weak analysis — back every challenge with your own independent evidence
-8. Escalate to Wizard — WIZARD: only when genuinely stuck
-```
+---
-**The goal is not to tear each other down. The goal is to forge the strongest design by testing it from every angle. The Dungeon captures what survived.**
+## Review — Round 2
-## Closing the Phase
+### @{reviewer1} [R2] Review
+<!-- @{reviewer1}: Focus on Version 2 changes and the defend/concede responses.
+     Did @{writer} address your findings adequately?
+     Are the defenses valid? Are the concessions properly incorporated?
+     Any NEW issues introduced by the changes? -->
-The Wizard closes when the Dungeon has sufficient verified findings — enough Discoveries, Shared Knowledge, and Resolved battles to synthesize 2-3 approaches.
+### @{reviewer2} [R2] Review
+<!-- @{reviewer2}: Same focus. Challenge defenses you disagree with.
+     Confirm concessions were properly incorporated. -->
-**How the Wizard knows it's time to close:**
-- Dungeon has verified findings covering all major aspects (performance, robustness, testability, etc.)
-- Active Battles section is empty or has only minor unresolved points
-- Agents are converging — new findings are variations, not revelations
-- Shared Knowledge section has the foundational truths the design needs
+### Wizard [R2] Synthesis
+<!-- Wizard evaluates. If critical findings remain → announce Round 3 as FINAL.
+     If solid → proceed to extraction. -->
-**RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
+---
-## Spec Self-Review
+## Final Extraction Notes — Wizard
+<!-- What was incorporated into design.md and why.
+     What was intentionally excluded and why.
+     Drift check result against prd.md (if exists). -->
-After writing the design doc, the Wizard reviews with fresh eyes:
+---
-1. **Placeholder scan:** Any TBD, TODO, incomplete sections, vague requirements? Fix them.
-2. **Internal consistency:** Do any sections contradict each other? Architecture match feature descriptions?
-3. **Scope check:** Focused enough for a single implementation plan, or needs decomposition?
-4. **Ambiguity check:** Could any requirement be interpreted two ways? Pick one and make it explicit.
+## Writing Guidance
+- Sign all work: `@{name} [R{N}]`
+- Evidence-based: file paths, line numbers, concrete examples — no opinions without proof
+- No placeholders: no TBD, TODO, or vague references
+- Scale depth to complexity — a few sentences if straightforward, detailed if nuanced
+- Reviewers: respond to EVERY finding with DEFEND: or CONCEDE:
+- Each review must add NEW value — don't repeat what prior reviewers said
+```
-Fix issues inline.
+**Round 3:** If needed, the wizard appends Round 3 sections to the evolution log before dispatching. Do NOT pre-scaffold Round 3.
-## Design Document Structure (Phase Deliverable)
+## Design Document Template
-The actual design doc is a **separate file**: `{questDir}/design.md`. This file is not validated by the dungeon hook and can contain freeform markdown. Write it when closing the phase — synthesize from scoreboard findings and agent exploration.
+Scaffold `{questDir}/spoils/design.md` — wizard-only, clean deliverable extracted from evolution log:
 ```markdown
 # [Feature Name] Design Specification
 **Date:** YYYY-MM-DD
 **Status:** Draft | Under Review | Approved
-**Raid Team:** Wizard (dungeon master), [agents used]
-**Mode:** Full Raid | Skirmish
+**Quest Type:** Canonical Quest
 ## Problem Statement
 ## Requirements (numbered, unambiguous)
 ## Constraints
-## Dungeon Findings (verified, from Phase 1 Dungeon)
-### Key Discoveries (survived cross-testing)
-### Lessons Learned (wrong assumptions corrected)
-## Design Decision
-### Alternatives Considered (2-3 with rejection reasons)
 ## Architecture
 ## File Structure
 ## Error Handling Strategy
 ## Testing Strategy
 ## Edge Cases
 ## Future Considerations (NOT building now, designing to accommodate)
+## Design Decision
+### Alternatives Considered (with rejection reasons)
 ## RULING
 ```
-## Red Flags — Thoughts That Signal Violations
+## What Agents Must Cover
-| Thought | Reality |
-|---------|---------|
-| "This is too simple to need a design" | Simple projects are where unexamined assumptions cause the most waste. |
-| "I already know the right approach" | Knowing and verifying are different. Propose 2-3 anyway. |
-| "Let's just start coding and figure it out" | Code without design becomes the design. And it's usually wrong. |
-| "The agents all agree, let's move on" | Agreement without challenge is groupthink. Did they actually cross-test? |
-| "I'll wait for the Wizard to tell me what to do" | You own the phase. Explore, challenge, build. Self-organize. |
-| "Let me just post everything to the Dungeon" | Only verified, challenged findings get pinned. |
-| "I need the Wizard to mediate this disagreement" | Talk to the other agent directly first. Escalate only if stuck. |
+Every agent addresses ALL of these from their assigned angle:
-## Escalation
+- **Performance** — scale, bottlenecks, complexity
+- **Robustness** — retries, fallbacks, graceful degradation
+- **Testability** — meaningful tests, mock strategy, test-friendly design
+- **Error handling** — what errors occur, how surfaced, UX of failure
+- **Edge cases** — empty, null, boundary, Unicode, timezones, large payloads
+- **Cascading effects** — blast radius, what else changes
+- **Clean architecture** — separation of concerns, single responsibility
+- **Dependencies** — version compatibility, security, licensing
-If the team is stuck on a fundamental design choice after genuine direct debate:
-1. Present the top 2 options with trade-offs to the human
-2. Let the human decide
-3. Never ask the human to resolve something the team should handle
+## Drift Detection
----
+Before closing, the Wizard compares `design.md` with `prd.md` (if it exists). If the design contradicts or omits a PRD requirement without explicit rationale, that's drift.
+If drift detected, present options to the human:
+- **(a)** Change PRD to match design — the design exploration revealed the PRD was wrong
+- **(b)** Change design to match PRD — the design drifted from the original intent
+- **(c)** Something else — explain the situation, let the human decide
+## Design Principles
+- **Isolation:** Break into units with one clear purpose, well-defined interfaces, testable independently.
+- **Encapsulation:** Can someone understand a unit without reading its internals?
+- **Size:** When a file grows large, that's a signal it's doing too much.
+- **Existing codebases:** Follow existing patterns. Only improve where it serves the current goal.
+## Red Flags
+| Thought | Reality |
+|---------|---------|
+| "This is too simple to need a design" | Simple projects hide unexamined assumptions. |
+| "I already know the right approach" | Knowing and verifying are different. |
+| "The agents all agree after one round" | Minimum 2 rounds. Agreement without challenge is groupthink. |
+| "Let me silently ignore that finding" | Every finding must get DEFEND: or CONCEDE:. No silent ignoring. |
+| "Good enough, let's move on" | Present to human. Only they decide when it's good enough. |
 ## Phase Transition
 When the design is approved and committed:
-1. Update `.claude/raid-session` phase via Bash (write gate blocks Write/Edit on this file):
+1. Update raid-session phase via Bash:
    ```bash
    jq '.phase="plan"' .claude/raid-session > .claude/raid-session.tmp && mv .claude/raid-session.tmp .claude/raid-session
    ```
 2. **Commit:** `docs(quest-{slug}): phase 2 design — {summary}`
-3. **Send phase report to human:** summarize key design decisions, trade-offs resolved, what's next
-4. **Load the `raid-canonical-implementation-plan` skill now and begin Phase 3.**
+3. **Report:** Link `design.md` and `phase-2-design.md` file paths to the human.
+4. **Load `raid-canonical-implementation-plan` and begin Phase 3.**
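Reviewer note: the jq command in step 1 writes to a temporary file and then renames it over `.claude/raid-session`, so a failed jq run never leaves a half-written session file. The sketch below shows the same write-then-rename pattern in Python for illustration only. It assumes the session file is a JSON object, which the jq filter implies but the diff does not show, and `set_phase` is a hypothetical helper. The skill itself performs this update through Bash.

```python
import json
import os


def set_phase(session_path, phase):
    """Set the "phase" key of a JSON session file via write-then-rename."""
    with open(session_path, encoding="utf-8") as fh:
        session = json.load(fh)

    session["phase"] = phase

    # Write the full new content to a sibling temp file first ...
    tmp_path = session_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(session, fh, indent=2)
        fh.write("\n")

    # ... then swap it into place in one step, like `mv` in the jq command.
    os.replace(tmp_path, session_path)
    return session


if __name__ == "__main__":
    import tempfile

    demo = os.path.join(tempfile.mkdtemp(), "raid-session")
    with open(demo, "w", encoding="utf-8") as fh:
        fh.write('{"phase": "design"}')
    print(set_phase(demo, "plan"))
```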
+## Phase Spoils
-Do not wait. Do not ask. The next action after committing the design doc is loading the next skill.
+**Two outputs:**
+- `{questDir}/phases/phase-2-design.md` — Full evolution timeline (all versions, reviews, defend/concede responses)
+- `{questDir}/spoils/design.md` — Clean final design specification (wizard-polished)