npm - claude-raid - Versions diffs - 0.1.0 - Mend

claude-raid 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31) hide show

package/LICENSE +21 -0
package/README.md +345 -0
package/bin/cli.js +34 -0
package/package.json +37 -0
package/src/detect-project.js +112 -0
package/src/doctor.js +201 -0
package/src/init.js +138 -0
package/src/merge-settings.js +119 -0
package/src/remove.js +92 -0
package/src/update.js +110 -0
package/template/.claude/agents/archer.md +115 -0
package/template/.claude/agents/rogue.md +116 -0
package/template/.claude/agents/warrior.md +114 -0
package/template/.claude/agents/wizard.md +206 -0
package/template/.claude/hooks/validate-commit-message.sh +78 -0
package/template/.claude/hooks/validate-file-naming.sh +73 -0
package/template/.claude/hooks/validate-no-placeholders.sh +67 -0
package/template/.claude/hooks/validate-phase-gate.sh +60 -0
package/template/.claude/hooks/validate-tests-pass.sh +43 -0
package/template/.claude/hooks/validate-verification.sh +70 -0
package/template/.claude/raid-rules.md +21 -0
package/template/.claude/skills/raid-debugging/SKILL.md +159 -0
package/template/.claude/skills/raid-design/SKILL.md +208 -0
package/template/.claude/skills/raid-finishing/SKILL.md +123 -0
package/template/.claude/skills/raid-git-worktrees/SKILL.md +96 -0
package/template/.claude/skills/raid-implementation/SKILL.md +155 -0
package/template/.claude/skills/raid-implementation-plan/SKILL.md +173 -0
package/template/.claude/skills/raid-protocol/SKILL.md +288 -0
package/template/.claude/skills/raid-review/SKILL.md +133 -0
package/template/.claude/skills/raid-tdd/SKILL.md +147 -0
package/template/.claude/skills/raid-verification/SKILL.md +113 -0

package/template/.claude/skills/raid-design/SKILL.md ADDED Viewed

@@ -0,0 +1,208 @@
+---
+name: raid-design
+description: "Phase 1 of Raid protocol. Wizard opens the Dungeon, agents explore freely from different angles, challenge and build on each other directly, and pin verified findings. Wizard closes when design is battle-tested."
+---
+# Raid Design — Phase 1
+Turn ideas into battle-tested designs through agent-driven adversarial exploration.
+<HARD-GATE>
+Do NOT write any code, scaffold any project, or take any implementation action until the Wizard has approved the design and it is committed to git. All assigned agents participate. No subagents.
+</HARD-GATE>
+## Mode Behavior
+- **Full Raid**: All 3 agents explore from different angles, fight directly, pin findings to Dungeon. Full design doc required.
+- **Skirmish**: 2 agents explore and interact, produce a lightweight design+plan combined doc.
+- **Scout**: Wizard assesses inline, no design doc required. Skip this skill entirely.
+## Process Flow
+```dot
+digraph design {
+  "Wizard comprehends request (reads 3x)" -> "Scope check";
+  "Scope check" -> "Too large?" [shape=diamond];
+  "Too large?" -> "Decompose into sub-projects" [label="yes"];
+  "Decompose into sub-projects" -> "Brainstorm first sub-project";
+  "Too large?" -> "Explore project context" [label="no"];
+  "Explore project context" -> "Research dependencies";
+  "Research dependencies" -> "Ask clarifying questions (one at a time)";
+  "Ask clarifying questions (one at a time)" -> "Wizard opens Dungeon + dispatches";
+  "Wizard opens Dungeon + dispatches" -> "Agents explore, challenge, build freely";
+  "Agents explore, challenge, build freely" -> "Agents pin verified findings to Dungeon";
+  "Agents pin verified findings to Dungeon" -> "Dungeon sufficient?" [shape=diamond];
+  "Dungeon sufficient?" -> "Agents explore, challenge, build freely" [label="no"];
+  "Dungeon sufficient?" -> "Wizard closes: synthesizes 2-3 approaches from Dungeon" [label="yes"];
+  "Wizard closes: synthesizes 2-3 approaches from Dungeon" -> "Present design (section by section)";
+  "Present design (section by section)" -> "Human approves?" [shape=diamond];
+  "Human approves?" -> "Present design (section by section)" [label="revise"];
+  "Human approves?" -> "Write design doc" [label="yes"];
+  "Write design doc" -> "Adversarial spec review (agents attack directly)";
+  "Adversarial spec review (agents attack directly)" -> "Spec self-review (fix inline)";
+  "Spec self-review (fix inline)" -> "Human reviews written spec";
+  "Human reviews written spec" -> "Commit + invoke raid-implementation-plan" [shape=doublecircle];
+}
+```
+## Wizard Checklist
+Complete in order:
+1. **Comprehend the request** — read 3 times, identify the real problem beneath the stated one
+2. **Scope check** — if the request describes multiple independent subsystems, flag it immediately
+3. **Explore project context** — files, docs, recent commits, dependencies, conventions, patterns
+4. **Research dependencies** — API surface, versioning, compatibility, known issues. Read docs COMPLETELY.
+5. **Ask clarifying questions** — one at a time to the human, eliminate every ambiguity
+6. **Open the Dungeon** — create `.claude/raid-dungeon.md` with Phase 1 header, quest, mode
+7. **Dispatch with angles** — give each agent their angle, then go silent
+8. **Observe the fight** — agents explore, challenge, build, roast, and pin findings to Dungeon. Intervene only on triggers.
+9. **Close the phase** — when Dungeon has sufficient verified findings to form 2-3 approaches
+10. **Synthesize approaches** — propose 2-3 approaches from Dungeon evidence, with trade-offs and recommendation
+11. **Present design** — in sections scaled to complexity, get human approval per section
+12. **Write design doc** — save to specs path from `.claude/raid.json`
+13. **Adversarial spec review** — agents attack the written spec directly, challenging each other
+14. **Spec self-review** — fix issues inline (see checklist below)
+15. **Human reviews written spec** — human approves before proceeding
+16. **Commit** — `docs(design): <topic> specification`
+17. **Archive Dungeon** — rename to `.claude/raid-dungeon-phase-1.md`
+18. **Transition** — invoke `raid-implementation-plan`
+## Opening the Dungeon
+Create `.claude/raid-dungeon.md`:
+```markdown
+# Dungeon — Phase 1: Design
+## Quest: <task description from human>
+## Mode: <Full Raid | Skirmish>
+### Discoveries
+### Active Battles
+### Resolved
+### Shared Knowledge
+### Escalations
+```
+## Dispatch Pattern
+Each agent gets the same objective but a different starting angle. After dispatch, the Wizard goes silent.
+**📡 DISPATCH:**
+> **@Warrior**: Explore from the data/infrastructure side. What are the hard technical constraints? What schemas, migrations, APIs are needed? What breaks if we get this wrong? Find the structural load-bearing walls. Challenge @Archer and @Rogue's findings directly. Pin verified findings to the Dungeon.
+>
+> **@Archer**: Explore from the integration/consistency side. How does this fit with existing patterns? What implicit contracts exist? What ripple effects? Trace the dependency chain. Check naming and file structure conventions. Challenge @Warrior and @Rogue's findings directly. Pin verified findings to the Dungeon.
+>
+> **@Rogue**: Explore from the failure/adversarial side. What assumptions about inputs, state, timing, availability? Build failure scenarios. What does a malicious user do? What does a slow network do? What does concurrent access do? Challenge @Warrior and @Archer's findings directly. Pin verified findings to the Dungeon.
+>
+> **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `🆘 WIZARD:` only when genuinely stuck.
+## What Agents Must Cover
+Every agent addresses ALL of these from their assigned angle:
+- **Performance** — scale, bottlenecks, complexity
+- **Robustness** — retries, fallbacks, graceful degradation
+- **Reliability** — blast radius of failure, production-readiness
+- **Testability** — meaningful tests, mock strategy, test-friendly design
+- **Error handling** — what errors occur, how surfaced, UX of failure
+- **Edge cases** — empty, null, boundary, Unicode, timezones, large payloads
+- **Cascading effects** — blast radius, what else changes
+- **Clean architecture** — separation of concerns, single responsibility, dependency inversion
+- **Modularity & composability** — replaceable, extensible, composable
+- **DRY** — duplicating logic? reuse existing code?
+- **Dependencies** — version compatibility, security, maintenance, licensing
+## The Fight — What Makes It Productive
+```
+Agents interact DIRECTLY — @Name addressing, building, challenging, roasting:
+1. Present findings with EVIDENCE (file paths, docs, concrete examples)
+2. Challenge other agents DIRECTLY with COUNTER-EVIDENCE (not opinions)
+3. Build on each other's discoveries — 🔗 BUILDING ON @Name:
+4. Go to the EDGES — push every finding to its extreme
+5. LEARN from each other — incorporate discoveries into your model
+6. Pin verified findings — 📌 DUNGEON: only after surviving challenge
+7. Roast weak analysis — 🔥 ROAST: with evidence, not insults
+8. Escalate to Wizard — 🆘 WIZARD: only when genuinely stuck
+```
+**The goal is not to tear each other down. The goal is to forge the strongest design by testing it from every angle. The Dungeon captures what survived.**
+## Closing the Phase
+The Wizard closes when the Dungeon has sufficient verified findings — enough Discoveries, Shared Knowledge, and Resolved battles to synthesize 2-3 approaches.
+**How the Wizard knows it's time to close:**
+- Dungeon has verified findings covering all major aspects (performance, robustness, testability, etc.)
+- Active Battles section is empty or has only minor unresolved points
+- Agents are converging — new findings are variations, not revelations
+- Shared Knowledge section has the foundational truths the design needs
+**⚡ WIZARD RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
+## Spec Self-Review
+After writing the design doc, the Wizard reviews with fresh eyes:
+1. **Placeholder scan:** Any TBD, TODO, incomplete sections, vague requirements? Fix them.
+2. **Internal consistency:** Do any sections contradict each other? Architecture match feature descriptions?
+3. **Scope check:** Focused enough for a single implementation plan, or needs decomposition?
+4. **Ambiguity check:** Could any requirement be interpreted two ways? Pick one and make it explicit.
+Fix issues inline.
+## Design Document Structure
+Save to: specs path from `.claude/raid.json` (default: `docs/raid/specs/YYYY-MM-DD-<topic>-design.md`)
+```markdown
+# [Feature Name] Design Specification
+**Date:** YYYY-MM-DD
+**Status:** Draft | Under Review | Approved
+**Raid Team:** Wizard (dungeon master), [agents used]
+**Mode:** Full Raid | Skirmish
+## Problem Statement
+## Requirements (numbered, unambiguous)
+## Constraints
+## Dungeon Findings (verified, from Phase 1 Dungeon)
+### Key Discoveries (survived cross-testing)
+### Lessons Learned (wrong assumptions corrected)
+## Design Decision
+### Alternatives Considered (2-3 with rejection reasons)
+## Architecture
+## File Structure
+## Error Handling Strategy
+## Testing Strategy
+## Edge Cases
+## Future Considerations (NOT building now, designing to accommodate)
+## ⚡ WIZARD RULING
+```
+## Red Flags — Thoughts That Signal Violations
+| Thought | Reality |
+|---------|---------|
+| "This is too simple to need a design" | Simple projects are where unexamined assumptions cause the most waste. |
+| "I already know the right approach" | Knowing and verifying are different. Propose 2-3 anyway. |
+| "Let's just start coding and figure it out" | Code without design becomes the design. And it's usually wrong. |
+| "The agents all agree, let's move on" | Agreement without challenge is groupthink. Did they actually cross-test? |
+| "I'll wait for the Wizard to tell me what to do" | You own the phase. Explore, challenge, build. Self-organize. |
+| "Let me just post everything to the Dungeon" | Only verified, challenged findings get pinned. |
+| "I need the Wizard to mediate this disagreement" | Talk to the other agent directly first. Escalate only if stuck. |
+## Escalation
+If the team is stuck on a fundamental design choice after genuine direct debate:
+1. Present the top 2 options with trade-offs to the human
+2. Let the human decide
+3. Never ask the human to resolve something the team should handle
+**Terminal state:** ⚡ WIZARD RULING: Design approved. Commit. Archive Dungeon. Invoke `raid-implementation-plan`.

package/template/.claude/skills/raid-finishing/SKILL.md ADDED Viewed

@@ -0,0 +1,123 @@
+---
+name: raid-finishing
+description: "Use after Phase 4 review is approved. Agents debate completeness directly, fighting over what's truly done. Wizard closes with verdict, presents merge options, cleans up Dungeon files and session."
+---
+# Raid Finishing — Complete the Development Branch
+Agents debate completeness directly. Verify. Present options. Execute. Clean up.
+**Violating the letter of this process is violating its spirit.**
+## Mode Behavior
+- **Full Raid**: All 3 agents debate completeness directly. Full verification.
+- **Skirmish**: 1 agent + Wizard verify completeness.
+- **Scout**: Wizard verifies alone.
+## Process Flow
+```dot
+digraph finishing {
+  "Wizard opens final debate" -> "Agents argue directly: truly done?";
+  "Agents argue directly: truly done?" -> "Any agent says incomplete?" [shape=diamond];
+  "Any agent says incomplete?" -> "Agent presents evidence, others attack" [label="yes"];
+  "Agent presents evidence, others attack" -> "Wizard rules" [shape=diamond];
+  "Wizard rules" -> "Return to Phase 3 or 4" [label="incomplete"];
+  "Wizard rules" -> "Verify all tests pass (fresh run)" [label="complete"];
+  "Any agent says incomplete?" -> "Verify all tests pass (fresh run)" [label="no, all agree"];
+  "Verify all tests pass (fresh run)" -> "Tests pass?" [shape=diamond];
+  "Tests pass?" -> "Fix first. Do not present options." [label="no"];
+  "Fix first. Do not present options." -> "Verify all tests pass (fresh run)";
+  "Tests pass?" -> "Present 4 options" [label="yes"];
+  "Present 4 options" -> "Execute choice";
+  "Execute choice" -> "Clean up: Dungeon files + worktree + raid-session";
+  "Clean up: Dungeon files + worktree + raid-session" -> "Done" [shape=doublecircle];
+}
+```
+## Wizard Checklist
+1. **Open final debate** — dispatch agents to argue completeness directly
+2. **Observe the fight** — agents challenge each other on what's done vs. missing
+3. **Wizard rules on completeness** — only proceed if ruling is "complete"
+4. **Verify all tests pass** — full suite, fresh run
+5. **Present options** — exactly 4 choices
+6. **Execute choice** — merge, PR, keep, or discard
+7. **Clean up** — remove all Dungeon files (`.claude/raid-dungeon.md`, `.claude/raid-dungeon-phase-*.md`), worktree if applicable, remove `.claude/raid-session`
+## Step 1: The Completeness Debate
+**📡 DISPATCH:**
+> **@Warrior**: Review the implementation against the plan. Is every task completed? Every acceptance criterion met? Every test passing? Is anything half-done? Fight @Archer and @Rogue directly on their assessments.
+>
+> **@Archer**: Review the implementation against the design doc. Is every requirement covered? Naming patterns consistent throughout? File structure clean? Did we introduce inconsistencies with the rest of the codebase? Fight @Warrior and @Rogue directly.
+>
+> **@Rogue**: Review from the adversarial angle. What did we miss? What edge case is untested? What requirement was subtly misinterpreted? What will break in the first week of production? Fight @Warrior and @Archer directly.
+>
+> **All**: Reference ALL archived Dungeons (Phase 1-4) for full context. Debate directly. If you believe the work is incomplete, present evidence. Others challenge your claim. Pin conclusions to conversation (no Dungeon for finishing — this is the final debate).
+**The agents must fight over this.** If any agent believes the work is incomplete, they present evidence. The other two challenge that claim directly.
+⚡ WIZARD RULING: [Complete — proceed | Incomplete — return to Phase 3/4 with specific issues]
+## Step 2: Final Verification
+```
+BEFORE presenting options:
+1. IDENTIFY: test command from .claude/raid.json
+2. RUN: Execute the FULL test suite (fresh, complete)
+3. READ: Full output, check exit code, count failures
+4. VERIFY: Zero failures?
+   If NO → STOP. Fix first. Do not present options.
+   If YES → Proceed with evidence.
+```
+## Step 3: Present Options
+```
+⚡ WIZARD RULING: Implementation complete and verified.
+Tests: [N] passing, 0 failures (evidence: [command output])
+Options:
+1. Merge back to [base-branch] locally
+2. Push and create a Pull Request
+3. Keep the branch as-is (handle later)
+4. Discard this work
+Which option?
+```
+## Step 4: Execute
+| Option | Actions |
+|--------|---------|
+| **1. Merge** | Checkout base -> pull -> merge -> run tests on merged result -> delete branch -> clean up |
+| **2. PR** | Push with -u -> create PR via gh -> clean up |
+| **3. Keep** | Report branch location. Done. |
+| **4. Discard** | Require typed "discard" confirmation -> delete branch (force) -> clean up |
+## Step 5: Clean Up
+Remove ALL Dungeon artifacts:
+- `.claude/raid-dungeon.md` (if exists)
+- `.claude/raid-dungeon-phase-1.md`
+- `.claude/raid-dungeon-phase-2.md`
+- `.claude/raid-dungeon-phase-3.md`
+- `.claude/raid-dungeon-phase-4.md`
+- `.claude/raid-session`
+- Worktree (if applicable)
+## Red Flags
+| Thought | Reality |
+|---------|---------|
+| "Tests passed earlier, no need to re-run" | Verification Iron Law. Fresh run or no claim. |
+| "The completeness debate is a formality" | It's where missed requirements surface. Take it seriously. |
+| "Let me report to the Wizard whether it's complete" | Debate with the other agents directly. |
+| "Merge without testing the merged result" | Merges introduce conflicts. Always test after merge. |
+| "Leave the Dungeon files, they might be useful" | Clean up. Session artifacts don't belong in the repo. |
+**Terminal state:** Choice executed. All Dungeon files removed. `.claude/raid-session` removed. Session over.

package/template/.claude/skills/raid-git-worktrees/SKILL.md ADDED Viewed

@@ -0,0 +1,96 @@
+---
+name: raid-git-worktrees
+description: "Use when starting Raid implementation that needs isolation. Creates isolated git worktree with safety verification and clean test baseline."
+---
+# Raid Git Worktrees — Isolated Workspaces
+Systematic directory selection + safety verification = reliable isolation.
+## Process Flow
+```dot
+digraph worktree {
+  "Check worktree path from raid.json" -> "Directory exists?";
+  "Directory exists?" -> "Verify gitignored" [label="yes"];
+  "Directory exists?" -> "Create directory" [label="no"];
+  "Create directory" -> "Add to .gitignore + commit";
+  "Add to .gitignore + commit" -> "Verify gitignored";
+  "Verify gitignored" -> "Ignored?" [shape=diamond];
+  "Ignored?" -> "Create worktree" [label="yes"];
+  "Ignored?" -> "Add to .gitignore + commit" [label="no"];
+  "Create worktree" -> "Install dependencies";
+  "Install dependencies" -> "Run baseline tests";
+  "Run baseline tests" -> "Tests pass?" [shape=diamond];
+  "Tests pass?" -> "Report ready" [label="yes", shape=doublecircle];
+  "Tests pass?" -> "Report failures, ask user" [label="no"];
+}
+```
+## Directory Selection Priority
+1. Check worktrees path from `.claude/raid.json` (default: `.worktrees/`) -> use it (verify ignored)
+2. Check CLAUDE.md for preference -> use it
+3. Ask the user
+## Safety Verification
+```bash
+# MUST verify directory is gitignored before creating worktree
+git check-ignore -q [worktrees-path] 2>/dev/null
+```
+If NOT ignored: add to `.gitignore`, commit immediately, then proceed. Fix broken things immediately — don't leave unignored worktree directories.
+## Creation
+```bash
+WORKTREE_PATH=$(jq -r '.paths.worktrees // ".worktrees"' .claude/raid.json)
+git worktree add "$WORKTREE_PATH/$BRANCH_NAME" -b "$BRANCH_NAME"
+cd "$WORKTREE_PATH/$BRANCH_NAME"
+# Auto-detect and install deps
+[ -f package.json ] && npm install
+[ -f Cargo.toml ] && cargo build
+[ -f requirements.txt ] && pip install -r requirements.txt
+[ -f pyproject.toml ] && poetry install
+[ -f go.mod ] && go mod download
+# Verify clean baseline
+TEST_CMD=$(jq -r '.project.testCommand // empty' .claude/raid.json)
+[ -n "$TEST_CMD" ] && eval "$TEST_CMD"
+```
+## Report
+```
+Worktree ready at [path]
+Branch: [branch-name]
+Tests: [N] passing, 0 failures
+Ready for Raid implementation
+Note: Dungeon files (.claude/raid-dungeon*.md) are session artifacts
+and will be cleaned up by raid-finishing. No gitignore needed.
+```
+## Quick Reference
+| Situation | Action |
+|-----------|--------|
+| `.worktrees/` exists | Use it (verify ignored) |
+| `worktrees/` exists | Use it (verify ignored) |
+| Both exist | Use `.worktrees/` |
+| Neither exists | Check raid.json -> CLAUDE.md -> ask user |
+| Directory not ignored | Add to .gitignore + commit first |
+| Tests fail during baseline | Report failures + ask user before proceeding |
+| No test command configured | Warn, proceed without baseline |
+## Red Flags
+| Thought | Reality |
+|---------|---------|
+| "I'll add it to .gitignore later" | Fix it now. Worktree dirs must never be committed. |
+| "Baseline tests don't matter" | Failing baseline = you'll waste time debugging pre-existing failures. |
+| "Skip dependency install, it'll be fine" | Missing deps = mysterious failures during implementation. |
+**Never** create a worktree without verifying it's gitignored. **Never** skip baseline test verification. **Never** proceed with failing baseline tests without asking.

package/template/.claude/skills/raid-implementation/SKILL.md ADDED Viewed

@@ -0,0 +1,155 @@
+---
+name: raid-implementation
+description: "Phase 3 of Raid protocol. Wizard assigns implementer and opens task Dungeon. Implementer builds with TDD. Challengers attack directly, building on each other's critiques. Wizard rotates and closes each task."
+---
+# Raid Implementation — Phase 3
+One builds, two attack — and the attackers attack each other's reviews too. Every implementation earns its approval through direct adversarial pressure.
+<HARD-GATE>
+Do NOT implement without an approved plan (except Scout mode). Do NOT skip TDD. Do NOT let any implementation pass unchallenged. Do NOT use subagents. Use `raid-tdd` skill for all test-driven development. Use `raid-verification` before any completion claims.
+</HARD-GATE>
+## Mode Behavior
+- **Full Raid**: 1 implements, 2 challenge (and challenge each other's reviews). Rotate implementer.
+- **Skirmish**: 1 implements, 1 challenges. Swap roles each task.
+- **Scout**: 1 agent implements. Wizard reviews. Self-challenge ruthlessly.
+TDD is enforced in ALL modes. This is an Iron Law.
+## Process Flow
+```dot
+digraph implementation {
+  "Wizard reads plan + Phase 2 Dungeon" -> "Create task tracking (TaskCreate)";
+  "Create task tracking (TaskCreate)" -> "Wizard assigns task N (rotate implementer)";
+  "Wizard assigns task N (rotate implementer)" -> "Wizard opens task Dungeon";
+  "Wizard opens task Dungeon" -> "Implementer executes (TDD)";
+  "Implementer executes (TDD)" -> "Implementer reports status";
+  "Implementer reports status" -> "Status?" [shape=diamond];
+  "Status?" -> "Challengers attack directly" [label="DONE"];
+  "Status?" -> "Read concerns, decide" [label="DONE_WITH_CONCERNS"];
+  "Status?" -> "Provide context, re-dispatch" [label="NEEDS_CONTEXT"];
+  "Status?" -> "Assess blocker" [label="BLOCKED"];
+  "Read concerns, decide" -> "Challengers attack directly";
+  "Provide context, re-dispatch" -> "Implementer executes (TDD)";
+  "Assess blocker" -> "Break down task / escalate";
+  "Challengers attack directly" -> "Challengers build on each other's critiques";
+  "Challengers build on each other's critiques" -> "Implementer defends against both";
+  "Implementer defends against both" -> "All issues resolved?" [shape=diamond];
+  "All issues resolved?" -> "Challengers attack directly" [label="no, continue"];
+  "All issues resolved?" -> "Wizard closes task: ruling" [label="yes"];
+  "Wizard closes task: ruling" -> "More tasks?" [shape=diamond];
+  "More tasks?" -> "Wizard assigns task N (rotate implementer)" [label="yes"];
+  "More tasks?" -> "Archive Dungeon + invoke raid-review" [label="no", shape=doublecircle];
+}
+```
+## Wizard Checklist
+1. **Read the plan** — extract all tasks, dependencies, ordering
+2. **Read Phase 2 archived Dungeon** — carry forward context
+3. **Set up worktree** — use `raid-git-worktrees` for isolation (optional)
+4. **Create task tracking** — use TaskCreate for every plan task
+5. **Per task:** Assign implementer (rotate), open Dungeon, observe attack, close with ruling
+6. **Track progress** — mark complete only after Wizard ruling per task
+7. **After all tasks** — archive Dungeon, invoke `raid-review`
+## The Implementation Gauntlet (per task)
+### Step 1: Wizard Assigns + Opens Dungeon
+One agent implements. Others prepare to attack. **Rotate the implementer** across tasks.
+The Wizard doesn't open a new Dungeon for every task — the Phase 3 Dungeon is continuous across all tasks. But the Wizard announces each task assignment clearly.
+### Step 2: Implementer Executes (TDD)
+Following `raid-tdd` strictly:
+1. Write the failing test from the plan
+2. Run test command from `.claude/raid.json` — verify it fails for the RIGHT reason
+3. Write minimal code to pass
+4. Run — verify pass
+5. Run FULL test suite — verify no regressions
+6. Self-review against acceptance criteria
+7. Commit: `feat(scope): descriptive message`
+Report status: **DONE** | **DONE_WITH_CONCERNS** | **NEEDS_CONTEXT** | **BLOCKED**
+### Step 3: Challengers Attack Directly
+This is where the new model shines. Challengers don't just report to the Wizard — they:
+1. **Read ACTUAL CODE** (not the implementer's report — reports lie)
+2. **Challenge the implementer directly:** `⚔️ CHALLENGE: @Warrior, your implementation at handler.js:23 doesn't validate...`
+3. **Build on each other's critiques:** `🔗 BUILDING ON @Archer: Your naming drift finding — the inconsistency also affects the test at...`
+4. **Roast weak implementations:** `🔥 ROAST: @Rogue, you claimed this handles concurrent access but there's no lock at...`
+5. **Pin verified issues to Dungeon:** `📌 DUNGEON: Confirmed issue — handler.js:23 missing validation [verified by @Archer and @Rogue]`
+**Challengers check:**
+- Spec compliance — does it match the task spec line by line?
+- Design doc compliance — does it match the design requirements?
+- Edge cases — what inputs break it?
+- Test quality — do tests prove correctness or just confirm happy path?
+- Naming consistency — do new names follow established patterns?
+- File structure — does new code follow project conventions?
+### Step 4: Implementer Defends
+The implementer defends against BOTH challengers simultaneously:
+- Respond to each challenge with evidence or concede immediately
+- Fix conceded issues
+- Re-run all tests
+- Pin resolved issues to Dungeon: `📌 DUNGEON: Resolved — added validation at handler.js:23 [tests pass]`
+### Step 5: Wizard Closes Task
+⚡ WIZARD RULING: Task N [approved | needs fixes]
+The Wizard closes when the Dungeon shows all issues resolved and challengers have no remaining critiques.
+## Handling Implementer Status
+| Status | Action |
+|--------|--------|
+| **DONE** | Challengers attack directly |
+| **DONE_WITH_CONCERNS** | Read concerns. If correctness: address before attack. If observations: note and proceed. |
+| **NEEDS_CONTEXT** | Provide missing information. Re-dispatch. |
+| **BLOCKED** | 1) Context → provide more. 2) Too complex → break into subtasks. 3) Plan wrong → fix plan. |
+**Never ignore an escalation.** If the implementer says it's stuck, something needs to change.
+## Quality Gates Per Task
+- [ ] Tests written BEFORE implementation (TDD)
+- [ ] Tests fail for the right reason
+- [ ] Tests pass after implementation
+- [ ] Full test suite passes (no regressions)
+- [ ] Challengers attacked ACTUAL CODE directly
+- [ ] Challengers built on each other's critiques
+- [ ] All challenges addressed (fixed or defended with evidence)
+- [ ] Implementation matches task spec (nothing more, nothing less)
+- [ ] Naming follows established patterns
+- [ ] Verified issues pinned to Dungeon
+- [ ] Code committed with descriptive message
+## Red Flags
+| Thought | Reality |
+|---------|---------|
+| "This task is simple, skip cross-testing" | Simple tasks are where assumptions slip through. |
+| "The challengers should report to the Wizard" | Challengers attack the implementer and each other directly. |
+| "We can batch the review for multiple tasks" | Review per task. Batching lets issues compound. |
+| "I trust this agent's work" | Trust without verification is the definition of a bug farm. |
+| "The same agent can implement twice in a row" | Rotation prevents blind spots. Enforce it. |
+| "I'll wait for the Wizard to coordinate the review" | Attack directly. Build on each other's findings. |
+## Escalation
+- **3+ fix attempts on one task:** Question whether the task spec or design is wrong.
+- **Agent repeatedly blocked:** The plan may need revision.
+- **Tests can't be written:** The design may not be testable. Return to Phase 1.
+**Terminal state:** All tasks approved. Archive Dungeon. Invoke `raid-review`.