claude-raid 0.1.1 → 0.1.3

Files changed (51)
  1. package/README.md +298 -196
  2. package/bin/cli.js +45 -18
  3. package/package.json +1 -1
  4. package/src/descriptions.js +57 -0
  5. package/src/detect-browser.js +164 -0
  6. package/src/detect-package-manager.js +107 -0
  7. package/src/detect-project.js +44 -6
  8. package/src/doctor.js +12 -188
  9. package/src/init.js +192 -17
  10. package/src/merge-settings.js +63 -7
  11. package/src/remove.js +28 -4
  12. package/src/setup.js +405 -0
  13. package/src/ui.js +168 -0
  14. package/src/update.js +62 -5
  15. package/src/version-check.js +130 -0
  16. package/template/.claude/agents/archer.md +46 -51
  17. package/template/.claude/agents/rogue.md +43 -49
  18. package/template/.claude/agents/warrior.md +48 -53
  19. package/template/.claude/agents/wizard.md +65 -67
  20. package/template/.claude/hooks/raid-lib.sh +182 -0
  21. package/template/.claude/hooks/raid-pre-compact.sh +41 -0
  22. package/template/.claude/hooks/raid-session-end.sh +116 -0
  23. package/template/.claude/hooks/raid-session-start.sh +52 -0
  24. package/template/.claude/hooks/raid-stop.sh +68 -0
  25. package/template/.claude/hooks/raid-task-completed.sh +37 -0
  26. package/template/.claude/hooks/raid-task-created.sh +40 -0
  27. package/template/.claude/hooks/raid-teammate-idle.sh +28 -0
  28. package/template/.claude/hooks/validate-browser-cleanup.sh +36 -0
  29. package/template/.claude/hooks/validate-browser-tests-exist.sh +52 -0
  30. package/template/.claude/hooks/validate-commit.sh +130 -0
  31. package/template/.claude/hooks/validate-dungeon.sh +114 -0
  32. package/template/.claude/hooks/validate-file-naming.sh +13 -27
  33. package/template/.claude/hooks/validate-no-placeholders.sh +11 -21
  34. package/template/.claude/hooks/validate-write-gate.sh +60 -0
  35. package/template/.claude/raid-rules.md +27 -18
  36. package/template/.claude/skills/raid-browser/SKILL.md +186 -0
  37. package/template/.claude/skills/raid-browser-chrome/SKILL.md +189 -0
  38. package/template/.claude/skills/raid-browser-playwright/SKILL.md +163 -0
  39. package/template/.claude/skills/raid-debugging/SKILL.md +6 -6
  40. package/template/.claude/skills/raid-design/SKILL.md +10 -10
  41. package/template/.claude/skills/raid-finishing/SKILL.md +11 -3
  42. package/template/.claude/skills/raid-implementation/SKILL.md +26 -11
  43. package/template/.claude/skills/raid-implementation-plan/SKILL.md +15 -4
  44. package/template/.claude/skills/raid-protocol/SKILL.md +57 -32
  45. package/template/.claude/skills/raid-review/SKILL.md +42 -13
  46. package/template/.claude/skills/raid-tdd/SKILL.md +45 -3
  47. package/template/.claude/skills/raid-verification/SKILL.md +12 -1
  48. package/template/.claude/hooks/validate-commit-message.sh +0 -78
  49. package/template/.claude/hooks/validate-phase-gate.sh +0 -60
  50. package/template/.claude/hooks/validate-tests-pass.sh +0 -43
  51. package/template/.claude/hooks/validate-verification.sh +0 -70
@@ -87,21 +87,21 @@ digraph debugging {
87
87
 
88
88
  The Wizard dispatches all agents with different hypotheses. After dispatch, agents debate directly:
89
89
 
90
- **📡 DISPATCH:**
90
+ **DISPATCH:**
91
91
  > **@Warrior**: Investigate [structural/data cause]. Reproduce. Trace data flow. Gather evidence at boundaries.
92
92
  > **@Archer**: Investigate [integration/contract cause]. Check interfaces, type mismatches, implicit contracts, dependency versions.
93
93
  > **@Rogue**: Investigate [timing/state/adversarial cause]. Race conditions, stale state, environment assumptions, concurrent access.
94
94
  >
95
- > **All**: Investigate independently, then debate directly. Challenge each other's hypotheses with evidence. Build on each other's findings. Pin verified evidence to the Dungeon. The hypothesis that survives cross-testing wins. Escalate to me with `🆘 WIZARD:` only if stuck.
95
+ > **All**: Investigate independently, then debate directly. Challenge each other's hypotheses with evidence. Build on each other's findings. Pin verified evidence to the Dungeon. The hypothesis that survives cross-testing wins. Escalate to me with `WIZARD:` only if stuck.
96
96
 
97
97
  **How agents debate root cause:**
98
- - `⚔️ CHALLENGE: @Rogue, your race condition hypothesis doesn't explain why it fails on single-threaded test runs — evidence: [test output]`
99
- - `🔗 BUILDING ON @Warrior: Your data flow trace reveals the value originates from the config loader, not the API call — here's the upstream path: ...`
100
- - `📌 DUNGEON: Root cause evidence — config loader at config.js:47 returns stale cache when called concurrently [verified by @Archer and @Warrior]`
98
+ - `CHALLENGE: @Rogue, your race condition hypothesis doesn't explain why it fails on single-threaded test runs — evidence: [test output]`
99
+ - `BUILDING: @Warrior, your data flow trace reveals the value originates from the config loader, not the API call — here's the upstream path: ...`
100
+ - `DUNGEON: Root cause evidence — config loader at config.js:47 returns stale cache when called concurrently [verified by @Archer and @Warrior]`
101
101
 
102
102
  The hypothesis that survives direct cross-testing gets the Wizard's ruling:
103
103
 
104
- ⚡ WIZARD RULING: Root cause is [X] because [evidence from Dungeon].
104
+ RULING: Root cause is [X] because [evidence from Dungeon].
105
105
 
106
106
  ## Root Cause Tracing
107
107
 
@@ -92,7 +92,7 @@ Create `.claude/raid-dungeon.md`:
92
92
 
93
93
  Each agent gets the same objective but a different starting angle. After dispatch, the Wizard goes silent.
94
94
 
95
- **📡 DISPATCH:**
95
+ **DISPATCH:**
96
96
 
97
97
  > **@Warrior**: Explore from the data/infrastructure side. What are the hard technical constraints? What schemas, migrations, APIs are needed? What breaks if we get this wrong? Find the structural load-bearing walls. Challenge @Archer and @Rogue's findings directly. Pin verified findings to the Dungeon.
98
98
  >
@@ -100,7 +100,7 @@ Each agent gets the same objective but a different starting angle. After dispatc
100
100
  >
101
101
  > **@Rogue**: Explore from the failure/adversarial side. What assumptions about inputs, state, timing, availability? Build failure scenarios. What does a malicious user do? What does a slow network do? What does concurrent access do? Challenge @Warrior and @Archer's findings directly. Pin verified findings to the Dungeon.
102
102
  >
103
- > **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `🆘 WIZARD:` only when genuinely stuck.
103
+ > **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `WIZARD:` only when genuinely stuck.
104
104
 
105
105
  ## What Agents Must Cover
106
106
 
@@ -109,7 +109,7 @@ Every agent addresses ALL of these from their assigned angle:
109
109
  - **Performance** — scale, bottlenecks, complexity
110
110
  - **Robustness** — retries, fallbacks, graceful degradation
111
111
  - **Reliability** — blast radius of failure, production-readiness
112
- - **Testability** — meaningful tests, mock strategy, test-friendly design
112
+ - **Testability** — meaningful tests, mock strategy, test-friendly design. When `browser.enabled`: can this feature be E2E tested with Playwright? What user flows need browser verification? Are there loading states, client-side routing, or visual states that unit tests can't catch?
113
113
  - **Error handling** — what errors occur, how surfaced, UX of failure
114
114
  - **Edge cases** — empty, null, boundary, Unicode, timezones, large payloads
115
115
  - **Cascading effects** — blast radius, what else changes
@@ -124,12 +124,12 @@ Every agent addresses ALL of these from their assigned angle:
124
124
  Agents interact DIRECTLY — @Name addressing, building, challenging, roasting:
125
125
  1. Present findings with EVIDENCE (file paths, docs, concrete examples)
126
126
  2. Challenge other agents DIRECTLY with COUNTER-EVIDENCE (not opinions)
127
- 3. Build on each other's discoveries — 🔗 BUILDING ON @Name:
127
+ 3. Build on each other's discoveries — BUILDING: with independent verification
128
128
  4. Go to the EDGES — push every finding to its extreme
129
129
  5. LEARN from each other — incorporate discoveries into your model
130
- 6. Pin verified findings — 📌 DUNGEON: only after surviving challenge
131
- 7. Roast weak analysis — 🔥 ROAST: with evidence, not insults
132
- 8. Escalate to Wizard — 🆘 WIZARD: only when genuinely stuck
130
+ 6. Pin verified findings — DUNGEON: only after surviving challenge
131
+ 7. Challenge weak analysis — back every challenge with your own independent evidence
132
+ 8. Escalate to Wizard — WIZARD: only when genuinely stuck
133
133
  ```
134
134
 
135
135
  **The goal is not to tear each other down. The goal is to forge the strongest design by testing it from every angle. The Dungeon captures what survived.**
@@ -144,7 +144,7 @@ The Wizard closes when the Dungeon has sufficient verified findings — enough D
144
144
  - Agents are converging — new findings are variations, not revelations
145
145
  - Shared Knowledge section has the foundational truths the design needs
146
146
 
147
- **⚡ WIZARD RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
147
+ **RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
148
148
 
149
149
  ## Spec Self-Review
150
150
 
@@ -183,7 +183,7 @@ Save to: specs path from `.claude/raid.json` (default: `docs/raid/specs/YYYY-MM-
183
183
  ## Testing Strategy
184
184
  ## Edge Cases
185
185
  ## Future Considerations (NOT building now, designing to accommodate)
186
- ## ⚡ WIZARD RULING
186
+ ## RULING
187
187
  ```
188
188
 
189
189
  ## Red Flags — Thoughts That Signal Violations
@@ -205,4 +205,4 @@ If the team is stuck on a fundamental design choice after genuine direct debate:
205
205
  2. Let the human decide
206
206
  3. Never ask the human to resolve something the team should handle
207
207
 
208
- **Terminal state:** ⚡ WIZARD RULING: Design approved. Commit. Archive Dungeon. Invoke `raid-implementation-plan`.
208
+ **Terminal state:** RULING: Design approved. Commit. Archive Dungeon. Invoke `raid-implementation-plan`.
@@ -48,7 +48,7 @@ digraph finishing {
48
48
 
49
49
  ## Step 1: The Completeness Debate
50
50
 
51
- **📡 DISPATCH:**
51
+ **DISPATCH:**
52
52
 
53
53
  > **@Warrior**: Review the implementation against the plan. Is every task completed? Every acceptance criterion met? Every test passing? Is anything half-done? Fight @Archer and @Rogue directly on their assessments.
54
54
  >
@@ -60,7 +60,7 @@ digraph finishing {
60
60
 
61
61
  **The agents must fight over this.** If any agent believes the work is incomplete, they present evidence. The other two challenge that claim directly.
62
62
 
63
- ⚡ WIZARD RULING: [Complete — proceed | Incomplete — return to Phase 3/4 with specific issues]
63
+ RULING: [Complete — proceed | Incomplete — return to Phase 3/4 with specific issues]
64
64
 
65
65
  ## Step 2: Final Verification
66
66
 
@@ -74,10 +74,18 @@ BEFORE presenting options:
74
74
  If YES → Proceed with evidence.
75
75
  ```
76
76
 
77
+ ### Browser Verification (when `browser.enabled` in raid.json)
78
+
79
+ Additional final checks:
80
+ - Full Playwright test suite passes headlessly
81
+ - Verify no leaked processes from prior browser sessions
82
+ - Verify all ports in `browser.portRange` are free (`lsof -i :PORT`)
83
+ - Agents debate: "Are browser tests sufficient for this feature's coverage?"
84
+
77
85
  ## Step 3: Present Options
78
86
 
79
87
  ```
80
- ⚡ WIZARD RULING: Implementation complete and verified.
88
+ RULING: Implementation complete and verified.
81
89
 
82
90
  Tests: [N] passing, 0 failures (evidence: [command output])
83
91
 
@@ -52,10 +52,14 @@ digraph implementation {
52
52
  1. **Read the plan** — extract all tasks, dependencies, ordering
53
53
  2. **Read Phase 2 archived Dungeon** — carry forward context
54
54
  3. **Set up worktree** — use `raid-git-worktrees` for isolation (optional)
55
- 4. **Create task tracking** - use TaskCreate for every plan task
56
- 5. **Per task:** Assign implementer (rotate), open Dungeon, observe attack, close with ruling
57
- 6. **Track progress** - mark complete only after Wizard ruling per task
58
- 7. **After all tasks** - archive Dungeon, invoke `raid-review`
55
+ 4. **Browser setup (if `browser.enabled` in raid.json)**:
56
+ - Check if `browser.startup` exists; if null, invoke `raid-browser` startup discovery FIRST
57
+ - Check if Playwright is installed; if not, first task becomes "scaffold Playwright"
58
+ - Assign port from `browser.portRange` to implementer
59
+ 5. **Create task tracking** — use TaskCreate for every plan task
60
+ 6. **Per task:** Assign implementer (rotate), open Dungeon, observe attack, close with ruling
61
+ 7. **Track progress** — mark complete only after Wizard ruling per task
62
+ 8. **After all tasks** — archive Dungeon, invoke `raid-review`
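
The port assignment in the browser setup step could be sketched as below. This is a minimal illustration rather than code from the package: it assumes `browser.portRange` is an inclusive `[start, end]` pair, and `expandPortRange` and `assignPorts` are hypothetical helper names.

```javascript
// Hypothetical helpers; not part of claude-raid.
// Assumption: browser.portRange is an inclusive [start, end] pair.
function expandPortRange([start, end]) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start > end) {
    throw new RangeError(`invalid port range: [${start}, ${end}]`);
  }
  const ports = [];
  for (let port = start; port <= end; port += 1) ports.push(port);
  return ports;
}

// Give each agent its own port so parallel app instances never collide.
function assignPorts(portRange, agents) {
  const ports = expandPortRange(portRange);
  if (agents.length > ports.length) {
    throw new RangeError(`need ${agents.length} ports, range has ${ports.length}`);
  }
  return Object.fromEntries(agents.map((agent, i) => [agent, ports[i]]));
}

console.log(assignPorts([3001, 3005], ["warrior", "archer", "rogue"]));
```

Failing loudly when the range is too small seems preferable to reusing a port, since the skill text expects every instance to be isolated.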
59
63
 
60
64
  ## The Implementation Gauntlet (per task)
61
65
 
@@ -63,7 +67,7 @@ digraph implementation {
63
67
 
64
68
  One agent implements. Others prepare to attack. **Rotate the implementer** across tasks.
65
69
 
66
- The Wizard doesn't open a new Dungeon for every task; the Phase 3 Dungeon is continuous across all tasks. But the Wizard announces each task assignment clearly.
70
+ Phase 3 uses a single continuous Dungeon (`.claude/raid-dungeon.md`) across all tasks, unlike Phases 1 and 2, which each get their own Dungeon that is archived on close. The Wizard announces each task assignment clearly within the running Dungeon.
67
71
 
68
72
  ### Step 2: Implementer Executes (TDD)
69
73
 
@@ -76,6 +80,11 @@ Following `raid-tdd` strictly:
76
80
  6. Self-review against acceptance criteria
77
81
  7. Commit: `feat(scope): descriptive message`
78
82
 
83
+ **Browser tasks (if `browser.enabled` and task involves browser-facing code):**
84
+ - BOOT app on assigned port before browser TDD (invoke `raid-browser`)
85
+ - Use Playwright MCP tools to explore while authoring tests
86
+ - CLEANUP after task is complete (or on failure — cleanup always runs)
87
+
79
88
  Report status: **DONE** | **DONE_WITH_CONCERNS** | **NEEDS_CONTEXT** | **BLOCKED**
80
89
 
81
90
  ### Step 3: Challengers Attack Directly
@@ -83,10 +92,16 @@ Report status: **DONE** | **DONE_WITH_CONCERNS** | **NEEDS_CONTEXT** | **BLOCKED
83
92
  This is where the new model shines. Challengers don't just report to the Wizard — they:
84
93
 
85
94
  1. **Read ACTUAL CODE** (not the implementer's report — reports lie)
86
- 2. **Challenge the implementer directly:** `⚔️ CHALLENGE: @Warrior, your implementation at handler.js:23 doesn't validate...`
87
- 3. **Build on each other's critiques:** `🔗 BUILDING ON @Archer: Your naming drift finding — the inconsistency also affects the test at...`
88
- 4. **Roast weak implementations:** `🔥 ROAST: @Rogue, you claimed this handles concurrent access but there's no lock at...`
89
- 5. **Pin verified issues to Dungeon:** `📌 DUNGEON: Confirmed issue — handler.js:23 missing validation [verified by @Archer and @Rogue]`
95
+ 2. **Challenge the implementer directly:** `CHALLENGE: @Warrior, your implementation at handler.js:23 doesn't validate...`
96
+ 3. **Build on each other's critiques:** `BUILDING: @Archer, your naming drift finding — the inconsistency also affects the test at...`
97
+ 4. **Challenge weak implementations:** `CHALLENGE: @Rogue, you claimed this handles concurrent access but there's no lock at...`
98
+ 5. **Pin verified issues to Dungeon:** `DUNGEON: Confirmed issue — handler.js:23 missing validation [verified by @Archer and @Rogue]`
99
+
100
+ **Browser verification (if `browser.enabled`):**
101
+ - Challengers can BOOT on their own ports to run Playwright tests independently
102
+ - Verify tests pass without flakiness (run 3x if suspect)
103
+ - Explore the feature manually via Playwright MCP to find gaps the tests missed
104
+ - Each challenger runs CLEANUP on their own instance when done
90
105
 
91
106
  **Challengers check:**
92
107
  - Spec compliance — does it match the task spec line by line?
@@ -102,11 +117,11 @@ The implementer defends against BOTH challengers simultaneously:
102
117
  - Respond to each challenge with evidence or concede immediately
103
118
  - Fix conceded issues
104
119
  - Re-run all tests
105
- - Pin resolved issues to Dungeon: `📌 DUNGEON: Resolved — added validation at handler.js:23 [tests pass]`
120
+ - Pin resolved issues to Dungeon: `DUNGEON: Resolved — added validation at handler.js:23 [tests pass]`
106
121
 
107
122
  ### Step 5: Wizard Closes Task
108
123
 
109
- ⚡ WIZARD RULING: Task N [approved | needs fixes]
124
+ RULING: Task N [approved | needs fixes]
110
125
 
111
126
  The Wizard closes when the Dungeon shows all issues resolved and challengers have no remaining critiques.
112
127
 
@@ -76,7 +76,7 @@ Create `.claude/raid-dungeon.md`:
76
76
 
77
77
  ## Dispatch for Decomposition
78
78
 
79
- **📡 DISPATCH:**
79
+ **DISPATCH:**
80
80
 
81
81
  > **@Warrior**: Decompose into tasks. Focus on structural ordering — what MUST be built first? Hard dependencies? Critical path? Include tests for every task. Challenge @Archer and @Rogue's decompositions directly. Pin agreed tasks to Dungeon.
82
82
  >
@@ -84,7 +84,7 @@ Create `.claude/raid-dungeon.md`:
84
84
  >
85
85
  > **@Rogue**: Decompose into tasks. Focus on hidden complexity — which tasks are deceptively hard? Where will the implementer guess wrong? Which tests miss the failure path? Challenge @Warrior and @Archer directly. Pin agreed tasks to Dungeon.
86
86
  >
87
- > **All**: Read the Phase 1 archived Dungeon for design knowledge. Interact directly. Build on each other's decompositions. Pin agreed tasks with `📌 DUNGEON:`. Escalate to me with `🆘 WIZARD:` only when genuinely stuck.
87
+ > **All**: Read the Phase 1 archived Dungeon for design knowledge. Interact directly. Build on each other's decompositions. Pin agreed tasks with `DUNGEON:`. Escalate to me with `WIZARD:` only when genuinely stuck.
88
88
 
89
89
  ## Collaborative Compliance Testing (Agent-Driven)
90
90
 
@@ -109,6 +109,17 @@ After independent decomposition, agents fight directly over the plan:
109
109
  - "Run tests to verify pass" — step
110
110
  - "Commit" — step
111
111
 
112
+ ### Browser Test Tasks (when `browser.enabled` in raid.json)
113
+
114
+ When a task involves browser-facing code, the plan must include browser test steps alongside unit tests:
115
+ - "Write failing Playwright test (`tests/e2e/<feature>.spec.ts`)" — step
116
+ - "Run `{execCommand} playwright test` to verify it fails" — step
117
+ - "Implement the feature" — step
118
+ - "Run Playwright test to verify it passes" — step
119
+ - "Run full suite (unit + browser) to verify no regressions" — step
120
+
121
+ Not every task needs a browser test. Include them for user-facing flows, UI interactions, client-side routing, and visual state changes. State reasoning — challengers will attack this decision.
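
As a rough sketch, the five browser test steps listed above could be generated from a feature name. `browserTestSteps` is a hypothetical helper, and the assumption that `{execCommand}` is filled from `project.execCommand` in `raid.json` (for example `npx`) is mine, not something this diff confirms.

```javascript
// Hypothetical helper: expand one browser-facing feature into the five plan steps.
// Assumption: execCommand comes from project.execCommand in raid.json (e.g. "npx").
function browserTestSteps(feature, execCommand) {
  const spec = `tests/e2e/${feature}.spec.ts`;
  const run = `${execCommand} playwright test`;
  return [
    `Write failing Playwright test (\`${spec}\`)`,
    `Run \`${run}\` to verify it fails`,
    "Implement the feature",
    "Run Playwright test to verify it passes",
    "Run full suite (unit + browser) to verify no regressions",
  ];
}

browserTestSteps("checkout", "npx").forEach((step, i) => console.log(`${i + 1}. ${step}`));
```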
122
+
112
123
  ## Task Structure
113
124
 
114
125
  ````markdown
@@ -154,7 +165,7 @@ After writing the complete plan:
154
165
  2. **Placeholder scan:** Search for TBD, TODO, vague descriptions, missing code. Fix them.
155
166
  3. **Type/name consistency:** Do types, method signatures, property names match across ALL tasks?
156
167
  4. **File structure consistency:** Do all file paths follow the project's conventions?
157
- 5. **Test quality:** Does every task have tests? Do tests cover failure paths?
168
+ 5. **Test quality:** Does every task have tests? Do tests cover failure paths? When `browser.enabled`: do browser-facing tasks include Playwright tests?
158
169
  6. **Ordering:** Can each task be built and committed independently without breaking the build?
159
170
 
160
171
  Fix issues inline. If a spec requirement has no task, add the task.
@@ -170,4 +181,4 @@ Fix issues inline. If a spec requirement has no task, add the task.
170
181
  | "Tests can be added later" | TDD means tests are in the plan. No test = no task. |
171
182
  | "The naming will be consistent enough" | Check it explicitly. Naming drift is the #1 source of bugs. |
172
183
 
173
- **Terminal state:** ⚡ WIZARD RULING: Plan approved. Commit. Archive Dungeon. Invoke `raid-implementation`.
184
+ **Terminal state:** RULING: Plan approved. Commit. Archive Dungeon. Invoke `raid-implementation`.
@@ -60,12 +60,35 @@ Read `.claude/raid.json` for project-specific settings. If absent, use sensible
60
60
  | `project.testCommand` | (none) | Command to run tests |
61
61
  | `project.lintCommand` | (none) | Command to run linting |
62
62
  | `project.buildCommand` | (none) | Command to build |
63
+ | `project.packageManager` | (auto-detected) | Package manager (npm, pnpm, yarn, bun, uv, poetry) |
64
+ | `project.runCommand` | (auto-detected) | Run command prefix (e.g., `pnpm`, `npm run`) |
65
+ | `project.execCommand` | (auto-detected) | Exec command prefix (e.g., `pnpm dlx`, `npx`) |
63
66
  | `paths.specs` | `docs/raid/specs` | Where design docs go |
64
67
  | `paths.plans` | `docs/raid/plans` | Where plans go |
65
68
  | `paths.worktrees` | `.worktrees` | Where worktrees go |
66
69
  | `conventions.fileNaming` | `none` | Naming convention |
67
70
  | `conventions.commits` | `conventional` | Commit format |
68
71
  | `raid.defaultMode` | `full` | Default mode |
72
+ | `browser.enabled` | `false` | Whether browser testing is active |
73
+ | `browser.framework` | (auto-detected) | Detected framework (next, vite, angular, etc.) |
74
+ | `browser.devCommand` | (auto-detected) | Dev server command |
75
+ | `browser.baseUrl` | (auto-detected) | Base URL for browser tests |
76
+ | `browser.portRange` | `[3001, 3005]` | Port range for isolated agent instances |
77
+ | `browser.playwrightConfig` | `playwright.config.ts` | Playwright config path |
78
+ | `browser.auth` | `null` | Auth config (discovered by agents) |
79
+ | `browser.startup` | `null` | Startup recipe (discovered by agents) |
80
+
81
+ ## Browser Testing
82
+
83
+ When `browser.enabled` is `true` in `raid.json`, browser testing integrates into the existing workflow:
84
+
85
+ - **Phase 3 (Implementation):** Browser-facing code uses TDD with Playwright — write `.spec.ts` files as part of RED-GREEN-REFACTOR. Use `raid-browser-playwright`. Challengers boot their own app instances to verify tests independently.
86
+ - **Phase 4 (Review):** After code review, challengers do live adversarial inspection in Chrome — each on their own isolated port. Use `raid-browser-chrome`. Warrior stress-tests, Archer checks visual consistency, Rogue probes security.
87
+ - **Startup discovery:** First time browser testing runs, an agent investigates how to boot the app (dev server, databases, edge workers, env vars) and writes the recipe to `raid.json`. Use `raid-browser`.
88
+ - **Pre-flight:** Before every browser session, agents must state exactly what they're testing (hard gate) and check auth requirements.
89
+ - **Cleanup iron law:** Every boot has a matching cleanup. Leaked processes are never acceptable.
90
+
91
+ Browser testing is **not a separate workflow** — it extends existing phases. If `browser.enabled` is `false` or absent, all browser-related behavior is skipped.
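
One way to read the configuration table is as a set of defaults that `raid.json` overrides. The sketch below assumes exactly that. `RAID_DEFAULTS` and `withDefaults` are hypothetical names, the auto-detected keys are left out, and the package's own loading logic is not shown in this diff.

```javascript
// Defaults copied from the configuration table; auto-detected keys are omitted.
const RAID_DEFAULTS = {
  paths: { specs: "docs/raid/specs", plans: "docs/raid/plans", worktrees: ".worktrees" },
  conventions: { fileNaming: "none", commits: "conventional" },
  raid: { defaultMode: "full" },
  browser: {
    enabled: false,
    portRange: [3001, 3005],
    playwrightConfig: "playwright.config.ts",
    auth: null,
    startup: null,
  },
};

const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// Hypothetical helper: recursively overlay user settings on the defaults.
// Arrays and null values replace the default wholesale.
function withDefaults(defaults, overrides = {}) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    merged[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? withDefaults(defaults[key], value)
      : value;
  }
  return merged;
}

// Example: a raid.json that only switches browser testing on.
const config = withDefaults(RAID_DEFAULTS, JSON.parse('{"browser": {"enabled": true}}'));
console.log(config.browser.enabled, config.browser.portRange, config.paths.specs);
```

With a merge like this, an absent `browser` section leaves `browser.enabled` at `false`, which matches the rule that browser behavior is skipped when the key is `false` or absent.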
69
92
 
70
93
  ## Modes
71
94
 
@@ -122,14 +145,14 @@ The Dungeon (`.claude/raid-dungeon.md`) is the team's shared knowledge board. It
122
145
  | Event | Action | Who |
123
146
  |-------|--------|-----|
124
147
  | Phase opens | Create `.claude/raid-dungeon.md` with header | Wizard |
125
- | During phase | Read and write via `📌 DUNGEON:` signal | Agents |
148
+ | During phase | Read and write via `DUNGEON:` signal | Agents |
126
149
  | Phase closes | Rename to `.claude/raid-dungeon-phase-N.md` | Wizard |
127
150
  | Next phase opens | Create fresh `.claude/raid-dungeon.md` | Wizard |
128
151
  | Session ends | Remove all Dungeon files | Wizard |
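
The lifecycle in this table maps onto a few file operations. The following is a sketch under assumptions, not the package's implementation: the function names and the header text are hypothetical, and a temporary directory stands in for `.claude/`.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const ACTIVE = "raid-dungeon.md";

// Phase opens: create the active Dungeon with a header.
function openDungeon(claudeDir, header) {
  fs.writeFileSync(path.join(claudeDir, ACTIVE), `${header}\n`);
}

// Phase closes: rename the active file to raid-dungeon-phase-N.md.
function archiveDungeon(claudeDir, phase) {
  const archived = path.join(claudeDir, `raid-dungeon-phase-${phase}.md`);
  fs.renameSync(path.join(claudeDir, ACTIVE), archived);
  return archived;
}

// Session ends: remove the active file and every archive.
function removeAllDungeons(claudeDir) {
  for (const name of fs.readdirSync(claudeDir)) {
    if (/^raid-dungeon(-phase-\d+)?\.md$/.test(name)) fs.rmSync(path.join(claudeDir, name));
  }
}

// Demo in a throwaway directory standing in for .claude/.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "raid-"));
openDungeon(dir, "# Dungeon: Phase 1");
archiveDungeon(dir, 1);
openDungeon(dir, "# Dungeon: Phase 2");
console.log(fs.readdirSync(dir).sort());
removeAllDungeons(dir);
```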
129
152
 
130
153
  ### Dungeon Curation Rules
131
154
 
132
- **What goes IN the Dungeon (via `📌 DUNGEON:` only):**
155
+ **What goes IN the Dungeon (via `DUNGEON:` only):**
133
156
  - Findings that survived a challenge (verified truths)
134
157
  - Active unresolved battles (prevents re-litigation)
135
158
  - Shared knowledge promoted by 2+ agents agreeing
@@ -185,30 +208,24 @@ digraph phase_pattern {
185
208
 
186
209
  | Signal | Who | Meaning | Goes to Dungeon? |
187
210
  |--------|-----|---------|------------------|
188
- | `📡 DISPATCH:` | Wizard | Opening a phase, assigning angles | No (phase opening) |
189
- | `⚡ WIZARD OBSERVES:` | Wizard | Brief course correction, hint, nudge | No |
190
- | `⚡ WIZARD INTERVENES:` | Wizard | Stops action, something wrong | No |
191
- | `⚡ WIZARD RULING:` | Wizard | Phase over, binding decision | Ruling archived with Dungeon |
211
+ | `DISPATCH:` | Wizard | Opening a phase, assigning angles | No (phase opening) |
212
+ | `REDIRECT:` | Wizard | Brief course correction — one sentence, then silence | No |
213
+ | `RULING:` | Wizard | Phase over, binding decision | Ruling archived with Dungeon |
192
214
  | `@Name, ...` | Any agent | Direct address to specific agent | No |
193
- | `🔍 FINDING:` | Warrior | Discovery with evidence | Only after surviving challenge |
194
- | `🎯 FINDING:` | Archer | Discovery with evidence | Only after surviving challenge |
195
- | `💀 FINDING:` | Rogue | Discovery with attack scenario | Only after surviving challenge |
196
- | `⚔️ CHALLENGE:` | Warrior | Direct challenge | No |
197
- | `🏹 CHALLENGE:` | Archer | Direct challenge | No |
198
- | `🗡️ CHALLENGE:` | Rogue | Direct challenge | No |
199
- | `🔥 ROAST:` | Any agent | Pointed critique with evidence | No |
200
- | `🔗 BUILDING ON @Name:` | Any agent | Extending another's work | Result goes to Dungeon if verified |
201
- | `📌 DUNGEON:` | Any agent | Pinning verified finding | Yes — this is the write gate |
202
- | `🆘 WIZARD:` | Any agent | Escalation — needs Wizard input | Yes (as escalation point) |
203
- | `✅ CONCEDE:` | Any agent | Admitting wrong, moving on | No |
215
+ | `FINDING:` | Any agent | Discovery with own evidence | No |
216
+ | `CHALLENGE:` | Any agent | Independently verified a claim, found a problem | No |
217
+ | `BUILDING:` | Any agent | Independently verified a claim, found it goes deeper | Result goes to Dungeon if verified |
218
+ | `DUNGEON:` | Any agent | Pinning finding verified by 2+ agents | Yes — this is the write gate |
219
+ | `WIZARD:` | Any agent | Escalation: needs Wizard input | Yes (as escalation point) |
220
+ | `CONCEDE:` | Any agent | Proven wrong, moving on | No |
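
To make the write gate concrete, here is a small sketch of how the simplified signals might be parsed and how a `DUNGEON:` pin could be checked for two or more verifiers. It is an illustration only. The real enforcement lives in `validate-dungeon.sh`, whose contents are not shown here, and the `[verified by @A and @B]` tag format is assumed from the examples in the skill files.

```javascript
// Signal names from the signals table. The parsing rules are an illustration only.
const SIGNALS = ["DISPATCH", "REDIRECT", "RULING", "FINDING", "CHALLENGE",
                 "BUILDING", "DUNGEON", "WIZARD", "CONCEDE"];

// Hypothetical helper: split "SIGNAL: body" into its parts, or return null.
function parseSignal(message) {
  const match = /^([A-Z]+):\s*(.*)$/s.exec(message.trim());
  if (!match || !SIGNALS.includes(match[1])) return null;
  return { signal: match[1], body: match[2] };
}

// Assumption: verifiers are listed as "[verified by @A and @B]".
function verifiers(body) {
  const tag = /\[verified by ([^\]]+)\]/.exec(body);
  return tag ? [...new Set(tag[1].match(/@\w+/g) || [])] : [];
}

// A DUNGEON: pin passes the gate only when 2+ distinct agents verified it.
function canPin(message) {
  const parsed = parseSignal(message);
  return parsed !== null && parsed.signal === "DUNGEON" && verifiers(parsed.body).length >= 2;
}

console.log(canPin("DUNGEON: Confirmed issue - handler.js:23 missing validation [verified by @Archer and @Rogue]"));
console.log(canPin("DUNGEON: Looks wrong to me"));
```

This takes the "verified by 2+ agents" wording literally, so a pin without a verifier tag is rejected even if it is otherwise well formed.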
204
221
 
205
222
  ### Direct Interaction Rules
206
223
 
207
224
  - **Evidence required.** All challenges, roasts, and findings must carry proof — file paths, line numbers, concrete scenarios. "This is wrong" without evidence is laziness.
208
- - **Build explicitly.** `🔗 BUILDING ON @Name:` forces credit and continuity. Don't restart from scratch when someone found something useful.
225
+ - **Build explicitly.** `BUILDING:` forces credit and continuity. Don't restart from scratch when someone found something useful.
209
226
  - **Concede instantly.** When proven wrong, concede. Then find a new angle. No ego.
210
- - **Pin deliberately.** `📌 DUNGEON:` is the quality gate. Only verified, challenged findings get pinned. Other agents can challenge whether a pin belongs.
211
- - **Escalate wisely.** `🆘 WIZARD:` when genuinely stuck, split on fundamentals, or need project-level context. Not when lazy.
227
+ - **Pin deliberately.** `DUNGEON:` is the quality gate. Only verified, challenged findings get pinned. Other agents can challenge whether a pin belongs.
228
+ - **Escalate wisely.** `WIZARD:` when genuinely stuck, split on fundamentals, or need project-level context. Not when lazy.
212
229
 
213
230
  ### When to Escalate to Wizard
214
231
 
@@ -230,14 +247,16 @@ The Wizard observes 90%, acts 10%. Intervention triggers:
230
247
 
231
248
  | Signal | Action |
232
249
  |--------|--------|
233
- | Same arguments 3+ rounds, no new evidence | `⚡ WIZARD INTERVENES:` Break the loop. Rule or redirect. |
234
- | Agents drifting from objective | `⚡ WIZARD OBSERVES:` Redirect with clarity. |
235
- | Agents stuck, no progress (deadlock) | `⚡ WIZARD INTERVENES:` Rule with rationale. Binding. |
236
- | Shallow work, rubber-stamping (laziness) | `⚡ WIZARD INTERVENES:` Call out and demand genuine challenge. |
237
- | Defending past evidence (ego) | `⚡ WIZARD OBSERVES:` Evidence or concede. |
238
- | Wrong finding in Dungeon (misinformation) | `⚡ WIZARD INTERVENES:` Remove and correct. |
239
- | Agent escalation (`🆘 WIZARD:`) | Answer or redirect as appropriate. |
240
- | All agents converged | `⚡ WIZARD RULING:` Synthesize and close. |
250
+ | Same arguments 3+ rounds, no new evidence | `REDIRECT:` Break the loop. Or `RULING:` if unresolvable. |
251
+ | Agents drifting from objective | `REDIRECT:` One sentence back on track. |
252
+ | Agents stuck, no progress (deadlock) | `RULING:` Decide with rationale. Binding. |
253
+ | Shallow work, rubber-stamping (laziness) | `REDIRECT:` Demand genuine independent verification. |
254
+ | Skipped verification (responded without own evidence) | `REDIRECT:` "Verify first, then respond." |
255
+ | Premature convergence (agreed without challenging) | `REDIRECT:` "Challenge before agreeing." |
256
+ | Defending past evidence (ego) | `REDIRECT:` Evidence or concede. |
257
+ | Wrong finding in Dungeon (misinformation) | `REDIRECT:` Remove and correct. |
258
+ | Agent escalation (`WIZARD:`) | Answer or redirect as appropriate. |
259
+ | All agents converged with genuine verification | `RULING:` Synthesize and close. |
241
260
 
242
261
  ## Red Flags — Thoughts That Signal Violations
243
262
 
@@ -267,17 +286,23 @@ The Wizard observes 90%, acts 10%. Intervention triggers:
  | `raid-debugging` | Any | Competing hypothesis with direct debate |
  | `raid-verification` | Any | Evidence before completion claims |
  | `raid-git-worktrees` | 3 | Isolated workspace setup |
+ | `raid-browser` | 3, 4 | Browser orchestration: startup discovery, boot/cleanup, pre-flight |
+ | `raid-browser-playwright` | 3 | Automated browser TDD with Playwright MCP |
+ | `raid-browser-chrome` | 4 | Live adversarial Chrome inspection |
 
  ## Hooks Reference
 
+ All hooks source `raid-lib.sh` for shared session/config parsing.
+
  | Hook | Event | Active | Purpose |
  |------|-------|--------|---------|
+ | `validate-commit.sh` | PreToolUse (Bash) | Always (format), Raid session (tests/verification) | Conventional commits + tests pass + verification evidence |
+ | `validate-write-gate.sh` | PreToolUse (Write/Edit) | Raid session only | Phase-aware write gate (design doc before code) |
  | `validate-file-naming.sh` | PostToolUse (Write/Edit) | Always | Enforce naming conventions |
- | `validate-commit-message.sh` | PreToolUse (Bash) | Always | Conventional commits |
- | `validate-tests-pass.sh` | PreToolUse (Bash) | Raid session only | Tests before commits |
- | `validate-phase-gate.sh` | PreToolUse (Write) | Raid session only | Design doc before code |
  | `validate-no-placeholders.sh` | PostToolUse (Write/Edit) | Always | No TBD/TODO in specs/plans |
- | `validate-verification.sh` | PreToolUse (Bash) | Raid session only | Test evidence before completion |
+ | `validate-dungeon.sh` | PostToolUse (Write/Edit) | Raid session only | Dungeon discipline enforcement |
+ | `validate-browser-cleanup.sh` | PostToolUse (Bash) | Raid session + browser enabled | Warn if browser ports still occupied |
+ | `validate-browser-tests-exist.sh` | PreToolUse (Bash) | Raid session + browser enabled | Warn if browser-facing code has no Playwright tests |
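
The Active column of the hooks table can be read as a predicate over session state. The following is a minimal TypeScript sketch of that reading, not the package's implementation: the real hooks are shell scripts, and `HookContext`, `isHookActive`, and `activeHooks` are hypothetical names introduced for illustration. `validate-commit.sh` is left out because its format check always runs while its test and verification checks apply only during a Raid session.

```typescript
// Hypothetical model of the "Active" column; the real hooks are shell scripts.
type Activation = "always" | "raid-session" | "raid-session+browser";

interface HookContext {
  raidSessionActive: boolean; // assumption: a Raid session is in progress
  browserEnabled: boolean;    // assumption: mirrors `browser.enabled` in raid.json
}

function isHookActive(activation: Activation, ctx: HookContext): boolean {
  switch (activation) {
    case "always":
      return true;
    case "raid-session":
      return ctx.raidSessionActive;
    case "raid-session+browser":
      return ctx.raidSessionActive && ctx.browserEnabled;
  }
}

// Activation per hook, transcribed from the new rows of the table.
const hooks: { name: string; activation: Activation }[] = [
  { name: "validate-file-naming.sh", activation: "always" },
  { name: "validate-no-placeholders.sh", activation: "always" },
  { name: "validate-write-gate.sh", activation: "raid-session" },
  { name: "validate-dungeon.sh", activation: "raid-session" },
  { name: "validate-browser-cleanup.sh", activation: "raid-session+browser" },
  { name: "validate-browser-tests-exist.sh", activation: "raid-session+browser" },
];

// Names of the hooks that would do work in the given context.
function activeHooks(ctx: HookContext): string[] {
  return hooks
    .filter((h) => isHookActive(h.activation, ctx))
    .map((h) => h.name);
}
```

Outside a Raid session only the two always-on hooks remain in the result, regardless of the browser setting.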
 
  ## Commit Convention
 
@@ -43,11 +43,13 @@ digraph review {
  3. **Dispatch** — all agents review independently, then interact directly
  4. **Observe the fight** — agents challenge findings and missing findings directly
  5. **Close** — categorize surviving issues by severity from Dungeon
- 6. **Rule on fixes** — Critical and Important must be fixed
- 7. **Verify fixes** — targeted re-attack after fixes (use `raid-verification`)
- 8. **Final ruling** — approved or rejected
- 9. **Archive Dungeon** — rename to `.claude/raid-dungeon-phase-4.md`
- 10. **Transition** — invoke `raid-finishing`
+ 6. **Browser inspection** — dispatch agents to inspect in Chrome (if `browser.enabled`)
+ 7. **Observe browser fights** — agents cross-verify findings on separate instances
+ 8. **Rule on fixes** — Critical and Important must be fixed (code AND browser)
+ 9. **Verify fixes** — targeted re-attack after fixes (use `raid-verification`)
+ 10. **Final ruling** — approved or rejected
+ 11. **Archive Dungeon** — rename to `.claude/raid-dungeon-phase-4.md`
+ 12. **Transition** — invoke `raid-finishing`
 
  ## Opening the Dungeon
 
@@ -71,7 +73,7 @@ Create `.claude/raid-dungeon.md`:
 
  ## Dispatch
 
- **📡 DISPATCH:**
+ **DISPATCH:**
 
  > **@Warrior**: Review full implementation. Run every test. Check error handling at every boundary. Verify all requirements from design doc. Find the bugs that crash in production. Then fight @Archer and @Rogue over their findings.
  >
@@ -79,7 +81,7 @@ Create `.claude/raid-dungeon.md`:
  >
  > **@Rogue**: Review full implementation. Think like an attacker. What inputs break it? What timing causes races? What happens when dependencies fail? Find the bugs nobody else will find. Then fight @Warrior and @Archer.
  >
- > **All**: Review independently first, then fight directly. Challenge each other's findings AND each other's blind spots. Pin severity-classified issues to Dungeon with `📌 DUNGEON:`. Reference the Phase 3 Dungeon for context.
+ > **All**: Review independently first, then fight directly. Challenge each other's findings AND each other's blind spots. Pin severity-classified issues to Dungeon with `DUNGEON:`. Reference the Phase 3 Dungeon for context.
 
  ## Review Checklist — Each Agent
 
@@ -99,10 +101,10 @@ Create `.claude/raid-dungeon.md`:
 
  After independent reviews, agents fight DIRECTLY over findings AND missing findings:
 
- - `⚔️ CHALLENGE: @Archer, you gave the auth module a pass but didn't check the session rotation path — review it now.`
- - `🔗 BUILDING ON @Warrior: Your finding about the missing error handler — the impact is worse than you stated because...`
- - `🔥 ROAST: @Rogue, your "Critical" severity on the naming inconsistency is overblown — here's why it's actually Minor...`
- - `📌 DUNGEON: [Critical] handler.js:23 — missing input validation allows injection. Verified by @Warrior and @Rogue.`
+ - `CHALLENGE: @Archer, you gave the auth module a pass but didn't check the session rotation path — review it now.`
+ - `BUILDING: @Warrior, your finding about the missing error handler — the impact is worse than you stated because...`
+ - `CHALLENGE: @Rogue, your "Critical" severity on the naming inconsistency is overblown — here's why it's actually Minor...`
+ - `DUNGEON: [Critical] handler.js:23 — missing input validation allows injection. Verified by @Warrior and @Rogue.`
 
  **Agents classify severity when pinning to Dungeon:**
 
@@ -112,13 +114,40 @@ After independent reviews, agents fight DIRECTLY over findings AND missing findi
  | **Important** | Missing features, poor error handling, test gaps, naming inconsistencies | Must fix. |
  | **Minor** | Style, docs, optimization | Note for future. |
 
+ ## Browser Inspection Phase (when `browser.enabled` in raid.json)
+
+ After code review findings are pinned, the Wizard announces browser inspection.
+
+ ### Process
+
+ 1. **Wizard announces:** "Browser inspection phase — each reviewer boots their own instance"
+ 2. **Each reviewer BOOTs** their own app instance on separate ports (invoke `raid-browser`)
+ 3. **Each reviewer runs PRE-FLIGHT** — state test subject, check auth, discover routes
+ 4. **Each reviewer LOGINs** if auth is required (credentials from `.env.raid`)
+ 5. **Each reviewer inspects** from their angle (invoke `raid-browser-chrome`):
+ - Minimum gates first (console, network, page loads)
+ - Then angle-driven exploration (Warrior: stress, Archer: visual/precision, Rogue: security)
+ - Evidence captured for every finding (GIF, screenshot, console/network)
+ 6. **Cross-verification** — each reviewer reproduces others' findings on their own instance
+ 7. **Pin browser findings** to Dungeon alongside code review findings
+ 8. **Each reviewer CLEANUPs** their instance
+ 9. **Wizard rules** on ALL findings (code + browser) together
+
+ ### Browser findings follow the same severity rules:
+
+ - **Critical** (crash, security, layout broken) — must fix
+ - **Important** (broken feature, visual inconsistency, responsive breakage) — must fix
+ - **Minor** (polish, console warnings) — note for future
+
+ **Browser bugs block merge the same way code bugs do.**
+
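
The severity rules above amount to a simple gate: any unfixed Critical or Important finding blocks the merge, whether it came from code review or from the browser. The sketch below illustrates only that severity part of the decision and is not the package's implementation. The `Finding` shape, the `fixed` flag, and the `rule` function are hypothetical, and other conditions such as passing tests are deliberately not modelled.

```typescript
type Severity = "Critical" | "Important" | "Minor";

interface Finding {
  severity: Severity;
  source: "code" | "browser"; // both sources are ruled on together
  fixed: boolean;             // hypothetical flag: fix confirmed by a re-attack
  summary: string;
}

type Ruling =
  | { verdict: "APPROVED FOR MERGE"; deferred: Finding[] }
  | { verdict: "REJECTED"; blocking: Finding[] };

function rule(findings: Finding[]): Ruling {
  // Critical and Important findings must be fixed, regardless of source.
  const blocking = findings.filter((f) => f.severity !== "Minor" && !f.fixed);
  if (blocking.length > 0) {
    return { verdict: "REJECTED", blocking };
  }
  // Unfixed Minor findings do not block; they are noted for the future.
  const deferred = findings.filter((f) => f.severity === "Minor" && !f.fixed);
  return { verdict: "APPROVED FOR MERGE", deferred };
}
```

Because `source` plays no part in the filter, a browser finding blocks in exactly the same way as a code finding of equal severity.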
  ## Closing the Phase
 
  The Wizard closes when agents have exhausted their findings and the Dungeon has all issues classified:
 
- **⚡ WIZARD RULING: APPROVED FOR MERGE** — all Critical/Important fixed, tests pass, requirements met.
+ **RULING: APPROVED FOR MERGE** — all Critical/Important fixed, tests pass, requirements met.
 
- **⚡ WIZARD RULING: REJECTED** — specify what must change and which phase to return to.
+ **RULING: REJECTED** — specify what must change and which phase to return to.
 
  ## Red Flags
 
@@ -78,6 +78,42 @@ Run: test command from `.claude/raid.json`
  - Keep tests green throughout — run after every change
  - Refactor the TESTS too, not just the implementation
 
+ ## Browser-Aware TDD (when `browser.enabled` in raid.json)
+
+ ### Deciding Test Type
+
+ Before writing the test, decide: is this a unit test or a browser test?
+
+ | Write Browser Test | Write Unit Test Only |
+ |---|---|
+ | New user-facing flow (signup, checkout) | Pure utility function |
+ | UI interaction (drag-drop, modal, form) | API endpoint logic |
+ | Client-side routing / navigation | Data transformation |
+ | Visual state changes (loading, error, empty) | Business rule validation |
+ | Integration between frontend and API | Database queries |
+
+ - **If both:** Write the unit test FIRST, then the browser test
+ - **State your reasoning** — challengers will attack this decision
+ - **When unsure:** Write the browser test. Better to have it and not need it.
+
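
The table and the three rules above can be sketched as a small planning function. This is only an illustration under stated assumptions: the `ChangeKind` identifiers are hypothetical labels for the table rows, and `planTests` is not part of the package.

```typescript
// Hypothetical labels for the rows of the table above.
type ChangeKind =
  | "user-flow"
  | "ui-interaction"
  | "client-routing"
  | "visual-state"
  | "frontend-api-integration"
  | "utility-function"
  | "api-endpoint-logic"
  | "data-transformation"
  | "business-rule"
  | "database-query"
  | "unsure";

type TestType = "unit" | "browser";

// Left-hand column of the table: changes that call for a browser test.
const browserKinds: ChangeKind[] = [
  "user-flow",
  "ui-interaction",
  "client-routing",
  "visual-state",
  "frontend-api-integration",
];

// Returns the tests to write, in the order they should be written.
function planTests(kinds: ChangeKind[]): TestType[] {
  // "When unsure: write the browser test."
  const needsBrowser = kinds.some(
    (k) => k === "unsure" || browserKinds.indexOf(k) !== -1
  );
  const needsUnit = kinds.some(
    (k) => k !== "unsure" && browserKinds.indexOf(k) === -1
  );
  const plan: TestType[] = [];
  if (needsUnit) plan.push("unit"); // "If both: unit test FIRST"
  if (needsBrowser) plan.push("browser");
  return plan;
}
```

A change touching both a business rule and a visual state would yield the unit test first and the browser test second; the reasoning behind the chosen kinds still has to be stated, since challengers will attack it.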
+ ### Browser TDD Cycle
+
+ Follow the same RED-GREEN-REFACTOR discipline but with Playwright:
+
+ 1. **RED:** Write `.spec.ts` with user behavior assertions + console/network checks
+ 2. **Verify RED:** Run `{execCommand} playwright test` — must fail for the RIGHT reason
+ 3. **GREEN:** Implement feature → test passes
+ 4. **Verify GREEN:** Run FULL suite (unit + browser) → all green
+ 5. **REFACTOR:** Clean up → re-run all
+
+ Use `raid-browser-playwright` for detailed guidance. Invoke `raid-browser` for pre-flight and boot.
+
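
The "console/network checks" in the RED step could be factored into a helper that a spec calls after exercising the page. The sketch below is one possible shape, not what `raid-browser-playwright` prescribes: `ConsoleEntry`, `ResponseEntry`, and `infrastructureProblems` are hypothetical, and treating every status of 400 or above as a failure is an assumption.

```typescript
// Plain records a spec could collect while the page is exercised.
interface ConsoleEntry {
  type: string; // e.g. "error", "warning", "log"
  text: string;
}

interface ResponseEntry {
  url: string;
  status: number;
}

// Returns a human-readable list of infrastructure problems; empty means healthy.
function infrastructureProblems(
  consoleEntries: ConsoleEntry[],
  responses: ResponseEntry[]
): string[] {
  const problems: string[] = [];
  for (const entry of consoleEntries) {
    if (entry.type === "error") {
      problems.push(`console error: ${entry.text}`);
    }
  }
  for (const res of responses) {
    // Assumption: any 4xx/5xx response counts as a network health failure.
    if (res.status >= 400) {
      problems.push(`HTTP ${res.status}: ${res.url}`);
    }
  }
  return problems;
}
```

In a Playwright spec the two arrays could be filled from `page.on('console', ...)` and `page.on('response', ...)` listeners, with the test asserting that the returned list is empty alongside its user-behavior assertions.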
+ ### "Tests pass" = Unit AND Browser Tests
+
+ When claiming tests pass, both must pass:
+ - Unit: test command from `raid.json`
+ - Browser: `{execCommand} playwright test`
+
  ## Adversarial Test Review
 
  After TDD cycle, challengers attack the TESTS directly — and build on each other's critiques:
@@ -89,9 +125,15 @@ After TDD cycle, challengers attack the TESTS directly — and build on each oth
  5. **Would this catch a regression?** If someone changes the implementation next month, does this test catch the break?
 
  **Challengers interact directly:**
- - `⚔️ CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
- - `🔗 BUILDING ON @Archer: Your edge case finding — the same gap exists in the error path test at line 32...`
- - `🔥 ROAST: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
+ - `CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
+ - `BUILDING: @Archer, your edge case finding — the same gap exists in the error path test at line 32...`
+ - `CHALLENGE: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
+
+ **Browser-specific attacks (when `browser.enabled`):**
+
+ 6. **This is a user-facing feature but you only wrote unit tests — where's the browser test?** If the user interacts with it in a browser, it needs a browser test.
+ 7. **Your browser test checks the DOM but doesn't assert on console errors or network health.** Infrastructure assertions are mandatory.
+ 8. **You tested at desktop width only — what about mobile?** Responsive behavior is Important severity.
 
  **Challengers don't just report to the Wizard — they fight each other over test quality.**
 
@@ -53,6 +53,17 @@ This gate applies to EVERY status claim: "tests pass", "bug fixed", "feature com
  | "Feature complete" | All acceptance criteria verified with evidence | Self-assessment without running tests |
  | "Regression test works" | Red-green cycle verified | Test passes once without seeing it fail |
 
+ ### Browser Verification (when `browser.enabled` in raid.json)
+
+ "Tests pass" means BOTH unit and browser tests pass:
+
+ | Claim | Requires |
+ |---|---|
+ | "Tests pass" | Unit test command output: 0 failures AND `{execCommand} playwright test`: 0 failures |
+ | "Feature complete" | All acceptance criteria verified WITH browser test evidence |
+
+ If the project's test command doesn't include Playwright, the agent MUST run it separately and report both results.
+
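
The "Tests pass" row above can be expressed as a check over two run results. This is a minimal sketch under stated assumptions: `RunResult` and `canClaimTestsPass` are hypothetical, and the failure counts are assumed to have been read from real command output beforehand.

```typescript
// Outcome of one test command, taken from its actual output.
interface RunResult {
  command: string;  // the command that was run, reported as evidence
  failures: number; // number of failing tests in that run
}

interface ClaimCheck {
  ok: boolean;
  reason: string;
}

function canClaimTestsPass(
  unit: RunResult,
  browser: RunResult | null, // null when the browser suite was not run
  browserEnabled: boolean    // assumption: mirrors `browser.enabled` in raid.json
): ClaimCheck {
  if (unit.failures !== 0) {
    return { ok: false, reason: `${unit.command}: ${unit.failures} failures` };
  }
  if (browserEnabled) {
    if (browser === null) {
      return { ok: false, reason: "browser tests were not run" };
    }
    if (browser.failures !== 0) {
      return { ok: false, reason: `${browser.command}: ${browser.failures} failures` };
    }
  }
  return { ok: true, reason: "all required suites report 0 failures" };
}
```

With the browser feature enabled, a green unit run alone is rejected until a separate browser run is reported as well.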
  ## Forbidden Phrases Without Evidence
 
  These phrases are NEVER allowed without preceding verification output:
@@ -74,7 +85,7 @@ The implementer's claim is NOT sufficient. Challengers verify AND cross-check ea
  1. **Implementer verifies** — runs tests, reports with evidence (command + output)
  2. **Challenger 1 verifies independently** — runs same tests, confirms output matches
  3. **Challenger 2 verifies adversarially** — runs tests PLUS tries to break it with edge cases
- 4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `🔗 BUILDING ON @Warrior: Your verification missed the integration test at...`
+ 4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `BUILDING: @Warrior, your verification missed the integration test at...`
 
  Only after all required verifications confirm — and challengers have cross-checked each other — does the Wizard accept.