claude-raid 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/README.md +298 -196
  2. package/bin/cli.js +45 -18
  3. package/package.json +1 -1
  4. package/src/descriptions.js +57 -0
  5. package/src/detect-browser.js +164 -0
  6. package/src/detect-package-manager.js +107 -0
  7. package/src/detect-project.js +44 -6
  8. package/src/doctor.js +12 -188
  9. package/src/init.js +192 -17
  10. package/src/merge-settings.js +63 -7
  11. package/src/remove.js +28 -4
  12. package/src/setup.js +405 -0
  13. package/src/ui.js +168 -0
  14. package/src/update.js +62 -5
  15. package/src/version-check.js +130 -0
  16. package/template/.claude/agents/archer.md +46 -51
  17. package/template/.claude/agents/rogue.md +43 -49
  18. package/template/.claude/agents/warrior.md +48 -53
  19. package/template/.claude/agents/wizard.md +65 -67
  20. package/template/.claude/hooks/raid-lib.sh +182 -0
  21. package/template/.claude/hooks/raid-pre-compact.sh +41 -0
  22. package/template/.claude/hooks/raid-session-end.sh +116 -0
  23. package/template/.claude/hooks/raid-session-start.sh +52 -0
  24. package/template/.claude/hooks/raid-stop.sh +68 -0
  25. package/template/.claude/hooks/raid-task-completed.sh +37 -0
  26. package/template/.claude/hooks/raid-task-created.sh +40 -0
  27. package/template/.claude/hooks/raid-teammate-idle.sh +28 -0
  28. package/template/.claude/hooks/validate-browser-cleanup.sh +36 -0
  29. package/template/.claude/hooks/validate-browser-tests-exist.sh +52 -0
  30. package/template/.claude/hooks/validate-commit.sh +130 -0
  31. package/template/.claude/hooks/validate-dungeon.sh +114 -0
  32. package/template/.claude/hooks/validate-file-naming.sh +13 -27
  33. package/template/.claude/hooks/validate-no-placeholders.sh +11 -21
  34. package/template/.claude/hooks/validate-write-gate.sh +60 -0
  35. package/template/.claude/raid-rules.md +27 -18
  36. package/template/.claude/skills/raid-browser/SKILL.md +186 -0
  37. package/template/.claude/skills/raid-browser-chrome/SKILL.md +189 -0
  38. package/template/.claude/skills/raid-browser-playwright/SKILL.md +163 -0
  39. package/template/.claude/skills/raid-debugging/SKILL.md +6 -6
  40. package/template/.claude/skills/raid-design/SKILL.md +10 -10
  41. package/template/.claude/skills/raid-finishing/SKILL.md +11 -3
  42. package/template/.claude/skills/raid-implementation/SKILL.md +26 -11
  43. package/template/.claude/skills/raid-implementation-plan/SKILL.md +15 -4
  44. package/template/.claude/skills/raid-protocol/SKILL.md +57 -32
  45. package/template/.claude/skills/raid-review/SKILL.md +42 -13
  46. package/template/.claude/skills/raid-tdd/SKILL.md +45 -3
  47. package/template/.claude/skills/raid-verification/SKILL.md +12 -1
  48. package/template/.claude/hooks/validate-commit-message.sh +0 -78
  49. package/template/.claude/hooks/validate-phase-gate.sh +0 -60
  50. package/template/.claude/hooks/validate-tests-pass.sh +0 -43
  51. package/template/.claude/hooks/validate-verification.sh +0 -70
package/template/.claude/hooks/validate-write-gate.sh
@@ -0,0 +1,60 @@
+ #!/usr/bin/env bash
+ # Raid write gate: phase-aware controller for Write operations
+ # PreToolUse hook — blocks or allows writes based on current Raid phase, mode, and agent role.
+ set -euo pipefail
+
+ HOOK_DIR="$(cd "$(dirname "$0")" && pwd)"
+ source "$HOOK_DIR/raid-lib.sh"
+
+ raid_read_input
+
+ # No file path — nothing to gate
+ if [ -z "${RAID_FILE_PATH:-}" ]; then
+   exit 0
+ fi
+
+ # No active session — allow everything
+ if [ "$RAID_ACTIVE" = "false" ]; then
+   exit 0
+ fi
+
+ # Non-production files (docs, tests, config, .claude) are always allowed
+ if ! raid_is_production_file "$RAID_FILE_PATH"; then
+   exit 0
+ fi
+
+ # --- Phase-based enforcement on production files ---
+
+ case "${RAID_PHASE:-}" in
+   design)
+     raid_block "Read-only phase (design). No implementation code allowed."
+     ;;
+   plan)
+     raid_block "Read-only phase (plan). No implementation code allowed."
+     ;;
+   implementation)
+     # Scout mode: skip implementer check
+     if [ "$RAID_MODE" = "scout" ]; then
+       exit 0
+     fi
+     # Only the designated implementer may write production code
+     if [ "$RAID_CURRENT_AGENT" != "$RAID_IMPLEMENTER" ]; then
+       raid_block "Only ${RAID_IMPLEMENTER} writes production code this task."
+     fi
+     exit 0
+     ;;
+   review)
+     if [ "$RAID_MODE" = "skirmish" ]; then
+       raid_warn "Read-only phase (review). File fixes go through implementation."
+     else
+       raid_block "Read-only phase (review). File fixes go through implementation."
+     fi
+     ;;
+   finishing)
+     raid_block "Finishing phase. No new code."
+     ;;
+   *)
+     # Unknown or empty phase — fail open
+     exit 0
+     ;;
+ esac
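The gate depends on helpers sourced from `raid-lib.sh` (a new file in this release, +182 lines, not shown in this hunk). A minimal sketch of what those helpers plausibly look like, assuming the standard Claude Code hook contract (PreToolUse hooks receive a JSON payload on stdin; exit code 2 blocks the tool call and surfaces stderr to the agent). Function bodies here are illustrative, not the package's actual implementation:

```shell
#!/usr/bin/env bash
# Hypothetical sketches of the raid-lib.sh helpers the write gate calls.
# Assumes the Claude Code hook contract: JSON payload on stdin, exit code 2
# blocks the tool call. The real helpers live in raid-lib.sh.

raid_read_input() {
  # Pull the target path out of the hook payload (requires jq).
  RAID_FILE_PATH="$(jq -r '.tool_input.file_path // empty' 2>/dev/null || true)"
}

raid_block() {
  # Exit 2 = block the Write and show the reason to the agent.
  echo "BLOCKED: $1" >&2
  exit 2
}

raid_warn() {
  # Surface a warning but allow the Write to proceed.
  echo "WARNING: $1" >&2
  exit 0
}
```

Under this assumed contract, a `raid_block` during the design phase makes Claude Code refuse the Write and feed the message back to the agent, which is exactly the behavior the case statement above relies on.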
package/template/.claude/raid-rules.md
@@ -1,21 +1,30 @@
  # Raid Team Rules
 
- These are non-negotiable. Every agent follows them at all times.
+ Three pillars. Non-negotiable. Every agent, every phase, every interaction.
 
- 1. **No subagents.** This team uses agent teams only. Never delegate to subagents.
- 2. **No laziness.** Every review is genuine. Every challenge carries new evidence or a new angle. No rubber-stamping.
- 3. **No trust without verification.** Verify independently. Reports lie — read actual code.
- 4. **Learn from mistakes.** When proven wrong, absorb the lesson. When another agent errs, learn from that too. Don't repeat errors.
- 5. **Make every move count.** Limited moves, like a board game. No endless disputes. No circular arguments. Every interaction must carry the work forward. If you've made your point and been heard, move on.
- 6. **Share knowledge.** Competitors but also a team. Discoveries are shared. The goal is maximum quality, not personal victory.
- 7. **No ego.** Defend ideas with evidence. If you don't have evidence, don't defend. If proven wrong, concede instantly. Don't be stubborn. Don't hallucinate. Hesitate if uncertain.
- 8. **Stay active.** All assigned agents participate at every step. No sitting idle while others work.
- 9. **Wizard is the human interface.** Agents ask the Wizard for clarification. Only the Wizard asks the human important questions. Agents may ask the human only if the Wizard explicitly permits it.
- 10. **Wizard is impartial.** No preference for any agent. Judge by evidence, not by source.
- 11. **Wizard observes 90%, acts 10%.** The Wizard analyzes, judges, and maintains order. Speaks only when 90% confident or when the team is misaligned.
- 12. **Maximum effort. Always.** Every agent runs at full capability on every task.
- 13. **No hallucination.** If you don't know something, say so. Never fabricate evidence, certainty, or findings.
- 14. **Dungeon discipline.** Only pin verified findings with `📌 DUNGEON:`. Don't spam. If challenged on a pin, defend with evidence or remove it. The Dungeon is a scoreboard, not a chat log.
- 15. **Direct engagement.** Address agents by name with `@Name`. Build on each other's work explicitly with `🔗 BUILDING ON @Name:`. No broadcasting into void. No waiting for the Wizard to relay.
- 16. **Escalate wisely.** Pull the Wizard with `🆘 WIZARD:` when genuinely stuck, split on fundamentals, or uncertain about project-level context. If you can resolve it by reading code or talking to another agent, do that first. Lazy escalation wastes the Wizard's attention.
- 17. **Roast with evidence.** Every `🔥 ROAST:` carries proof — file paths, line numbers, concrete scenarios. "This is wrong" without showing why is laziness, not challenge.
+ ## Pillar 1: Intellectual Honesty
+
+ - Every claim has evidence you gathered yourself. No exceptions.
+ - If you haven't read the code or run the command this turn, you don't know what it says.
+ - If you don't know, say so. Guessing is worse than silence.
+ - Never respond to a finding you haven't independently verified. Read the code. Run the test. Form your own conclusion first. Then respond — with your evidence, not theirs.
+ - "Reports lie" includes your own reports from prior turns. Verify fresh.
+ - Never fabricate evidence, certainty, or findings.
+
+ ## Pillar 2: Zero Ego Collaboration
+
+ - When proven wrong, concede instantly. No face to save; only the output matters.
+ - Defend with evidence, never with authority or repetition.
+ - A teammate catching your mistake is a gift. Absorb the lesson, carry it forward.
+ - Share findings immediately. Hoarding information serves ego, not quality.
+ - Build on each other's work genuinely. The best findings come from combining perspectives: Warrior's stress test, sharpened by Archer's pattern analysis, weaponized by Rogue's attack scenario.
+
+ ## Pillar 3: Discipline and Efficiency
+
+ - Maximum effort on every task. No coasting, no rubber-stamping, no going through the motions.
+ - Every interaction carries work forward. If you're not adding new information or evidence, stop talking.
+ - The Dungeon is a scoreboard, not a chat log. Pin only what survived challenge from at least two agents.
+ - Agents talk directly to each other. The Wizard is not a relay.
+ - Escalate to the Wizard only after you've tried to resolve it by reading code and discussing with teammates.
+ - All agents participate actively at every step. Silence when you have nothing to add is fine — silence when you haven't investigated is laziness.
+ - This team uses agent teams only. Never delegate to subagents.
package/template/.claude/skills/raid-browser/SKILL.md
@@ -0,0 +1,186 @@
+ ---
+ name: raid-browser
+ description: "Core browser orchestration: startup discovery, boot/cleanup lifecycle, port isolation, pre-flight checks (auth, test subject clarity). Shared infrastructure for raid-browser-playwright and raid-browser-chrome."
+ ---
+
+ # Raid Browser — Core Orchestration
+
+ Shared infrastructure for browser testing. Handles startup discovery, boot/cleanup lifecycle, port isolation, and pre-flight checks.
+
+ **This skill is invoked by `raid-browser-playwright` and `raid-browser-chrome` — not directly by users.**
+
+ ## The Iron Laws
+
+ ```
+ 1. EVERY BOOT HAS A MATCHING CLEANUP — leaked processes are never acceptable
+ 2. EVERY BROWSER SESSION STARTS WITH PRE-FLIGHT — no vague "test the app"
+ 3. STARTUP RECIPE IS DISCOVERED ONCE, CODIFIED FOREVER — investigate, verify, write to raid.json
+ ```
+
+ ## Pre-Flight Checks (MANDATORY before every browser session)
+
+ ### 1. Test Subject Clarity (HARD GATE)
+
+ Before ANY browser action, the agent MUST state exactly what they're testing:
+
+ ```
+ BROWSER TEST SUBJECT:
+ - Feature: "<specific feature name>"
+ - Scope: "<what interactions/flows are being tested>"
+ - Success criteria: "<what 'working' looks like>"
+ - Out of scope: "<what we're NOT testing>"
+ ```
+
+ If the agent cannot clearly state the test subject, the Wizard asks the user:
+
+ ```
+ WIZARD → USER: "What specific user flow should we verify in the browser?
+
+ Examples:
+ - 'User can complete checkout with a credit card'
+ - 'Admin dashboard loads data tables correctly'
+ - 'Search filters update results in real-time'"
+ ```
+
+ **No vague subjects.** "Test the app" or "check if it works" are not valid subjects.
+
+ ### 2. Authentication Check
+
+ The agent investigates auth requirements by reading:
+ - Auth middleware, login pages, session config, protected routes
+ - `.env.example` for auth-related variables
+ - README for auth setup instructions
+
+ If auth is required and no credentials exist in `raid.json` under `browser.auth`:
+
+ ```
+ WIZARD → USER: "This app requires authentication. I need:
+ 1. Test user credentials (email/password) or a method to create them
+ 2. Are there different roles to test? (admin, user, guest)
+ 3. Is there a seed script that creates test users?
+
+ Credentials will be stored in .env.raid (gitignored)."
+ ```
+
+ Auth config in `raid.json` (credentials reference env vars from `.env.raid`):
+
+ ```json
+ "auth": {
+   "required": true,
+   "method": "cookie-session",
+   "loginUrl": "/login",
+   "credentials": {
+     "default": { "email": "$RAID_TEST_EMAIL", "password": "$RAID_TEST_PASSWORD" },
+     "admin": { "email": "$RAID_ADMIN_EMAIL", "password": "$RAID_ADMIN_PASSWORD" }
+   },
+   "seedCommand": "{runCommand} db:seed-test-users"
+ }
+ ```
+
+ ### 3. Route/Page Discovery
+
+ After boot, before testing, the agent maps relevant pages:
+ - What URLs are involved in this feature?
+ - What's the expected navigation flow?
+ - Loading states, redirects, client-side routing?
+
+ Pin to Dungeon as verified context for all agents.
+
+ ## Startup Discovery Protocol
+
+ Invoked when `browser.startup` is `null` in `raid.json`. The Wizard assigns one agent to investigate.
+
+ ### Investigation Steps
+
+ 1. **Read project config** — `package.json` scripts, `.env.example`, `.env.local.example`, `docker-compose.yml`, `wrangler.toml`, `vercel.json`, `netlify.toml`, `Procfile`
+ 2. **Read README** — "Getting Started", "Development", "Running locally" sections
+ 3. **Map runtime topology** — identify every process needed:
+    - Primary dev server (the detected framework)
+    - API servers / backend processes
+    - Edge workers (Cloudflare Workers, Vercel Edge, etc.)
+    - Databases (Postgres, MySQL, Redis, SQLite)
+    - Message queues, search engines, etc.
+    - Seed/migration scripts that must run first
+ 4. **Identify environment variables** — which need to differ per instance (DB names, ports), which are shared (API keys)
+ 5. **Test the recipe** — boot on a non-default port, run health check, tear down
+ 6. **Pin to Dungeon** — `DUNGEON: Startup recipe verified — [full recipe details]`
+ 7. **Write to `raid.json`** — populate `browser.startup`
+
+ ### Challengers Attack the Recipe
+
+ - Is the cleanup complete? What if a service crashes mid-boot?
+ - Does it handle port conflicts?
+ - What about stale PID files?
+ - Does the DB migration run idempotently?
+
+ ### Startup Recipe Format (in `raid.json`)
+
+ ```json
+ "startup": {
+   "env": { "DATABASE_URL": "postgresql://localhost:5432/test_{{PORT}}" },
+   "services": [
+     { "name": "db", "command": "docker compose up -d postgres" },
+     { "name": "edge", "command": "wrangler dev --port {{EDGE_PORT}}" },
+     { "name": "app", "command": "{devCommand} --port {{PORT}}" }
+   ],
+   "readyCheck": "curl -s http://localhost:{{PORT}}/api/health",
+   "cleanup": ["kill {{PID}}", "docker compose down"]
+ }
+ ```
+
+ Template variables `{{PORT}}`, `{{EDGE_PORT}}`, `{{PID}}` are resolved per-agent at runtime.
+
+ ## Boot/Cleanup Lifecycle
+
+ ### BOOT(agentId, port)
+
+ ```
+ 1. Resolve template variables from portRange assignment
+ 2. Set per-instance environment variables
+ 3. Start services in declared order (respecting dependencies)
+ 4. Wait for readyCheck to pass (timeout: 60s, retry every 2s)
+ 5. Record all PIDs for this agent
+ 6. Return { pids, port, baseUrl }
+
+ If any service fails to start:
+ → Kill all already-started services
+ → Report failure with service logs
+ → Do NOT proceed to testing
+ ```
+
+ ### CLEANUP(agentId)
+
+ ```
+ 1. Kill all PIDs tracked for this agent (SIGTERM first)
+ 2. Wait 5s for graceful shutdown
+ 3. SIGKILL any remaining processes
+ 4. Run cleanup commands from startup config
+ 5. Verify all assigned ports are released (lsof -i :PORT)
+ 6. Remove any temp files/DBs created for this instance
+
+ If cleanup fails:
+ → Report which ports/processes are still alive
+ → Wizard escalates to user immediately
+ ```
+
+ ## Port Allocation
+
+ Read `portRange` from `raid.json` (e.g., `[3001, 3005]`).
+
+ | Mode | Agents | Port Assignment |
+ |---|---|---|
+ | Full Raid (Phase 3) | 1 implementer | portRange[0] |
+ | Full Raid (Phase 4) | 3 challengers | portRange[0], portRange[0]+1, portRange[0]+2 |
+ | Skirmish | 2 agents | portRange[0], portRange[0]+1 |
+ | Scout | 1 agent | portRange[0] |
+
+ ## When Startup Recipe Fails
+
+ If the existing `browser.startup` recipe fails on boot:
+
+ 1. Don't retry blindly — investigate what changed
+ 2. Read error logs from failed services
+ 3. Check if dependencies changed (new env vars, new services, port conflicts)
+ 4. Update the recipe in `raid.json`
+ 5. Re-test the updated recipe
+ 6. Pin to Dungeon: `DUNGEON: Startup recipe updated — [reason for change]`
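The recipe format above implies two pieces of runtime plumbing on the BOOT side: substituting `{{PORT}}`-style template variables per agent, and polling the `readyCheck` until it passes. A minimal sketch in shell; the helper names (`resolve_template`, `wait_ready`) are illustrative, not the package's actual API:

```shell
#!/usr/bin/env bash
# Sketch of BOOT-side plumbing for the startup recipe above.
# resolve_template and wait_ready are assumed names for illustration.
set -euo pipefail

resolve_template() {
  # resolve_template "dev --port {{PORT}}" PORT=3001 EDGE_PORT=8787
  local text="$1"; shift
  local pair key val
  for pair in "$@"; do
    key="${pair%%=*}"
    val="${pair#*=}"
    # Replace every occurrence of {{KEY}} with its value.
    text=${text//"{{$key}}"/"$val"}
  done
  echo "$text"
}

wait_ready() {
  # Poll a ready check every 2s until it passes or the deadline
  # (60s in the skill) is exceeded.
  local check="$1" timeout="${2:-60}" waited=0
  until eval "$check" >/dev/null 2>&1; do
    sleep 2
    waited=$((waited + 2))
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
  done
}
```

For example, `resolve_template "curl -s http://localhost:{{PORT}}/api/health" PORT=3001` yields the concrete readyCheck command for the agent assigned port 3001.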
package/template/.claude/skills/raid-browser-chrome/SKILL.md
@@ -0,0 +1,189 @@
+ ---
+ name: raid-browser-chrome
+ description: "Claude-in-Chrome live adversarial browser inspection. Angle-driven with minimum coverage gates. Each agent runs its own isolated instance. GIF/screenshot evidence required. Invoked from raid-review during Phase 4."
+ ---
+
+ # Raid Browser Chrome — Live Adversarial Inspection
+
+ Challengers open a real Chrome browser and do adversarial exploratory testing. Each challenger gets their own isolated app instance. Find what automated tests missed.
+
+ <HARD-GATE>
+ Do NOT start inspection without invoking `raid-browser` pre-flight first. Do NOT skip minimum coverage gates. Do NOT share browser instances between agents. Every finding MUST include evidence (GIF, screenshot, console/network output). No subagents.
+ </HARD-GATE>
+
+ ## Session Lifecycle Per Challenger
+
+ ```
+ 1. BOOT(agentId, assignedPort) ← from raid-browser
+ 2. PRE-FLIGHT(feature) ← state subject, check auth, discover routes
+ 3. LOGIN (if auth required) ← fill credentials, verify logged in
+ 4. MINIMUM GATES ← console, network, page loads (mandatory)
+ 5. ANGLE-DRIVEN INSPECTION ← Warrior/Archer/Rogue specific
+ 6. REPORT ← findings + evidence pinned to Dungeon
+ 7. CLEANUP(agentId) ← kill everything
+ ```
+
+ ## Login Automation (if auth required)
+
+ ```
+ 1. navigate → loginUrl from raid.json
+ 2. form_input → fill credentials (resolved from .env.raid)
+ 3. click → submit button
+ 4. read_page → verify logged in (check for dashboard, user menu, etc.)
+
+ If login fails → pin as CRITICAL finding, skip inspection:
+ DUNGEON [CRITICAL]: Login failed — cannot test authenticated flows
+ ```
+
+ ## Minimum Coverage Gates (MANDATORY for every challenger)
+
+ Before angle-driven inspection, every challenger MUST complete these checks:
+
+ | Check | Tool | Look For |
+ |---|---|---|
+ | Console errors | `read_console_messages` | Errors, unhandled promise rejections, deprecation warnings |
+ | Network failures | `read_network_requests` | 4xx/5xx responses, failed fetches, CORS errors, unexpectedly large payloads |
+ | Page loads | `navigate` + `read_page` | All relevant pages render without blank screens, hydration mismatches, or missing content |
+
+ **Only after ALL gates pass does the challenger proceed to their angle.**
+
+ If a gate fails, pin the finding immediately — don't wait for angle inspection.
+
+ ## Angle-Driven Inspection
+
+ ### Warrior — Stress & Breakage
+
+ Break things. Find what crashes under pressure.
+
+ - **Rapid interactions** — click buttons multiple times fast, submit forms repeatedly
+ - **Large inputs** — paste huge text blocks, upload oversized files, fill numbers with extreme values
+ - **Navigation abuse** — back/forward rapidly, refresh during submission, deep-link to mid-flow pages
+ - **Viewport stress** — `resize_window` to mobile (375px), tablet (768px), ultra-wide (1920px) during interactions
+
+ Evidence format:
+ ```
+ CHALLENGE: Double-clicking "Place Order" submits two orders.
+ Console: "Unhandled rejection: duplicate key constraint"
+ Network: Two POST /api/orders — first returned 201, second returned 500
+ [GIF: warrior-double-submit.gif]
+ ```
+
+ ### Archer — Precision, Visual Consistency & Spec Compliance
+
+ Every pixel matters. Every pattern must be consistent.
+
+ - **Cross-page visual consistency** — same components styled identically across pages? Same button styles, spacing, typography?
+ - **Design system compliance** — correct tokens, spacing values, color variables?
+ - **State visual coverage** — hover, focus, active, disabled, loading, error, empty states all visible and correct?
+ - **Responsive check** — screenshot at 375px (mobile), 768px (tablet), 1280px (desktop), 1920px (ultra-wide) using `resize_window`
+ - **Dark mode / theme consistency** — if the app supports themes, check both
+ - **Tab order and keyboard navigation** — complete the flow without a mouse
+ - **Network efficiency** — redundant API calls? Missing caching? Overfetching data?
+
+ Evidence format:
+ ```
+ CHALLENGE: Search results page makes 3 identical GET /api/products calls on load.
+ Network: Duplicate requests at 0ms, 50ms, 120ms — useEffect re-render bug.
+ Console: "Warning: Cannot update a component while rendering a different component"
+ [Screenshot: archer-duplicate-fetches.png]
+ ```
+
+ ### Rogue — Adversarial & Security
+
+ Think like an attacker. Find what the developers assumed couldn't happen.
+
+ - **XSS probing** — type in every input field:
+   - `<script>alert('xss')</script>`
+   - `"><img src=x onerror=alert(1)>`
+   - `javascript:alert(document.cookie)`
+ - **Auth boundary testing** — navigate to admin routes as regular user, access other users' data by changing URL IDs
+ - **API manipulation** — use `javascript_tool` to replay network requests with modified payloads, changed IDs, missing auth headers
+ - **State corruption** — open multiple tabs, perform conflicting actions simultaneously
+ - **Data leak inspection** — check network responses for fields that shouldn't be exposed (passwords, tokens, internal IDs, other users' data)
+
+ Evidence format:
+ ```
+ CHALLENGE: Changing /api/users/15 to /api/users/16 returns another user's full profile including email and phone.
+ IDOR vulnerability — no server-side ownership check.
+ [Screenshot: rogue-idor-leak.png]
+ ```
+
+ ## Severity Classification
+
+ | Finding | Severity |
+ |---|---|
+ | Crash, security vulnerability, data loss | **Critical** |
+ | Layout broken — overlapping, overflowing, hidden elements | **Critical** |
+ | Broken feature, wrong behavior, missing error handling | **Important** |
+ | Visual inconsistency — different spacing/colors/fonts across same feature | **Important** |
+ | Responsive breakage — feature unusable at common breakpoints | **Important** |
+ | Misalignment with design spec / design doc | **Important** |
+ | Animation/transition glitch — janky, missing, wrong | **Important** |
+ | Console warning (non-error) | **Minor** |
+ | Minor polish — 1px off on a non-primary element | **Minor** |
+
+ **Critical and Important block merge. Minor is noted for future work.**
+
+ ## Evidence Requirements
+
+ | Severity | Required Evidence |
+ |---|---|
+ | Critical | GIF recording of the flow + console log + network request detail |
+ | Important | Screenshot + console or network detail |
+ | Minor | Screenshot or console excerpt |
+
+ ### Evidence Tools
+
+ | Tool | Use For |
+ |---|---|
+ | `gif_creator` | Record multi-step interaction flows — capture extra frames before/after actions |
+ | `read_page` / `get_page_text` | Capture DOM state |
+ | `read_console_messages` | Capture console output — use `pattern` param to filter noise |
+ | `read_network_requests` | Capture API traffic, payloads, response codes |
+ | `javascript_tool` | Custom checks: localStorage, cookies, JS state, replay requests |
+ | `resize_window` | Test responsive behavior at specific widths |
+
+ ## Cross-Challenger Verification
+
+ After all challengers report, they cross-verify findings on their own instances:
+
+ - **Can reproduce + confirm:** `BUILDING: @Warrior, confirmed double-submit on port 3002. Also affects payment endpoint.`
+ - **Cannot reproduce:** `CHALLENGE: Could not reproduce @Warrior's double-submit on port 3002. Tried 10 rapid clicks, all debounced. Possible race condition — flaky or env-specific?`
+ - **Find it's worse:** `BUILDING: @Rogue, the IDOR on /api/users also works on /api/orders — any authenticated user can read any order.`
+
+ ## Dungeon Pinning
+
+ ```
+ DUNGEON [CRITICAL]: IDOR vulnerability on /api/users/:id — no ownership check
+ DUNGEON [IMPORTANT]: Button padding 16px on /settings, 12px on /profile — visual inconsistency
+ DUNGEON [IMPORTANT]: No loading state on search results — blank screen for 2s on slow network
+ DUNGEON [MINOR]: Console warning "act() not wrapped" on search page — React testing artifact
+ ```
+
+ ## Cleanup Iron Law
+
+ After inspection completes (or crashes):
+
+ ```
+ 1. Close all Chrome tabs opened for this agent's URL
+ 2. Kill dev server process on assigned port
+ 3. Kill auxiliary services (edge workers, DB containers, etc.)
+ 4. Verify port is released: lsof -i :{PORT}
+ 5. Remove temp data (test DB, uploaded files, seeded data)
+
+ If cleanup fails:
+ → Report exactly which ports/processes are still alive
+ → Wizard escalates to user IMMEDIATELY
+ → Never leave leaked processes on the developer's machine
+ ```
+
+ ## Red Flags
+
+ | Thought | Reality |
+ |---------|---------|
+ | "Console warnings are always Minor" | Warnings can indicate real bugs (memory leaks, state issues). Investigate first. |
+ | "Visual consistency is just polish" | Inconsistent UI erodes user trust. It's Important severity. |
+ | "I checked the happy path, that's enough" | The happy path is what the developer already tested. Your job is to break it. |
+ | "I can share a browser with another agent" | Own instance or you corrupt each other's state. No sharing. |
+ | "Cleanup can wait until the end" | Clean up YOUR instance when YOU'RE done. Don't leave it for others. |
+ | "Screenshots are optional for Important findings" | No evidence = no finding. Always capture proof. |
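The port-release verification in step 4 of the cleanup list can be codified directly. A sketch with illustrative helper names; the skill itself only specifies the `lsof -i :{PORT}` check and assumes `lsof` is available:

```shell
#!/usr/bin/env bash
# Sketch of the cleanup verification described above.
# port_released / verify_cleanup are assumed names for illustration.

port_released() {
  # lsof exits non-zero (or is absent) when nothing is listening on the port.
  ! lsof -i ":$1" >/dev/null 2>&1
}

verify_cleanup() {
  # Check every port assigned to this agent; report leaks as the skill demands.
  local leaked=()
  for port in "$@"; do
    port_released "$port" || leaked+=("$port")
  done
  if [ "${#leaked[@]}" -gt 0 ]; then
    echo "CLEANUP FAILED: ports still in use: ${leaked[*]}" >&2
    return 1
  fi
}
```

A non-zero return from `verify_cleanup` is the signal for the Wizard to escalate to the user immediately, per the Iron Law.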
package/template/.claude/skills/raid-browser-playwright/SKILL.md
@@ -0,0 +1,163 @@
+ ---
+ name: raid-browser-playwright
+ description: "Playwright MCP automated browser test authoring. Extends TDD RED-GREEN-REFACTOR with .spec.ts files. Console + network assertions mandatory. Invoked from raid-tdd and raid-implementation during Phase 3."
+ ---
+
+ # Raid Browser Playwright — Automated Test Authoring
+
+ Write browser tests as part of TDD. Use Playwright MCP to explore, then encode verified interactions into durable `.spec.ts` files.
+
+ <HARD-GATE>
+ Do NOT write browser tests without invoking `raid-browser` pre-flight first. Do NOT skip console/network assertions. Do NOT write tests without watching them fail first (TDD RED step). No subagents.
+ </HARD-GATE>
+
+ ## When to Write Browser Tests vs Unit Tests
+
+ Not every task needs a browser test. The implementer decides and states their reasoning. Challengers attack this decision.
+
+ | Write Browser Test | Write Unit Test Only |
+ |---|---|
+ | New user-facing flow (signup, checkout) | Pure utility function |
+ | UI interaction (drag-drop, modal, form) | API endpoint logic |
+ | Client-side routing / navigation | Data transformation |
+ | Visual state changes (loading, error, empty) | Business rule validation |
+ | Integration between frontend and API | Database queries |
+
+ **If unsure:** Write the browser test. It's easier to remove an unnecessary test than to find a bug in production.
+
+ ## Browser TDD Cycle
+
+ ### RED (browser)
+
+ 1. Write Playwright test file: `tests/e2e/<feature>.spec.ts`
+ 2. Test describes **user behavior**, not implementation:
+    - Navigate to page
+    - Interact (click, type, select, drag)
+    - Assert visible outcome (text appears, redirect happens, element state changes)
+ 3. Include mandatory infrastructure assertions (see below)
+ 4. Run test → **MUST fail**
+ 5. Verify it fails for the **RIGHT reason** (page/element missing — not test syntax error)
+
+ ### GREEN (browser)
+
+ 1. Implement the feature code
+ 2. Run Playwright test → **MUST pass**
+ 3. Run full test suite (unit + browser) → all green
+
+ ### REFACTOR
+
+ 1. Clean up implementation and test code
+ 2. Re-run all tests → still green
+
+ ## Using Playwright MCP During Test Authoring
+
+ While writing the test, the implementer explores interactively to understand the current state and find correct selectors:
+
+ | Tool | Purpose |
+ |---|---|
+ | `browser_navigate` | Load the page, see what's there |
+ | `browser_snapshot` | Get DOM state, find correct selectors |
+ | `browser_click` / `browser_fill_form` | Test interactions manually first |
+ | `browser_console_messages` | Check for errors during interaction |
+ | `browser_network_requests` | Verify API calls, check payloads |
+ | `browser_take_screenshot` | Capture visual state for evidence |
+
+ **The MCP tools are the exploratory scratchpad. The `.spec.ts` file is the durable artifact.**
+
+ Encode what you verified interactively into the test file. The test must run headlessly in CI without MCP tools.
+
+ ## Mandatory Assertions
+
+ Every browser test file MUST include at least:
+
+ ### 1. Console-Clean Assertion
+
+ ```typescript
+ test('no console errors during <feature> flow', async ({ page }) => {
+   const errors: string[] = [];
+   page.on('console', msg => {
+     if (msg.type() === 'error') errors.push(msg.text());
+   });
+
+   // ... perform the feature flow ...
+
+   expect(errors).toEqual([]);
+ });
+ ```
+
+ ### 2. Network-Health Assertion
+
+ ```typescript
+ test('API calls succeed during <feature> flow', async ({ page }) => {
+   const failures: string[] = [];
+   page.on('response', response => {
+     if (response.status() >= 400) {
+       failures.push(`${response.status()} ${response.url()}`);
+     }
+   });
+
+   // ... perform the feature flow ...
+
+   expect(failures).toEqual([]);
+ });
+ ```
+
+ **Missing either of these is an automatic challenge from any reviewer.**
+
+ ## Selector Best Practices
+
+ | Prefer | Avoid | Why |
+ |---|---|---|
+ | `data-testid="submit-btn"` | `button.btn-primary` | CSS classes change for styling reasons |
+ | `getByRole('button', { name: 'Submit' })` | `#submit` | Accessible and resilient |
+ | `getByText('Welcome back')` | `.header > div:nth-child(2)` | Structural selectors break on layout changes |
+
+ ## Challenger Attacks on Browser Tests (Phase 3)
+
+ **Warrior attacks:**
+ - "You only tested the happy path — what happens with network failure?"
+ - "No test for rapid double-submit on the form"
+ - "What about a 10,000-character input in the name field?"
+ - "You didn't test with JavaScript disabled / slow network"
+
+ **Archer attacks:**
+ - "Your selector `button[type=submit]` is fragile — use `data-testid`"
+ - "No assertion on console errors — the feature works but throws warnings"
+ - "Missing network assertion — you don't verify the POST payload"
+ - "Tested at desktop width only — what about mobile viewport?"
+
+ **Rogue attacks:**
+ - "What happens if the user is already logged in and hits /register?"
+ - "No test for XSS in the input fields"
+ - "What if the API returns 200 but with an error body?"
+ - "Race condition: what if the user navigates away during submission?"
+
+ **Each challenger BOOTS their own app instance** (on their own port via `raid-browser`), runs the tests independently, and verifies they pass without flakiness.
+
+ ## Running Browser Tests
+
+ Use the test command from `.claude/raid.json`:
+ - Read `project.execCommand` (e.g., `pnpm dlx`, `npx`, `bunx`)
+ - Run: `{execCommand} playwright test`
+ - For a specific test: `{execCommand} playwright test tests/e2e/<feature>.spec.ts`
+
+ ## Test File Organization
+
+ ```
+ tests/
+   e2e/
+     <feature-name>.spec.ts    # One file per feature/flow
+     auth/
+       login.spec.ts           # Group related flows in directories
+       registration.spec.ts
+ ```
+
+ ## Red Flags
+
+ | Thought | Reality |
+ |---------|---------|
+ | "The feature is too simple for a browser test" | Simple features break in the browser. If it's user-facing, test it. |
+ | "I'll add console assertions later" | Later never comes. Add them now. |
+ | "The unit tests cover this" | Unit tests don't catch hydration mismatches, missing CSS, broken routing. |
+ | "I tested it manually with MCP tools" | Manual verification isn't reproducible. Write the `.spec.ts`. |
+ | "Selectors are fine, they work" | They work today. Will they work after a CSS refactor? Use `data-testid`. |
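The `{execCommand}` substitution in "Running Browser Tests" can be resolved mechanically. A sketch assuming `jq` is available and that `project.execCommand` in `.claude/raid.json` holds the package runner (the field this skill documents); `raid_playwright_cmd` is an illustrative helper name, not the package's API:

```shell
#!/usr/bin/env bash
# Sketch: build the Playwright invocation from a raid.json config file.
# raid_playwright_cmd is an assumed name for illustration; requires jq.
set -euo pipefail

raid_playwright_cmd() {
  local config="$1" spec="${2:-}"
  local exec_cmd
  # project.execCommand is the field this skill documents; default to npx.
  exec_cmd="$(jq -r '.project.execCommand // "npx"' "$config")"
  if [ -n "$spec" ]; then
    echo "$exec_cmd playwright test $spec"
  else
    echo "$exec_cmd playwright test"
  fi
}

# Example: raid_playwright_cmd .claude/raid.json tests/e2e/login.spec.ts
```

In a pnpm project where `execCommand` is `pnpm dlx`, this produces `pnpm dlx playwright test tests/e2e/login.spec.ts`, matching the `{execCommand}` pattern above.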