claude-raid 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (51)
  1. package/README.md +298 -196
  2. package/bin/cli.js +45 -18
  3. package/package.json +1 -1
  4. package/src/descriptions.js +57 -0
  5. package/src/detect-browser.js +164 -0
  6. package/src/detect-package-manager.js +107 -0
  7. package/src/detect-project.js +44 -6
  8. package/src/doctor.js +12 -188
  9. package/src/init.js +192 -17
  10. package/src/merge-settings.js +63 -7
  11. package/src/remove.js +28 -4
  12. package/src/setup.js +405 -0
  13. package/src/ui.js +168 -0
  14. package/src/update.js +62 -5
  15. package/src/version-check.js +130 -0
  16. package/template/.claude/agents/archer.md +46 -51
  17. package/template/.claude/agents/rogue.md +43 -49
  18. package/template/.claude/agents/warrior.md +48 -53
  19. package/template/.claude/agents/wizard.md +65 -67
  20. package/template/.claude/hooks/raid-lib.sh +182 -0
  21. package/template/.claude/hooks/raid-pre-compact.sh +41 -0
  22. package/template/.claude/hooks/raid-session-end.sh +116 -0
  23. package/template/.claude/hooks/raid-session-start.sh +52 -0
  24. package/template/.claude/hooks/raid-stop.sh +68 -0
  25. package/template/.claude/hooks/raid-task-completed.sh +37 -0
  26. package/template/.claude/hooks/raid-task-created.sh +40 -0
  27. package/template/.claude/hooks/raid-teammate-idle.sh +28 -0
  28. package/template/.claude/hooks/validate-browser-cleanup.sh +36 -0
  29. package/template/.claude/hooks/validate-browser-tests-exist.sh +52 -0
  30. package/template/.claude/hooks/validate-commit.sh +130 -0
  31. package/template/.claude/hooks/validate-dungeon.sh +114 -0
  32. package/template/.claude/hooks/validate-file-naming.sh +13 -27
  33. package/template/.claude/hooks/validate-no-placeholders.sh +11 -21
  34. package/template/.claude/hooks/validate-write-gate.sh +60 -0
  35. package/template/.claude/raid-rules.md +27 -18
  36. package/template/.claude/skills/raid-browser/SKILL.md +186 -0
  37. package/template/.claude/skills/raid-browser-chrome/SKILL.md +189 -0
  38. package/template/.claude/skills/raid-browser-playwright/SKILL.md +163 -0
  39. package/template/.claude/skills/raid-debugging/SKILL.md +6 -6
  40. package/template/.claude/skills/raid-design/SKILL.md +10 -10
  41. package/template/.claude/skills/raid-finishing/SKILL.md +11 -3
  42. package/template/.claude/skills/raid-implementation/SKILL.md +26 -11
  43. package/template/.claude/skills/raid-implementation-plan/SKILL.md +15 -4
  44. package/template/.claude/skills/raid-protocol/SKILL.md +57 -32
  45. package/template/.claude/skills/raid-review/SKILL.md +42 -13
  46. package/template/.claude/skills/raid-tdd/SKILL.md +45 -3
  47. package/template/.claude/skills/raid-verification/SKILL.md +12 -1
  48. package/template/.claude/hooks/validate-commit-message.sh +0 -78
  49. package/template/.claude/hooks/validate-phase-gate.sh +0 -60
  50. package/template/.claude/hooks/validate-tests-pass.sh +0 -43
  51. package/template/.claude/hooks/validate-verification.sh +0 -70
package/template/.claude/hooks/validate-write-gate.sh
@@ -0,0 +1,60 @@
+ #!/usr/bin/env bash
+ # Raid write gate: phase-aware controller for Write operations
+ # PreToolUse hook — blocks or allows writes based on current Raid phase, mode, and agent role.
+ set -euo pipefail
+
+ HOOK_DIR="$(cd "$(dirname "$0")" && pwd)"
+ source "$HOOK_DIR/raid-lib.sh"
+
+ raid_read_input
+
+ # No file path — nothing to gate
+ if [ -z "${RAID_FILE_PATH:-}" ]; then
+   exit 0
+ fi
+
+ # No active session — allow everything
+ if [ "$RAID_ACTIVE" = "false" ]; then
+   exit 0
+ fi
+
+ # Non-production files (docs, tests, config, .claude) are always allowed
+ if ! raid_is_production_file "$RAID_FILE_PATH"; then
+   exit 0
+ fi
+
+ # --- Phase-based enforcement on production files ---
+
+ case "${RAID_PHASE:-}" in
+   design)
+     raid_block "Read-only phase (design). No implementation code allowed."
+     ;;
+   plan)
+     raid_block "Read-only phase (plan). No implementation code allowed."
+     ;;
+   implementation)
+     # Scout mode: skip implementer check
+     if [ "$RAID_MODE" = "scout" ]; then
+       exit 0
+     fi
+     # Only the designated implementer may write production code
+     if [ "$RAID_CURRENT_AGENT" != "$RAID_IMPLEMENTER" ]; then
+       raid_block "Only ${RAID_IMPLEMENTER} writes production code this task."
+     fi
+     exit 0
+     ;;
+   review)
+     if [ "$RAID_MODE" = "skirmish" ]; then
+       raid_warn "Read-only phase (review). File fixes go through implementation."
+     else
+       raid_block "Read-only phase (review). File fixes go through implementation."
+     fi
+     ;;
+   finishing)
+     raid_block "Finishing phase. No new code."
+     ;;
+   *)
+     # Unknown or empty phase — fail open
+     exit 0
+     ;;
+ esac
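The gate depends on helpers sourced from `raid-lib.sh` (a new file in this release, +182 lines, not shown in this hunk). A minimal sketch of what those helpers plausibly look like, assuming the standard Claude Code hook contract (PreToolUse hooks receive a JSON payload on stdin; exit code 2 blocks the tool call and surfaces stderr to the agent). Function bodies here are illustrative, not the package's actual implementation:

```shell
#!/usr/bin/env bash
# Hypothetical sketches of the raid-lib.sh helpers the write gate calls.
# Assumes the Claude Code hook contract: JSON payload on stdin, exit code 2
# blocks the tool call. The real helpers live in raid-lib.sh.

raid_read_input() {
  # Pull the target path out of the hook payload (requires jq).
  RAID_FILE_PATH="$(jq -r '.tool_input.file_path // empty' 2>/dev/null || true)"
}

raid_block() {
  # Exit 2 = block the Write and show the reason to the agent.
  echo "BLOCKED: $1" >&2
  exit 2
}

raid_warn() {
  # Surface a warning but allow the Write to proceed.
  echo "WARNING: $1" >&2
  exit 0
}
```

Under this assumed contract, a `raid_block` during the design phase makes Claude Code refuse the Write and feed the message back to the agent, which is exactly the behavior the case statement above relies on.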
package/template/.claude/raid-rules.md
@@ -1,21 +1,30 @@
  # Raid Team Rules
 
- These are non-negotiable. Every agent follows them at all times.
+ Three pillars. Non-negotiable. Every agent, every phase, every interaction.
 
- 1. **No subagents.** This team uses agent teams only. Never delegate to subagents.
- 2. **No laziness.** Every review is genuine. Every challenge carries new evidence or a new angle. No rubber-stamping.
- 3. **No trust without verification.** Verify independently. Reports lie — read actual code.
- 4. **Learn from mistakes.** When proven wrong, absorb the lesson. When another agent errs, learn from that too. Don't repeat errors.
- 5. **Make every move count.** Limited moves, like a board game. No endless disputes. No circular arguments. Every interaction must carry the work forward. If you've made your point and been heard, move on.
- 6. **Share knowledge.** Competitors but also a team. Discoveries are shared. The goal is maximum quality, not personal victory.
- 7. **No ego.** Defend ideas with evidence. If you don't have evidence, don't defend. If proven wrong, concede instantly. Don't be stubborn. Don't hallucinate. Hesitate if uncertain.
- 8. **Stay active.** All assigned agents participate at every step. No sitting idle while others work.
- 9. **Wizard is the human interface.** Agents ask the Wizard for clarification. Only the Wizard asks the human important questions. Agents may ask the human only if the Wizard explicitly permits it.
- 10. **Wizard is impartial.** No preference for any agent. Judge by evidence, not by source.
- 11. **Wizard observes 90%, acts 10%.** The Wizard analyzes, judges, and maintains order. Speaks only when 90% confident or when the team is misaligned.
- 12. **Maximum effort. Always.** Every agent runs at full capability on every task.
- 13. **No hallucination.** If you don't know something, say so. Never fabricate evidence, certainty, or findings.
- 14. **Dungeon discipline.** Only pin verified findings with `📌 DUNGEON:`. Don't spam. If challenged on a pin, defend with evidence or remove it. The Dungeon is a scoreboard, not a chat log.
- 15. **Direct engagement.** Address agents by name with `@Name`. Build on each other's work explicitly with `🔗 BUILDING ON @Name:`. No broadcasting into void. No waiting for the Wizard to relay.
- 16. **Escalate wisely.** Pull the Wizard with `🆘 WIZARD:` when genuinely stuck, split on fundamentals, or uncertain about project-level context. If you can resolve it by reading code or talking to another agent, do that first. Lazy escalation wastes the Wizard's attention.
- 17. **Roast with evidence.** Every `🔥 ROAST:` carries proof — file paths, line numbers, concrete scenarios. "This is wrong" without showing why is laziness, not challenge.
+ ## Pillar 1: Intellectual Honesty
+
+ - Every claim has evidence you gathered yourself. No exceptions.
+ - If you haven't read the code or run the command this turn, you don't know what it says.
+ - If you don't know, say so. Guessing is worse than silence.
+ - Never respond to a finding you haven't independently verified. Read the code. Run the test. Form your own conclusion first. Then respond — with your evidence, not theirs.
+ - "Reports lie" includes your own reports from prior turns. Verify fresh.
+ - Never fabricate evidence, certainty, or findings.
+
+ ## Pillar 2: Zero Ego Collaboration
+
+ - When proven wrong, concede instantly. No face to save; only the output matters.
+ - Defend with evidence, never with authority or repetition.
+ - A teammate catching your mistake is a gift. Absorb the lesson, carry it forward.
+ - Share findings immediately. Hoarding information serves ego, not quality.
+ - Build on each other's work genuinely. The best findings come from combining perspectives: Warrior's stress test, sharpened by Archer's pattern analysis, weaponized by Rogue's attack scenario.
+
+ ## Pillar 3: Discipline and Efficiency
+
+ - Maximum effort on every task. No coasting, no rubber-stamping, no going through the motions.
+ - Every interaction carries work forward. If you're not adding new information or evidence, stop talking.
+ - The Dungeon is a scoreboard, not a chat log. Pin only what survived challenge from at least two agents.
+ - Agents talk directly to each other. The Wizard is not a relay.
+ - Escalate to the Wizard only after you've tried to resolve it by reading code and discussing with teammates.
+ - All agents participate actively at every step. Silence when you have nothing to add is fine — silence when you haven't investigated is laziness.
+ - This team uses agent teams only. Never delegate to subagents.
package/template/.claude/skills/raid-browser/SKILL.md
@@ -0,0 +1,186 @@
+ ---
+ name: raid-browser
+ description: "Core browser orchestration: startup discovery, boot/cleanup lifecycle, port isolation, pre-flight checks (auth, test subject clarity). Shared infrastructure for raid-browser-playwright and raid-browser-chrome."
+ ---
+
+ # Raid Browser — Core Orchestration
+
+ Shared infrastructure for browser testing. Handles startup discovery, boot/cleanup lifecycle, port isolation, and pre-flight checks.
+
+ **This skill is invoked by `raid-browser-playwright` and `raid-browser-chrome` — not directly by users.**
+
+ ## The Iron Laws
+
+ ```
+ 1. EVERY BOOT HAS A MATCHING CLEANUP — leaked processes are never acceptable
+ 2. EVERY BROWSER SESSION STARTS WITH PRE-FLIGHT — no vague "test the app"
+ 3. STARTUP RECIPE IS DISCOVERED ONCE, CODIFIED FOREVER — investigate, verify, write to raid.json
+ ```
+
+ ## Pre-Flight Checks (MANDATORY before every browser session)
+
+ ### 1. Test Subject Clarity (HARD GATE)
+
+ Before ANY browser action, the agent MUST state exactly what they're testing:
+
+ ```
+ BROWSER TEST SUBJECT:
+ - Feature: "<specific feature name>"
+ - Scope: "<what interactions/flows are being tested>"
+ - Success criteria: "<what 'working' looks like>"
+ - Out of scope: "<what we're NOT testing>"
+ ```
+
+ If the agent cannot clearly state the test subject, the Wizard asks the user:
+
+ ```
+ WIZARD → USER: "What specific user flow should we verify in the browser?
+
+ Examples:
+ - 'User can complete checkout with a credit card'
+ - 'Admin dashboard loads data tables correctly'
+ - 'Search filters update results in real-time'"
+ ```
+
+ **No vague subjects.** "Test the app" or "check if it works" are not valid subjects.
+
+ ### 2. Authentication Check
+
+ The agent investigates auth requirements by reading:
+ - Auth middleware, login pages, session config, protected routes
+ - `.env.example` for auth-related variables
+ - README for auth setup instructions
+
+ If auth is required and no credentials exist in `raid.json` under `browser.auth`:
+
+ ```
+ WIZARD → USER: "This app requires authentication. I need:
+ 1. Test user credentials (email/password) or a method to create them
+ 2. Are there different roles to test? (admin, user, guest)
+ 3. Is there a seed script that creates test users?
+
+ Credentials will be stored in .env.raid (gitignored)."
+ ```
+
+ Auth config in `raid.json` (credentials reference env vars from `.env.raid`):
+
+ ```json
+ "auth": {
+   "required": true,
+   "method": "cookie-session",
+   "loginUrl": "/login",
+   "credentials": {
+     "default": { "email": "$RAID_TEST_EMAIL", "password": "$RAID_TEST_PASSWORD" },
+     "admin": { "email": "$RAID_ADMIN_EMAIL", "password": "$RAID_ADMIN_PASSWORD" }
+   },
+   "seedCommand": "{runCommand} db:seed-test-users"
+ }
+ ```
+
+ ### 3. Route/Page Discovery
+
+ After boot, before testing, the agent maps relevant pages:
+ - What URLs are involved in this feature?
+ - What's the expected navigation flow?
+ - Loading states, redirects, client-side routing?
+
+ Pin to Dungeon as verified context for all agents.
+
+ ## Startup Discovery Protocol
+
+ Invoked when `browser.startup` is `null` in `raid.json`. The Wizard assigns one agent to investigate.
+
+ ### Investigation Steps
+
+ 1. **Read project config** — `package.json` scripts, `.env.example`, `.env.local.example`, `docker-compose.yml`, `wrangler.toml`, `vercel.json`, `netlify.toml`, `Procfile`
+ 2. **Read README** — "Getting Started", "Development", "Running locally" sections
+ 3. **Map runtime topology** — identify every process needed:
+    - Primary dev server (the detected framework)
+    - API servers / backend processes
+    - Edge workers (Cloudflare Workers, Vercel Edge, etc.)
+    - Databases (Postgres, MySQL, Redis, SQLite)
+    - Message queues, search engines, etc.
+    - Seed/migration scripts that must run first
+ 4. **Identify environment variables** — which need to differ per instance (DB names, ports), which are shared (API keys)
+ 5. **Test the recipe** — boot on a non-default port, run health check, tear down
+ 6. **Pin to Dungeon** — `DUNGEON: Startup recipe verified — [full recipe details]`
+ 7. **Write to `raid.json`** — populate `browser.startup`
+
+ ### Challengers Attack the Recipe
+
+ - Is the cleanup complete? What if a service crashes mid-boot?
+ - Does it handle port conflicts?
+ - What about stale PID files?
+ - Does the DB migration run idempotently?
+
+ ### Startup Recipe Format (in `raid.json`)
+
+ ```json
+ "startup": {
+   "env": { "DATABASE_URL": "postgresql://localhost:5432/test_{{PORT}}" },
+   "services": [
+     { "name": "db", "command": "docker compose up -d postgres" },
+     { "name": "edge", "command": "wrangler dev --port {{EDGE_PORT}}" },
+     { "name": "app", "command": "{devCommand} --port {{PORT}}" }
+   ],
+   "readyCheck": "curl -s http://localhost:{{PORT}}/api/health",
+   "cleanup": ["kill {{PID}}", "docker compose down"]
+ }
+ ```
+
+ Template variables `{{PORT}}`, `{{EDGE_PORT}}`, `{{PID}}` are resolved per-agent at runtime.
+
+ ## Boot/Cleanup Lifecycle
+
+ ### BOOT(agentId, port)
+
+ ```
+ 1. Resolve template variables from portRange assignment
+ 2. Set per-instance environment variables
+ 3. Start services in declared order (respecting dependencies)
+ 4. Wait for readyCheck to pass (timeout: 60s, retry every 2s)
+ 5. Record all PIDs for this agent
+ 6. Return { pids, port, baseUrl }
+
+ If any service fails to start:
+ → Kill all already-started services
+ → Report failure with service logs
+ → Do NOT proceed to testing
+ ```
+
+ ### CLEANUP(agentId)
+
+ ```
+ 1. Kill all PIDs tracked for this agent (SIGTERM first)
+ 2. Wait 5s for graceful shutdown
+ 3. SIGKILL any remaining processes
+ 4. Run cleanup commands from startup config
+ 5. Verify all assigned ports are released (lsof -i :PORT)
+ 6. Remove any temp files/DBs created for this instance
+
+ If cleanup fails:
+ → Report which ports/processes are still alive
+ → Wizard escalates to user immediately
+ ```
+
+ ## Port Allocation
+
+ Read `portRange` from `raid.json` (e.g., `[3001, 3005]`).
+
+ | Mode | Agents | Port Assignment |
+ |---|---|---|
+ | Full Raid (Phase 3) | 1 implementer | portRange[0] |
+ | Full Raid (Phase 4) | 3 challengers | portRange[0], portRange[0]+1, portRange[0]+2 |
+ | Skirmish | 2 agents | portRange[0], portRange[0]+1 |
+ | Scout | 1 agent | portRange[0] |
+
+ ## When Startup Recipe Fails
+
+ If the existing `browser.startup` recipe fails on boot:
+
+ 1. Don't retry blindly — investigate what changed
+ 2. Read error logs from failed services
+ 3. Check if dependencies changed (new env vars, new services, port conflicts)
+ 4. Update the recipe in `raid.json`
+ 5. Re-test the updated recipe
+ 6. Pin to Dungeon: `DUNGEON: Startup recipe updated — [reason for change]`
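The recipe format above implies two pieces of runtime plumbing on the BOOT side: substituting `{{PORT}}`-style template variables per agent, and polling the `readyCheck` until it passes. A minimal sketch in shell; the helper names (`resolve_template`, `wait_ready`) are illustrative, not the package's actual API:

```shell
#!/usr/bin/env bash
# Sketch of BOOT-side plumbing for the startup recipe above.
# resolve_template and wait_ready are assumed names for illustration.
set -euo pipefail

resolve_template() {
  # resolve_template "dev --port {{PORT}}" PORT=3001 EDGE_PORT=8787
  local text="$1"; shift
  local pair key val
  for pair in "$@"; do
    key="${pair%%=*}"
    val="${pair#*=}"
    # Replace every occurrence of {{KEY}} with its value.
    text=${text//"{{$key}}"/"$val"}
  done
  echo "$text"
}

wait_ready() {
  # Poll a ready check every 2s until it passes or the deadline
  # (60s in the skill) is exceeded.
  local check="$1" timeout="${2:-60}" waited=0
  until eval "$check" >/dev/null 2>&1; do
    sleep 2
    waited=$((waited + 2))
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
  done
}
```

For example, `resolve_template "curl -s http://localhost:{{PORT}}/api/health" PORT=3001` yields the concrete readyCheck command for the agent assigned port 3001.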
package/template/.claude/skills/raid-browser-chrome/SKILL.md
@@ -0,0 +1,189 @@
+ ---
+ name: raid-browser-chrome
+ description: "Claude-in-Chrome live adversarial browser inspection. Angle-driven with minimum coverage gates. Each agent runs its own isolated instance. GIF/screenshot evidence required. Invoked from raid-review during Phase 4."
+ ---
+
+ # Raid Browser Chrome — Live Adversarial Inspection
+
+ Challengers open a real Chrome browser and do adversarial exploratory testing. Each challenger gets their own isolated app instance. Find what automated tests missed.
+
+ <HARD-GATE>
+ Do NOT start inspection without invoking `raid-browser` pre-flight first. Do NOT skip minimum coverage gates. Do NOT share browser instances between agents. Every finding MUST include evidence (GIF, screenshot, console/network output). No subagents.
+ </HARD-GATE>
+
+ ## Session Lifecycle Per Challenger
+
+ ```
+ 1. BOOT(agentId, assignedPort) ← from raid-browser
+ 2. PRE-FLIGHT(feature) ← state subject, check auth, discover routes
+ 3. LOGIN (if auth required) ← fill credentials, verify logged in
+ 4. MINIMUM GATES ← console, network, page loads (mandatory)
+ 5. ANGLE-DRIVEN INSPECTION ← Warrior/Archer/Rogue specific
+ 6. REPORT ← findings + evidence pinned to Dungeon
+ 7. CLEANUP(agentId) ← kill everything
+ ```
+
+ ## Login Automation (if auth required)
+
+ ```
+ 1. navigate → loginUrl from raid.json
+ 2. form_input → fill credentials (resolved from .env.raid)
+ 3. click → submit button
+ 4. read_page → verify logged in (check for dashboard, user menu, etc.)
+
+ If login fails → pin as CRITICAL finding, skip inspection:
+ DUNGEON [CRITICAL]: Login failed — cannot test authenticated flows
+ ```
+
+ ## Minimum Coverage Gates (MANDATORY for every challenger)
+
+ Before angle-driven inspection, every challenger MUST complete these checks:
+
+ | Check | Tool | Look For |
+ |---|---|---|
+ | Console errors | `read_console_messages` | Errors, unhandled promise rejections, deprecation warnings |
+ | Network failures | `read_network_requests` | 4xx/5xx responses, failed fetches, CORS errors, unexpectedly large payloads |
+ | Page loads | `navigate` + `read_page` | All relevant pages render without blank screens, hydration mismatches, or missing content |
+
+ **Only after ALL gates pass does the challenger proceed to their angle.**
+
+ If a gate fails, pin the finding immediately — don't wait for angle inspection.
+
+ ## Angle-Driven Inspection
+
+ ### Warrior — Stress & Breakage
+
+ Break things. Find what crashes under pressure.
+
+ - **Rapid interactions** — click buttons multiple times fast, submit forms repeatedly
+ - **Large inputs** — paste huge text blocks, upload oversized files, fill numbers with extreme values
+ - **Navigation abuse** — back/forward rapidly, refresh during submission, deep-link to mid-flow pages
+ - **Viewport stress** — `resize_window` to mobile (375px), tablet (768px), ultra-wide (1920px) during interactions
+
+ Evidence format:
+ ```
+ CHALLENGE: Double-clicking "Place Order" submits two orders.
+ Console: "Unhandled rejection: duplicate key constraint"
+ Network: Two POST /api/orders — first returned 201, second returned 500
+ [GIF: warrior-double-submit.gif]
+ ```
+
+ ### Archer — Precision, Visual Consistency & Spec Compliance
+
+ Every pixel matters. Every pattern must be consistent.
+
+ - **Cross-page visual consistency** — same components styled identically across pages? Same button styles, spacing, typography?
+ - **Design system compliance** — correct tokens, spacing values, color variables?
+ - **State visual coverage** — hover, focus, active, disabled, loading, error, empty states all visible and correct?
+ - **Responsive check** — screenshot at 375px (mobile), 768px (tablet), 1280px (desktop), 1920px (ultra-wide) using `resize_window`
+ - **Dark mode / theme consistency** — if the app supports themes, check both
+ - **Tab order and keyboard navigation** — complete the flow without a mouse
+ - **Network efficiency** — redundant API calls? Missing caching? Overfetching data?
+
+ Evidence format:
+ ```
+ CHALLENGE: Search results page makes 3 identical GET /api/products calls on load.
+ Network: Duplicate requests at 0ms, 50ms, 120ms — useEffect re-render bug.
+ Console: "Warning: Cannot update a component while rendering a different component"
+ [Screenshot: archer-duplicate-fetches.png]
+ ```
+
+ ### Rogue — Adversarial & Security
+
+ Think like an attacker. Find what the developers assumed couldn't happen.
+
+ - **XSS probing** — type in every input field:
+   - `<script>alert('xss')</script>`
+   - `"><img src=x onerror=alert(1)>`
+   - `javascript:alert(document.cookie)`
+ - **Auth boundary testing** — navigate to admin routes as regular user, access other users' data by changing URL IDs
+ - **API manipulation** — use `javascript_tool` to replay network requests with modified payloads, changed IDs, missing auth headers
+ - **State corruption** — open multiple tabs, perform conflicting actions simultaneously
+ - **Data leak inspection** — check network responses for fields that shouldn't be exposed (passwords, tokens, internal IDs, other users' data)
+
+ Evidence format:
+ ```
+ CHALLENGE: Changing /api/users/15 to /api/users/16 returns another user's full profile including email and phone.
+ IDOR vulnerability — no server-side ownership check.
+ [Screenshot: rogue-idor-leak.png]
+ ```
+
+ ## Severity Classification
+
+ | Finding | Severity |
+ |---|---|
+ | Crash, security vulnerability, data loss | **Critical** |
+ | Layout broken — overlapping, overflowing, hidden elements | **Critical** |
+ | Broken feature, wrong behavior, missing error handling | **Important** |
+ | Visual inconsistency — different spacing/colors/fonts across same feature | **Important** |
+ | Responsive breakage — feature unusable at common breakpoints | **Important** |
+ | Misalignment with design spec / design doc | **Important** |
+ | Animation/transition glitch — janky, missing, wrong | **Important** |
+ | Console warning (non-error) | **Minor** |
+ | Minor polish — 1px off on a non-primary element | **Minor** |
+
+ **Critical and Important block merge. Minor is noted for future work.**
+
+ ## Evidence Requirements
+
+ | Severity | Required Evidence |
+ |---|---|
+ | Critical | GIF recording of the flow + console log + network request detail |
+ | Important | Screenshot + console or network detail |
+ | Minor | Screenshot or console excerpt |
+
+ ### Evidence Tools
+
+ | Tool | Use For |
+ |---|---|
+ | `gif_creator` | Record multi-step interaction flows — capture extra frames before/after actions |
+ | `read_page` / `get_page_text` | Capture DOM state |
+ | `read_console_messages` | Capture console output — use `pattern` param to filter noise |
+ | `read_network_requests` | Capture API traffic, payloads, response codes |
+ | `javascript_tool` | Custom checks: localStorage, cookies, JS state, replay requests |
+ | `resize_window` | Test responsive behavior at specific widths |
+
+ ## Cross-Challenger Verification
+
+ After all challengers report, they cross-verify findings on their own instances:
+
+ - **Can reproduce + confirm:** `BUILDING: @Warrior, confirmed double-submit on port 3002. Also affects payment endpoint.`
+ - **Cannot reproduce:** `CHALLENGE: Could not reproduce @Warrior's double-submit on port 3002. Tried 10 rapid clicks, all debounced. Possible race condition — flaky or env-specific?`
+ - **Find it's worse:** `BUILDING: @Rogue, the IDOR on /api/users also works on /api/orders — any authenticated user can read any order.`
+
+ ## Dungeon Pinning
+
+ ```
+ DUNGEON [CRITICAL]: IDOR vulnerability on /api/users/:id — no ownership check
+ DUNGEON [IMPORTANT]: Button padding 16px on /settings, 12px on /profile — visual inconsistency
+ DUNGEON [IMPORTANT]: No loading state on search results — blank screen for 2s on slow network
+ DUNGEON [MINOR]: Console warning "act() not wrapped" on search page — React testing artifact
+ ```
+
+ ## Cleanup Iron Law
+
+ After inspection completes (or crashes):
+
+ ```
+ 1. Close all Chrome tabs opened for this agent's URL
+ 2. Kill dev server process on assigned port
+ 3. Kill auxiliary services (edge workers, DB containers, etc.)
+ 4. Verify port is released: lsof -i :{PORT}
+ 5. Remove temp data (test DB, uploaded files, seeded data)
+
+ If cleanup fails:
+ → Report exactly which ports/processes are still alive
+ → Wizard escalates to user IMMEDIATELY
+ → Never leave leaked processes on the developer's machine
+ ```
+
+ ## Red Flags
+
+ | Thought | Reality |
+ |---------|---------|
+ | "Console warnings are always Minor" | Warnings can indicate real bugs (memory leaks, state issues). Investigate first. |
+ | "Visual consistency is just polish" | Inconsistent UI erodes user trust. It's Important severity. |
+ | "I checked the happy path, that's enough" | The happy path is what the developer already tested. Your job is to break it. |
+ | "I can share a browser with another agent" | Own instance or you corrupt each other's state. No sharing. |
+ | "Cleanup can wait until the end" | Clean up YOUR instance when YOU'RE done. Don't leave it for others. |
+ | "Screenshots are optional for Important findings" | No evidence = no finding. Always capture proof. |
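The port-release verification in step 4 of the cleanup list can be codified directly. A sketch with illustrative helper names; the skill itself only specifies the `lsof -i :{PORT}` check and assumes `lsof` is available:

```shell
#!/usr/bin/env bash
# Sketch of the cleanup verification described above.
# port_released / verify_cleanup are assumed names for illustration.

port_released() {
  # lsof exits non-zero (or is absent) when nothing is listening on the port.
  ! lsof -i ":$1" >/dev/null 2>&1
}

verify_cleanup() {
  # Check every port assigned to this agent; report leaks as the skill demands.
  local leaked=()
  for port in "$@"; do
    port_released "$port" || leaked+=("$port")
  done
  if [ "${#leaked[@]}" -gt 0 ]; then
    echo "CLEANUP FAILED: ports still in use: ${leaked[*]}" >&2
    return 1
  fi
}
```

A non-zero return from `verify_cleanup` is the signal for the Wizard to escalate to the user immediately, per the Iron Law.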
package/template/.claude/skills/raid-browser-playwright/SKILL.md
@@ -0,0 +1,163 @@
+ ---
+ name: raid-browser-playwright
+ description: "Playwright MCP automated browser test authoring. Extends TDD RED-GREEN-REFACTOR with .spec.ts files. Console + network assertions mandatory. Invoked from raid-tdd and raid-implementation during Phase 3."
+ ---
+
+ # Raid Browser Playwright — Automated Test Authoring
+
+ Write browser tests as part of TDD. Use Playwright MCP to explore, then encode verified interactions into durable `.spec.ts` files.
+
+ <HARD-GATE>
+ Do NOT write browser tests without invoking `raid-browser` pre-flight first. Do NOT skip console/network assertions. Do NOT write tests without watching them fail first (TDD RED step). No subagents.
+ </HARD-GATE>
+
+ ## When to Write Browser Tests vs Unit Tests
+
+ Not every task needs a browser test. The implementer decides and states their reasoning. Challengers attack this decision.
+
+ | Write Browser Test | Write Unit Test Only |
+ |---|---|
+ | New user-facing flow (signup, checkout) | Pure utility function |
+ | UI interaction (drag-drop, modal, form) | API endpoint logic |
+ | Client-side routing / navigation | Data transformation |
+ | Visual state changes (loading, error, empty) | Business rule validation |
+ | Integration between frontend and API | Database queries |
+
+ **If unsure:** Write the browser test. It's easier to remove an unnecessary test than to find a bug in production.
+
+ ## Browser TDD Cycle
+
+ ### RED (browser)
+
+ 1. Write Playwright test file: `tests/e2e/<feature>.spec.ts`
+ 2. Test describes **user behavior**, not implementation:
+    - Navigate to page
+    - Interact (click, type, select, drag)
+    - Assert visible outcome (text appears, redirect happens, element state changes)
+ 3. Include mandatory infrastructure assertions (see below)
+ 4. Run test → **MUST fail**
+ 5. Verify it fails for the **RIGHT reason** (page/element missing — not test syntax error)
+
+ ### GREEN (browser)
+
+ 1. Implement the feature code
+ 2. Run Playwright test → **MUST pass**
+ 3. Run full test suite (unit + browser) → all green
+
+ ### REFACTOR
+
+ 1. Clean up implementation and test code
+ 2. Re-run all tests → still green
+
+ ## Using Playwright MCP During Test Authoring
+
+ While writing the test, the implementer explores interactively to understand the current state and find correct selectors:
+
+ | Tool | Purpose |
+ |---|---|
+ | `browser_navigate` | Load the page, see what's there |
+ | `browser_snapshot` | Get DOM state, find correct selectors |
+ | `browser_click` / `browser_fill_form` | Test interactions manually first |
+ | `browser_console_messages` | Check for errors during interaction |
+ | `browser_network_requests` | Verify API calls, check payloads |
+ | `browser_take_screenshot` | Capture visual state for evidence |
+
+ **The MCP tools are the exploratory scratchpad. The `.spec.ts` file is the durable artifact.**
+
+ Encode what you verified interactively into the test file. The test must run headlessly in CI without MCP tools.
+
+ ## Mandatory Assertions
+
+ Every browser test file MUST include at least:
+
+ ### 1. Console-Clean Assertion
+
+ ```typescript
+ test('no console errors during <feature> flow', async ({ page }) => {
+   const errors: string[] = [];
+   page.on('console', msg => {
+     if (msg.type() === 'error') errors.push(msg.text());
+   });
+
+   // ... perform the feature flow ...
+
+   expect(errors).toEqual([]);
+ });
+ ```
+
+ ### 2. Network-Health Assertion
+
+ ```typescript
+ test('API calls succeed during <feature> flow', async ({ page }) => {
+   const failures: string[] = [];
+   page.on('response', response => {
+     if (response.status() >= 400) {
+       failures.push(`${response.status()} ${response.url()}`);
+     }
+   });
+
+   // ... perform the feature flow ...
+
+   expect(failures).toEqual([]);
+ });
+ ```
+
+ **Missing either of these is an automatic challenge from any reviewer.**
+
+ ## Selector Best Practices
+
+ | Prefer | Avoid | Why |
+ |---|---|---|
+ | `data-testid="submit-btn"` | `button.btn-primary` | CSS classes change for styling reasons |
+ | `getByRole('button', { name: 'Submit' })` | `#submit` | Accessible and resilient |
+ | `getByText('Welcome back')` | `.header > div:nth-child(2)` | Structural selectors break on layout changes |
+
+ ## Challenger Attacks on Browser Tests (Phase 3)
+
+ **Warrior attacks:**
+ - "You only tested the happy path — what happens with network failure?"
+ - "No test for rapid double-submit on the form"
+ - "What about a 10,000-character input in the name field?"
+ - "You didn't test with JavaScript disabled / slow network"
+
+ **Archer attacks:**
+ - "Your selector `button[type=submit]` is fragile — use `data-testid`"
+ - "No assertion on console errors — the feature works but throws warnings"
+ - "Missing network assertion — you don't verify the POST payload"
+ - "Tested at desktop width only — what about mobile viewport?"
+
+ **Rogue attacks:**
+ - "What happens if the user is already logged in and hits /register?"
+ - "No test for XSS in the input fields"
+ - "What if the API returns 200 but with an error body?"
+ - "Race condition: what if the user navigates away during submission?"
+
+ **Each challenger BOOTS their own app instance** (on their own port via `raid-browser`), runs the tests independently, and verifies they pass without flakiness.
+
+ ## Running Browser Tests
+
+ Use the test command from `.claude/raid.json`:
+ - Read `project.execCommand` (e.g., `pnpm dlx`, `npx`, `bunx`)
+ - Run: `{execCommand} playwright test`
+ - For a specific test: `{execCommand} playwright test tests/e2e/<feature>.spec.ts`
+
+ ## Test File Organization
+
+ ```
+ tests/
+   e2e/
+     <feature-name>.spec.ts    # One file per feature/flow
+     auth/
+       login.spec.ts           # Group related flows in directories
+       registration.spec.ts
+ ```
+
+ ## Red Flags
+
+ | Thought | Reality |
+ |---------|---------|
+ | "The feature is too simple for a browser test" | Simple features break in the browser. If it's user-facing, test it. |
+ | "I'll add console assertions later" | Later never comes. Add them now. |
+ | "The unit tests cover this" | Unit tests don't catch hydration mismatches, missing CSS, broken routing. |
+ | "I tested it manually with MCP tools" | Manual verification isn't reproducible. Write the `.spec.ts`. |
+ | "Selectors are fine, they work" | They work today. Will they work after a CSS refactor? Use `data-testid`. |
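The `{execCommand}` substitution in "Running Browser Tests" can be resolved mechanically. A sketch assuming `jq` is available and that `project.execCommand` in `.claude/raid.json` holds the package runner (the field this skill documents); `raid_playwright_cmd` is an illustrative helper name, not the package's API:

```shell
#!/usr/bin/env bash
# Sketch: build the Playwright invocation from a raid.json config file.
# raid_playwright_cmd is an assumed name for illustration; requires jq.
set -euo pipefail

raid_playwright_cmd() {
  local config="$1" spec="${2:-}"
  local exec_cmd
  # project.execCommand is the field this skill documents; default to npx.
  exec_cmd="$(jq -r '.project.execCommand // "npx"' "$config")"
  if [ -n "$spec" ]; then
    echo "$exec_cmd playwright test $spec"
  else
    echo "$exec_cmd playwright test"
  fi
}

# Example: raid_playwright_cmd .claude/raid.json tests/e2e/login.spec.ts
```

In a pnpm project where `execCommand` is `pnpm dlx`, this produces `pnpm dlx playwright test tests/e2e/login.spec.ts`, matching the `{execCommand}` pattern above.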