npm - claude-raid - Versions diffs - 0.1.1 → 0.1.3 - Mend

claude-raid 0.1.1 → 0.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51) hide show

package/README.md +298 -196
package/bin/cli.js +45 -18
package/package.json +1 -1
package/src/descriptions.js +57 -0
package/src/detect-browser.js +164 -0
package/src/detect-package-manager.js +107 -0
package/src/detect-project.js +44 -6
package/src/doctor.js +12 -188
package/src/init.js +192 -17
package/src/merge-settings.js +63 -7
package/src/remove.js +28 -4
package/src/setup.js +405 -0
package/src/ui.js +168 -0
package/src/update.js +62 -5
package/src/version-check.js +130 -0
package/template/.claude/agents/archer.md +46 -51
package/template/.claude/agents/rogue.md +43 -49
package/template/.claude/agents/warrior.md +48 -53
package/template/.claude/agents/wizard.md +65 -67
package/template/.claude/hooks/raid-lib.sh +182 -0
package/template/.claude/hooks/raid-pre-compact.sh +41 -0
package/template/.claude/hooks/raid-session-end.sh +116 -0
package/template/.claude/hooks/raid-session-start.sh +52 -0
package/template/.claude/hooks/raid-stop.sh +68 -0
package/template/.claude/hooks/raid-task-completed.sh +37 -0
package/template/.claude/hooks/raid-task-created.sh +40 -0
package/template/.claude/hooks/raid-teammate-idle.sh +28 -0
package/template/.claude/hooks/validate-browser-cleanup.sh +36 -0
package/template/.claude/hooks/validate-browser-tests-exist.sh +52 -0
package/template/.claude/hooks/validate-commit.sh +130 -0
package/template/.claude/hooks/validate-dungeon.sh +114 -0
package/template/.claude/hooks/validate-file-naming.sh +13 -27
package/template/.claude/hooks/validate-no-placeholders.sh +11 -21
package/template/.claude/hooks/validate-write-gate.sh +60 -0
package/template/.claude/raid-rules.md +27 -18
package/template/.claude/skills/raid-browser/SKILL.md +186 -0
package/template/.claude/skills/raid-browser-chrome/SKILL.md +189 -0
package/template/.claude/skills/raid-browser-playwright/SKILL.md +163 -0
package/template/.claude/skills/raid-debugging/SKILL.md +6 -6
package/template/.claude/skills/raid-design/SKILL.md +10 -10
package/template/.claude/skills/raid-finishing/SKILL.md +11 -3
package/template/.claude/skills/raid-implementation/SKILL.md +26 -11
package/template/.claude/skills/raid-implementation-plan/SKILL.md +15 -4
package/template/.claude/skills/raid-protocol/SKILL.md +57 -32
package/template/.claude/skills/raid-review/SKILL.md +42 -13
package/template/.claude/skills/raid-tdd/SKILL.md +45 -3
package/template/.claude/skills/raid-verification/SKILL.md +12 -1
package/template/.claude/hooks/validate-commit-message.sh +0 -78
package/template/.claude/hooks/validate-phase-gate.sh +0 -60
package/template/.claude/hooks/validate-tests-pass.sh +0 -43
package/template/.claude/hooks/validate-verification.sh +0 -70

package/template/.claude/skills/raid-debugging/SKILL.md CHANGED Viewed

@@ -87,21 +87,21 @@ digraph debugging {
 The Wizard dispatches all agents with different hypotheses. After dispatch, agents debate directly:
-**📡 DISPATCH:**
+**DISPATCH:**
 > **@Warrior**: Investigate [structural/data cause]. Reproduce. Trace data flow. Gather evidence at boundaries.
 > **@Archer**: Investigate [integration/contract cause]. Check interfaces, type mismatches, implicit contracts, dependency versions.
 > **@Rogue**: Investigate [timing/state/adversarial cause]. Race conditions, stale state, environment assumptions, concurrent access.
 >
-> **All**: Investigate independently, then debate directly. Challenge each other's hypotheses with evidence. Build on each other's findings. Pin verified evidence to the Dungeon. The hypothesis that survives cross-testing wins. Escalate to me with `🆘 WIZARD:` only if stuck.
+> **All**: Investigate independently, then debate directly. Challenge each other's hypotheses with evidence. Build on each other's findings. Pin verified evidence to the Dungeon. The hypothesis that survives cross-testing wins. Escalate to me with `WIZARD:` only if stuck.
 **How agents debate root cause:**
-- `⚔️ CHALLENGE: @Rogue, your race condition hypothesis doesn't explain why it fails on single-threaded test runs — evidence: [test output]`
-- `🔗 BUILDING ON @Warrior: Your data flow trace reveals the value originates from the config loader, not the API call — here's the upstream path: ...`
-- `📌 DUNGEON: Root cause evidence — config loader at config.js:47 returns stale cache when called concurrently [verified by @Archer and @Warrior]`
+- `CHALLENGE: @Rogue, your race condition hypothesis doesn't explain why it fails on single-threaded test runs — evidence: [test output]`
+- `BUILDING: @Warrior, your data flow trace reveals the value originates from the config loader, not the API call — here's the upstream path: ...`
+- `DUNGEON: Root cause evidence — config loader at config.js:47 returns stale cache when called concurrently [verified by @Archer and @Warrior]`
 The hypothesis that survives direct cross-testing gets the Wizard's ruling:
-⚡ WIZARD RULING: Root cause is [X] because [evidence from Dungeon].
+RULING: Root cause is [X] because [evidence from Dungeon].
 ## Root Cause Tracing

package/template/.claude/skills/raid-design/SKILL.md CHANGED Viewed

@@ -92,7 +92,7 @@ Create `.claude/raid-dungeon.md`:
 Each agent gets the same objective but a different starting angle. After dispatch, the Wizard goes silent.
-**📡 DISPATCH:**
+**DISPATCH:**
 > **@Warrior**: Explore from the data/infrastructure side. What are the hard technical constraints? What schemas, migrations, APIs are needed? What breaks if we get this wrong? Find the structural load-bearing walls. Challenge @Archer and @Rogue's findings directly. Pin verified findings to the Dungeon.
 >
@@ -100,7 +100,7 @@ Each agent gets the same objective but a different starting angle. After dispatc
 >
 > **@Rogue**: Explore from the failure/adversarial side. What assumptions about inputs, state, timing, availability? Build failure scenarios. What does a malicious user do? What does a slow network do? What does concurrent access do? Challenge @Warrior and @Archer's findings directly. Pin verified findings to the Dungeon.
 >
-> **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `🆘 WIZARD:` only when genuinely stuck.
+> **All**: Read the Dungeon. Build on each other's discoveries. Challenge everything. Pin only what survives. Escalate to me with `WIZARD:` only when genuinely stuck.
 ## What Agents Must Cover
@@ -109,7 +109,7 @@ Every agent addresses ALL of these from their assigned angle:
 - **Performance** — scale, bottlenecks, complexity
 - **Robustness** — retries, fallbacks, graceful degradation
 - **Reliability** — blast radius of failure, production-readiness
-- **Testability** — meaningful tests, mock strategy, test-friendly design
+- **Testability** — meaningful tests, mock strategy, test-friendly design. When `browser.enabled`: can this feature be E2E tested with Playwright? What user flows need browser verification? Are there loading states, client-side routing, or visual states that unit tests can't catch?
 - **Error handling** — what errors occur, how surfaced, UX of failure
 - **Edge cases** — empty, null, boundary, Unicode, timezones, large payloads
 - **Cascading effects** — blast radius, what else changes
@@ -124,12 +124,12 @@ Every agent addresses ALL of these from their assigned angle:
 Agents interact DIRECTLY — @Name addressing, building, challenging, roasting:
 1. Present findings with EVIDENCE (file paths, docs, concrete examples)
 2. Challenge other agents DIRECTLY with COUNTER-EVIDENCE (not opinions)
-3. Build on each other's discoveries — 🔗 BUILDING ON @Name:
+3. Build on each other's discoveries — BUILDING: with independent verification
 4. Go to the EDGES — push every finding to its extreme
 5. LEARN from each other — incorporate discoveries into your model
-6. Pin verified findings — 📌 DUNGEON: only after surviving challenge
-7. Roast weak analysis — 🔥 ROAST: with evidence, not insults
-8. Escalate to Wizard — 🆘 WIZARD: only when genuinely stuck
+6. Pin verified findings — DUNGEON: only after surviving challenge
+7. Challenge weak analysis — back every challenge with your own independent evidence
+8. Escalate to Wizard — WIZARD: only when genuinely stuck
 ```
 **The goal is not to tear each other down. The goal is to forge the strongest design by testing it from every angle. The Dungeon captures what survived.**
@@ -144,7 +144,7 @@ The Wizard closes when the Dungeon has sufficient verified findings — enough D
 - Agents are converging — new findings are variations, not revelations
 - Shared Knowledge section has the foundational truths the design needs
-**⚡ WIZARD RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
+**RULING:** Synthesize from Dungeon evidence. Propose 2-3 approaches. Recommend one. Archive Dungeon.
 ## Spec Self-Review
@@ -183,7 +183,7 @@ Save to: specs path from `.claude/raid.json` (default: `docs/raid/specs/YYYY-MM-
 ## Testing Strategy
 ## Edge Cases
 ## Future Considerations (NOT building now, designing to accommodate)
-## ⚡ WIZARD RULING
+## RULING
 ```
 ## Red Flags — Thoughts That Signal Violations
@@ -205,4 +205,4 @@ If the team is stuck on a fundamental design choice after genuine direct debate:
 2. Let the human decide
 3. Never ask the human to resolve something the team should handle
-**Terminal state:** ⚡ WIZARD RULING: Design approved. Commit. Archive Dungeon. Invoke `raid-implementation-plan`.
+**Terminal state:** RULING: Design approved. Commit. Archive Dungeon. Invoke `raid-implementation-plan`.

package/template/.claude/skills/raid-finishing/SKILL.md CHANGED Viewed

@@ -48,7 +48,7 @@ digraph finishing {
 ## Step 1: The Completeness Debate
-**📡 DISPATCH:**
+**DISPATCH:**
 > **@Warrior**: Review the implementation against the plan. Is every task completed? Every acceptance criterion met? Every test passing? Is anything half-done? Fight @Archer and @Rogue directly on their assessments.
 >
@@ -60,7 +60,7 @@ digraph finishing {
 **The agents must fight over this.** If any agent believes the work is incomplete, they present evidence. The other two challenge that claim directly.
-⚡ WIZARD RULING: [Complete — proceed | Incomplete — return to Phase 3/4 with specific issues]
+RULING: [Complete — proceed | Incomplete — return to Phase 3/4 with specific issues]
 ## Step 2: Final Verification
@@ -74,10 +74,18 @@ BEFORE presenting options:
    If YES → Proceed with evidence.
 ```
+### Browser Verification (when `browser.enabled` in raid.json)
+Additional final checks:
+- Full Playwright test suite passes headlessly
+- Verify no leaked processes from prior browser sessions
+- Verify all ports in `browser.portRange` are free (`lsof -i :PORT`)
+- Agents debate: "Are browser tests sufficient for this feature's coverage?"
 ## Step 3: Present Options
 ```
-⚡ WIZARD RULING: Implementation complete and verified.
+RULING: Implementation complete and verified.
 Tests: [N] passing, 0 failures (evidence: [command output])

package/template/.claude/skills/raid-implementation/SKILL.md CHANGED Viewed

@@ -52,10 +52,14 @@ digraph implementation {
 1. **Read the plan** — extract all tasks, dependencies, ordering
 2. **Read Phase 2 archived Dungeon** — carry forward context
 3. **Set up worktree** — use `raid-git-worktrees` for isolation (optional)
-4. **Create task tracking** — use TaskCreate for every plan task
-5. **Per task:** Assign implementer (rotate), open Dungeon, observe attack, close with ruling
-6. **Track progress** — mark complete only after Wizard ruling per task
-7. **After all tasks** — archive Dungeon, invoke `raid-review`
+4. **Browser setup (if `browser.enabled` in raid.json)**:
+   - Check if `browser.startup` exists — if null, invoke `raid-browser` startup discovery FIRST
+   - Check if Playwright is installed — if not, first task becomes "scaffold Playwright"
+   - Assign port from `browser.portRange` to implementer
+5. **Create task tracking** — use TaskCreate for every plan task
+6. **Per task:** Assign implementer (rotate), open Dungeon, observe attack, close with ruling
+7. **Track progress** — mark complete only after Wizard ruling per task
+8. **After all tasks** — archive Dungeon, invoke `raid-review`
 ## The Implementation Gauntlet (per task)
@@ -63,7 +67,7 @@ digraph implementation {
 One agent implements. Others prepare to attack. **Rotate the implementer** across tasks.
-The Wizard doesn't open a new Dungeon for every task — the Phase 3 Dungeon is continuous across all tasks. But the Wizard announces each task assignment clearly.
+Phase 3 uses a single continuous Dungeon (`.claude/raid-dungeon.md`) across all tasks — unlike Phases 1 and 2 which each get their own Dungeon that is archived on close. The Wizard announces each task assignment clearly within the running Dungeon.
 ### Step 2: Implementer Executes (TDD)
@@ -76,6 +80,11 @@ Following `raid-tdd` strictly:
 6. Self-review against acceptance criteria
 7. Commit: `feat(scope): descriptive message`
+**Browser tasks (if `browser.enabled` and task involves browser-facing code):**
+- BOOT app on assigned port before browser TDD (invoke `raid-browser`)
+- Use Playwright MCP tools to explore while authoring tests
+- CLEANUP after task is complete (or on failure — cleanup always runs)
 Report status: **DONE** | **DONE_WITH_CONCERNS** | **NEEDS_CONTEXT** | **BLOCKED**
 ### Step 3: Challengers Attack Directly
@@ -83,10 +92,16 @@ Report status: **DONE** | **DONE_WITH_CONCERNS** | **NEEDS_CONTEXT** | **BLOCKED
 This is where the new model shines. Challengers don't just report to the Wizard — they:
 1. **Read ACTUAL CODE** (not the implementer's report — reports lie)
-2. **Challenge the implementer directly:** `⚔️ CHALLENGE: @Warrior, your implementation at handler.js:23 doesn't validate...`
-3. **Build on each other's critiques:** `🔗 BUILDING ON @Archer: Your naming drift finding — the inconsistency also affects the test at...`
-4. **Roast weak implementations:** `🔥 ROAST: @Rogue, you claimed this handles concurrent access but there's no lock at...`
-5. **Pin verified issues to Dungeon:** `📌 DUNGEON: Confirmed issue — handler.js:23 missing validation [verified by @Archer and @Rogue]`
+2. **Challenge the implementer directly:** `CHALLENGE: @Warrior, your implementation at handler.js:23 doesn't validate...`
+3. **Build on each other's critiques:** `BUILDING: @Archer, your naming drift finding — the inconsistency also affects the test at...`
+4. **Challenge weak implementations:** `CHALLENGE: @Rogue, you claimed this handles concurrent access but there's no lock at...`
+5. **Pin verified issues to Dungeon:** `DUNGEON: Confirmed issue — handler.js:23 missing validation [verified by @Archer and @Rogue]`
+**Browser verification (if `browser.enabled`):**
+- Challengers can BOOT on their own ports to run Playwright tests independently
+- Verify tests pass without flakiness (run 3x if suspect)
+- Explore the feature manually via Playwright MCP to find gaps the tests missed
+- Each challenger CLEANUPS their own instance when done
 **Challengers check:**
 - Spec compliance — does it match the task spec line by line?
@@ -102,11 +117,11 @@ The implementer defends against BOTH challengers simultaneously:
 - Respond to each challenge with evidence or concede immediately
 - Fix conceded issues
 - Re-run all tests
-- Pin resolved issues to Dungeon: `📌 DUNGEON: Resolved — added validation at handler.js:23 [tests pass]`
+- Pin resolved issues to Dungeon: `DUNGEON: Resolved — added validation at handler.js:23 [tests pass]`
 ### Step 5: Wizard Closes Task
-⚡ WIZARD RULING: Task N [approved | needs fixes]
+RULING: Task N [approved | needs fixes]
 The Wizard closes when the Dungeon shows all issues resolved and challengers have no remaining critiques.

package/template/.claude/skills/raid-implementation-plan/SKILL.md CHANGED Viewed

@@ -76,7 +76,7 @@ Create `.claude/raid-dungeon.md`:
 ## Dispatch for Decomposition
-**📡 DISPATCH:**
+**DISPATCH:**
 > **@Warrior**: Decompose into tasks. Focus on structural ordering — what MUST be built first? Hard dependencies? Critical path? Include tests for every task. Challenge @Archer and @Rogue's decompositions directly. Pin agreed tasks to Dungeon.
 >
@@ -84,7 +84,7 @@ Create `.claude/raid-dungeon.md`:
 >
 > **@Rogue**: Decompose into tasks. Focus on hidden complexity — which tasks are deceptively hard? Where will the implementer guess wrong? Which tests miss the failure path? Challenge @Warrior and @Archer directly. Pin agreed tasks to Dungeon.
 >
-> **All**: Read the Phase 1 archived Dungeon for design knowledge. Interact directly. Build on each other's decompositions. Pin agreed tasks with `📌 DUNGEON:`. Escalate to me with `🆘 WIZARD:` only when genuinely stuck.
+> **All**: Read the Phase 1 archived Dungeon for design knowledge. Interact directly. Build on each other's decompositions. Pin agreed tasks with `DUNGEON:`. Escalate to me with `WIZARD:` only when genuinely stuck.
 ## Collaborative Compliance Testing (Agent-Driven)
@@ -109,6 +109,17 @@ After independent decomposition, agents fight directly over the plan:
 - "Run tests to verify pass" — step
 - "Commit" — step
+### Browser Test Tasks (when `browser.enabled` in raid.json)
+When a task involves browser-facing code, the plan must include browser test steps alongside unit tests:
+- "Write failing Playwright test (`tests/e2e/<feature>.spec.ts`)" — step
+- "Run `{execCommand} playwright test` to verify it fails" — step
+- "Implement the feature" — step
+- "Run Playwright test to verify it passes" — step
+- "Run full suite (unit + browser) to verify no regressions" — step
+Not every task needs a browser test. Include them for user-facing flows, UI interactions, client-side routing, and visual state changes. State reasoning — challengers will attack this decision.
 ## Task Structure
 ````markdown
@@ -154,7 +165,7 @@ After writing the complete plan:
 2. **Placeholder scan:** Search for TBD, TODO, vague descriptions, missing code. Fix them.
 3. **Type/name consistency:** Do types, method signatures, property names match across ALL tasks?
 4. **File structure consistency:** Do all file paths follow the project's conventions?
-5. **Test quality:** Does every task have tests? Do tests cover failure paths?
+5. **Test quality:** Does every task have tests? Do tests cover failure paths? When `browser.enabled`: do browser-facing tasks include Playwright tests?
 6. **Ordering:** Can each task be built and committed independently without breaking the build?
 Fix issues inline. If a spec requirement has no task, add the task.
@@ -170,4 +181,4 @@ Fix issues inline. If a spec requirement has no task, add the task.
 | "Tests can be added later" | TDD means tests are in the plan. No test = no task. |
 | "The naming will be consistent enough" | Check it explicitly. Naming drift is the #1 source of bugs. |
-**Terminal state:** ⚡ WIZARD RULING: Plan approved. Commit. Archive Dungeon. Invoke `raid-implementation`.
+**Terminal state:** RULING: Plan approved. Commit. Archive Dungeon. Invoke `raid-implementation`.

package/template/.claude/skills/raid-protocol/SKILL.md CHANGED Viewed

@@ -60,12 +60,35 @@ Read `.claude/raid.json` for project-specific settings. If absent, use sensible
 | `project.testCommand` | (none) | Command to run tests |
 | `project.lintCommand` | (none) | Command to run linting |
 | `project.buildCommand` | (none) | Command to build |
+| `project.packageManager` | (auto-detected) | Package manager (npm, pnpm, yarn, bun, uv, poetry) |
+| `project.runCommand` | (auto-detected) | Run command prefix (e.g., `pnpm`, `npm run`) |
+| `project.execCommand` | (auto-detected) | Exec command prefix (e.g., `pnpm dlx`, `npx`) |
 | `paths.specs` | `docs/raid/specs` | Where design docs go |
 | `paths.plans` | `docs/raid/plans` | Where plans go |
 | `paths.worktrees` | `.worktrees` | Where worktrees go |
 | `conventions.fileNaming` | `none` | Naming convention |
 | `conventions.commits` | `conventional` | Commit format |
 | `raid.defaultMode` | `full` | Default mode |
+| `browser.enabled` | `false` | Whether browser testing is active |
+| `browser.framework` | (auto-detected) | Detected framework (next, vite, angular, etc.) |
+| `browser.devCommand` | (auto-detected) | Dev server command |
+| `browser.baseUrl` | (auto-detected) | Base URL for browser tests |
+| `browser.portRange` | `[3001, 3005]` | Port range for isolated agent instances |
+| `browser.playwrightConfig` | `playwright.config.ts` | Playwright config path |
+| `browser.auth` | `null` | Auth config (discovered by agents) |
+| `browser.startup` | `null` | Startup recipe (discovered by agents) |
+## Browser Testing
+When `browser.enabled` is `true` in `raid.json`, browser testing integrates into the existing workflow:
+- **Phase 3 (Implementation):** Browser-facing code uses TDD with Playwright — write `.spec.ts` files as part of RED-GREEN-REFACTOR. Use `raid-browser-playwright`. Challengers boot their own app instances to verify tests independently.
+- **Phase 4 (Review):** After code review, challengers do live adversarial inspection in Chrome — each on their own isolated port. Use `raid-browser-chrome`. Warrior stress-tests, Archer checks visual consistency, Rogue probes security.
+- **Startup discovery:** First time browser testing runs, an agent investigates how to boot the app (dev server, databases, edge workers, env vars) and writes the recipe to `raid.json`. Use `raid-browser`.
+- **Pre-flight:** Before every browser session, agents must state exactly what they're testing (hard gate) and check auth requirements.
+- **Cleanup iron law:** Every boot has a matching cleanup. Leaked processes are never acceptable.
+Browser testing is **not a separate workflow** — it extends existing phases. If `browser.enabled` is `false` or absent, all browser-related behavior is skipped.
 ## Modes
@@ -122,14 +145,14 @@ The Dungeon (`.claude/raid-dungeon.md`) is the team's shared knowledge board. It
 | Event | Action | Who |
 |-------|--------|-----|
 | Phase opens | Create `.claude/raid-dungeon.md` with header | Wizard |
-| During phase | Read and write via `📌 DUNGEON:` signal | Agents |
+| During phase | Read and write via `DUNGEON:` signal | Agents |
 | Phase closes | Rename to `.claude/raid-dungeon-phase-N.md` | Wizard |
 | Next phase opens | Create fresh `.claude/raid-dungeon.md` | Wizard |
 | Session ends | Remove all Dungeon files | Wizard |
 ### Dungeon Curation Rules
-**What goes IN the Dungeon (via `📌 DUNGEON:` only):**
+**What goes IN the Dungeon (via `DUNGEON:` only):**
 - Findings that survived a challenge (verified truths)
 - Active unresolved battles (prevents re-litigation)
 - Shared knowledge promoted by 2+ agents agreeing
@@ -185,30 +208,24 @@ digraph phase_pattern {
 | Signal | Who | Meaning | Goes to Dungeon? |
 |--------|-----|---------|------------------|
-| `📡 DISPATCH:` | Wizard | Opening a phase, assigning angles | No (phase opening) |
-| `⚡ WIZARD OBSERVES:` | Wizard | Brief course correction, hint, nudge | No |
-| `⚡ WIZARD INTERVENES:` | Wizard | Stops action, something wrong | No |
-| `⚡ WIZARD RULING:` | Wizard | Phase over, binding decision | Ruling archived with Dungeon |
+| `DISPATCH:` | Wizard | Opening a phase, assigning angles | No (phase opening) |
+| `REDIRECT:` | Wizard | Brief course correction — one sentence, then silence | No |
+| `RULING:` | Wizard | Phase over, binding decision | Ruling archived with Dungeon |
 | `@Name, ...` | Any agent | Direct address to specific agent | No |
-| `🔍 FINDING:` | Warrior | Discovery with evidence | Only after surviving challenge |
-| `🎯 FINDING:` | Archer | Discovery with evidence | Only after surviving challenge |
-| `💀 FINDING:` | Rogue | Discovery with attack scenario | Only after surviving challenge |
-| `⚔️ CHALLENGE:` | Warrior | Direct challenge | No |
-| `🏹 CHALLENGE:` | Archer | Direct challenge | No |
-| `🗡️ CHALLENGE:` | Rogue | Direct challenge | No |
-| `🔥 ROAST:` | Any agent | Pointed critique with evidence | No |
-| `🔗 BUILDING ON @Name:` | Any agent | Extending another's work | Result goes to Dungeon if verified |
-| `📌 DUNGEON:` | Any agent | Pinning verified finding | Yes — this is the write gate |
-| `🆘 WIZARD:` | Any agent | Escalation — needs Wizard input | Yes (as escalation point) |
-| `✅ CONCEDE:` | Any agent | Admitting wrong, moving on | No |
+| `FINDING:` | Any agent | Discovery with own evidence | No |
+| `CHALLENGE:` | Any agent | Independently verified a claim, found a problem | No |
+| `BUILDING:` | Any agent | Independently verified a claim, found it goes deeper | Result goes to Dungeon if verified |
+| `DUNGEON:` | Any agent | Pinning finding verified by 2+ agents | Yes — this is the write gate |
+| `WIZARD:` | Any agent | Escalation — needs Wizard input | Yes (as escalation point) |
+| `CONCEDE:` | Any agent | Proven wrong, moving on | No |
 ### Direct Interaction Rules
 - **Evidence required.** All challenges, roasts, and findings must carry proof — file paths, line numbers, concrete scenarios. "This is wrong" without evidence is laziness.
-- **Build explicitly.** `🔗 BUILDING ON @Name:` forces credit and continuity. Don't restart from scratch when someone found something useful.
+- **Build explicitly.** `BUILDING:` forces credit and continuity. Don't restart from scratch when someone found something useful.
 - **Concede instantly.** When proven wrong, concede. Then find a new angle. No ego.
-- **Pin deliberately.** `📌 DUNGEON:` is the quality gate. Only verified, challenged findings get pinned. Other agents can challenge whether a pin belongs.
-- **Escalate wisely.** `🆘 WIZARD:` when genuinely stuck, split on fundamentals, or need project-level context. Not when lazy.
+- **Pin deliberately.** `DUNGEON:` is the quality gate. Only verified, challenged findings get pinned. Other agents can challenge whether a pin belongs.
+- **Escalate wisely.** `WIZARD:` when genuinely stuck, split on fundamentals, or need project-level context. Not when lazy.
 ### When to Escalate to Wizard
@@ -230,14 +247,16 @@ The Wizard observes 90%, acts 10%. Intervention triggers:
 | Signal | Action |
 |--------|--------|
-| Same arguments 3+ rounds, no new evidence | `⚡ WIZARD INTERVENES:` Break the loop. Rule or redirect. |
-| Agents drifting from objective | `⚡ WIZARD OBSERVES:` Redirect with clarity. |
-| Agents stuck, no progress (deadlock) | `⚡ WIZARD INTERVENES:` Rule with rationale. Binding. |
-| Shallow work, rubber-stamping (laziness) | `⚡ WIZARD INTERVENES:` Call out and demand genuine challenge. |
-| Defending past evidence (ego) | `⚡ WIZARD OBSERVES:` Evidence or concede. |
-| Wrong finding in Dungeon (misinformation) | `⚡ WIZARD INTERVENES:` Remove and correct. |
-| Agent escalation (`🆘 WIZARD:`) | Answer or redirect as appropriate. |
-| All agents converged | `⚡ WIZARD RULING:` Synthesize and close. |
+| Same arguments 3+ rounds, no new evidence | `REDIRECT:` Break the loop. Or `RULING:` if unresolvable. |
+| Agents drifting from objective | `REDIRECT:` One sentence back on track. |
+| Agents stuck, no progress (deadlock) | `RULING:` Decide with rationale. Binding. |
+| Shallow work, rubber-stamping (laziness) | `REDIRECT:` Demand genuine independent verification. |
+| Skipped verification (responded without own evidence) | `REDIRECT:` "Verify first, then respond." |
+| Premature convergence (agreed without challenging) | `REDIRECT:` "Challenge before agreeing." |
+| Defending past evidence (ego) | `REDIRECT:` Evidence or concede. |
+| Wrong finding in Dungeon (misinformation) | `REDIRECT:` Remove and correct. |
+| Agent escalation (`WIZARD:`) | Answer or redirect as appropriate. |
+| All agents converged with genuine verification | `RULING:` Synthesize and close. |
 ## Red Flags — Thoughts That Signal Violations
@@ -267,17 +286,23 @@ The Wizard observes 90%, acts 10%. Intervention triggers:
 | `raid-debugging` | Any | Competing hypothesis with direct debate |
 | `raid-verification` | Any | Evidence before completion claims |
 | `raid-git-worktrees` | 3 | Isolated workspace setup |
+| `raid-browser` | 3, 4 | Browser orchestration: startup discovery, boot/cleanup, pre-flight |
+| `raid-browser-playwright` | 3 | Automated browser TDD with Playwright MCP |
+| `raid-browser-chrome` | 4 | Live adversarial Chrome inspection |
 ## Hooks Reference
+All hooks source `raid-lib.sh` for shared session/config parsing.
 | Hook | Event | Active | Purpose |
 |------|-------|--------|---------|
+| `validate-commit.sh` | PreToolUse (Bash) | Always (format), Raid session (tests/verification) | Conventional commits + tests pass + verification evidence |
+| `validate-write-gate.sh` | PreToolUse (Write/Edit) | Raid session only | Phase-aware write gate (design doc before code) |
 | `validate-file-naming.sh` | PostToolUse (Write/Edit) | Always | Enforce naming conventions |
-| `validate-commit-message.sh` | PreToolUse (Bash) | Always | Conventional commits |
-| `validate-tests-pass.sh` | PreToolUse (Bash) | Raid session only | Tests before commits |
-| `validate-phase-gate.sh` | PreToolUse (Write) | Raid session only | Design doc before code |
 | `validate-no-placeholders.sh` | PostToolUse (Write/Edit) | Always | No TBD/TODO in specs/plans |
-| `validate-verification.sh` | PreToolUse (Bash) | Raid session only | Test evidence before completion |
+| `validate-dungeon.sh` | PostToolUse (Write/Edit) | Raid session only | Dungeon discipline enforcement |
+| `validate-browser-cleanup.sh` | PostToolUse (Bash) | Raid session + browser enabled | Warn if browser ports still occupied |
+| `validate-browser-tests-exist.sh` | PreToolUse (Bash) | Raid session + browser enabled | Warn if browser-facing code has no Playwright tests |
 ## Commit Convention

package/template/.claude/skills/raid-review/SKILL.md CHANGED Viewed

@@ -43,11 +43,13 @@ digraph review {
 3. **Dispatch** — all agents review independently, then interact directly
 4. **Observe the fight** — agents challenge findings and missing findings directly
 5. **Close** — categorize surviving issues by severity from Dungeon
-6. **Rule on fixes** — Critical and Important must be fixed
-7. **Verify fixes** — targeted re-attack after fixes (use `raid-verification`)
-8. **Final ruling** — approved or rejected
-9. **Archive Dungeon** — rename to `.claude/raid-dungeon-phase-4.md`
-10. **Transition** — invoke `raid-finishing`
+6. **Browser inspection** — dispatch agents to inspect in Chrome (if `browser.enabled`)
+7. **Observe browser fights** — agents cross-verify findings on separate instances
+8. **Rule on fixes** — Critical and Important must be fixed (code AND browser)
+9. **Verify fixes** — targeted re-attack after fixes (use `raid-verification`)
+10. **Final ruling** — approved or rejected
+11. **Archive Dungeon** — rename to `.claude/raid-dungeon-phase-4.md`
+12. **Transition** — invoke `raid-finishing`
 ## Opening the Dungeon
@@ -71,7 +73,7 @@ Create `.claude/raid-dungeon.md`:
 ## Dispatch
-**📡 DISPATCH:**
+**DISPATCH:**
 > **@Warrior**: Review full implementation. Run every test. Check error handling at every boundary. Verify all requirements from design doc. Find the bugs that crash in production. Then fight @Archer and @Rogue over their findings.
 >
@@ -79,7 +81,7 @@ Create `.claude/raid-dungeon.md`:
 >
 > **@Rogue**: Review full implementation. Think like an attacker. What inputs break it? What timing causes races? What happens when dependencies fail? Find the bugs nobody else will find. Then fight @Warrior and @Archer.
 >
-> **All**: Review independently first, then fight directly. Challenge each other's findings AND each other's blind spots. Pin severity-classified issues to Dungeon with `📌 DUNGEON:`. Reference the Phase 3 Dungeon for context.
+> **All**: Review independently first, then fight directly. Challenge each other's findings AND each other's blind spots. Pin severity-classified issues to Dungeon with `DUNGEON:`. Reference the Phase 3 Dungeon for context.
 ## Review Checklist — Each Agent
@@ -99,10 +101,10 @@ Create `.claude/raid-dungeon.md`:
 After independent reviews, agents fight DIRECTLY over findings AND missing findings:
-- `⚔️ CHALLENGE: @Archer, you gave the auth module a pass but didn't check the session rotation path — review it now.`
-- `🔗 BUILDING ON @Warrior: Your finding about the missing error handler — the impact is worse than you stated because...`
-- `🔥 ROAST: @Rogue, your "Critical" severity on the naming inconsistency is overblown — here's why it's actually Minor...`
-- `📌 DUNGEON: [Critical] handler.js:23 — missing input validation allows injection. Verified by @Warrior and @Rogue.`
+- `CHALLENGE: @Archer, you gave the auth module a pass but didn't check the session rotation path — review it now.`
+- `BUILDING: @Warrior, your finding about the missing error handler — the impact is worse than you stated because...`
+- `CHALLENGE: @Rogue, your "Critical" severity on the naming inconsistency is overblown — here's why it's actually Minor...`
+- `DUNGEON: [Critical] handler.js:23 — missing input validation allows injection. Verified by @Warrior and @Rogue.`
 **Agents classify severity when pinning to Dungeon:**
@@ -112,13 +114,40 @@ After independent reviews, agents fight DIRECTLY over findings AND missing findi
 | **Important** | Missing features, poor error handling, test gaps, naming inconsistencies | Must fix. |
 | **Minor** | Style, docs, optimization | Note for future. |
+## Browser Inspection Phase (when `browser.enabled` in raid.json)
+After code review findings are pinned, the Wizard announces browser inspection.
+### Process
+1. **Wizard announces:** "Browser inspection phase — each reviewer boots their own instance"
+2. **Each reviewer BOOTs** their own app instance on separate ports (invoke `raid-browser`)
+3. **Each reviewer runs PRE-FLIGHT** — state test subject, check auth, discover routes
+4. **Each reviewer LOGINs** if auth is required (credentials from `.env.raid`)
+5. **Each reviewer inspects** from their angle (invoke `raid-browser-chrome`):
+   - Minimum gates first (console, network, page loads)
+   - Then angle-driven exploration (Warrior: stress, Archer: visual/precision, Rogue: security)
+   - Evidence captured for every finding (GIF, screenshot, console/network)
+6. **Cross-verification** — each reviewer reproduces others' findings on their own instance
+7. **Pin browser findings** to Dungeon alongside code review findings
+8. **Each reviewer CLEANUPs** their instance
+9. **Wizard rules** on ALL findings (code + browser) together
+### Browser findings follow the same severity rules:
+- **Critical** (crash, security, layout broken) — must fix
+- **Important** (broken feature, visual inconsistency, responsive breakage) — must fix
+- **Minor** (polish, console warnings) — note for future
+**Browser bugs block merge the same way code bugs do.**
 ## Closing the Phase
 The Wizard closes when agents have exhausted their findings and the Dungeon has all issues classified:
-**⚡ WIZARD RULING: APPROVED FOR MERGE** — all Critical/Important fixed, tests pass, requirements met.
+**RULING: APPROVED FOR MERGE** — all Critical/Important fixed, tests pass, requirements met.
-**⚡ WIZARD RULING: REJECTED** — specify what must change and which phase to return to.
+**RULING: REJECTED** — specify what must change and which phase to return to.
 ## Red Flags

package/template/.claude/skills/raid-tdd/SKILL.md CHANGED Viewed

@@ -78,6 +78,42 @@ Run: test command from `.claude/raid.json`
 - Keep tests green throughout — run after every change
 - Refactor the TESTS too, not just the implementation
+## Browser-Aware TDD (when `browser.enabled` in raid.json)
+### Deciding Test Type
+Before writing the test, decide: is this a unit test or a browser test?
+| Write Browser Test | Write Unit Test Only |
+|---|---|
+| New user-facing flow (signup, checkout) | Pure utility function |
+| UI interaction (drag-drop, modal, form) | API endpoint logic |
+| Client-side routing / navigation | Data transformation |
+| Visual state changes (loading, error, empty) | Business rule validation |
+| Integration between frontend and API | Database queries |
+- **If both:** Write the unit test FIRST, then the browser test
+- **State your reasoning** — challengers will attack this decision
+- **When unsure:** Write the browser test. Better to have it and not need it.
+### Browser TDD Cycle
+Follow the same RED-GREEN-REFACTOR discipline but with Playwright:
+1. **RED:** Write `.spec.ts` with user behavior assertions + console/network checks
+2. **Verify RED:** Run `{execCommand} playwright test` — must fail for the RIGHT reason
+3. **GREEN:** Implement feature → test passes
+4. **Verify GREEN:** Run FULL suite (unit + browser) → all green
+5. **REFACTOR:** Clean up → re-run all
+Use `raid-browser-playwright` for detailed guidance. Invoke `raid-browser` for pre-flight and boot.
+### "Tests pass" = Unit AND Browser Tests
+When claiming tests pass, both must pass:
+- Unit: test command from `raid.json`
+- Browser: `{execCommand} playwright test`
 ## Adversarial Test Review
 After TDD cycle, challengers attack the TESTS directly — and build on each other's critiques:
@@ -89,9 +125,15 @@ After TDD cycle, challengers attack the TESTS directly — and build on each oth
 5. **Would this catch a regression?** If someone changes the implementation next month, does this test catch the break?
 **Challengers interact directly:**
-- `⚔️ CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
-- `🔗 BUILDING ON @Archer: Your edge case finding — the same gap exists in the error path test at line 32...`
-- `🔥 ROAST: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
+- `CHALLENGE: @Warrior, your test at line 15 only validates the happy path — here's an input that passes with a broken implementation: ...`
+- `BUILDING: @Archer, your edge case finding — the same gap exists in the error path test at line 32...`
+- `CHALLENGE: @Rogue, you claimed the test is implementation-dependent but renaming the internal method doesn't break it — here's proof: ...`
+**Browser-specific attacks (when `browser.enabled`):**
+6. **This is a user-facing feature but you only wrote unit tests — where's the browser test?** If the user interacts with it in a browser, it needs a browser test.
+7. **Your browser test checks the DOM but doesn't assert on console errors or network health.** Infrastructure assertions are mandatory.
+8. **You tested at desktop width only — what about mobile?** Responsive behavior is Important severity.
 **Challengers don't just report to the Wizard — they fight each other over test quality.**

package/template/.claude/skills/raid-verification/SKILL.md CHANGED Viewed

@@ -53,6 +53,17 @@ This gate applies to EVERY status claim: "tests pass", "bug fixed", "feature com
 | "Feature complete" | All acceptance criteria verified with evidence | Self-assessment without running tests |
 | "Regression test works" | Red-green cycle verified | Test passes once without seeing it fail |
+### Browser Verification (when `browser.enabled` in raid.json)
+"Tests pass" means BOTH unit and browser tests pass:
+| Claim | Requires |
+|---|---|
+| "Tests pass" | Unit test command output: 0 failures AND `{execCommand} playwright test`: 0 failures |
+| "Feature complete" | All acceptance criteria verified WITH browser test evidence |
+If the project's test command doesn't include Playwright, the agent MUST run it separately and report both results.
 ## Forbidden Phrases Without Evidence
 These phrases are NEVER allowed without preceding verification output:
@@ -74,7 +85,7 @@ The implementer's claim is NOT sufficient. Challengers verify AND cross-check ea
 1. **Implementer verifies** — runs tests, reports with evidence (command + output)
 2. **Challenger 1 verifies independently** — runs same tests, confirms output matches
 3. **Challenger 2 verifies adversarially** — runs tests PLUS tries to break it with edge cases
-4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `🔗 BUILDING ON @Warrior: Your verification missed the integration test at...`
+4. **Challengers cross-check each other:** `@Archer, you said tests pass but did you run the full suite or just the changed files?` / `BUILDING: @Warrior, your verification missed the integration test at...`
 Only after all required verifications confirm — and challengers have cross-checked each other — does the Wizard accept.