npm - start-vibing - Versions diffs - 4.4.0 → 4.4.2 - Mend

start-vibing 4.4.0 → 4.4.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

package/template/.claude/skills/e2e-audit/SKILL.md CHANGED

@@ -1,660 +1,216 @@
----
-name: e2e-audit
-description: Comprehensive E2E audit system - discovers all interactive elements on every page, generates page specs, creates fixed Playwright test files, validates UX/UI/security, detects dead code. Use when auditing pages, writing E2E tests, or checking test coverage.
----
-# E2E Audit Skill
-## Purpose
-Systematically audit every page in the Hakutaku Dashboard, discovering all interactive elements and generating fixed Playwright test files that run standalone without AI.
-## Design Spec
-Read `docs/e2e-audit/DESIGN.md` for the full architecture, page list, and patterns.
-## Pre-Audit: Server & Environment Setup
-Before auditing ANY pages, set up monitoring:
-### 1. Start Dev Server with Log Capture
-```bash
-# Start dev server in background, capture logs
-bun run dev > /tmp/hakutaku-dev.log 2>&1 &
-DEV_PID=$!
-# Wait for server to be ready
-until curl -s http://localhost:3000/api/health > /dev/null 2>&1; do sleep 2; done
-# Verify tRPC returns JSON (not HTML error page)
-curl -s "http://localhost:3000/api/v1/user.me?batch=1&input=%7B%7D" | head -1
-# Should show JSON (even if UNAUTHORIZED), NOT "<!DOCTYPE html>"
-```
-### 2. Monitor Server Logs
-Throughout the audit, periodically check server logs for:
-```bash
-# Check for server errors
-tail -50 /tmp/hakutaku-dev.log | grep -i "error\|warn\|fail\|crash\|exception"
-# Check for specific issues
-tail -50 /tmp/hakutaku-dev.log | grep -i "jest worker\|out of memory\|SIGKILL\|ECONNREFUSED"
-```
-**What to look for in server logs:**
-- `Jest worker encountered X child process exceptions` → Dev server crashed, needs restart
-- `ECONNREFUSED` on DB/Redis → Missing env vars or services down
-- `PrismaClientKnownRequestError` → DB schema mismatch
-- `UNAUTHORIZED` / `FORBIDDEN` → Auth issues (may be expected in some test scenarios)
-- Unhandled promise rejections → Real bugs
-- Memory warnings → Performance issue
-### 3. Monitor Console Errors (Playwright)
-On EVERY page navigation, immediately check:
-```
-1. browser_console_messages(level: "error") — capture all errors
-2. Count error toasts visible in snapshot
-3. If errors found:
-   a. Check server logs (tail /tmp/hakutaku-dev.log) to correlate
-   b. Classify: ENV/CONFIG vs BUG vs EXPECTED
-   c. If server is returning HTML instead of JSON → restart dev server
-   d. Document findings in page spec "## Error Analysis" section
-```
-### 4. Visual Validation
-On EVERY page, verify visually:
-- No broken layouts (elements overlapping, overflowing)
-- No missing images (alt text visible instead of image)
-- No untranslated keys (raw `key.path.like.this` visible)
-- No raw error messages shown to users (stack traces, parse errors)
-- Toast messages are appropriate (not storms of identical errors)
-- Loading states transition to content (not stuck loading)
-**Flag as FINDINGS any visual issues** — these are UX bugs even if not security issues.
----
-## Workflow: Auditing a Page
-For each page, follow this exact sequence:
-### 1. RESEARCH
-Before writing any tests, research:
-- OWASP Top 10 issues relevant to this page type
-- Common vulnerabilities for the page's features (e.g., file upload → unrestricted file type, XSS in filename)
-- Library-specific edge cases (e.g., TanStack Query cache issues, shadcn modal focus traps)
-### 2. DISCOVER
-Use Playwright MCP to navigate, snapshot, AND INTERACT with everything:
-```
-1. Navigate to the page URL
-2. Take a snapshot (browser_snapshot)
-3. Check server logs (tail /tmp/hakutaku-dev.log) for server-side errors on this route
-4. Check console errors (browser_console_messages level: "error")
-5. List ALL interactive elements from snapshot
-6. CLICK EVERYTHING — do NOT just snapshot:
-   a. Click every TAB → snapshot each tab panel, note URL changes
-   b. Click every BUTTON → see what happens (modal? toast? redirect?)
-   c. Open every DROPDOWN/COMBOBOX → list all options
-   d. Toggle SIDEBAR expand/collapse → note hidden/revealed elements
-   e. Click every LINK → verify redirect URL is correct
-   f. Open every MODAL → snapshot modal content, list its elements
-   g. Hover elements that look like they have TOOLTIPS → capture text
-   h. Fill INPUTS with test data → check validation, error messages
-   i. Submit FORMS → check success/error toasts, redirects
-7. For each interaction, document:
-   - What was clicked/triggered
-   - What happened (URL change, modal opened, toast appeared, etc.)
-   - Any errors produced (console, server log, visual)
-8. Check states: empty, loading, error, populated
-9. Check RBAC: which elements appear per role
-```
-**CRITICAL**: A snapshot-only audit misses hidden elements (modal contents,
-dropdown options, tab panels, hover states). You MUST interact to discover them.
-### 2.3. FUNCTIONAL TESTING (GUARANTEE IT WORKS)
-Discovery is NOT enough. You MUST **verify that every feature actually works**.
-Clicking a button and seeing it exists is NOT testing — you must confirm the OUTCOME.
-**THE COMPLETENESS RULE**: For EVERY interactive element discovered in step 2,
-you MUST perform at least ONE interaction AND verify the outcome. No exceptions.
-If an element can't be tested (requires external service, auth, etc.), mark it
-`[~]` with a reason — but you MUST attempt it first.
-```
-1. EVERY BUTTON — Click it. What happened?
-   a. Did it navigate? → Verify the destination URL is correct
-   b. Did it open a modal/dialog? → Snapshot and list all elements inside
-   c. Did it trigger a toast? → Read the toast message, verify it's correct
-   d. Did it change state? → Verify the state actually changed (UI + DB)
-   e. Did nothing happen? → That's a BUG — document it
-   f. Is it disabled? → Verify WHY (missing prereq? permission? loading?)
-2. EVERY COMBOBOX/DROPDOWN — Open it and list ALL options.
-   a. Select EACH option one by one
-   b. After each selection, verify the OUTCOME changed:
-      - List count before vs after
-      - Verify displayed items match the filter criteria
-   c. Test "clear/reset" returns to original state
-   d. Test combinations (filter A + filter B together)
-3. EVERY INPUT — Type into it.
-   a. Type a valid value → verify it's accepted
-   b. Type an invalid value → verify error message appears
-   c. Leave it empty and submit → verify required validation
-   d. Test boundary values (min/max length, special chars)
-   e. For search: type known match, partial match, and no-match
-4. EVERY TAB — Click it.
-   a. Verify the tab panel content changes
-   b. Verify the URL updates (if URL-driven tabs)
-   c. Snapshot the NEW content and list its elements
-   d. Repeat steps 1-3 for elements INSIDE each tab panel
-5. EVERY LINK — Click it.
-   a. Verify the destination URL is correct
-   b. Verify the destination page loads (not 404)
-   c. Verify back button returns to original page
-6. MULTI-STEP WIZARDS — Navigate EVERY step of EVERY path.
-   a. For each wizard variant (e.g., each integration type):
-      - Click the variant to select it
-      - Click "Next" to advance to step 2
-      - SNAPSHOT step 2, list ALL elements
-      - Fill required fields in step 2
-      - Click "Next" to advance to step 3
-      - SNAPSHOT step 3, list ALL elements
-      - Continue until you reach the LAST step or hit a blocker
-        (e.g., OAuth requires external service)
-   b. Test "Previous" button at every step (goes back correctly?)
-   c. Test "Cancel" at every step (exits wizard correctly?)
-   d. Test step indicator buttons (can you jump to completed steps?)
-   e. Test validation: try advancing without filling required fields
-   f. If a step requires an external service (OAuth, file upload):
-      - Document what the step UI looks like
-      - Mark it `[~]` with "requires dev.hakutaku.ai"
-      - Still verify the UI elements are present and correct
-7. CRUD OPERATIONS — Verify end-to-end.
-   a. CREATE: Complete the FULL creation flow. Verify the created item
-      appears in the list afterward WITHOUT page refresh.
-   b. EDIT: Modify a field, save, RELOAD the page, verify change persists
-   c. DELETE: Delete YOUR item, verify it disappears from list AND DB
-   d. Test validation: empty forms, invalid data, duplicate names
-8. PAGINATION — If pagination exists:
-   a. Verify count text matches actual items
-   b. Navigate pages if multiple exist
-   c. Verify items don't repeat across pages
-9. SUBMIT FLOWS — Verify the COMPLETE chain, not just the button click.
-   a. For EVERY form/search submit that navigates to another page:
-      - Fill inputs → click submit → wait for navigation
-      - Verify the destination URL is correct
-      - Verify the destination page LOADED FULLY (not blank/error)
-      - Verify the SUBMITTED DATA appears correctly on the destination page
-      - Verify any SIDE EFFECTS completed (e.g., AI response in chat,
-        item appears in list, email sent)
-   b. For search → chat creation flows specifically:
-      - Type query → submit → verify toast → verify navigation to chat page
-      - ON THE CHAT PAGE: verify user message appears AND AI response streams
-      - If AI does NOT respond → that is a BUG, document it immediately
-   c. For create flows (wizard → detail page):
-      - Complete wizard → verify redirect to detail/list page
-      - Verify the created item is VISIBLE on the destination page
-      - Verify the item's data matches what was entered in the wizard
-   d. NEVER mark a submit flow as ✅ if you only verified the button click
-      or the toast — you MUST verify the outcome on the destination page
-```
-**CRITICAL**: Seeing a filter combobox open is NOT testing. You must select each
-option and VERIFY the list changes correctly. Same for search, create, delete, etc.
-**CRITICAL**: "I clicked Next and saw step 2" is NOT testing step 2. You must
-interact with EVERY element in step 2 (fill inputs, open dropdowns, click buttons)
-before advancing to step 3. Each step is its own mini-page that needs full audit.
-**PROJECT-SPECIFIC NOTE (Hakutaku)**: For features requiring external services
-(OAuth integrations, file uploads with S3), use `https://dev.hakutaku.ai` instead
-of `localhost:3000` — Upstash and OAuth callbacks require the real domain. You can
-still verify the UI flow steps on localhost, but actual sync/upload operations need
-the tunnel running. When hitting an OAuth/external blocker, mark the step `[~]` and
-document what UI elements are present — do NOT skip the entire wizard path.
-**NEVER DELETE OR EDIT EXISTING DEV DATA.** When testing CRUD operations:
-- Always CREATE new items for testing (new integration, new chat, new file, etc.)
-- NEVER delete or modify data that already exists in the dev database — other
-  developers may be using it.
-- Only delete/edit items YOU created during the current audit session.
-- If you need specific data states (error, paused, etc.), create new items and
-  put them in that state — don't change existing ones.
-**CLONE STRATEGY FOR OAUTH INTEGRATIONS**: When testing integration types that
-require OAuth credentials you don't have (Google Drive, OneDrive, Notion, Slack,
-GitHub), you can't create new ones through the wizard. Instead:
-1. Check if an existing integration of that type already exists in the org
-2. If it does, CLONE it via the tRPC API or database (create a copy with a
-   test name like "E2E Test Clone - {Type} - {Date}")
-3. Test the CLONE's detail page fully: all tabs (Files, History, Settings),
-   all action buttons (Sync, Force Sync, Pause, Configure, Delete), all
-   settings (access control, sync frequency if applicable)
-4. After testing, DELETE the clone you created
-5. This lets you test detail page functionality for OAuth types without
-   needing actual OAuth credentials
-6. If NO existing integration of that type exists, mark those detail-page
-   tests as `[~]` with "no existing integration to clone"
-### 2.5. ANALYZE ERRORS
-After discovery, collect and classify ALL console errors and toast notifications:
-```
-1. Run browser_console_messages(level: "error") — capture all errors
-2. Check for error toasts visible in the snapshot
-3. Classify each error:
-   ENV/CONFIG — Missing env vars, DB connection, external service down
-     → Document but don't block. Note what env is needed.
-     Example: "500 on /api/billing/status — likely missing STRIPE_SECRET_KEY"
-   BUG — Real code issues that happen regardless of env
-     → Flag as HIGH priority finding. Create fix recommendation.
-     Example: "tRPC returns HTML instead of JSON on batch error — missing error boundary"
-   EXPECTED — Known warnings, dev-only messages, React strict mode
-     → Document and skip.
-     Example: "React DevTools extension warning"
-4. Check for:
-   - Stack traces exposed in UI (security issue)
-   - Sensitive data in error messages (API keys, DB URLs, user data)
-   - Error messages that leak implementation details
-   - Failed API calls that should have fallback/retry
-   - Toast storms (>3 error toasts = bad UX, should consolidate)
-5. Record ALL findings in the page spec under "## Error Analysis"
-6. Add actionable items to the audit report
-```
-This phase is CRITICAL — errors found during discovery often reveal:
-- Missing error boundaries
-- Unhandled API failures
-- Environment-dependent code without fallbacks
-- UX issues (toast spam, unhelpful error messages)
-- Security leaks (stack traces, internal paths in errors)
-### 2.7. TRIANGULATE (DB + Source Code + UI)
-After discovery and error analysis, **cross-reference** what the UI shows against the
-database and the component source code to verify correctness:
-```
-1. READ SOURCE CODE for the page's components:
-   - Find the page component in src/app/(app)/dashboard/{page}/
-   - Read _components/ subfolder for page-specific components
-   - Read src/components/{feature}/ for shared feature components
-   - Identify which tRPC procedures are called (trpc.{router}.{procedure})
-   - Understand expected data flow: tRPC query → component props → UI rendering
-2. CHECK DATABASE for test data:
-   - Query the database (via Prisma/tRPC or MCP) to see what data exists
-   - Verify the UI accurately reflects what's in the DB:
-     * Counts match (e.g., "10 integrations" in UI = 10 rows in DB)
-     * Names/labels match (no stale cache, no wrong field displayed)
-     * Statuses match (e.g., "active" in DB shown as active in UI)
-     * Dates/timestamps are formatted correctly
-   - If data is EMPTY or INSUFFICIENT for testing:
-     a. Check if a seed script exists (schema/seeders/) and enrich it
-     b. OR use the UI flow to CREATE test data (e.g., create an integration,
-        upload a file, start a chat) — this also audits the creation flow
-     c. Document what data was needed and how it was obtained
-3. VERIFY BEHAVIOR matches expectations:
-   - Does clicking "Delete" actually delete from DB? (check before/after)
-   - Does creating an item show it in the list without refresh?
-   - Do filters actually filter the correct data? (cross-check with DB query)
-   - Do pagination counts match total DB records?
-   - Does sorting work correctly? (verify order matches DB ORDER BY)
-   - Do role-based visibility rules match ZenStack policies?
-4. FLAG MISMATCHES as findings:
-   - UI shows data that doesn't exist in DB → stale cache or mock data
-   - DB has data not shown in UI → missing query, wrong filter, permission issue
-   - UI count != DB count → pagination bug, filter leak, or policy issue
-   - Component code expects field X but API returns field Y → type mismatch
-```
-**Why this matters**: A UI can "look correct" while showing wrong data. Triangulating
-DB ↔ Source Code ↔ UI catches data integrity bugs that snapshot-only testing misses.
-### 2.9. VALIDATE CHECKLIST (GATE — BLOCKS GENERATE)
-Before writing ANY test code (POM or spec), the page spec checklist MUST be 100% validated.
-**What "validated" means for each element type:**
-- **Button**: You CLICKED it and documented what happened
-- **Combobox/Dropdown**: You OPENED it, listed ALL options, SELECTED each one
-- **Input**: You TYPED into it and verified validation
-- **Tab**: You CLICKED it and snapshotted the panel content
-- **Link**: You CLICKED it and verified the destination
-- **Modal/Dialog**: You OPENED it and listed all elements inside
-- **Wizard step**: You NAVIGATED to it, interacted with ALL its elements
-- **Action**: You TRIGGERED it and verified the outcome
-**TRIANGULATION REQUIREMENT**: For every page, the checklist MUST include a
-dedicated `## Triangulation` section with THREE explicit sub-checks:
-```markdown
-## Triangulation
-### DB Verification
-- [ ] Count in UI matches DB count (e.g., "11 integrations" = 11 rows in DB)
-- [ ] Status values in UI match DB status column
-- [ ] Action outcomes confirmed in DB (e.g., Pausar → status changed to PAUSED)
-- [ ] Created items appear in DB with correct fields
-### Source Code Verification
-- [ ] Component file identified and read (path noted)
-- [ ] tRPC procedures identified (router.procedure names)
-- [ ] Known bugs traced to source (file:line noted)
-### UI vs DB Mismatch Check
-- [ ] No stale data displayed (UI reflects current DB state)
-- [ ] Filter/search results match what DB query would return
-- [ ] Pagination count matches total DB records for this org
-```
-Use `bunx dotenvx run -f artifacts/.env.local -f artifacts/.env.development -- prisma studio`
-to open Prisma Studio for DB inspection. For quick queries, write a temp script
-and run with `bunx dotenvx run ... -- bun run tmp-query.ts`, then delete it.
-```
-1. Open the page spec at docs/e2e-audit/page-specs/{page-name}.md
-2. Scan for EVERY `[ ]` (unchecked) item
-3. If ANY unchecked items remain:
-   a. DO NOT proceed to step 3 (GENERATE)
-   b. Go back and interact with each unchecked element via Playwright
-   c. After verifying, update the checklist: `[ ]` → `[x]` with ✅
-   d. If an element cannot be verified (missing, broken), mark with ❌
-      and explain why (e.g., "BUG: button not rendered")
-   e. If an element requires external service, mark with `[~]` and explain
-      (e.g., "[~] requires dev.hakutaku.ai for OAuth")
-4. ONLY when the spec has ZERO `[ ]` items may you proceed to GENERATE
-5. Final counts MUST be documented at the top of the spec:
-   - Total elements: N
-   - Validated [x]: N
-   - Blocked [~]: N (with reasons)
-   - Failed [❌]: N (with bug reports)
-```
-**CRITICAL**: Writing tests for elements you haven't actually clicked, opened,
-or interacted with produces unreliable tests. The checklist IS the proof of work.
-A page spec full of `[ ]` means the audit is NOT done — it's just a draft.
-**SELF-CHECK before proceeding**: Read the page spec top to bottom. For each `[x]`
-item, can you recall EXACTLY what happened when you interacted with it? If not,
-you didn't actually test it — you just checked the box. Go back and test it.
-### 3. GENERATE
-Write three files per page:
-#### Page Spec (`docs/e2e-audit/page-specs/{page-name}.md`)
-- Complete inventory of all PAGE-SPECIFIC elements found
-- Checklist format for tracking test coverage
-- Notes on states, permissions, edge cases
-**IMPORTANT — Shared vs Page-Specific Elements:**
-- **Sidebar navigation, user menu, toast region, FAB button** are SHARED layout
-  elements. They appear on EVERY page and are NOT part of the page audit.
-- Audit shared layout elements ONCE in a dedicated `shared-layout.md` spec.
-- Each page spec should ONLY contain elements unique to THAT page's content
-  (inside `<main>`). Do NOT duplicate sidebar links in every page spec.
-- If a shared element behaves DIFFERENTLY on a specific page (e.g., sidebar
-  highlights a different link), note the difference but don't re-audit the
-  entire sidebar.
-#### Page Object Model (`tests/e2e/pages/{page-name}.page.ts`)
-```typescript
-import { type Page, type Locator } from '@playwright/test'
-export class DashboardHomePage {
-  readonly page: Page
-  readonly heading: Locator
-  readonly createButton: Locator
-  constructor(page: Page) {
-    this.page = page
-    this.heading = page.getByRole('heading', { name: 'Home' })
-    this.createButton = page.getByRole('button', { name: 'Create' })
-  }
-  async goto() {
-    await this.page.goto('/dashboard/home')
-  }
-  async waitForLoad() {
-    await this.heading.waitFor()
-  }
-}
-```
-#### Test Spec (`tests/e2e/specs/{page-name}.spec.ts`)
-```typescript
-import { test, expect } from '../fixtures/base'
-import { DashboardHomePage } from '../pages/dashboard-home.page'
-test.describe('Dashboard Home @smoke', () => {
-  let home: DashboardHomePage
-  test.beforeEach(async ({ authenticatedPage, apiErrors: _apiErrors }) => {
-    home = new DashboardHomePage(authenticatedPage)
-    await home.goto()
-    await home.waitForLoad()
-  })
-  test('loads and displays heading', async ({ authenticatedPage }) => {
-    await expect(authenticatedPage, 'Should navigate to home page').toHaveURL(/dashboard\/home/)
-    await expect(home.heading).toBeVisible()
-  })
-  test('displays hero section', async () => {
-    await expect(home.heroTitle, 'Hero section should render after data loads').toBeVisible()
-  })
-})
-```
-### 4. VALIDATE
-Run the generated tests:
-```bash
-bunx playwright test tests/e2e/specs/{page-name}.spec.ts
-```
-Fix any failures before moving on.
-### 5. REPORT
-Update `docs/e2e-audit/reports/master-audit.md` with:
-- Elements found vs tested
-- Security findings
-- UX/UI issues
-- Missing test-ids
-- Accessibility gaps
-## API Error Tracking (MANDATORY)
-Every test MUST use the `apiErrors` fixture to automatically detect and report API failures.
-This replaces generic "TimeoutError: waiting for heading" with actionable messages like
-`"API errors detected: GET /api/v1/integration.list → 500 Internal Server Error"`.
-### How It Works
-The `apiErrors` fixture in `tests/e2e/fixtures/base.ts`:
-1. Listens to ALL network responses on the authenticated page
-2. Captures any 4xx/5xx responses from `/api/` or `/v1/` endpoints
-3. After the test completes, if any API errors were collected, it FAILS the test
-   with a detailed report including method, URL, status code, and status text
-4. Known env-dependent endpoints (billing, ontology) are excluded via `IGNORED_API_PATTERNS`
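Reviewer sketch (not part of the package): the filtering that the removed `apiErrors` fixture describes might look roughly like the following. The real fixture in `tests/e2e/fixtures/base.ts` is not shown in this diff, so `ObservedResponse`, `collectApiErrors`, and `formatApiErrors` are hypothetical names. The sketch works on plain objects rather than Playwright `Response` instances so it can run on its own.

```typescript
// Hypothetical shape: a plain record of what a response listener would capture.
interface ObservedResponse {
  method: string
  url: string
  status: number
  statusText: string
}

// Patterns taken from the "Adding Ignored Endpoints" example in the removed text.
const IGNORED_API_PATTERNS: RegExp[] = [/\/api\/billing\//, /ontology\.getGraph/]

// Keep only 4xx/5xx responses on /api/ or /v1/ paths that are not ignored.
function collectApiErrors(
  responses: ObservedResponse[],
  ignored: RegExp[] = IGNORED_API_PATTERNS,
): ObservedResponse[] {
  return responses.filter((r) => {
    const isApi = r.url.includes('/api/') || r.url.includes('/v1/')
    const isError = r.status >= 400 && r.status <= 599
    const isIgnored = ignored.some((pattern) => pattern.test(r.url))
    return isApi && isError && !isIgnored
  })
}

// Build the failure message; an empty string means there is nothing to report.
function formatApiErrors(errors: ObservedResponse[]): string {
  if (errors.length === 0) return ''
  const lines = errors.map((e) => `${e.method} ${e.url} → ${e.status} ${e.statusText}`)
  return `API errors detected: ${lines.join('; ')}`
}
```

In a real fixture the records would presumably be pushed from a `page.on('response', ...)` listener, and the test would fail after completion if `formatApiErrors` returned a non-empty string.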
-### Usage Pattern
-```typescript
-// REQUIRED: Destructure apiErrors in beforeEach (even if unused in test body)
-test.beforeEach(async ({ authenticatedPage, apiErrors: _apiErrors }) => {
-  // apiErrors starts listening immediately — no setup needed
-  page = new SomePage(authenticatedPage)
-  await page.goto()
-  await page.waitForLoad()
-})
-// For standalone tests without beforeEach:
-test('standalone test', async ({ authenticatedPage, apiErrors: _apiErrors }) => {
-  // apiErrors is active for this test
-})
-```
-### Adding Ignored Endpoints
-When an API returns errors due to missing env vars (not real bugs), add the pattern:
-```typescript
-// In tests/e2e/fixtures/base.ts
-const IGNORED_API_PATTERNS = [
-  /\/api\/billing\//,        // Missing Stripe key in dev
-  /ontology\.getGraph/,      // Missing ontology service
-]
-```
-### Custom Assertion Messages (MANDATORY)
-Every `expect()` call MUST include a descriptive second argument explaining what the
-assertion checks and what might be wrong if it fails:
-```typescript
-// BAD — gives useless error: "expected locator to be visible"
-await expect(cards.first()).toBeVisible()
-// GOOD — gives actionable error with debugging hint
-await expect(cards.first(), 'At least 1 integration card should render (is integration.list returning data?)').toBeVisible()
-// BAD — gives "expected 0 to be greater than 0"
-expect(count).toBeGreaterThan(0)
-// GOOD — explains what's missing
-expect(count, 'Integration cards not found — check if tRPC integration.list returns data for this org').toBeGreaterThan(0)
-```
-**Why this matters**: When a test fails in CI, the developer sees the custom message
-immediately — no need to reproduce locally or dig through Playwright traces.
-## Test Categories per Page
-Every page MUST have tests for:
-### Navigation
-- All links resolve (no 404s)
-- Breadcrumbs correct
-- Back/forward browser navigation works
-### Interactions
-- All buttons clickable and produce expected result
-- All modals open/close correctly
-- All forms submit with valid data
-- All dropdowns open and select options
-### Validation
-- Required fields show errors when empty
-- Invalid input shows appropriate error
-- Max length enforced
-- Special characters handled
-### UX/UI
-- Loading states display (skeletons/spinners)
-- Empty states display when no data
-- Error states display on failure
-- Toasts appear with correct message and type
-- Tooltips show on hover
-- Focus management (modals trap focus, inputs auto-focus)
-### Security
-- Console has no sensitive data leaks
-- XSS payloads in inputs don't execute
-- RBAC: unauthorized roles can't access/see restricted elements
-- No stack traces in UI error messages
-## Locator Strategy
-Priority order:
-1. `getByRole()` — semantic, accessible
-2. `getByText()` — visible text
-3. `getByLabel()` — form fields
-4. `getByPlaceholder()` — inputs
-5. `getByTestId()` — last resort (flag missing semantic labels)
-**NEVER use CSS selectors** (`.class`, `#id`, `div > span`).
-## File Structure
-```
-tests/e2e/
-├── fixtures/
-│   ├── auth.ts              # Storage state paths + UserRole type
-│   ├── auth.setup.ts        # Auth setup (preserves valid sessions)
-│   ├── base.ts              # Extended fixtures (authenticatedPage, apiErrors)
-│   └── storage/             # Auth state files (gitignored)
-│       ├── owner.json
-│       ├── admin.json
-│       ├── manager.json
-│       └── member.json
-├── pages/                   # Page Object Models
-│   ├── dashboard-home.page.ts
-│   ├── dashboard-integrations.page.ts
-│   ├── dashboard-teams.page.ts
-│   └── ...
-├── specs/                   # Test specifications
-│   ├── dashboard-home.spec.ts
-│   ├── dashboard-integrations.spec.ts
-│   ├── dashboard-teams.spec.ts
-│   ├── security/
-│   │   ├── rbac.spec.ts
-│   │   └── headers.spec.ts
-│   └── ...
-└── utils/
-    ├── console-collector.ts # Console message interceptor + sensitive data scanner
-    ├── security-helpers.ts  # XSS payloads, header checks
-    └── test-data.ts         # Shared test data
-```
-## Running
-```bash
-# All tests
-bun run test:e2e
-# Smoke tests only
-bunx playwright test --grep @smoke
-# Security tests only
-bunx playwright test --grep @security
-# Specific page
-bunx playwright test tests/e2e/specs/dashboard-home.spec.ts
-# UI mode (visual debugging)
-bunx playwright test --ui
-```
+---
+name: e2e-audit
+version: 0.2.0
+description: Comprehensive E2E audit that maps all routes, APIs, tRPC procedures, middleware auth, and forms from SOURCE first, cross-references against existing tests and the current branch diff, runs Playwright against dev, then reports coverage gaps and problems with a SHOT+TRACE+ASSERT+SOURCE evidence quad. Invoke when the user mentions "e2e audit", "run the e2e", "integration test audit", "test coverage gaps", "roda o e2e", end-to-end tests, API contract check, RBAC coverage, or auditing integration tests. Report-then-ask: stop after mapping, run only on confirmation, emit a post-run-feedback report before writing findings.
+---
+# e2e-audit — source-first integration-test audit
+> **Operating principle:** you cannot audit what you never opened. Playwright traffic logs only cover flows you already know. Read the source first, then drive the browser to close the gaps the source revealed.
+## Entry contract (non-negotiable)
+1. **Mapping before clicking.** Run discovery scripts, write all JSON inventories, then STOP and report. Do NOT spin up the browser before the user confirms scope.
+2. **Existing tests are load-bearing.** If `tests/e2e/` (or equivalent) exists, inventory it FIRST. Reuse fixtures, auth storage state, and page objects. Warn on drift between runs.
+3. **Evidence quad.** Every non-meta finding ships SHOT+TRACE+ASSERT+SOURCE — screenshot path, Playwright trace path, literal assertion string, and implicated source file. Coverage gaps (`rule=coverage-gap-*` / `uncovered-*`) are the only exceptions.
+4. **Dev, not prod.** Always audit against the local dev server. Detect HTML-instead-of-JSON crashes (500 responses that render the Next/Remix error page) and surface them.
+5. **Report-then-ask → run → feedback → findings.** Four gates, in order. Do not merge them.
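Reviewer sketch (not part of the package): the evidence-quad rule in item 3 could be checked with a small validator like the one below. The actual schema lives in `findings.schema.json`, which this diff does not show, so the `Finding` field names (`screenshot`, `trace`, `assertion`, `source`, `meta`) are assumptions made only for illustration.

```typescript
// Hypothetical finding shape; the real one is defined by findings.schema.json.
interface Finding {
  rule: string
  meta?: boolean        // assumed flag for meta findings such as test drift
  screenshot?: string   // SHOT: screenshot path
  trace?: string        // TRACE: Playwright trace path
  assertion?: string    // ASSERT: literal assertion string
  source?: string       // SOURCE: implicated source file
}

// Coverage gaps are exempt from the quad, per the entry contract.
function isCoverageGap(rule: string): boolean {
  return rule.startsWith('coverage-gap-') || rule.startsWith('uncovered-')
}

// Return the names of the evidence parts a finding is missing.
function missingEvidence(f: Finding): string[] {
  if (f.meta === true || isCoverageGap(f.rule)) return []
  const quad: Array<[string, string | undefined]> = [
    ['SHOT', f.screenshot],
    ['TRACE', f.trace],
    ['ASSERT', f.assertion],
    ['SOURCE', f.source],
  ]
  return quad
    .filter(([, value]) => value === undefined || value.trim() === '')
    .map(([name]) => name)
}
```

Returning the list of missing parts, rather than a boolean, lets a verifier report exactly which evidence is absent for each finding.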
+## Output layout
+```
+.e2e-audit/<YYYY-MM-DD-HHMMSS>/
+├── stack.json               # detect-stack.sh
+├── routes.json              # discover-routes.sh
+├── api-surface.json         # discover-api-surface.sh
+├── existing-tests.json      # inventory-existing-tests.sh
+├── uncovered.json           # detect-uncovered.sh
+├── map.md                   # human-readable summary of the above
+├── traces/                  # Playwright trace.zip per test
+├── screenshots/             # PNGs per assertion moment
+├── logs/
+│   ├── dev-server.log       # piped stdout+stderr of dev server
+│   └── playwright.log
+├── post-run-feedback.json   # emitted AFTER runs, BEFORE findings
+├── post-run-feedback.md     # human copy
+└── findings.json            # final — schema at findings.schema.json
+```
+## Pipeline
+```
+PREFLIGHT        →  detect-stack, inventory-existing-tests, compute drift-hash
+DISCOVERY        →  discover-routes, discover-api-surface, detect-uncovered
+REPORT-THEN-ASK  →  write map.md, present to user, WAIT for confirmation
+RUN              →  start dev server, tail logs, drive Playwright
+FEEDBACK         →  post-run-feedback.json from logs + trace + console
+FINDINGS         →  findings.json with SHOT+TRACE+ASSERT+SOURCE quad
+VERIFY           →  bash scripts/verify-audit.sh <session_dir>
+```
+---
+## Step 1 — Preflight
+```bash
+SESSION_DIR=".e2e-audit/$(date +%Y-%m-%d-%H%M%S)"
+mkdir -p "$SESSION_DIR/traces" "$SESSION_DIR/screenshots" "$SESSION_DIR/logs"
+bash .claude/skills/e2e-audit/scripts/detect-stack.sh            > "$SESSION_DIR/stack.json"
+bash .claude/skills/e2e-audit/scripts/inventory-existing-tests.sh > "$SESSION_DIR/existing-tests.json"
+```
+**Drift check.** If a previous session exists at `.e2e-audit/.last-hash`, compare `existing-tests.json.hash` against it. On mismatch, surface a `test-drift` meta finding (non-fatal) showing which files were added, removed, or resized. Write the new hash after the run completes.
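Reviewer sketch (not part of the package): the added/removed/resized comparison that the drift check describes could be computed as below. The format of `existing-tests.json` is not shown in this diff, so the `Inventory` shape (path mapped to size in bytes) and the function names are hypothetical.

```typescript
// Assumed inventory shape: test file path mapped to its size in bytes.
type Inventory = Record<string, number>

interface Drift {
  added: string[]
  removed: string[]
  resized: string[]
}

// Own-property check so that paths like "toString" are not matched by accident.
function has(inv: Inventory, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(inv, path)
}

// Compare the previous and current inventories of existing test files.
function diffInventories(prev: Inventory, curr: Inventory): Drift {
  const added = Object.keys(curr).filter((p) => !has(prev, p)).sort()
  const removed = Object.keys(prev).filter((p) => !has(curr, p)).sort()
  const resized = Object.keys(curr)
    .filter((p) => has(prev, p) && prev[p] !== curr[p])
    .sort()
  return { added, removed, resized }
}

// A non-fatal test-drift meta finding would be raised when this returns true.
function hasDrift(d: Drift): boolean {
  return d.added.length + d.removed.length + d.resized.length > 0
}
```

A size comparison misses edits that leave the byte count unchanged, which is presumably why the skill also compares a hash; this sketch only covers the per-file breakdown.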
+**Stack fallback.** If `stack.test_runner == "none"`, emit a `meta` finding prompting the user to install Playwright (`bun add -D @playwright/test`) and stop the pipeline. Do not proceed blind.
+## Step 2 — Source-first discovery
+```bash
+bash .claude/skills/e2e-audit/scripts/discover-routes.sh       > "$SESSION_DIR/routes.json"
+bash .claude/skills/e2e-audit/scripts/discover-api-surface.sh  > "$SESSION_DIR/api-surface.json"
+bash .claude/skills/e2e-audit/scripts/detect-uncovered.sh \
+  "$SESSION_DIR/routes.json" \
+  "$SESSION_DIR/api-surface.json" \
+  "$SESSION_DIR/existing-tests.json" \
+  "${BASE_REF:-origin/main}" > "$SESSION_DIR/uncovered.json"
+```
+Then write `map.md` summarising:
+- **Stack**: framework + router style + test runner + auth providers + ORMs.
+- **Surface counts**: routes, HTTP handlers, tRPC procedures (by auth tier), server actions.
+- **Branch diff**: files changed vs `BASE_REF`; highlight those without test references.
+- **Uncovered**: bulleted list of every item in `uncovered_routes / uncovered_http / uncovered_trpc / uncovered_actions`.
+- **Existing test inventory**: count + hash + drift status.
+## Step 3 — Report-then-ask (HARD STOP)
+Present `map.md` to the user with a short prompt:
+> Mapping complete. Found N routes, M uncovered surfaces, K existing specs. Scope to run:
+>  - (a) uncovered + changed (default, recommended)
+>  - (b) full suite (all existing specs + uncovered surfaces)
+>  - (c) custom subset (user lists paths)
+> Reply with a/b/c before I touch the browser.
+Do NOT proceed to Step 4 without a reply. This is the mandatory report-then-ask gate.
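One way to keep the gate strict is to map the reply to a scope and treat anything else, including an empty reply, as "no answer yet". The function name `resolve_scope` and the scope labels it prints are illustrative assumptions, not names defined by the skill.

```shell
# Hypothetical helper: translate the user's a/b/c reply into a scope label.
# Returns non-zero for an empty or unrecognised reply so the caller keeps
# waiting instead of silently picking the default.
resolve_scope() {
  case "$1" in
    a|A) echo "uncovered-and-changed" ;;
    b|B) echo "full-suite" ;;
    c|C) echo "custom-subset" ;;
    *)   return 1 ;;
  esac
}
```

Note that option (a) is only the recommended default in the prompt; the sketch deliberately does not apply it when no reply was given.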
+## Step 4 — Run against dev
+1. **Start dev server in background**, redirect stdout+stderr to `$SESSION_DIR/logs/dev-server.log`:
+   ```bash
+   nohup sh -c "$(jq -r .dev_command "$SESSION_DIR/stack.json")" \
+     > "$SESSION_DIR/logs/dev-server.log" 2>&1 &
+   echo $! > "$SESSION_DIR/logs/dev.pid"
+   ```
+2. **Wait** for `$(jq -r .base_url "$SESSION_DIR/stack.json")` to respond 200 within 90s. Fail loudly if not.
+3. **Auth setup**: if `stack.auth` is non-empty, use or create a `storageState` per role. Start from any existing state in `existing-tests.storage_states`; synthesize new states only from credentials the user explicitly provides (never read `.env*` files, and never print secrets). See `references/auth-setup-playbook.md`.
+4. **Spec selection** per Step 3 answer. Prefer existing specs when coverage exists.
+5. **Run Playwright** with tracing forced on:
+   ```bash
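+   # Assumption: write the JSON report to a file so it does not interleave
+   # with the list reporter on stdout (the file name is illustrative).
+   PLAYWRIGHT_JSON_OUTPUT_NAME="$SESSION_DIR/logs/playwright-report.json" \
+   npx playwright test \
+     --trace=on \
+     --output="$SESSION_DIR/traces" \
+     --reporter=list,json \
+     2>&1 | tee "$SESSION_DIR/logs/playwright.log"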
+   ```
+Capture for each test:
+- Screenshot at the key assertion step (`await page.screenshot({path, fullPage: true})`).
+- Trace zip (auto when `--trace=on`).
+- All `page.on('console')` messages with level + URL + line.
+- All responses via `page.on('response')`: keep 4xx/5xx responses on `/api/`, `/v1/`, and `/trpc/` paths.
+- HTML-instead-of-JSON: any API-path response served with `Content-Type: text/html` triggers the `server-crash` rule.
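The readiness wait in item 2 can be sketched as a polling loop with a deadline. This is a sketch under assumptions: `wait_for_ready` and `probe_http_200` are hypothetical names, the default probe assumes `curl` is installed, and the probe is injectable so the loop can be exercised without a running server.

```shell
# Default probe: succeed only when the URL answers with HTTP 200.
probe_http_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")" = "200" ]
}

# Usage: wait_for_ready <url> [timeout_seconds] [probe_function]
# Polls once per second; returns 0 as soon as the probe succeeds and 1
# (with a message on stderr) once the timeout has elapsed.
wait_for_ready() {
  local url="$1" timeout="${2:-90}" probe="${3:-probe_http_200}"
  local deadline
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if "$probe" "$url"; then
      return 0
    fi
    sleep 1
  done
  echo "dev server at $url not ready after ${timeout}s" >&2
  return 1
}

# Example call (assumes jq is installed and SESSION_DIR is set):
#   wait_for_ready "$(jq -r .base_url "$SESSION_DIR/stack.json")" 90 || exit 1
```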
+## Step 5 — Post-run feedback
+BEFORE writing `findings.json`, consolidate into `post-run-feedback.json`:
+```jsonc
+{
+  "session": "<session_dir>",
+  "duration_s": 128,
+  "tests_total": 42,
+  "tests_failed": 3,
+  "problems": [
+    { "kind": "api-5xx",         "where": "POST /api/users", "count": 2, "sample_trace": "traces/users-create-1.zip" },
+    { "kind": "console-error",   "where": "dashboard",       "count": 7, "sample": "Uncaught TypeError: Cannot read ..." },
+    { "kind": "rbac-bypass",     "where": "member sees /admin", "count": 1 },
+    { "kind": "server-crash",    "where": "POST /api/x returned text/html 500" },
+    { "kind": "auth-flow-broken","where": "login redirect loop after valid credentials" },
+    { "kind": "dev-server-log",  "where": "unhandledRejection at server:1234" }
+  ],
+  "uncovered_carried_forward": { "routes": 4, "http": 2, "trpc": 9, "actions": 1 }
+}
+```
+Also mirror to `post-run-feedback.md`. Present a short summary to the user; do not dump the full JSON.
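How a single captured response maps to a `kind` can be sketched as a pure function over path, status, and content type. The name `classify_response` is hypothetical, and the `api-4xx`, `ok`, and `ignore` labels are assumptions added for illustration; only `api-5xx` and `server-crash` appear in the example above.

```shell
# Usage: classify_response <path> <status> <content-type>
# Prints one label per response.
classify_response() {
  local path="$1" status="$2" content_type="$3"
  # Only API-style paths are in scope.
  case "$path" in
    */api/*|*/v1/*|*/trpc/*) ;;
    *) echo "ignore"; return 0 ;;
  esac
  # HTML on an API path means the server rendered an error page.
  case "$content_type" in
    text/html*) echo "server-crash"; return 0 ;;
  esac
  case "$status" in
    5??) echo "api-5xx" ;;
    4??) echo "api-4xx" ;;
    *)   echo "ok" ;;
  esac
}
```

The HTML check runs before the status check, so an HTML 500 on an API path is counted as `server-crash` rather than `api-5xx`.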
+## Step 6 — Write findings
+For every problem in `post-run-feedback.problems` that is tied to a specific failure, emit one finding. Allocate IDs `E2E-0001`, `E2E-0002`, … sequentially.
+- Evidence quad required (`screenshot_path`, `trace_path`, `assertion`, `source_file`).
+- `source_file` must point at the route handler / procedure / action / middleware implicated — not the spec file.
+- Add `http.method`, `http.path`, `http.status`, `http.response_snippet` for api-contract + server-crash findings.
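Sequential ID allocation can be sketched with `printf` zero-padding. The helper names `finding_id` and `allocate_finding_ids` are hypothetical; the sketch assumes the counter is a plain decimal integer starting at 1.

```shell
# Format one finding ID from a 1-based counter, e.g. 7 becomes E2E-0007.
finding_id() {
  printf 'E2E-%04d\n' "$1"
}

# Print IDs for <count> findings, one per line, in allocation order.
allocate_finding_ids() {
  local count="$1" i=1
  while [ "$i" -le "$count" ]; do
    finding_id "$i"
    i=$((i + 1))
  done
}
```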
+Meta findings (no evidence quad required):
+- `coverage-gap-routes` / `coverage-gap-http` / `coverage-gap-trpc` / `coverage-gap-actions` — one per non-empty `uncovered.*` array, with the array echoed into `detail`.
+- `test-drift` — emitted by Step 1 when the test-corpus hash changed since last run.
+- `stack-detect` — info-level snapshot of `stack.json` for traceability.
+- `post-run-feedback` — aggregate, links to `post-run-feedback.json`.
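+Schema: `.claude/skills/e2e-audit/findings.schema.json`. Note that `jq` has no built-in JSON Schema validator: `jq --slurpfile schema findings.schema.json` only loads the schema as a variable for hand-written checks. Either write those checks, or skip strict validation and lean on `verify-audit.sh`.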
+## Step 7 — Verify + persist
+```bash
+bash .claude/skills/e2e-audit/scripts/verify-audit.sh "$SESSION_DIR"
+jq -r '.hash' "$SESSION_DIR/existing-tests.json" > .e2e-audit/.last-hash
+```
+Kill the dev server: `kill "$(cat "$SESSION_DIR/logs/dev.pid")"`.
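A slightly more defensive shutdown tolerates a server that has already exited and escalates if it ignores the first signal. This is a sketch: `stop_dev_server` is a hypothetical name, and the 10-second grace period is an assumption.

```shell
# Usage: stop_dev_server <pid-file>
# Sends SIGTERM, waits up to 10 s, then escalates to SIGKILL.
# A missing pid file or an already-dead process is not an error.
stop_dev_server() {
  local pid_file="$1" pid waited=0
  [ -f "$pid_file" ] || return 0
  pid="$(cat "$pid_file")"
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt 10 ]; do
      sleep 1
      waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
      kill -9 "$pid"
    fi
  fi
  rm -f "$pid_file"
}

# Example call (assumes SESSION_DIR is set):
#   stop_dev_server "$SESSION_DIR/logs/dev.pid"
```

Because the server was started through `sh -c`, the recorded PID is the wrapper shell; whether child processes also stop depends on how the dev command handles signals.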
+## Final response to user
+≤5 sentences. Report: session dir, # findings, # coverage gaps, # problems, and one-line guidance on whether to invoke a fix agent or hand-fix. Do NOT paste `map.md` or `findings.json` bodies.
+---
+## Invocation triggers (already enforced by SessionStart hook)
+Keywords that MUST trigger this skill: `e2e audit`, `roda o e2e`, `run the e2e`, `integration test audit`, `test coverage gaps`, `coverage gap`, `audit my tests`, `api contract check`, `rbac coverage`, `end-to-end tests`. Claude must read this file before improvising a plan.
+## Boundaries (what this skill does NOT do)
+- Does not write fixes. Fix work is out of scope; hand the finding list to an sd-fix-style agent or the user.
+- Does not audit design / UX — that's `super-design`. If the user asked for a UX audit, hand off.
+- Does not run against production. Only local dev. If `stack.base_url` points to prod, refuse.
+- Does not invent credentials. Never read `.env*` files; only use credentials the user provides inline for the session.
+- Does not delete existing tests. Drift is reported, never "resolved" by removing specs.
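The "only local dev" boundary can be enforced with an allowlist check on the base URL before any browser work starts. This is a sketch under assumptions: `is_local_base_url` is a hypothetical name, and the allowlist (`localhost` and `127.0.0.1`) is stricter than "not prod"; IPv6 loopback is not handled.

```shell
# Usage: is_local_base_url <url>
# Returns 0 only when the host part is exactly localhost or 127.0.0.1.
is_local_base_url() {
  local rest host
  rest="${1#*://}"        # drop the scheme
  host="${rest%%[/:]*}"   # keep everything before the port or path
  case "$host" in
    localhost|127.0.0.1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call (assumes jq is installed and SESSION_DIR is set):
#   is_local_base_url "$(jq -r .base_url "$SESSION_DIR/stack.json")" \
#     || { echo "refusing: base_url is not local" >&2; exit 1; }
```

Matching the whole host rather than a prefix means a name such as `localhost.example.com` is rejected.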
+## References
+- `references/auth-setup-playbook.md` — storageState + role patterns per auth provider.
+- `references/api-contract-playbook.md` — HTTP-4xx / HTTP-5xx / HTML-instead-of-JSON detection.
+- `references/coverage-gap-playbook.md` — how to translate `uncovered.*` into meta findings + suggested specs.
+- `references/post-run-feedback-playbook.md` — how to consolidate Playwright run signals into feedback.
+## Templates
+- `templates/base-fixture.ts.tpl` — `test.extend` with `apiErrors` + `authenticatedPage` fixtures.
+- `templates/auth-setup.ts.tpl` — globalSetup shape that writes storageState per role.
+- `templates/findings-report.md.tpl` — human-readable summary rendered from findings.json.
+- `templates/post-run-feedback.md.tpl` — the mirror of post-run-feedback.json.
+## Attention points
+- **tRPC v10 vs v11.** Procedure nesting works differently; `createCaller()` exists in both but the router introspection APIs diverge. Treat `discover-api-surface.sh` output as names-only.
+- **Route groups.** Next.js route-group segments such as `(marketing)` are stripped in URL computation; don't emit findings that name the parenthesized segment.
+- **Parallel & intercepting routes.** `@modal` slots and `(.)photo` shortcuts are surfaces that Playwright can miss; the route discovery already flags them — propose specs that hit them directly.
+- **Middleware.** If `middleware.has_auth_guard == true` and a public matcher exists, any public URL the audit hit should not have triggered auth redirects. Mismatches = findings.
+- **Windows paths.** `.claude/skills/e2e-audit/scripts/*.sh` must run via Git Bash or WSL. If `bash` isn't available, abort with a meta finding; never fall back to half-runs.
+- **Dev server crashes mid-run.** If the process recorded in `dev.pid` exits unexpectedly during Playwright execution, mark all remaining tests as inconclusive and emit `server-crash` findings with the last 40 lines of `dev-server.log` in `detail`.
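Detecting a mid-run crash comes down to a liveness probe on the recorded PID plus a log tail. The helper names `dev_server_alive` and `crash_detail` are hypothetical; how often to poll during the Playwright run is left to the caller.

```shell
# Usage: dev_server_alive <pid-file>
# Returns 0 while the recorded process still exists, non-zero otherwise
# (including when the pid file is missing).
dev_server_alive() {
  local pid_file="$1"
  [ -f "$pid_file" ] && kill -0 "$(cat "$pid_file")" 2>/dev/null
}

# Usage: crash_detail <log-file>
# Prints the last 40 lines of the log for the finding's `detail` field.
crash_detail() {
  tail -n 40 "$1"
}

# Example call (assumes SESSION_DIR is set):
#   dev_server_alive "$SESSION_DIR/logs/dev.pid" \
#     || crash_detail "$SESSION_DIR/logs/dev-server.log"
```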