npm - start-vibing - Versions diffs - 4.3.4 → 4.4.1 - Mend

start-vibing 4.3.4 → 4.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (25)

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
 	"name": "start-vibing",
-	"version": "4.3.4",
-	"description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. super-design 0.6.4: Baymard verbatim rules backfilled — search (12), filter (10), breadcrumbs (6), PDP (18) now enumerated in audit-methodology.md with rule IDs + source URLs (joining the existing cc/14 + addr/8 sets), plus docs/research/baymard-public-rules.md catalogs 88 rules with SHOT+QUOTE+SEL+VAL evidence for detectors. DSC-choice typeui selection + harvest-typeui.sh carry over from 0.6.3.",
+	"version": "4.4.1",
+	"description": "Setup Claude Code with 9 plugins, 6 community skills, and 8 MCP servers. Parallel install, auto-accept, superpowers + ralph-loop. e2e-audit 0.2.0 refactor (skill-only, no agents): SessionStart hook + slash command make the skill keyword-invokable (\"e2e audit\", \"roda o e2e\", \"integration test\", \"test coverage gaps\"). Source-first discovery via detect-stack, discover-routes (Next app/pages/Remix/SvelteKit/Nuxt/Astro), discover-api-surface (HTTP handlers, tRPC procedures, GraphQL, server actions, middleware auth), inventory-existing-tests (preserve prior corpus + sha256 drift hash), and detect-uncovered (branch-diff vs origin/main finds changes not covered by existing specs). Report-then-ask between mapping and Playwright run; post-run-feedback report before writing findings. SHOT+TRACE+ASSERT+SOURCE evidence quad per non-meta finding; meta rules (coverage-gap-*, uncovered-*, test-drift, stack-detect, post-run-feedback) exempt. verify-audit.sh enforces schema + quad. Generic (no project leakage). super-design 0.7.0 carries over.",
 	"type": "module",
 	"bin": {
 		"start-vibing": "./dist/cli.js"

package/template/.claude/agents/sd-audit.md CHANGED

@@ -67,6 +67,20 @@ Read in order:
 Run `.claude/skills/super-design/scripts/discover-routes.sh`. If incremental mode, filter to scope (read `.super-design/sessions/<id>/scope.json`).
+## Step 1.5 — Source-first surface & project-rule discovery (MANDATORY, 0.7.0+)
+Playwright deduction misses internal state (modals never triggered in the tested flow, forms gated behind other forms, parallel/intercepting routes). Source-first discovery reads the repo FIRST and emits two authoritative artifacts that Step 2.5 and Step 3i consume as ground truth.
+```bash
+bash .claude/skills/super-design/scripts/discover-surfaces.sh     > "$SESSION_DIR/surfaces.json"
+bash .claude/skills/super-design/scripts/extract-project-rules.sh > "$SESSION_DIR/project-rules.json"
+```
+- `surfaces.json` — authoritative inventory of modals, forms, triggers, internal nav, Next.js layout/error/loading/not-found/parallel/intercepting routes. Step 2.5 Phase B cross-checks runtime discovery against this list and emits `modal-coverage-gap` / `form-coverage-gap` findings for anything the source declares but Playwright never exercised.
+- `project-rules.json` — parsed FORBIDDEN tables from `CLAUDE.md`/`AGENTS.md`/`.cursorrules`. Applicable rules (audit-scope, not code-level) are consumed by Step 3i.
+Both files MUST exist before Step 2 starts. `verify-audit.sh` warns when either is missing.
 ## Step 2 — Launch audit loop
 For each viewport ∈ [mobile 375×812, tablet 768×1024, desktop 1440×900], for each page in scope:
@@ -385,6 +399,24 @@ Cross-reference the competitor component vocabulary from
 tabs on mobile and the product uses hamburger-only, density score drops AND
 the M1 finding cites the category norm.
+## Step 3i — Project-rule enforcement (MANDATORY, 0.7.0+)
+Iterate the `audit_applicable: true` rules from `project-rules.json` (Step 1.5). These rules are authoritative — the project owner has already codified them as the right answer for this codebase. Each violation fires as a PRIMARY finding with `rule: project-forbidden-<slug>` keyed to the project's own wording.
+```jsonc
+{
+  "id": "F-NNNN",
+  "rule": "project-forbidden-use-cards-on-mobile",
+  "source_rule": { "raw": "Use Cards on mobile", "reason": "Waste space in flex-col", "source_file": "CLAUDE.md" },
+  "template_id": "M2",
+  "viewport": "mobile",
+  "severity": 3,
+  ...
+}
+```
+Do NOT downgrade or tag — project-forbidden rules ARE the rule source, not a bump on another finding. `verify-audit.sh` skips snapshot_quote verification for this rule family (evidence is aggregate, not a single DOM quote).
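The `project-forbidden-<slug>` naming in Step 3i above could be derived roughly as follows. This is a minimal sketch, not the package's implementation: `ProjectRule`, `slugifyRule` and `toFindingRules` are hypothetical names, the rule shape is modelled only on the `source_rule` fields in the example, and the second sample rule is invented for illustration.

```typescript
// Hypothetical shape, modelled on the `source_rule` object in the Step 3i example.
interface ProjectRule {
  raw: string;
  reason: string;
  source_file: string;
  audit_applicable: boolean;
}

// Lowercase the rule text and collapse every run of non-alphanumerics into one hyphen.
function slugifyRule(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Only audit-applicable rules become finding rule ids.
function toFindingRules(rules: ProjectRule[]): string[] {
  return rules
    .filter((r) => r.audit_applicable)
    .map((r) => `project-forbidden-${slugifyRule(r.raw)}`);
}

const sampleRules: ProjectRule[] = [
  { raw: "Use Cards on mobile", reason: "Waste space in flex-col", source_file: "CLAUDE.md", audit_applicable: true },
  // Invented code-level rule, present only to show the filter.
  { raw: "Use any in TypeScript", reason: "Code-level rule", source_file: "CLAUDE.md", audit_applicable: false },
];

console.log(toFindingRules(sampleRules)); // [ 'project-forbidden-use-cards-on-mobile' ]
```

Keeping the slug a pure function of the project's own wording means the same FORBIDDEN row always maps to the same rule id across sessions.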
 ## Step 4 — Write findings
 Append to `docs/super-design/findings/F-NNNN.md` (one file per finding) AND `.super-design/sessions/<id>/findings.json`.

package/template/.claude/commands/e2e-audit.md ADDED

@@ -0,0 +1,16 @@
+---
+description: Run e2e-audit (source-first integration-test audit + coverage-gap detection)
+---
+Invoke the e2e-audit skill with flags: $ARGUMENTS
+Follow SKILL.md entry flow:
+1. Preflight — detect stack, inventory existing tests, hash for drift
+2. Source-first discovery — routes + api-surface + uncovered (branch diff)
+3. Report-then-ask — show map, STOP for user confirmation
+4. Dev server + Playwright run — capture SHOT+TRACE+ASSERT+SOURCE
+5. Post-run feedback — API errors, RBAC, console, server crashes
+6. Write findings.json + run verify-audit.sh
+7. Return ≤5-sentence summary
+Do not paste the full map into chat.

package/template/.claude/hooks/e2e-audit-session-start.sh ADDED

@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+cat <<'EOF'
+{"hookSpecificOutput":{"hookEventName":"SessionStart","additionalContext":"When the user mentions e2e audit, integration test audit, test coverage gaps, 'roda o e2e', 'run the e2e', end-to-end tests, API contract check, RBAC coverage, or says anything about auditing their application's integration tests — you MUST invoke the e2e-audit skill. Do not improvise a test plan, do not manually write specs without running the skill first, do not start Playwright blind. Read .claude/skills/e2e-audit/SKILL.md first, then follow its entry flow. The skill uses source-first discovery (routes.json + api-surface.json + existing-tests.json + uncovered.json) BEFORE touching the browser. It preserves any existing tests/ directory, warns on drift between runs, and emits a post-run-feedback report before writing findings.\n\nAfter mapping finishes, STOP and ask the user before running tests — report-then-ask. Every non-meta finding must carry the SHOT+TRACE+ASSERT+SOURCE evidence quad (screenshot, Playwright trace, literal assertion, source file). Coverage gaps surface as rule=coverage-gap-* or uncovered-* meta findings."}}
+EOF
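The hook above prints a hand-written JSON object, so every quote and newline in the context string has to be escaped by hand. A sketch of producing the same envelope programmatically is shown here; `buildSessionStartOutput` is a hypothetical helper, and only the field names `hookSpecificOutput`, `hookEventName` and `additionalContext` are taken from the heredoc.

```typescript
// Serialise the SessionStart envelope; JSON.stringify escapes quotes and newlines.
function buildSessionStartOutput(additionalContext: string): string {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: "SessionStart",
      additionalContext,
    },
  });
}

// Illustrative context text, much shorter than the real hook message.
const hookOutput = buildSessionStartOutput(
  "When the user says 'roda o e2e', invoke the e2e-audit skill.\n\nSTOP and ask before running tests."
);
console.log(hookOutput);
```

The serialised string stays on a single line because the embedded newlines are emitted as `\n` escapes.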

package/template/.claude/settings.json CHANGED

@@ -39,6 +39,10 @@
 					{
 						"type": "command",
 						"command": "bash \"$CLAUDE_PROJECT_DIR/.claude/hooks/super-design-session-start.sh\""
+					},
+					{
+						"type": "command",
+						"command": "bash \"$CLAUDE_PROJECT_DIR/.claude/hooks/e2e-audit-session-start.sh\""
 					}
 				]
 			}

package/template/.claude/skills/e2e-audit/SKILL.md ADDED

@@ -0,0 +1,216 @@
+---
+name: e2e-audit
+version: 0.2.0
+description: Comprehensive E2E audit that maps all routes, APIs, tRPC procedures, middleware auth, and forms from SOURCE first, cross-references against existing tests and the current branch diff, runs Playwright against dev, then reports coverage gaps and problems with a SHOT+TRACE+ASSERT+SOURCE evidence quad. Invoke when the user mentions "e2e audit", "run the e2e", "integration test audit", "test coverage gaps", "roda o e2e", end-to-end tests, API contract check, RBAC coverage, or auditing integration tests. Report-then-ask: stop after mapping, run only on confirmation, emit a post-run-feedback report before writing findings.
+---
+# e2e-audit — source-first integration-test audit
+> **Operating principle:** you cannot audit what you never opened. Playwright traffic logs only cover flows you already know. Read the source first, then drive the browser to close the gaps the source revealed.
+## Entry contract (non-negotiable)
+1. **Mapping before clicking.** Run discovery scripts, write all JSON inventories, then STOP and report. Do NOT spin up the browser before the user confirms scope.
+2. **Existing tests are load-bearing.** If `tests/e2e/` (or equivalent) exists, inventory it FIRST. Reuse fixtures, auth storage state, and page objects. Warn on drift between runs.
+3. **Evidence quad.** Every non-meta finding ships the SHOT+TRACE+ASSERT+SOURCE quad: screenshot path, Playwright trace path, literal assertion string, and implicated source file. Meta findings (`coverage-gap-*`, `uncovered-*`, `test-drift`, `stack-detect`, `post-run-feedback`) are the only exceptions.
+4. **Dev, not prod.** Always audit against the local dev server. Detect HTML-instead-of-JSON crashes (500 responses that render the Next/Remix error page) and surface them.
+5. **Report-then-ask → run → feedback → findings.** Four gates, in order. Do not merge them.
+## Output layout
+```
+.e2e-audit/<YYYY-MM-DD-HHMMSS>/
+├── stack.json               # detect-stack.sh
+├── routes.json              # discover-routes.sh
+├── api-surface.json         # discover-api-surface.sh
+├── existing-tests.json      # inventory-existing-tests.sh
+├── uncovered.json           # detect-uncovered.sh
+├── map.md                   # human-readable summary of the above
+├── traces/                  # Playwright trace.zip per test
+├── screenshots/             # PNGs per assertion moment
+├── logs/
+│   ├── dev-server.log       # piped stdout+stderr of dev server
+│   └── playwright.log
+├── post-run-feedback.json   # emitted AFTER runs, BEFORE findings
+├── post-run-feedback.md     # human copy
+└── findings.json            # final — schema at findings.schema.json
+```
+## Pipeline
+```
+PREFLIGHT        →  detect-stack, inventory-existing-tests, compute drift-hash
+DISCOVERY        →  discover-routes, discover-api-surface, detect-uncovered
+REPORT-THEN-ASK  →  write map.md, present to user, WAIT for confirmation
+RUN              →  start dev server, tail logs, drive Playwright
+FEEDBACK         →  post-run-feedback.json from logs + trace + console
+FINDINGS         →  findings.json with SHOT+TRACE+ASSERT+SOURCE quad
+VERIFY           →  bash scripts/verify-audit.sh <session_dir>
+```
+---
+## Step 1 — Preflight
+```bash
+SESSION_DIR=".e2e-audit/$(date +%Y-%m-%d-%H%M%S)"
+mkdir -p "$SESSION_DIR/traces" "$SESSION_DIR/screenshots" "$SESSION_DIR/logs"
+bash .claude/skills/e2e-audit/scripts/detect-stack.sh            > "$SESSION_DIR/stack.json"
+bash .claude/skills/e2e-audit/scripts/inventory-existing-tests.sh > "$SESSION_DIR/existing-tests.json"
+```
+**Drift check.** If a previous run left a hash at `.e2e-audit/.last-hash`, compare the `hash` field of `existing-tests.json` against it. On mismatch, surface a `test-drift` meta finding (non-fatal) showing which files were added, removed, or resized. Write the new hash after the run completes.
+**Stack fallback.** If `stack.test_runner == "none"`, emit a `meta` finding prompting the user to install Playwright (`bun add -D @playwright/test`) and stop the pipeline. Do not proceed blind.
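The drift check in Step 1 might look like the sketch below. It rests on assumptions: the real layout of `existing-tests.json` is not shown in this diff, so the `Inventory` shape (a `hash` plus `files` with `path` and `size`) is hypothetical, and listing added, removed and resized files requires the previous inventory to be available, not just the hash stored in `.last-hash`.

```typescript
// Hypothetical inventory shape; the actual existing-tests.json schema is not part of this diff.
interface TestFile { path: string; size: number; }
interface Inventory { hash: string; files: TestFile[]; }
interface Drift { drifted: boolean; added: string[]; removed: string[]; resized: string[]; }

function detectDrift(previous: Inventory, current: Inventory): Drift {
  // Equal hashes mean the corpus is unchanged, so there is nothing to list.
  if (previous.hash === current.hash) {
    return { drifted: false, added: [], removed: [], resized: [] };
  }
  const before = new Map<string, number>();
  for (const f of previous.files) before.set(f.path, f.size);
  const after = new Map<string, number>();
  for (const f of current.files) after.set(f.path, f.size);

  const added = current.files.filter((f) => !before.has(f.path)).map((f) => f.path);
  const removed = previous.files.filter((f) => !after.has(f.path)).map((f) => f.path);
  const resized = current.files
    .filter((f) => before.has(f.path) && before.get(f.path) !== f.size)
    .map((f) => f.path);
  return { drifted: true, added, removed, resized };
}

const previousRun: Inventory = {
  hash: "aaa",
  files: [{ path: "tests/e2e/a.spec.ts", size: 100 }, { path: "tests/e2e/b.spec.ts", size: 200 }],
};
const currentRun: Inventory = {
  hash: "bbb",
  files: [{ path: "tests/e2e/a.spec.ts", size: 120 }, { path: "tests/e2e/c.spec.ts", size: 50 }],
};
console.log(detectDrift(previousRun, currentRun));
```

The result is only reported; consistent with the skill's boundaries, nothing here deletes or rewrites specs.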
+## Step 2 — Source-first discovery
+```bash
+bash .claude/skills/e2e-audit/scripts/discover-routes.sh       > "$SESSION_DIR/routes.json"
+bash .claude/skills/e2e-audit/scripts/discover-api-surface.sh  > "$SESSION_DIR/api-surface.json"
+bash .claude/skills/e2e-audit/scripts/detect-uncovered.sh \
+  "$SESSION_DIR/routes.json" \
+  "$SESSION_DIR/api-surface.json" \
+  "$SESSION_DIR/existing-tests.json" \
+  "${BASE_REF:-origin/main}" > "$SESSION_DIR/uncovered.json"
+```
+Then write `map.md` summarising:
+- **Stack**: framework + router style + test runner + auth providers + ORMs.
+- **Surface counts**: routes, HTTP handlers, tRPC procedures (by auth tier), server actions.
+- **Branch diff**: files changed vs `BASE_REF`; highlight those without test references.
+- **Uncovered**: bulleted list of every item in `uncovered_routes / uncovered_http / uncovered_trpc / uncovered_actions`.
+- **Existing test inventory**: count + hash + drift status.
+## Step 3 — Report-then-ask (HARD STOP)
+Present `map.md` to the user with a short prompt:
+> Mapping complete. Found N routes, M uncovered surfaces, K existing specs. Scope to run:
+>  - (a) uncovered + changed (default, recommended)
+>  - (b) full suite (all existing specs + uncovered surfaces)
+>  - (c) custom subset (user lists paths)
+> Reply with a/b/c before I touch the browser.
+Do NOT proceed to Step 4 without a reply. This is the mandatory report-then-ask gate.
+## Step 4 — Run against dev
+1. **Start dev server in background**, redirect stdout+stderr to `$SESSION_DIR/logs/dev-server.log`:
+   ```bash
+   nohup sh -c "$(jq -r .dev_command "$SESSION_DIR/stack.json")" \
+     > "$SESSION_DIR/logs/dev-server.log" 2>&1 &
+   echo $! > "$SESSION_DIR/logs/dev.pid"
+   ```
+2. **Wait** for `$(jq -r .base_url "$SESSION_DIR/stack.json")` to respond 200 within 90s. Fail loud if not.
+3. **Auth setup**: if `stack.auth` is non-empty, use/create `storageState` per role. Start from any existing state in `existing-tests.storage_states`; only synthesize new states via explicit user-provided credentials (never read env files and print them). See `references/auth-setup-playbook.md`.
+4. **Spec selection** per Step 3 answer. Prefer existing specs when coverage exists.
+5. **Run Playwright** with tracing forced on:
+   ```bash
+   npx playwright test \
+     --trace=on \
+     --output="$SESSION_DIR/traces" \
+     --reporter=list,json \
+     2>&1 | tee "$SESSION_DIR/logs/playwright.log"
+   ```
+Capture for each test:
+- Screenshot at the key assertion step (`await page.screenshot({path, fullPage: true})`).
+- Trace zip (auto when `--trace=on`).
+- All `page.on('console')` messages with level + URL + line.
+- All responses via `page.on('response')`: filter 4xx/5xx on `/api/` `/v1/` `/trpc/` paths.
+- HTML-instead-of-JSON: any response where `Content-Type: text/html` hits an API path → `server-crash` rule.
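The response filtering in the capture list above can be sketched as a pure classifier over already-captured responses. The `ObservedResponse` shape and `classifyResponse` are hypothetical. The rule names `api-4xx` and `api-5xx` come from `references/api-contract-playbook.md`, and this sketch uses that playbook's stricter condition (HTML plus a 5xx status) for `server-crash`.

```typescript
// Hypothetical record built from a page.on('response') callback.
interface ObservedResponse { url: string; status: number; contentType: string; }
type Signal = "server-crash" | "api-5xx" | "api-4xx" | null;

// Path prefixes named in Step 4. Callers should drop third-party hosts before classifying.
const API_PATH = /\/(api|v1|trpc)\//;

function classifyResponse(res: ObservedResponse): Signal {
  if (!API_PATH.test(res.url)) return null;
  // An HTML error page on an API path suggests the server threw.
  if (res.status >= 500 && res.contentType.includes("text/html")) return "server-crash";
  if (res.status >= 500) return "api-5xx";
  // A 4xx is only a candidate; whether the test expected success is decided elsewhere.
  if (res.status >= 400) return "api-4xx";
  return null;
}

console.log(
  classifyResponse({ url: "http://localhost:3000/api/users", status: 500, contentType: "text/html; charset=utf-8" })
); // server-crash
```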
+## Step 5 — Post-run feedback
+BEFORE writing `findings.json`, consolidate into `post-run-feedback.json`:
+```jsonc
+{
+  "session": "<session_dir>",
+  "duration_s": 128,
+  "tests_total": 42,
+  "tests_failed": 3,
+  "problems": [
+    { "kind": "api-5xx",         "where": "POST /api/users", "count": 2, "sample_trace": "traces/users-create-1.zip" },
+    { "kind": "console-error",   "where": "dashboard",       "count": 7, "sample": "Uncaught TypeError: Cannot read ..." },
+    { "kind": "rbac-bypass",     "where": "member sees /admin", "count": 1 },
+    { "kind": "server-crash",    "where": "POST /api/x returned text/html 500" },
+    { "kind": "auth-flow-broken","where": "login redirect loop after valid credentials" },
+    { "kind": "dev-server-log",  "where": "unhandledRejection at server:1234" }
+  ],
+  "uncovered_carried_forward": { "routes": 4, "http": 2, "trpc": 9, "actions": 1 }
+}
+```
+Also mirror to `post-run-feedback.md`. Present a short summary to the user; do not dump the full JSON.
+## Step 6 — Write findings
+For every problem in `post-run-feedback.problems` that is tied to a specific failure, emit one finding. Allocate IDs `E2E-0001`, `E2E-0002`, … sequentially.
+- Evidence quad required (`screenshot_path`, `trace_path`, `assertion`, `source_file`).
+- `source_file` must point at the route handler / procedure / action / middleware implicated — not the spec file.
+- Add `http.method`, `http.path`, `http.status`, `http.response_snippet` for api-contract + server-crash findings.
+Meta findings (no evidence quad required):
+- `coverage-gap-routes` / `coverage-gap-http` / `coverage-gap-trpc` / `coverage-gap-actions` — one per non-empty `uncovered.*` array, with the array echoed into `detail`.
+- `test-drift` — emitted by Step 1 when the test-corpus hash changed since last run.
+- `stack-detect` — info-level snapshot of `stack.json` for traceability.
+- `post-run-feedback` — aggregate, links to `post-run-feedback.json`.
+Schema: `.claude/skills/e2e-audit/findings.schema.json`. `jq` can load the schema (`--slurpfile schema findings.schema.json`) but does not validate against it, so use a JSON Schema validator for strict validation or skip it and lean on `verify-audit.sh`.
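Sequential ID allocation for Step 6 can be sketched as below. `formatFindingId` and `allocateIds` are hypothetical helpers; the only thing taken from the package is the `E2E-NNNN` format, which the schema constrains with the pattern `^E2E-\d{4}$`.

```typescript
// Format a 1-based counter as E2E-NNNN; four digits cap a session at 9999 findings.
function formatFindingId(n: number): string {
  if (!Number.isInteger(n) || n < 1 || n > 9999) {
    throw new RangeError(`finding number out of range: ${n}`);
  }
  return `E2E-${String(n).padStart(4, "0")}`;
}

// Attach sequential ids to a list of problems, preserving their order.
function allocateIds<T extends object>(problems: T[], start = 1): Array<T & { id: string }> {
  return problems.map((p, i) => ({ ...p, id: formatFindingId(start + i) }));
}

const withIds = allocateIds([{ kind: "api-5xx" }, { kind: "rbac-bypass" }]);
console.log(withIds.map((f) => f.id)); // [ 'E2E-0001', 'E2E-0002' ]
```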
+## Step 7 — Verify + persist
+```bash
+bash .claude/skills/e2e-audit/scripts/verify-audit.sh "$SESSION_DIR"
+jq -r '.hash' "$SESSION_DIR/existing-tests.json" > .e2e-audit/.last-hash
+```
+Kill the dev server: `kill "$(cat "$SESSION_DIR/logs/dev.pid")"`.
+## Final response to user
+≤5 sentences. Report: session dir, # findings, # coverage gaps, # problems, and one-line guidance on whether to invoke a fix agent or hand-fix. Do NOT paste `map.md` or `findings.json` bodies.
+---
+## Invocation triggers (already enforced by SessionStart hook)
+Keywords that MUST trigger this skill: `e2e audit`, `roda o e2e`, `run the e2e`, `integration test audit`, `test coverage gaps`, `coverage gap`, `audit my tests`, `api contract check`, `rbac coverage`, `end-to-end tests`. Claude must read this file before improvising a plan.
+## Boundaries (what this skill does NOT do)
+- Does not write fixes. Fix work is out of scope; hand the finding list to an sd-fix-style agent or the user.
+- Does not audit design / UX — that's `super-design`. If the user asked for a UX audit, hand off.
+- Does not run against production. Only local dev. If `stack.base_url` points to prod, refuse.
+- Does not invent credentials. Never read `.env*` files; only use credentials the user provides inline for the session.
+- Does not delete existing tests. Drift is reported, never "resolved" by removing specs.
+## References
+- `references/auth-setup-playbook.md` — storageState + role patterns per auth provider.
+- `references/api-contract-playbook.md` — HTTP-4xx / HTTP-5xx / HTML-instead-of-JSON detection.
+- `references/coverage-gap-playbook.md` — how to translate `uncovered.*` into meta findings + suggested specs.
+- `references/post-run-feedback-playbook.md` — how to consolidate Playwright run signals into feedback.
+## Templates
+- `templates/base-fixture.ts.tpl` — `test.extend` with `apiErrors` + `authenticatedPage` fixtures.
+- `templates/auth-setup.ts.tpl` — globalSetup shape that writes storageState per role.
+- `templates/findings-report.md.tpl` — human-readable summary rendered from findings.json.
+- `templates/post-run-feedback.md.tpl` — the mirror of post-run-feedback.json.
+## Attention points
+- **tRPC v10 vs v11.** Procedure nesting works differently; `createCaller()` exists in both but the router introspection APIs diverge. Treat `discover-api-surface.sh` output as names-only.
+- **Route groups.** Next `(marketing)` style segments are stripped in URL computation; don't emit findings that name the parenthesis.
+- **Parallel & intercepting routes.** `@modal` slots and `(.)photo` shortcuts are surfaces that Playwright can miss; the route discovery already flags them — propose specs that hit them directly.
+- **Middleware.** If `middleware.has_auth_guard == true` and a public matcher exists, any public URL the audit hit should not have triggered auth redirects. Mismatches = findings.
+- **Windows paths.** `.claude/skills/e2e-audit/scripts/*.sh` must run via Git Bash or WSL. If `bash` isn't available, abort with a meta finding; never fall back to half-runs.
+- **Dev server crashes mid-run.** If `dev.pid` exits unexpectedly during Playwright execution, mark all remaining tests as inconclusive and emit `server-crash` findings with the last 40 lines of `dev-server.log` in `detail`.
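The route-group and parallel-route points above can be illustrated with a small path-to-URL helper. This is a sketch under assumptions: `appPathToUrl` is hypothetical and is not the logic in `discover-routes.sh`. It relies on two Next.js App Router conventions: `(group)` folders and `@slot` folders do not appear in the URL. Intercepting segments such as `(.)photo` are left untouched.

```typescript
// Convert an App Router file path into the URL it serves (illustrative only).
function appPathToUrl(filePath: string): string {
  const segments = filePath
    .replace(/^(src\/)?app\//, "")
    .split("/")
    .slice(0, -1) // drop the file name, e.g. page.tsx
    .filter((s) => !/^\([^()]+\)$/.test(s)) // route groups like (marketing)
    .filter((s) => !s.startsWith("@")); // parallel-route slots like @modal
  return "/" + segments.join("/");
}

console.log(appPathToUrl("app/(marketing)/pricing/page.tsx")); // /pricing
console.log(appPathToUrl("app/@modal/login/page.tsx")); // /login
```

Findings should quote the computed URL, never the folder name with its parentheses.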

package/template/.claude/skills/e2e-audit/findings.schema.json ADDED

@@ -0,0 +1,98 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "$id": "https://start-vibing.dev/schema/e2e-audit-findings-0.2.0.json",
+  "title": "e2e-audit findings v0.2.0",
+  "description": "Output contract for e2e-audit. Every finding carries a SHOT+TRACE+ASSERT+SOURCE quad; aggregate (meta) findings are exempt.",
+  "type": "array",
+  "items": {
+    "type": "object",
+    "required": ["id", "rule", "severity", "summary", "files_affected"],
+    "additionalProperties": true,
+    "properties": {
+      "id": {
+        "type": "string",
+        "pattern": "^E2E-\\d{4}$",
+        "description": "Stable finding id. Format E2E-NNNN. Allocate sequentially per session."
+      },
+      "rule": {
+        "type": "string",
+        "description": "Machine-matchable rule slug (kebab-case). Examples: auth-flow-broken, api-5xx, rbac-bypass, console-error, uncovered-route, coverage-gap-trpc, test-drift, post-run-feedback."
+      },
+      "category": {
+        "type": "string",
+        "enum": [
+          "auth",
+          "api-contract",
+          "rbac",
+          "console",
+          "server-crash",
+          "coverage-gap",
+          "test-drift",
+          "flake",
+          "ui-regression",
+          "a11y",
+          "meta"
+        ]
+      },
+      "severity": {
+        "type": "string",
+        "enum": ["critical", "high", "medium", "low", "info"]
+      },
+      "summary": { "type": "string", "minLength": 3 },
+      "detail":  { "type": "string" },
+      "files_affected": {
+        "type": "array",
+        "items": { "type": "string" },
+        "description": "Source files this finding implicates. sd-fix-style guards MUST prevent edits outside this list unless risk is reclassified."
+      },
+      "viewport": {
+        "type": "string",
+        "enum": ["mobile", "tablet", "desktop"],
+        "description": "Optional — only set when a viewport-specific flow reproduced the issue."
+      },
+      "test_file":   { "type": "string", "description": "Path to the spec that reproduced this (if any)." },
+      "test_title":  { "type": "string", "description": "Title of the failing/passing test." },
+      "screenshot_path": { "type": "string", "description": "SHOT — non-empty PNG captured at the assertion moment." },
+      "trace_path":      { "type": "string", "description": "TRACE — Playwright trace.zip or equivalent recording." },
+      "assertion":       { "type": "string", "description": "ASSERT — the literal assertion that fired, e.g. `expect(response.status()).toBe(200)` observed 500." },
+      "source_file":     { "type": "string", "description": "SOURCE — the source file the assertion implicates (router.ts, page.tsx, action.ts, ...)." },
+      "source_quote":    { "type": "string", "description": "Optional verbatim quote from source_file explaining the implicated code." },
+      "http": {
+        "type": "object",
+        "description": "Optional — populated for api-contract / server-crash rules.",
+        "properties": {
+          "method":   { "type": "string" },
+          "path":     { "type": "string" },
+          "status":   { "type": "integer" },
+          "response_snippet": { "type": "string", "maxLength": 512 },
+          "content_type":     { "type": "string" }
+        }
+      },
+      "console_messages": {
+        "type": "array",
+        "items": {
+          "type": "object",
+          "required": ["level", "text"],
+          "properties": {
+            "level": { "type": "string", "enum": ["log", "warning", "error", "debug", "info"] },
+            "text":  { "type": "string" },
+            "url":   { "type": "string" },
+            "line":  { "type": "integer" }
+          }
+        }
+      },
+      "suggested_fix": {
+        "type": "object",
+        "description": "Advisory only. sd-fix-style agents may consume this.",
+        "properties": {
+          "kind":  { "type": "string", "enum": ["add-test", "fix-route", "fix-zod", "fix-rbac", "fix-auth", "add-fixture", "advisory"] },
+          "notes": { "type": "string" },
+          "files": { "type": "array", "items": { "type": "string" } }
+        }
+      }
+    }
+  }
+}
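The schema above leaves the four quad fields optional, so the quad rule has to be checked separately; the package description says `verify-audit.sh` does this. The sketch below is not that script. `missingQuadFields` is a hypothetical helper, and the meta-rule list is taken from the exemptions named in the package description (`coverage-gap-*`, `uncovered-*`, `test-drift`, `stack-detect`, `post-run-feedback`).

```typescript
// Subset of the finding fields relevant to the evidence quad.
interface Finding {
  id: string;
  rule: string;
  screenshot_path?: string;
  trace_path?: string;
  assertion?: string;
  source_file?: string;
}

const QUAD = ["screenshot_path", "trace_path", "assertion", "source_file"] as const;
// Meta rules are exempt from the quad.
const META_RULE = /^(coverage-gap-|uncovered-)|^(test-drift|stack-detect|post-run-feedback)$/;

// Return the quad fields that are absent or blank; an empty array means the finding passes.
function missingQuadFields(f: Finding): string[] {
  if (META_RULE.test(f.rule)) return [];
  return QUAD.filter((k) => (f[k] ?? "").trim() === "");
}

const incomplete: Finding = {
  id: "E2E-0001",
  rule: "api-5xx",
  screenshot_path: "screenshots/users-create.png",
  assertion: "expect(response.status()).toBe(200)",
  source_file: "app/api/users/route.ts",
};
console.log(missingQuadFields(incomplete)); // [ 'trace_path' ]
```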

package/template/.claude/skills/e2e-audit/references/api-contract-playbook.md ADDED

@@ -0,0 +1,66 @@
+# api-contract-playbook (e2e-audit 0.2.0)
+> How to turn Playwright's network observations into contract findings.
+## Observations to capture per test
+Wire these fixtures once (see `templates/base-fixture.ts.tpl`):
+- `page.on('response')` — filter by `url` prefix matching `/api/`, `/v1/`, `/trpc/`.
+- `page.on('console')` — levels `error` and `warning`.
+- `page.on('pageerror')` — unhandled runtime errors in the SPA.
+- `request.response()` — for explicit `page.request` calls in specs.
+## Detection rules
+### HTTP 4xx on a flow that should succeed
+- **Rule:** `api-4xx`
+- **Severity:** `high` (user-blocking) or `medium` (inconsistent UX)
+- **Signal:** response status in [400, 499] on a request the test expected to succeed.
+- **Evidence:** trace zip + screenshot at the assertion moment + response snippet (first 400 chars) + source_file = the route handler or tRPC procedure file.
+### HTTP 5xx anywhere
+- **Rule:** `api-5xx`
+- **Severity:** `critical`
+- **Always fails the test.** Trace zip is mandatory; Playwright's trace viewer will show the request/response payloads.
+### HTML-instead-of-JSON (server crash signal)
+Next.js and most frameworks render an HTML error page when the server throws. Detection:
+```ts
+if (
+  res.status() >= 500 &&
+  (res.headers()['content-type'] || '').includes('text/html') &&
+  /\/api\/|\/v1\/|\/trpc\//.test(res.url())
+) emit({ rule: 'server-crash', ... });
+```
+Surface the last ~40 lines of `dev-server.log` in `detail` so the user can see the stack trace without opening the trace.
+### Zod validation gap
+- **Rule:** `zod-validation-missing`
+- **Severity:** `medium`
+- **Signal:** `api-surface.http_routes[].zod_schema_found == false` AND the route accepts `POST`/`PUT`/`PATCH`.
+- **Evidence:** SHOT is waived (meta-ish), but `source_file` is required.
+### RBAC bypass
+- **Rule:** `rbac-bypass`
+- **Severity:** `critical`
+- **Signal:** an endpoint tagged `auth: "protected"` in `api-surface.json` returned 200 for a role that should be rejected.
+- **Evidence:** trace showing the call under the "wrong" storageState + source_file (the middleware or procedure).
+## What NOT to flag as a contract problem
+- 404 on a page navigation the user typed manually — that's a route-missing finding, not a contract one.
+- 401 on a protected endpoint called BEFORE login is expected behavior; only flag a 401 that follows a valid login.
+- Third-party hosts (analytics, stripe, intercom) are out of scope; ignore any response whose URL does not start with `stack.base_url`.
+- `/api/auth/*` during login form submit — transient 4xx (invalid-credentials) is expected; only flag if the happy-path login produced it.
+## Sampling
+To avoid flooding findings, de-dupe by `(method, path, status)` and keep the first occurrence's trace. Add `http.count` for repeats.
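The sampling rule above can be sketched as follows. `HttpObservation` and `dedupeObservations` are hypothetical names; the key `(method, path, status)`, the first-occurrence trace and the repeat count follow the sentence above.

```typescript
// Hypothetical record of one failing API response.
interface HttpObservation { method: string; path: string; status: number; trace: string; }
interface DedupedObservation extends HttpObservation { count: number; }

// Keep the first occurrence per (method, path, status) and count the repeats.
function dedupeObservations(observations: HttpObservation[]): DedupedObservation[] {
  const byKey = new Map<string, DedupedObservation>();
  for (const o of observations) {
    const key = `${o.method} ${o.path} ${o.status}`;
    const seen = byKey.get(key);
    if (seen) {
      seen.count += 1;
    } else {
      byKey.set(key, { ...o, count: 1 });
    }
  }
  // Map iteration follows insertion order, so first-seen order is preserved.
  return Array.from(byKey.values());
}

const observed: HttpObservation[] = [
  { method: "POST", path: "/api/users", status: 500, trace: "traces/users-create-1.zip" },
  { method: "POST", path: "/api/users", status: 500, trace: "traces/users-create-2.zip" },
  { method: "GET", path: "/api/users", status: 404, trace: "traces/users-list-1.zip" },
];
console.log(dedupeObservations(observed));
```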

package/template/.claude/skills/e2e-audit/references/auth-setup-playbook.md ADDED

@@ -0,0 +1,78 @@
+# auth-setup-playbook (e2e-audit 0.2.0)
+> How to obtain and reuse authenticated browser state, per auth provider. Goal: one `storageState.json` per role × per audit session, produced exactly once.
+## Principles
+1. **Never read `.env*`** inside this skill. If credentials are needed, ask the user inline. State files live under `$SESSION_DIR/auth/` and are gitignored (caller's responsibility).
+2. **Reuse before synthesize.** If `existing-tests.storage_states` contains files, use them. Check freshness: any file older than 7 days is considered stale; regenerate.
+3. **One role per file.** `owner.json`, `admin.json`, `member.json` — do not collapse roles.
+4. **State files contain secrets.** They must not enter `findings.json`, `map.md`, or any output that ships to the user beyond the session dir.
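The freshness rule in principle 2 can be sketched as a partition over known state files. `StateFile` and `partitionByFreshness` are hypothetical; the modification time is passed in as data so the sketch stays independent of any file-system API, and only the 7-day threshold comes from the text above.

```typescript
// Files older than 7 days are considered stale, per principle 2.
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical record; modifiedAtMs would come from the file's mtime.
interface StateFile { role: string; path: string; modifiedAtMs: number; }

function partitionByFreshness(
  files: StateFile[],
  nowMs: number
): { reusable: StateFile[]; stale: StateFile[] } {
  const reusable: StateFile[] = [];
  const stale: StateFile[] = [];
  for (const f of files) {
    // Strictly older than the threshold means regenerate.
    if (nowMs - f.modifiedAtMs > MAX_AGE_MS) stale.push(f);
    else reusable.push(f);
  }
  return { reusable, stale };
}

const DAY_MS = 24 * 60 * 60 * 1000;
const nowMs = 10 * DAY_MS;
const stateFiles: StateFile[] = [
  { role: "owner", path: "auth/owner.json", modifiedAtMs: nowMs - 1 * DAY_MS },
  { role: "admin", path: "auth/admin.json", modifiedAtMs: nowMs - 8 * DAY_MS },
];
console.log(partitionByFreshness(stateFiles, nowMs).stale.map((f) => f.role)); // [ 'admin' ]
```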
+## Playwright baseline
+```ts
+// playwright.config.ts
+import { defineConfig, devices } from '@playwright/test';
+export default defineConfig({
+  projects: [
+    {
+      name: 'setup',
+      testMatch: /.*\.setup\.ts/,
+    },
+    {
+      name: 'authed',
+      dependencies: ['setup'],
+      use: {
+        ...devices['Desktop Chrome'],
+        storageState: '.e2e-audit/current/auth/owner.json',
+      },
+    },
+  ],
+});
+```
+## Per-provider recipes
+### next-auth / Auth.js (credentials provider)
+```ts
+// tests/e2e/auth.setup.ts
+import { test as setup } from '@playwright/test';
+setup('authenticate owner', async ({ page }) => {
+  await page.goto('/signin');
+  await page.getByLabel('Email').fill(process.env.E2E_OWNER_EMAIL!);
+  await page.getByLabel('Password').fill(process.env.E2E_OWNER_PASSWORD!);
+  await page.getByRole('button', { name: /sign in/i }).click();
+  await page.waitForURL(/\/(dashboard|home)/);
+  await page.context().storageState({ path: '.e2e-audit/current/auth/owner.json' });
+});
+```
+### Clerk
+Use Clerk's `@clerk/testing` package. `setupClerkTestingToken()` lets the test get past Clerk's bot protection; sign in afterwards and persist the session with `storageState`.
+### better-auth / Lucia
+Same pattern as next-auth: drive the login form, then persist storage state. Both libraries store a session cookie which storageState captures.
+### Supabase (JWT)
+After login, `localStorage` holds the session. `storageState` serializes localStorage so no extra work is needed. If the dev project uses PKCE, ensure the setup runs in a chromium context.
+### Custom (cookie-session / JWT header)
+If the app does not have a login form (API-only auth), seed storage via `page.context().addCookies([...])` using a short-lived token the user pastes in. Never store long-lived tokens in `auth/*.json`.
+## RBAC coverage
+For every role declared in `stack.auth`:
+1. Drive one happy-path login per role.
+2. For each `trpc_procedures[]` with `auth == "protected"`, attempt the call with a role that should be forbidden. Expect 401 or 403. If 200, emit an `rbac-bypass` finding.
+## Global teardown
+Do NOT tear down or delete storageState files at the end of a run; the skill keeps them inside `$SESSION_DIR`, which is the audit's own sandbox.

package/template/.claude/skills/e2e-audit/references/coverage-gap-playbook.md ADDED

@@ -0,0 +1,95 @@
+# coverage-gap-playbook (e2e-audit 0.2.0)
+> How to translate `uncovered.json` into actionable meta findings + spec suggestions.
+## Input
+`detect-uncovered.sh` produces four uncovered arrays:
+- `uncovered_routes` — user-facing pages the branch changed with no test referencing their URL.
+- `uncovered_http` — REST handlers the branch changed with no test referencing their path.
+- `uncovered_trpc` — tRPC procedures the branch changed with no test referencing their name.
+- `uncovered_actions` — server actions the branch changed with no test referencing their name.
+## One finding per category
+Emit at most four meta findings:
+```json
+{
+  "id": "E2E-00XX",
+  "rule": "coverage-gap-routes",
+  "category": "coverage-gap",
+  "severity": "medium",
+  "summary": "N routes changed on branch without E2E coverage",
+  "detail": "<bulleted list of paths + files>",
+  "files_affected": ["<every file from the uncovered entries>"],
+  "suggested_fix": { "kind": "add-test", "files": ["tests/e2e/<slug>.spec.ts"] }
+}
+```
+Severities:
+- `coverage-gap-trpc`    — `medium` (contract risk)
+- `coverage-gap-http`    — `medium` (contract risk)
+- `coverage-gap-routes`  — `low` if diff is cosmetic, `medium` otherwise
+- `coverage-gap-actions` — `medium`
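Turning the four uncovered arrays into at most four meta findings could look like the sketch below. It is illustrative only: the entry shape (`name`, `file`) is an assumption because the layout of `uncovered.json` entries is not shown in this diff, `coverageGapFindings` is a hypothetical helper, and every category is given `medium` severity since deciding whether a route diff is cosmetic needs human judgement.

```typescript
// Hypothetical entry shape; the real uncovered.json entries may differ.
interface UncoveredEntry { name: string; file: string; }
interface Uncovered {
  uncovered_routes: UncoveredEntry[];
  uncovered_http: UncoveredEntry[];
  uncovered_trpc: UncoveredEntry[];
  uncovered_actions: UncoveredEntry[];
}
interface CoverageGapFinding {
  id: string;
  rule: string;
  category: "coverage-gap";
  severity: "low" | "medium";
  summary: string;
  detail: string;
  files_affected: string[];
}

// Emit one meta finding per non-empty category, numbering from firstId.
function coverageGapFindings(u: Uncovered, firstId: number): CoverageGapFinding[] {
  const groups: Array<{ kind: string; entries: UncoveredEntry[] }> = [
    { kind: "routes", entries: u.uncovered_routes },
    { kind: "http", entries: u.uncovered_http },
    { kind: "trpc", entries: u.uncovered_trpc },
    { kind: "actions", entries: u.uncovered_actions },
  ];
  return groups
    .filter((g) => g.entries.length > 0)
    .map((g, i) => ({
      id: `E2E-${String(firstId + i).padStart(4, "0")}`,
      rule: `coverage-gap-${g.kind}`,
      category: "coverage-gap" as const,
      severity: "medium" as const,
      summary: `${g.entries.length} ${g.kind} changed on branch without E2E coverage`,
      detail: g.entries.map((e) => `- ${e.name} (${e.file})`).join("\n"),
      files_affected: g.entries.map((e) => e.file),
    }));
}

const uncoveredSample: Uncovered = {
  uncovered_routes: [
    { name: "/users/[id]", file: "app/users/[id]/page.tsx" },
    { name: "/settings", file: "app/settings/page.tsx" },
  ],
  uncovered_http: [],
  uncovered_trpc: [{ name: "users.create", file: "server/routers/users.ts" }],
  uncovered_actions: [],
};
console.log(coverageGapFindings(uncoveredSample, 7).map((f) => `${f.id} ${f.rule}`));
```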
+## Spec suggestions per surface type
+**Page (route)**
+```ts
+test('GET /users/[id] renders user dashboard', async ({ authenticatedPage }) => {
+  const res = await authenticatedPage.goto('/users/1');
+  expect(res!.status()).toBeLessThan(400);
+  await expect(authenticatedPage.getByRole('heading')).toBeVisible();
+});
+```
+**HTTP route handler**
+```ts
+test('POST /api/users creates user (200)', async ({ request }) => {
+  const res = await request.post('/api/users', { data: { email: 'u@test', name: 't' } });
+  expect(res.status()).toBe(200);
+  expect((await res.json()).id).toBeTruthy();
+});
+```
+**tRPC procedure**
+```ts
+test('users.create rejects invalid input (400)', async ({ request }) => {
+  const res = await request.post('/api/trpc/users.create?batch=1', {
+    data: { 0: { json: { email: 'not-an-email' } } },
+  });
+  expect(res.status()).toBeGreaterThanOrEqual(400);
+});
+```
+**Server action**
+```ts
+test('createUser action returns redirect', async ({ page }) => {
+  await page.goto('/users/new');
+  await page.getByLabel('Email').fill('u@test');
+  await page.getByRole('button', { name: /create/i }).click();
+  await page.waitForURL(/\/users\/\d+/);
+});
+```
+## What NOT to flag
+- Unchanged surfaces. If a file is not in `uncovered.diff_files`, it did not change on this branch — no finding, even if tests are missing. Coverage auditing of the whole app is out-of-scope for branch-diff mode.
+- `loading.tsx`, `error.tsx`, `layout.tsx` in Next.js — these are treated separately. A coverage gap finding should not fire for them; they are covered transitively by any page that renders through them.
+## suggested_fix files
+For each uncovered entry, suggest a plausible spec filename:
+- `/users/[id]` → `tests/e2e/users-id.spec.ts`
+- `POST /api/users` → `tests/e2e/api-users.spec.ts`
+- `users.create` → `tests/e2e/trpc-users.spec.ts`
+Do not CREATE the files. This skill stops at reporting.
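The filename mapping above can be sketched as a pure function that reproduces the three listed examples. `suggestSpecFile` and `slugPart` are hypothetical. The handling of bare names without a dot (for instance server actions) and the `index` fallback for `/` are assumptions, since the playbook gives no example for either.

```typescript
// Lowercase and collapse non-alphanumerics into single hyphens.
function slugPart(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Suggest a spec path for an uncovered entry; it only names the file and never creates it.
function suggestSpecFile(entry: string): string {
  const http = /^[A-Z]+\s+(\/.*)$/.exec(entry);
  let name: string;
  if (http) {
    name = slugPart(http[1]); // "POST /api/users" -> "api-users"
  } else if (entry.startsWith("/")) {
    name = slugPart(entry); // "/users/[id]" -> "users-id"
  } else if (entry.includes(".")) {
    name = `trpc-${slugPart(entry.split(".")[0])}`; // "users.create" -> "trpc-users"
  } else {
    name = slugPart(entry); // assumption: bare names are slugged as-is
  }
  return `tests/e2e/${name || "index"}.spec.ts`;
}

console.log(suggestSpecFile("/users/[id]")); // tests/e2e/users-id.spec.ts
console.log(suggestSpecFile("POST /api/users")); // tests/e2e/api-users.spec.ts
console.log(suggestSpecFile("users.create")); // tests/e2e/trpc-users.spec.ts
```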