start-vibing 4.1.0 → 4.1.1

Files changed (67)
  1. package/package.json +1 -1
  2. package/template/.claude/CLAUDE.md +86 -20
  3. package/template/.claude/agents/sd-audit.md +197 -0
  4. package/template/.claude/agents/sd-fix-verify-semantic.md +112 -0
  5. package/template/.claude/agents/sd-fix-verify-technical.md +36 -0
  6. package/template/.claude/agents/sd-fix.md +194 -0
  7. package/template/.claude/agents/sd-research.md +61 -0
  8. package/template/.claude/agents/sd-synthesis.md +74 -0
  9. package/template/.claude/commands/super-design.md +15 -0
  10. package/template/.claude/hooks/super-design-session-start.sh +4 -0
  11. package/template/.claude/settings.json +14 -0
  12. package/template/.claude/skills/codebase-knowledge/SKILL.md +145 -0
  13. package/template/.claude/skills/codebase-knowledge/TEMPLATE.md +35 -0
  14. package/template/.claude/skills/codebase-knowledge/domains/claude-system.md +93 -0
  15. package/template/.claude/skills/composition-patterns/SKILL.md +89 -0
  16. package/template/.claude/skills/docs-tracker/SKILL.md +239 -0
  17. package/template/.claude/skills/mcp-builder/SKILL.md +236 -0
  18. package/template/.claude/skills/quality-gate/scripts/check-all.sh +83 -0
  19. package/template/.claude/skills/react-best-practices/SKILL.md +146 -0
  20. package/template/.claude/skills/security-scan/reference/owasp-top-10.md +257 -0
  21. package/template/.claude/skills/security-scan/scripts/scan.py +190 -0
  22. package/template/.claude/skills/super-design/README.md +37 -0
  23. package/template/.claude/skills/super-design/SKILL.md +105 -0
  24. package/template/.claude/skills/super-design/hooks/guard-paths.py +35 -0
  25. package/template/.claude/skills/super-design/hooks/post-edit-lint.py +57 -0
  26. package/template/.claude/skills/super-design/references/audit-methodology.md +513 -0
  27. package/template/.claude/skills/super-design/references/change-detection-playbook.md +1432 -0
  28. package/template/.claude/skills/super-design/references/design-theory.md +706 -0
  29. package/template/.claude/skills/super-design/references/fix-agent-playbook.md +118 -0
  30. package/template/.claude/skills/super-design/references/market-research-playbook.md +773 -0
  31. package/template/.claude/skills/super-design/references/playwright-mcp-reference.md +1057 -0
  32. package/template/.claude/skills/super-design/references/skills-subagents-reference.md +784 -0
  33. package/template/.claude/skills/super-design/references/superpowers-and-distribution.md +136 -0
  34. package/template/.claude/skills/super-design/scripts/detect-changes.sh +61 -0
  35. package/template/.claude/skills/super-design/scripts/diff-tokens.sh +13 -0
  36. package/template/.claude/skills/super-design/scripts/discover-routes.sh +45 -0
  37. package/template/.claude/skills/super-design/scripts/extract-tokens.mjs +41 -0
  38. package/template/.claude/skills/super-design/scripts/hash-pages.sh +42 -0
  39. package/template/.claude/skills/super-design/scripts/validate-state.sh +15 -0
  40. package/template/.claude/skills/super-design/scripts/verify-audit.sh +19 -0
  41. package/template/.claude/skills/super-design/templates/audit-state.schema.json +57 -0
  42. package/template/.claude/skills/super-design/templates/findings.schema.json +57 -0
  43. package/template/.claude/skills/super-design/templates/fix-history.md.tpl +26 -0
  44. package/template/.claude/skills/super-design/templates/overview.md.tpl +52 -0
  45. package/template/.claude/skills/test-coverage/reference/playwright-patterns.md +260 -0
  46. package/template/.claude/skills/test-coverage/scripts/coverage-check.sh +52 -0
  47. package/template/.claude/skills/typeui-ant/SKILL.md +133 -0
  48. package/template/.claude/skills/typeui-application/SKILL.md +128 -0
  49. package/template/.claude/skills/typeui-artistic/SKILL.md +133 -0
  50. package/template/.claude/skills/typeui-bento/SKILL.md +127 -0
  51. package/template/.claude/skills/typeui-bold/SKILL.md +127 -0
  52. package/template/.claude/skills/typeui-clean/SKILL.md +128 -0
  53. package/template/.claude/skills/typeui-dashboard/SKILL.md +133 -0
  54. package/template/.claude/skills/typeui-doodle/SKILL.md +142 -0
  55. package/template/.claude/skills/typeui-dramatic/SKILL.md +127 -0
  56. package/template/.claude/skills/typeui-enterprise/SKILL.md +132 -0
  57. package/template/.claude/skills/typeui-neobrutalism/SKILL.md +127 -0
  58. package/template/.claude/skills/typeui-paper/SKILL.md +127 -0
  59. package/template/.claude/skills/ui-ux-audit/QUICK-START.md +450 -0
  60. package/template/.claude/skills/ui-ux-audit/README.md +470 -0
  61. package/template/.claude/skills/ui-ux-audit/templates/audit-report.md +591 -0
  62. package/template/.claude/skills/ui-ux-audit/templates/competitor-analysis.md +363 -0
  63. package/template/.claude/skills/ui-ux-audit/templates/component-spec.md +491 -0
  64. package/template/.claude/skills/ui-ux-audit/templates/improvement-recommendation.md +450 -0
  65. package/template/.claude/skills/web-design-guidelines/SKILL.md +39 -0
  66. package/template/.claude/skills/webapp-testing/SKILL.md +96 -0
  67. package/template/.claude/skills/workflow-state/workflow-state.json +77 -0
@@ -0,0 +1,1057 @@
1
+ # Playwright MCP inside Claude Code: A production reference for UX-audit subagents
2
+
3
+ Microsoft's **`@playwright/mcp`** is the right choice for design/UX audit subagents in Claude Code (terminal, v2.1+), but only if you pin a known-good version, say "Playwright MCP" explicitly in the first turn, and enforce a SHOW-YOUR-WORK evidence protocol. This reference consolidates verified behavior from the official README, npm registry, GitHub issue #1359, Simon Willison's TIL, Playwright's device registry, and three real community subagent examples. The dominant pitfalls are not API gaps — they are **per-snapshot ref churn**, **inline screenshot token explosions (4× more than CLI)**, and Claude Code's tendency to reach for Bash instead of the MCP tools when the prompt is vague. Everything below is structured so you can copy-paste into a shipping skill.
4
+
5
+ Verified state as of **April 18, 2026**: `@playwright/mcp@0.0.70` is current on npm (306 versions published); issue #1359 — the notorious "No such tool available: mcp__playwright__browser_navigate" bug — is **closed**, with `0.0.41` as the validated fallback pin. Claude Code 2.0.1 through 2.1.22 are all compatible with a correctly pinned server.
6
+
7
+ ---
8
+
9
+ ## 1. Installation and configuration for Claude Code
10
+
11
+ ### Canonical install (straight from the Microsoft README)
12
+
13
+ ```bash
14
+ claude mcp add playwright npx @playwright/mcp@latest
15
+ ```
16
+
17
+ Node 18+ required. This works because `claude mcp add` reads `npx` as the command and `@playwright/mcp@latest` as its first arg. When you start passing flags that could be confused with `claude mcp add`'s own flags, use the `--` separator:
18
+
19
+ ```bash
20
+ # With flags passed through to the MCP server:
21
+ claude mcp add playwright -- npx @playwright/mcp@latest --headless --viewport-size "1440x900"
22
+ claude mcp add playwright -- npx @playwright/mcp@latest --device "iPhone 15"
23
+ claude mcp add playwright -- npx @playwright/mcp@latest --isolated --storage-state ./auth/storage.json
24
+ ```
25
+
26
+ **Windows**: wrap with `cmd /c` — `claude mcp add playwright -- cmd /c npx @playwright/mcp@latest`.
27
+
28
+ ### Scopes: local vs project vs user
29
+
30
+ | Scope | Flag | Storage | Shared? | Use for |
31
+ |---|---|---|---|---|
32
+ | **local** (default) | `--scope local` | `~/.claude.json` under `projects["<cwd>"].mcpServers` | Private, per-directory | Experiments, personal tokens |
33
+ | **project** | `--scope project` | **`.mcp.json` at repo root** | Yes — commit to git | Team-shared tooling |
34
+ | **user** | `--scope user` | `~/.claude.json` top-level `mcpServers` | Private, all your projects | Personal global tools |
35
+
36
+ Precedence when multiple scopes define the same server: **local > project > user**. Historical note: older Claude Code versions used different names, calling today's `local` scope `project` and today's `user` scope `global`.
37
+
38
+ ### The `.mcp.json` schema (project scope)
39
+
40
+ Lives at the project root and should be committed. Minimal form:
41
+
42
+ ```json
43
+ {
44
+ "mcpServers": {
45
+ "playwright": {
46
+ "command": "npx",
47
+ "args": ["@playwright/mcp@latest"]
48
+ }
49
+ }
50
+ }
51
+ ```
52
+
53
+ Extended form with explicit transport type, timeout, and env:
54
+
55
+ ```json
56
+ {
57
+ "mcpServers": {
58
+ "playwright": {
59
+ "type": "stdio",
60
+ "command": "npx",
61
+ "timeout": 30,
62
+ "args": ["-y", "@playwright/mcp@0.0.70", "--headless", "--isolated",
63
+ "--output-dir", "./audit/screens", "--image-responses", "omit"],
64
+ "env": {
65
+ "PLAYWRIGHT_MCP_CONSOLE_LEVEL": "warning"
66
+ },
67
+ "disabled": false
68
+ }
69
+ }
70
+ }
71
+ ```
72
+
73
+ User and local scopes live inside **`~/.claude.json`** rather than a dedicated file — a structure shared with `allowedTools`, `mcpContextUris`, and per-project state.
74
+
75
+ ### Version state and the #1359 tool-name bug
76
+
77
+ **Latest:** `@playwright/mcp@0.0.70` (published mid-April 2026). **Known-broken:** `0.0.56` and `0.0.61` — both failed against Claude Code 2.0.1 → 2.1.22 with the error:
78
+
79
+ ```
80
+ Error: No such tool available: mcp__playwright__browser_navigate
81
+ ```
82
+
83
+ Root cause was a **tool-schema registration mismatch**: the MCP server connected successfully but Claude Code's session-stored tool manifest didn't include the `mcp__playwright__browser_*` tools, so the model couldn't see or call them. Not a permissions issue, not a server crash — a plumbing bug between the two. Issue #1359 is **closed**; `@latest` should work again. **Recommended pin for shared configs**: `@playwright/mcp@0.0.41` (the community-validated fallback) or `@playwright/mcp@0.0.70` after you verify it in your environment. Never pin `0.0.56` or `0.0.61`. Because this package tracks pre-release (alpha) Playwright builds, an exact pin is also safer than `@latest` in any `.mcp.json` committed to a team repo.
84
+
85
+ Resolution pattern:
86
+
87
+ ```bash
88
+ claude mcp remove playwright
89
+ claude mcp add playwright npx @playwright/mcp@0.0.41 # known-good fallback
90
+ ```
91
+
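The pinning advice above can be checked mechanically before a shared config is committed. The following is a minimal sketch, not part of Claude Code or `@playwright/mcp`: the function name `check_pin` and its three return labels are hypothetical, and the broken-version set simply mirrors the two versions named in this section.

```python
import json
import re

# Versions this reference reports as broken (assumption: taken from the text above).
KNOWN_BROKEN = {"0.0.56", "0.0.61"}
PACKAGE = "@playwright/mcp"


def check_pin(config_text: str, server: str = "playwright") -> str:
    """Classify the package spec in a .mcp.json as 'pinned', 'unpinned', or 'broken'."""
    config = json.loads(config_text)
    args = config["mcpServers"][server].get("args", [])
    for arg in args:
        if arg == PACKAGE or arg.startswith(PACKAGE + "@"):
            # Text after the version separator; empty when no version is given.
            version = arg[len(PACKAGE) + 1:]
            if version in KNOWN_BROKEN:
                return "broken"
            if re.fullmatch(r"\d+\.\d+\.\d+", version):
                return "pinned"
            return "unpinned"  # "latest", a range, or no version at all
    raise ValueError(f"no {PACKAGE} argument found for server {server!r}")
```

A check like this could run in CI or a pre-commit hook; treating `unpinned` as a warning and `broken` as a failure would match the guidance in this section.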
92
+ ### Verification
93
+
94
+ Outside a session: `claude mcp list` shows registered servers and their connection state. Inside a session: the **`/mcp`** slash command opens a panel listing each server's status and its available tools. Simon Willison's first-test prompt:
95
+
96
+ ```
97
+ Use playwright mcp to open a browser to example.com
98
+ ```
99
+
100
+ You **must** say "playwright mcp" explicitly — otherwise Claude often reaches for Bash or `curl` instead of the MCP tools (see §7).
101
+
102
+ ### Transports: stdio vs HTTP/SSE
103
+
104
+ **stdio** (default) — Claude Code spawns `npx @playwright/mcp@latest` and talks JSON-RPC over stdin/stdout. Use locally.
105
+
106
+ **HTTP/SSE** — run the server separately and connect by URL. Use it when running a headed browser on a display-less host (WSL, remote dev box), in Docker, or when sharing one server across a team.
107
+
108
+ ```bash
109
+ npx @playwright/mcp@latest --port 8931
110
+ # --host 0.0.0.0 to bind all interfaces
111
+ ```
112
+
113
+ Register:
114
+
115
+ ```bash
116
+ claude mcp add --transport http playwright http://localhost:8931/mcp
117
+ ```
118
+
119
+ Or in `.mcp.json`:
120
+
121
+ ```json
122
+ { "mcpServers": { "playwright": { "type": "http", "url": "http://localhost:8931/mcp" } } }
123
+ ```
124
+
125
+ ### Browser binaries and Linux deps
126
+
127
+ Playwright MCP needs a browser binary. Three paths:
128
+
129
+ ```bash
130
+ npx playwright install chromium # ahead of time
131
+ npx playwright install # all browsers
132
+ npx playwright install-deps # Linux/Docker system libs (apt packages)
133
+ ```
134
+
135
+ Or call the MCP tool `browser_install` after connection — it installs whatever `--browser` the config specifies. Because `@playwright/mcp` often tracks alpha Playwright builds, running `npx playwright install` from a *different* local Playwright version can produce mismatched binaries. The `@latest` suffix on the MCP package fetches a clean copy rather than reusing whatever Playwright lives in your `node_modules`.
136
+
137
+ Docker (headless chromium only):
138
+
139
+ ```json
140
+ { "mcpServers": { "playwright": {
141
+ "command": "docker",
142
+ "args": ["run", "-i", "--rm", "--init", "--pull=always",
143
+ "mcr.microsoft.com/playwright/mcp"]
144
+ }}}
145
+ ```
146
+
147
+ ### Microsoft vs ExecuteAutomation — pick Microsoft
148
+
149
+ | Dimension | **Microsoft `@playwright/mcp`** | ExecuteAutomation `@executeautomation/playwright-mcp-server` |
150
+ |---|---|---|
151
+ | Status | **Official**, Microsoft Playwright team | Community third-party |
152
+ | Stars / forks | 31k / 2.5k | 5.4k / 489 |
153
+ | Tool prefix | `browser_*` | `playwright_*` |
154
+ | Interaction model | **Accessibility-tree-based** (`ref=eNN` identifiers) | DOM/selector + visible-text |
155
+ | Vision needed? | No — structured snapshots | Optional (screenshot parsing) |
156
+ | Tool count | ~19 core + opt-in caps (vision/pdf/testing/tracing) → 70+ | Smaller, DOM-focused |
157
+ | Chrome extension ("attach to my real tab") | Yes (`--extension`) | No |
158
+ | Docker image | `mcr.microsoft.com/playwright/mcp` | Community builds only |
159
+ | Cadence | 306 versions; weekly releases | Active but slower |
160
+
161
+ **Pick Microsoft's** for Claude Code audits: the accessibility-tree model gives **deterministic element targeting via `ref` IDs** (valid for the snapshot that issued them) with no vision-model dependency, and the package tracks Playwright core features as they ship. The Playwright docs explicitly recommend it, and Builder.io's guide warns about the name collision: *"`Playwright MCP server` in search results is often ExecuteAutomation's separate community project — Microsoft's official package is `@playwright/mcp`."*
162
+
163
+ ---
164
+
165
+ ## 2. Complete tool API (@playwright/mcp, current)
166
+
167
+ All tools are exposed to Claude as `mcp__<server-name>__<tool-name>`, so `browser_navigate` on a server you registered as `playwright` becomes **`mcp__playwright__browser_navigate`**. Tools are organized by capability. **Core + core-tabs + core-install are always on**. `--caps=vision,pdf,testing,tracing` enables the opt-in groups.
168
+
169
+ ### Core automation (always enabled)
170
+
171
+ **`browser_navigate`** — `url` (string, required). Navigate to a URL.
172
+
173
+ **`browser_navigate_back`** — no params. Go back one page. ⚠️ `browser_navigate_forward` is **not in the current README** — earlier versions had it; assume absent in 0.0.70.
174
+
175
+ **`browser_snapshot`** — `filename` (string, optional). Returns the **accessibility tree** as YAML-like text (roles, accessible names, `ref=eNN` identifiers). *"This is better than screenshot"* per the README. Token-cheap and deterministic. See §2.4 for the ref system.
176
+
177
+ **`browser_take_screenshot`** — `type` (`png`|`jpeg`, default png), `filename` (default `page-{timestamp}.{png|jpeg}`), `element` (string) + `ref` (string) for per-element (must be provided together), `fullPage` (boolean, cannot combine with element). Per README: *"You can't perform actions based on the screenshot — use `browser_snapshot` for actions."*
178
+
179
+ **`browser_click`** — `element` (required human-readable desc), `ref` (required), `doubleClick` (bool), `button` (`left`|`middle`|`right`), `modifiers` (array).
180
+
181
+ **`browser_type`** — `element`, `ref`, `text` (all required), `submit` (bool, press Enter after), `slowly` (bool, char-by-char for key handlers).
182
+
183
+ **`browser_hover`** — `element`, `ref` (both required).
184
+
185
+ **`browser_fill_form`** — `fields` (array, required). ⚠️ The tool is **`browser_fill_form`**, not `browser_fill`. Inner per-field schema is not itemized in the README prose (it's in the runtime JSON Schema).
186
+
187
+ **`browser_press_key`** — `key` (string, required): `ArrowLeft`, `Enter`, a single char, etc.
188
+
189
+ **`browser_select_option`** — `element`, `ref`, `values` (array, single or multiple).
190
+
191
+ **`browser_drag`** — `startElement`, `startRef`, `endElement`, `endRef` (all required).
192
+
193
+ **`browser_resize`** — `width` (number), `height` (number), both required. Runtime viewport resize. No `device` or `orientation` param — those are `--device` launch-time only.
194
+
195
+ **`browser_evaluate`** — `function` (string, required, form `() => { ... }` or `(element) => { ... }`), `element` + `ref` (optional pair). Cannot accept `ref=eNN` as the argument directly; pass a CSS selector inside the function body (see issue #870).
196
+
197
+ **`browser_run_code`** — `code` (string, required). Full Playwright snippet: `async (page) => { await page.getByRole('button', { name: 'Submit' }).click(); return await page.title(); }`.
198
+
199
+ **`browser_console_messages`** — `level` (`error`|`warning`|`info`|`debug`, default `info`). Each level includes more severe levels.
200
+
201
+ **`browser_network_requests`** — `includeStatic` (bool, default false). Static assets like images/fonts/scripts are filtered unless enabled. Failed-request field names are not documented in the README.
202
+
203
+ **`browser_wait_for`** — mutually exclusive: `text` (appears), `textGone` (disappears), or `time` (seconds). **Prefer `text`** — see §7.
204
+
205
+ **`browser_handle_dialog`** — `accept` (bool, required), `promptText` (string). For native `alert`/`confirm`/`prompt`, not HTML modals.
206
+
207
+ **`browser_file_upload`** — `paths` (array of absolute paths). Empty cancels the chooser.
208
+
209
+ **`browser_close`** — no params.
210
+
211
+ ### Tabs + install (always enabled)
212
+
213
+ **`browser_tabs`** — `action` (`list`|`new`|`close`|`select`), `index` (number, optional). Unifies what earlier versions had as `browser_tab_new`/`_close`/`_list`/`_select`.
214
+
215
+ **`browser_install`** — no params. Installs the browser specified in the config. Call this on "browser not installed" errors.
216
+
217
+ ### Opt-in caps
218
+
219
+ `--caps=vision`:
220
+ - **`browser_mouse_click_xy`** — `element`, `x`, `y`
221
+ - **`browser_mouse_move_xy`** — `element`, `x`, `y`
222
+ - **`browser_mouse_drag_xy`** — `element`, `startX`, `startY`, `endX`, `endY`
223
+
224
+ `--caps=pdf`:
225
+ - **`browser_pdf_save`** — `filename` (default `page-{timestamp}.pdf`).
226
+
227
+ `--caps=testing`:
228
+ - **`browser_generate_locator`** — `element`, `ref`. Generate test-grade locator.
229
+ - **`browser_verify_element_visible`** — `role`, `accessibleName`.
230
+ - **`browser_verify_text_visible`** — `text`.
231
+ - **`browser_verify_list_visible`** — `element`, `ref`, `items` (array).
232
+ - **`browser_verify_value`** — `type`, `element`, `ref`, `value` (use `"true"`/`"false"` for checkboxes).
233
+
234
+ `--caps=tracing`:
235
+ - **`browser_start_tracing`** / **`browser_stop_tracing`** — no params.
236
+
237
+ ### The `ref=eNN` accessibility-reference system
238
+
239
+ `browser_snapshot` returns a structured accessibility tree, **not pixels**. Every interactive node carries a role, an accessible name, and a ref that is **assigned at snapshot time** by walking the tree and is valid only for that snapshot:
240
+
241
+ ```yaml
242
+ - banner:
243
+ - heading "Example Domain" [level=1] [ref=e3]
244
+ - paragraph [ref=e4]: "This domain is for use in illustrative examples..."
245
+ - link "More information..." [ref=e5]:
246
+ /url: https://www.iana.org/domains/example
247
+ - textbox "Search" [ref=e12]
248
+ - button "Submit" [ref=e13]
249
+ ```
250
+
251
+ **Scope and stability** — refs are scoped to a single snapshot. The `e{N}` prefix is the main frame; `s{F}e{N}` is subframe-F element-N. **After any mutation (click, type, navigate), refs go stale**. Call `browser_snapshot` again before the next interaction, or rely on the auto-snapshot that most tools return in their response.
252
+
253
+ **Why two params (`element` + `ref`)** — every interaction tool takes both. `ref` is the deterministic target; `element` is a human-readable description used for permission prompts and logging. For drag, the pattern doubles into `startElement`/`startRef` + `endElement`/`endRef`:
254
+
255
+ ```json
256
+ { "tool": "mcp__playwright__browser_click",
257
+ "arguments": { "element": "'More information' link in banner", "ref": "e5" } }
258
+ ```
259
+
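To make the snapshot-scoped ref rule concrete, here is a small sketch that indexes the refs in one snapshot and builds a `browser_click` payload from them. It is an illustration only: `parse_refs` and `click_call` are hypothetical helpers, and the regular expression assumes the line shape shown in the sample snapshot above (`- role "name" [ref=eNN]`, with an optional `s{F}` frame prefix).

```python
import re

# One snapshot line looks like:  - button "Submit" [ref=e13]
# Assumption: a ref is e{N} or s{F}e{N}, as described in the text above.
LINE = re.compile(
    r'^\s*-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>(?:s\d+)?e\d+)\]'
)


def parse_refs(snapshot: str) -> dict[str, dict[str, str]]:
    """Map each ref found in ONE snapshot to its role and accessible name."""
    refs = {}
    for line in snapshot.splitlines():
        m = LINE.match(line)
        if m:
            refs[m["ref"]] = {"role": m["role"], "name": m["name"] or ""}
    return refs


def click_call(refs: dict, ref: str, server: str = "playwright") -> dict:
    """Build a click payload; a KeyError means the ref is not in this snapshot."""
    node = refs[ref]  # fails loudly for refs taken from an older snapshot
    return {
        "tool": f"mcp__{server}__browser_click",
        "arguments": {"element": f'{node["role"]} "{node["name"]}"', "ref": ref},
    }
```

Rebuilding the index after every mutation, rather than caching it, keeps the lookup in step with the stale-ref behavior described above.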
260
+ **Snapshot modes** — `--snapshot-mode` (env `PLAYWRIGHT_MCP_SNAPSHOT_MODE`): `incremental` (default, return diff only), `full` (always complete tree), `none` (suppress auto-snapshot; you must call `browser_snapshot` manually). Use `none` on very large pages to avoid context bloat; see §10.
261
+
262
+ ---
263
+
264
+ ## 3. Viewport and device emulation
265
+
266
+ ### Launch-time flags
267
+
268
+ ```
269
+ --viewport-size <WxH> e.g. "1440x900" env PLAYWRIGHT_MCP_VIEWPORT_SIZE
270
+ --device <name> e.g. "iPhone 15"
271
+ --user-agent <string> override UA env PLAYWRIGHT_MCP_USER_AGENT
272
+ ```
273
+
274
+ Viewport format is **`WIDTHxHEIGHT`** with lowercase `x`. The comma form (`1280,720`) is not documented.
275
+
276
+ ### Runtime resize
277
+
278
+ `browser_resize(width, height)` calls `page.setViewportSize(...)` under the hood — it changes **viewport dimensions only**. Any `--device`-derived `deviceScaleFactor`, `userAgent`, `isMobile`, or `hasTouch` flags remain intact. There is **no MCP tool to change `--device` mid-session**; to switch device emulation, restart the server.
279
+
280
+ ### Useful device names from Playwright's registry
281
+
282
+ All names are case-sensitive, from `deviceDescriptorsSource.json`. Each phone and tablet entry also has a `" landscape"` variant.
283
+
284
+ | `--device` value | Viewport | DPR | Engine |
285
+ |---|---|---|---|
286
+ | `"iPhone SE"` | 320×568 | 2 | webkit |
287
+ | `"iPhone 13"` / `"iPhone 14"` | 390×664 | 3 | webkit |
288
+ | `"iPhone 15"` / `"iPhone 15 Pro"` | 393×659 | 3 | webkit |
289
+ | `"iPhone 15 Pro Max"` | 430×739 | 3 | webkit |
290
+ | `"Pixel 5"` | 393×727 | 2.75 | chromium |
291
+ | `"Pixel 7"` | 412×839 | 2.625 | chromium |
292
+ | `"Galaxy S9+"` | 320×658 | 4.5 | chromium |
293
+ | `"iPad Mini"` | 768×1024 | 2 | webkit |
294
+ | `"iPad Pro 11"` | 834×1194 | 2 | webkit |
295
+ | `"Desktop Chrome"` / `"Desktop Safari"` / `"Desktop Edge"` / `"Desktop Firefox"` | 1280×720 | 1 | varies |
296
+
297
+ ### Standard audit breakpoints
298
+
299
+ The community converges on three: **375×812 mobile**, **768×1024 tablet**, **1440×900 desktop**. For comprehensive sweeps, add **1920×1080** (full HD). For the smallest mobile, test **320×568** (iPhone SE). Use `browser_resize` between pages — don't restart the server.
300
+
301
+ ### Device-pixel-ratio and screenshot bloat
302
+
303
+ **iPhone 15 at DPR 3** turns a 393×2000 CSS-px full-page shot into **1179×6000 physical pixels** — easily 3–10 MB, and catastrophic if returned inline to the model. Mitigations, in priority order:
304
+
305
+ 1. Run with `--image-responses omit` (env `PLAYWRIGHT_MCP_IMAGE_RESPONSES=omit`) so screenshots go to disk instead of being base64-encoded into the response.
306
+ 2. Prefer `browser_snapshot` for decisions; use `browser_take_screenshot` only for reviewer evidence.
307
+ 3. On high-DPR devices, pass `type: "jpeg"` or restrict to element screenshots.
308
+ 4. Save with descriptive relative filenames into `--output-dir`.
309
+
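The pixel arithmetic behind the iPhone 15 figure can be reproduced in a few lines. This is a worked example only; the function names are hypothetical, and the byte count is the uncompressed RGBA size, an upper bound rather than the size of the encoded PNG or JPEG.

```python
def physical_size(css_width: int, css_height: int, dpr: float) -> tuple[int, int]:
    """Convert CSS-pixel dimensions to physical pixels at a device pixel ratio."""
    return round(css_width * dpr), round(css_height * dpr)


def raw_rgba_bytes(width: int, height: int) -> int:
    """Uncompressed size at 4 bytes per pixel (RGBA); encoded files are smaller."""
    return width * height * 4


# The case from the text: a 393x2000 CSS-px full-page shot on a DPR-3 device.
width, height = physical_size(393, 2000, 3)
print(width, height)                  # 1179 6000
print(raw_rgba_bytes(width, height))  # 28296000
```

The same page captured at DPR 1 would hold one ninth as many pixels, which is why high-DPR device profiles call for the mitigations listed above.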
310
+ ---
311
+
312
+ ## 4. Capture strategies
313
+
314
+ ### Snapshot vs screenshot (know when to use which)
315
+
316
+ | Need | Tool | Token cost | When |
317
+ |---|---|---|---|
318
+ | Decide what to do, get refs | `browser_snapshot` | 200–400 tokens on small pages, multi-K on rich apps | **Default**; after every state change |
319
+ | Visual evidence for human reviewer | `browser_take_screenshot` | 4–8K inline, ~50 if saved to disk | Only when visuals matter; always save to disk |
320
+ | Computed styles (hex, px, ratios) | `browser_evaluate` | Tiny | Required for any WCAG-grade numeric claim |
321
+ | Error context | `browser_console_messages` | Small | At top of every page audit |
322
+ | Failed resource requests | `browser_network_requests` | Small–medium | When diagnosing 404s, blocked requests |
323
+
324
+ Playwright's own benchmark shows **~114K tokens via MCP vs ~27K via CLI** on equivalent tasks — a 4× multiplier driven almost entirely by inline images and auto-snapshots. A single content-heavy-page screenshot inline is 5–8K tokens; saved to disk, it's ~50 tokens for the path.
325
+
326
+ ### Full-page and element captures
327
+
328
+ ```json
329
+ // Full page
330
+ { "tool": "browser_take_screenshot",
331
+ "arguments": { "fullPage": true, "filename": "home_mobile.png" } }
332
+
333
+ // Per-element (from a prior snapshot)
334
+ { "tool": "browser_take_screenshot",
335
+ "arguments": { "element": "Primary CTA button",
336
+ "ref": "e87",
337
+ "filename": "cta_mobile.png" } }
338
+ ```
339
+
340
+ `fullPage: true` and `element`/`ref` are **mutually exclusive**.
341
+
342
+ ### Output env vars (verified from `--help`)
343
+
344
+ ```
345
+ --output-dir <path> env PLAYWRIGHT_MCP_OUTPUT_DIR
346
+ --save-session env PLAYWRIGHT_MCP_SAVE_SESSION
347
+ --save-trace env PLAYWRIGHT_MCP_SAVE_TRACE
348
+ --save-video <WxH> env PLAYWRIGHT_MCP_SAVE_VIDEO
349
+ --image-responses allow|omit env PLAYWRIGHT_MCP_IMAGE_RESPONSES
350
+ --snapshot-mode incremental|full|none env PLAYWRIGHT_MCP_SNAPSHOT_MODE
351
+ ```
352
+
353
+ ⚠️ **`PLAYWRIGHT_MCP_OUTPUT_MODE` is NOT documented** in the current upstream README. If your notes or third-party guides reference it with values `file`/`stdout`, they're citing a fork or an outdated version. The actual lever for "send images to disk, not into the model" is `--image-responses omit`. The closest match for session-trace persistence is `--save-session` / `PLAYWRIGHT_MCP_SAVE_SESSION`.
354
+
355
+ ### Default output directory
356
+
357
+ The README does **not** explicitly state the default. Files go to an OS temp directory created at startup when `--output-dir` is omitted. **Always pass `--output-dir` explicitly** for audits so evidence lands in a predictable place. Default auto-generated filenames are `page-{timestamp}.png|jpeg|pdf`.
358
+
359
+ ### File-naming strategy for audit evidence
360
+
361
+ Deterministic, sortable, and relative. Pass `filename` explicitly for every shot:
362
+
363
+ ```
364
+ {route-slug}_{viewport-label}_{state}_{iso-timestamp}.png
365
+
366
+ home_iphone15_loggedout_2026-04-18T14-32-11Z.png
367
+ pricing_1440x900_loggedin_2026-04-18T14-33-02Z.png
368
+ checkout-step2_ipadpro_error_2026-04-18T14-35-44Z.png
369
+ ```
370
+
371
+ Keep filenames relative (no leading `/`) so they resolve inside `--output-dir`.
372
+
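One way to keep evidence filenames consistent is to generate them rather than type them. The sketch below follows the pattern above; `slug` and `evidence_filename` are hypothetical names, and the slug rule (lowercase, runs of other characters collapsed to `-`) is an assumption, not something the README prescribes.

```python
import re
from datetime import datetime, timezone


def slug(text: str) -> str:
    """Lowercase and collapse anything outside a-z0-9 into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def evidence_filename(route, viewport, state, when=None, ext="png"):
    """Build {route-slug}_{viewport-label}_{state}_{iso-timestamp}.{ext}."""
    when = when or datetime.now(timezone.utc)
    # Colons are replaced with hyphens so the name is valid on every OS.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{slug(route)}_{slug(viewport)}_{slug(state)}_{stamp}.{ext}"


shot_time = datetime(2026, 4, 18, 14, 32, 11, tzinfo=timezone.utc)
print(evidence_filename("home", "iphone15", "loggedout", shot_time))
# home_iphone15_loggedout_2026-04-18T14-32-11Z.png
```

Because the result never starts with `/`, it can be passed directly as the `filename` argument and will resolve inside `--output-dir`.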
373
+ ---
374
+
375
+ ## 5. Authentication and state
376
+
377
+ ### Persistent profile (default)
378
+
379
+ Without `--isolated`, Playwright MCP uses a persistent profile. Default locations:
380
+
381
+ ```
382
+ Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-{channel}-profile
383
+ macOS: ~/Library/Caches/ms-playwright/mcp-{channel}-profile
384
+ Linux: ~/.cache/ms-playwright/mcp-{channel}-profile
385
+ ```
386
+
387
+ `{channel}` is `chrome` / `msedge` / `chromium`. Some newer versions use `mcp-{channel}-{workspace-hash}` to give different projects separate profiles — inspect the directory after first run to see which form your version uses.
388
+
389
+ ### The "log in yourself, then continue" pattern
390
+
391
+ From the README: *"All the logged in information will be stored in the persistent profile; you can delete it between sessions if you'd like to clear the offline state."* Flow:
392
+
393
+ 1. Launch MCP in headed mode (default, don't pass `--headless`) with an explicit `--user-data-dir ./.pw-mcp-profile`.
394
+ 2. Agent navigates to the login URL via `browser_navigate`.
395
+ 3. **You log in manually** in the headed browser while the agent pauses or waits on a text signal.
396
+ 4. Cookies/localStorage persist into the profile directory.
397
+ 5. Future sessions with the same `--user-data-dir` skip login.
398
+
399
+ ### `--storage-state` for CI
400
+
401
+ With `--isolated`, every session starts clean — closing the browser discards all state. Pre-seed credentials via Playwright's standard storageState JSON:
402
+
403
+ ```json
404
+ { "mcpServers": { "playwright": {
405
+ "command": "npx",
406
+ "args": ["@playwright/mcp@latest",
407
+ "--isolated",
408
+ "--storage-state=./auth/storage.json"]
409
+ }}}
410
+ ```
411
+
412
+ Generate the file once with `playwright codegen --save-storage=./auth/storage.json` or from test code.
413
+
414
+ ### The browser-extension approach ("share my real tab")
415
+
416
+ Install the **Playwright MCP Bridge** extension from github.com/microsoft/playwright-mcp/releases. Load it as unpacked in `chrome://extensions/` with Developer mode on. Then:
417
+
418
+ ```json
419
+ { "mcpServers": { "playwright-extension": {
420
+ "command": "npx",
421
+ "args": ["@playwright/mcp@latest", "--extension"]
422
+ }}}
423
+ ```
424
+
425
+ On first tool call, the extension opens a tab-picker; you select the running tab the agent attaches to. For headless auto-approval, copy the `PLAYWRIGHT_MCP_EXTENSION_TOKEN` from the extension popup and put it in the server's `env` block. This is the only way to get **real SSO sessions, enterprise policies, and ad-blocker state** without manual reauth — MCP attaches rather than spawning a clean Playwright browser.
426
+
427
+ ---
428
+
429
+ ## 6. Security and scope
430
+
431
+ ### Origin allow/block lists
432
+
433
+ ```
434
+ --allowed-origins "https://a.com;https://*.b.com" env PLAYWRIGHT_MCP_ALLOWED_ORIGINS
435
+ --blocked-origins "https://evil.com" env PLAYWRIGHT_MCP_BLOCKED_ORIGINS
436
+ ```
437
+
438
+ **Semicolon-separated**, not comma. Glob wildcards with `*` are supported (Playwright-style). Blocklist is evaluated first. With only a blocklist, non-matching requests are still allowed.
439
+
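The evaluation order described above (semicolon-separated lists, blocklist first, everything allowed when only a blocklist is set) can be modeled as follows. This is a sketch of the documented semantics, not the server's implementation: `is_allowed` is a hypothetical helper, and Python's `fnmatchcase` stands in for whatever glob matcher Playwright MCP actually uses.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def parse_origins(value: str) -> list[str]:
    """Split a semicolon-separated origin list, dropping empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]


def origin_of(url: str) -> str:
    """Reduce a URL to scheme://host[:port]."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def is_allowed(url: str, allowed: str = "", blocked: str = "") -> bool:
    """Blocklist wins; an empty allowlist permits every non-blocked origin."""
    origin = origin_of(url)
    if any(fnmatchcase(origin, pattern) for pattern in parse_origins(blocked)):
        return False
    allow = parse_origins(allowed)
    return not allow or any(fnmatchcase(origin, pattern) for pattern in allow)
```

A model like this is useful for reviewing a proposed allowlist before launch, but it checks only the requested URL and says nothing about redirects.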
440
+ ### File access
441
+
442
+ `--allow-unrestricted-file-access` / `PLAYWRIGHT_MCP_ALLOW_UNRESTRICTED_FILE_ACCESS` lifts two default restrictions:
443
+ - File-system access (for upload paths, output paths) is otherwise restricted to the MCP client's workspace roots (or cwd).
444
+ - `file://` navigation is otherwise blocked.
445
+
446
+ ### The "not a security boundary" disclaimer (verbatim)
447
+
448
+ The README — and both origin flag help strings — explicitly state:
449
+
450
+ > **Playwright MCP is *not* a security boundary. See MCP Security Best Practices for guidance on securing your deployment.**
451
+
452
+ > "`--allowed-origins` / `--blocked-origins`: Important: *does not* serve as a security boundary and *does not* affect redirects."
453
+
454
+ > "`allowUnrestrictedFileAccess` acts as a guardrail to prevent the LLM from accidentally wandering outside its intended workspace. It is a convenience defense to catch unintended file access, not a secure boundary; a deliberate attempt to reach other directories can be easily worked around, so always rely on client-level permissions for true security."
455
+
456
+ Practical implication: origin lists can be bypassed via **DNS rebinding, HTTP redirects, or direct navigation**. Treat them as hygiene controls, not containment. For real isolation, run MCP in a disposable container or VM.
457
+
458
+ ### Data-leakage implications for authenticated audits
459
+
460
+ Every byte returned by an MCP tool is forwarded to Anthropic's API. Specifically, that includes:
461
+
462
+ - `browser_snapshot` — all visible text, `aria-label`s, form labels, occasionally form values
463
+ - `browser_take_screenshot` — raw pixels (names, emails, dashboards, tokens in URL bars)
464
+ - `browser_console_messages` — stack traces frequently containing tokens, debug dumps, API request bodies
465
+ - `browser_network_requests` — URLs including query-string session IDs
466
+ - `browser_evaluate` — arbitrary JS (`document.cookie`, `localStorage`, DOM secrets)
467
+
468
+ **Never audit prod with real PII.** Use a staging tenant seeded with synthetic data, pair it with `--image-responses omit` + local human review, and check your Anthropic retention/ZDR tier before running against anything internal.
469
+
470
+ ### Docker and `--no-sandbox`
471
+
472
+ `--no-sandbox` disables Chromium's sandbox — required inside minimal containers (including `mcr.microsoft.com/playwright/mcp` when running as root without user-namespace capabilities). Fine for a disposable container, bad on a workstation. Example long-lived container from the README:
473
+
474
+ ```bash
475
+ docker run -d -i --rm --init --pull=always \
476
+ --entrypoint node --name playwright -p 8931:8931 \
477
+ mcr.microsoft.com/playwright/mcp \
478
+ cli.js --headless --browser chromium --no-sandbox --port 8931
479
+ ```
480
+
481
+ ---
482
+
483
+ ## 7. Prompting patterns: how to get the tools actually used
484
+
485
+ ### Rule zero: say "Playwright MCP" explicitly
486
+
487
+ Simon Willison's TIL (til.simonwillison.net/claude-code/playwright-mcp-claude-code) is blunt:
488
+
489
+ > "I found I needed to explicitly say 'playwright mcp' the first time, otherwise it might try to use Bash to run Playwright instead."
490
+
491
+ The likely mechanism is tool selection: "test the login page" semantically matches the general-purpose `Bash` tool more strongly than `mcp__playwright__browser_navigate`, **especially in repos where `playwright` is already a dev dependency**. Writing "Playwright MCP" (or the `mcp__playwright__` prefix) in the first turn aligns the request with the MCP tool descriptions. Canonical kickoff:
492
+
493
+ ```
494
+ Using the Playwright MCP server (mcp__playwright__*), open http://localhost:3000
495
+ and perform a responsive audit. Do NOT use Bash, curl, or write a Playwright
496
+ test file — drive the browser live through the MCP tools.
497
+ ```
498
+
499
+ ### Standard recon flow
500
+
501
+ ```
502
+ browser_navigate(url)
503
+ → browser_wait_for(text="<known copy on loaded page>") # text, not time
504
+ → browser_console_messages() # bail early on JS errors
505
+ → browser_snapshot() # get refs
506
+ → [quote refs verbatim in findings]
507
+ → browser_click({ ref: "e42", element: "Submit button in 'Billing' form" })
508
+ → browser_wait_for(text="<post-action signal>")
509
+ → browser_snapshot() # REFRESH — old refs stale
510
+ → browser_evaluate(...) # numeric evidence
511
+ → browser_take_screenshot({ fullPage: true, filename: "..." })
512
+ ```
513
+
514
+ ### Viewport loop (responsive audits)
515
+
516
+ Embed in the subagent system prompt. Run **sequentially** inside one subagent, not as parallel subagents sharing one browser tab (issue #893 — they fight over the same tab and return inconsistent results unless you use `--isolated` + per-agent `browser_tab_new`).
517
+
518
+ ```python
519
+ VIEWPORTS = [("mobile", 375, 812), ("tablet", 768, 1024), ("desktop", 1440, 900)]
520
+ PAGES = ["/", "/pricing", "/docs", "/signup"]
521
+ # BASE (origin under test) and EXPECTED (page -> known copy) come from the audit config.
522
+ for page in PAGES:
523
+     slug = page.strip("/").replace("/", "-") or "home"  # filename-safe page name
524
+     for name, w, h in VIEWPORTS:
525
+         browser_resize(width=w, height=h)
526
+         browser_navigate(url=BASE + page)
527
+         browser_wait_for(text=EXPECTED[page])
528
+         browser_console_messages()
529
+         snap = browser_snapshot()  # save to audit/snapshots/{slug}_{name}.yaml
530
+         browser_take_screenshot(fullPage=True, filename=f"audit/screens/{slug}_{name}.png")
531
+         styles = browser_evaluate(function="() => { /* getComputedStyle for h1, CTA, nav */ }")
532
+         append_finding(...)
533
+ ```
534
+
535
+ ### Waiting strategies — text beats time, always
536
+
537
+ Playwright itself *"discourages waitFor methods that wait for network connections to be idle"* — modern apps never stop talking (analytics beacons, websockets, polling). `networkidle` is a trap.
538
+
539
+ Decision tree:
540
+ - Just navigated? → `browser_wait_for(text="<known copy>")`
541
+ - Clicked something that opens a modal? → `browser_wait_for(text="<modal heading>")`
542
+ - Waiting for a spinner to disappear? → `browser_wait_for(textGone="Loading...")`
543
+ - No textual signal? → `browser_evaluate` polling `document.readyState` or a specific selector
544
+ - Absolutely last resort → `browser_wait_for(time=1)` with a comment explaining why
545
+
546
+ Also: **auto-wait exists**. `browser_click`/`browser_type` already wait for actionability. Don't stack a redundant `browser_wait_for` before every action — only after navigation and async state changes.
547
+
548
+ ### `browser_evaluate` patterns for WCAG-grade numbers
549
+
550
+ ```js
551
+ // Contrast + typography
552
+ (el => {
553
+ const s = getComputedStyle(el);
554
+ return { color: s.color, background: s.backgroundColor,
555
+ fontSize: s.fontSize, lineHeight: s.lineHeight,
556
+ fontWeight: s.fontWeight };
557
+ })(document.querySelector('h1.hero__title'));
558
+
559
+ // Touch-target audit (WCAG 2.5.5 — 44×44 CSS px)
560
+ [...document.querySelectorAll('button, a, [role=button]')].map(el => {
561
+ const r = el.getBoundingClientRect();
562
+ return { text: el.innerText.slice(0,40), w: r.width, h: r.height,
563
+ ok: r.width >= 44 && r.height >= 44 };
564
+ }).filter(x => !x.ok)
565
+ ```
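The computed colors returned by the first snippet still have to be turned into a contrast ratio. The sketch below applies the WCAG relative-luminance and contrast-ratio definitions on the orchestrator side. The function names are illustrative, and it only handles opaque `rgb()`/`rgba()` strings; a transparent background has to be resolved to the effective ancestor color first.

```python
import re


def parse_rgb(css_color: str) -> tuple[int, int, int]:
    """Parse 'rgb(r, g, b)' or an opaque 'rgba(r, g, b, 1)' computed-style value."""
    match = re.fullmatch(r"rgba?\(([^)]*)\)", css_color.strip())
    if not match:
        raise ValueError(f"unsupported color: {css_color!r}")
    parts = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", match.group(1))]
    if len(parts) < 3 or (len(parts) > 3 and parts[3] < 1):
        # A transparent color has no contrast of its own; resolve the
        # effective background (walk up the ancestors) before measuring.
        raise ValueError(f"not an opaque rgb color: {css_color!r}")
    return int(parts[0]), int(parts[1]), int(parts[2])


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG relative luminance of an 8-bit sRGB color."""
    def channel(value: int) -> float:
        c = value / 255
        # Older WCAG text uses 0.03928 here; for 8-bit channels the result is identical.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio (1.0 to 21.0) between two computed-style color strings."""
    lum_a = relative_luminance(parse_rgb(fg))
    lum_b = relative_luminance(parse_rgb(bg))
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)
```

Usage: `contrast_ratio(styles["color"], styles["background"]) >= 4.5` is the 1.4.3 check for normal-size text; compare against 3.0 for large text.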
566
+
567
+ ### Error handling
568
+
569
+ | Failure | Response |
570
+ |---|---|
571
+ | `ref=eNN` not found | Page re-rendered. Re-snapshot, re-identify by accessible name, retry with new ref. Don't guess CSS selectors. |
572
+ | Click hit wrong element | Two "Submit" buttons — re-snapshot, include parent context in `element`: `"Submit button inside 'Shipping address' form"`, pick the ref nested under that parent in the YAML. |
573
+ | `browser_wait_for(text=…)` timeout | The usual cause is a JS error. Dump `browser_console_messages` first, then `browser_snapshot` to see what actually rendered. Retry with different text or `textGone`. |
574
+ | "No browser" error | `browser_install` once, then retry. Don't loop blindly. |
575
+ | Same step fails twice | **Stop.** Write failure + snapshot + console to findings, surface to orchestrator. Do NOT fabricate success. |
576
+
577
+ ---
578
+
579
+ ## 8. Anti-hallucination and verification
580
+
581
+ Community reports are consistent: the agent often loses its place around step 12 of a 20-step flow, starts hallucinating selectors, and writes findings for pages it never visited. The antidote is structural, not prompt-hopeful.
582
+
583
+ ### Require a structured JSON findings file
584
+
585
+ The findings file is the **single source of truth** — nothing the agent *says* in chat counts unless it's in the file:
586
+
587
+ ```json
588
+ {
589
+ "id": "f-001",
590
+ "page_url": "https://app.example.com/pricing",
591
+ "viewport": { "name": "mobile", "w": 375, "h": 812 },
592
+ "screenshot_path": "audit/screens/pricing_mobile.png",
593
+ "snapshot_path": "audit/snapshots/pricing_mobile.yaml",
594
+ "snapshot_quote": "- button \"Get started\" [ref=e87]",
595
+ "dom_selector": "main > section.cta > button.primary",
596
+ "computed_style_excerpt": {
597
+ "color": "rgb(255, 255, 255)", "background-color": "rgb(147, 197, 253)",
598
+ "font-size": "14px", "min-height": "36px", "min-width": "88px"
599
+ },
600
+ "wcag_criterion": "2.5.5 Target Size (AAA) / 1.4.3 Contrast",
601
+ "severity": "high",
602
+ "evidence_hex_fg": "#FFFFFF",
603
+ "evidence_hex_bg": "#93C5FD",
604
+ "evidence_contrast_ratio": 1.80,
604
+ "finding": "Primary CTA is 88×36px (fails 44×44) and 1.80:1 contrast (fails 4.5:1)."
606
+ }
607
+ ```
608
+
609
+ Every field is mandatory. Missing field → orchestrator rejects the run.
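The "missing field → reject" rule is easy to enforce mechanically. A minimal sketch, assuming the field names from the example finding above; `REQUIRED_FIELDS` and `missing_fields` are illustrative names, not part of any tool:

```python
REQUIRED_FIELDS = (
    "id", "page_url", "viewport", "screenshot_path", "snapshot_path",
    "snapshot_quote", "dom_selector", "computed_style_excerpt",
    "wcag_criterion", "severity", "evidence_hex_fg", "evidence_hex_bg",
    "evidence_contrast_ratio", "finding",
)


def missing_fields(finding: dict) -> list[str]:
    """Return required keys that are absent or empty (None, "", {}, [])."""
    return [
        key for key in REQUIRED_FIELDS
        if key not in finding or finding[key] in (None, "", {}, [])
    ]
```

An orchestrator would reject the run if `missing_fields(f)` is non-empty for any entry, and pass the returned names into the retry prompt.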
610
+
611
+ ### Orchestrator-side verification: stat the paths
612
+
613
+ The parent agent (or a `SubagentStop` hook) must verify **before** accepting the report:
614
+
615
+ ```bash
616
+ # scripts/verify-audit.sh
617
+ set -euo pipefail
618
+ jq -e 'length > 0' audit/findings.json >/dev/null
619
+ jq -r '.[] | .screenshot_path, .snapshot_path' audit/findings.json | while IFS= read -r p; do
620
+ [ -s "$p" ] || { echo "MISSING OR EMPTY: $p"; exit 1; }
621
+ done
622
+ # Every snapshot_quote must literally appear in the snapshot file it cites
623
+ jq -c '.[]' audit/findings.json | while IFS= read -r f; do  # -r keeps the \" escapes in the JSON intact
624
+ q=$(echo "$f" | jq -r .snapshot_quote)
625
+ s=$(echo "$f" | jq -r .snapshot_path)
626
+ grep -qF "$q" "$s" || { echo "QUOTE NOT IN SNAPSHOT: $s"; exit 1; }
627
+ done
628
+ ```
629
+
630
+ If verification fails, re-dispatch the subagent with a **SHOW-YOUR-WORK retry prompt** that lists the specific gaps.
631
+
632
+ ### Verbatim accessibility-tree quoting
633
+
634
+ Playwright MCP snapshots have a fixed format: `- <role> "<name>" [ref=eNN]`. Rule in the subagent prompt:
635
+
636
+ > For every finding, `snapshot_quote` MUST be a single line copied character-for-character from the snapshot file you saved, containing a `[ref=eNN]` token. If you can't produce a verbatim line, the element wasn't in the a11y tree — say so and switch to a DOM selector with a screenshot crop as evidence.
637
+
638
+ A hallucinated ref fails `grep -qF` trivially — that's the point.
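The same rule can be checked without shelling out. The sketch below mirrors `grep -qF` (the quote must occur inside a single snapshot line) and additionally insists on the `[ref=eNN]` token; `quote_is_verbatim` is a hypothetical helper name.

```python
import re

# A quotable line starts with "- <role>" and carries a [ref=eNN] token.
QUOTE_PATTERN = re.compile(r"^\s*- \S.*\[ref=e\d+\]")


def quote_is_verbatim(quote: str, snapshot_text: str) -> bool:
    """True if `quote` has a ref token and occurs inside one line of the snapshot."""
    if not QUOTE_PATTERN.search(quote):
        return False
    wanted = quote.strip()
    return any(wanted in line for line in snapshot_text.splitlines())
```

Matching per line rather than against the whole file means a quote stitched together from two different nodes fails as well.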
639
+
640
+ ### Show-your-work discipline
641
+
642
+ Every finding must cite four things:
643
+
644
+ - **[SHOT]** — a screenshot file on disk
645
+ - **[QUOTE]** — a verbatim line from the snapshot YAML, including `[ref=eNN]`
646
+ - **[SEL]** — a CSS selector that resolves
647
+ - **[VAL]** — a computed value from `browser_evaluate`
648
+
649
+ Miss any one → don't file the finding, file the gap instead: *"Component observed in <screenshot> but absent from accessibility tree — likely shadow DOM or canvas."*
650
+
651
+ ### Retry loop
652
+
653
+ When verification fails:
654
+
655
+ ```
656
+ Your previous audit run failed verification. Specific problems:
657
+ - findings.json is missing screenshot_path for 3 entries
658
+ - snapshot_quote "button 'Buy now' [ref=e42]" not found in
659
+ audit/snapshots/pricing_mobile.yaml
660
+
661
+ Re-run with SHOW-YOUR-WORK discipline:
662
+ 1. Every finding MUST cite SHOT+QUOTE+SEL+VAL (all four).
663
+ 2. Do NOT invent findings for elements you did not snapshot.
664
+ 3. If you can't see an element in the a11y tree, say so.
665
+ 4. Save screenshots and snapshots to disk BEFORE writing findings.json.
666
+ ```
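Since the problem list changes on every run, the retry prompt is usually generated rather than hand-written. A small sketch that fills the template above from the verifier's output; `build_retry_prompt` is an illustrative name and the gap strings are whatever your verifier emits.

```python
def build_retry_prompt(gaps: list[str]) -> str:
    """Render the SHOW-YOUR-WORK retry prompt from a list of verification gaps."""
    if not gaps:
        raise ValueError("no verification gaps to report")
    problems = "\n".join(f"- {gap}" for gap in gaps)
    return (
        "Your previous audit run failed verification. Specific problems:\n"
        f"{problems}\n\n"
        "Re-run with SHOW-YOUR-WORK discipline:\n"
        "1. Every finding MUST cite SHOT+QUOTE+SEL+VAL (all four).\n"
        "2. Do NOT invent findings for elements you did not snapshot.\n"
        "3. If you can't see an element in the a11y tree, say so.\n"
        "4. Save screenshots and snapshots to disk BEFORE writing findings.json.\n"
    )
```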
667
+
668
+ ### Required summary block
669
+
670
+ `audit/SUMMARY.md` with fixed headings (grep'd by the verifier for URL count, viewport count, finding row count):
671
+
672
+ ```markdown
673
+ # Audit Summary
674
+ ## Pages visited
675
+ - https://app.example.com/
676
+ - https://app.example.com/pricing
677
+ ## Viewports tested
678
+ - mobile 375×812
679
+ - tablet 768×1024
680
+ - desktop 1440×900
681
+ ## Findings
682
+ | id | url | vp | ref | selector | hex_fg | hex_bg | px | severity |
683
+ |----|-----|----|-----|----------|--------|--------|----|----------|
684
+ ```
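Because the headings are fixed, the counts can be extracted with a few lines of parsing instead of ad-hoc greps. A sketch under the assumption that the file follows the template above exactly; `summarize_counts` is a hypothetical helper.

```python
def summarize_counts(summary_md: str) -> dict[str, int]:
    """Count bullets under the two list headings and data rows in the findings table."""
    counts = {"pages": 0, "viewports": 0, "findings": 0}
    section_for = {
        "## Pages visited": "pages",
        "## Viewports tested": "viewports",
        "## Findings": "findings",
    }
    current = None
    for raw in summary_md.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            current = section_for.get(line)  # unknown headings reset the section
        elif current in ("pages", "viewports") and line.startswith("- "):
            counts[current] += 1
        elif current == "findings" and line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            is_header = cells[0] == "id"
            is_divider = all(c and set(c) <= set("-:") for c in cells)
            if not (is_header or is_divider):
                counts["findings"] += 1
    return counts
```

The verifier can then compare `counts["findings"]` with the length of `findings.json`, and `counts["pages"]` and `counts["viewports"]` with the audit plan.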
685
+
686
+ ---
687
+
688
+ ## 9. Integration with Claude Code subagents (.claude/agents/)
689
+
690
+ ### Frontmatter fields (2026)
691
+
692
+ Required: `name`, `description`. Optional:
693
+
694
+ | Field | Purpose |
695
+ |---|---|
696
+ | `tools` | Comma-separated allowlist. Omit → inherits ALL parent tools (including every MCP). |
697
+ | `disallowedTools` | Explicit deny list, overrides inherit |
698
+ | `model` | `sonnet` / `opus` / `haiku` / full ID / `inherit` |
699
+ | `permissionMode` | `default` / `acceptEdits` / `dontAsk` / `bypassPermissions` / `plan` |
700
+ | `mcpServers` | List of MCP server names to scope to this subagent (additive, see caveat below) |
701
+ | `hooks` | Pre/PostToolUse/Stop hooks scoped to this subagent |
702
+ | `skills` | Skills preloaded at startup |
703
+ | `memory` | `user` / `project` / `local` |
704
+ | `maxTurns` | Hard cap on agentic turns |
705
+ | `background` | `true` → concurrent background task |
706
+ | `isolation` | `worktree` → run in temp git worktree |
707
+ | `color`, `effort` | UI color; effort tier for Opus |
708
+
709
+ Plugin-installed subagents **cannot** use `hooks`, `mcpServers`, or `permissionMode` (security restriction).
710
+
711
+ ### MCP tool naming
712
+
713
+ Confirmed pattern: **`mcp__<server-name>__<tool-name>`** (double underscores). A server registered with `claude mcp add playwright ...` exposes `mcp__playwright__browser_navigate`, `mcp__playwright__browser_snapshot`, etc. Wildcards work in allowlists: `mcp__playwright__*` grants all tools from that server. Plugin-scoped servers add a `plugin_<plugin-name>_` prefix.
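The naming scheme and the wildcard rule can be illustrated as follows. This is only a model of the convention: `mcp_tool_name` and `is_tool_allowed` are hypothetical helpers, and `fnmatch` is used as an approximation of how Claude Code matches `*` in allowlists.

```python
from fnmatch import fnmatchcase


def mcp_tool_name(server: str, tool: str) -> str:
    """Build the fully qualified tool name: mcp__<server-name>__<tool-name>."""
    return f"mcp__{server}__{tool}"


def is_tool_allowed(tool_name: str, allowlist: list[str]) -> bool:
    """True if any allowlist entry (exact name or * wildcard) matches the tool."""
    return any(fnmatchcase(tool_name, pattern) for pattern in allowlist)
```

For example, `is_tool_allowed(mcp_tool_name("playwright", "browser_click"), ["Read", "mcp__playwright__*"])` is true, while a tool from any other server is rejected by the same allowlist.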
714
+
715
+ ### Per-subagent MCP scoping: real but imperfect
716
+
717
+ The `mcpServers:` frontmatter field exists but is **additive, not isolating**. From issue #24054:
718
+
719
+ > "No isolation: there's no way to make an MCP server available only to a subagent or skill. The `mcpServers` frontmatter field in agent definitions is additive — it selects from globally-configured servers but doesn't hide them from the parent."
720
+
721
+ And issue #25200 (open): `mcpServers` + MCP tools in `tools:` fails at runtime under deferred tool loading — workaround is to include the Playwright tools in the parent session's allowlist or set `ENABLE_TOOL_SEARCH=off`. Issue #6915 specifically cites Playwright as the motivating example: *"the main chat should be able to give instructions to a subagent to use the tools without polluting the context of the main chat with all of the Playwright MCP tool calls."* That feature is pending.
722
+
723
+ **Today's practical pattern**: include Playwright in the subagent's tool allowlist; in the parent's CLAUDE.md, instruct *"do not call `mcp__playwright__*` directly; delegate to the `ux-auditor` subagent."* The tool schemas still load into the parent's context — there's no way around that without the pending feature.
724
+
725
+ ### Context-cost concern
726
+
727
+ Playwright MCP registers **~25 tools**. Simon Willison observed on Mastodon: *"the Playwright one is pretty big, it has 25 tools defined which may be too many for the local LLMs to handle."* Community benchmarks estimate **~500 tokens per tool schema**, i.e. **~12–15K tokens just to load Playwright's schemas** before any action. Combined with auto-snapshots (2–10K tokens each), a 30-action flow routinely burns 100K+ tokens. Playwright's own comparison: 114K MCP vs 27K CLI for equivalent work.
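The arithmetic behind those figures, as a back-of-the-envelope sketch. The defaults are the estimates quoted above (25 tools, ~500 tokens per schema, 30 actions, 2K tokens per snapshot at the low end); `estimate_tokens` is an illustrative helper, and the model ignores prompts, tool arguments, and screenshots.

```python
def estimate_tokens(tools: int = 25, tokens_per_schema: int = 500,
                    actions: int = 30, tokens_per_snapshot: int = 2_000) -> dict[str, int]:
    """Rough context cost: schema load plus one auto-snapshot per action."""
    schemas = tools * tokens_per_schema
    snapshots = actions * tokens_per_snapshot
    return {"schemas": schemas, "snapshots": snapshots, "total": schemas + snapshots}
```

With the defaults this gives 12,500 schema tokens and 72,500 in total; at 10K tokens per snapshot the total is 312,500. The 100K+ figure falls inside that range, and the split makes clear that snapshots, not schemas, dominate.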
728
+
729
+ Mitigations, in priority order:
730
+ 1. **Tool Search** (default in Claude Code late-2025+) loads only tool names at startup; schemas load on demand. Set `ENABLE_TOOL_SEARCH=auto`.
731
+ 2. Scope Playwright to the audit subagent — even though schemas still load in the parent, the orchestrator won't *invoke* them, so snapshot pollution is avoided.
732
+ 3. Run the server with `--output-dir` + `--image-responses omit` so images go to disk, not inline into context. Biggest single lever.
733
+ 4. For long flows (>20 steps), Microsoft themselves recommend the CLI + Skill path: *"Modern coding agents increasingly favor CLI-based workflows exposed as Skills over MCP because CLI invocations are more token-efficient."*
734
+
735
+ ---
736
+
737
+ ## 10. Common pitfalls
738
+
739
+ **Snapshot churn on dynamic pages.** Refs are indexed per-snapshot, not stable IDs — the same button may be `ref=e87` in one snapshot and `ref=e91` in the next. Official guidance: *"Refs are stable within a single snapshot… after navigation or DOM updates, the tool returns a fresh snapshot with new refs. Most tools also return a snapshot automatically after each action."* Enforce: **treat every `[ref=eNN]` as valid for ONE action only.** Never reuse a ref across actions without a fresh snapshot.
740
+
741
+ **Accessibility tree vs DOM discrepancies.** Shadow DOM (Lit/Shoelace/Stencil, especially closed shadow roots) can be invisible to the a11y snapshot — Playwright itself can't pierce closed shadow roots at all. Cross-origin iframes are generally opaque. Canvas and custom-rendered charts don't appear in the tree. Detection rule: *"If you see a visual element in a screenshot but no corresponding entry in the YAML snapshot, report it as a high-severity finding: component not exposed to the accessibility tree — screen-reader users can't perceive it."*
742
+
743
+ **Modal and cookie-banner handling.** HTML modals and cookie banners are real DOM — dismiss via `browser_click` on the ref, **before** snapshotting main content (otherwise "real" content renders but is inert/covered). Native JS dialogs (`alert`, `confirm`, `prompt`) are NOT DOM and require `browser_handle_dialog` separately. A page-load `confirm()` will hang the entire session until handled. Prompt pattern: right after every `browser_navigate`, scan the snapshot for accessible names containing "cookie"/"consent"/"accept"/"GDPR"/"subscribe" and dismiss first.
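The post-navigation scan can also run on the orchestrator side against a saved snapshot. A sketch assuming the `- <role> "<name>" [ref=eNN]` line format; `find_consent_refs` is a hypothetical helper, and the keyword list is the one from the prompt pattern above.

```python
import re

CONSENT_KEYWORDS = ("cookie", "consent", "accept", "gdpr", "subscribe")
# Matches lines like: - button "Accept all cookies" [ref=e12]
NODE_PATTERN = re.compile(
    r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>e\d+)\]'
)


def find_consent_refs(snapshot_text: str) -> list[tuple[str, str]]:
    """Return (ref, accessible name) for buttons whose name hints at a consent UI."""
    hits = []
    for line in snapshot_text.splitlines():
        match = NODE_PATTERN.search(line)
        if not match or match["role"] != "button":
            continue
        name = match["name"]
        if any(word in name.lower() for word in CONSENT_KEYWORDS):
            hits.append((match["ref"], name))
    return hits
```

Keyword matching is deliberately loose, so expect false positives (an "Accept terms" button on a signup form, for instance); treat the result as candidates to confirm, and remember that the refs are only valid for the snapshot they came from.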
744
+
745
+ **JS errors cascade.** One uncaught error during init silently breaks every downstream interaction — clicks "succeed" but state never updates. Call `browser_console_messages` immediately after every navigation; if errors exist, write them to findings and stop auditing that page rather than generating more findings on broken state.
746
+
747
+ **Duplicate accessible names.** Two buttons named "Submit" means both `element` disambiguation and `ref` matter. Resolution: include parent context in `element` (`"Submit button in 'Shipping address' form"`), pick the ref nested under the right parent in the YAML, or use `data-testid` via `--test-id-attribute`.
748
+
749
+ **Animation timing.** Screenshots mid-animation look broken (50%-opacity modals, half-slid sidebars). Don't use `networkidle`. Instead wait for text inside the animated component, or **disable animations** via `browser_evaluate` at the start of each page:
750
+
751
+ ```js
752
+ () => {
753
+ const s = document.createElement('style');
754
+ s.textContent = '*,*::before,*::after{animation:none!important;transition:none!important;}';
755
+ document.head.appendChild(s);
756
+ }
757
+ ```
758
+
759
+ **Parallel subagents fighting over one tab** (issue #893). Multiple Claude subagents launched in parallel share the MCP server's browser tab and produce inconsistent results. Run viewport loops **sequentially inside one subagent**, or start the server with `--isolated` and have each agent call `browser_tab_new` first.
760
+
761
+ **`browser_evaluate` cannot use snapshot refs** (issue #870). The `evaluate` tool doesn't understand `ref=eNN` — you must pass a CSS/XPath selector inside the function body.
762
+
763
+ **Pages too big for context** (issue #1329). Some pages produce snapshots that exceed the context budget. Set `PLAYWRIGHT_MCP_SNAPSHOT_MODE=none` to suppress auto-snapshots and call `browser_snapshot` manually with the `filename` param so the tree goes to disk.
764
+
765
+ **Deprecated package warning.** `@modelcontextprotocol/server-playwright` is abandoned — use `@playwright/mcp`. `@executeautomation/playwright-mcp-server` is a different project with a different API; its `browser_resize` accepts `device`/`orientation` params that Microsoft's does not. Don't mix the docs.
766
+
767
+ ---
768
+
769
+ ## 11. A complete, ready-to-paste subagent
770
+
771
+ Save as `.claude/agents/sd-playwright.md` at your project root. The agent enforces the evidence protocol from §8, covers mobile/tablet/desktop viewports, and evaluates against the Doherty threshold plus the WCAG 2.2 target-size criteria (2.5.5 and 2.5.8).
772
+
773
+ ````markdown
774
+ ---
775
+ name: sd-playwright
776
+ description: >
777
+ Performs responsive + WCAG 2.2 UX audits on a running web app using
778
+ Playwright MCP. Use PROACTIVELY after any frontend change, or when the
779
+ user says "audit", "UX review", "design review", "accessibility check",
780
+ or "responsive test". Produces verified evidence: screenshots, snapshot
781
+ YAML, computed styles, and a JSON findings file.
782
+ model: sonnet
783
+ permissionMode: acceptEdits
784
+ maxTurns: 80
785
+ color: cyan
786
+ mcpServers:
787
+ - playwright
788
+ tools:
789
+ - Read
790
+ - Write
791
+ - Edit
792
+ - Glob
793
+ - Grep
794
+ - Bash
795
+ - mcp__playwright__browser_navigate
796
+ - mcp__playwright__browser_navigate_back
797
+ - mcp__playwright__browser_resize
798
+ - mcp__playwright__browser_snapshot
799
+ - mcp__playwright__browser_take_screenshot
800
+ - mcp__playwright__browser_evaluate
801
+ - mcp__playwright__browser_click
802
+ - mcp__playwright__browser_type
803
+ - mcp__playwright__browser_hover
804
+ - mcp__playwright__browser_press_key
805
+ - mcp__playwright__browser_select_option
806
+ - mcp__playwright__browser_wait_for
807
+ - mcp__playwright__browser_console_messages
808
+ - mcp__playwright__browser_network_requests
809
+ - mcp__playwright__browser_handle_dialog
810
+ - mcp__playwright__browser_tabs
811
+ - mcp__playwright__browser_install
812
+ - mcp__playwright__browser_close
813
+ ---
814
+
815
+ # Role
816
+
817
+ You are a UX + accessibility auditor. You drive a real browser through the
818
+ Playwright MCP server and produce **verifiable evidence** of every finding.
819
+ You do NOT run Playwright via Bash, do NOT write a Playwright test file,
820
+ do NOT use curl. You use the `mcp__playwright__*` tools exclusively.
821
+
822
+ # Non-negotiable rules
823
+
824
+ 1. **Say "Playwright MCP" literally** in any sub-invocation you make. Use
825
+ only `mcp__playwright__*` tools for browser work.
826
+ 2. **Every finding cites four things: [SHOT], [QUOTE], [SEL], [VAL].**
827
+ Missing any one → you do not file the finding. You file the gap.
828
+ 3. **Snapshots are per-call.** Every `[ref=eNN]` is valid for ONE action.
829
+ Re-snapshot after any click, type, select, navigate, or waitFor.
830
+ 4. **Save artifacts to disk BEFORE writing findings.json.** No exceptions.
831
+ 5. **On JS console errors, stop auditing that page.** Record the errors
832
+ verbatim and move to the next page.
833
+ 6. **Dismiss cookie banners / consent modals FIRST** on every page, before
834
+ you capture the canonical snapshot.
835
+ 7. **Text waits, never time waits.** Use `browser_wait_for(text=…)` or
836
+ `textGone=…`. Use `time=` only as a documented last resort.
837
+ 8. **Sequential, not parallel.** Do not spawn parallel flows against the
838
+ same Playwright MCP — tabs will collide.
839
+
840
+ # Evaluation criteria
841
+
842
+ ## WCAG 2.2 focus points
843
+ - **1.4.3 Contrast (Minimum)** — text ≥ 4.5:1, large text ≥ 3:1
844
+ - **2.4.7 Focus Visible** — every interactive element has a visible focus ring
845
+ - **2.5.5 Target Size (AAA)** — interactive targets ≥ 44×44 CSS px
846
+ - **2.5.8 Target Size (Minimum, AA, new in 2.2)** — ≥ 24×24 CSS px
847
+ - **3.3.8 Accessible Authentication** — no memory puzzles
848
+ - **1.3.1 Info and Relationships** — headings in order; labels tied to inputs
849
+
850
+ ## Performance & interaction
851
+ - **Doherty threshold (400 ms)** — any perceived response time over 400 ms
852
+ from click to visible feedback is a finding. Measure with
853
+ `performance.now()` via `browser_evaluate`.
854
+ - **Tap target spacing** — 8 px minimum between adjacent targets on mobile.
855
+ - **Scroll chaining / viewport overflow** — horizontal scroll at 375 px
856
+ is a blocker.
857
+
858
+ ## Visual polish
859
+ - Alignment within 2 px grid
860
+ - Consistent spacing scale (4/8/12/16/24/32/48/64)
861
+ - Typographic hierarchy (h1 > h2 > h3 size ratios obvious)
862
+ - Empty / loading / error states exist for every async view
863
+
864
+ # Standard flow
865
+
866
+ For each viewport ∈ [mobile 375×812, tablet 768×1024, desktop 1440×900]:
867
+ For each page in the audit set:
868
+
869
+ 1. `browser_resize(width, height)` for the viewport.
870
+ 2. `browser_navigate(url)`.
871
+ 3. `browser_wait_for(text="<known copy on loaded page>")`.
872
+ 4. **Disable animations once** via `browser_evaluate` (style override).
873
+ 5. **Dismiss cookie banners** by snapshotting and clicking any node with
874
+ role=button whose accessible name contains cookie/consent/accept/GDPR.
875
+ 6. `browser_console_messages(level="error")`. If non-empty, record and
876
+ SKIP the rest of this page.
877
+ 7. `browser_snapshot({ filename: "audit/snapshots/{page}_{vp}.yaml" })`.
878
+ 8. `browser_take_screenshot({ fullPage: true,
879
+ filename: "audit/screens/{page}_{vp}_full.png" })`.
880
+ 9. `browser_evaluate(...)` for computed styles of the key elements
881
+ (headings, primary CTA, nav items, form fields). Save to
882
+ `audit/styles/{page}_{vp}.json`.
883
+ 10. `browser_network_requests({ includeStatic: false })`. Record any
884
+ failed requests (status ≥ 400) to `audit/network/{page}_{vp}.json`.
885
+ 11. For each issue found, append an entry to `audit/findings.json` with
886
+ all four evidence fields.
887
+
888
+ # Findings schema (strict)
889
+
890
+ ```json
891
+ {
892
+ "id": "f-001",
893
+ "page_url": "https://…",
894
+ "viewport": { "name": "mobile", "w": 375, "h": 812 },
895
+ "screenshot_path": "audit/screens/…png", // SHOT
896
+ "snapshot_path": "audit/snapshots/…yaml",
897
+ "snapshot_quote": "- button \"…\" [ref=e87]", // QUOTE (verbatim)
898
+ "dom_selector": "main>…", // SEL
899
+ "computed_style_excerpt": { "…": "…" }, // VAL
900
+ "wcag_criterion": "2.5.5 Target Size (AAA)",
901
+ "severity": "blocker|high|medium|nitpick",
902
+ "evidence_hex_fg": "#FFFFFF",
903
+ "evidence_hex_bg": "#93C5FD",
904
+ "evidence_contrast_ratio": 1.80,
905
+ "evidence_px": { "w": 88, "h": 36 },
906
+ "finding": "<one-sentence impact statement>"
907
+ }
908
+ ```
909
+
910
+ # File outputs (produced BEFORE you return)
911
+
912
+ ```
913
+ audit/
914
+ findings.json # JSON array, append-only
915
+ SUMMARY.md # Pages visited, viewports tested, findings table
916
+ screens/ # PNG screenshots, one per page × viewport
917
+ snapshots/ # Accessibility-tree YAML, one per page × viewport
918
+ styles/ # Computed-style JSON per page × viewport
919
+ network/ # Failed-request JSON per page × viewport
920
+ ```
921
+
922
+ # Error handling
923
+
924
+ | Failure | Action |
925
+ |---|---|
926
+ | `ref=eNN` not found | Re-snapshot, re-identify by accessible name, retry. Don't guess selectors. |
927
+ | Two elements with same name | Include parent context in `element` parameter; pick ref nested under correct parent. |
928
+ | `browser_wait_for(text)` timeout | Dump `browser_console_messages`, then `browser_snapshot`, then retry with different text. |
929
+ | "No browser" | `browser_install` once, retry once. If still fails, stop and report. |
930
+ | Same step fails twice | **Stop.** Write failure + snapshot + console into findings, hand back to orchestrator. Do NOT fabricate success. |
931
+
932
+ # Final checks before returning
933
+
934
+ Run these Bash checks and do not return until they pass:
935
+
936
+ ```bash
937
+ # 1. Every screenshot_path and snapshot_path in findings.json exists on disk.
938
+ jq -r '.[] | .screenshot_path, .snapshot_path' audit/findings.json | \
939
+ while IFS= read -r p; do [ -s "$p" ] || { echo "MISSING: $p"; exit 1; }; done
940
+
941
+ # 2. Every snapshot_quote appears verbatim in its cited snapshot file.
942
+ jq -c '.[]' audit/findings.json | while IFS= read -r f; do  # -r keeps the \" escapes in the JSON intact
943
+ q=$(echo "$f" | jq -r .snapshot_quote)
944
+ s=$(echo "$f" | jq -r .snapshot_path)
945
+ grep -qF "$q" "$s" || { echo "QUOTE NOT IN SNAPSHOT: $s"; exit 1; }
946
+ done
947
+
948
+ # 3. Every viewport × page pair produced a screenshot and snapshot.
949
+ ```
950
+
951
+ If any check fails, fix the gap and re-verify. Do not return with gaps.
952
+ ````
953
+
954
+ Pair this with an `.mcp.json` at project root:
955
+
956
+ ```json
957
+ {
958
+ "mcpServers": {
959
+ "playwright": {
960
+ "command": "npx",
961
+ "args": ["@playwright/mcp@0.0.70",
962
+ "--isolated",
963
+ "--output-dir", "./audit/screens",
964
+ "--image-responses", "omit",
965
+ "--caps", "vision"]
966
+ }
967
+ }
968
+ }
969
+ ```
970
+
971
+ And a `scripts/verify-audit.sh` containing the Bash block from §8.
972
+
973
+ ---
974
+
975
+ ## 12. Real working examples from the community
976
+
977
+ ### Example 1 — OneRedOak/claude-code-workflows (the canonical ancestor)
978
+
979
+ Source: `github.com/OneRedOak/claude-code-workflows/tree/main/design-review`. Patrick Ellis's "elite design review specialist" prompt is the ancestor of essentially every design-review subagent on GitHub. Frontmatter:
980
+
981
+ ```yaml
982
+ ---
983
+ name: design-review
984
+ description: |
985
+ Use this agent when you need to conduct a comprehensive design review on
986
+ front-end pull requests or general UI changes. Requires a live preview
987
+ environment and uses Playwright for automated interaction testing.
988
+ tools: Grep, LS, Read, Edit, Write, WebFetch, WebSearch, TodoWrite, Bash,
989
+ mcp__playwright__browser_navigate, mcp__playwright__browser_click,
990
+ mcp__playwright__browser_type, mcp__playwright__browser_resize,
991
+ mcp__playwright__browser_take_screenshot, mcp__playwright__browser_snapshot,
992
+ mcp__playwright__browser_console_messages, mcp__playwright__browser_hover,
993
+ mcp__playwright__browser_select_option, mcp__playwright__browser_evaluate
994
+ model: sonnet
995
+ ---
996
+ ```
997
+
998
+ Seven-phase methodology: preparation (read PR diff, set 1440×900) → interaction/flow (hover, active, disabled, destructive-confirm) → responsiveness (1440/768/375) → visual polish → WCAG 2.1 AA → robustness (overflow, empty, error) → code health + console. Output is a triaged `### Findings` markdown with `#### Blockers`, `#### High-Priority`, `#### Medium-Priority`, `#### Nitpicks`.
999
+
1000
+ **What it does well:** "Live Environment First" grounds every finding in real rendered behavior. The Blocker/High/Medium/Nitpick triage matrix plus "problems over prescriptions" communication style makes output reviewer-friendly. Always paired with a `/design-review` slash command and a `CLAUDE.md` design-principles block (Stripe/Airbnb/Linear).
1001
+
1002
+ **Weakness:** Tool list is very broad: it grants mutating tools (click, type, select_option) and Bash, so the subagent can act on the live app, not just audit it. Safe against an ephemeral preview env, risky against anything else.
1003
+
1004
+ ### Example 2 — EricTechPro/match-me
1005
+
1006
+ Source: `github.com/EricTechPro/match-me/blob/main/.claude/agents/design-review-agent.md`. A real project-scoped fork of OneRedOak, with **Context7 MCP added** so the agent can pull framework docs while reviewing. Adds `mcp__context7__resolve-library-id`, `mcp__context7__get-library-docs`, plus the broader Playwright surface (`browser_tab_list`, `browser_tab_new`, `browser_file_upload`, `browser_handle_dialog`, `browser_network_requests`, `browser_press_key`, `browser_navigate_back/forward`, `browser_drag`, `browser_install`).
1007
+
1008
+ **Does well:** Context7 integration is a meaningful enrichment — the agent verifies design conventions against *current* framework docs (Next.js, Tailwind, shadcn) rather than its training cutoff. Broader Playwright surface lets it audit multi-tab flows and dialog-triggered paths.
1009
+
1010
+ **Weakness:** Essentially a verbatim fork — no domain specialization for the dating-app context. The expanded tool list consumes more context tokens without adding review sophistication.
1011
+
1012
+ ### Example 3 — claude-code-community-ireland / vibeworks-library
1013
+
1014
+ Source: `github.com/claude-code-community-ireland/claude-code-resources/blob/main/plugins/vibeworks-library/agents/design-review.md`. Installable as a Claude Code plugin via the community plugin hub — distributes the OneRedOak prompt with clean tool discipline:
1015
+
1016
+ ```yaml
+ ---
+ name: design-review
+ description: Use this agent when you need to conduct a comprehensive design
+   review on front-end pull requests or general UI changes...
+ model: sonnet
+ tools: Grep, LS, Read, Edit, MultiEdit, Write, NotebookEdit, WebFetch,
+   TodoWrite, WebSearch
+ ---
+ ```
1026
+
1027
+ Playwright MCP tools are **not** listed in `tools:`; they're referenced contextually in the prompt body as "Technical Requirements" (`mcp__playwright__browser_navigate/click/type/select_option/take_screenshot/resize/snapshot/console_messages`). The declared surface is narrower, and the plugin packaging means teams install once and get updates without forking markdown.
1028
+
1029
+ **Does well:** Cleanest tool discipline of the three: the frontmatter allowlists only built-in file, search, and web tools (no Playwright MCP entries), with MCP access narrowed by the plugin runtime. Distributable via the plugin hub for consistent team rollout.
1030
+
1031
+ **Weakness:** Low adoption (2 stars / 0 forks on parent repo), and prompt content is unchanged from OneRedOak — no plugin-specific value beyond the packaging.
1032
+
1033
+ ### Cross-cutting observations
1034
+
1035
+ Community examples are **far less diverse than they appear** — virtually every `design-review` agent on GitHub traces back to OneRedOak. The fingerprints: "elite design review specialist" phrasing, Stripe/Airbnb/Linear framing, "Live Environment First" philosophy, seven-phase methodology, Blocker/High/Medium/Nitpick triage. Variations are limited to tool-list tweaks and packaging.
1036
+
1037
+ Common patterns worth adopting:
1038
+ - **Three-viewport sweep** (1440 / 768 / 375 px wide, i.e. desktop / tablet / mobile) appears in every example and is worth keeping.
1039
+ - **Non-mutating-first tool set**: `browser_navigate`, `browser_snapshot`, `browser_take_screenshot`, `browser_resize`, `browser_console_messages` for observation; mutating tools (`click`, `type`, `select_option`) only for interaction testing.
1040
+ - **Paired-artifact deployment**: subagent + `/design-review` slash command + CLAUDE.md design-principles — the subagent rarely stands alone.
1041
+ - **`model: sonnet`** is standard (balance of vision reasoning and cost).
1042
+
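The first two patterns above can be combined into a single audit plan. The following is a minimal sketch, not taken from any of the reviewed agents: it emits one observation-only step sequence per viewport width. The viewport heights, the `Step` type, the screenshot filenames, and the argument names (`width`, `height`, `url`, `filename`) are assumptions for illustration and should be checked against the Playwright MCP version actually installed.

```python
from dataclasses import dataclass

# Observation-only tools named in the list above (short names, without the
# `mcp__playwright__` prefix).
OBSERVATION_TOOLS = {
    "browser_navigate",
    "browser_snapshot",
    "browser_take_screenshot",
    "browser_resize",
    "browser_console_messages",
}

# Widths come from the text; the heights are assumptions for illustration.
VIEWPORTS = [("desktop", 1440, 900), ("tablet", 768, 1024), ("mobile", 375, 812)]


@dataclass
class Step:
    """One planned tool call (hypothetical helper type)."""
    tool: str
    args: dict


def plan_sweep(url: str) -> list[Step]:
    """Return resize/navigate/snapshot/screenshot/console steps per viewport."""
    steps: list[Step] = []
    for label, width, height in VIEWPORTS:
        steps += [
            Step("browser_resize", {"width": width, "height": height}),
            Step("browser_navigate", {"url": url}),
            Step("browser_snapshot", {}),
            Step("browser_take_screenshot", {"filename": f"{label}-{width}.png"}),
            Step("browser_console_messages", {}),
        ]
    return steps


def uses_only_observation_tools(steps: list[Step]) -> bool:
    """True when no step calls a mutating tool such as click or type."""
    return all(step.tool in OBSERVATION_TOOLS for step in steps)
```

Keeping the plan as data makes the non-mutating guarantee checkable before anything touches the app: an orchestrator can refuse to run a plan for which `uses_only_observation_tools` returns `False`.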
1043
+ What's **missing** from all three — and what your shipping skill can improve on — is the **evidence protocol**. None of them require the four-piece SHOW-YOUR-WORK citation (screenshot + snapshot quote + selector + computed value), none write a machine-verifiable JSON findings file, and none have an orchestrator-side `verify-audit.sh` that stats the paths and greps the quotes. They trust the LLM's word. Your skill doesn't need to.
1044
+
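A rough idea of what orchestrator-side verification could look like, written here in Python rather than as the `verify-audit.sh` shell script mentioned above. The findings schema is hypothetical: the field names `screenshot`, `snapshot`, `quote`, `selector`, and `computed_value` are assumptions, and the `snapshot` path is added so the quote has a file to be checked against.

```python
import json
from pathlib import Path

# Hypothetical schema: the four evidence pieces plus the snapshot file path.
REQUIRED = ("screenshot", "snapshot", "quote", "selector", "computed_value")


def verify_findings(findings_path: Path) -> list[str]:
    """Return a list of problems; an empty list means every finding checked out."""
    base = findings_path.parent
    problems: list[str] = []
    findings = json.loads(findings_path.read_text(encoding="utf-8"))
    for i, finding in enumerate(findings):
        missing = [key for key in REQUIRED if not finding.get(key)]
        if missing:
            problems.append(f"finding {i}: missing {', '.join(missing)}")
            continue
        screenshot = base / finding["screenshot"]
        snapshot = base / finding["snapshot"]
        # "Stat the paths": the evidence files must exist and be non-empty.
        for path in (screenshot, snapshot):
            if not path.is_file() or path.stat().st_size == 0:
                problems.append(f"finding {i}: {path.name} missing or empty")
        # "Grep the quotes": the quoted text must appear verbatim in the snapshot.
        if snapshot.is_file() and finding["quote"] not in snapshot.read_text(
            encoding="utf-8"
        ):
            problems.append(f"finding {i}: quote not found in {snapshot.name}")
    return problems
```

The check is deliberately dumb: it cannot tell whether a finding is a good design judgment, only whether the cited evidence exists and says what the report claims it says.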
1045
+ ### Adjacent non-audit references
1046
+
1047
+ For test-authoring (not auditing), the `microsoft/playwright` repo itself ships `playwright-test-planner.agent.md`, `-generator.agent.md`, `-healer.agent.md`; these target E2E test generation. Anthropic's official `frontend-design` and `webapp-testing` Skills (in the `anthropics/skills` repo) cover adjacent generative/testing territory. As of April 2026, **no widely-adopted design-audit SKILL.md equivalent exists**, which leaves a clear opening for a shipping skill that does this properly.
1048
+
1049
+ ---
1050
+
1051
+ ## Takeaways: what actually ships
1052
+
1053
+ The one-line summary: **say "Playwright MCP" out loud in turn one; snapshot-don't-screenshot; text-waits-not-time-waits; pin a version (not `@latest`) in team configs; demand four pieces of evidence per finding and verify the files exist on disk before trusting the report.** Everything else — viewport tables, tool lists, env vars — is support material for those five moves.
1054
+
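The version-pinning rule is easy to check mechanically. Below is a small sketch that flags MCP servers launched through `npx` without a pinned version. It assumes the common `{"mcpServers": {name: {"command", "args"}}}` config shape and treats the first non-flag argument as the package spec, which is a simplification (it ignores forms such as `--package <name>`). The helper names are hypothetical.

```python
from typing import Optional


def package_spec(args: list[str]) -> Optional[str]:
    """Return the first non-flag argument, assumed here to be the package spec."""
    return next((arg for arg in args if not arg.startswith("-")), None)


def is_pinned(spec: str) -> bool:
    """True when the spec carries an explicit version other than `latest`."""
    # A scoped package starts with "@", so only an "@" after position 0
    # separates the package name from a version.
    at = spec.rfind("@")
    if at <= 0:
        return False  # no version given, so the newest release gets resolved
    return spec[at + 1:] not in ("", "latest")


def unpinned_servers(config: dict) -> list[str]:
    """Names of MCP servers whose npx package is not pinned to a version."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue
        spec = package_spec(server.get("args", []))
        if spec is not None and not is_pinned(spec):
            flagged.append(name)
    return flagged
```

Run against a team config, a non-empty result from `unpinned_servers` is a reasonable reason to fail a pre-commit or CI check.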
1055
+ The 2026 inflection is **Tool Search + Skills over MCP for long flows**. Microsoft itself now recommends CLI-based Skills for coding agents precisely because loading 25 Playwright tool schemas plus inline snapshots blows through 100K tokens on a 30-step audit. For **interactive, ad-hoc UX review**, MCP is still right: the accessibility-tree model gives you deterministic `ref=eNN` targeting that no CLI invocation provides. For **long automated runs or CI**, consider swapping MCP for `@playwright/cli` inside a Skill.
1056
+
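The token arithmetic behind that claim can be sketched as follows. Only the counts (25 tool schemas, 30 steps) and the 100K threshold come from the text; the per-schema and per-snapshot token sizes are placeholder assumptions, not measurements, and should be replaced with figures from a real session.

```python
def estimate_tokens(
    tool_schemas: int,
    tokens_per_schema: int,
    steps: int,
    tokens_per_snapshot: int,
) -> int:
    """Schemas are loaded once; each step adds one inline snapshot."""
    schema_cost = tool_schemas * tokens_per_schema
    snapshot_cost = steps * tokens_per_snapshot
    return schema_cost + snapshot_cost


# Counts from the text; the 400 and 3200 token sizes are assumed placeholders.
total = estimate_tokens(
    tool_schemas=25, tokens_per_schema=400, steps=30, tokens_per_snapshot=3200
)
print(total, total > 100_000)
```

Under these assumed sizes the snapshots, not the schemas, account for most of the total, which is consistent with the recommendation to move long runs out of MCP.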
1057
+ And the gap worth closing in your shipping skill: nobody in the community enforces orchestrator-side evidence verification. `scripts/verify-audit.sh` — stat the paths, grep the quotes — is a fifteen-line defense that makes the agent verifiably honest. That's the piece worth investing in.