start-vibing 4.0.2 → 4.1.1
- package/package.json +1 -1
- package/template/.claude/CLAUDE.md +86 -20
- package/template/.claude/agents/sd-audit.md +197 -0
- package/template/.claude/agents/sd-fix-verify-semantic.md +112 -0
- package/template/.claude/agents/sd-fix-verify-technical.md +36 -0
- package/template/.claude/agents/sd-fix.md +194 -0
- package/template/.claude/agents/sd-research.md +61 -0
- package/template/.claude/agents/sd-synthesis.md +74 -0
- package/template/.claude/commands/super-design.md +15 -0
- package/template/.claude/hooks/super-design-session-start.sh +4 -0
- package/template/.claude/settings.json +14 -0
- package/template/.claude/skills/codebase-knowledge/SKILL.md +145 -0
- package/template/.claude/skills/codebase-knowledge/TEMPLATE.md +35 -0
- package/template/.claude/skills/codebase-knowledge/domains/claude-system.md +93 -0
- package/template/.claude/skills/composition-patterns/SKILL.md +89 -0
- package/template/.claude/skills/docs-tracker/SKILL.md +239 -0
- package/template/.claude/skills/mcp-builder/SKILL.md +236 -0
- package/template/.claude/skills/quality-gate/scripts/check-all.sh +83 -0
- package/template/.claude/skills/react-best-practices/SKILL.md +146 -0
- package/template/.claude/skills/security-scan/reference/owasp-top-10.md +257 -0
- package/template/.claude/skills/security-scan/scripts/scan.py +190 -0
- package/template/.claude/skills/super-design/README.md +37 -0
- package/template/.claude/skills/super-design/SKILL.md +105 -0
- package/template/.claude/skills/super-design/hooks/guard-paths.py +35 -0
- package/template/.claude/skills/super-design/hooks/post-edit-lint.py +57 -0
- package/template/.claude/skills/super-design/references/audit-methodology.md +513 -0
- package/template/.claude/skills/super-design/references/change-detection-playbook.md +1432 -0
- package/template/.claude/skills/super-design/references/design-theory.md +706 -0
- package/template/.claude/skills/super-design/references/fix-agent-playbook.md +118 -0
- package/template/.claude/skills/super-design/references/market-research-playbook.md +773 -0
- package/template/.claude/skills/super-design/references/playwright-mcp-reference.md +1057 -0
- package/template/.claude/skills/super-design/references/skills-subagents-reference.md +784 -0
- package/template/.claude/skills/super-design/references/superpowers-and-distribution.md +136 -0
- package/template/.claude/skills/super-design/scripts/detect-changes.sh +61 -0
- package/template/.claude/skills/super-design/scripts/diff-tokens.sh +13 -0
- package/template/.claude/skills/super-design/scripts/discover-routes.sh +45 -0
- package/template/.claude/skills/super-design/scripts/extract-tokens.mjs +41 -0
- package/template/.claude/skills/super-design/scripts/hash-pages.sh +42 -0
- package/template/.claude/skills/super-design/scripts/validate-state.sh +15 -0
- package/template/.claude/skills/super-design/scripts/verify-audit.sh +19 -0
- package/template/.claude/skills/super-design/templates/audit-state.schema.json +57 -0
- package/template/.claude/skills/super-design/templates/findings.schema.json +57 -0
- package/template/.claude/skills/super-design/templates/fix-history.md.tpl +26 -0
- package/template/.claude/skills/super-design/templates/overview.md.tpl +52 -0
- package/template/.claude/skills/test-coverage/reference/playwright-patterns.md +260 -0
- package/template/.claude/skills/test-coverage/scripts/coverage-check.sh +52 -0
- package/template/.claude/skills/typeui-ant/SKILL.md +133 -0
- package/template/.claude/skills/typeui-application/SKILL.md +128 -0
- package/template/.claude/skills/typeui-artistic/SKILL.md +133 -0
- package/template/.claude/skills/typeui-bento/SKILL.md +127 -0
- package/template/.claude/skills/typeui-bold/SKILL.md +127 -0
- package/template/.claude/skills/typeui-clean/SKILL.md +128 -0
- package/template/.claude/skills/typeui-dashboard/SKILL.md +133 -0
- package/template/.claude/skills/typeui-doodle/SKILL.md +142 -0
- package/template/.claude/skills/typeui-dramatic/SKILL.md +127 -0
- package/template/.claude/skills/typeui-enterprise/SKILL.md +132 -0
- package/template/.claude/skills/typeui-neobrutalism/SKILL.md +127 -0
- package/template/.claude/skills/typeui-paper/SKILL.md +127 -0
- package/template/.claude/skills/ui-ux-audit/QUICK-START.md +450 -0
- package/template/.claude/skills/ui-ux-audit/README.md +470 -0
- package/template/.claude/skills/ui-ux-audit/templates/audit-report.md +591 -0
- package/template/.claude/skills/ui-ux-audit/templates/competitor-analysis.md +363 -0
- package/template/.claude/skills/ui-ux-audit/templates/component-spec.md +491 -0
- package/template/.claude/skills/ui-ux-audit/templates/improvement-recommendation.md +450 -0
- package/template/.claude/skills/web-design-guidelines/SKILL.md +39 -0
- package/template/.claude/skills/webapp-testing/SKILL.md +96 -0
- package/template/.claude/skills/workflow-state/workflow-state.json +77 -0
|
@@ -0,0 +1,1057 @@
|
|
|
1
|
+
# Playwright MCP inside Claude Code: A production reference for UX-audit subagents
|
|
2
|
+
|
|
3
|
+
Microsoft's **`@playwright/mcp`** is the right choice for design/UX audit subagents in Claude Code (terminal, v2.1+), but only if you pin a known-good version, say "Playwright MCP" explicitly in the first turn, and enforce a SHOW-YOUR-WORK evidence protocol. This reference consolidates verified behavior from the official README, npm registry, GitHub issue #1359, Simon Willison's TIL, Playwright's device registry, and three real community subagent examples. The dominant pitfalls are not API gaps — they are **per-snapshot ref churn**, **inline screenshot token explosions (4× more than CLI)**, and Claude Code's tendency to reach for Bash instead of the MCP tools when the prompt is vague. Everything below is structured so you can copy-paste into a shipping skill.
|
|
4
|
+
|
|
5
|
+
Verified state as of **April 18, 2026**: `@playwright/mcp@0.0.70` is current on npm (306 versions published); issue #1359 — the notorious "No such tool available: mcp__playwright__browser_navigate" bug — is **closed**, with `0.0.41` as the validated fallback pin. Claude Code 2.0.1 through 2.1.22 are all compatible with a correctly pinned server.
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## 1. Installation and configuration for Claude Code
|
|
10
|
+
|
|
11
|
+
### Canonical install (straight from the Microsoft README)
|
|
12
|
+
|
|
13
|
+
```bash
claude mcp add playwright npx @playwright/mcp@latest
```
|
|
16
|
+
|
|
17
|
+
Node 18+ required. This works because `claude mcp add` reads `npx` as the command and `@playwright/mcp@latest` as its first arg. When you start passing flags that could be confused with `claude mcp add`'s own flags, use the `--` separator:
|
|
18
|
+
|
|
19
|
+
```bash
# With flags passed through to the MCP server:
claude mcp add playwright -- npx @playwright/mcp@latest --headless --viewport-size "1440x900"
claude mcp add playwright -- npx @playwright/mcp@latest --device "iPhone 15"
claude mcp add playwright -- npx @playwright/mcp@latest --isolated --storage-state ./auth/storage.json
```
|
|
25
|
+
|
|
26
|
+
**Windows**: wrap with `cmd /c` — `claude mcp add playwright -- cmd /c npx @playwright/mcp@latest`.
|
|
27
|
+
|
|
28
|
+
### Scopes: local vs project vs user
|
|
29
|
+
|
|
30
|
+
| Scope | Flag | Storage | Shared? | Use for |
|---|---|---|---|---|
| **local** (default) | `--scope local` | `~/.claude.json` under `projects["<cwd>"].mcpServers` | Private, per-directory | Experiments, personal tokens |
| **project** | `--scope project` | **`.mcp.json` at repo root** | Yes (commit to git) | Team-shared tooling |
| **user** | `--scope user` | `~/.claude.json` top-level `mcpServers` | Private, all your projects | Personal global tools |
|
|
35
|
+
|
|
36
|
+
Precedence when multiple scopes define the same server: **local > project > user**. Historical note: older Claude Code releases used different names, calling today's `local` scope `project` and today's `user` scope `global`.
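To make the precedence rule concrete, here is a minimal Python sketch that picks the winning definition when the same server name appears in more than one scope. The `resolve_server` helper and the in-memory dictionaries are hypothetical stand-ins for illustration; they model the rule stated above, not Claude Code's actual resolution code or on-disk layout.

```python
# Hypothetical model of the precedence rule "local > project > user".
# The dict shapes are illustrative, not Claude Code's on-disk format.
PRECEDENCE = ("local", "project", "user")


def resolve_server(name, scopes):
    """Return (scope, config) for the highest-precedence definition of `name`."""
    for scope in PRECEDENCE:
        servers = scopes.get(scope, {})
        if name in servers:
            return scope, servers[name]
    return None


scopes = {
    "user": {"playwright": {"args": ["@playwright/mcp@latest"]}},
    "project": {"playwright": {"args": ["@playwright/mcp@0.0.41"]}},
}
winner = resolve_server("playwright", scopes)
print(winner[0])  # project: it outranks user, and no local entry exists
```

In this model a personal `local` entry silently shadows the team's committed `.mcp.json`, which is worth checking when a teammate's pinned version does not seem to take effect.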
|
|
37
|
+
|
|
38
|
+
### The `.mcp.json` schema (project scope)
|
|
39
|
+
|
|
40
|
+
Lives at the project root and should be committed. Minimal form:
|
|
41
|
+
|
|
42
|
+
```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```
|
|
52
|
+
|
|
53
|
+
Extended form with explicit transport type, timeout, and env:
|
|
54
|
+
|
|
55
|
+
```json
{
  "mcpServers": {
    "playwright": {
      "type": "stdio",
      "command": "npx",
      "timeout": 30,
      "args": ["-y", "@playwright/mcp@0.0.70", "--headless", "--isolated",
               "--output-dir", "./audit/screens", "--image-responses", "omit"],
      "env": {
        "PLAYWRIGHT_MCP_CONSOLE_LEVEL": "warning"
      },
      "disabled": false
    }
  }
}
```
|
|
72
|
+
|
|
73
|
+
User and local scopes live inside **`~/.claude.json`** rather than a dedicated file — a structure shared with `allowedTools`, `mcpContextUris`, and per-project state.
|
|
74
|
+
|
|
75
|
+
### Version state and the #1359 tool-name bug
|
|
76
|
+
|
|
77
|
+
**Latest:** `@playwright/mcp@0.0.70` (published mid-April 2026). **Known-broken:** `0.0.56` and `0.0.61` — both failed against Claude Code 2.0.1 → 2.1.22 with the error:
|
|
78
|
+
|
|
79
|
+
```
Error: No such tool available: mcp__playwright__browser_navigate
```
|
|
82
|
+
|
|
83
|
+
Root cause was a **tool-schema registration mismatch**: the MCP server connected successfully but Claude Code's session-stored tool manifest didn't include the `mcp__playwright__browser_*` tools, so the model couldn't see or call them. Not a permissions issue, not a server crash — a plumbing bug between the two. Issue #1359 is **closed**; `@latest` should work again. **Recommended pin for shared configs**: `@playwright/mcp@0.0.41` (the community-validated fallback) or `@playwright/mcp@0.0.70` after you verify it in your environment. Never pin `0.0.56` or `0.0.61`. The pre-release alpha Playwright runtime this package tracks is another reason to pin rather than use `@latest` in `.mcp.json` committed to a team repo.
|
|
84
|
+
|
|
85
|
+
Resolution pattern:
|
|
86
|
+
|
|
87
|
+
```bash
claude mcp remove playwright
claude mcp add playwright npx @playwright/mcp@0.0.41   # known-good fallback
```
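A committed `.mcp.json` can be checked against the pinning advice in this section with a few lines of Python. This is only a sketch: the `check_pin` helper and its return labels are hypothetical, and the set of broken versions is simply the two named above.

```python
import re

# Versions this reference reports as broken against Claude Code 2.0.1 to 2.1.22.
KNOWN_BROKEN = {"0.0.56", "0.0.61"}
PACKAGE_SPEC = re.compile(r"^@playwright/mcp(?:@(?P<version>.+))?$")


def check_pin(args):
    """Classify the @playwright/mcp spec found in a server's `args` list."""
    for arg in args:
        match = PACKAGE_SPEC.match(arg)
        if not match:
            continue
        version = match.group("version")
        if version in (None, "latest"):
            return "unpinned"
        if version in KNOWN_BROKEN:
            return "known-broken"
        return "pinned"
    return "not-found"


print(check_pin(["-y", "@playwright/mcp@0.0.41", "--headless"]))  # pinned
```

A check like this could run in CI so that `@latest` or one of the two bad versions never reaches a shared config.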
|
|
91
|
+
|
|
92
|
+
### Verification
|
|
93
|
+
|
|
94
|
+
Outside a session: `claude mcp list` shows registered servers and their connection state. Inside a session: the **`/mcp`** slash command opens a panel listing each server's status and its available tools. Simon Willison's first-test prompt:
|
|
95
|
+
|
|
96
|
+
```
Use playwright mcp to open a browser to example.com
```
|
|
99
|
+
|
|
100
|
+
You **must** say "playwright mcp" explicitly — otherwise Claude often reaches for Bash or `curl` instead of the MCP tools (see §7).
|
|
101
|
+
|
|
102
|
+
### Transports: stdio vs HTTP/SSE
|
|
103
|
+
|
|
104
|
+
**stdio** (default) — Claude Code spawns `npx @playwright/mcp@latest` and talks JSON-RPC over stdin/stdout. Use locally.
|
|
105
|
+
|
|
106
|
+
**HTTP/SSE** — run the server separately, connect by URL. Use when running headed browser on a display-less host (WSL, remote dev box), Docker, or sharing a server across a team.
|
|
107
|
+
|
|
108
|
+
```bash
npx @playwright/mcp@latest --port 8931
# --host 0.0.0.0 to bind all interfaces
```
|
|
112
|
+
|
|
113
|
+
Register:
|
|
114
|
+
|
|
115
|
+
```bash
claude mcp add --transport http playwright http://localhost:8931/mcp
```
|
|
118
|
+
|
|
119
|
+
Or in `.mcp.json`:
|
|
120
|
+
|
|
121
|
+
```json
{ "mcpServers": { "playwright": { "type": "http", "url": "http://localhost:8931/mcp" } } }
```
|
|
124
|
+
|
|
125
|
+
### Browser binaries and Linux deps
|
|
126
|
+
|
|
127
|
+
Playwright MCP needs a browser binary. Three paths:
|
|
128
|
+
|
|
129
|
+
```bash
npx playwright install chromium     # ahead of time
npx playwright install              # all browsers
npx playwright install-deps         # Linux/Docker system libs (apt packages)
```
|
|
134
|
+
|
|
135
|
+
Or call the MCP tool `browser_install` after connection — it installs whatever `--browser` the config specifies. Because `@playwright/mcp` often tracks alpha Playwright builds, running `npx playwright install` from a *different* local Playwright version can produce mismatched binaries. The `@latest` suffix on the MCP package fetches a clean copy rather than reusing whatever Playwright lives in your `node_modules`.
|
|
136
|
+
|
|
137
|
+
Docker (headless chromium only):
|
|
138
|
+
|
|
139
|
+
```json
{ "mcpServers": { "playwright": {
  "command": "docker",
  "args": ["run", "-i", "--rm", "--init", "--pull=always",
           "mcr.microsoft.com/playwright/mcp"]
}}}
```
|
|
146
|
+
|
|
147
|
+
### Microsoft vs ExecuteAutomation — pick Microsoft
|
|
148
|
+
|
|
149
|
+
| Dimension | **Microsoft `@playwright/mcp`** | ExecuteAutomation `@executeautomation/playwright-mcp-server` |
|---|---|---|
| Status | **Official**, Microsoft Playwright team | Community third-party |
| Stars / forks | 31k / 2.5k | 5.4k / 489 |
| Tool prefix | `browser_*` | `playwright_*` |
| Interaction model | **Accessibility-tree-based** (`ref=eNN` identifiers) | DOM/selector + visible-text |
| Vision needed? | No (structured snapshots) | Optional (screenshot parsing) |
| Tool count | ~19 core + opt-in caps (vision/pdf/testing/tracing) → 70+ | Smaller, DOM-focused |
| Chrome extension ("attach to my real tab") | Yes (`--extension`) | No |
| Docker image | `mcr.microsoft.com/playwright/mcp` | Community builds only |
| Cadence | 306 versions; weekly releases | Active but slower |
|
|
160
|
+
|
|
161
|
+
**Pick Microsoft's** for Claude Code audits: the accessibility-tree model gives **deterministic element targeting via `ref` IDs** (valid for the snapshot that issued them, see §2), needs no vision model, and tracks Playwright core features as they ship. The Playwright docs explicitly recommend it, and Builder.io's guide warns about the naming collision: *"`Playwright MCP server` in search results is often ExecuteAutomation's separate community project — Microsoft's official package is `@playwright/mcp`."*
|
|
162
|
+
|
|
163
|
+
---
|
|
164
|
+
|
|
165
|
+
## 2. Complete tool API (@playwright/mcp, current)
|
|
166
|
+
|
|
167
|
+
All tools are exposed to Claude as `mcp__<server-name>__<tool-name>`, so `browser_navigate` on a server you registered as `playwright` becomes **`mcp__playwright__browser_navigate`**. Tools are organized by capability. **Core + core-tabs + core-install are always on**; `--caps=vision,pdf,testing,tracing` enables the opt-in groups.
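Because the server name is baked into every tool name, registering the server under a different name changes what a subagent's allowlist has to contain. A small Python sketch of the naming scheme follows; the helper names `qualified_tool_name` and `allowed_tools` are hypothetical and exist only for this illustration.

```python
def qualified_tool_name(server_name, tool_name):
    """Build the name Claude sees: mcp__<server-name>__<tool-name>."""
    return f"mcp__{server_name}__{tool_name}"


def allowed_tools(server_name, tool_names):
    """Expand short tool names into fully qualified ones for an allowlist."""
    return [qualified_tool_name(server_name, name) for name in tool_names]


print(qualified_tool_name("playwright", "browser_navigate"))
# mcp__playwright__browser_navigate
print(allowed_tools("playwright-extension", ["browser_snapshot", "browser_click"]))
# ['mcp__playwright-extension__browser_snapshot', 'mcp__playwright-extension__browser_click']
```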
|
|
168
|
+
|
|
169
|
+
### Core automation (always enabled)
|
|
170
|
+
|
|
171
|
+
**`browser_navigate`** — `url` (string, required). Navigate to a URL.
|
|
172
|
+
|
|
173
|
+
**`browser_navigate_back`** — no params. Go back one page. ⚠️ `browser_navigate_forward` is **not in the current README** — earlier versions had it; assume absent in 0.0.70.
|
|
174
|
+
|
|
175
|
+
**`browser_snapshot`** — `filename` (string, optional). Returns the **accessibility tree** as YAML-like text (roles, accessible names, `ref=eNN` identifiers). *"This is better than screenshot"* per the README. Token-cheap and deterministic. See §2.4 for the ref system.
|
|
176
|
+
|
|
177
|
+
**`browser_take_screenshot`** — `type` (`png`|`jpeg`, default png), `filename` (default `page-{timestamp}.{png|jpeg}`), `element` (string) + `ref` (string) for per-element (must be provided together), `fullPage` (boolean, cannot combine with element). Per README: *"You can't perform actions based on the screenshot — use `browser_snapshot` for actions."*
|
|
178
|
+
|
|
179
|
+
**`browser_click`** — `element` (required human-readable desc), `ref` (required), `doubleClick` (bool), `button` (`left`|`middle`|`right`), `modifiers` (array).
|
|
180
|
+
|
|
181
|
+
**`browser_type`** — `element`, `ref`, `text` (all required), `submit` (bool, press Enter after), `slowly` (bool, char-by-char for key handlers).
|
|
182
|
+
|
|
183
|
+
**`browser_hover`** — `element`, `ref` (both required).
|
|
184
|
+
|
|
185
|
+
**`browser_fill_form`** — `fields` (array, required). ⚠️ The tool is **`browser_fill_form`**, not `browser_fill`. Inner per-field schema is not itemized in the README prose (it's in the runtime JSON Schema).
|
|
186
|
+
|
|
187
|
+
**`browser_press_key`** — `key` (string, required): `ArrowLeft`, `Enter`, a single char, etc.
|
|
188
|
+
|
|
189
|
+
**`browser_select_option`** — `element`, `ref`, `values` (array, single or multiple).
|
|
190
|
+
|
|
191
|
+
**`browser_drag`** — `startElement`, `startRef`, `endElement`, `endRef` (all required).
|
|
192
|
+
|
|
193
|
+
**`browser_resize`** — `width` (number), `height` (number), both required. Runtime viewport resize. No `device` or `orientation` param — those are `--device` launch-time only.
|
|
194
|
+
|
|
195
|
+
**`browser_evaluate`** — `function` (string, required, form `() => { ... }` or `(element) => { ... }`), `element` + `ref` (optional pair). Cannot accept `ref=eNN` as the argument directly; pass a CSS selector inside the function body (see issue #870).
|
|
196
|
+
|
|
197
|
+
**`browser_run_code`** — `code` (string, required). Full Playwright snippet: `async (page) => { await page.getByRole('button', { name: 'Submit' }).click(); return await page.title(); }`.
|
|
198
|
+
|
|
199
|
+
**`browser_console_messages`** — `level` (`error`|`warning`|`info`|`debug`, default `info`). Each level includes more severe levels.
|
|
200
|
+
|
|
201
|
+
**`browser_network_requests`** — `includeStatic` (bool, default false). Static assets like images/fonts/scripts are filtered unless enabled. Failed-request field names are not documented in the README.
|
|
202
|
+
|
|
203
|
+
**`browser_wait_for`** — mutually exclusive: `text` (appears), `textGone` (disappears), or `time` (seconds). **Prefer `text`** — see §7.
|
|
204
|
+
|
|
205
|
+
**`browser_handle_dialog`** — `accept` (bool, required), `promptText` (string). For native `alert`/`confirm`/`prompt`, not HTML modals.
|
|
206
|
+
|
|
207
|
+
**`browser_file_upload`** — `paths` (array of absolute paths). Empty cancels the chooser.
|
|
208
|
+
|
|
209
|
+
**`browser_close`** — no params.
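Two of the parameter rules listed above are easy to violate when tool calls are generated programmatically: `browser_take_screenshot` needs `element` and `ref` together and cannot mix them with `fullPage`, and `browser_wait_for` accepts only one of `text`, `textGone`, or `time`. The following Python sketch checks those rules before a call is sent. Both function names are hypothetical, and the checks mirror only what this reference states, not the server's runtime JSON Schema.

```python
def screenshot_arg_errors(args):
    """Check browser_take_screenshot arguments against the rules listed above."""
    errors = []
    has_element = "element" in args
    has_ref = "ref" in args
    if has_element != has_ref:
        errors.append("element and ref must be provided together")
    if args.get("fullPage") and (has_element or has_ref):
        errors.append("fullPage cannot be combined with element/ref")
    return errors


def wait_for_arg_errors(args):
    """Check browser_wait_for: text, textGone and time exclude each other."""
    chosen = [key for key in ("text", "textGone", "time") if key in args]
    if len(chosen) > 1:
        return ["use only one of: " + ", ".join(chosen)]
    return []


print(screenshot_arg_errors({"element": "Primary CTA button"}))
# ['element and ref must be provided together']
print(wait_for_arg_errors({"text": "Welcome", "time": 5}))
# ['use only one of: text, time']
```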
|
|
210
|
+
|
|
211
|
+
### Tabs + install (always enabled)
|
|
212
|
+
|
|
213
|
+
**`browser_tabs`** — `action` (`list`|`new`|`close`|`select`), `index` (number, optional). Unifies what earlier versions had as `browser_tab_new`/`_close`/`_list`/`_select`.
|
|
214
|
+
|
|
215
|
+
**`browser_install`** — no params. Installs the browser specified in the config. Call this on "browser not installed" errors.
|
|
216
|
+
|
|
217
|
+
### Opt-in caps
|
|
218
|
+
|
|
219
|
+
`--caps=vision`:
|
|
220
|
+
- **`browser_mouse_click_xy`** — `element`, `x`, `y`
|
|
221
|
+
- **`browser_mouse_move_xy`** — `element`, `x`, `y`
|
|
222
|
+
- **`browser_mouse_drag_xy`** — `element`, `startX`, `startY`, `endX`, `endY`
|
|
223
|
+
|
|
224
|
+
`--caps=pdf`:
|
|
225
|
+
- **`browser_pdf_save`** — `filename` (default `page-{timestamp}.pdf`).
|
|
226
|
+
|
|
227
|
+
`--caps=testing`:
|
|
228
|
+
- **`browser_generate_locator`** — `element`, `ref`. Generate test-grade locator.
|
|
229
|
+
- **`browser_verify_element_visible`** — `role`, `accessibleName`.
|
|
230
|
+
- **`browser_verify_text_visible`** — `text`.
|
|
231
|
+
- **`browser_verify_list_visible`** — `element`, `ref`, `items` (array).
|
|
232
|
+
- **`browser_verify_value`** — `type`, `element`, `ref`, `value` (use `"true"`/`"false"` for checkboxes).
|
|
233
|
+
|
|
234
|
+
`--caps=tracing`:
|
|
235
|
+
- **`browser_start_tracing`** / **`browser_stop_tracing`** — no params.
|
|
236
|
+
|
|
237
|
+
### The `ref=eNN` accessibility-reference system
|
|
238
|
+
|
|
239
|
+
`browser_snapshot` returns a structured accessibility tree, **not pixels**. Each listed node carries a role, an accessible name, and a `ref` that is **assigned at snapshot time** by walking the tree, so it is only meaningful for that snapshot:
|
|
240
|
+
|
|
241
|
+
```yaml
- banner:
  - heading "Example Domain" [level=1] [ref=e3]
  - paragraph [ref=e4]: "This domain is for use in illustrative examples..."
  - link "More information..." [ref=e5]:
    /url: https://www.iana.org/domains/example
- textbox "Search" [ref=e12]
- button "Submit" [ref=e13]
```
|
|
250
|
+
|
|
251
|
+
**Scope and stability** — refs are scoped to a single snapshot. The `e{N}` prefix is the main frame; `s{F}e{N}` is subframe-F element-N. **After any mutation (click, type, navigate), refs go stale**. Call `browser_snapshot` again before the next interaction, or rely on the auto-snapshot that most tools return in their response.
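The two ref shapes described above can be pulled out of snapshot text with one regular expression, which helps when logging which refs a subagent saw at each step. This is a sketch under assumptions: `extract_refs` is a hypothetical helper, and the sample snapshot lines (including the subframe ref `s2e14`) are made up to show both shapes.

```python
import re

# Matches [ref=e12] (main frame) and [ref=s2e14] (subframe 2, element 14).
REF_PATTERN = re.compile(r"\[ref=(?:s(?P<frame>\d+))?e(?P<element>\d+)\]")


def extract_refs(snapshot_text):
    """Return (ref, frame, element) tuples; frame is None for the main frame."""
    refs = []
    for match in REF_PATTERN.finditer(snapshot_text):
        frame = int(match.group("frame")) if match.group("frame") else None
        ref = match.group(0)[len("[ref="):-1]
        refs.append((ref, frame, int(match.group("element"))))
    return refs


snapshot = '''- link "More information..." [ref=e5]
- button "Pay" [ref=s2e14]'''
print(extract_refs(snapshot))
# [('e5', None, 5), ('s2e14', 2, 14)]
```

Refs extracted this way should be discarded after the next mutation, since the same string may point at a different element in the following snapshot.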
|
|
252
|
+
|
|
253
|
+
**Why two params (`element` + `ref`)** — every interaction tool takes both. `ref` is the deterministic target; `element` is a human-readable description used for permission prompts and logging. For drag, the pattern doubles into `startElement`/`startRef` + `endElement`/`endRef`:
|
|
254
|
+
|
|
255
|
+
```json
{ "tool": "mcp__playwright__browser_click",
  "arguments": { "element": "'More information' link in banner", "ref": "e5" } }
```
|
|
259
|
+
|
|
260
|
+
**Snapshot modes** — `--snapshot-mode` (env `PLAYWRIGHT_MCP_SNAPSHOT_MODE`): `incremental` (default, return diff only), `full` (always complete tree), `none` (suppress auto-snapshot; you must call `browser_snapshot` manually). Use `none` on very large pages to avoid context bloat; see §10.
|
|
261
|
+
|
|
262
|
+
---
|
|
263
|
+
|
|
264
|
+
## 3. Viewport and device emulation
|
|
265
|
+
|
|
266
|
+
### Launch-time flags
|
|
267
|
+
|
|
268
|
+
```
--viewport-size <WxH>    e.g. "1440x900"     env PLAYWRIGHT_MCP_VIEWPORT_SIZE
--device <name>          e.g. "iPhone 15"
--user-agent <string>    override UA         env PLAYWRIGHT_MCP_USER_AGENT
```
|
|
273
|
+
|
|
274
|
+
Viewport format is **`WIDTHxHEIGHT`** with lowercase `x`. The comma form (`1280,720`) is not documented.
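A pre-flight check for the documented format can catch a mistyped viewport before the server is launched. The sketch below is deliberately strict: `parse_viewport_size` is a hypothetical helper that accepts only `WIDTHxHEIGHT` with a lowercase `x`, which may be narrower than what the server itself tolerates.

```python
import re

VIEWPORT = re.compile(r"(\d+)x(\d+)")


def parse_viewport_size(value):
    """Parse 'WIDTHxHEIGHT' (lowercase x) into a (width, height) tuple."""
    match = VIEWPORT.fullmatch(value)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_viewport_size("1440x900"))  # (1440, 900)
```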
|
|
275
|
+
|
|
276
|
+
### Runtime resize
|
|
277
|
+
|
|
278
|
+
`browser_resize(width, height)` calls `page.setViewportSize(...)` under the hood — it changes **viewport dimensions only**. Any `--device`-derived `deviceScaleFactor`, `userAgent`, `isMobile`, or `hasTouch` flags remain intact. There is **no MCP tool to change `--device` mid-session**; to switch device emulation, restart the server.
|
|
279
|
+
|
|
280
|
+
### Useful device names from Playwright's registry
|
|
281
|
+
|
|
282
|
+
All names are case-sensitive and come from `deviceDescriptorsSource.json`. Each mobile and tablet entry also has a `" landscape"` variant.
|
|
283
|
+
|
|
284
|
+
| `--device` value | Viewport | DPR | Engine |
|---|---|---|---|
| `"iPhone SE"` | 320×568 | 2 | webkit |
| `"iPhone 13"` / `"iPhone 14"` | 390×664 | 3 | webkit |
| `"iPhone 15"` / `"iPhone 15 Pro"` | 393×659 | 3 | webkit |
| `"iPhone 15 Pro Max"` | 430×739 | 3 | webkit |
| `"Pixel 5"` | 393×727 | 2.75 | chromium |
| `"Pixel 7"` | 412×839 | ~2.625 | chromium |
| `"Galaxy S9+"` | 320×658 | 4.5 | chromium |
| `"iPad Mini"` | 768×1024 | 2 | webkit |
| `"iPad Pro 11"` | 834×1194 | 2 | webkit |
| `"Desktop Chrome"` / `"Desktop Safari"` / `"Desktop Edge"` / `"Desktop Firefox"` | 1280×720 | 1 | varies |
|
|
296
|
+
|
|
297
|
+
### Standard audit breakpoints
|
|
298
|
+
|
|
299
|
+
The community converges on three: **375×812 mobile**, **768×1024 tablet**, **1440×900 desktop**. For comprehensive sweeps, add **1920×1080** (full HD). For the smallest mobile, test **320×568** (iPhone SE). Use `browser_resize` between pages — don't restart the server.
|
|
300
|
+
|
|
301
|
+
### Device-pixel-ratio and screenshot bloat
|
|
302
|
+
|
|
303
|
+
**iPhone 15 at DPR 3** turns a 393×2000 CSS-px full-page shot into **1179×6000 physical pixels** — easily 3–10 MB, and catastrophic if returned inline to the model. Mitigations, in priority order:
|
|
304
|
+
|
|
305
|
+
1. Run with `--image-responses omit` (env `PLAYWRIGHT_MCP_IMAGE_RESPONSES=omit`) so screenshots go to disk instead of being base64-encoded into the response.
2. Prefer `browser_snapshot` for decisions; use `browser_take_screenshot` only for reviewer evidence.
3. On high-DPR devices, pass `type: "jpeg"` or restrict to element screenshots.
4. Save with descriptive relative filenames into `--output-dir`.
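The pixel arithmetic behind the iPhone 15 example is worth seeing once, because the cost grows with the square of the device pixel ratio. The Python sketch below reproduces the 393×2000 case; the helper names are hypothetical, and the raw RGBA figure is an uncompressed upper bound, not the size of the encoded PNG or JPEG.

```python
def physical_pixels(css_width, css_height, device_pixel_ratio):
    """Scale CSS-pixel dimensions to physical pixels."""
    return (round(css_width * device_pixel_ratio),
            round(css_height * device_pixel_ratio))


def raw_rgba_megabytes(width_px, height_px):
    """Uncompressed size at 4 bytes per pixel, in megabytes (10**6 bytes)."""
    return width_px * height_px * 4 / 1_000_000


width, height = physical_pixels(393, 2000, 3)
print(width, height)                      # 1179 6000
print(raw_rgba_megabytes(width, height))  # 28.296
```

At DPR 1 the same page would be 393×2000 physical pixels, one ninth of the pixel count, which is why restricting high-DPR captures to single elements pays off quickly.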
|
|
309
|
+
|
|
310
|
+
---
|
|
311
|
+
|
|
312
|
+
## 4. Capture strategies
|
|
313
|
+
|
|
314
|
+
### Snapshot vs screenshot (know when to use which)
|
|
315
|
+
|
|
316
|
+
| Need | Tool | Token cost | When |
|---|---|---|---|
| Decide what to do, get refs | `browser_snapshot` | 200–400 tokens on small pages, multi-K on rich apps | **Default**; after every state change |
| Visual evidence for human reviewer | `browser_take_screenshot` | 4–8K inline, ~50 if saved to disk | Only when visuals matter; always save to disk |
| Computed styles (hex, px, ratios) | `browser_evaluate` | Tiny | Required for any WCAG-grade numeric claim |
| Error context | `browser_console_messages` | Small | At top of every page audit |
| Failed resource requests | `browser_network_requests` | Small–medium | When diagnosing 404s, blocked requests |
|
|
323
|
+
|
|
324
|
+
Playwright's own benchmark shows **~114K tokens via MCP vs ~27K via CLI** on equivalent tasks — a 4× multiplier driven almost entirely by inline images and auto-snapshots. A single content-heavy-page screenshot inline is 5–8K tokens; saved to disk, it's ~50 tokens for the path.
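A back-of-the-envelope budget makes the difference between inline and on-disk screenshots tangible. The Python sketch below uses only the figures quoted above; `screenshot_budget` is a hypothetical helper, the per-shot costs are rough estimates taken from this section, and the 12-shot audit is an arbitrary example size.

```python
# Figures quoted in the paragraph above.
MCP_TOKENS = 114_000
CLI_TOKENS = 27_000
INLINE_SCREENSHOT_TOKENS = 5_000   # low end of the 5-8K range
SAVED_SCREENSHOT_TOKENS = 50       # roughly the cost of a file path


def screenshot_budget(shots, inline):
    """Estimate tokens spent on screenshots for one audit run."""
    per_shot = INLINE_SCREENSHOT_TOKENS if inline else SAVED_SCREENSHOT_TOKENS
    return shots * per_shot


print(round(MCP_TOKENS / CLI_TOKENS, 1))      # 4.2
print(screenshot_budget(12, inline=True))     # 60000
print(screenshot_budget(12, inline=False))    # 600
```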
|
|
325
|
+
|
|
326
|
+
### Full-page and element captures
|
|
327
|
+
|
|
328
|
+
```json
// Full page
{ "tool": "browser_take_screenshot",
  "arguments": { "fullPage": true, "filename": "home_mobile.png" } }

// Per-element (from a prior snapshot)
{ "tool": "browser_take_screenshot",
  "arguments": { "element": "Primary CTA button",
                 "ref": "e87",
                 "filename": "cta_mobile.png" } }
```
|
|
339
|
+
|
|
340
|
+
`fullPage: true` and `element`/`ref` are **mutually exclusive**.
|
|
341
|
+
|
|
342
|
+
### Output env vars (verified from `--help`)
|
|
343
|
+
|
|
344
|
+
```
--output-dir <path>                      env PLAYWRIGHT_MCP_OUTPUT_DIR
--save-session                           env PLAYWRIGHT_MCP_SAVE_SESSION
--save-trace                             env PLAYWRIGHT_MCP_SAVE_TRACE
--save-video <WxH>                       env PLAYWRIGHT_MCP_SAVE_VIDEO
--image-responses allow|omit             env PLAYWRIGHT_MCP_IMAGE_RESPONSES
--snapshot-mode incremental|full|none    env PLAYWRIGHT_MCP_SNAPSHOT_MODE
```
|
|
352
|
+
|
|
353
|
+
⚠️ **`PLAYWRIGHT_MCP_OUTPUT_MODE` is NOT documented** in the current upstream README. If your notes or third-party guides reference it with values `file`/`stdout`, they're citing a fork or an outdated version. The actual lever for "send images to disk, not into the model" is `--image-responses omit`. The closest match for session-trace persistence is `--save-session` / `PLAYWRIGHT_MCP_SAVE_SESSION`.
|
|
354
|
+
|
|
355
|
+
### Default output directory
|
|
356
|
+
|
|
357
|
+
The README does **not** explicitly state the default. Files go to an OS temp directory created at startup when `--output-dir` is omitted. **Always pass `--output-dir` explicitly** for audits so evidence lands in a predictable place. Default auto-generated filenames are `page-{timestamp}.png|jpeg|pdf`.
|
|
358
|
+
|
|
359
|
+
### File-naming strategy for audit evidence
|
|
360
|
+
|
|
361
|
+
Deterministic, sortable, and relative. Pass `filename` explicitly for every shot:
|
|
362
|
+
|
|
363
|
+
```
{route-slug}_{viewport-label}_{state}_{iso-timestamp}.png

home_iphone15_loggedout_2026-04-18T14-32-11Z.png
pricing_1440x900_loggedin_2026-04-18T14-33-02Z.png
checkout-step2_ipadpro_error_2026-04-18T14-35-44Z.png
```
|
|
370
|
+
|
|
371
|
+
Keep filenames relative (no leading `/`) so they resolve inside `--output-dir`.
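The naming pattern above is simple to generate in code, which keeps evidence files consistent across subagents. The following Python sketch is one possible implementation; `slugify` and `evidence_filename` are hypothetical helpers, and the colon-free timestamp format is inferred from the example filenames.

```python
import re
from datetime import datetime, timezone


def slugify(text):
    """Lowercase and collapse anything outside [a-z0-9] into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def evidence_filename(route, viewport_label, state, when, ext="png"):
    """Build {route-slug}_{viewport-label}_{state}_{iso-timestamp}.{ext}."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{slugify(route)}_{viewport_label}_{state}_{stamp}.{ext}"


when = datetime(2026, 4, 18, 14, 32, 11, tzinfo=timezone.utc)
print(evidence_filename("/home", "iphone15", "loggedout", when))
# home_iphone15_loggedout_2026-04-18T14-32-11Z.png
```

The result never starts with `/`, so it stays relative and resolves inside `--output-dir`.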
|
|
372
|
+
|
|
373
|
+
---
|
|
374
|
+
|
|
375
|
+
## 5. Authentication and state
|
|
376
|
+
|
|
377
|
+
### Persistent profile (default)
|
|
378
|
+
|
|
379
|
+
Without `--isolated`, Playwright MCP uses a persistent profile. Default locations:
|
|
380
|
+
|
|
381
|
+
```
Windows: %USERPROFILE%\AppData\Local\ms-playwright\mcp-{channel}-profile
macOS:   ~/Library/Caches/ms-playwright/mcp-{channel}-profile
Linux:   ~/.cache/ms-playwright/mcp-{channel}-profile
```
|
|
386
|
+
|
|
387
|
+
`{channel}` is `chrome` / `msedge` / `chromium`. Some newer versions use `mcp-{channel}-{workspace-hash}` to give different projects separate profiles — inspect the directory after first run to see which form your version uses.
|
|
388
|
+
|
|
389
|
+
### The "log in yourself, then continue" pattern
|
|
390
|
+
|
|
391
|
+
From the README: *"All the logged in information will be stored in the persistent profile; you can delete it between sessions if you'd like to clear the offline state."* Flow:
|
|
392
|
+
|
|
393
|
+
1. Launch MCP in headed mode (default, don't pass `--headless`) with an explicit `--user-data-dir ./.pw-mcp-profile`.
2. Agent navigates to the login URL via `browser_navigate`.
3. **You log in manually** in the headed browser while the agent pauses or waits on a text signal.
4. Cookies/localStorage persist into the profile directory.
5. Future sessions with the same `--user-data-dir` skip login.
|
|
398
|
+
|
|
399
|
+
### `--storage-state` for CI
|
|
400
|
+
|
|
401
|
+
With `--isolated`, every session starts clean — closing the browser discards all state. Pre-seed credentials via Playwright's standard storageState JSON:
|
|
402
|
+
|
|
403
|
+
```json
{ "mcpServers": { "playwright": {
  "command": "npx",
  "args": ["@playwright/mcp@latest",
           "--isolated",
           "--storage-state=./auth/storage.json"]
}}}
```
|
|
411
|
+
|
|
412
|
+
Generate the file once with `playwright codegen --save-storage=./auth/storage.json` or from test code.
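Before wiring a storage-state file into CI, it can help to confirm that it at least looks like something Playwright wrote. The Python sketch below checks for the two top-level keys that `context.storageState()` produces, `cookies` and `origins`. The `storage_state_problems` helper is hypothetical, and the check says nothing about whether the saved session is still valid.

```python
import json


def storage_state_problems(raw_json):
    """Return a list of problems found in a storageState JSON document."""
    try:
        state = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(state, dict):
        return ["top level must be an object"]
    problems = []
    for key in ("cookies", "origins"):
        if not isinstance(state.get(key), list):
            problems.append(f"'{key}' must be a list")
    return problems


print(storage_state_problems('{"cookies": [], "origins": []}'))  # []
print(storage_state_problems('{"cookies": []}'))  # ["'origins' must be a list"]
```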
|
|
413
|
+
|
|
414
|
+
### The browser-extension approach ("share my real tab")
|
|
415
|
+
|
|
416
|
+
Install the **Playwright MCP Bridge** extension from github.com/microsoft/playwright-mcp/releases. Load it as unpacked in `chrome://extensions/` with Developer mode on. Then:
|
|
417
|
+
|
|
418
|
+
```json
{ "mcpServers": { "playwright-extension": {
  "command": "npx",
  "args": ["@playwright/mcp@latest", "--extension"]
}}}
```
|
|
424
|
+
|
|
425
|
+
On first tool call, the extension opens a tab-picker; you select the running tab the agent attaches to. For headless auto-approval, copy the `PLAYWRIGHT_MCP_EXTENSION_TOKEN` from the extension popup and put it in the server's `env` block. This is the only way to get **real SSO sessions, enterprise policies, and ad-blocker state** without manual reauth — MCP attaches rather than spawning a clean Playwright browser.
|
|
426
|
+
|
|
427
|
+
---
|
|
428
|
+
|
|
429
|
+
## 6. Security and scope
|
|
430
|
+
|
|
431
|
+
### Origin allow/block lists
|
|
432
|
+
|
|
433
|
+
```
--allowed-origins "https://a.com;https://*.b.com"   env PLAYWRIGHT_MCP_ALLOWED_ORIGINS
--blocked-origins "https://evil.com"                env PLAYWRIGHT_MCP_BLOCKED_ORIGINS
```
|
|
437
|
+
|
|
438
|
+
**Semicolon-separated**, not comma. Glob wildcards with `*` are supported (Playwright-style). Blocklist is evaluated first. With only a blocklist, non-matching requests are still allowed.
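
The evaluation order can be modeled with a short sketch. This is an illustration of the rules stated here, not the server's implementation: `fnmatchcase` only approximates Playwright-style globs, and the assumption that a non-empty allowlist rejects everything it does not match is mine. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def split_origins(value: str) -> list[str]:
    """Split a semicolon-separated origin list, dropping empty entries."""
    return [item.strip() for item in value.split(";") if item.strip()]


def is_origin_allowed(origin: str, allowed: str = "", blocked: str = "") -> bool:
    """Model of the documented order: blocklist first, then allowlist."""
    # Blocklist is evaluated first, so it wins over any allowlist match.
    if any(fnmatchcase(origin, pattern) for pattern in split_origins(blocked)):
        return False
    allow_patterns = split_origins(allowed)
    # With only a blocklist, anything that did not match it is still allowed.
    if not allow_patterns:
        return True
    # Assumption: a non-empty allowlist admits only matching origins.
    return any(fnmatchcase(origin, pattern) for pattern in allow_patterns)
```

Note that a comma-separated value would be treated as one pattern here and match nothing, which is the usual symptom of using the wrong separator.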
|
|
439
|
+
|
|
440
|
+
### File access
|
|
441
|
+
|
|
442
|
+
`--allow-unrestricted-file-access` / `PLAYWRIGHT_MCP_ALLOW_UNRESTRICTED_FILE_ACCESS` lifts two default restrictions:
- File-system access (upload paths, output paths) is restricted by default to the MCP client's workspace roots (or cwd).
- `file://` navigation is blocked by default.
|
|
445
|
+
|
|
446
|
+
### The "not a security boundary" disclaimer (verbatim)
|
|
447
|
+
|
|
448
|
+
The README — and both origin flag help strings — explicitly state:
|
|
449
|
+
|
|
450
|
+
> **Playwright MCP is *not* a security boundary. See MCP Security Best Practices for guidance on securing your deployment.**
|
|
451
|
+
|
|
452
|
+
> "`--allowed-origins` / `--blocked-origins`: Important: *does not* serve as a security boundary and *does not* affect redirects."
|
|
453
|
+
|
|
454
|
+
> "`allowUnrestrictedFileAccess` acts as a guardrail to prevent the LLM from accidentally wandering outside its intended workspace. It is a convenience defense to catch unintended file access, not a secure boundary; a deliberate attempt to reach other directories can be easily worked around, so always rely on client-level permissions for true security."
|
|
455
|
+
|
|
456
|
+
Practical implication: origin lists can be bypassed via **DNS rebinding, HTTP redirects, or direct navigation**. Treat them as hygiene controls, not containment. For real isolation, run MCP in a disposable container or VM.
|
|
457
|
+
|
|
458
|
+
### Data-leakage implications for authenticated audits
|
|
459
|
+
|
|
460
|
+
Every byte returned by an MCP tool is forwarded to Anthropic's API. Specifically, that includes:
|
|
461
|
+
|
|
462
|
+
- `browser_snapshot` — all visible text, `aria-label`s, form labels, occasionally form values
|
|
463
|
+
- `browser_take_screenshot` — raw pixels (names, emails, dashboards, tokens in URL bars)
|
|
464
|
+
- `browser_console_messages` — stack traces frequently containing tokens, debug dumps, API request bodies
|
|
465
|
+
- `browser_network_requests` — URLs including query-string session IDs
|
|
466
|
+
- `browser_evaluate` — arbitrary JS (`document.cookie`, `localStorage`, DOM secrets)
|
|
467
|
+
|
|
468
|
+
**Never audit prod with real PII.** Use a staging tenant seeded with synthetic data, pair it with `--image-responses omit` + local human review, and check your Anthropic retention/ZDR tier before running against anything internal.
|
|
469
|
+
|
|
470
|
+
### Docker and `--no-sandbox`
|
|
471
|
+
|
|
472
|
+
`--no-sandbox` disables Chromium's sandbox — required inside minimal containers (including `mcr.microsoft.com/playwright/mcp` when running as root without user-namespace capabilities). Fine for a disposable container, bad on a workstation. Example long-lived container from the README:
|
|
473
|
+
|
|
474
|
+
```bash
docker run -d -i --rm --init --pull=always \
  --entrypoint node --name playwright -p 8931:8931 \
  mcr.microsoft.com/playwright/mcp \
  cli.js --headless --browser chromium --no-sandbox --port 8931
```
|
|
480
|
+
|
|
481
|
+
---
|
|
482
|
+
|
|
483
|
+
## 7. Prompting patterns: how to get the tools actually used
|
|
484
|
+
|
|
485
|
+
### Rule zero: say "Playwright MCP" explicitly
|
|
486
|
+
|
|
487
|
+
Simon Willison's TIL (til.simonwillison.net/claude-code/playwright-mcp-claude-code) is blunt:
|
|
488
|
+
|
|
489
|
+
> "I found I needed to explicitly say 'playwright mcp' the first time, otherwise it might try to use Bash to run Playwright instead."
|
|
490
|
+
|
|
491
|
+
The mechanism is Claude's tool-selection heuristic — "test the login page" semantically matches the general-purpose `Bash` tool more strongly than `mcp__playwright__browser_navigate`, **especially in repos where `playwright` is already a dev dependency**. Writing "Playwright MCP" (or the `mcp__playwright__` prefix) in the first turn aligns the request with the MCP tool descriptions. Canonical kickoff:
|
|
492
|
+
|
|
493
|
+
```
|
|
494
|
+
Using the Playwright MCP server (mcp__playwright__*), open http://localhost:3000
|
|
495
|
+
and perform a responsive audit. Do NOT use Bash, curl, or write a Playwright
|
|
496
|
+
test file — drive the browser live through the MCP tools.
|
|
497
|
+
```
|
|
498
|
+
|
|
499
|
+
### Standard recon flow
|
|
500
|
+
|
|
501
|
+
```
|
|
502
|
+
browser_navigate(url)
|
|
503
|
+
→ browser_wait_for(text="<known copy on loaded page>") # text, not time
|
|
504
|
+
→ browser_console_messages() # bail early on JS errors
|
|
505
|
+
→ browser_snapshot() # get refs
|
|
506
|
+
→ [quote refs verbatim in findings]
|
|
507
|
+
→ browser_click({ ref: "e42", element: "Submit button in 'Billing' form" })
|
|
508
|
+
→ browser_wait_for(text="<post-action signal>")
|
|
509
|
+
→ browser_snapshot() # REFRESH — old refs stale
|
|
510
|
+
→ browser_evaluate(...) # numeric evidence
|
|
511
|
+
→ browser_take_screenshot({ fullPage: true, filename: "..." })
|
|
512
|
+
```
|
|
513
|
+
|
|
514
|
+
### Viewport loop (responsive audits)
|
|
515
|
+
|
|
516
|
+
Embed in the subagent system prompt. Run **sequentially** inside one subagent, not as parallel subagents sharing one browser tab (issue #893 — they fight over the same tab and return inconsistent results unless you use `--isolated` + per-agent `browser_tab_new`).
|
|
517
|
+
|
|
518
|
+
```python
# Pseudocode for the subagent prompt: each browser_* call stands for an MCP
# tool call, not a Python function.
VIEWPORTS = [("mobile", 375, 812), ("tablet", 768, 1024), ("desktop", 1440, 900)]
PAGES = ["/", "/pricing", "/docs", "/signup"]
# BASE (site origin) and EXPECTED (page -> known copy) come from the audit config.

for page in PAGES:
    slug = page.strip("/") or "home"   # "/" would otherwise give an empty file name
    for vp, w, h in VIEWPORTS:
        browser_resize(width=w, height=h)
        browser_navigate(url=BASE + page)
        browser_wait_for(text=EXPECTED[page])
        browser_console_messages()
        snap = browser_snapshot()      # save to audit/snapshots/{slug}_{vp}.yaml
        browser_take_screenshot(fullPage=True,
                                filename=f"audit/screens/{slug}_{vp}.png")
        styles = browser_evaluate(function="() => { /* getComputedStyle for h1, CTA, nav */ }")
        append_finding(...)
```
|
|
534
|
+
|
|
535
|
+
### Waiting strategies — text beats time, always
|
|
536
|
+
|
|
537
|
+
Playwright itself *"discourages waitFor methods that wait for network connections to be idle"* — modern apps never stop talking (analytics beacons, websockets, polling). `networkidle` is a trap.
|
|
538
|
+
|
|
539
|
+
Decision tree:
- Just navigated? → `browser_wait_for(text="<known copy>")`
- Clicked something that opens a modal? → `browser_wait_for(text="<modal heading>")`
- Waiting for a spinner to disappear? → `browser_wait_for(textGone="Loading...")`
- No textual signal? → `browser_evaluate` polling `document.readyState` or a specific selector
- Absolutely last resort → `browser_wait_for(time=1)` with a comment explaining why
|
|
545
|
+
|
|
546
|
+
Also: **auto-wait exists**. `browser_click`/`browser_type` already wait for actionability. Don't stack a redundant `browser_wait_for` before every action — only after navigation and async state changes.
|
|
547
|
+
|
|
548
|
+
### `browser_evaluate` patterns for WCAG-grade numbers
|
|
549
|
+
|
|
550
|
+
```js
|
|
551
|
+
// Contrast + typography
|
|
552
|
+
(el => {
|
|
553
|
+
const s = getComputedStyle(el);
|
|
554
|
+
return { color: s.color, background: s.backgroundColor,
|
|
555
|
+
fontSize: s.fontSize, lineHeight: s.lineHeight,
|
|
556
|
+
fontWeight: s.fontWeight };
|
|
557
|
+
})(document.querySelector('h1.hero__title'))
|
|
558
|
+
|
|
559
|
+
// Touch-target audit (WCAG 2.5.5 — 44×44 CSS px)
|
|
560
|
+
[...document.querySelectorAll('button, a, [role=button]')].map(el => {
|
|
561
|
+
const r = el.getBoundingClientRect();
|
|
562
|
+
return { text: el.innerText.slice(0,40), w: r.width, h: r.height,
|
|
563
|
+
ok: r.width >= 44 && r.height >= 44 };
|
|
564
|
+
}).filter(x => !x.ok)
|
|
565
|
+
```
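
The colors returned by the first snippet still need to be turned into a contrast ratio. A minimal sketch of the WCAG 2.x relative-luminance formula follows, assuming `getComputedStyle` returned the legacy comma-separated `rgb(...)` / `rgba(...)` form; alpha is ignored, so semi-transparent backgrounds need compositing first. The function names are hypothetical.

```python
import re


def parse_rgb(css_color: str) -> tuple[int, int, int]:
    """Parse 'rgb(r, g, b)' or 'rgba(r, g, b, a)'; the alpha channel is ignored."""
    match = re.match(r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)", css_color)
    if not match:
        raise ValueError(f"unsupported color format: {css_color!r}")
    r, g, b = (int(v) for v in match.groups())
    return r, g, b


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG 2.x relative luminance of an sRGB color."""
    def channel(value: int) -> float:
        c = value / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio between two CSS rgb() strings, always >= 1."""
    l1 = relative_luminance(parse_rgb(fg))
    l2 = relative_luminance(parse_rgb(bg))
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

Computing the ratio outside the model, from the raw `color` and `backgroundColor` strings, keeps the number reproducible by whoever reviews the finding.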
|
|
566
|
+
|
|
567
|
+
### Error handling
|
|
568
|
+
|
|
569
|
+
| Failure | Response |
|---|---|
| `ref=eNN` not found | Page re-rendered. Re-snapshot, re-identify by accessible name, retry with new ref. Don't guess CSS selectors. |
| Click hit wrong element | Two "Submit" buttons. Re-snapshot, include parent context in `element`: `"Submit button inside 'Shipping address' form"`, and pick the ref nested under that parent in the YAML. |
| `browser_wait_for(text=…)` timeout | Most often a JS error is the cause. Dump `browser_console_messages` first, then `browser_snapshot` to see what actually rendered. Retry with different text or `textGone`. |
| "No browser" error | `browser_install` once, then retry. Don't loop blindly. |
| Same step fails twice | **Stop.** Write failure + snapshot + console to findings, surface to orchestrator. Do NOT fabricate success. |
|
|
576
|
+
|
|
577
|
+
---
|
|
578
|
+
|
|
579
|
+
## 8. Anti-hallucination and verification
|
|
580
|
+
|
|
581
|
+
Community reports are consistent: the agent often loses its place around step 12 of a 20-step flow, starts hallucinating selectors, and writes findings for pages it never visited. The antidote is structural, not prompt-hopeful.
|
|
582
|
+
|
|
583
|
+
### Require a structured JSON findings file
|
|
584
|
+
|
|
585
|
+
The findings file is the **single source of truth** — nothing the agent *says* in chat counts unless it's in the file:
|
|
586
|
+
|
|
587
|
+
```json
{
  "id": "f-001",
  "page_url": "https://app.example.com/pricing",
  "viewport": { "name": "mobile", "w": 375, "h": 812 },
  "screenshot_path": "audit/screens/pricing_mobile.png",
  "snapshot_path": "audit/snapshots/pricing_mobile.yaml",
  "snapshot_quote": "- button \"Get started\" [ref=e87]",
  "dom_selector": "main > section.cta > button.primary",
  "computed_style_excerpt": {
    "color": "rgb(255, 255, 255)", "background-color": "rgb(147, 197, 253)",
    "font-size": "14px", "min-height": "36px", "min-width": "88px"
  },
  "wcag_criterion": "2.5.5 Target Size (AAA) / 1.4.3 Contrast",
  "severity": "high",
  "evidence_hex_fg": "#FFFFFF",
  "evidence_hex_bg": "#93C5FD",
  "evidence_contrast_ratio": 1.80,
  "finding": "Primary CTA is 88×36px (fails 44×44) and 1.80:1 contrast (fails 4.5:1)."
}
```
|
|
608
|
+
|
|
609
|
+
Every field is mandatory. Missing field → orchestrator rejects the run.
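
A minimal sketch of that rejection rule, assuming `findings.json` is a JSON array of objects shaped like the example entry. The field list is copied from that entry; the function names are hypothetical.

```python
import json

# Keys taken from the example finding; all are treated as mandatory.
REQUIRED_FIELDS = (
    "id", "page_url", "viewport", "screenshot_path", "snapshot_path",
    "snapshot_quote", "dom_selector", "computed_style_excerpt",
    "wcag_criterion", "severity", "evidence_hex_fg", "evidence_hex_bg",
    "evidence_contrast_ratio", "finding",
)


def missing_fields(finding: dict) -> list[str]:
    """Return the required keys that are absent or empty in one finding."""
    return [key for key in REQUIRED_FIELDS if finding.get(key) in (None, "", {}, [])]


def validate_findings(raw: str) -> dict[str, list[str]]:
    """Map each incomplete finding (by id, or by index if id is missing) to its gaps."""
    problems: dict[str, list[str]] = {}
    for index, finding in enumerate(json.loads(raw)):
        gaps = missing_fields(finding)
        if gaps:
            problems[finding.get("id") or f"#{index}"] = gaps
    return problems
```

An empty result means every entry is complete; anything else can be pasted straight into the retry prompt as the list of specific gaps.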
|
|
610
|
+
|
|
611
|
+
### Orchestrator-side verification: stat the paths
|
|
612
|
+
|
|
613
|
+
The parent agent (or a `SubagentStop` hook) must verify **before** accepting the report:
|
|
614
|
+
|
|
615
|
+
```bash
#!/usr/bin/env bash
# scripts/verify-audit.sh
set -euo pipefail

# The findings file must be a non-empty JSON array.
jq -e 'length > 0' audit/findings.json >/dev/null

# Every cited screenshot and snapshot must exist and be non-empty.
jq -r '.[] | .screenshot_path, .snapshot_path' audit/findings.json | while IFS= read -r p; do
  [ -s "$p" ] || { echo "MISSING OR EMPTY: $p"; exit 1; }
done

# Every snapshot_quote must literally appear in the snapshot file it cites.
# read -r keeps the backslash-escaped quotes in each JSON line intact, and
# "--" stops grep from parsing a quote that starts with "-" as an option.
jq -c '.[]' audit/findings.json | while IFS= read -r f; do
  q=$(printf '%s' "$f" | jq -r .snapshot_quote)
  s=$(printf '%s' "$f" | jq -r .snapshot_path)
  grep -qF -- "$q" "$s" || { echo "QUOTE NOT IN SNAPSHOT: $s"; exit 1; }
done
```
|
|
629
|
+
|
|
630
|
+
If verification fails, re-dispatch the subagent with a **SHOW-YOUR-WORK retry prompt** that lists the specific gaps.
|
|
631
|
+
|
|
632
|
+
### Verbatim accessibility-tree quoting
|
|
633
|
+
|
|
634
|
+
Playwright MCP snapshots have a fixed format: `- <role> "<name>" [ref=eNN]`. Rule in the subagent prompt:
|
|
635
|
+
|
|
636
|
+
> For every finding, `snapshot_quote` MUST be a single line copied character-for-character from the snapshot file you saved, containing a `[ref=eNN]` token. If you can't produce a verbatim line, the element wasn't in the a11y tree — say so and switch to a DOM selector with a screenshot crop as evidence.
|
|
637
|
+
|
|
638
|
+
A hallucinated ref fails `grep -qF` trivially — that's the point.
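
The same rule can be sketched in Python for orchestrators that don't shell out. This assumes the line format quoted above and treats a quote as verbatim when it occurs inside a single snapshot line; the helper name is hypothetical.

```python
import re

# Ref token shape taken from the snapshot format: [ref=eNN]
REF_TOKEN = re.compile(r"\[ref=e\d+\]")


def quote_is_verbatim(quote: str, snapshot_text: str) -> bool:
    """True if the quote is one line, carries a ref token, and occurs in the snapshot."""
    if not quote.strip() or "\n" in quote:
        return False
    if not REF_TOKEN.search(quote):
        return False
    # Substring match per line, mirroring what grep -F does.
    return any(quote in line for line in snapshot_text.splitlines())
```

Unlike the `grep` check, this also rejects quotes that lack a `[ref=eNN]` token, which catches findings that paraphrase the tree instead of copying from it.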
|
|
639
|
+
|
|
640
|
+
### Show-your-work discipline
|
|
641
|
+
|
|
642
|
+
Every finding must cite four things:
|
|
643
|
+
|
|
644
|
+
- **[SHOT]**: a screenshot file on disk
- **[QUOTE]**: a verbatim line from the snapshot YAML, including `[ref=eNN]`
- **[SEL]**: a CSS selector that resolves
- **[VAL]**: a computed value from `browser_evaluate`
|
|
648
|
+
|
|
649
|
+
Miss any one → don't file the finding, file the gap instead: *"Component observed in <screenshot> but absent from accessibility tree — likely shadow DOM or canvas."*
|
|
650
|
+
|
|
651
|
+
### Retry loop
|
|
652
|
+
|
|
653
|
+
When verification fails:
|
|
654
|
+
|
|
655
|
+
```
|
|
656
|
+
Your previous audit run failed verification. Specific problems:
|
|
657
|
+
- findings.json is missing screenshot_path for 3 entries
|
|
658
|
+
- snapshot_quote "button 'Buy now' [ref=e42]" not found in
|
|
659
|
+
audit/snapshots/pricing_mobile.yaml
|
|
660
|
+
|
|
661
|
+
Re-run with SHOW-YOUR-WORK discipline:
|
|
662
|
+
1. Every finding MUST cite SHOT+QUOTE+SEL+VAL (all four).
|
|
663
|
+
2. Do NOT invent findings for elements you did not snapshot.
|
|
664
|
+
3. If you can't see an element in the a11y tree, say so.
|
|
665
|
+
4. Save screenshots and snapshots to disk BEFORE writing findings.json.
|
|
666
|
+
```
|
|
667
|
+
|
|
668
|
+
### Required summary block
|
|
669
|
+
|
|
670
|
+
`audit/SUMMARY.md` with fixed headings (grep'd by the verifier for URL count, viewport count, finding row count):
|
|
671
|
+
|
|
672
|
+
```markdown
# Audit Summary
## Pages visited
- https://app.example.com/
- https://app.example.com/pricing
## Viewports tested
- mobile 375×812
- tablet 768×1024
- desktop 1440×900
## Findings
| id | url | vp | ref | selector | hex_fg | hex_bg | px | severity |
|----|-----|----|-----|----------|--------|--------|----|----------|
```
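
A minimal sketch of the counting step, assuming the three fixed headings shown in the template and one Markdown table row per finding. The function name `summary_counts` is hypothetical; a verifier could compare its output against the expected pages × viewports matrix and the length of `findings.json`.

```python
def summary_counts(markdown: str) -> dict[str, int]:
    """Count pages, viewports, and finding rows in an audit SUMMARY.md."""
    sections = {
        "## Pages visited": "pages",
        "## Viewports tested": "viewports",
        "## Findings": "findings",
    }
    counts = {"pages": 0, "viewports": 0, "findings": 0}
    current = None
    table_rows_seen = 0
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            # Any heading switches section; unknown headings reset to None.
            current = sections.get(line)
            table_rows_seen = 0
        elif current in ("pages", "viewports") and line.startswith("- "):
            counts[current] += 1
        elif current == "findings" and line.startswith("|"):
            table_rows_seen += 1
            if table_rows_seen > 2:  # skip the header row and the separator row
                counts["findings"] += 1
    return counts
```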
|
|
685
|
+
|
|
686
|
+
---
|
|
687
|
+
|
|
688
|
+
## 9. Integration with Claude Code subagents (.claude/agents/)
|
|
689
|
+
|
|
690
|
+
### Frontmatter fields (2026)
|
|
691
|
+
|
|
692
|
+
Required: `name`, `description`. Optional:
|
|
693
|
+
|
|
694
|
+
| Field | Purpose |
|---|---|
| `tools` | Comma-separated allowlist. Omit → inherits ALL parent tools (including every MCP). |
| `disallowedTools` | Explicit deny list, overrides inherit |
| `model` | `sonnet` / `opus` / `haiku` / full ID / `inherit` |
| `permissionMode` | `default` / `acceptEdits` / `dontAsk` / `bypassPermissions` / `plan` |
| `mcpServers` | List of MCP server names to scope to this subagent (additive, see caveat below) |
| `hooks` | Pre/PostToolUse/Stop hooks scoped to this subagent |
| `skills` | Skills preloaded at startup |
| `memory` | `user` / `project` / `local` |
| `maxTurns` | Hard cap on agentic turns |
| `background` | `true` → concurrent background task |
| `isolation` | `worktree` → run in temp git worktree |
| `color`, `effort` | UI color; effort tier for Opus |
|
|
708
|
+
|
|
709
|
+
Plugin-installed subagents **cannot** use `hooks`, `mcpServers`, or `permissionMode` (security restriction).
|
|
710
|
+
|
|
711
|
+
### MCP tool naming
|
|
712
|
+
|
|
713
|
+
Confirmed pattern: **`mcp__<server-name>__<tool-name>`** (double underscores). A server registered with `claude mcp add playwright ...` exposes `mcp__playwright__browser_navigate`, `mcp__playwright__browser_snapshot`, etc. Wildcards work in allowlists: `mcp__playwright__*` grants all tools from that server. Plugin-scoped servers add a `plugin_<plugin-name>_` prefix.
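
The naming rule is easy to get wrong by one underscore, so a small sketch may help. The helpers below are hypothetical, and `fnmatchcase` only approximates how the allowlist wildcard behaves; server names containing a double underscore would not round-trip through the splitter.

```python
from fnmatch import fnmatchcase


def mcp_tool_name(server: str, tool: str) -> str:
    """Build the fully qualified tool name: mcp__<server-name>__<tool-name>."""
    return f"mcp__{server}__{tool}"


def split_mcp_tool_name(name: str) -> tuple[str, str]:
    """Split a qualified name back into (server, tool)."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "mcp":
        raise ValueError(f"not an MCP tool name: {name!r}")
    return parts[1], parts[2]


def is_tool_allowed(name: str, allowlist: list[str]) -> bool:
    """Check a tool name against allowlist entries, honoring * wildcards."""
    return any(fnmatchcase(name, pattern) for pattern in allowlist)
```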
|
|
714
|
+
|
|
715
|
+
### Per-subagent MCP scoping: real but imperfect
|
|
716
|
+
|
|
717
|
+
The `mcpServers:` frontmatter field exists but is **additive, not isolating**. From issue #24054:
|
|
718
|
+
|
|
719
|
+
> "No isolation: there's no way to make an MCP server available only to a subagent or skill. The `mcpServers` frontmatter field in agent definitions is additive — it selects from globally-configured servers but doesn't hide them from the parent."
|
|
720
|
+
|
|
721
|
+
And issue #25200 (open): `mcpServers` + MCP tools in `tools:` fails at runtime under deferred tool loading — workaround is to include the Playwright tools in the parent session's allowlist or set `ENABLE_TOOL_SEARCH=off`. Issue #6915 specifically cites Playwright as the motivating example: *"the main chat should be able to give instructions to a subagent to use the tools without polluting the context of the main chat with all of the Playwright MCP tool calls."* That feature is pending.
|
|
722
|
+
|
|
723
|
+
**Today's practical pattern**: include Playwright in the subagent's tool allowlist; in the parent's CLAUDE.md, instruct *"do not call `mcp__playwright__*` directly; delegate to the `ux-auditor` subagent."* The tool schemas still load into the parent's context — there's no way around that without the pending feature.
|
|
724
|
+
|
|
725
|
+
### Context-cost concern
|
|
726
|
+
|
|
727
|
+
Playwright MCP registers **~25 tools**. Simon Willison observed on Mastodon: *"the Playwright one is pretty big, it has 25 tools defined which may be too many for the local LLMs to handle."* Community benchmarks estimate **~500 tokens per tool schema**, i.e. **~12–15K tokens just to load Playwright's schemas** before any action. Combined with auto-snapshots (2–10K tokens each), a 30-action flow routinely burns 100K+ tokens. Playwright's own comparison: 114K MCP vs 27K CLI for equivalent work.
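
The arithmetic behind those figures, as a rough sketch. The defaults are the estimates quoted above (25 tools, about 500 tokens per schema, 2–10K tokens per auto-snapshot, a 30-action flow); they are not measurements, and the function name is hypothetical.

```python
def estimate_tokens(tools: int = 25,
                    tokens_per_schema: int = 500,
                    actions: int = 30,
                    tokens_per_snapshot: tuple[int, int] = (2_000, 10_000)) -> tuple[int, int, int]:
    """Return (schema_cost, low_total, high_total) for one audit flow."""
    schema_cost = tools * tokens_per_schema
    low_total = schema_cost + actions * tokens_per_snapshot[0]
    high_total = schema_cost + actions * tokens_per_snapshot[1]
    return schema_cost, low_total, high_total
```

With the defaults this gives 12,500 tokens of schema overhead and between 72,500 and 312,500 tokens for the whole flow, which is why the snapshot size, not the schema load, dominates the budget.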
|
|
728
|
+
|
|
729
|
+
Mitigations, in priority order:
|
|
730
|
+
1. **Tool Search** (default in Claude Code late-2025+) loads only tool names at startup; schemas load on demand. Set `ENABLE_TOOL_SEARCH=auto`.
|
|
731
|
+
2. Scope Playwright to the audit subagent — even though schemas still load in the parent, the orchestrator won't *invoke* them, so snapshot pollution is avoided.
|
|
732
|
+
3. Run the server with `--output-dir` + `--image-responses omit` so images go to disk, not inline into context. Biggest single lever.
|
|
733
|
+
4. For long flows (>20 steps), Microsoft themselves recommend the CLI + Skill path: *"Modern coding agents increasingly favor CLI-based workflows exposed as Skills over MCP because CLI invocations are more token-efficient."*
|
|
734
|
+
|
|
735
|
+
---
|
|
736
|
+
|
|
737
|
+
## 10. Common pitfalls
|
|
738
|
+
|
|
739
|
+
**Snapshot churn on dynamic pages.** Refs are indexed per-snapshot, not stable IDs — the same button may be `ref=e87` in one snapshot and `ref=e91` in the next. Official guidance: *"Refs are stable within a single snapshot… after navigation or DOM updates, the tool returns a fresh snapshot with new refs. Most tools also return a snapshot automatically after each action."* Enforce: **treat every `[ref=eNN]` as valid for ONE action only.** Never reuse a ref across actions without a fresh snapshot.
|
|
740
|
+
|
|
741
|
+
**Accessibility tree vs DOM discrepancies.** Shadow DOM (Lit/Shoelace/Stencil, especially closed shadow roots) can be invisible to the a11y snapshot — Playwright itself can't pierce closed shadow roots at all. Cross-origin iframes are generally opaque. Canvas and custom-rendered charts don't appear in the tree. Detection rule: *"If you see a visual element in a screenshot but no corresponding entry in the YAML snapshot, report it as a high-severity finding: component not exposed to the accessibility tree — screen-reader users can't perceive it."*
|
|
742
|
+
|
|
743
|
+
**Modal and cookie-banner handling.** HTML modals and cookie banners are real DOM — dismiss via `browser_click` on the ref, **before** snapshotting main content (otherwise "real" content renders but is inert/covered). Native JS dialogs (`alert`, `confirm`, `prompt`) are NOT DOM and require `browser_handle_dialog` separately. A page-load `confirm()` will hang the entire session until handled. Prompt pattern: right after every `browser_navigate`, scan the snapshot for accessible names containing "cookie"/"consent"/"accept"/"GDPR"/"subscribe" and dismiss first.
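
The scan can be sketched as a filter over saved snapshot text. This assumes each node line follows the `- <role> "<name>" [ref=eNN]` shape and that only `button` nodes are worth clicking; both the keyword list's lowercase matching and the helper name are illustrative choices.

```python
import re

BANNER_KEYWORDS = ("cookie", "consent", "accept", "gdpr", "subscribe")
# One snapshot node: - <role> "<name>" ... [ref=eNN]
NODE = re.compile(r'-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)".*?\[ref=(?P<ref>e\d+)\]')


def find_banner_refs(snapshot_text: str) -> list[tuple[str, str]]:
    """Return (ref, accessible name) for buttons that look like banner dismissals."""
    hits = []
    for line in snapshot_text.splitlines():
        match = NODE.search(line)
        if not match or match["role"] != "button":
            continue
        name = match["name"].lower()
        if any(keyword in name for keyword in BANNER_KEYWORDS):
            hits.append((match["ref"], match["name"]))
    return hits
```

Because refs go stale after each action, the returned ref is only good for the very next `browser_click`; re-snapshot before looking for a second banner.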
|
|
744
|
+
|
|
745
|
+
**JS errors cascade.** One uncaught error during init silently breaks every downstream interaction — clicks "succeed" but state never updates. Call `browser_console_messages` immediately after every navigation; if errors exist, write them to findings and stop auditing that page rather than generating more findings on broken state.
|
|
746
|
+
|
|
747
|
+
**Duplicate accessible names.** Two buttons named "Submit" means both `element` disambiguation and `ref` matter. Resolution: include parent context in `element` (`"Submit button in 'Shipping address' form"`), pick the ref nested under the right parent in the YAML, or use `data-testid` via `--test-id-attribute`.
|
|
748
|
+
|
|
749
|
+
**Animation timing.** Screenshots mid-animation look broken (50%-opacity modals, half-slid sidebars). Don't use `networkidle`. Instead wait for text inside the animated component, or **disable animations** via `browser_evaluate` at the start of each page:
|
|
750
|
+
|
|
751
|
+
```js
|
|
752
|
+
() => {
|
|
753
|
+
const s = document.createElement('style');
|
|
754
|
+
s.textContent = '*,*::before,*::after{animation:none!important;transition:none!important;}';
|
|
755
|
+
document.head.appendChild(s);
|
|
756
|
+
}
|
|
757
|
+
```
|
|
758
|
+
|
|
759
|
+
**Parallel subagents fighting over one tab** (issue #893). Multiple Claude subagents launched in parallel share the MCP server's browser tab and produce inconsistent results. Run viewport loops **sequentially inside one subagent**, or start the server with `--isolated` and have each agent call `browser_tab_new` first.
|
|
760
|
+
|
|
761
|
+
**`browser_evaluate` cannot use snapshot refs** (issue #870). The `evaluate` tool doesn't understand `ref=eNN` — you must pass a CSS/XPath selector inside the function body.
|
|
762
|
+
|
|
763
|
+
**Pages too big for context** (issue #1329). Some pages produce snapshots that exceed the context budget. Set `PLAYWRIGHT_MCP_SNAPSHOT_MODE=none` to suppress auto-snapshots and call `browser_snapshot` manually with the `filename` param so the tree goes to disk.
|
|
764
|
+
|
|
765
|
+
**Deprecated package warning.** `@modelcontextprotocol/server-playwright` is abandoned — use `@playwright/mcp`. `@executeautomation/playwright-mcp-server` is a different project with a different API; its `browser_resize` accepts `device`/`orientation` params that Microsoft's does not. Don't mix the docs.
|
|
766
|
+
|
|
767
|
+
---
|
|
768
|
+
|
|
769
|
+
## 11. A complete, ready-to-paste subagent
|
|
770
|
+
|
|
771
|
+
Save as `.claude/agents/sd-playwright.md` at your project root. This enforces the evidence protocol from §8, covers mobile/tablet/desktop, and uses Doherty threshold, WCAG 2.2 target size, and touch-target criteria.
|
|
772
|
+
|
|
773
|
+
````markdown
|
|
774
|
+
---
|
|
775
|
+
name: sd-playwright
|
|
776
|
+
description: >
|
|
777
|
+
Performs responsive + WCAG 2.2 UX audits on a running web app using
|
|
778
|
+
Playwright MCP. Use PROACTIVELY after any frontend change, or when the
|
|
779
|
+
user says "audit", "UX review", "design review", "accessibility check",
|
|
780
|
+
or "responsive test". Produces verified evidence: screenshots, snapshot
|
|
781
|
+
YAML, computed styles, and a JSON findings file.
|
|
782
|
+
model: sonnet
|
|
783
|
+
permissionMode: acceptEdits
|
|
784
|
+
maxTurns: 80
|
|
785
|
+
color: cyan
|
|
786
|
+
mcpServers:
|
|
787
|
+
- playwright
|
|
788
|
+
tools:
|
|
789
|
+
- Read
|
|
790
|
+
- Write
|
|
791
|
+
- Edit
|
|
792
|
+
- Glob
|
|
793
|
+
- Grep
|
|
794
|
+
- Bash
|
|
795
|
+
- mcp__playwright__browser_navigate
|
|
796
|
+
- mcp__playwright__browser_navigate_back
|
|
797
|
+
- mcp__playwright__browser_resize
|
|
798
|
+
- mcp__playwright__browser_snapshot
|
|
799
|
+
- mcp__playwright__browser_take_screenshot
|
|
800
|
+
- mcp__playwright__browser_evaluate
|
|
801
|
+
- mcp__playwright__browser_click
|
|
802
|
+
- mcp__playwright__browser_type
|
|
803
|
+
- mcp__playwright__browser_hover
|
|
804
|
+
- mcp__playwright__browser_press_key
|
|
805
|
+
- mcp__playwright__browser_select_option
|
|
806
|
+
- mcp__playwright__browser_wait_for
|
|
807
|
+
- mcp__playwright__browser_console_messages
|
|
808
|
+
- mcp__playwright__browser_network_requests
|
|
809
|
+
- mcp__playwright__browser_handle_dialog
|
|
810
|
+
- mcp__playwright__browser_tabs
|
|
811
|
+
- mcp__playwright__browser_install
|
|
812
|
+
- mcp__playwright__browser_close
|
|
813
|
+
---
|
|
814
|
+
|
|
815
|
+
# Role
|
|
816
|
+
|
|
817
|
+
You are a UX + accessibility auditor. You drive a real browser through the
|
|
818
|
+
Playwright MCP server and produce **verifiable evidence** of every finding.
|
|
819
|
+
You do NOT run Playwright via Bash, do NOT write a Playwright test file,
|
|
820
|
+
do NOT use curl. You use the `mcp__playwright__*` tools exclusively.
|
|
821
|
+
|
|
822
|
+
# Non-negotiable rules
|
|
823
|
+
|
|
824
|
+
1. **Say "Playwright MCP" literally** in any sub-invocation you make. Use
|
|
825
|
+
only `mcp__playwright__*` tools for browser work.
|
|
826
|
+
2. **Every finding cites four things: [SHOT], [QUOTE], [SEL], [VAL].**
|
|
827
|
+
Missing any one → you do not file the finding. You file the gap.
|
|
828
|
+
3. **Snapshots are per-call.** Every `[ref=eNN]` is valid for ONE action.
|
|
829
|
+
Re-snapshot after any click, type, select, navigate, or `browser_wait_for`.
|
|
830
|
+
4. **Save artifacts to disk BEFORE writing findings.json.** No exceptions.
|
|
831
|
+
5. **On JS console errors, stop auditing that page.** Record the errors
|
|
832
|
+
verbatim and move to the next page.
|
|
833
|
+
6. **Dismiss cookie banners / consent modals FIRST** on every page, before
|
|
834
|
+
you capture the canonical snapshot.
|
|
835
|
+
7. **Text waits, never time waits.** Use `browser_wait_for(text=…)` or
|
|
836
|
+
`textGone=…`. Use `time=` only as a documented last resort.
|
|
837
|
+
8. **Sequential, not parallel.** Do not spawn parallel flows against the
|
|
838
|
+
same Playwright MCP — tabs will collide.
|
|
839
|
+
|
|
840
|
+
# Evaluation criteria
|
|
841
|
+
|
|
842
|
+
## WCAG 2.2 focus points
|
|
843
|
+
- **1.4.3 Contrast (Minimum)** — text ≥ 4.5:1, large text ≥ 3:1
|
|
844
|
+
- **2.4.7 Focus Visible** — every interactive element has a visible focus ring
|
|
845
|
+
- **2.5.5 Target Size (AAA)** — interactive targets ≥ 44×44 CSS px
|
|
846
|
+
- **2.5.8 Target Size (Minimum, AA, new in 2.2)** — ≥ 24×24 CSS px
|
|
847
|
+
- **3.3.8 Accessible Authentication** — no memory puzzles
|
|
848
|
+
- **1.3.1 Info and Relationships** — headings in order; labels tied to inputs
|
|
849
|
+
|
|
850
|
+
## Performance & interaction
|
|
851
|
+
- **Doherty threshold (400 ms)** — any perceived response time over 400 ms
|
|
852
|
+
from click to visible feedback is a finding. Measure with
|
|
853
|
+
`performance.now()` via `browser_evaluate`.
|
|
854
|
+
- **Tap target spacing** — 8 px minimum between adjacent targets on mobile.
|
|
855
|
+
- **Scroll chaining / viewport overflow** — horizontal scroll at 375 px
|
|
856
|
+
is a blocker.
|
|
857
|
+
|
|
858
|
+
## Visual polish
|
|
859
|
+
- Alignment within 2 px grid
|
|
860
|
+
- Consistent spacing scale (4/8/12/16/24/32/48/64)
|
|
861
|
+
- Typographic hierarchy (h1 > h2 > h3 size ratios obvious)
|
|
862
|
+
- Empty / loading / error states exist for every async view
|
|
863
|
+
|
|
864
|
+
# Standard flow
|
|
865
|
+
|
|
866
|
+
For each viewport ∈ [mobile 375×812, tablet 768×1024, desktop 1440×900]:
|
|
867
|
+
For each page in the audit set:
|
|
868
|
+
|
|
869
|
+
1. `browser_resize(width, height)` for the viewport.
|
|
870
|
+
2. `browser_navigate(url)`.
|
|
871
|
+
3. `browser_wait_for(text="<known copy on loaded page>")`.
|
|
872
|
+
4. **Disable animations once** via `browser_evaluate` (style override).
|
|
873
|
+
5. **Dismiss cookie banners** by snapshotting and clicking any node with
|
|
874
|
+
role=button whose accessible name contains cookie/consent/accept/GDPR.
|
|
875
|
+
6. `browser_console_messages(level="error")`. If non-empty, record and
|
|
876
|
+
SKIP the rest of this page.
|
|
877
|
+
7. `browser_snapshot({ filename: "audit/snapshots/{page}_{vp}.yaml" })`.
|
|
878
|
+
8. `browser_take_screenshot({ fullPage: true,
|
|
879
|
+
filename: "audit/screens/{page}_{vp}_full.png" })`.
|
|
880
|
+
9. `browser_evaluate(...)` for computed styles of the key elements
|
|
881
|
+
(headings, primary CTA, nav items, form fields). Save to
|
|
882
|
+
`audit/styles/{page}_{vp}.json`.
|
|
883
|
+
10. `browser_network_requests({ includeStatic: false })`. Record any
|
|
884
|
+
failed requests (status ≥ 400) to `audit/network/{page}_{vp}.json`.
|
|
885
|
+
11. For each issue found, append an entry to `audit/findings.json` with
|
|
886
|
+
all four evidence fields.
|
|
887
|
+
|
|
888
|
+
# Findings schema (strict)
|
|
889
|
+
|
|
890
|
+
```json
|
|
891
|
+
{
|
|
892
|
+
"id": "f-001",
|
|
893
|
+
"page_url": "https://…",
|
|
894
|
+
"viewport": { "name": "mobile", "w": 375, "h": 812 },
|
|
895
|
+
"screenshot_path": "audit/screens/…png", // SHOT
|
|
896
|
+
"snapshot_path": "audit/snapshots/…yaml",
|
|
897
|
+
"snapshot_quote": "- button \"…\" [ref=e87]", // QUOTE (verbatim)
|
|
898
|
+
"dom_selector": "main>…", // SEL
|
|
899
|
+
"computed_style_excerpt": { "…": "…" }, // VAL
|
|
900
|
+
"wcag_criterion": "2.5.5 Target Size (AAA)",
|
|
901
|
+
"severity": "blocker|high|medium|nitpick",
|
|
902
|
+
"evidence_hex_fg": "#FFFFFF",
|
|
903
|
+
"evidence_hex_bg": "#93C5FD",
"evidence_contrast_ratio": 1.80,
|
|
905
|
+
"evidence_px": { "w": 88, "h": 36 },
|
|
906
|
+
"finding": "<one-sentence impact statement>"
|
|
907
|
+
}
|
|
908
|
+
```
|
|
909
|
+
|
|
910
|
+
# File outputs (produced BEFORE you return)
|
|
911
|
+
|
|
912
|
+
```
|
|
913
|
+
audit/
|
|
914
|
+
findings.json # JSON array, append-only
|
|
915
|
+
SUMMARY.md # Pages visited, viewports tested, findings table
|
|
916
|
+
screens/ # PNG screenshots, one per page × viewport
|
|
917
|
+
snapshots/ # Accessibility-tree YAML, one per page × viewport
|
|
918
|
+
styles/ # Computed-style JSON per page × viewport
|
|
919
|
+
network/ # Failed-request JSON per page × viewport
|
|
920
|
+
```
|
|
921
|
+
|
|
922
|
+
# Error handling
|
|
923
|
+
|
|
924
|
+
| Failure | Action |
|
|
925
|
+
|---|---|
|
|
926
|
+
| `ref=eNN` not found | Re-snapshot, re-identify by accessible name, retry. Don't guess selectors. |
|
|
927
|
+
| Two elements with same name | Include parent context in `element` parameter; pick ref nested under correct parent. |
|
|
928
|
+
| `browser_wait_for(text)` timeout | Dump `browser_console_messages`, then `browser_snapshot`, then retry with different text. |
|
|
929
|
+
| "No browser" | `browser_install` once, retry once. If still fails, stop and report. |
|
|
930
|
+
| Same step fails twice | **Stop.** Write failure + snapshot + console into findings, hand back to orchestrator. Do NOT fabricate success. |
|
|
931
|
+
|
|
932
|
+
# Final checks before returning
|
|
933
|
+
|
|
934
|
+
Run these Bash checks and do not return until they pass:
|
|
935
|
+
|
|
936
|
+
```bash
|
|
937
|
+
# 1. Every screenshot_path and snapshot_path in findings.json exists on disk.
|
|
938
|
+
jq -r '.[] | .screenshot_path, .snapshot_path' audit/findings.json | \
|
|
939
|
+
while IFS= read -r p; do [ -s "$p" ] || { echo "MISSING: $p"; exit 1; }; done || exit 1
|
|
940
|
+
|
|
941
|
+
# 2. Every snapshot_quote appears verbatim in its cited snapshot file.
|
|
942
|
+
jq -c '.[]' audit/findings.json | while IFS= read -r f; do  # -r keeps the \" escapes inside each JSON line intact
|
|
943
|
+
q=$(echo "$f" | jq -r .snapshot_quote)
|
|
944
|
+
s=$(echo "$f" | jq -r .snapshot_path)
|
|
945
|
+
grep -qF -- "$q" "$s" || { echo "QUOTE NOT IN SNAPSHOT: $s"; exit 1; }  # "--" because quotes start with "- "
|
|
946
|
+
done
|
|
947
|
+
|
|
948
|
+
# 3. Every viewport × page pair produced a screenshot and snapshot.
|
|
949
|
+
```
|
|
950
|
+
|
|
951
|
+
If any check fails, fix the gap and re-verify. Do not return with gaps.
|
|
952
|
+
````
|
|
953
|
+
|
|
954
|
+
Pair this with an `.mcp.json` at project root:
|
|
955
|
+
|
|
956
|
+
```json
|
|
957
|
+
{
|
|
958
|
+
"mcpServers": {
|
|
959
|
+
"playwright": {
|
|
960
|
+
"command": "npx",
|
|
961
|
+
"args": ["@playwright/mcp@0.0.70",
|
|
962
|
+
"--isolated",
|
|
963
|
+
"--output-dir", "./audit/screens",
|
|
964
|
+
"--image-responses", "omit",
|
|
965
|
+
"--caps", "vision"]
|
|
966
|
+
}
|
|
967
|
+
}
|
|
968
|
+
}
|
|
969
|
+
```
|
|
970
|
+
|
|
971
|
+
And a `scripts/verify-audit.sh` containing the Bash block from §8.
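
For orchestrators that would rather not depend on `jq`, the same two checks can be sketched in Python. This is a minimal sketch, not the shipped script: the function name `verify_findings` and the `root` parameter are hypothetical, and it assumes each finding carries the `screenshot_path`, `snapshot_path`, and `snapshot_quote` fields from the schema above.

```python
from pathlib import Path


def verify_findings(findings: list[dict], root: Path) -> list[str]:
    """Return a list of problems; an empty list means the evidence checks out.

    Hypothetical helper: mirrors checks 1 and 2 of the Bash block.
    """
    problems = []
    for i, finding in enumerate(findings):
        # Check 1: every cited artifact exists and is non-empty (like `[ -s ]`).
        for key in ("screenshot_path", "snapshot_path"):
            path = root / (finding.get(key) or "")
            if not (path.is_file() and path.stat().st_size > 0):
                problems.append(f"finding {i}: MISSING {key}: {finding.get(key)}")

        # Check 2: the quote appears verbatim in its snapshot (like `grep -F`).
        snapshot = root / (finding.get("snapshot_path") or "")
        quote = finding.get("snapshot_quote") or ""
        if snapshot.is_file():
            text = snapshot.read_text(encoding="utf-8")
            # An empty quote would match anything, so treat it as a failure.
            if not quote or quote not in text:
                problems.append(
                    f"finding {i}: QUOTE NOT IN SNAPSHOT: {finding['snapshot_path']}"
                )
    return problems
```

An orchestrator could load `audit/findings.json` with `json.loads`, call `verify_findings(findings, Path("."))`, and refuse the report whenever the returned list is non-empty.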
|
|
972
|
+
|
|
973
|
+
---
|
|
974
|
+
|
|
975
|
+
## 12. Real working examples from the community
|
|
976
|
+
|
|
977
|
+
### Example 1 — OneRedOak/claude-code-workflows (the canonical ancestor)
|
|
978
|
+
|
|
979
|
+
Source: `github.com/OneRedOak/claude-code-workflows/tree/main/design-review`. Patrick Ellis's "elite design review specialist" prompt is the ancestor of essentially every design-review subagent on GitHub. Frontmatter:
|
|
980
|
+
|
|
981
|
+
```yaml
|
|
982
|
+
---
|
|
983
|
+
name: design-review
|
|
984
|
+
description: |
|
|
985
|
+
Use this agent when you need to conduct a comprehensive design review on
|
|
986
|
+
front-end pull requests or general UI changes. Requires a live preview
|
|
987
|
+
environment and uses Playwright for automated interaction testing.
|
|
988
|
+
tools: Grep, LS, Read, Edit, Write, WebFetch, WebSearch, TodoWrite, Bash,
|
|
989
|
+
mcp__playwright__browser_navigate, mcp__playwright__browser_click,
|
|
990
|
+
mcp__playwright__browser_type, mcp__playwright__browser_resize,
|
|
991
|
+
mcp__playwright__browser_take_screenshot, mcp__playwright__browser_snapshot,
|
|
992
|
+
mcp__playwright__browser_console_messages, mcp__playwright__browser_hover,
|
|
993
|
+
mcp__playwright__browser_select_option, mcp__playwright__browser_evaluate
|
|
994
|
+
model: sonnet
|
|
995
|
+
---
|
|
996
|
+
```
|
|
997
|
+
|
|
998
|
+
Seven-phase methodology: preparation (read PR diff, set 1440×900) → interaction/flow (hover, active, disabled, destructive-confirm) → responsiveness (1440/768/375) → visual polish → WCAG 2.1 AA → robustness (overflow, empty, error) → code health + console. Output is a triaged `### Findings` markdown with `#### Blockers`, `#### High-Priority`, `#### Medium-Priority`, `#### Nitpicks`.
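
To make the triaged output shape concrete, here is a small sketch that groups findings by severity and renders them under those four headings. It is an illustration under assumptions: the OneRedOak prompt has the model write this markdown directly, whereas this sketch assumes findings arrive as dicts with the `severity` and `finding` keys from this document's JSON schema, and `render_findings` is a hypothetical name.

```python
from collections import defaultdict

# Severity keys (assumed, from the findings schema) mapped to the report headings.
HEADINGS = {
    "blocker": "#### Blockers",
    "high": "#### High-Priority",
    "medium": "#### Medium-Priority",
    "nitpick": "#### Nitpicks",
}


def render_findings(findings: list[dict]) -> str:
    """Group findings by severity and emit the triaged markdown report."""
    grouped = defaultdict(list)
    for finding in findings:
        severity = finding["severity"]
        if severity not in HEADINGS:
            raise ValueError(f"unknown severity: {severity!r}")
        grouped[severity].append(finding["finding"])

    lines = ["### Findings"]
    # Dict insertion order doubles as triage order: blockers first, nitpicks last.
    for severity, heading in HEADINGS.items():
        if grouped[severity]:
            lines.append("")
            lines.append(heading)
            lines.extend(f"- {text}" for text in grouped[severity])
    return "\n".join(lines)
```

Empty severity buckets are skipped, so a clean review renders as a bare `### Findings` heading.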
|
|
999
|
+
|
|
1000
|
+
**What it does well:** "Live Environment First" grounds every finding in real rendered behavior. The Blocker/High/Medium/Nitpick triage matrix and the "problems over prescriptions" communication style make the output reviewer-friendly. It is always paired with a `/design-review` slash command and a `CLAUDE.md` design-principles block (Stripe/Airbnb/Linear).
|
|
1001
|
+
|
|
1002
|
+
**Weakness:** The tool list is very broad: it grants mutating browser tools (click, type, file_upload) plus Edit, Write, and Bash, so the subagent can act on the live app and the working tree, not just audit them. That is safe against an ephemeral preview environment and risky against anything else.
|
|
1003
|
+
|
|
1004
|
+
### Example 2 — EricTechPro/match-me
|
|
1005
|
+
|
|
1006
|
+
Source: `github.com/EricTechPro/match-me/blob/main/.claude/agents/design-review-agent.md`. A real project-scoped fork of OneRedOak, with **Context7 MCP added** so the agent can pull framework docs while reviewing. Adds `mcp__context7__resolve-library-id`, `mcp__context7__get-library-docs`, plus the broader Playwright surface (`browser_tab_list`, `browser_tab_new`, `browser_file_upload`, `browser_handle_dialog`, `browser_network_requests`, `browser_press_key`, `browser_navigate_back/forward`, `browser_drag`, `browser_install`).
|
|
1007
|
+
|
|
1008
|
+
**Does well:** Context7 integration is a meaningful enrichment — the agent verifies design conventions against *current* framework docs (Next.js, Tailwind, shadcn) rather than its training cutoff. Broader Playwright surface lets it audit multi-tab flows and dialog-triggered paths.
|
|
1009
|
+
|
|
1010
|
+
**Weakness:** Essentially a verbatim fork — no domain specialization for the dating-app context. The expanded tool list consumes more context tokens without adding review sophistication.
|
|
1011
|
+
|
|
1012
|
+
### Example 3 — claude-code-community-ireland / vibeworks-library
|
|
1013
|
+
|
|
1014
|
+
Source: `github.com/claude-code-community-ireland/claude-code-resources/blob/main/plugins/vibeworks-library/agents/design-review.md`. Installable as a Claude Code plugin via the community plugin hub — distributes the OneRedOak prompt with clean tool discipline:
|
|
1015
|
+
|
|
1016
|
+
```yaml
|
|
1017
|
+
---
|
|
1018
|
+
name: design-review
|
|
1019
|
+
description: Use this agent when you need to conduct a comprehensive design
|
|
1020
|
+
review on front-end pull requests or general UI changes...
|
|
1021
|
+
model: sonnet
|
|
1022
|
+
tools: Grep, LS, Read, Edit, MultiEdit, Write, NotebookEdit, WebFetch,
|
|
1023
|
+
TodoWrite, WebSearch
|
|
1024
|
+
---
|
|
1025
|
+
```
|
|
1026
|
+
|
|
1027
|
+
Playwright MCP tools are **not** listed in `tools:`; they're referenced contextually in the prompt body as "Technical Requirements" (`mcp__playwright__browser_navigate/click/type/select_option/take_screenshot/resize/snapshot/console_messages`). The declared surface is narrower, and the plugin packaging means teams install once and get updates without forking markdown.
|
|
1028
|
+
|
|
1029
|
+
**Does well:** Cleanest tool discipline of the three: the frontmatter whitelists only generic file, search, and web tools and declares no browser tools, with MCP access narrowed by the plugin runtime. Distributable via the plugin hub for consistent team rollout.
|
|
1030
|
+
|
|
1031
|
+
**Weakness:** Low adoption (2 stars / 0 forks on parent repo), and prompt content is unchanged from OneRedOak — no plugin-specific value beyond the packaging.
|
|
1032
|
+
|
|
1033
|
+
### Cross-cutting observations
|
|
1034
|
+
|
|
1035
|
+
Community examples are **far less diverse than they appear** — virtually every `design-review` agent on GitHub traces back to OneRedOak. The fingerprints: "elite design review specialist" phrasing, Stripe/Airbnb/Linear framing, "Live Environment First" philosophy, seven-phase methodology, Blocker/High/Medium/Nitpick triage. Variations are limited to tool-list tweaks and packaging.
|
|
1036
|
+
|
|
1037
|
+
Common patterns worth adopting:
|
|
1038
|
+
- **Three-viewport sweep** (1440 / 768 / 375) is universal and correct.
|
|
1039
|
+
- **Non-mutating-first tool set**: `browser_navigate`, `browser_snapshot`, `browser_take_screenshot`, `browser_resize`, `browser_console_messages` for observation; mutating tools (`click`, `type`, `select_option`) only for interaction testing.
|
|
1040
|
+
- **Paired-artifact deployment**: subagent + `/design-review` slash command + CLAUDE.md design-principles — the subagent rarely stands alone.
|
|
1041
|
+
- **`model: sonnet`** is standard (balance of vision reasoning and cost).
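
The first two patterns can be sketched as a small allowlist builder. This is a sketch under assumptions: the tool names are the ones listed above with the `mcp__playwright__` prefix used in the frontmatter examples, while `tool_allowlist`, `frontmatter_tools_line`, and `sweep` are hypothetical helper names, and only viewport widths are modeled because the list gives no heights.

```python
from itertools import product

PREFIX = "mcp__playwright__"

# Observation tools: safe to grant to every audit subagent.
OBSERVE = [
    "browser_navigate",
    "browser_snapshot",
    "browser_take_screenshot",
    "browser_resize",
    "browser_console_messages",
]

# Mutating tools: granted only when interaction testing is requested.
INTERACT = ["browser_click", "browser_type", "browser_select_option"]

# Three-viewport sweep (widths in CSS pixels).
VIEWPORT_WIDTHS = [1440, 768, 375]


def tool_allowlist(allow_interaction: bool = False) -> list[str]:
    """Return fully qualified tool names, observation tools first."""
    names = OBSERVE + (INTERACT if allow_interaction else [])
    return [PREFIX + name for name in names]


def frontmatter_tools_line(allow_interaction: bool = False) -> str:
    """Render the `tools:` line for a subagent's YAML frontmatter."""
    return "tools: " + ", ".join(tool_allowlist(allow_interaction))


def sweep(pages: list[str]):
    """Yield every (page, width) pair the audit should visit."""
    return product(pages, VIEWPORT_WIDTHS)
```

Defaulting `allow_interaction` to `False` means a subagent has to opt in to click, type, and select, which keeps the read-only audit as the baseline.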
|
|
1042
|
+
|
|
1043
|
+
What's **missing** from all three — and what your shipping skill can improve on — is the **evidence protocol**. None of them require the four-piece SHOW-YOUR-WORK citation (screenshot + snapshot quote + selector + computed value), none write a machine-verifiable JSON findings file, and none have an orchestrator-side `verify-audit.sh` that stats the paths and greps the quotes. They trust the LLM's word. Your skill doesn't need to.
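
As a minimal sketch of the first half of that protocol, the check below rejects any finding that lacks one of the four evidence pieces. The field names come from the findings schema earlier in this document; `missing_evidence` and `split_findings` are hypothetical names, and the check only tests presence, so verifying that the cited files and quotes are real is still a separate step.

```python
# Schema field -> evidence label, matching the SHOT / QUOTE / SEL / VAL markers.
EVIDENCE_FIELDS = {
    "screenshot_path": "SHOT",
    "snapshot_quote": "QUOTE",
    "dom_selector": "SEL",
    "computed_style_excerpt": "VAL",
}


def missing_evidence(finding: dict) -> list[str]:
    """Return the labels of evidence pieces that are absent or empty."""
    return [
        label
        for field, label in EVIDENCE_FIELDS.items()
        if not finding.get(field)
    ]


def split_findings(findings: list[dict]):
    """Partition findings into accepted ones and (finding, gaps) rejections."""
    accepted, rejected = [], []
    for finding in findings:
        gaps = missing_evidence(finding)
        if gaps:
            rejected.append((finding, gaps))
        else:
            accepted.append(finding)
    return accepted, rejected
```

Rejected findings can be handed back to the subagent with their gap labels, so it knows exactly which piece of evidence to collect.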
|
|
1044
|
+
|
|
1045
|
+
### Adjacent non-audit references
|
|
1046
|
+
|
|
1047
|
+
For test-authoring (not auditing), the `microsoft/playwright` repo itself ships `playwright-test-planner.agent.md`, `-generator.agent.md`, `-healer.agent.md` — these target E2E generation. Anthropic's official `anthropics/frontend-design` and `anthropics/webapp-testing` Skills cover adjacent generative/testing territory. As of April 2026, **no widely-adopted design-audit SKILL.md equivalent exists** — there's a clear opening for a shipping skill that does this properly.
|
|
1048
|
+
|
|
1049
|
+
---
|
|
1050
|
+
|
|
1051
|
+
## Takeaways: what actually ships
|
|
1052
|
+
|
|
1053
|
+
The one-line summary: **say "Playwright MCP" out loud in turn one; snapshot-don't-screenshot; text-waits-not-time-waits; pin a version (not `@latest`) in team configs; demand four pieces of evidence per finding and verify the files exist on disk before trusting the report.** Everything else — viewport tables, tool lists, env vars — is support material for those five moves.
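
The version-pinning move is easy to lint for. The sketch below flags `npx`-launched servers in an `.mcp.json`-shaped dict whose package spec is not pinned to a concrete version. It rests on assumptions: the first non-flag argument is taken to be the package spec, "pinned" is taken to mean a version that starts with a digit, and the helper names are hypothetical.

```python
from typing import Optional


def package_spec(args: list[str]) -> Optional[str]:
    """First non-flag argument, assumed to be the npm package spec for npx."""
    return next((arg for arg in args if not arg.startswith("-")), None)


def is_pinned(spec: str) -> bool:
    """True when the spec ends in @<version> and the version starts with a digit."""
    # Scoped names begin with "@", so split on the LAST "@" and require a
    # non-empty package name before it.
    name, sep, version = spec.rpartition("@")
    return bool(sep) and bool(name) and version[:1].isdigit()


def unpinned_servers(config: dict) -> list[str]:
    """Return names of npx-launched servers whose package is not pinned."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        if server.get("command") != "npx":
            continue  # Only npx launches are inspected in this sketch.
        spec = package_spec(server.get("args", []))
        if spec is None or not is_pinned(spec):
            flagged.append(name)
    return flagged
```

Requiring a leading digit rejects dist-tags such as `latest` and ranges such as `^1.0.0` alike, which is stricter than npm itself but matches the intent of a reproducible team config.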
|
|
1054
|
+
|
|
1055
|
+
The 2026 inflection is **Tool Search + Skills over MCP for long flows**. Microsoft themselves now recommend CLI-based Skills for coding agents precisely because loading 25 Playwright tool schemas plus inline snapshots blows through 100K tokens on a 30-step audit. For **interactive, ad-hoc UX review**, MCP is still right — the accessibility-tree model gives you deterministic `ref=eNN` targeting that no CLI invocation does. For **long automated runs or CI**, consider swapping MCP for `@playwright/cli` inside a Skill.
|
|
1056
|
+
|
|
1057
|
+
And the gap worth closing in your shipping skill: nobody in the community enforces orchestrator-side evidence verification. `scripts/verify-audit.sh` — stat the paths, grep the quotes — is a fifteen-line defense that makes the agent verifiably honest. That's the piece worth investing in.
|