web-tester-for-claude 0.4.0
- package/LICENSE +21 -0
- package/README.md +651 -0
- package/bin/web-tester.js +35 -0
- package/package.json +64 -0
- package/src/browser/attrs.ts +79 -0
- package/src/browser/session.ts +139 -0
- package/src/cli.ts +1488 -0
- package/src/impact.ts +165 -0
- package/src/init.ts +260 -0
- package/src/inspector/capture.ts +293 -0
- package/src/inspector/deep.ts +147 -0
- package/src/inspector/packs.ts +98 -0
- package/src/inspector/report.ts +667 -0
- package/src/inspector/run.ts +544 -0
- package/src/inspector/steps.ts +380 -0
- package/src/inspector/summarise.ts +178 -0
- package/src/inspector/verdict.ts +275 -0
- package/src/journeys.ts +78 -0
- package/src/kb.ts +84 -0
- package/src/map/classify.ts +149 -0
- package/src/map/crawl.ts +394 -0
- package/src/map/generate.ts +253 -0
- package/src/map/report.ts +112 -0
- package/src/map/run.ts +219 -0
- package/src/sitemap.ts +75 -0
- package/src/sweep.ts +476 -0
- package/src/templates/agent-section.md +77 -0
- package/src/templates/dot-web-tester/impact-rules.json +36 -0
- package/src/templates/dot-web-tester/instructions/getting-started.md +62 -0
- package/src/templates/dot-web-tester/instructions/recipes.md +105 -0
- package/src/templates/dot-web-tester/journeys/example-signup.json +17 -0
- package/src/templates/dot-web-tester/urls-smoke.txt +19 -0
- package/src/templates/skill.md +59 -0
- package/src/util/log.ts +26 -0
- package/src/util/paths.ts +141 -0
- package/src/util/prompt.ts +50 -0
- package/tsconfig.json +14 -0
package/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Haroon Khan
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
package/README.md
ADDED
|
@@ -0,0 +1,651 @@
|
|
|
1
|
+
# web-tester-for-claude
|
|
2
|
+
|
|
3
|
+
> Let your coding agent **see and verify** the web changes it makes. web-tester
|
|
4
|
+
> drives your dev site, captures everything to one report, and runs a whole
|
|
5
|
+
> flow in a **single model turn** — not a dozen turn-by-turn tool calls.
|
|
6
|
+
|
|
7
|
+
web-tester wraps Chromium with a single, opinionated capture pipeline: every
|
|
8
|
+
console line, every network request, every page error, every step screenshot,
|
|
9
|
+
the whole video, the full DOM if you ask for it — into one self-contained
|
|
10
|
+
HTML report and one structured `result.json` per run. The agent reads back only
|
|
11
|
+
the slices it needs, so the **edit → verify → edit loop stays cheap and fast**
|
|
12
|
+
even across many steps.
|
|
13
|
+
|
|
14
|
+
It's intentionally a *toolkit*, not a pipeline. There is no LLM stage, no
|
|
15
|
+
test generation, no judging. You (or an AI agent like Claude Code) decide
|
|
16
|
+
what to look at; `web-tester` just makes it cheap to look.
|
|
17
|
+
|
|
18
|
+
```bash
|
|
19
|
+
# Quick verify a change — fail on any 5xx, assert text is visible, in ~6s
|
|
20
|
+
npx web-tester-for-claude inspect "/products/widget" \
|
|
21
|
+
--step settle --quick \
|
|
22
|
+
--expect "text=Add to Cart" \
|
|
23
|
+
--fail-on http-5xx
|
|
24
|
+
|
|
25
|
+
# Drive a flow, capture state at every step
|
|
26
|
+
npx web-tester-for-claude inspect "/products/widget" \
|
|
27
|
+
--step settle \
|
|
28
|
+
--step screenshot:initial \
|
|
29
|
+
--step "click:button:has-text(\"Add to Cart\")" \
|
|
30
|
+
--step wait:networkidle \
|
|
31
|
+
--step goto:/cart \
|
|
32
|
+
--step screenshot:cart
|
|
33
|
+
|
|
34
|
+
# Bulk-sweep many URLs in parallel
|
|
35
|
+
npx web-tester-for-claude sweep --sitemap --filter '^/products/' --concurrency 4 \
|
|
36
|
+
--fail-on http-5xx
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
---
|
|
40
|
+
|
|
41
|
+
## Why web-tester
|
|
42
|
+
|
|
43
|
+
You can drive Playwright yourself, of course. web-tester earns its weight in
|
|
44
|
+
three ways that show up every day:
|
|
45
|
+
|
|
46
|
+
1. **Disk-as-cache report shape.** One run captures everything to
|
|
47
|
+
`.web-tester/runs/<id>/`, and the CLI prints the path to a self-contained
|
|
48
|
+
`report.html`. AI agents read `result.json` selectively
|
|
49
|
+
(`jq '.steps[3].network'`) instead of pulling every byte of browser
|
|
50
|
+
state back into their conversation context. For "reproduce this bug,
|
|
51
|
+
tell me what happened" tasks this uses **5–10× fewer tokens** than
|
|
52
|
+
piping DOM snapshots back through stdout.
|
|
53
|
+
2. **One step grammar.** No heredoc Playwright scripts to maintain.
|
|
54
|
+
`--step click:…`, `--step fill:…=…`, `--step wait:url-contains:…` —
|
|
55
|
+
composable, copy-pasteable from a recipe, no boilerplate.
|
|
56
|
+
3. **Knowledge files travel with the repo.** Drop project quirks into
|
|
57
|
+
`.web-tester/instructions/*.md` and any future session — yours or your
|
|
58
|
+
AI agent's — gets them as a warm start instead of re-discovering them.
|
|
59
|
+
|
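As a rough illustration of the selective read in point 1, the sketch below walks a parsed `result.json` along a key path, much like `jq '.steps[3].network'`. The `readSlice` helper and the sample data are hypothetical; only the `steps[n].network` path comes from the text above, and the real contents of each network slice are not shown here.

```typescript
// Walk a parsed JSON value along a path of keys and indices.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function readSlice(root: Json, path: Array<string | number>): Json | undefined {
  let node: Json | undefined = root;
  for (const key of path) {
    if (node === null || typeof node !== "object") return undefined;
    if (Array.isArray(node)) {
      node = typeof key === "number" ? node[key] : undefined;
    } else {
      node = typeof key === "string" ? node[key] : undefined;
    }
    if (node === undefined) return undefined;
  }
  return node;
}

// Invented sample data, standing in for a real result.json.
const sampleReport: Json = JSON.parse(
  '{"ok": true, "steps": [{"network": []}, {"network": [{"status": 500}]}]}'
);

// Only this slice would need to enter the agent's context.
const networkSlice = readSlice(sampleReport, ["steps", 1, "network"]);
```

A missing step or key yields `undefined` rather than throwing, which keeps a speculative read cheap.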
|
60
|
+
The HTML report has a sticky video player with speed presets, a step
|
|
61
|
+
timeline with screenshot + console/network slices, lightboxed full-page
|
|
62
|
+
screenshots, and collapsible global logs. Open it first; the JSON is for
|
|
63
|
+
programmatic reads.
|
|
64
|
+
|
|
65
|
+
---
|
|
66
|
+
|
|
67
|
+
## Why a CLI, and not an MCP server?
|
|
68
|
+
|
|
69
|
+
[Microsoft's Playwright MCP](https://github.com/microsoft/playwright-mcp) is
|
|
70
|
+
excellent for *live, interactive* browser control — the agent decides each
|
|
71
|
+
click as it goes. web-tester is deliberately a **CLI** instead, because a
|
|
72
|
+
coding agent's job isn't to click around live; it's to **verify a change it
|
|
73
|
+
just made**, repeatedly, per project. A CLI fits that better in three ways an
|
|
74
|
+
MCP server structurally can't:
|
|
75
|
+
|
|
76
|
+
- **It learns the project over time.** Everything lives in `.web-tester/` —
|
|
77
|
+
recipes, instructions, a route map, journeys — and *grows* as you use it. The
|
|
78
|
+
next session gets a warm start instead of rediscovering your site. An MCP
|
|
79
|
+
server is stateless per project; it remembers nothing between runs.
|
|
80
|
+
- **It produces artifacts.** One run writes a self-contained `report.html`
|
|
81
|
+
(video + step timeline) and a structured `result.json` you can diff, attach
|
|
82
|
+
to a PR, or hand to CI. MCP returns everything into the conversation and
|
|
83
|
+
it's gone.
|
|
84
|
+
- **It barely touches context.** MCP returns a full page snapshot into the
|
|
85
|
+
conversation on *every* step; those tokens pile up and never leave. web-tester
|
|
86
|
+
runs the whole flow in one process and hands back a compact verdict — the
|
|
87
|
+
agent reads `result.json` slices only if it needs them.
|
|
88
|
+
|
|
89
|
+
### Measured: tokens, round-trips, and cost
|
|
90
|
+
|
|
91
|
+
The same task, run each way — counting what enters the model's context, the
|
|
92
|
+
model round-trips, and the resulting **token cost** ([methodology](#methodology)):
|
|
93
|
+
|
|
94
|
+

|
|
95
|
+
|
|
96
|
+
| Task | Tool | Input tok | Output tok | Round-trips | Cost / run | Per 1,000 runs |
|
|
97
|
+
|---|---|--:|--:|--:|--:|--:|
|
|
98
|
+
| **TodoMVC**<br>add 3, complete 1, filter | Playwright MCP | ~1,240 | ~600 | 6 | $0.013 | $12.70 |
|
|
99
|
+
| | **web-tester** | **~300** | ~150 | **1** | **$0.003** | **$3.16** · 4× less |
|
|
100
|
+
| **Hacker News**<br>verify front page | Playwright MCP | ~10,100 | ~100 | 1 | $0.032 | $31.80 |
|
|
101
|
+
| | **web-tester** | **~220** | ~150 | 1 | **$0.003** | **$2.90** · 11× less |
|
|
102
|
+
|
|
103
|
+
Cost is at Claude Sonnet 4.6 list price ($3 / $15 per 1M input / output tokens)
|
|
104
|
+
and scales linearly with whatever model you run (≈1.7× at Opus 4.8 rates). Input
|
|
105
|
+
tokens are measured; output is a modest per-round-trip estimate.
|
|
106
|
+
|
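The cost columns follow directly from the token counts and the quoted list price. The sketch below redoes that arithmetic with the rounded token counts from the table, so the results can differ from the table by a cent or so per 1,000 runs. The function and constant names are illustrative only.

```typescript
// List price quoted above: $3 per 1M input tokens, $15 per 1M output tokens.
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;

function costPerRun(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1e6;
}

// Rounded token counts taken from the table rows.
const mcpTodo = costPerRun(1240, 600);   // about $0.0127 per run
const wtTodo = costPerRun(300, 150);     // about $0.0032 per run
const mcpHn = costPerRun(10100, 100);    // about $0.0318 per run
const wtHn = costPerRun(220, 150);       // about $0.0029 per run

// Ratios behind the "4x less" and "11x less" labels.
const todoRatio = mcpTodo / wtTodo;
const hnRatio = mcpHn / wtHn;
```

Multiplying any per-run figure by 1,000 gives the last column; swapping in another model's prices only changes the two constants.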
|
107
|
+
Two honest caveats: **raw browser time is comparable** (same engine — the time
|
|
108
|
+
that matters is *model round-trips*, not browser speed), and these numbers
|
|
109
|
+
*under*-count MCP — we reproduced its payload with Playwright's aria snapshot,
|
|
110
|
+
which omits the per-node `[ref]` metadata MCP also sends, and we bill each
|
|
111
|
+
context token only once (a real agent loop re-sends the growing context every
|
|
112
|
+
turn, so MCP's snapshots get re-billed; prompt caching offsets some of that).
|
|
113
|
+
The single Hacker News snapshot alone is ~10k tokens.
|
|
114
|
+
|
|
115
|
+
### And it compounds on reruns
|
|
116
|
+
|
|
117
|
+
The bigger win isn't the first run — it's the *second*. Playwright MCP has no
|
|
118
|
+
project memory: every rerun re-explores the page from scratch, at full cost.
|
|
119
|
+
web-tester saves the flow on the first run (`inspect … --save-journey todomvc`)
|
|
120
|
+
as a **~500-byte plain-text recipe** — just the URL, the steps, and the
|
|
121
|
+
assertions. Not HTML, not snapshots; the big `report.html`/video stay in the
|
|
122
|
+
disposable `runs/` folder and are never reused. Every rerun is then one command
|
|
123
|
+
(`web-tester journey todomvc`) that replays those steps live — no snapshots, no
|
|
124
|
+
re-deriving selectors. So the cost gap widens with every repeat:
|
|
125
|
+
|
|
126
|
+

|
|
127
|
+
|
|
128
|
+
| | Run 1 (fresh) | each rerun | cost after 5 runs |
|
|
129
|
+
|---|---|---|---|
|
|
130
|
+
| **Playwright MCP** | $0.013 · 6 round-trips | $0.013 · 6 round-trips | **$0.064 · 30 round-trips** |
|
|
131
|
+
| **web-tester** | $0.003 · 1 round-trip (+saves the journey) | $0.002 · 1 round-trip | **$0.012 · 5 round-trips** |
|
|
132
|
+
|
|
133
|
+
That's the whole point of a per-project CLI: it *accumulates*. Recipes,
|
|
134
|
+
journeys, and the route map become the project's test memory — the agent does
|
|
135
|
+
the expensive exploration once and replays it for free, while a stateless MCP
|
|
136
|
+
server pays full price every time.
|
|
137
|
+
|
|
138
|
+
**The two pair well; they don't compete.** Use Playwright MCP for open-ended,
|
|
139
|
+
exploratory clicking; use web-tester to verify changes cheaply, sweep pages,
|
|
140
|
+
and build the project's test memory. web-tester can even hand MCP a logged-in
|
|
141
|
+
session (its saved storage state) when you want to drive an authenticated app
|
|
142
|
+
by hand.
|
|
143
|
+
|
|
144
|
+
<sub><a name="methodology"></a>**Methodology:** tasks run against
|
|
145
|
+
`demo.playwright.dev/todomvc` and `news.ycombinator.com`, June 2026. MCP input =
|
|
146
|
+
the accessibility snapshot returned per action (captured via Playwright's
|
|
147
|
+
`ariaSnapshot()` on the same live pages); web-tester input = the CLI's printed
|
|
148
|
+
summary; a rerun = `web-tester journey todomvc` against a saved journey. Output
|
|
149
|
+
tokens are a modest per-round-trip estimate. Dollar cost uses Claude Sonnet 4.6
|
|
150
|
+
list pricing ($3 / $15 per 1M input / output). Tokens ≈ characters ÷ 4.
|
|
151
|
+
Benchmark: [`docs/bench.js`](docs/bench.js); charts:
|
|
152
|
+
[`docs/make-charts.js`](docs/make-charts.js).</sub>
|
|
153
|
+
|
|
154
|
+
---
|
|
155
|
+
|
|
156
|
+
## Install
|
|
157
|
+
|
|
158
|
+
```bash
|
|
159
|
+
npx web-tester-for-claude help # zero-install, runs the latest from npm
|
|
160
|
+
```
|
|
161
|
+
|
|
162
|
+
Or install it as a project dev dependency so the version is pinned:
|
|
163
|
+
|
|
164
|
+
```bash
|
|
165
|
+
npm install -D web-tester-for-claude
|
|
166
|
+
npx web-tester-for-claude help
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
The first run will fetch Playwright's Chromium binary on demand if it's not
|
|
170
|
+
already on disk (`npx playwright install chromium` to do it explicitly).
|
|
171
|
+
|
|
172
|
+
---
|
|
173
|
+
|
|
174
|
+
## Quick start
|
|
175
|
+
|
|
176
|
+
```bash
|
|
177
|
+
# 1. Interactive setup: scaffolds .web-tester/, writes a Claude Code skill +
|
|
178
|
+
# CLAUDE.md section, saves your base URL. (Bare `npx web-tester-for-claude` on a fresh
|
|
179
|
+
# project runs this automatically.)
|
|
180
|
+
npx web-tester-for-claude init
|
|
181
|
+
|
|
182
|
+
# 2. Start your dev server
|
|
183
|
+
npm run dev # whatever your dev command is
|
|
184
|
+
|
|
185
|
+
# 3. Map the running site → preset + recipes + journey drafts, all auto-generated
|
|
186
|
+
npx web-tester-for-claude map
|
|
187
|
+
|
|
188
|
+
# 4. Verify a single URL works end-to-end
|
|
189
|
+
npx web-tester-for-claude inspect / \
|
|
190
|
+
--step settle --quick \
|
|
191
|
+
--expect "selector=main" \
|
|
192
|
+
--fail-on http-5xx
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
The CLI prints the absolute path to `report.html` at the end of every run —
|
|
196
|
+
open it in a browser. Run artifacts land in `.web-tester/runs/` in your project
|
|
197
|
+
(override with `WEB_TESTER_RUNS_DIR`).
|
|
198
|
+
|
|
199
|
+
---
|
|
200
|
+
|
|
201
|
+
## Commands
|
|
202
|
+
|
|
203
|
+
| Command | What it does |
|
|
204
|
+
|---|---|
|
|
205
|
+
| `init` | Scaffold `.web-tester/` and wire the agent-instructions section into your `CLAUDE.md` / `AGENTS.md`. Run once per project. |
|
|
206
|
+
| `map` | Crawl your running site, classify every page, and auto-generate a sweep preset, smoke recipes, and form journey drafts. |
|
|
207
|
+
| `inspect <url>` | Drive one page, optionally with `--step …`, capture everything. |
|
|
208
|
+
| `sweep` | Run inspect concurrently across many URLs (one Chromium, N contexts). |
|
|
209
|
+
| `journey <name>` | Run a saved JSON journey from `.web-tester/journeys/<name>.json`. |
|
|
210
|
+
| `journey` (no arg) | List available journeys. |
|
|
211
|
+
| `impact` | Diff-aware advisory run — match changed files against rules in `.web-tester/impact-rules.json` and run the indicated sweeps/journeys. **Always exits 0.** |
|
|
212
|
+
| `kb` / `kb <topic>` | List or print a `.md` file in `.web-tester/instructions/` (or `.web-tester/`). |
|
|
213
|
+
| `help` | Full reference. |
|
|
214
|
+
|
|
215
|
+
Every command targets `http://localhost:3000` by default. Point at anything
|
|
216
|
+
else with `WEB_TESTER_BASE_URL=…`.
|
|
217
|
+
|
|
218
|
+
---
|
|
219
|
+
|
|
220
|
+
## Setup — `web-tester init`
|
|
221
|
+
|
|
222
|
+
The **first time** you run web-tester in a project, it drops into an
|
|
223
|
+
interactive setup (you can also run it explicitly any time):
|
|
224
|
+
|
|
225
|
+
```bash
|
|
226
|
+
npx web-tester-for-claude # first run → guided setup
|
|
227
|
+
npx web-tester-for-claude init # or run setup explicitly
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
It asks a few questions (each with a sensible default — just press Enter):
|
|
231
|
+
your dev server base URL, which agent file to write, how eagerly Claude
|
|
232
|
+
should reach for web-tester, whether to generate a Claude Code skill, and
|
|
233
|
+
whether to install Chromium now. Then it writes:
|
|
234
|
+
|
|
235
|
+
- **`.web-tester/`** — starter `impact-rules.json`, `urls-smoke.txt`, an
|
|
236
|
+
example journey, `instructions/` recipes, and a `config.json` holding your
|
|
237
|
+
base URL (so commands work without setting `WEB_TESTER_BASE_URL`). Run
|
|
238
|
+
artifacts go in `.web-tester/runs/`, gitignored automatically.
|
|
239
|
+
- **`.claude/skills/web-tester/SKILL.md`** — a [Claude Code
|
|
240
|
+
skill](https://docs.claude.com/en/docs/claude-code/skills) so Claude can
|
|
241
|
+
drive web-tester natively (auto-invoked for runtime-behavior questions, or
|
|
242
|
+
on demand via `/web-tester`), with the right `Bash(npx web-tester-for-claude *)`
|
|
243
|
+
permissions pre-approved.
|
|
244
|
+
- **`CLAUDE.md`** (or `AGENTS.md`) — a marker-fenced agent-instructions block
|
|
245
|
+
teaching *when* to reach for web-tester. Re-running replaces it in place;
|
|
246
|
+
your surrounding notes are untouched.
|
|
247
|
+
- **`.claude/settings.local.json`** — your `WEB_TESTER_AUTO_USE` preference,
|
|
248
|
+
merged in without clobbering existing settings.
|
|
249
|
+
|
|
250
|
+
Everything is idempotent — existing files are skipped (settings and config are
|
|
251
|
+
merged, never overwritten). Run non-interactively in CI with `--yes`.
|
|
252
|
+
|
|
253
|
+
| Flag | Purpose |
|
|
254
|
+
|---|---|
|
|
255
|
+
| `-y, --yes` | Non-interactive; accept all defaults. |
|
|
256
|
+
| `--base-url <url>` | Set the dev server base URL. |
|
|
257
|
+
| `--auto-use <on\|ask\|off>` | How eagerly Claude should reach for web-tester. |
|
|
258
|
+
| `--no-skill` | Don't generate the Claude Code skill. |
|
|
259
|
+
| `--no-agent` / `--agent-file <p>` | Skip, or target a specific agent file. |
|
|
260
|
+
| `--install-browser` | Fetch Chromium during setup. |
|
|
261
|
+
| `--force` | Overwrite existing scaffolded files. |
|
|
262
|
+
|
|
263
|
+
---
|
|
264
|
+
|
|
265
|
+
## Mapping a site — `web-tester map`
|
|
266
|
+
|
|
267
|
+
Point `map` at your running dev server and it crawls the site, classifies
|
|
268
|
+
every page, and writes a ready-to-use coverage starter kit — no hand-authoring:
|
|
269
|
+
|
|
270
|
+
```bash
|
|
271
|
+
npx web-tester-for-claude map # crawl from BASE_URL (uses sitemap.xml if present)
|
|
272
|
+
npx web-tester-for-claude map /docs # crawl just the /docs subtree
|
|
273
|
+
npx web-tester-for-claude map --no-sitemap --depth 2 # follow links only, two hops deep
|
|
274
|
+
```
|
|
275
|
+
|
|
276
|
+
It discovers pages two ways: it seeds from `sitemap.xml` when one exists, and
|
|
277
|
+
follows same-origin links breadth-first. Each page is classified (`home`,
|
|
278
|
+
`list`, `detail`, `form`, `auth`, `search`, `content`) and collapsed by route
|
|
279
|
+
template (`/products/12` and `/products/34` → `/products/:id`, capped per
|
|
280
|
+
template so a big catalog can't dominate). From that it generates, into
|
|
281
|
+
`.web-tester/`:
|
|
282
|
+
|
|
283
|
+
- **`urls-map.txt`** — one representative path per route, annotated with the
|
|
284
|
+
strongest expectation pack each page satisfied. Sweep it with
|
|
285
|
+
`web-tester sweep --preset map --fail-on http-5xx`.
|
|
286
|
+
- **`instructions/recipes.md`** — a copy-paste `inspect` one-liner per page
|
|
287
|
+
type, in a marker-fenced block that `map` refreshes on each run.
|
|
288
|
+
- **`journeys/*.json`** — a draft journey per distinct form found (fields
|
|
289
|
+
pre-filled with sample values). Review the selectors and values, and add
|
|
290
|
+
expectations before relying on them.
|
|
291
|
+
|
|
292
|
+
Plus an HTML site map (`runs/map-<id>/map.html`) with a screenshot, status,
|
|
293
|
+
and link count per route.
|
|
294
|
+
|
|
295
|
+
| Flag | Purpose |
|
|
296
|
+
|---|---|
|
|
297
|
+
| `--limit <n>` | Max pages to fetch (default 50). |
|
|
298
|
+
| `--depth <n>` | Max link hops when crawling (default 3; ignored for sitemap seeds). |
|
|
299
|
+
| `--per-template <n>` | Max pages fetched per route template (default 3). |
|
|
300
|
+
| `--max-journeys <n>` | Cap on generated journey drafts (default 12). |
|
|
301
|
+
| `--no-sitemap` | Don't seed from `sitemap.xml`; follow links only. |
|
|
302
|
+
| `--sitemap <url>` | Use a specific sitemap URL. |
|
|
303
|
+
| `--filter` / `--exclude <regex>` | Keep / drop matching paths. |
|
|
304
|
+
| `--no-screenshots` | Skip per-page screenshots (faster). |
|
|
305
|
+
| `--force` | Overwrite existing generated journeys. |
|
|
306
|
+
|
|
307
|
+
Everything `map` writes is yours to edit — it's a starting point that turns a
|
|
308
|
+
cold project into a covered one in one command.
|
|
309
|
+
|
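The route-template collapsing described in this section (`/products/12` and `/products/34` → `/products/:id`, capped per template) can be sketched as follows. This is a simplified illustration, not the package's crawl code: it assumes only purely numeric path segments are treated as IDs, and the function names are hypothetical.

```typescript
// Replace purely numeric path segments with ":id".
function toRouteTemplate(path: string): string {
  return path
    .split("/")
    .map((segment) => (/^\d+$/.test(segment) ? ":id" : segment))
    .join("/");
}

// Group paths by template, keeping at most `perTemplate` concrete paths each,
// so a large catalog cannot dominate the crawl budget.
function capPerTemplate(paths: string[], perTemplate: number): Record<string, string[]> {
  const byTemplate: Record<string, string[]> = {};
  for (const path of paths) {
    const template = toRouteTemplate(path);
    const kept = byTemplate[template] || (byTemplate[template] = []);
    if (kept.length < perTemplate) kept.push(path);
  }
  return byTemplate;
}

const grouped = capPerTemplate(
  ["/products/12", "/products/34", "/products/56", "/products/78", "/about"],
  3 // matches the documented --per-template default
);
```

Slugs, UUIDs, and other non-numeric identifiers would need extra rules that this sketch does not attempt.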
|
310
|
+
---
|
|
311
|
+
|
|
312
|
+
## What lands in `runs/<id>/`
|
|
313
|
+
|
|
314
|
+
| File | Contents |
|
|
315
|
+
|---|---|
|
|
316
|
+
| `report.html` | **Self-contained HTML report.** Open this first. |
|
|
317
|
+
| `result.json` | Full structured report — same data as the HTML. Programmatic reads. |
|
|
318
|
+
| `video/page@<hash>.webm` | Screen recording (omit with `--no-video` or `--quick`). |
|
|
319
|
+
| `initial.png` / `initial-full.png` | Viewport + full-page after first load. |
|
|
320
|
+
| `final.png` / `final-full.png` | Viewport + full-page after last step. |
|
|
321
|
+
| `steps/NN-<label>.png` | One screenshot per step. |
|
|
322
|
+
| `initial.html` / `final.html` | Page HTML (only if `--html`). |
|
|
323
|
+
| `console.json`, `network.json` | Raw streams (also embedded in `result.json`). |
|
|
324
|
+
|
|
325
|
+
`--quick` is the most useful flag: no video, no full-page screenshots, no
|
|
326
|
+
HTML capture, no AI summary. Pair with `--expect` / `--fail-on` for a real
|
|
327
|
+
pass/fail gate in 5–10s.
|
|
328
|
+
|
|
329
|
+
---
|
|
330
|
+
|
|
331
|
+
## Step grammar
|
|
332
|
+
|
|
333
|
+
`--step` can be repeated. Steps run sequentially, with their own screenshot
|
|
334
|
+
plus the slice of console / network / page-errors produced *during* that step.
|
|
335
|
+
|
|
336
|
+
```
|
|
337
|
+
goto:<url> navigate (absolute or path)
|
|
338
|
+
reload reload current page
|
|
339
|
+
wait:<load|domcontentloaded|networkidle>
|
|
340
|
+
wait:<ms> sleep N ms
|
|
341
|
+
wait:<selector> wait for selector
|
|
342
|
+
wait:text=<exact text> wait for matching text
|
|
343
|
+
wait:url-stable[=<ms>] wait until URL changes at least once then
|
|
344
|
+
stays still for <ms> (default 250)
|
|
345
|
+
wait:url-contains:<sub>[@<ms>] wait until URL contains <sub>
|
|
346
|
+
(use @ not = so <sub> can include '=')
|
|
347
|
+
settle[:<ms>] wait for data-attr-selected-label to
|
|
348
|
+
populate on any [data-attr-name] element.
|
|
349
|
+
Fast-paths in ~3s if none are present.
|
|
350
|
+
Apps without data-attrs should prefer
|
|
351
|
+
'wait:networkidle'.
|
|
352
|
+
click:<selector> click (Playwright locator; supports CSS
|
|
353
|
+
and :has-text())
|
|
354
|
+
hover:<selector>
|
|
355
|
+
fill:<selector>=<value> native input
|
|
356
|
+
react-fill:<selector>=<value> React-controlled input (calls the native
|
|
357
|
+
value setter + dispatches synthetic
|
|
358
|
+
input/change/blur events)
|
|
359
|
+
press:<selector>=<key> keyboard press
|
|
360
|
+
select:<selector>=<value> native <select>
|
|
361
|
+
scroll:<top|bottom|<px>>
|
|
362
|
+
screenshot[:<name>] viewport screenshot
|
|
363
|
+
screenshot-full[:<name>] full-page screenshot
|
|
364
|
+
eval:<JS expression> run in page context; result attached to step
|
|
365
|
+
```
|
|
366
|
+
|
|
367
|
+
For long step chains, drop them in a JSON file and pass `--steps-file flow.json`:
|
|
368
|
+
|
|
369
|
+
```json
|
|
370
|
+
["settle", "screenshot:initial", "click:button:has-text(\"Submit\")",
|
|
371
|
+
"wait:networkidle", "goto:/thanks"]
|
|
372
|
+
```
|
|
373
|
+
|
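To make the grammar concrete, here is one possible way to parse a step string. It is a sketch under assumptions, not the package's parser: it assumes the verb ends at the first `:`, that the `=` separating selector and value is the first one outside `[...]` or `(...)`, and that a trailing `@<digits>` on `wait:url-contains` is the timeout. The `ParsedStep` shape and function names are hypothetical.

```typescript
interface ParsedStep {
  verb: string;
  arg?: string;
  value?: string;
  timeoutMs?: number;
}

// Verbs documented above as taking <selector>=<value>.
const SELECTOR_VALUE_VERBS = ["fill", "react-fill", "press", "select"];

// Split at the first "=" that is not nested inside brackets or parentheses,
// so attribute selectors like input[name=email] stay intact.
function splitSelectorValue(arg: string): [string, string] | null {
  let depth = 0;
  for (let i = 0; i < arg.length; i++) {
    const ch = arg[i];
    if (ch === "[" || ch === "(") depth++;
    else if (ch === "]" || ch === ")") depth--;
    else if (ch === "=" && depth === 0) return [arg.slice(0, i), arg.slice(i + 1)];
  }
  return null;
}

function parseStep(raw: string): ParsedStep {
  const colon = raw.indexOf(":");
  if (colon === -1) return { verb: raw };
  const verb = raw.slice(0, colon);
  const arg = raw.slice(colon + 1);

  if (SELECTOR_VALUE_VERBS.indexOf(verb) !== -1) {
    const parts = splitSelectorValue(arg);
    if (parts === null) throw new Error("expected <selector>=<value> in: " + raw);
    return { verb, arg: parts[0], value: parts[1] };
  }

  const prefix = "url-contains:";
  if (verb === "wait" && arg.slice(0, prefix.length) === prefix) {
    // "@<ms>" is optional; using "@" lets the substring itself contain "=".
    const match = /^(.*?)(?:@(\d+))?$/.exec(arg.slice(prefix.length));
    if (match === null) throw new Error("cannot parse: " + raw);
    const step: ParsedStep = { verb: "wait:url-contains", arg: match[1] };
    if (match[2] !== undefined) step.timeoutMs = Number(match[2]);
    return step;
  }

  return { verb, arg };
}
```

The bracket-depth rule is the design choice worth noting: a naive split on the first `=` would cut `input[name=email]=test@example.com` in the middle of the selector.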
|
374
|
+
---
|
|
375
|
+
|
|
376
|
+
## Verdict & assertions
|
|
377
|
+
|
|
378
|
+
Use these to turn a run into a real pass/fail gate.
|
|
379
|
+
|
|
380
|
+
| Flag | Purpose |
|
|
381
|
+
|---|---|
|
|
382
|
+
| `--fail-on <list>` | Comma-separated kinds that flip `ok` to false: `page-errors`, `console-errors`, `http-4xx`, `http-5xx`. Exit code 1 on any trigger. |
|
|
383
|
+
| `--expect <kind>=<value>` | (Repeatable) final-page assertion. Kinds: `text=…`, `no-text=…`, `selector=…`, `no-selector=…`, `attr=<Name>:<value>`. |
|
|
384
|
+
| `--persist <ms>` | Re-check every `--expect` after waiting `<ms>`. **Both** checks must pass — catches transient state (a toast that flashes for 1s then disappears). |
|
|
385
|
+
|
|
386
|
+
```bash
|
|
387
|
+
# Don't trust a single check for derived state. --persist re-validates.
|
|
388
|
+
npx web-tester-for-claude inspect /pricing \
|
|
389
|
+
--step settle --quick \
|
|
390
|
+
--expect 'text=$49/mo' \
|
|
391
|
+
--persist 2500 \
|
|
392
|
+
--fail-on http-5xx
|
|
393
|
+
```
|
|
394
|
+
|
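The "both checks must pass" rule behind `--persist` can be modelled without a browser. The sketch below is a simulation, not how web-tester polls the page: `visibleAt` is a hypothetical function describing whether an expectation holds a given number of milliseconds after the last step.

```typescript
// An expectation passes only if it holds immediately AND again after persistMs.
function passesWithPersist(
  visibleAt: (elapsedMs: number) => boolean,
  persistMs: number
): boolean {
  const first = visibleAt(0);
  const second = visibleAt(persistMs);
  return first && second;
}

// A toast that flashes for 1s and then disappears.
const flashingToast = (elapsedMs: number): boolean => elapsedMs < 1000;
// A price label that stays on the page.
const stablePrice = (_elapsedMs: number): boolean => true;

const toastVerdict = passesWithPersist(flashingToast, 2500); // false
const priceVerdict = passesWithPersist(stablePrice, 2500);   // true
```

A single check at time zero would pass both cases, which is exactly the transient state the second check is meant to catch.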
|
395
|
+
---
|
|
396
|
+
|
|
397
|
+
## Deeper capture — `--deep`
|
|
398
|
+
|
|
399
|
+
When a one-line console message isn't enough, add `--deep` to `inspect`. It
|
|
400
|
+
turns on three heavier signals that are off by default:
|
|
401
|
+
|
|
402
|
+
- **Request + response bodies** for XHR/fetch/document requests (textual
|
|
403
|
+
content only, truncated). The bug is often *in the payload* — a `200` that
|
|
404
|
+
returns `{"error":"out of stock"}` looks fine until you read the body.
|
|
405
|
+
- **Local scope at every uncaught exception.** web-tester attaches a Chrome
|
|
406
|
+
DevTools Protocol debugger, pauses on each throw, dumps the throwing
|
|
407
|
+
function's local + closure variables, and resumes immediately. Instead of
|
|
408
|
+
just `TypeError: cannot read 'id' of undefined`, you get
|
|
409
|
+
`local: userId=42, cart={ items: 3, total: 9.99 }` at the throw site.
|
|
410
|
+
- **Unhandled promise rejections**, which the normal `pageerror` stream
|
|
411
|
+
misses entirely.
|
|
412
|
+
|
|
413
|
+
```bash
|
|
414
|
+
npx web-tester-for-claude inspect /checkout \
|
|
415
|
+
--deep --quick \
|
|
416
|
+
--step "click:button:has-text(\"Pay\")" \
|
|
417
|
+
--step wait:networkidle
|
|
418
|
+
```
|
|
419
|
+
|
|
420
|
+
The CLI prints the exceptions with their scope; the full dump (and bodies)
|
|
421
|
+
land in `result.json` under `deepErrors`, `unhandledRejections`, and each
|
|
422
|
+
`network.entries[].responseBody`. The debugger pauses add overhead, so reach
|
|
423
|
+
for `--deep` when you're diagnosing a specific failure, not on every run.
|
|
424
|
+
|
|
425
|
+
---
|
|
426
|
+
|
|
427
|
+
## Authentication
|
|
428
|
+
|
|
429
|
+
Most real flows live behind a login. web-tester drives the login **once** and
|
|
430
|
+
reuses the session, so gated pages work without logging in every run.
|
|
431
|
+
|
|
432
|
+
```bash
|
|
433
|
+
# 1. Run your login flow with --save-session
|
|
434
|
+
web-tester inspect /login \
|
|
435
|
+
--step "fill:input[name=email]=test@example.com" \
|
|
436
|
+
--step "fill:input[name=password]=your-test-password" \
|
|
437
|
+
--step "click:button[type=submit]" \
|
|
438
|
+
--step "wait:url-contains:/dashboard" \
|
|
439
|
+
--save-session
|
|
440
|
+
|
|
441
|
+
# 2. Every later inspect / sweep / journey is now authenticated automatically.
|
|
442
|
+
web-tester inspect /account --quick --expect "text=Sign out"
|
|
443
|
+
|
|
444
|
+
# Force a logged-out run any time:
|
|
445
|
+
web-tester inspect / --no-session
|
|
446
|
+
```
|
|
447
|
+
|
|
448
|
+
`--save-session` writes the browser session — cookies + localStorage — to
|
|
449
|
+
`~/.web-tester/session.json`. That file is **machine-local**: it lives in your
|
|
450
|
+
home directory, not the repo, and is never committed. It's saved only after a
|
|
451
|
+
clean run (so a failed login can't overwrite a good session), and refreshed
|
|
452
|
+
automatically on later runs so rotating tokens keep working. You can save the
|
|
453
|
+
login as a journey (`--save-journey login`) and re-authenticate with
|
|
454
|
+
`web-tester journey login --save-session`.
|
|
455
|
+
|
|
456
|
+
> ⚠️ **Use test credentials only — at your own risk.**
|
|
457
|
+
>
|
|
458
|
+
> Anything you put in a `--step`, a saved journey, or otherwise hand to
|
|
459
|
+
> web-tester is **visible to the AI agent** driving it. Credentials written
|
|
460
|
+
> into a step are stored in **plain text** in `.web-tester/journeys/*.json`,
|
|
461
|
+
> which is committed to your repo. The saved session in
|
|
462
|
+
> `~/.web-tester/session.json` grants access to anything that account can reach.
|
|
463
|
+
>
|
|
464
|
+
> Never use production, personal, or privileged accounts. Use a **disposable
|
|
465
|
+
> test account** scoped to a safe environment, and treat anything reachable
|
|
466
|
+
> with it as exposed. You assume all responsibility for credentials, tokens,
|
|
467
|
+
> and actions taken with them.
|
|
468
|
+
|
|
469
|
+
---
|
|
470
|
+
|
|
471
|
+
## `.web-tester/` — your project's recipes
|
|
472
|
+
|
|
473
|
+
Everything project-specific lives in `.web-tester/` at your project root.
|
|
474
|
+
All files are optional; commands fail gracefully when they're missing.
|
|
475
|
+
|
|
476
|
+
```
|
|
477
|
+
.web-tester/
|
|
478
|
+
impact-rules.json # rules for `web-tester impact`
|
|
479
|
+
urls-<name>.txt # URL preset for `web-tester sweep --preset <name>`
|
|
480
|
+
journeys/<name>.json # saved flows for `web-tester journey <name>`
|
|
481
|
+
instructions/*.md # knowledge base (or .web-tester/*.md flat for
|
|
482
|
+
# small projects)
|
|
483
|
+
```
|
|
484
|
+
|
|
485
|
+
### `impact-rules.json`
|
|
486
|
+
|
|
487
|
+
Each rule names a set of path globs and what to run if any changed file
|
|
488
|
+
matches. `web-tester impact` reads `git diff` against `origin/main` (or
|
|
489
|
+
`--base <ref>`) and executes matched rules. **Advisory only — never blocks
|
|
490
|
+
your push.**
|
|
491
|
+
|
|
492
|
+
```json
|
|
493
|
+
{
|
|
494
|
+
"rules": [
|
|
495
|
+
{
|
|
496
|
+
"name": "Auth code changed — full sign-up journey",
|
|
497
|
+
"when_changed_any": ["src/auth/**", "src/pages/api/auth/**"],
|
|
498
|
+
"journey": "signup"
|
|
499
|
+
},
|
|
500
|
+
{
|
|
501
|
+
"name": "Shared layout changed — sweep top pages",
|
|
502
|
+
"when_changed_any": ["src/components/Layout/**"],
|
|
503
|
+
"sweep": {
|
|
504
|
+
"urls": ["/", "/pricing", "/docs"],
|
|
505
|
+
"packs": ["homepage"]
|
|
506
|
+
}
|
|
507
|
+
}
|
|
508
|
+
]
|
|
509
|
+
}
|
|
510
|
+
```
|
|
511
|
+
|
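Rule matching can be pictured as "does any changed file match any glob in `when_changed_any`". The sketch below takes the changed-file list as input instead of calling `git diff`, and uses a deliberately minimal glob translation (`**` crosses directories, `*` stays within one segment). The actual glob semantics used by `web-tester impact` are not documented here, so treat both helpers as assumptions.

```typescript
interface ImpactRule {
  name: string;
  when_changed_any: string[];
  journey?: string;
  sweep?: { urls: string[]; packs?: string[] };
}

// Minimal glob-to-regex translation; real glob libraries support far more syntax.
function globToRegExp(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        out += ".*"; // "**" matches across "/" boundaries
        i++;
      } else {
        out += "[^/]*"; // "*" matches within a single path segment
      }
    } else {
      out += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + out + "$");
}

function matchRules(rules: ImpactRule[], changedFiles: string[]): ImpactRule[] {
  return rules.filter((rule) =>
    rule.when_changed_any.some((glob) => {
      const pattern = globToRegExp(glob);
      return changedFiles.some((file) => pattern.test(file));
    })
  );
}

// Same globs as the JSON example above, with shortened names.
const exampleRules: ImpactRule[] = [
  { name: "auth", when_changed_any: ["src/auth/**", "src/pages/api/auth/**"], journey: "signup" },
  {
    name: "layout",
    when_changed_any: ["src/components/Layout/**"],
    sweep: { urls: ["/", "/pricing", "/docs"], packs: ["homepage"] },
  },
];

const matched = matchRules(exampleRules, ["src/auth/session.ts"]).map((r) => r.name);
```

Each matched rule then points at either a journey name or a sweep definition to run.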
|
512
|
+
### `urls-<name>.txt`
|
|
513
|
+
|
|
514
|
+
Newline-separated URLs/paths. `#` comments allowed. Per-URL `#pack=<name>`
|
|
515
|
+
annotations apply the named expectation pack on top of anything global.
|
|
516
|
+
|
|
517
|
+
```
|
|
518
|
+
# urls-smoke.txt
|
|
519
|
+
/ #pack=homepage
|
|
520
|
+
/pricing
|
|
521
|
+
/docs #pack=has-h1 #pack=has-main
|
|
522
|
+
```
|
|
523
|
+
|
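A preset file in this format is simple to read programmatically. The following sketch assumes whitespace-separated tokens, with the first token as the URL and any `#pack=<name>` tokens as annotations; lines that are empty or begin with `#` are skipped. The `PresetEntry` type and `parseUrlPreset` name are hypothetical.

```typescript
interface PresetEntry {
  url: string;
  packs: string[];
}

function parseUrlPreset(text: string): PresetEntry[] {
  const entries: PresetEntry[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and whole-line comments.
    if (line === "" || line.charAt(0) === "#") continue;
    const [url, ...rest] = line.split(/\s+/);
    const packs: string[] = [];
    for (const token of rest) {
      const match = /^#pack=(.+)$/.exec(token);
      if (match) packs.push(match[1]);
    }
    entries.push({ url, packs });
  }
  return entries;
}

const smokePreset = parseUrlPreset(
  ["# urls-smoke.txt", "/ #pack=homepage", "/pricing", "/docs #pack=has-h1 #pack=has-main"].join("\n")
);
```

For the sample above this yields three entries, with `/docs` carrying both `has-h1` and `has-main`.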
|
524
|
+
### `journeys/<name>.json`
|
|
525
|
+
|
|
526
|
+
Bundles a URL + step chain + assertions for `web-tester journey <name>`.
|
|
527
|
+
|
|
528
|
+
```json
|
|
529
|
+
{
|
|
530
|
+
"description": "User signs up, lands on dashboard",
|
|
531
|
+
"url": "/signup",
|
|
532
|
+
"steps": [
|
|
533
|
+
"settle",
|
|
534
|
+
"fill:input[name=email]=test@example.com",
|
|
535
|
+
"fill:input[name=password]=hunter2",
|
|
536
|
+
"click:button[type=submit]",
|
|
537
|
+
"wait:url-contains:/dashboard"
|
|
538
|
+
],
|
|
539
|
+
"expectations": ["text=Welcome", "selector=[data-test=dashboard]"],
|
|
540
|
+
"failOn": "http-5xx"
|
|
541
|
+
}
|
|
542
|
+
```
|
|
543
|
+
|
|
544
|
+
### `instructions/*.md`
|
|
545
|
+
|
|
546
|
+
Plain-English notes on your project's quirks. Run `web-tester kb` to list
|
|
547
|
+
them, `web-tester kb <topic>` to print one. AI agents read these instead of
|
|
548
|
+
re-discovering domain knowledge by grepping your source.
|
|
549
|
+
|
|
550
|
+
---
|
|
551
|
+
|
|
552
|
+
## Built-in expectation packs
|
|
553
|
+
|
|
554
|
+
Pass `--pack <name>` to apply one to every URL in a sweep, or annotate
|
|
555
|
+
URLs in a `urls-*.txt` file with `#pack=<name>`.
|
|
556
|
+
|
|
557
|
+
| Pack | Asserts |
|
|
558
|
+
|---|---|
|
|
559
|
+
| `homepage` | `<header>` + `<footer>` present |
|
|
560
|
+
| `static` | `<header>` + `<footer>` present |
|
|
561
|
+
| `category` | `<header>` + `<footer>` + an internal anchor inside `<main>` containing an `<img>` |
|
|
562
|
+
| `has-main` | `<main>` present |
|
|
563
|
+
| `has-h1` | `<h1>` present |
|
|
564
|
+
|
|
565
|
+
Add project-specific packs in `src/inspector/packs.ts` (PRs welcome for
|
|
566
|
+
genuinely generic patterns) or wrap web-tester with your own pre-flight
|
|
567
|
+
script that injects `--expect …` flags.
|
|
568
|
+
|
|
569
|
+
---
|
|
570
|
+
|
|
571
|
+
## Environment
|
|
572
|
+
|
|
573
|
+
| Var | Default | Purpose |
|
|
574
|
+
|---|---|---|
|
|
575
|
+
| `WEB_TESTER_BASE_URL` | `http://localhost:3000` | Resolves bare paths to absolute URLs. |
|
|
576
|
+
| `GOTO_TIMEOUT_MS` | `30000` | Initial `page.goto` timeout. |
|
|
577
|
+
| `STEP_TIMEOUT_MS` | `15000` | Per-step action timeout. |
|
|
578
|
+
| `SETTLE_TIMEOUT_MS` | `30000` | `settle` step ceiling. |
|
|
579
|
+
|
|
580
|
+
`.env` files in the cwd are loaded automatically (via `dotenv`).
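
As an illustration of what the table above describes, the sketch below resolves a bare path against `WEB_TESTER_BASE_URL` and reads the three timeouts with their documented defaults. `envInt` and `resolveUrl` are hypothetical helpers, and falling back to the default on a missing or non-numeric value is an assumption; the tool's own handling may differ.

```typescript
// Hypothetical sketch: helper names are illustrative, not web-tester exports.

// Read a positive integer from the environment, or return the fallback.
function envInt(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Bare paths are resolved against the base URL; absolute URLs pass through unchanged.
function resolveUrl(
  target: string,
  baseUrl: string = process.env.WEB_TESTER_BASE_URL ?? "http://localhost:3000",
): string {
  return new URL(target, baseUrl).toString();
}

const timeouts = {
  gotoMs: envInt("GOTO_TIMEOUT_MS", 30000),
  stepMs: envInt("STEP_TIMEOUT_MS", 15000),
  settleMs: envInt("SETTLE_TIMEOUT_MS", 30000),
};

console.log(resolveUrl("/pricing", "http://localhost:3000"), timeouts);
```

The WHATWG `URL` constructor ignores the base when the first argument is already absolute, so the same call handles both paths and full URLs.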
|
|
581
|
+
|
|
582
|
+
---
|
|
583
|
+
|
|
584
|
+
## Report shape (excerpt)
|
|
585
|
+
|
|
586
|
+
```jsonc
|
|
587
|
+
{
|
|
588
|
+
"runId": "2026-06-04T17-12-03",
|
|
589
|
+
"ok": false,
|
|
590
|
+
"video": "video/page@abc….webm",
|
|
591
|
+
"requestedUrl": "http://localhost:3000/products/widget",
|
|
592
|
+
"finalUrl": "http://localhost:3000/cart",
|
|
593
|
+
"title": "Cart | Acme",
|
|
594
|
+
"durationMs": 8423,
|
|
595
|
+
"failedSteps": 0,
|
|
596
|
+
"verdictTriggers": [],
|
|
597
|
+
"initial": { "screenshot": "initial.png", "attrs": [] },
|
|
598
|
+
"final": { "screenshot": "final.png", "attrs": [] },
|
|
599
|
+
"console": { "totals": { "error": 1, "log": 14 }, "entries": [] },
|
|
600
|
+
"network": { "count": 23, "failedCount": 1, "entries": [] },
|
|
601
|
+
"pageErrors": [],
|
|
602
|
+
"steps": [
|
|
603
|
+
{
|
|
604
|
+
"index": 1,
|
|
605
|
+
"step": { "kind": "click", "selector": "button:has-text(\"Submit\")" },
|
|
606
|
+
"label": "click button:has-text(\"Submit\")",
|
|
607
|
+
"ok": true,
|
|
608
|
+
"durationMs": 412,
|
|
609
|
+
"url": "http://localhost:3000/products/widget",
|
|
610
|
+
"screenshot": "steps/01-click.png",
|
|
611
|
+
"console": [],
|
|
612
|
+
"network": [{ "method": "POST", "url": ".../cart", "status": 200, "durationMs": 187 }],
|
|
613
|
+
"pageErrors": []
|
|
614
|
+
}
|
|
615
|
+
]
|
|
616
|
+
}
|
|
617
|
+
```
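
A script or agent consuming the report might reduce it to a few headline lines. The sketch below types only the fields shown in the excerpt above. `Report`, `StepResult`, and `summariseReport` are hypothetical names, and the real report contains more fields than are modelled here.

```typescript
// Hypothetical sketch: a partial view of the report, based on the excerpt only.
interface StepResult {
  index: number;
  label: string;
  ok: boolean;
  durationMs: number;
}

interface Report {
  ok: boolean;
  requestedUrl: string;
  finalUrl: string;
  failedSteps: number;
  console: { totals: Record<string, number> };
  network: { count: number; failedCount: number };
  steps: StepResult[];
}

function summariseReport(report: Report): string[] {
  const lines: string[] = [`ok=${report.ok} failedSteps=${report.failedSteps}`];

  // The page may end up somewhere other than where the run started.
  if (report.requestedUrl !== report.finalUrl) {
    lines.push(`url changed: ${report.requestedUrl} -> ${report.finalUrl}`);
  }
  const consoleErrors = report.console.totals["error"] ?? 0;
  if (consoleErrors > 0) lines.push(`console errors: ${consoleErrors}`);
  if (report.network.failedCount > 0) {
    lines.push(`failed requests: ${report.network.failedCount} of ${report.network.count}`);
  }
  for (const step of report.steps) {
    if (!step.ok) lines.push(`step ${step.index} failed: ${step.label}`);
  }
  return lines;
}

const example: Report = {
  ok: false,
  requestedUrl: "http://localhost:3000/products/widget",
  finalUrl: "http://localhost:3000/cart",
  failedSteps: 0,
  console: { totals: { error: 1, log: 14 } },
  network: { count: 23, failedCount: 1 },
  steps: [{ index: 1, label: 'click button:has-text("Submit")', ok: true, durationMs: 412 }],
};

for (const line of summariseReport(example)) console.log(line);
```

The summary only restates what the report records; deciding whether a console error or failed request matters is still left to the reader.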
|
|
618
|
+
|
|
619
|
+
---
|
|
620
|
+
|
|
621
|
+
## What it is *not*
|
|
622
|
+
|
|
623
|
+
- Not an LLM pipeline: `map` generates scaffolding **deterministically** from
|
|
624
|
+
what it observes in the browser — no model picks your assertions. (The
|
|
625
|
+
optional `--summary` is the one exception, and it's off by default.)
|
|
626
|
+
- Not a judge: nothing decides whether a result is good or bad.
|
|
627
|
+
- Not a test runner: there are no `expect()` calls, no pass/fail beyond
|
|
628
|
+
the literal "did the steps execute, did the `--expect` flags hold" gate.
|
|
629
|
+
|
|
630
|
+
What `map` writes is a *starting point* — the meaningful assertions, the
|
|
631
|
+
which-flows-matter decisions, the weighing of a finding all still belong to
|
|
632
|
+
you, or to your AI agent reading the report.
|
|
633
|
+
|
|
634
|
+
---
|
|
635
|
+
|
|
636
|
+
## Contributing
|
|
637
|
+
|
|
638
|
+
Issues and PRs welcome. Run the type check:
|
|
639
|
+
|
|
640
|
+
```bash
|
|
641
|
+
npm run tsc
|
|
642
|
+
```
|
|
643
|
+
|
|
644
|
+
The codebase is intentionally small (~3K LOC), written in TypeScript, with no
|
|
645
|
+
runtime deps beyond `playwright`, `tsx`, and `dotenv`. Keep it that way.
|
|
646
|
+
|
|
647
|
+
---
|
|
648
|
+
|
|
649
|
+
## License
|
|
650
|
+
|
|
651
|
+
MIT — see [LICENSE](LICENSE).
|