web-tester-for-claude 0.4.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Haroon Khan
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,651 @@
1
+ # web-tester-for-claude
2
+
3
+ > Let your coding agent **see and verify** the web changes it makes. web-tester
4
+ > drives your dev site, captures everything to one report, and runs a whole
5
+ > flow in a **single model turn** — not a dozen turn-by-turn tool calls.
6
+
7
+ web-tester wraps Chromium with a single, opinionated capture pipeline: every
8
+ console line, every network request, every page error, every step screenshot,
9
+ the whole video, the full DOM if you ask for it — into one self-contained
10
+ HTML report and one structured `result.json` per run. The agent reads back only
11
+ the slices it needs, so the **edit → verify → edit loop stays cheap and fast**
12
+ even across many steps.
13
+
14
+ It's intentionally a *toolkit*, not a pipeline. There is no LLM stage, no
15
+ test generation, no judging. You (or an AI agent like Claude Code) decide
16
+ what to look at; `web-tester` just makes it cheap to look.
17
+
18
+ ```bash
19
+ # Quick verify a change — fail on any 5xx, assert text is visible, in ~6s
20
+ npx web-tester-for-claude inspect "/products/widget" \
21
+ --step settle --quick \
22
+ --expect "text=Add to Cart" \
23
+ --fail-on http-5xx
24
+
25
+ # Drive a flow, capture state at every step
26
+ npx web-tester-for-claude inspect "/products/widget" \
27
+ --step settle \
28
+ --step screenshot:initial \
29
+ --step "click:button:has-text(\"Add to Cart\")" \
30
+ --step wait:networkidle \
31
+ --step goto:/cart \
32
+ --step screenshot:cart
33
+
34
+ # Bulk-sweep many URLs in parallel
35
+ npx web-tester-for-claude sweep --sitemap --filter '^/products/' --concurrency 4 \
36
+ --fail-on http-5xx
37
+ ```
38
+
39
+ ---
40
+
41
+ ## Why web-tester
42
+
43
+ You can drive Playwright yourself, of course. web-tester earns its keep in
44
+ three ways that show up every day:
45
+
46
+ 1. **Disk-as-cache report shape.** One run captures everything to
47
+ `.web-tester/runs/<id>/`, and the CLI prints the path to a self-contained
48
+ `report.html`. AI agents read `result.json` selectively
49
+ (`jq '.steps[3].network'`) instead of pulling every byte of browser
50
+ state back into their conversation context. For "reproduce this bug,
51
+ tell me what happened" tasks this uses **5–10× fewer tokens** than
52
+ piping DOM snapshots back through stdout.
53
+ 2. **One step grammar.** No heredoc Playwright scripts to maintain.
54
+ `--step click:…`, `--step fill:…=…`, `--step wait:url-contains:…` —
55
+ composable, copy-pasteable from a recipe, no boilerplate.
56
+ 3. **Knowledge files travel with the repo.** Drop project quirks into
57
+ `.web-tester/instructions/*.md` and any future session — yours or your
58
+ AI agent's — gets them as a warm start instead of re-discovering them.
59
+
60
+ The HTML report has a sticky video player with speed presets, a step
61
+ timeline with screenshot + console/network slices, lightboxed full-page
62
+ screenshots, and collapsible global logs. Open it first; the JSON is for
63
+ programmatic reads.
64
+
65
+ ---
66
+
67
+ ## Why a CLI, and not an MCP server?
68
+
69
+ [Microsoft's Playwright MCP](https://github.com/microsoft/playwright-mcp) is
70
+ excellent for *live, interactive* browser control — the agent decides each
71
+ click as it goes. web-tester is deliberately a **CLI** instead, because a
72
+ coding agent's job isn't to click around live; it's to **verify a change it
73
+ just made**, repeatedly, per project. A CLI fits that better in three ways an
74
+ MCP server structurally can't:
75
+
76
+ - **It learns the project over time.** Everything lives in `.web-tester/` —
77
+ recipes, instructions, a route map, journeys — and *grows* as you use it. The
78
+ next session gets a warm start instead of rediscovering your site. An MCP
79
+ server is stateless per project; it remembers nothing between runs.
80
+ - **It produces artifacts.** One run writes a self-contained `report.html`
81
+ (video + step timeline) and a structured `result.json` you can diff, attach
82
+ to a PR, or hand to CI. MCP returns everything into the conversation and
83
+ it's gone.
84
+ - **It barely touches context.** MCP returns a full page snapshot into the
85
+ conversation on *every* step; those tokens pile up and never leave. web-tester
86
+ runs the whole flow in one process and hands back a compact verdict — the
87
+ agent reads `result.json` slices only if it needs them.
88
+
89
+ ### Measured: tokens, round-trips, and cost
90
+
91
+ The same task, run each way — counting what enters the model's context, the
92
+ model round-trips, and the resulting **token cost** ([methodology](#methodology)):
93
+
94
+ ![web-tester vs Playwright MCP — tokens into context per task](docs/mcp-comparison.svg)
95
+
96
+ | Task | Tool | Input tok | Output tok | Round-trips | Cost / run | Per 1,000 runs |
97
+ |---|---|--:|--:|--:|--:|--:|
98
+ | **TodoMVC**<br>add 3, complete 1, filter | Playwright MCP | ~1,240 | ~600 | 6 | $0.013 | $12.70 |
99
+ | | **web-tester** | **~300** | ~150 | **1** | **$0.003** | **$3.16** · 4× less |
100
+ | **Hacker News**<br>verify front page | Playwright MCP | ~10,100 | ~100 | 1 | $0.032 | $31.80 |
101
+ | | **web-tester** | **~220** | ~150 | 1 | **$0.003** | **$2.90** · 11× less |
102
+
103
+ Cost is at Claude Sonnet 4.6 list price ($3 / $15 per 1M input / output tokens)
104
+ and scales linearly with whatever model you run (≈1.7× at Opus 4.8 rates). Input
105
+ tokens are measured; output is a modest per-round-trip estimate.
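As a worked check of the table above, the per-run cost follows directly from the token counts and the quoted list price. This is a minimal sketch: the helper names `costUsd` and `estimateTokens` are illustrative assumptions, not part of web-tester, and the characters ÷ 4 rule is the approximation stated in the methodology note.

```typescript
// Illustrative helpers; the names are hypothetical, not part of web-tester.
// Prices are the list prices quoted above, in USD per 1M tokens.
const INPUT_USD_PER_MTOK = 3;
const OUTPUT_USD_PER_MTOK = 15;

// Rough token estimate used in the methodology: characters / 4, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Cost of one run, billing each context token once.
function costUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) / 1e6
  );
}

// TodoMVC rows from the table: ~1,240 in / ~600 out vs ~300 in / ~150 out.
const mcpRun = costUsd(1240, 600);
const cliRun = costUsd(300, 150);
console.log("MCP per run (USD):", mcpRun);
console.log("web-tester per run (USD):", cliRun);
console.log("ratio:", mcpRun / cliRun);

// A 400-character CLI summary is about 100 tokens under the same rule.
console.log("tokens:", estimateTokens("a".repeat(400)));
```

Multiplying either per-run figure by 1,000 gives the "Per 1,000 runs" column, up to the rounding of the approximate token counts.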
106
+
107
+ Two honest caveats: **raw browser time is comparable** (same engine — the time
108
+ that matters is *model round-trips*, not browser speed), and these numbers
109
+ *under*-count MCP — we reproduced its payload with Playwright's aria snapshot,
110
+ which omits the per-node `[ref]` metadata MCP also sends, and we bill each
111
+ context token only once (a real agent loop re-sends the growing context every
112
+ turn, so MCP's snapshots get re-billed; prompt caching offsets some of that).
113
+ The single Hacker News snapshot alone is ~10k tokens.
114
+
115
+ ### And it compounds on reruns
116
+
117
+ The bigger win isn't the first run — it's the *second*. Playwright MCP has no
118
+ project memory: every rerun re-explores the page from scratch, at full cost.
119
+ web-tester saves the flow on the first run (`inspect … --save-journey todomvc`)
120
+ as a **~500-byte plain-text recipe** — just the URL, the steps, and the
121
+ assertions. Not HTML, not snapshots; the big `report.html`/video stay in the
122
+ disposable `runs/` folder and are never reused. Every rerun is then one command
123
+ (`web-tester journey todomvc`) that replays those steps live — no snapshots, no
124
+ re-deriving selectors. So the cost gap widens with every repeat:
125
+
126
+ ![Cumulative cost across 5 reruns of the same task](docs/mcp-cost.svg)
127
+
128
+ | | Run 1 (fresh) | each rerun | cost after 5 runs |
129
+ |---|---|---|---|
130
+ | **Playwright MCP** | $0.013 · 6 round-trips | $0.013 · 6 round-trips | **$0.064 · 30 round-trips** |
131
+ | **web-tester** | $0.003 · 1 round-trip (+saves the journey) | $0.002 · 1 round-trip | **$0.012 · 5 round-trips** |
132
+
133
+ That's the whole point of a per-project CLI: it *accumulates*. Recipes,
134
+ journeys, and the route map become the project's test memory — the agent does
135
+ the expensive exploration once and replays it cheaply, while a stateless MCP
136
+ server pays full price every time.
137
+
138
+ **The two pair well, they don't compete.** Use Playwright MCP for open-ended,
139
+ exploratory clicking; use web-tester to verify changes cheaply, sweep pages,
140
+ and build the project's test memory. web-tester can even hand MCP a logged-in
141
+ session (its saved storage state) when you want to drive an authenticated app
142
+ by hand.
143
+
144
+ <sub><a name="methodology"></a>**Methodology:** tasks run against
145
+ `demo.playwright.dev/todomvc` and `news.ycombinator.com`, June 2026. MCP input =
146
+ the accessibility snapshot returned per action (captured via Playwright's
147
+ `ariaSnapshot()` on the same live pages); web-tester input = the CLI's printed
148
+ summary; a rerun = `web-tester journey todomvc` against a saved journey. Output
149
+ tokens are a modest per-round-trip estimate. Dollar cost uses Claude Sonnet 4.6
150
+ list pricing ($3 / $15 per 1M input / output). Tokens ≈ characters ÷ 4.
151
+ Benchmark: [`docs/bench.js`](docs/bench.js); charts:
152
+ [`docs/make-charts.js`](docs/make-charts.js).</sub>
153
+
154
+ ---
155
+
156
+ ## Install
157
+
158
+ ```bash
159
+ npx web-tester-for-claude help # zero-install, runs the latest from npm
160
+ ```
161
+
162
+ Or as a project dev dep so the version is pinned:
163
+
164
+ ```bash
165
+ npm install -D web-tester-for-claude
166
+ npx web-tester-for-claude help
167
+ ```
168
+
169
+ The first run will fetch Playwright's Chromium binary on demand if it's not
170
+ already on disk (run `npx playwright install chromium` to fetch it explicitly).
171
+
172
+ ---
173
+
174
+ ## Quick start
175
+
176
+ ```bash
177
+ # 1. Interactive setup: scaffolds .web-tester/, writes a Claude Code skill +
178
+ # CLAUDE.md section, saves your base URL. (Bare `npx web-tester-for-claude` on a fresh
179
+ # project runs this automatically.)
180
+ npx web-tester-for-claude init
181
+
182
+ # 2. Start your dev server
183
+ npm run dev # whatever your dev command is
184
+
185
+ # 3. Map the running site → preset + recipes + journey drafts, all auto-generated
186
+ npx web-tester-for-claude map
187
+
188
+ # 4. Verify a single URL works end-to-end
189
+ npx web-tester-for-claude inspect / \
190
+ --step settle --quick \
191
+ --expect "selector=main" \
192
+ --fail-on http-5xx
193
+ ```
194
+
195
+ The CLI prints the absolute path to `report.html` at the end of every run —
196
+ open it in a browser. Run artifacts land in `.web-tester/runs/` in your project
197
+ (override with `WEB_TESTER_RUNS_DIR`).
198
+
199
+ ---
200
+
201
+ ## Commands
202
+
203
+ | Command | What it does |
204
+ |---|---|
205
+ | `init` | Scaffold `.web-tester/` and wire the agent-instructions section into your `CLAUDE.md` / `AGENTS.md`. Run once per project. |
206
+ | `map` | Crawl your running site, classify every page, and auto-generate a sweep preset, smoke recipes, and form journey drafts. |
207
+ | `inspect <url>` | Drive one page, optionally with `--step …`, capture everything. |
208
+ | `sweep` | Run inspect concurrently across many URLs (one Chromium, N contexts). |
209
+ | `journey <name>` | Run a saved JSON journey from `.web-tester/journeys/<name>.json`. |
210
+ | `journey` (no arg) | List available journeys. |
211
+ | `impact` | Diff-aware advisory run — match changed files against rules in `.web-tester/impact-rules.json` and run the indicated sweeps/journeys. **Always exits 0.** |
212
+ | `kb` / `kb <topic>` | List or print a `.md` file in `.web-tester/instructions/` (or `.web-tester/`). |
213
+ | `help` | Full reference. |
214
+
215
+ Every command targets `http://localhost:3000` by default. Point at anything
216
+ else with `WEB_TESTER_BASE_URL=…`.
217
+
218
+ ---
219
+
220
+ ## Setup — `web-tester init`
221
+
222
+ The **first time** you run web-tester in a project, it drops into an
223
+ interactive setup (you can also run it explicitly any time):
224
+
225
+ ```bash
226
+ npx web-tester-for-claude # first run → guided setup
227
+ npx web-tester-for-claude init # or run setup explicitly
228
+ ```
229
+
230
+ It asks a few questions (each with a sensible default — just press Enter):
231
+ your dev server base URL, which agent file to write, how eagerly Claude
232
+ should reach for web-tester, whether to generate a Claude Code skill, and
233
+ whether to install Chromium now. Then it writes:
234
+
235
+ - **`.web-tester/`** — starter `impact-rules.json`, `urls-smoke.txt`, an
236
+ example journey, `instructions/` recipes, and a `config.json` holding your
237
+ base URL (so commands work without setting `WEB_TESTER_BASE_URL`). Run
238
+ artifacts go in `.web-tester/runs/`, gitignored automatically.
239
+ - **`.claude/skills/web-tester/SKILL.md`** — a [Claude Code
240
+ skill](https://docs.claude.com/en/docs/claude-code/skills) so Claude can
241
+ drive web-tester natively (auto-invoked for runtime-behavior questions, or
242
+ on demand via `/web-tester`), with the right `Bash(npx web-tester-for-claude *)`
243
+ permissions pre-approved.
244
+ - **`CLAUDE.md`** (or `AGENTS.md`) — a marker-fenced agent-instructions block
245
+ teaching *when* to reach for web-tester. Re-running replaces it in place;
246
+ your surrounding notes are untouched.
247
+ - **`.claude/settings.local.json`** — your `WEB_TESTER_AUTO_USE` preference,
248
+ merged in without clobbering existing settings.
249
+
250
+ Everything is idempotent — existing files are skipped (settings and config are
251
+ merged, never overwritten). Run non-interactively in CI with `--yes`.
252
+
253
+ | Flag | Purpose |
254
+ |---|---|
255
+ | `-y, --yes` | Non-interactive; accept all defaults. |
256
+ | `--base-url <url>` | Set the dev server base URL. |
257
+ | `--auto-use <on\|ask\|off>` | How eagerly Claude should reach for web-tester. |
258
+ | `--no-skill` | Don't generate the Claude Code skill. |
259
+ | `--no-agent` / `--agent-file <p>` | Skip, or target a specific agent file. |
260
+ | `--install-browser` | Fetch Chromium during setup. |
261
+ | `--force` | Overwrite existing scaffolded files. |
262
+
263
+ ---
264
+
265
+ ## Mapping a site — `web-tester map`
266
+
267
+ Point `map` at your running dev server and it crawls the site, classifies
268
+ every page, and writes a ready-to-use coverage starter kit — no hand-authoring:
269
+
270
+ ```bash
271
+ npx web-tester-for-claude map # crawl from BASE_URL (uses sitemap.xml if present)
272
+ npx web-tester-for-claude map /docs # crawl just the /docs subtree
273
+ npx web-tester-for-claude map --no-sitemap --depth 2 # follow links only, two hops deep
274
+ ```
275
+
276
+ It discovers pages two ways: it seeds from `sitemap.xml` when one exists, and
277
+ follows same-origin links breadth-first. Each page is classified (`home`,
278
+ `list`, `detail`, `form`, `auth`, `search`, `content`) and collapsed by route
279
+ template (`/products/12` and `/products/34` → `/products/:id`, capped per
280
+ template so a big catalog can't dominate). From that it generates, into
281
+ `.web-tester/`:
282
+
283
+ - **`urls-map.txt`** — one representative path per route, annotated with the
284
+ strongest expectation pack each page satisfied. Sweep it with
285
+ `web-tester sweep --preset map --fail-on http-5xx`.
286
+ - **`instructions/recipes.md`** — a copy-paste `inspect` one-liner per page
287
+ type, in a marker-fenced block that `map` refreshes on each run.
288
+ - **`journeys/*.json`** — a draft journey per distinct form found (fields
289
+ pre-filled with sample values). Review the selectors and values, and add
289
+ expectations before relying on them.
291
+
292
+ Plus an HTML site map (`runs/map-<id>/map.html`) with a screenshot, status,
293
+ and link count per route.
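The route-template collapsing described above can be pictured with a small sketch. It assumes that only all-digit path segments count as parameters; the real classifier may use other signals. The names `routeTemplate` and `capPerTemplate` are hypothetical.

```typescript
// Hypothetical sketch: collapse concrete paths into route templates and keep
// at most `perTemplate` representatives of each template.
function routeTemplate(path: string): string {
  return path
    .split("/")
    .map((segment) => (/^\d+$/.test(segment) ? ":id" : segment))
    .join("/");
}

function capPerTemplate(paths: string[], perTemplate: number): Map<string, string[]> {
  const byTemplate = new Map<string, string[]>();
  for (const path of paths) {
    const template = routeTemplate(path);
    const kept = byTemplate.get(template) ?? [];
    // Once a template has enough representatives, later paths are dropped.
    if (kept.length < perTemplate) {
      kept.push(path);
    }
    byTemplate.set(template, kept);
  }
  return byTemplate;
}

const grouped = capPerTemplate(
  ["/products/12", "/products/34", "/products/56", "/products/78", "/pricing"],
  3,
);
console.log(grouped.get("/products/:id")); // the first three product paths
console.log(grouped.get("/pricing"));
```

With a cap of 3 (the documented `--per-template` default), a catalog of thousands of product pages still contributes only three fetches to the crawl budget.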
294
+
295
+ | Flag | Purpose |
296
+ |---|---|
297
+ | `--limit <n>` | Max pages to fetch (default 50). |
298
+ | `--depth <n>` | Max link hops when crawling (default 3; ignored for sitemap seeds). |
299
+ | `--per-template <n>` | Max pages fetched per route template (default 3). |
300
+ | `--max-journeys <n>` | Cap on generated journey drafts (default 12). |
301
+ | `--no-sitemap` | Don't seed from `sitemap.xml`; follow links only. |
302
+ | `--sitemap <url>` | Use a specific sitemap URL. |
303
+ | `--filter` / `--exclude <regex>` | Keep / drop matching paths. |
304
+ | `--no-screenshots` | Skip per-page screenshots (faster). |
305
+ | `--force` | Overwrite existing generated journeys. |
306
+
307
+ Everything `map` writes is yours to edit — it's a starting point that turns a
308
+ cold project into a covered one in one command.
309
+
310
+ ---
311
+
312
+ ## What lands in `runs/<id>/`
313
+
314
+ | File | Contents |
315
+ |---|---|
316
+ | `report.html` | **Self-contained HTML report.** Open this first. |
317
+ | `result.json` | Full structured report — same data as the HTML. Programmatic reads. |
318
+ | `video/page@<hash>.webm` | Screen recording (omit with `--no-video` or `--quick`). |
319
+ | `initial.png` / `initial-full.png` | Viewport + full-page after first load. |
320
+ | `final.png` / `final-full.png` | Viewport + full-page after last step. |
321
+ | `steps/NN-<label>.png` | One screenshot per step. |
322
+ | `initial.html` / `final.html` | Page HTML (only if `--html`). |
323
+ | `console.json`, `network.json` | Raw streams (also embedded in `result.json`). |
324
+
325
+ `--quick` is the most useful flag: no video, no full-page screenshots, no
326
+ HTML capture, no AI summary. Pair with `--expect` / `--fail-on` for a real
327
+ pass/fail gate in 5–10s.
328
+
329
+ ---
330
+
331
+ ## Step grammar
332
+
333
+ `--step` can be repeated. Steps run sequentially, with their own screenshot
334
+ plus the slice of console / network / page-errors produced *during* that step.
335
+
336
+ ```
337
+ goto:<url> navigate (absolute or path)
338
+ reload reload current page
339
+ wait:<load|domcontentloaded|networkidle>
340
+ wait:<ms> sleep N ms
341
+ wait:<selector> wait for selector
342
+ wait:text=<exact text> wait for matching text
343
+ wait:url-stable[=<ms>] wait until URL changes at least once then
344
+ stays still for <ms> (default 250)
345
+ wait:url-contains:<sub>[@<ms>] wait until URL contains <sub>
346
+ (use @ not = so <sub> can include '=')
347
+ settle[:<ms>] wait for data-attr-selected-label to
348
+ populate on any [data-attr-name] element.
349
+ Fast-paths in ~3s if none are present.
350
+ Apps without data-attrs should prefer
351
+ 'wait:networkidle'.
352
+ click:<selector> click (Playwright locator; supports CSS
353
+ and :has-text())
354
+ hover:<selector>
355
+ fill:<selector>=<value> native input
356
+ react-fill:<selector>=<value> React-controlled input (calls the native
357
+ value setter + dispatches synthetic
358
+ input/change/blur events)
359
+ press:<selector>=<key> keyboard press
360
+ select:<selector>=<value> native <select>
361
+ scroll:<top|bottom|<px>>
362
+ screenshot[:<name>] viewport screenshot
363
+ screenshot-full[:<name>] full-page screenshot
364
+ eval:<JS expression> run in page context; result attached to step
365
+ ```
366
+
367
+ For long step chains, drop them in a JSON file and pass `--steps-file flow.json`:
368
+
369
+ ```json
370
+ ["settle", "screenshot:initial", "click:button:has-text(\"Submit\")",
371
+ "wait:networkidle", "goto:/thanks"]
372
+ ```
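When a flow is assembled by a script rather than typed by hand, the steps file can be generated. The sketch below assumes two hypothetical helpers, `clickText` and `fill`, that only emit strings in the grammar above; they are not part of web-tester.

```typescript
// Hypothetical helpers that emit strings in the step grammar above.
// JSON.stringify supplies the double quotes inside :has-text("...").
function clickText(tag: string, text: string): string {
  return `click:${tag}:has-text(${JSON.stringify(text)})`;
}

function fill(selector: string, value: string): string {
  return `fill:${selector}=${value}`;
}

const steps: string[] = [
  "settle",
  "screenshot:initial",
  clickText("button", "Submit"),
  "wait:networkidle",
  "goto:/thanks",
];

// Write this string to flow.json and pass it with --steps-file flow.json.
const stepsFileContents = JSON.stringify(steps, null, 2);
console.log(stepsFileContents);
console.log(fill("input[name=email]", "test@example.com"));
```

Serializing with `JSON.stringify` takes care of the `\"` escaping that the hand-written file above needs around `Submit`.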
373
+
374
+ ---
375
+
376
+ ## Verdict & assertions
377
+
378
+ Use these to turn a run into a real pass/fail gate.
379
+
380
+ | Flag | Purpose |
381
+ |---|---|
382
+ | `--fail-on <list>` | Comma-separated kinds that flip `ok` to false: `page-errors`, `console-errors`, `http-4xx`, `http-5xx`. Exit code 1 on any trigger. |
383
+ | `--expect <kind>=<value>` | (Repeatable) final-page assertion. Kinds: `text=…`, `no-text=…`, `selector=…`, `no-selector=…`, `attr=<Name>:<value>`. |
384
+ | `--persist <ms>` | Re-check every `--expect` after waiting `<ms>`. **Both** checks must pass — catches transient state (a toast that flashes for 1s then disappears). |
385
+
386
+ ```bash
387
+ # Don't trust a single check for derived state. --persist re-validates.
388
+ npx web-tester-for-claude inspect /pricing \
389
+ --step settle --quick \
390
+ --expect 'text=$49/mo' \
391
+ --persist 2500 \
392
+ --fail-on http-5xx
393
+ ```
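The "both checks must pass" rule for `--persist` can be modelled in a few lines. This is a simplified sketch, not web-tester's implementation: page state is represented as a hypothetical function of time, and the expectation is sampled once at the first check and once again `persistMs` later.

```typescript
// Hypothetical model of --persist: an expectation is sampled at `checkAtMs`
// and again `persistMs` later, and both samples must hold.
type VisibleAt = (timeMs: number) => boolean;

function persistVerdict(visibleAt: VisibleAt, checkAtMs: number, persistMs: number): boolean {
  return visibleAt(checkAtMs) && visibleAt(checkAtMs + persistMs);
}

// A toast that shows for the first second, then disappears.
const toast: VisibleAt = (t) => t >= 0 && t < 1000;
// A price label that renders at 300 ms and then stays.
const priceLabel: VisibleAt = (t) => t >= 300;

console.log(persistVerdict(toast, 500, 2500));      // false: gone by the re-check
console.log(persistVerdict(priceLabel, 500, 2500)); // true: holds at both samples
```

A single check at 500 ms would accept both cases; only the second sample separates the transient toast from the stable label.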
394
+
395
+ ---
396
+
397
+ ## Deeper capture — `--deep`
398
+
399
+ When a one-line console message isn't enough, add `--deep` to `inspect`. It
400
+ turns on three heavier signals that are off by default:
401
+
402
+ - **Request + response bodies** for XHR/fetch/document requests (textual
403
+ content only, truncated). The bug is often *in the payload* — a `200` that
404
+ returns `{"error":"out of stock"}` looks fine until you read the body.
405
+ - **Local scope at every uncaught exception.** web-tester attaches a Chrome
406
+ DevTools Protocol debugger, pauses on each throw, dumps the throwing
407
+ function's local + closure variables, and resumes immediately. Instead of
408
+ just `TypeError: cannot read 'id' of undefined`, you get
409
+ `local: userId=42, cart={ items: 3, total: 9.99 }` at the throw site.
410
+ - **Unhandled promise rejections**, which the normal `pageerror` stream
411
+ misses entirely.
412
+
413
+ ```bash
414
+ npx web-tester-for-claude inspect /checkout \
415
+ --deep --quick \
416
+ --step "click:button:has-text(\"Pay\")" \
417
+ --step wait:networkidle
418
+ ```
419
+
420
+ The CLI prints the exceptions with their scope; the full dump (and bodies)
421
+ land in `result.json` under `deepErrors`, `unhandledRejections`, and each
422
+ `network.entries[].responseBody`. The debugger pauses add overhead, so reach
423
+ for `--deep` when you're diagnosing a specific failure, not on every run.
424
+
425
+ ---
426
+
427
+ ## Authentication
428
+
429
+ Most real flows live behind a login. web-tester drives the login **once** and
430
+ reuses the session, so gated pages work without logging in every run.
431
+
432
+ ```bash
433
+ # 1. Run your login flow with --save-session
434
+ web-tester inspect /login \
435
+ --step "fill:input[name=email]=test@example.com" \
436
+ --step "fill:input[name=password]=your-test-password" \
437
+ --step "click:button[type=submit]" \
438
+ --step "wait:url-contains:/dashboard" \
439
+ --save-session
440
+
441
+ # 2. Every later inspect / sweep / journey is now authenticated automatically.
442
+ web-tester inspect /account --quick --expect "text=Sign out"
443
+
444
+ # Force a logged-out run any time:
445
+ web-tester inspect / --no-session
446
+ ```
447
+
448
+ `--save-session` writes the browser session — cookies + localStorage — to
449
+ `~/.web-tester/session.json`. That file is **machine-local**: it lives in your
450
+ home directory, not the repo, and is never committed. It's saved only after a
451
+ clean run (so a failed login can't overwrite a good session), and refreshed
452
+ automatically on later runs so rotating tokens keep working. You can save the
453
+ login as a journey (`--save-journey login`) and re-authenticate with
454
+ `web-tester journey login --save-session`.
455
+
456
+ > ⚠️ **Use test credentials only — at your own risk.**
457
+ >
458
+ > Anything you put in a `--step`, a saved journey, or otherwise hand to
459
+ > web-tester is **visible to the AI agent** driving it. Credentials written
460
+ > into a step are stored in **plain text** in `.web-tester/journeys/*.json`,
461
+ > which is committed to your repo. The saved session in
462
+ > `~/.web-tester/session.json` grants access to anything that account can reach.
463
+ >
464
+ > Never use production, personal, or privileged accounts. Use a **disposable
465
+ > test account** scoped to a safe environment, and treat anything reachable
466
+ > with it as exposed. You assume all responsibility for credentials, tokens,
467
+ > and actions taken with them.
468
+
469
+ ---
470
+
471
+ ## `.web-tester/` — your project's recipes
472
+
473
+ Everything project-specific lives in `.web-tester/` at your project root.
474
+ All files are optional; commands fail gracefully when they're missing.
475
+
476
+ ```
477
+ .web-tester/
478
+ impact-rules.json # rules for `web-tester impact`
479
+ urls-<name>.txt # URL preset for `web-tester sweep --preset <name>`
480
+ journeys/<name>.json # saved flows for `web-tester journey <name>`
481
+ instructions/*.md # knowledge base (or .web-tester/*.md flat for
482
+ # small projects)
483
+ ```
484
+
485
+ ### `impact-rules.json`
486
+
487
+ Each rule names a set of path globs and what to run if any changed file
488
+ matches. `web-tester impact` reads `git diff` against `origin/main` (or
489
+ `--base <ref>`) and executes matched rules. **Advisory only — never blocks
490
+ your push.**
491
+
492
+ ```json
493
+ {
494
+ "rules": [
495
+ {
496
+ "name": "Auth code changed — full sign-up journey",
497
+ "when_changed_any": ["src/auth/**", "src/pages/api/auth/**"],
498
+ "journey": "signup"
499
+ },
500
+ {
501
+ "name": "Shared layout changed — sweep top pages",
502
+ "when_changed_any": ["src/components/Layout/**"],
503
+ "sweep": {
504
+ "urls": ["/", "/pricing", "/docs"],
505
+ "packs": ["homepage"]
506
+ }
507
+ }
508
+ ]
509
+ }
510
+ ```
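To make the rule semantics concrete, here is a minimal sketch of matching changed files against `when_changed_any`. The glob handling is an assumption (`**` crosses `/`, `*` stays within one segment) and may differ from what `web-tester impact` does; `globToRegExp` and `matchedRules` are hypothetical names.

```typescript
// Hypothetical sketch of rule matching; field names follow the JSON above.
interface ImpactRule {
  name: string;
  when_changed_any: string[];
  journey?: string;
  sweep?: { urls: string[]; packs?: string[] };
}

// Minimal glob support: "**" matches across "/", "*" within one segment.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp(`^${pattern}$`);
}

// A rule fires when any of its globs matches any changed file.
function matchedRules(rules: ImpactRule[], changedFiles: string[]): ImpactRule[] {
  return rules.filter((rule) =>
    rule.when_changed_any.some((glob) => {
      const re = globToRegExp(glob);
      return changedFiles.some((file) => re.test(file));
    }),
  );
}

const rules: ImpactRule[] = [
  { name: "auth", when_changed_any: ["src/auth/**", "src/pages/api/auth/**"], journey: "signup" },
  {
    name: "layout",
    when_changed_any: ["src/components/Layout/**"],
    sweep: { urls: ["/", "/pricing", "/docs"], packs: ["homepage"] },
  },
];

console.log(matchedRules(rules, ["src/auth/session.ts"]).map((rule) => rule.name));
```

In this model a change that touches only `README.md` matches no rule, so nothing would run.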
511
+
512
+ ### `urls-<name>.txt`
513
+
514
+ Newline-separated URLs/paths. `#` comments allowed. Per-URL `#pack=<name>`
515
+ annotations apply the named expectation pack on top of anything global.
516
+
517
+ ```
518
+ # urls-smoke.txt
519
+ / #pack=homepage
520
+ /pricing
521
+ /docs #pack=has-h1 #pack=has-main
522
+ ```
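The preset format is simple enough to read with a few lines of code. The parser below is a hypothetical illustration of the format as described (whole-line `#` comments, whitespace-separated `#pack=<name>` annotations), not web-tester's own loader.

```typescript
// Hypothetical parser for the urls-<name>.txt format described above.
interface UrlEntry {
  url: string;
  packs: string[];
}

function parseUrlPreset(contents: string): UrlEntry[] {
  const entries: UrlEntry[] = [];
  for (const rawLine of contents.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and whole-line comments.
    if (line === "" || line.startsWith("#")) continue;
    const tokens = line.split(/\s+/);
    const url = tokens[0] ?? "";
    // Every later token of the form #pack=<name> adds one pack.
    const packs = tokens
      .slice(1)
      .filter((token) => token.startsWith("#pack="))
      .map((token) => token.slice("#pack=".length));
    entries.push({ url, packs });
  }
  return entries;
}

const sample = [
  "# urls-smoke.txt",
  "/ #pack=homepage",
  "/pricing",
  "/docs #pack=has-h1 #pack=has-main",
].join("\n");

console.log(parseUrlPreset(sample));
```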
523
+
524
+ ### `journeys/<name>.json`
525
+
526
+ Bundles a URL + step chain + assertions for `web-tester journey <name>`.
527
+
528
+ ```json
529
+ {
530
+ "description": "User signs up, lands on dashboard",
531
+ "url": "/signup",
532
+ "steps": [
533
+ "settle",
534
+ "fill:input[name=email]=test@example.com",
535
+ "fill:input[name=password]=hunter2",
536
+ "click:button[type=submit]",
537
+ "wait:url-contains:/dashboard"
538
+ ],
539
+ "expectations": ["text=Welcome", "selector=[data-test=dashboard]"],
540
+ "failOn": "http-5xx"
541
+ }
542
+ ```
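A journey is roughly an `inspect` invocation stored as data. The sketch below maps the fields shown above onto the documented `--step`, `--expect`, and `--fail-on` flags. The `Journey` type and `journeyToInspectArgs` are hypothetical, and the real loader may handle more fields.

```typescript
// Hypothetical mapping from a journey file to the equivalent inspect flags.
interface Journey {
  description?: string;
  url: string;
  steps: string[];
  expectations?: string[];
  failOn?: string;
}

function journeyToInspectArgs(journey: Journey): string[] {
  const args = ["inspect", journey.url];
  for (const step of journey.steps) args.push("--step", step);
  for (const expectation of journey.expectations ?? []) args.push("--expect", expectation);
  if (journey.failOn) args.push("--fail-on", journey.failOn);
  return args;
}

const signup: Journey = {
  description: "User signs up, lands on dashboard",
  url: "/signup",
  steps: ["settle", "click:button[type=submit]", "wait:url-contains:/dashboard"],
  expectations: ["text=Welcome"],
  failOn: "http-5xx",
};

console.log(journeyToInspectArgs(signup));
```

Keeping the arguments as an array, rather than one joined string, avoids shell-quoting problems if they are later passed to a process spawner.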
543
+
544
+ ### `instructions/*.md`
545
+
546
+ Plain-English notes on your project's quirks. Run `web-tester kb` to list
547
+ them, `web-tester kb <topic>` to print one. AI agents read these instead of
548
+ re-discovering domain knowledge by grepping your source.
549
+
550
+ ---
551
+
552
+ ## Built-in expectation packs
553
+
554
+ Pass `--pack <name>` to apply one to every URL in a sweep, or annotate
555
+ URLs in a `urls-*.txt` file with `#pack=<name>`.
556
+
557
+ | Pack | Asserts |
558
+ |---|---|
559
+ | `homepage` | `<header>` + `<footer>` present |
560
+ | `static` | `<header>` + `<footer>` present |
561
+ | `category` | `<header>` + `<footer>` + an internal anchor inside `<main>` containing an `<img>` |
562
+ | `has-main` | `<main>` present |
563
+ | `has-h1` | `<h1>` present |
564
+
565
+ Add project-specific packs in `src/inspector/packs.ts` (PRs welcome for
566
+ genuinely generic patterns) or wrap web-tester with your own pre-flight
567
+ script that injects `--expect …` flags.
568
+
569
+ ---
570
+
571
+ ## Environment
572
+
573
+ | Var | Default | Purpose |
574
+ |---|---|---|
575
+ | `WEB_TESTER_BASE_URL` | `http://localhost:3000` | Resolves bare paths to absolute URLs. |
576
+ | `GOTO_TIMEOUT_MS` | `30000` | Initial `page.goto` timeout. |
577
+ | `STEP_TIMEOUT_MS` | `15000` | Per-step action timeout. |
578
+ | `SETTLE_TIMEOUT_MS` | `30000` | `settle` step ceiling. |
579
+
580
+ `.env` files in the cwd are loaded automatically (via `dotenv`).
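How a bare path becomes an absolute URL can be illustrated with the standard WHATWG `URL` class. This is a sketch of the idea, not web-tester's code: `resolveTarget` is a hypothetical name, `https://staging.example.com` is a made-up host, and the value of `WEB_TESTER_BASE_URL` is assumed to be passed in as `baseUrl`.

```typescript
// Hypothetical resolver: absolute URLs pass through unchanged, bare paths
// are joined onto the base URL.
const DEFAULT_BASE_URL = "http://localhost:3000";

function resolveTarget(target: string, baseUrl: string = DEFAULT_BASE_URL): string {
  return new URL(target, baseUrl).toString();
}

console.log(resolveTarget("/cart"));                                // http://localhost:3000/cart
console.log(resolveTarget("/cart", "https://staging.example.com")); // https://staging.example.com/cart
console.log(resolveTarget("https://example.com/docs"));             // https://example.com/docs
```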
581
+
582
+ ---
583
+
584
+ ## Report shape (excerpt)
585
+
586
+ ```jsonc
587
+ {
588
+ "runId": "2026-06-04T17-12-03",
589
+ "ok": false,
590
+ "video": "video/page@abc….webm",
591
+ "requestedUrl": "http://localhost:3000/products/widget",
592
+ "finalUrl": "http://localhost:3000/cart",
593
+ "title": "Cart | Acme",
594
+ "durationMs": 8423,
595
+ "failedSteps": 0,
596
+ "verdictTriggers": [],
597
+ "initial": { "screenshot": "initial.png", "attrs": [] },
598
+ "final": { "screenshot": "final.png", "attrs": [] },
599
+ "console": { "totals": { "error": 1, "log": 14 }, "entries": [] },
600
+ "network": { "count": 23, "failedCount": 1, "entries": [] },
601
+ "pageErrors": [],
602
+ "steps": [
603
+ {
604
+ "index": 1,
605
+ "step": { "kind": "click", "selector": "button:has-text(\"Submit\")" },
606
+ "label": "click button:has-text(\"Submit\")",
607
+ "ok": true,
608
+ "durationMs": 412,
609
+ "url": "http://localhost:3000/products/widget",
610
+ "screenshot": "steps/01-click.png",
611
+ "console": [],
612
+ "network": [{ "method": "POST", "url": ".../cart", "status": 200, "durationMs": 187 }],
613
+ "pageErrors": []
614
+ }
615
+ ]
616
+ }
617
+ ```
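Reading only the interesting slices of a report is straightforward once it is parsed. The types below are a partial, hypothetical typing inferred from the excerpt above, and `suspiciousSteps` is an illustrative helper rather than something web-tester ships.

```typescript
// Partial, hypothetical typing of result.json based on the excerpt above.
interface NetworkEntry {
  method: string;
  url: string;
  status: number;
  durationMs: number;
}
interface StepResult {
  index: number;
  label: string;
  ok: boolean;
  network: NetworkEntry[];
  pageErrors: unknown[];
}
interface RunReport {
  ok: boolean;
  steps: StepResult[];
}

// One compact line per step that failed, threw, or saw a >= 400 response.
function suspiciousSteps(report: RunReport): string[] {
  const lines: string[] = [];
  for (const step of report.steps) {
    const bad = step.network.filter((entry) => entry.status >= 400);
    if (!step.ok || bad.length > 0 || step.pageErrors.length > 0) {
      const detail = bad.map((entry) => `${entry.method} ${entry.url} -> ${entry.status}`);
      lines.push(`#${step.index} ${step.label}: ${detail.join(", ") || "no failed requests"}`);
    }
  }
  return lines;
}

// In practice the object would come from JSON.parse on a run's result.json.
const sampleReport: RunReport = {
  ok: false,
  steps: [
    {
      index: 1,
      label: "click button",
      ok: true,
      network: [{ method: "POST", url: ".../cart", status: 200, durationMs: 187 }],
      pageErrors: [],
    },
    {
      index: 2,
      label: "goto /cart",
      ok: true,
      network: [{ method: "GET", url: ".../cart", status: 500, durationMs: 92 }],
      pageErrors: [],
    },
  ],
};

console.log(suspiciousSteps(sampleReport));
```

This is the programmatic equivalent of a targeted `jq` query: the agent sees a few summary lines instead of the whole report.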
618
+
619
+ ---
620
+
621
+ ## What it is *not*
622
+
623
+ - Not an LLM pipeline: `map` generates scaffolding **deterministically** from
624
+ what it observes in the browser — no model picks your assertions. (The
625
+ optional `--summary` is the one exception, and it's off by default.)
626
+ - Not a judge: nothing decides whether a result is good or bad.
627
+ - Not a test runner: there are no `expect()` calls, no pass/fail beyond
628
+ the literal "did the steps execute, did the `--expect` flags hold" gate.
629
+
630
+ What `map` writes is a *starting point* — the meaningful assertions, the
631
+ which-flows-matter decisions, the weighing of a finding all still belong to
632
+ you, or to your AI agent reading the report.
633
+
634
+ ---
635
+
636
+ ## Contributing
637
+
638
+ Issues and PRs welcome. Run the type check:
639
+
640
+ ```bash
641
+ npm run tsc
642
+ ```
643
+
644
+ The codebase is intentionally small (~3K LOC), written in TypeScript with no
645
+ runtime deps beyond `playwright`, `tsx`, and `dotenv`. Keep it that way.
646
+
647
+ ---
648
+
649
+ ## License
650
+
651
+ MIT — see [LICENSE](LICENSE).