npm - @blazediff/agent - Versions diffs - 0.0.1 - Mend

@blazediff/agent 0.0.1

Files changed (11)

package/JUDGING.md ADDED

@@ -0,0 +1,60 @@
+# blazediff — judging ambiguous diffs
+Triggered when `check --judge host` reports `pendingJudgments > 0`. The heuristic returned `ambiguous` for those entries and is deferring to you.
+## Look for a mass-mask first
+**Before opening any tile**, scan the slim payload's `results[]` for a recurring pattern in `verdict.headline`. If many entries share the same shape (e.g. "1 content-change @ bottom-right (0.02%, low)" across docs-react, docs-ssim, docs-jest …), they're almost certainly the same site-wide source (footer timestamp, theme-toggle, "Last updated" stamp, dynamic year). Open one or two representative tiles to confirm, then **mask once at the source** (see `MASKING.md` → "Mass-masking shared noise") instead of writing N near-identical verdicts.
+Heuristics to spot a mass-mask candidate before judging:
+- `pendingJudgments >= 5` with verdict headlines that look near-identical (same position, similar pixel %).
+- Region tiles at the same `@ bottom-right` / `@ top-right` / `@ bottom` position across entries — that's the same layout slot rendering different per-page content.
+When in doubt, open one tile from the largest group and one from the smallest; if they're the same UI element, the whole group is one mask.
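A minimal sketch of the headline scan described in this section, assuming the slim payload's `results[]` entries carry `verdict.headline` in the shape of the example quoted there ("1 content-change @ bottom-right (0.02%, low)"). The function name `mass_mask_candidates`, the regex, and the `min_group` threshold are illustrative assumptions, not part of the CLI.

```python
import re
from collections import defaultdict

# Assumed headline shape, modelled on the example in this section:
#   "1 content-change @ bottom-right (0.02%, low)"
HEADLINE_RE = re.compile(r"^\d+\s+(?P<kind>[\w-]+)\s+@\s+(?P<pos>[\w-]+)")


def mass_mask_candidates(results, min_group=5):
    """Group non-pass entries by (change kind, position) parsed from the headline."""
    groups = defaultdict(list)
    for entry in results:
        headline = (entry.get("verdict") or {}).get("headline", "")
        match = HEADLINE_RE.match(headline)
        if match:
            groups[(match["kind"], match["pos"])].append(entry["id"])
    # Keep only groups large enough to suggest one shared source.
    return {key: ids for key, ids in groups.items() if len(ids) >= min_group}
```

Any group this returns is only a hint: opening one or two representative tiles is still the step that confirms the entries share a UI element.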
+## Per-entry judging
+For genuinely-distinct ambiguous entries, judge per-entry. For each `<TARGET>/.blazediff/judgments/<id>/`:
+> **Token discipline — read regions, not the page.** `regions.png` is a tight crop of every changed area at native resolution; `locator.png` is a ~400 px overview thumbnail. Together they're typically 10–100x smaller than the full baseline/actual/diff PNGs and contain everything needed to judge the change. **Never** open `paths.baseline` / `paths.actual` / `paths.diff` as a first move — they are full-page fallbacks for the rare case where a region clearly continues outside its crop (e.g., a layout shift that runs off the bottom of the tile). For "is something wrong with this screenshot?" investigations of any kind, default to regions first.
+1. Read `request.json`. It contains `regions[]` (bbox + pixelCount + change type per region), `paths.locator`, `paths.tiles`, `heuristicVerdict`, and full `manifestEntry` context. The `paths.baseline` / `paths.actual` / `paths.diff` fields are full-page fallbacks — prefer the tiles.
+2. **Batch-read `locator.png` and `regions.png` in a single tool call** (one message with two parallel Read invocations). `locator.png` is a ~400 px thumbnail of the diff with every change region outlined in red — use it for spatial orientation. `regions.png` is a vertical stack of `[baseline | actual]` pairs, one row per change region at native resolution. Row order matches the `regions[]` array (top = largest by pixelCount). When multiple pending entries exist, batch reads across entries too — every Read in one tool call.
+3. Base your verdict primarily on what `regions.png` shows. Only open the full diff / baseline / actual PNGs (`paths.diff` etc., relative to the target) if the composite is itself ambiguous (e.g., a change clearly continues outside the cropped region).
+   - **Dimension-change verdicts** (`headline: "image dimensions changed"`) have no `regions.png` because pixel-region analysis can't run across differently-sized images — `regions[]` will be empty. Open `paths.baseline` and `paths.actual` directly. Page-height shifts are usually intentional content edits (text added/removed, a section grew); label `intentional-likely` after confirming the content delta matches a recent commit.
+4. Write `<TARGET>/.blazediff/judgments/<id>/verdict.json` (next to the request.json) with shape:
+   ```json
+   {
+     "id": "<same id>",
+     "verdict": {
+       "label": "regression-likely" | "intentional-likely" | "noise-likely",
+       "headline": "<one-line summary>",
+       "rationale": ["<short reason>", "..."],
+       "action": "investigate" | "rewrite-if-intended" | "ignore-or-rewrite"
+     },
+     "rationale": "<one-paragraph explanation of what you saw>",
+     "confidence": 0.0
+   }
+   ```
+   Pick `action` to match `label`: `regression-likely` → `investigate`, `intentional-likely` → `rewrite-if-intended`, `noise-likely` → `ignore-or-rewrite`.
+5. Run `blazediff-agent --cwd "$TARGET" check --apply-judgments --json`. The CLI regenerates `summary.md` from your verdicts (no re-screenshot).
+6. Resume the check flow with the upgraded verdicts.
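The verdict file from step 4 could be produced along these lines. This is a sketch only: `build_verdict` and `write_verdict` are hypothetical helper names, and the 0 to 1 range check on `confidence` is an assumption inferred from the `0.0` placeholder in the shape above.

```python
import json
from pathlib import Path

# Label -> action pairing as stated in step 4.
ACTION_FOR_LABEL = {
    "regression-likely": "investigate",
    "intentional-likely": "rewrite-if-intended",
    "noise-likely": "ignore-or-rewrite",
}


def build_verdict(entry_id, label, headline, reasons, explanation, confidence):
    """Assemble a verdict dict, deriving `action` from `label`."""
    if label not in ACTION_FOR_LABEL:
        raise ValueError(f"unknown label: {label!r}")
    if not 0.0 <= confidence <= 1.0:  # assumed range
        raise ValueError("confidence must be between 0 and 1")
    return {
        "id": entry_id,
        "verdict": {
            "label": label,
            "headline": headline,
            "rationale": list(reasons),
            "action": ACTION_FOR_LABEL[label],
        },
        "rationale": explanation,
        "confidence": confidence,
    }


def write_verdict(target, verdict):
    """Write verdict.json next to request.json for this entry id."""
    out_dir = Path(target) / ".blazediff" / "judgments" / verdict["id"]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "verdict.json"
    path.write_text(json.dumps(verdict, indent=2) + "\n", encoding="utf-8")
    return path
```

Deriving `action` from `label` in one place keeps the two fields from drifting apart when many verdicts are written in a batch.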
+## zsh-safe shell loops for fanning out ids
+When you must fan out a list of ids in a `Bash` call to write verdict files, **never** rely on word-splitting on a space-delimited variable — under zsh (macOS default) it doesn't split, so `for id in $IDS` iterates once with the whole string as a single value, breaking paths like `judgments/<id>/verdict.json`. Use a heredoc + `while IFS= read -r id` instead:
+```sh
+TARGET="$(cd /abs/path && pwd -P)"
+while IFS= read -r id; do
+  [ -n "$id" ] || continue
+  mkdir -p "$TARGET/.blazediff/judgments/$id"
+  cat >"$TARGET/.blazediff/judgments/$id/verdict.json" <<JSON
+  {"id":"$id","verdict":{"label":"noise-likely","headline":"...","rationale":["..."],"action":"ignore-or-rewrite"},"rationale":"...","confidence":0.9}
+JSON
+done <<'IDS'
+docs
+docs-bun
+docs-cli
+IDS
+```
+Avoid bash-only constructs (`declare -A`, `mapfile`, `(( ))` with strings): under zsh they either fail or behave differently, and the failure is easy to miss.

package/LICENSE.md ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Teimur Gasanov
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/MASKING.md ADDED

@@ -0,0 +1,86 @@
+# blazediff — masking unstable regions
+When a diff is `noise-likely`, or when a `regression-likely`/`intentional-likely` diff is actually caused by something inherently non-deterministic in the page, the right fix is usually a **mask**, not a rebaseline. A rebaseline just resets the clock on a flake; a mask removes it.
+## When to mask
+**Mask whenever the changing region is:**
+- An auto-cycling animation: carousels, marquees, demo widgets with `setInterval`, video posters, Lottie loops.
+- A third-party iframe or embed: Storybook, YouTube, Twitter, codesandbox, Stripe checkout — anything whose load timing or content you don't control. `networkidle` does not wait for embedded iframes' subresources to finish.
+- Time-derived content: `Date.now()` clocks, "X minutes ago" timestamps, today-highlighted calendars, expiry countdowns, copyright years on Dec 31 / Jan 1.
+- Per-session randomness: avatars seeded from session id, A/B-test variants, generated IDs, shuffled lists.
+- Anti-bot / personalization noise: cookie banners that load asynchronously, recommendation strips, geolocation-derived prices.
+**Don't mask** real content that just happens to be changing — that's the change you want the test to catch. If unsure: mask only after you've seen the same region flake twice, or after you've confirmed the source is inherently non-deterministic (e.g., grep'd for `setInterval` / `<iframe` / `Date.now()` in the component).
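The "grep the component" check mentioned above might look like the following sketch. The marker list covers only the three patterns named in that sentence and is not exhaustive; `find_markers` and `scan_component` are hypothetical names.

```python
import re
from pathlib import Path

# Patterns for the three markers named in the guidance; illustrative only.
MARKERS = {
    "setInterval": re.compile(r"\bsetInterval\s*\("),
    "iframe": re.compile(r"<iframe\b", re.IGNORECASE),
    "Date.now": re.compile(r"\bDate\.now\s*\("),
}


def find_markers(source_text):
    """Return the sorted names of non-determinism markers present in the text."""
    return sorted(name for name, pattern in MARKERS.items() if pattern.search(source_text))


def scan_component(path):
    """Read one component file and report which markers it contains."""
    return find_markers(Path(path).read_text(encoding="utf-8", errors="replace"))
```

A non-empty result supports masking; an empty one suggests waiting until the region has flaked twice before masking it.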
+## Default attribute (no manifest changes)
+The agent always masks any element matching `[data-blazediff-agent-mask]`. No selector needs to be added to the manifest entry. This is the preferred path whenever you can edit the source.
+```tsx
+<div data-blazediff-agent-mask>...</div>
+// or, with a reason inline:
+<div data-blazediff-agent-mask="report-carousel">...</div>
+```
+The attribute's value is ignored by the matcher; presence is enough. Use the value to document intent for future readers.
+When the default attribute covers the unstable region, leave the manifest entry's `mask: []` and skip the rest of this file.
+## Picking a selector (for cases you can't annotate)
+Per-entry `mask` selectors still exist for cases where you can't edit the source (third-party iframes, transient build artifacts, components owned by another team). They're CSS selectors passed to `document.querySelectorAll`; each matched element is covered with a magenta rectangle over its bounding rect in both baseline and actual.
+- For external/third-party embeds, target the element type: `iframe`, `video`, `[data-testid="storybook-preview"]`.
+- Avoid Tailwind class chains and nth-child selectors. They break on the next style tweak.
+- Scope matters. Each manifest entry has its own `mask` array, so `iframe` on `/examples/web-components` won't affect `/home`. Use the narrowest selector that covers the unstable region.
+- If you find yourself reaching for a per-entry selector because the source is yours, prefer the `data-blazediff-agent-mask` attribute instead. Zero manifest churn, survives refactors.
+## Mass-masking shared noise
+When the same unstable region appears across many routes (a footer "Last updated" stamp, a global theme toggle, a sitewide cookie banner), don't write a per-entry mask N times. The single best move is to add `data-blazediff-agent-mask` to the shared component (layout, header, footer, doc-framework template). The default matcher catches it on every route automatically. No manifest edits.
+```tsx
+<footer>
+  Last updated <span data-blazediff-agent-mask>{date}</span>
+</footer>
+```
+After tagging the source, re-capture the affected routes so baselines reflect the new mask:
+```sh
+TARGET="$(cd /abs/path && pwd -P)"
+# Build the entries list from the ids you saw in pendingJudgments (or results[] if already judged)
+python3 -c '
+import json
+ids = """docs docs-bun docs-cli docs-core docs-jest docs-react""".split()
+url_map = {
+  "docs": "/docs", "docs-bun": "/docs/bun", "docs-cli": "/docs/cli",
+  "docs-core": "/docs/core", "docs-jest": "/docs/jest", "docs-react": "/docs/react",
+}
+print(json.dumps([{"id": i, "url": url_map[i]} for i in ids]))
+' | blazediff-agent --cwd "$TARGET" capture --stdin --mode baseline --json
+```
+Re-run `check` / `run`. The pending count should collapse from N to 0 (or to a much smaller distinct set).
+If the shared element is third-party (can't be annotated), fall back to a per-entry selector mask and pass `mask: ["iframe"]` (or whatever fits) in the capture JSON.
+## Applying a mask
+(Re-baselines the entry; treat as user-confirmed when the user said "mask".)
+1. If you can edit the source, add `data-blazediff-agent-mask` to the unstable element. No manifest changes are needed; the default matcher handles it.
+2. If you can't edit the source (third-party iframe, framework-owned element), prepare a per-entry CSS selector. Pass the new mask list to `capture --stdin --mode baseline`, which rewrites both the manifest mask and the baseline PNG:
+   ```sh
+   cat <<'EOF' | blazediff-agent --cwd "$TARGET" capture --stdin --mode baseline --json
+   [
+     {"id":"examples-web-components","url":"/examples/web-components","mask":["iframe"]}
+   ]
+   EOF
+   ```
+   The mask list replaces the existing one. Include every selector you want kept, not just the new one. To inspect the current mask, grep the manifest (read-only).
+3. Re-run `run` / `check` to confirm the entry now passes. If it still fails, the attribute or selector didn't match anything. Verify in browser devtools on the live page.
+4. If `config.devServer` is non-null and you started it for the recapture, run `serve-status --kill --json` afterwards.
+The default attribute is preferred when you own the source. Per-entry CSS selectors keep the blast radius small when you don't.
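Because the mask list passed to `capture` replaces the existing one, a helper can read the current mask (read-only) and merge in the new selector before building the stdin payload. This is a sketch under one assumption: that `manifest.json` keeps its entries in a top-level `entries` array. `capture_payload_with_mask` is a hypothetical name.

```python
import json
from pathlib import Path


def capture_payload_with_mask(target, entry_id, new_selectors):
    """Build the JSON for `capture --stdin`, keeping the entry's existing selectors."""
    manifest_path = Path(target) / ".blazediff" / "manifest.json"
    # Read-only: the manifest itself is never edited here.
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Assumption: entries live under a top-level "entries" array.
    entry = next((e for e in manifest["entries"] if e["id"] == entry_id), None)
    if entry is None:
        raise KeyError(entry_id)
    merged = list(entry.get("mask", []))
    for selector in new_selectors:
        if selector not in merged:
            merged.append(selector)
    return json.dumps([{"id": entry["id"], "url": entry["url"], "mask": merged}])
```

The returned string is what would be piped into `blazediff-agent --cwd "$TARGET" capture --stdin --mode baseline --json`.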

package/README.md ADDED

@@ -0,0 +1,136 @@
+# @blazediff/agent
+Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots via Playwright, compares them against committed baselines using the native BlazeDiff core, and hands ambiguous diffs back to your coding agent (Claude Code, Cursor, Codex) to judge.
+The package ships a deterministic CLI (`blazediff-agent`) plus a portable playbook (`skill/blazediff/SKILL.md`) that any host coding agent drives. No embedded LLM call, no API key in the default flow - the host supplies the loop, vision, and context.
+## Install
+```sh
+npm install -g @blazediff/agent
+# or as a dev dep
+npm install --save-dev @blazediff/agent
+```
+First run will prompt to install Chromium via the bundled Playwright. No sudo, no `npx playwright install --with-deps`.
+## Quickstart
+```sh
+# 1. Author (from your coding agent, via /blazediff or equivalent)
+blazediff-agent init --json                # writes .blazediff/config.json
+blazediff-agent browsers install --check   # ensure chromium
+# host agent discovers routes and pipes them to:
+echo '[{"id":"home","url":"/"}]' | blazediff-agent capture --stdin --mode baseline --json
+# 2. Check (CI or local)
+blazediff-agent run --judge host --json    # pipelined: capture → diff → verdict → judge
+# or
+blazediff-agent check --judge host --json  # single-pool, simpler
+# 3. Accept an intentional change (re-baseline)
+blazediff-agent rewrite home --json
+```
+Commit `.blazediff/` (config + manifest + baselines). Run `check` / `run` in CI.
+## Onboarding a coding agent
+`blazediff-agent onboard` installs the playbook into whatever coding-agent harness you're using:
+```sh
+blazediff-agent onboard --json                 # auto-detect Claude Code / Codex / Cursor in cwd
+blazediff-agent onboard --harness codex        # explicit (override detection)
+blazediff-agent onboard --harness all          # all three
+blazediff-agent onboard --force                # overwrite existing playbook
+```
+Per harness:
+- **Claude Code** writes `<project>/.claude/skills/blazediff/SKILL.md`
+- **Codex** writes `~/.codex/prompts/blazediff.md` (user-global; Codex CLI looks here for slash-command prompts)
+- **Cursor** writes `<project>/.cursor/rules/blazediff.mdc` with the right frontmatter
+Detection is project-local (looks for `.claude/` / `CLAUDE.md` / `AGENTS.md` for Claude Code, `AGENTS.md` / `.codex/` for Codex, `.cursor/` / `.cursorrules` for Cursor). Both Claude Code and Codex read `AGENTS.md`, so a project with only `AGENTS.md` will install for both. On a TTY with no detection, the command prompts.
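The detection rule in the paragraph above can be pictured as a marker-file lookup. This is a sketch of that rule, not the package's implementation; the harness keys (`claude-code`, `codex`, `cursor`) are labels chosen for the example, not confirmed `--harness` values.

```python
from pathlib import Path

# Marker files per harness, as listed in the paragraph above.
MARKERS = {
    "claude-code": [".claude", "CLAUDE.md", "AGENTS.md"],
    "codex": ["AGENTS.md", ".codex"],
    "cursor": [".cursor", ".cursorrules"],
}


def detect_harnesses(project_dir):
    """Return every harness whose marker file or directory exists in the project."""
    root = Path(project_dir)
    return [
        name
        for name, markers in MARKERS.items()
        if any((root / marker).exists() for marker in markers)
    ]
```

A project containing only `AGENTS.md` matches both the Claude Code and Codex rows, which is the overlap the paragraph calls out.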
+## Commands
+| Command | What it does |
+|---|---|
+| `onboard` | Install the playbook into the detected coding-agent harness (Claude Code, Codex, Cursor) |
+| `init` | Detect framework/dev-script, write `.blazediff/config.json` + `.gitignore` |
+| `discover` | BFS-crawl routes from `baseUrl` as a fallback when source-walking fails |
+| `capture --stdin` | Read a JSON list of routes, screenshot each, write baselines/actuals + manifest |
+| `check` | Re-capture every manifest entry, diff against baseline, emit `CheckReport` |
+| `run` | Same as `check` but pipelines capture → diff → verdict → judge via LangGraph for parallelism + LangSmith traces |
+| `rewrite <id...>` | Re-baseline existing manifest entries (preserves mask/viewport/waitFor) |
+| `diff <id>` | Re-diff one entry against its actual capture without re-screenshotting |
+| `manifest` | Inspect / list manifest entries |
+| `serve-status` | Start / stop / probe the configured dev server |
+| `browsers install` | Install bundled Playwright Chromium |
+| `reset --yes` | Wipe `.blazediff/` entirely |
+All commands accept `--json` for machine-readable output. Pass `--cwd <abs-path>` to operate on a sub-directory (e.g. an app inside a monorepo).
+## Judging model
+The diff heuristic emits one of `regression-likely | intentional-likely | noise-likely | ambiguous`. The first three are acted on directly. For `ambiguous`, the `--judge host` backend writes a `JudgmentRequest` (region tiles + locator thumbnail + bbox metadata) to `.blazediff/judgments/<id>/request.json` and exits with a non-zero `pendingJudgments` count.
+The host coding agent reads `regions.png` (a tight crop of every change at native resolution) and `locator.png` (a small overview thumbnail), writes a `verdict.json` next to the request, and re-runs `check --apply-judgments` to merge the verdicts into the report. The full playbook lives in `skill/blazediff/SKILL.md` at the repo root.
+This handoff was designed for vision-token efficiency: the region tiles are 10–100× smaller than the full-page PNGs and contain everything needed to classify the change.
+## Masking unstable regions
+Auto-cycling carousels, third-party iframes, clocks, randomized avatars and other non-deterministic content should be masked, not re-baselined. The agent paints a magenta rectangle over each masked region in both baseline and actual, so the diff is zeroed.
+The default and preferred path: add `data-blazediff-agent-mask` to the source element. The agent auto-masks anything matching `[data-blazediff-agent-mask]` on every route. No manifest changes needed.
+```tsx
+<div data-blazediff-agent-mask>...</div>
+// or with a reason inline:
+<div data-blazediff-agent-mask="report-carousel">...</div>
+```
+For external embeds you can't annotate (third-party iframes, framework-owned elements), fall back to a per-entry CSS selector in `manifest.entries[].mask` and re-capture via `capture --stdin --mode baseline`. The mask list replaces the existing one. See the SKILL playbook for full guidance.
+## Configuration
+`.blazediff/config.json`:
+```json
+{
+  "devServer": { "command": "pnpm dev", "port": 3000, "readyTimeoutMs": 60000 },
+  "framework": "next",
+  "packageManager": "pnpm",
+  "baseUrl": "http://127.0.0.1:3000"
+}
+```
+`.blazediff/manifest.json` is written by `capture` - never edit it directly. Each entry holds `{ id, url, mask[], viewport, waitFor, fullPage }`.
+## CI
+Only `check` / `run` are allowed in CI (`CI=1` or no TTY). Capture/rewrite/init/reset are explicitly blocked. Exit codes:
+- `0` - all passed
+- `1` - at least one regression, intentional, or pending-judgment entry
+- non-zero with structured error JSON on infra failures
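A CI wrapper might separate "tests need attention" from "infrastructure broke" roughly as follows. The sketch assumes that a failing `check --json` run still prints the report with a `results` field, and that infra failures print some other JSON or none at all; the README does not spell out either detail. `classify_ci_result` and `run_check` are hypothetical names.

```python
import json
import subprocess


def classify_ci_result(returncode, stdout):
    """Map an exit code plus --json stdout onto a coarse CI outcome."""
    if returncode == 0:
        return "passed"
    try:
        report = json.loads(stdout)
    except json.JSONDecodeError:
        return "infra-error"
    # Assumption: a CheckReport carries "results"; an error payload does not.
    if returncode == 1 and isinstance(report, dict) and "results" in report:
        return "needs-attention"
    return "infra-error"


def run_check(target):
    """Run `check` against an absolute target directory and classify the outcome."""
    proc = subprocess.run(
        ["blazediff-agent", "--cwd", str(target), "check", "--judge", "host", "--json"],
        capture_output=True,
        text=True,
    )
    return classify_ci_result(proc.returncode, proc.stdout)
```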
+## Files
+- `src/cli.ts` - entry point
+- `src/check.ts` / `src/graph/` - single-pool and LangGraph-pipelined runners
+- `src/judge/` - pluggable judge (`host` / `none`), region-tile generator, verdict applier
+- `src/browser/launch.ts` - Chromium serialization + mask overlay painter
+- `src/discover/` - source-walking for Next.js / Vite / Remix + BFS fallback
+- `src/diff/` - heuristic verdict pipeline
+- `src/report/markdown.ts` - `summary.md` generator (5-column `id | baseline | actual | diff | verdict`)
+- `ROADMAP.md` - phase tracking
+- Playbook: `skill/blazediff/SKILL.md` (repo root)
+## Links
+- [GitHub](https://github.com/teimurjan/blazediff/tree/main/packages/agent)
+- [BlazeDiff docs](https://blazediff.dev/docs)
+- [Roadmap](./ROADMAP.md)

package/SKILL.md ADDED

@@ -0,0 +1,93 @@
+---
+name: blazediff
+description: Run, author, or update BlazeDiff visual regression tests. Trigger on "visual test", "screenshot regression", "blazediff", "/blazediff".
+---
+# blazediff
+CLI binary is `blazediff-agent` (the name `blazediff` belongs to the cargo image-diff binary).
+Sibling files in this skill directory — read on demand:
+- `JUDGING.md` — judging ambiguous diffs (`pendingJudgments > 0`) + zsh-safe shell loops for writing verdicts.
+- `MASKING.md` — picking selectors, mass-masking shared noise across routes, applying masks.
+## Be terse
+- Pass `--json` on every `blazediff-agent` call; parse fields. Do not echo CLI output.
+- `check`/`run --json` returns a **slim payload**: `{ summaryPath, createdAt, totalEntries, passed, failed, pendingJudgments, results }`. `results` lists non-pass entries only, each as `{ id, url, status, verdict?: { label, headline, action } }`. The full per-entry detail (regions, paths, rationale) lives in `<TARGET>/.blazediff/summary.md` and `<TARGET>/.blazediff/judgments/<id>/request.json`.
+- Authoring uses ONE `capture --stdin` call piped a JSON list of routes — never a per-route loop.
+- No `ls`, `cat`, `find` for paths the CLI already returns.
+- One final summary line — for authoring: `N captured | M skipped (reasons) | K auth-gated`; for check: `P/T passed (F failed)` plus failure ids.
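For the check flow, the final summary line can be derived straight from the slim payload fields listed above. A small sketch; `check_summary` is a hypothetical name and the `": "` separator before the ids is a formatting choice made for the example.

```python
def check_summary(report):
    """Format `P/T passed (F failed)` and append the non-pass ids, if any."""
    line = f'{report["passed"]}/{report["totalEntries"]} passed ({report["failed"]} failed)'
    failure_ids = [result["id"] for result in report.get("results", [])]
    return f"{line}: {', '.join(failure_ids)}" if failure_ids else line
```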
+## Pick the target directory
+- If the user names a sub-folder ("test apps/website", "set up tests for packages/foo"), resolve it to an **absolute path** and pass that to every call:
+  ```sh
+  TARGET="$(cd /path/to/repo/apps/website && pwd -P)"
+  blazediff-agent --cwd "$TARGET" ...
+  ```
+- Never pass a relative `--cwd`. Never `cd` into the target. The CLI catches the common double-nest case (`apps/website/apps/website`) but absolute paths avoid it entirely.
+## Pick the mode
+- `.blazediff/manifest.json` (in `--cwd`) exists → **check**.
+- Else → **authoring**.
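The target and mode rules from the two sections above reduce to a path resolution and a file check. A sketch, with `pick_mode` as a hypothetical helper name:

```python
from pathlib import Path


def pick_mode(target):
    """Resolve the target to an absolute path and choose check vs authoring."""
    root = Path(target).resolve()  # absolute, symlinks resolved (similar to `pwd -P`)
    manifest = root / ".blazediff" / "manifest.json"
    mode = "check" if manifest.is_file() else "authoring"
    return root, mode
```

The resolved `root` is what would be passed to every `--cwd` argument afterwards.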
+## check
+1. `blazediff-agent --cwd "$TARGET" check --judge host --json` (the CLI starts the dev server if `devServer` is configured; otherwise hits the configured baseUrl directly).
+   - Prefer `run` instead for large sites (≥10 routes): `blazediff-agent --cwd "$TARGET" run --judge host --json`. Same flags, same report shape — it just pipelines capture → diff → verdict → judge through a LangGraph state graph so per-entry stages overlap. Use `check` when you want the simpler, single-pool implementation; `run` when wall-time matters or you want LangSmith traces.
+   - **Cold Next.js / Vite servers** can take 5–30s to compile a route on first hit; under the default 30s `page.goto` timeout, the first route in a fresh dev session sometimes times out (`page.goto: Timeout 30000ms exceeded`). If that happens, **rerun the same `check` command**: the dev server is now warm, so the next pass usually completes. Don't change `waitFor` or restart the dev server; the issue is one-time compilation, not a routing or wait-condition bug.
+   - **Codex sandbox / restricted-bash environments** may block Playwright's Chromium launch (`browserType.launch: Target page, context or browser has been closed`). Rerun with whatever sandbox escalation the host agent provides (in Codex: approve the command for "always run outside sandbox"). This is not a blazediff bug.
+2. Pass: report `P/T passed`. Stop.
+3. **Pending judgments** (`pendingJudgments > 0`): the heuristic couldn't classify some diffs. You are the judge — **read `JUDGING.md`** in this skill directory for the full workflow. After judging, re-run `check --apply-judgments --json`, then re-evaluate as if from step 2/4.
+4. Fail: read `<TARGET>/.blazediff/summary.md` (5-column `id | baseline | actual | diff | verdict` table with inline image previews; the `--json` stdout has the same data as `CheckReport`). Each failing entry has a `verdict`: `{ label, headline, action, rationale[] }`. Emit one line per failure: `<id>: <verdict.label> — <verdict.headline>`. Then act per `verdict.label`:
+   - `regression-likely` → point the user at `<TARGET>/.blazediff/actual/<id>.diff.png` and ask them to investigate. Do not rewrite.
+   - `intentional-likely` → ask the user to confirm; if yes, `blazediff-agent --cwd "$TARGET" rewrite <id> --json`.
+   - `noise-likely` → ask the user once: ignore, mask, or rewrite. **Prefer masking over rewriting** when the source is inherently non-deterministic (carousel, iframe, clock, randomized avatar) — rewriting only delays the next flake. See `MASKING.md`. If rewriting, group with other rewrites in one call (`rewrite <id1> <id2> ...`).
+   Never rewrite or mask without explicit user confirmation.
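The per-label handling in step 4 could be organised as below. This sketch only plans the follow-ups; asking the user and running the command remain separate, confirmed steps. `next_steps` and `rewrite_command` are hypothetical names, and the step labels in the tuples are invented for the example.

```python
import shlex


def next_steps(results, target):
    """Map each failing entry to the follow-up described for its verdict label."""
    steps = []
    for entry in results:
        label = (entry.get("verdict") or {}).get("label")
        if label == "regression-likely":
            diff_png = f"{target}/.blazediff/actual/{entry['id']}.diff.png"
            steps.append((entry["id"], "investigate", diff_png))
        elif label == "intentional-likely":
            steps.append((entry["id"], "confirm-then-rewrite", None))
        elif label == "noise-likely":
            steps.append((entry["id"], "ask-ignore-mask-or-rewrite", None))
    return steps


def rewrite_command(target, confirmed_ids):
    """Build one grouped `rewrite` call for ids the user has explicitly confirmed."""
    if not confirmed_ids:
        return None
    args = ["blazediff-agent", "--cwd", str(target), "rewrite", *confirmed_ids, "--json"]
    return shlex.join(args)
```

Grouping the confirmed ids into a single command follows the advice above to batch rewrites in one call.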
+## accept regression (rebaseline)
+Call `rewrite` only when `verdict.action === "rewrite-if-intended"` or the user has explicitly confirmed that a failing entry's new state is correct:
+- All failing entries from the last check: `blazediff-agent --cwd "$TARGET" rewrite --failed --json`
+- Specific entries: `blazediff-agent --cwd "$TARGET" rewrite <id> [<id>...] --json`
+- Whole manifest (rare; ask before doing this): `blazediff-agent --cwd "$TARGET" rewrite --all --json`
+`rewrite` preserves the existing manifest entry's mask, viewport, waitFor, and fullPage settings — only the PNG is regenerated. After it returns, suggest the user re-run `check` to confirm and then `git add .blazediff/baselines/ && git commit`.
+## reset (start from scratch)
+When the user asks to wipe blazediff's state and start over (manifest stale beyond repair, switching frameworks, etc.):
+- `blazediff-agent --cwd "$TARGET" reset --yes --json` — deletes the entire `.blazediff/` directory (config, manifest, baselines, actual, judgments, summary, pid/log). Tracked dev server is stopped first.
+- Then re-run the full **authoring** flow below. Do not call `reset` without explicit user request — it discards committed baselines.
+## authoring
+1. **Config.**
+   - User points at a URL ("test https://blazediff.dev", "server's running on :3001") → `blazediff-agent --cwd "$TARGET" init --url <url> --json`.
+   - Local app, dev script ambiguous or wrong → `init --dev-command "<cmd>" --port <n> --json`.
+   - Local app, single obvious dev script → `init --json`. On error or ambiguity, the CLI lists candidates; pick one with `--dev-script <name>`.
+2. **Chromium.** `blazediff-agent browsers install --check --json`. If `installed: false`, run `blazediff-agent browsers install`. This uses the bundled playwright — no sudo, no `npx playwright install --with-deps`. (On Linux, OS-level deps for chromium may still need `npx playwright install-deps chromium` if the run fails on missing libs; tell the user.)
+3. **Dev server.** If `config.devServer` is non-null, run `blazediff-agent --cwd "$TARGET" serve-status --detach --json`. **Expect this to wait up to 60s** for the port to open before returning. Do not background or poll it.
+4. **Discover routes.** Prefer reading the router source directly:
+   - Next.js: `app/**/page.{tsx,jsx,mdx}` + `pages/**/*.{tsx,jsx}` (skip `api/`, `_app`, `_document`, `_error`).
+   - Vite + react-router: parse `<Route path=...>` in `router.{ts,tsx}`.
+   - Remix / SvelteKit / Astro: walk `app/routes` or `src/routes`.
+   If the framework is unknown or the router source is opaque, call `blazediff-agent --cwd "$TARGET" discover --json`. That command does a BFS crawl from the configured `baseUrl` (depth 2, up to 50 routes), reads `.next/routes-manifest.json` if present, and reads `/sitemap.xml`. It's a fallback for when source-walking fails.
+5. **Filter.** Drop `/api/*`, dynamic segments without sample data, redirects/404s. Flag auth-gated as `auth: required` (record in manifest, don't capture).
+6. **Capture in one call.** Build a JSON array of route entries and pipe it through stdin:
+   ```sh
+   cat <<'EOF' | blazediff-agent --cwd "$TARGET" capture --stdin --mode baseline --json
+   [
+     {"id":"home","url":"/","mask":[".timestamp"]},
+     {"id":"pricing","url":"/pricing"}
+   ]
+   EOF
+   ```
+   Entries: `{ id, url, mask?, viewport?, waitFor?, fullPage?, mode? }`. Only `id` and `url` required. Manifest entries are written automatically (pass `--no-manifest` to skip).
+   - `id`: semantic kebab-case (`home`, `pricing`, `docs-getting-started`), not URL slug.
+   - `mask`: CSS selectors for unstable regions (timestamps, randomized IDs, avatars, "X ago" times, carousels, third-party iframes). Omit if none. The agent always masks `[data-blazediff-agent-mask]` automatically, so prefer tagging the source element when you can edit it. See `MASKING.md` for full guidance.
+7. **Teardown — ALWAYS run, even on error.** If `config.devServer` is non-null, run `blazediff-agent --cwd "$TARGET" serve-status --kill --json` as the very last step regardless of capture success/failure. The CLI kills by tracked PID first, then falls back to whatever process is listening on the configured port — so it cleans up stale dev servers from prior crashed runs too. If the kill returns `stopped: false`, no server was running; that's fine. Wrap your capture call so this step runs even if capture failed mid-list (shell `trap`, try/finally in the host agent's flow, etc.).
+8. **Final summary line.** Suggest `git add .blazediff/ && git commit`.
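Steps 5 to 7 can be sketched as a filter, a single capture call, and a teardown that runs on every exit path. Assumptions: the filter drops every dynamic segment (it has no notion of sample data), it cannot detect redirects or 404s, and the caller supplies semantic ids. The helper names are hypothetical; the CLI arguments are the ones shown in the steps above.

```python
import json
import re
import subprocess

# Dynamic segments such as Next.js "[slug]" or router-style ":id" / "$id".
DYNAMIC_SEGMENT = re.compile(r"\[[^\]]+\]|/[:$]\w+")


def keep_route(url):
    """Drop API routes and any route containing a dynamic segment."""
    if url == "/api" or url.startswith("/api/"):
        return False
    return not DYNAMIC_SEGMENT.search(url)


def capture_payload(routes):
    """routes: mapping of semantic id -> url. Returns the JSON for `capture --stdin`."""
    return json.dumps([{"id": i, "url": u} for i, u in routes.items() if keep_route(u)])


def capture_with_teardown(target, routes, has_dev_server):
    """Capture all routes in one call; always stop the dev server afterwards."""
    base = ["blazediff-agent", "--cwd", str(target)]
    try:
        return subprocess.run(
            base + ["capture", "--stdin", "--mode", "baseline", "--json"],
            input=capture_payload(routes),
            capture_output=True,
            text=True,
        )
    finally:
        if has_dev_server:
            subprocess.run(base + ["serve-status", "--kill", "--json"], capture_output=True, text=True)
```

The `try`/`finally` mirrors the teardown rule: the kill step runs whether capture succeeds, fails, or raises.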
+## Hard rules
+- Never `--mode baseline` an existing manifest entry without explicit user request.
+- Never edit `.blazediff/manifest.json` directly.
+- In CI (`CI=1` or no TTY), only `check` / `run` are allowed.
+- A route that times out is logged once in the result array and skipped — never block the run.
+- Never leave a dev server running after authoring exits. Teardown is mandatory on every exit path (success, capture failure, user interrupt). If you can't run teardown for some reason, tell the user the port number to kill manually.