@blazediff/agent 0.0.1 → 0.1.0

package/README.md CHANGED
@@ -1,98 +1,89 @@
1
1
  # @blazediff/agent
2
2
 
3
- Agentic visual regression for BlazeDiff. Auto-discovers routes, captures deterministic screenshots via Playwright, compares them against committed baselines using the native BlazeDiff core, and hands ambiguous diffs back to your coding agent (Claude Code, Cursor, Codex) to judge.
3
+ <div align="center">
4
4
 
5
- The package ships a deterministic CLI (`blazediff-agent`) plus a portable playbook (`skill/blazediff/SKILL.md`) that any host coding agent drives. No embedded LLM call, no API key in the default flow - the host supplies the loop, vision, and context.
5
+ [![npm bundle size](https://img.shields.io/npm/unpacked-size/%40blazediff%2Fagent?style=flat-square)](https://www.npmjs.com/package/@blazediff/agent)
6
+ [![NPM Downloads](https://img.shields.io/npm/dy/%40blazediff%2Fagent?style=flat-square)](https://www.npmjs.com/package/@blazediff/agent)
6
7
 
7
- ## Install
8
+ </div>
8
9
 
9
- ```sh
10
- npm install -g @blazediff/agent
11
- # or as a dev dep
12
- npm install --save-dev @blazediff/agent
13
- ```
14
-
15
- First run will prompt to install Chromium via the bundled Playwright. No sudo, no `npx playwright install --with-deps`.
16
-
17
- ## Quickstart
10
+ Agentic visual regression for BlazeDiff. Discovers routes, screenshots them with Playwright, diffs against committed baselines, and hands ambiguous diffs to a coding agent (Claude Code, Codex, Cursor) for judgment.
18
11
 
19
- ```sh
20
- # 1. Author (from your coding agent, via /blazediff or equivalent)
21
- blazediff-agent init --json # writes .blazediff/config.json
22
- blazediff-agent browsers install --check # ensure chromium
23
- # host agent discovers routes and pipes them to:
24
- echo '[{"id":"home","url":"/"}]' | blazediff-agent capture --stdin --mode baseline --json
12
+ **Features:**
13
+ - Deterministic CLI: no embedded LLM, no API key required
14
+ - Source-walking route discovery for Next.js / Vite / Remix (BFS fallback)
15
+ - Heuristic verdict: `regression-likely | intentional-likely | noise-likely | ambiguous`
16
+ - LangGraph pipeline with per-entry subgraphs, suspendable via `interrupt()` and resumable from an on-disk checkpoint
17
+ - Region-tile handoff to host agents (10–100× smaller than full PNGs)
18
+ - Auto-masking via `data-blazediff-agent-mask` attribute
25
19
 
26
- # 2. Check (CI or local)
27
- blazediff-agent run --judge host --json # pipelined: capture → diff → verdict → judge
28
- # or
29
- blazediff-agent check --judge host --json # single-pool, simpler
20
+ ## Installation
30
21
 
31
- # 3. Accept intentional regression
32
- blazediff-agent rewrite home --json
22
+ ```bash
23
+ npm install --save-dev @blazediff/agent
33
24
  ```
34
25
 
35
- Commit `.blazediff/` (config + manifest + baselines). Run `check` / `run` in CI.
36
-
37
- ## Onboarding a coding agent
26
+ First run prompts to install bundled Chromium.
38
27
 
39
- `blazediff-agent onboard` installs the playbook into whatever coding-agent harness you're using:
28
+ ## Quickstart
40
29
 
41
- ```sh
42
- blazediff-agent onboard --json # auto-detect Claude Code / Codex / Cursor in cwd
43
- blazediff-agent onboard --harness codex # explicit (override detection)
44
- blazediff-agent onboard --harness all # all three
45
- blazediff-agent onboard --force # overwrite existing playbook
30
+ ```bash
31
+ blazediff-agent init # write .blazediff/config.json
32
+ blazediff-agent browsers install # ensure chromium
33
+ echo '[{"id":"home","url":"/"}]' \
34
+ | blazediff-agent capture --stdin --mode baseline
35
+ blazediff-agent check --judge host # CI
36
+ blazediff-agent rewrite home # accept intentional change
46
37
  ```
47
38
 
48
- Per harness:
49
-
50
- - **Claude Code** writes `<project>/.claude/skills/blazediff/SKILL.md`
51
- - **Codex** writes `~/.codex/prompts/blazediff.md` (user-global; Codex CLI looks here for slash-command prompts)
52
- - **Cursor** writes `<project>/.cursor/rules/blazediff.mdc` with the right frontmatter
53
-
54
- Detection is project-local (looks for `.claude/` / `CLAUDE.md` / `AGENTS.md` for Claude Code, `AGENTS.md` / `.codex/` for Codex, `.cursor/` / `.cursorrules` for Cursor). Both Claude Code and Codex read `AGENTS.md`, so a project with only `AGENTS.md` will install for both. On a TTY with no detection, the command prompts.
39
+ Commit `.blazediff/` (config + manifest + baselines). All commands accept `--json`.
55
40
 
56
41
  ## Commands
57
42
 
58
- | Command | What it does |
59
- |---|---|
60
- | `onboard` | Install the playbook into the detected coding-agent harness (Claude Code, Codex, Cursor) |
61
- | `init` | Detect framework/dev-script, write `.blazediff/config.json` + `.gitignore` |
62
- | `discover` | BFS-crawl routes from `baseUrl` as a fallback when source-walking fails |
63
- | `capture --stdin` | Read a JSON list of routes, screenshot each, write baselines/actuals + manifest |
64
- | `check` | Re-capture every manifest entry, diff against baseline, emit `CheckReport` |
65
- | `run` | Same as `check` but pipelines capture → diff → verdict → judge via LangGraph for parallelism + LangSmith traces |
66
- | `rewrite <id...>` | Re-baseline existing manifest entries (preserves mask/viewport/waitFor) |
67
- | `diff <id>` | Re-diff one entry against its actual capture without re-screenshotting |
68
- | `manifest` | Inspect / list manifest entries |
69
- | `serve-status` | Start / stop / probe the configured dev server |
70
- | `browsers install` | Install bundled Playwright Chromium |
71
- | `reset --yes` | Wipe `.blazediff/` entirely |
72
-
73
- All commands accept `--json` for machine-readable output. Pass `--cwd <abs-path>` to operate on a sub-directory (e.g. an app inside a monorepo).
74
-
75
- ## Judging model
76
-
77
- The diff heuristic emits one of `regression-likely | intentional-likely | noise-likely | ambiguous`. The first three are acted on directly. For `ambiguous`, the `--judge host` backend writes a `JudgmentRequest` (region tiles + locator thumbnail + bbox metadata) to `.blazediff/judgments/<id>/request.json` and exits with a non-zero `pendingJudgments` count.
78
-
79
- The host coding agent reads `regions.png` (a tight crop of every change at native resolution) and `locator.png` (a small overview thumbnail), writes a `verdict.json` next to the request, and re-runs `check --apply-judgments` to merge the verdicts into the report. The full playbook lives in `skill/blazediff/SKILL.md` at the repo root.
80
-
81
- This handoff was designed for vision-token efficiency: the region tiles are 10–100× smaller than the full-page PNGs and contain everything needed to classify the change.
43
+ <table>
44
+ <tr><th width="200">Command</th><th>Description</th></tr>
45
+ <tr><td><code>onboard</code></td><td>Install the playbook into Claude Code / Codex / Cursor</td></tr>
46
+ <tr><td><code>init</code></td><td>Detect framework, write <code>.blazediff/config.json</code></td></tr>
47
+ <tr><td><code>discover</code></td><td>BFS-crawl routes from <code>baseUrl</code></td></tr>
48
+ <tr><td><code>capture --stdin</code></td><td>Screenshot routes from stdin JSON, write baselines/actuals</td></tr>
49
+ <tr><td><code>check</code></td><td>Re-capture, diff against baseline, emit <code>CheckReport</code>. <code>--judge host</code> suspends on the first ambiguous entry; <code>--apply-judgments</code> resumes from <code>.blazediff/checkpoints/</code> once verdicts are written.</td></tr>
50
+ <tr><td><code>rewrite &lt;id...&gt;</code></td><td>Re-baseline existing entries (also <code>--failed</code> / <code>--all</code>). Cleans stale <code>actual/</code>, <code>judgments/</code>, <code>summary.md</code>, <code>checkpoints/</code> for the rewritten ids.</td></tr>
51
+ <tr><td><code>diff &lt;id&gt;</code></td><td>Re-diff one entry without re-screenshotting</td></tr>
52
+ <tr><td><code>manifest</code></td><td>Inspect / list manifest entries</td></tr>
53
+ <tr><td><code>serve-status</code></td><td>Start / stop / probe the dev server</td></tr>
54
+ <tr><td><code>browsers install</code></td><td>Install bundled Playwright Chromium</td></tr>
55
+ <tr><td><code>reset --yes</code></td><td>Wipe <code>.blazediff/</code></td></tr>
56
+ </table>
57
+
58
+ Pass `--cwd <abs-path>` to target a sub-package in a monorepo.
59
+
60
+ ## Onboarding
61
+
62
+ ```bash
63
+ blazediff-agent onboard # auto-detect harness
64
+ blazediff-agent onboard --harness codex # explicit
65
+ blazediff-agent onboard --harness all
66
+ ```
82
67
 
83
- ## Masking unstable regions
68
+ Writes:
69
+ - Claude Code → `<project>/.claude/skills/blazediff/SKILL.md`
70
+ - Codex → `~/.codex/skills/blazediff/SKILL.md`
71
+ - Cursor → `<project>/.cursor/rules/blazediff.mdc`
84
72
 
85
- Auto-cycling carousels, third-party iframes, clocks, randomized avatars and other non-deterministic content should be masked, not re-baselined. The agent paints a magenta rectangle over each masked region in both baseline and actual, so the diff is zeroed.
73
+ ## Masking
86
74
 
87
- The default and preferred path: add `data-blazediff-agent-mask` to the source element. The agent auto-masks anything matching `[data-blazediff-agent-mask]` on every route. No manifest changes needed.
75
+ Mark non-deterministic content (carousels, clocks, randomized avatars) in source:
88
76
 
89
77
  ```tsx
90
78
  <div data-blazediff-agent-mask>...</div>
91
- // or with a reason inline:
92
79
  <div data-blazediff-agent-mask="report-carousel">...</div>
93
80
  ```
94
81
 
95
- For external embeds you can't annotate (third-party iframes, framework-owned elements), fall back to a per-entry CSS selector in `manifest.entries[].mask` and re-capture via `capture --stdin --mode baseline`. The mask list replaces the existing one. See the SKILL playbook for full guidance.
82
+ For third-party embeds you can't annotate, use a per-entry `manifest.entries[].mask` CSS selector and re-capture.
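
As a rough illustration of how the two masking paths could combine, the sketch below merges the attribute selector with an entry's own `mask` list. The names `AUTO_MASK_SELECTOR`, `ManifestEntry`, and `effectiveMaskSelectors` are hypothetical and are not taken from the package; only the `[data-blazediff-agent-mask]` selector and the `mask` field come from the README.

```typescript
// Hypothetical sketch: not the package's implementation.
// The attribute selector is the one the README documents for auto-masking.
const AUTO_MASK_SELECTOR = "[data-blazediff-agent-mask]";

// Assumed subset of a manifest entry; only the fields used here are modelled.
interface ManifestEntry {
  id: string;
  url: string;
  mask?: string[];
}

// Returns the attribute selector followed by the per-entry selectors,
// with duplicates removed and first-seen order preserved.
function effectiveMaskSelectors(entry: ManifestEntry): string[] {
  const selectors = [AUTO_MASK_SELECTOR, ...(entry.mask ?? [])];
  return Array.from(new Set(selectors));
}

const homeEntry: ManifestEntry = {
  id: "home",
  url: "/",
  mask: ["iframe.ad", "iframe.ad"],
};
const homeSelectors = effectiveMaskSelectors(homeEntry);
```

Keeping the attribute selector first means an annotated element is always covered, even when an entry has no `mask` list at all.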
83
+
84
+ ## Judging
85
+
86
+ Every non-match routes through the configured judge. With `--judge host` the judge node `interrupt()`s the LangGraph pipeline, writes a `JudgmentRequest` (region tiles + locator thumbnail) to `.blazediff/judgments/<id>/`, and the suspended graph is checkpointed to `.blazediff/checkpoints/`. The host agent reads the tiles, writes `verdict.json`, and `check --apply-judgments` resumes the same graph with the verdicts — no re-capture, no re-diff.
96
87
 
97
88
  ## Configuration
98
89
 
@@ -107,30 +98,18 @@ For external embeds you can't annotate (third-party iframes, framework-owned ele
107
98
  }
108
99
  ```
109
100
 
110
- `.blazediff/manifest.json` is written by `capture` - never edit it directly. Each entry holds `{ id, url, mask[], viewport, waitFor, fullPage }`.
101
+ `.blazediff/manifest.json` is written by `capture`; don't edit it directly.
111
102
 
112
103
  ## CI
113
104
 
114
- Only `check` / `run` are allowed in CI (`CI=1` or no TTY). Capture/rewrite/init/reset are explicitly blocked. Exit codes:
105
+ Only `check` is allowed under `CI=1`. Exit codes:
115
106
 
116
- - `0` - all passed
117
- - `1` - at least one regression, intentional, or pending-judgment entry
107
+ - `0` - all passed
108
+ - `1` - regression, intentional, or pending judgment
118
109
  - non-zero with structured error JSON on infra failures
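
A CI wrapper might turn the exit codes listed above into a three-way outcome. This is a minimal sketch under stated assumptions: the `CheckOutcome` type and `classifyExit` function are hypothetical, and it assumes `check` was run with `--json` so that a regular report carries a `results` array while an infra failure does not.

```typescript
// Hypothetical helper: names and the three-way split are illustrative,
// not part of the blazediff-agent API.
type CheckOutcome = "passed" | "needs-attention" | "infra-failure";

function classifyExit(code: number, stdout: string): CheckOutcome {
  if (code === 0) return "passed";
  if (code === 1) {
    // Assumption: a regular report is JSON with a `results` array.
    try {
      const parsed: unknown = JSON.parse(stdout);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        Array.isArray((parsed as { results?: unknown }).results)
      ) {
        return "needs-attention";
      }
    } catch {
      // Unparseable output falls through to the infra-failure case.
    }
  }
  // Any other non-zero code, or code 1 without a report, is treated as infra.
  return "infra-failure";
}
```

Treating anything that is not a recognizable report as an infra failure keeps the wrapper from reporting a broken run as a visual regression.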
119
110
 
120
- ## Files
121
-
122
- - `src/cli.ts` - entry point
123
- - `src/check.ts` / `src/graph/` - single-pool and LangGraph-pipelined runners
124
- - `src/judge/` - pluggable judge (`host` / `none`), region-tile generator, verdict applier
125
- - `src/browser/launch.ts` - Chromium serialization + mask overlay painter
126
- - `src/discover/` - source-walking for Next.js / Vite / Remix + BFS fallback
127
- - `src/diff/` - heuristic verdict pipeline
128
- - `src/report/markdown.ts` - `summary.md` generator (5-column `id | baseline | actual | diff | verdict`)
129
- - `ROADMAP.md` - phase tracking
130
- - Playbook: `skill/blazediff/SKILL.md` (repo root)
131
-
132
111
  ## Links
133
112
 
134
113
  - [GitHub](https://github.com/teimurjan/blazediff/tree/main/packages/agent)
135
- - [BlazeDiff docs](https://blazediff.dev/docs)
114
+ - [Documentation](https://blazediff.dev/docs)
136
115
  - [Roadmap](./ROADMAP.md)
package/SKILL.md CHANGED
@@ -13,7 +13,7 @@ Sibling files in this skill directory — read on demand:
13
13
 
14
14
  ## Be terse
15
15
  - Pass `--json` on every `blazediff-agent` call; parse fields. Do not echo CLI output.
16
- - `check`/`run --json` returns a **slim payload**: `{ summaryPath, createdAt, totalEntries, passed, failed, pendingJudgments, results }`. `results` lists non-pass entries only, each as `{ id, url, status, verdict?: { label, headline, action } }`. The full per-entry detail (regions, paths, rationale) lives in `<TARGET>/.blazediff/summary.md` and `<TARGET>/.blazediff/judgments/<id>/request.json`.
16
+ - `check --json` returns a **slim payload**: `{ summaryPath, createdAt, totalEntries, passed, failed, pendingJudgments, results }`. `results` lists non-pass entries only, each as `{ id, url, status, verdict?: { label, headline, action } }`. The full per-entry detail (regions, paths, rationale) lives in `<TARGET>/.blazediff/summary.md` and `<TARGET>/.blazediff/judgments/<id>/request.json`.
17
17
  - Authoring uses ONE `capture --stdin` call piped a JSON list of routes — never a per-route loop.
18
18
  - No `ls`, `cat`, `find` for paths the CLI already returns.
19
19
  - One final summary line — for authoring: `N captured | M skipped (reasons) | K auth-gated`; for check: `P/T passed (F failed)` plus failure ids.
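
The slim payload and the final summary line described in the list above can be sketched in TypeScript. The field names come from the payload description. The field types, the `status` value in the sample, the `": "` separator before the ids, and the names `SlimReport`, `SlimResult`, and `summaryLine` are assumptions made for illustration.

```typescript
// Assumed types for the slim payload; only the field names are from the skill text.
interface SlimResult {
  id: string;
  url: string;
  status: string;
  verdict?: { label: string; headline: string; action: string };
}

interface SlimReport {
  summaryPath: string;
  createdAt: string;
  totalEntries: number;
  passed: number;
  failed: number;
  pendingJudgments: number;
  results: SlimResult[]; // non-pass entries only
}

// Builds the `P/T passed (F failed)` line and appends the failing ids, if any.
function summaryLine(report: SlimReport): string {
  const head = `${report.passed}/${report.totalEntries} passed (${report.failed} failed)`;
  const ids = report.results.map((r) => r.id);
  return ids.length > 0 ? `${head}: ${ids.join(", ")}` : head;
}

// Hypothetical sample payload, as it might arrive on stdout.
const sample: SlimReport = JSON.parse(`{
  "summaryPath": ".blazediff/summary.md",
  "createdAt": "2025-01-01T00:00:00.000Z",
  "totalEntries": 3,
  "passed": 2,
  "failed": 1,
  "pendingJudgments": 0,
  "results": [
    { "id": "pricing", "url": "/pricing", "status": "failed",
      "verdict": { "label": "regression-likely", "headline": "CTA moved", "action": "fix" } }
  ]
}`);

const line = summaryLine(sample);
```

Because `results` holds only non-pass entries, the ids can be read straight from it without filtering on `status`.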
@@ -31,8 +31,7 @@ Sibling files in this skill directory — read on demand:
31
31
  - Else → **authoring**.
32
32
 
33
33
  ## check
34
- 1. `blazediff-agent --cwd "$TARGET" check --judge host --json` (the CLI starts the dev server if `devServer` is configured; otherwise hits the configured baseUrl directly).
35
- - Prefer `run` instead for large sites (≥10 routes): `blazediff-agent --cwd "$TARGET" run --judge host --json`. Same flags, same report shape — it just pipelines capture → diff → verdict → judge through a LangGraph state graph so per-entry stages overlap. Use `check` when you want the simpler, single-pool implementation; `run` when wall-time matters or you want LangSmith traces.
34
+ 1. `blazediff-agent --cwd "$TARGET" check --judge host --json` (the CLI starts the dev server if `devServer` is configured; otherwise hits the configured baseUrl directly). Capture, diff, verdict, and judge run through a LangGraph state graph so per-entry stages overlap.
36
35
  - **Cold Next.js / Vite servers** can take 5–30s to compile a route on first hit; under default 30s `page.goto` timeout the first route in a fresh dev session sometimes times out (`page.goto: Timeout 30000ms exceeded`). If that happens, **rerun the same `check` command** — the dev server is now warm and the next pass usually completes. Don't change `waitFor` or restart the dev server; the issue is one-time compilation, not a routing or wait-condition bug.
37
36
  - **Codex sandbox / restricted-bash environments** may block Playwright's chromium launch (`browserType.launch: Target page, context or browser has been closed`). Rerun with the sandbox/escape escalation the host agent provides (in Codex: approve the command for "always run outside sandbox"). Not a blazediff bug.
38
37
  2. Pass: report `P/T passed`. Stop.