@rolepod/uiproof 0.4.1 → 0.6.0

@@ -1,7 +1,7 @@
  {
  "$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
  "name": "rolepod-uiproof",
- "description": "Multi-platform UI / mobile automation MCP server + 4 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e) for AI coding agents.",
+ "description": "Multi-platform UI / mobile automation MCP server + 5 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e, check-errors) for AI coding agents.",
  "owner": {
  "name": "nuttaruj",
  "url": "https://github.com/nuttaruj"
@@ -10,8 +10,8 @@
  {
  "name": "rolepod-uiproof",
  "source": "./",
- "description": "15 MCP tools (10 atomic browser/mobile primitives + 5 composite workflows) + 4 user-invocable skills. Web production-ready via Playwright; mobile (iOS/Android) via Appium scaffolded — see `rolepod-uiproof doctor` for readiness.",
- "version": "0.4.0",
+ "description": "26 MCP tools (21 atomic browser/mobile primitives + 5 composite workflows) + 5 user-invocable skills. v0.6 adds Extension Protocol v1 support — works standalone today, becomes the verify-phase UI provider when installed alongside the `rolepod` parent plugin (evidence routes to `.rolepod/evidence/` with `manifest.json`). Replaces chrome-devtools-mcp and playwright-mcp for UI testing. Web production-ready via Playwright; mobile (iOS/Android) via Appium scaffolded — see `rolepod-uiproof doctor` for readiness.",
+ "version": "0.6.0",
  "author": {
  "name": "nuttaruj"
  },
@@ -1,7 +1,7 @@
  {
  "name": "rolepod-uiproof",
- "version": "0.4.1",
- "description": "Multi-platform UI/mobile automation for AI agents — 4 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e) + MCP server with 15 tools. v0.3 adds AppiumEngine scaffolding for iOS/Android, scope={ref} audit, replay CLI, ddmin minimization, doctor + install:mobile.",
+ "version": "0.6.0",
+ "description": "Multi-platform UI/mobile automation for AI agents — 5 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e, check-errors) + MCP server with 26 tools. Works standalone OR with the `rolepod` parent plugin: when ROLEPOD_PARENT=1 is set, evidence routes to `.rolepod/evidence/` with a `manifest.json` per Extension Protocol v1, so parent's `check-work` skill can aggregate UI verify results into its phase report. v0.5 completed the UI verification surface (console + network observability, hover/drag/fill_form/upload/dialog, runtime emulation, multi-page, gated JS eval).",
  "author": {
  "name": "nuttaruj",
  "url": "https://github.com/nuttaruj"
@@ -1,7 +1,7 @@
  {
  "name": "rolepod-uiproof",
- "version": "0.4.0",
- "description": "Multi-platform UI/mobile automation for AI agents — 4 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e) + MCP server with 15 tools. v0.3 adds AppiumEngine scaffolding for iOS/Android; web is production-ready via Playwright.",
+ "version": "0.6.0",
+ "description": "Multi-platform UI/mobile automation for AI agents — 5 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e, check-errors) + MCP server with 26 tools. v0.6 adds Extension Protocol v1: works standalone today, becomes the verify-phase UI provider when paired with the `rolepod` parent plugin.",
  "author": {
  "name": "nuttaruj",
  "url": "https://github.com/nuttaruj"
@@ -25,7 +25,7 @@
  "interface": {
  "displayName": "Rolepod UIProof",
  "shortDescription": "UI verification, a11y audits, visual diff, e2e scaffolding — for AI coding agents.",
- "longDescription": "rolepod-uiproof ships an MCP server with 15 tools (10 atomic + 5 composite) and 4 user-invocable skills (/verify-ui, /audit-a11y, /visual-diff, /scaffold-e2e). Web is fully supported via Playwright; mobile (iOS/Android via Appium) lands in v0.3.",
+ "longDescription": "rolepod-uiproof ships an MCP server with 26 tools (21 atomic + 5 composite) and 5 user-invocable skills (/verify-ui, /audit-a11y, /visual-diff, /scaffold-e2e, /check-errors). Web is fully supported via Playwright; mobile (iOS/Android via Appium) supports basic input. v0.6: pair with the `rolepod` parent plugin (v2.7+) and uiproof becomes the verify-phase UI provider — evidence routes to `.rolepod/evidence/` with a `manifest.json` per Extension Protocol v1.",
  "developerName": "nuttaruj",
  "category": "Productivity",
  "capabilities": ["Read", "Write", "Bash"],
@@ -1,8 +1,8 @@
  {
  "name": "rolepod-uiproof",
  "displayName": "Rolepod UIProof",
- "version": "0.4.0",
- "description": "Multi-platform UI / mobile automation MCP server + 4 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e) for AI coding agents. Web production-ready via Playwright; mobile (iOS/Android via Appium) scaffolded.",
+ "version": "0.6.0",
+ "description": "Multi-platform UI / mobile automation MCP server + 5 shipped skills (verify-ui, audit-a11y, visual-diff, scaffold-e2e, check-errors) for AI coding agents. v0.6 adds Extension Protocol v1 — works standalone today, becomes the verify-phase UI provider when paired with the `rolepod` parent plugin (evidence routes to `.rolepod/evidence/` with `manifest.json`). Replaces chrome-devtools-mcp and playwright-mcp.",
  "author": {
  "name": "nuttaruj"
  },
package/CHANGELOG.md CHANGED
@@ -7,6 +7,174 @@ release.
 
  ## [Unreleased]
 
+ ## [0.6.0] — 2026-05-27
+
+ **Extension Protocol v1 — `uiproof` becomes parent-aware. Standalone
+ behavior unchanged.**
+
+ When the parent `rolepod` plugin (v2.7+) sets `ROLEPOD_PARENT=1` via
+ its SessionStart hook, uiproof routes evidence to the shared
+ `.rolepod/evidence/` tree and emits a `manifest.json` per spec so the
+ parent's `check-work` skill can aggregate UI verify results into its
+ phase report. With no parent installed the v0.5 behavior is preserved
+ exactly — same artifact path, same tool output, plus a `manifest.json`
+ in each run dir as a bonus.
+
+ ### Added
+
+ - **Env-aware evidence path** in `ArtifactStore`. Detected at
+ construction from `process.env.ROLEPOD_PARENT === "1"`.
+ - standalone: `.rolepod-uiproof/artifacts/{prefix}_{ts}_{uuid}/`
+ - with-parent: `.rolepod/evidence/{ts}-rolepod-uiproof-{skill}/`
+ - **`manifest.json`** written by every composite that starts a run
+ (`verify_ui_flow`, `audit_a11y`, `visual_diff`, `scaffold_e2e`).
+ Schema follows Extension Protocol v1: `protocol`, `plugin`, `skill`,
+ `phase`, `status`, `summary`, `started_at`, `finished_at`,
+ `artifacts: [{type, path}]`, `metadata`. Best-effort: any IO failure
+ is logged but never thrown.
+ - **Graduated a11y status**. `audit_a11y` manifest carries `status`:
+ `critical/serious > 0 → fail`, `moderate/minor > 0 → warn`, no
+ issues → `pass`. Keeps the `warn` signal a strict pass/fail would
+ discard.
+ - **Protocol version check**. When `ROLEPOD_PROTOCOL` is set but
+ does not equal `v1`, `buildServer()` logs a one-shot warning. Does
+ not block; manifest is still written in v1 shape.
+ - **`/check-errors` evidence routing doc** alongside the other 4
+ skills.
+
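The manifest fields, the graduated a11y status, and the one-shot protocol warning listed above can be sketched as follows. This is an illustration only, not the package's implementation: the field names come from the changelog, while the value types, the `"verify"` phase string, the `"report"` artifact type, and the helper names `a11yStatus`, `buildA11yManifest`, and `makeProtocolCheck` are assumptions.

```typescript
// Sketch only. Field names follow the changelog; value types, the "verify"
// phase string, and the "report" artifact type are assumptions.
type Status = "pass" | "warn" | "fail";

interface Manifest {
  protocol: string;
  plugin: string;
  skill: string;
  phase: string;
  status: Status;
  summary: string;
  started_at: string;
  finished_at: string;
  artifacts: { type: string; path: string }[];
  metadata: Record<string, unknown>;
}

interface SeverityCounts {
  critical: number;
  serious: number;
  moderate: number;
  minor: number;
}

// Graduated status: critical/serious -> fail, moderate/minor -> warn, else pass.
function a11yStatus(c: SeverityCounts): Status {
  if (c.critical > 0 || c.serious > 0) return "fail";
  if (c.moderate > 0 || c.minor > 0) return "warn";
  return "pass";
}

// Hypothetical helper that assembles a manifest for one audit_a11y run.
function buildA11yManifest(
  counts: SeverityCounts,
  startedAt: Date,
  finishedAt: Date,
  reportPath: string
): Manifest {
  const total = counts.critical + counts.serious + counts.moderate + counts.minor;
  return {
    protocol: "v1",
    plugin: "rolepod-uiproof",
    skill: "audit-a11y",
    phase: "verify",
    status: a11yStatus(counts),
    summary: `${total} accessibility issue(s) found`,
    started_at: startedAt.toISOString(),
    finished_at: finishedAt.toISOString(),
    artifacts: [{ type: "report", path: reportPath }],
    metadata: { counts },
  };
}

// One-shot warning when ROLEPOD_PROTOCOL is set to something other than "v1".
// It never blocks: the caller still writes a v1-shaped manifest.
function makeProtocolCheck(warn: (msg: string) => void) {
  let warned = false;
  return (env: Record<string, string | undefined>): void => {
    const requested = env.ROLEPOD_PROTOCOL;
    if (requested !== undefined && requested !== "v1" && !warned) {
      warned = true;
      warn(`unsupported ROLEPOD_PROTOCOL=${requested}; continuing with v1`);
    }
  };
}
```

Keeping `warn` as a separate status means a run with only moderate or minor findings stays visible to an aggregator without failing the phase.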
+ ### Changed
+
+ - `ArtifactStore.startRun(prefix, opts?)` — `opts.skill` is new and
+ optional. Provides the canonical skill name for both the
+ with-parent dirname and the manifest's `skill` field. Return shape
+ extended with `skill` and `mode` (back-compat: existing destructuring
+ of `{ runId, runDir }` keeps working).
+ - `buildServer()` log line surfaces `protocol: "v1"` and
+ `mode: "standalone" | "with-parent"` alongside the existing version
+ + tools list.
+ - All 5 shipped skills' SKILL.md gained an "Evidence routing" section
+ between "Process" / "Outputs" and "If the tool is unavailable".
+ Mirrored to `plugins/rolepod-uiproof/skills/`.
+ - README "Standalone vs Combined" section added explaining the two
+ modes.
+
+ ### Behavior
+
+ - **Standalone:** unchanged. Evidence still written to
+ `.rolepod-uiproof/artifacts/`. New: a `manifest.json` appears in each
+ run dir. Tool return values gain an optional `manifest: "<path>"`
+ field; everything else is byte-for-byte identical.
+ - **With rolepod parent:** evidence written to
+ `.rolepod/evidence/<ts>-rolepod-uiproof-<skill>/` with `manifest.json`
+ per protocol spec. Visual baselines stay in
+ `.rolepod-uiproof/baselines/` regardless of mode.
+
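The two routing modes described above reduce to a small pure function. The sketch below assumes the path templates quoted in the changelog; the function names `detectMode`, `runDir`, and `baselineDir`, and the format of the `ts` and `uuid` values, are hypothetical and not taken from the package.

```typescript
// Sketch only: path templates are quoted from the changelog; function names
// and the ts/uuid formats are assumptions made for illustration.
type Mode = "standalone" | "with-parent";
type Env = Record<string, string | undefined>;

interface RunNaming {
  prefix: string; // used in the standalone directory name
  skill: string;  // used in the with-parent directory name
  ts: string;     // timestamp string, format assumed
  uuid: string;   // unique suffix, format assumed
}

// The changelog states detection is `ROLEPOD_PARENT === "1"`, so any other
// value (including "true") leaves the store in standalone mode.
function detectMode(env: Env): Mode {
  return env.ROLEPOD_PARENT === "1" ? "with-parent" : "standalone";
}

// Per-run evidence directory for each mode.
function runDir(mode: Mode, n: RunNaming): string {
  return mode === "with-parent"
    ? `.rolepod/evidence/${n.ts}-rolepod-uiproof-${n.skill}/`
    : `.rolepod-uiproof/artifacts/${n.prefix}_${n.ts}_${n.uuid}/`;
}

// Visual baselines stay in the same place regardless of mode.
function baselineDir(_mode: Mode): string {
  return ".rolepod-uiproof/baselines/";
}
```

Passing the environment in as an argument, instead of reading `process.env` inside the function, keeps the sketch testable without mutating global state.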
+ ### Non-goals (kept out of v0.6)
+
+ - Dynamic capabilities registry (`.claude-plugin/capabilities.json`)
+ - Protocol version negotiation beyond a single warn
+ - Cross-child coordination (uiproof ↔ wplab handoff inside one run)
+ - Mobile platform support stays at the v0.5 partial level
+
+ ## [0.5.0] — 2026-05-27
+
+ **Complete UI verification surface — one MCP replaces chrome-devtools-mcp
+ and playwright-mcp for UI testing use cases.**
+
+ Tool count: 15 → 26 (atomic 10 → 21, composite 5 unchanged). The five
+ "out of scope for `uiproof`" families (Lighthouse, performance traces,
+ heap snapshots, extensions, third-party page tools) are intentionally
+ **not** added — those belong to future `rolepod-perfproof` and
+ `rolepod-secproof` MCPs.
+
+ ### Added — 11 new atomic tools
+
+ Cross-platform (work on chromium/firefox/webkit; mobile stubs throw
+ `engine_error` until gestures land):
+
+ - `rolepod_browser_hover` — `locator.hover()`; refs stay valid
+ - `rolepod_browser_drag` — `locator.dragTo()`
+ - `rolepod_browser_fill_form` — batch input/select/checkbox/radio
+ - `rolepod_browser_upload_file` — `locator.setInputFiles()`, abs path required
+
+ Web-only (cast to `PlaywrightEngine`):
+
+ - `rolepod_browser_handle_dialog` — pre-arm one-shot accept/dismiss
+ - `rolepod_browser_console` — list/filter/clear ring-buffered console
+ messages (1000-entry cap, errors+warnings default)
+ - `rolepod_browser_network` — list/filter network requests, optional HAR export
+ - `rolepod_browser_set_env` — runtime viewport / offline / geolocation /
+ color_scheme / reduced_motion / extra_headers / network_throttle (CDP) /
+ cpu_throttle (CDP)
+ - `rolepod_browser_evaluate` — arbitrary JS in page context.
+ **Disabled by default** — opt in via `ROLEPOD_ALLOW_EVAL=1` env var
+ - `rolepod_browser_pages` — list pages in active context (popups,
+ target=_blank, OAuth windows)
+ - `rolepod_browser_switch_page` — set active page index
+
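Two of the behaviors listed above, the capped console buffer and the opt-in gate for JS evaluation, are easy to picture in isolation. The following is a minimal sketch under stated assumptions: the class name `ConsoleRing`, the level names, and the `evalAllowed` helper are hypothetical, and only the 1000-entry cap, the errors-plus-warnings default, and the `ROLEPOD_ALLOW_EVAL=1` opt-in come from the changelog.

```typescript
// Sketch only: class, helper, and level names are assumptions.
type Level = "error" | "warning" | "info" | "log" | "debug";

interface ConsoleEntry {
  level: Level;
  text: string;
}

class ConsoleRing {
  private entries: ConsoleEntry[] = [];
  private readonly cap: number;

  constructor(cap: number = 1000) {
    this.cap = cap;
  }

  // Append an entry; once the cap is exceeded the oldest entry is dropped.
  push(entry: ConsoleEntry): void {
    this.entries.push(entry);
    if (this.entries.length > this.cap) {
      this.entries.shift();
    }
  }

  // List buffered entries, defaulting to errors and warnings only.
  list(levels: Level[] = ["error", "warning"]): ConsoleEntry[] {
    return this.entries.filter((e) => levels.indexOf(e.level) !== -1);
  }

  clear(): void {
    this.entries = [];
  }
}

// JS evaluation is off unless the environment opts in explicitly.
function evalAllowed(env: Record<string, string | undefined>): boolean {
  return env.ROLEPOD_ALLOW_EVAL === "1";
}
```

A fixed cap bounds memory on long-running sessions at the cost of losing the oldest messages, which is usually acceptable when the most recent errors are the ones under inspection.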
116
+
117
+ The `capture` array has accepted these values since v0.1, but only
118
+ `screenshot` was wired. v0.5 fills in the rest:
119
+
120
+ - `console` → `{runDir}/console.json`
121
+ - `har` → `{runDir}/network.har`
122
+ - `video` → `{runDir}/videos/*.webm`
123
+ - `trace` → `{runDir}/trace.zip` (view with `npx playwright show-trace`)
124
+ - `a11y_tree` → `{runDir}/a11y_tree.json`
125
+
126
+ ### Added — 8 new verify_ui_flow step kinds
127
+
128
+ `hover`, `drag`, `fill_form`, `upload`, `dialog`, `set_env`,
129
+ `switch_page`, `evaluate`. All get first-class codegen in
130
+ `scaffold_e2e` for playwright-test and pytest+selenium.
131
+
132
+ ### Added — 4 new verify_ui_flow expect kinds
133
+
134
+ - `no_console_errors` — filter level=error, drop excludes, count must be 0
135
+ - `no_failed_requests` — filter `failure || status>=400` (or `>=500`
136
+ when `allow_4xx`), drop excludes, count must be 0
137
+ - `request_made` — URL regex + optional method must match `min_count`
138
+ (default 1) times
139
+ - `response_status` — URL regex + exact status code must match
140
+
141
+ ### Added — multi-page support
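The network-related expect kinds above are filters over the recorded requests. A hedged sketch follows: the `RequestRecord` shape, the option names, and the function names are assumptions, `min_count` is read as "at least this many matches", and excludes are assumed to be URL patterns.

```typescript
// Sketch only: record shape, option names, and function names are assumptions.
interface RequestRecord {
  url: string;
  method: string;
  status?: number;  // absent when the request never received a response
  failure?: string; // transport-level failure text, if any
}

interface FailedOptions {
  allow4xx?: boolean;
  exclude?: RegExp[]; // avoid the `g` flag: RegExp.test is stateful with it
}

// no_failed_requests: keep `failure || status >= 400` (or >= 500 when 4xx is
// allowed), then drop excluded URLs. The expectation passes when this is empty.
function failedRequests(reqs: RequestRecord[], opts: FailedOptions = {}): RequestRecord[] {
  const threshold = opts.allow4xx ? 500 : 400;
  const exclude = opts.exclude || [];
  return reqs
    .filter((r) => Boolean(r.failure) || (r.status !== undefined && r.status >= threshold))
    .filter((r) => !exclude.some((re) => re.test(r.url)));
}

// request_made: URL regex plus optional method, matched at least minCount times.
function requestMade(reqs: RequestRecord[], url: RegExp, method?: string, minCount: number = 1): boolean {
  const hits = reqs.filter((r) => url.test(r.url) && (method === undefined || r.method === method));
  return hits.length >= minCount;
}

// response_status: some request matching the URL regex has exactly this status.
function responseStatus(reqs: RequestRecord[], url: RegExp, status: number): boolean {
  return reqs.some((r) => url.test(r.url) && r.status === status);
}
```

For example, a flow that intentionally triggers a 404 on a probe endpoint could set `allow4xx` or add that URL to `exclude` and still fail on any 5xx response.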
142
+
143
+ A session is now a `context` (was a single page). Popups and
144
+ `target="_blank"` links are auto-tracked. Use `browser_pages` to list,
145
+ `browser_switch_page` to activate. Default active = page 0.
146
+
147
+ ### Added — new skill `/check-errors`
148
+
149
+ Thin wrapper over `rolepod_verify_ui_flow` with strict assertions baked
150
+ in. Use case: PR-gate or post-merge smoke.
151
+
152
+ ### Changed — `/verify-ui` and `/scaffold-e2e` skills
153
+
154
+ Documented every new step / expect / capture kind. Default suggestion
155
+ in `/verify-ui`: include `no_console_errors` and `no_failed_requests`
156
+ in `expect` for any user-visible flow.
157
+
158
+ ### Changed — Engine interface
159
+
160
+ Adds four cross-platform input methods: `hover`, `drag`, `fillForm`,
161
+ `uploadFile`. `OpenOptions.capture` accepts `{ har, video, trace }`.
162
+ `WebSession.page` renamed to `mainPage`; internal call sites go through
163
+ `activePage(s)`.
164
+
165
+ ### Non-changes (intentional)
166
+
167
+ - `screencast_*` not added — Playwright `trace.zip` is strictly better.
168
+ - `click_at` not added — use refs from `snapshot`.
169
+ - Lighthouse not added — axe-core covers a11y.
170
+ - Performance traces / heap snapshots not added — `rolepod-perfproof` scope.
171
+ - Extension management not added — out of scope.
172
+
173
+ ### Migration from 0.4
174
+
175
+ Pure additions; no behavioral changes on existing tools or
176
+ step/expect/capture kinds. Existing replay bundles play back unchanged.
177
+
10
178
  ## [0.4.1] — 2026-05-27
11
179
 
12
180
  ### Fixed
package/README.md CHANGED
@@ -1,28 +1,47 @@
  # rolepod-uiproof
 
- **rolepod-uiproof gives Claude Code, Cursor, Codex CLI, and Gemini CLI a real browser/mobile driver — so the AI can actually click through your UI, audit accessibility, diff screenshots, and scaffold e2e tests instead of guessing.**
+ **rolepod-uiproof gives Claude Code, Cursor, Codex CLI, and Gemini CLI a real browser/mobile driver — so the AI can actually click through your UI, audit accessibility, check console errors, inspect network requests, diff screenshots, and scaffold e2e tests instead of guessing.**
 
- One MCP server, one tool surface, four skills you invoke from chat. Web is production-ready via Playwright; iOS and Android use Appium (same client as alumnium — needs a local Appium daemon + simulator/emulator, or a real device). No internal LLM — your Lead agent drives every action.
+ One MCP server, one tool surface, five skills you invoke from chat. Web is production-ready via Playwright; iOS and Android use Appium (same client as alumnium — needs a local Appium daemon + simulator/emulator, or a real device). No internal LLM — your Lead agent drives every action.
+
+ **v0.5 completes the UI verification surface — replacing `chrome-devtools-mcp` and `playwright-mcp` for UI testing.** 26 tools total (21 atomic + 5 composite). New in v0.5: console + network observability, hover / drag / fill_form / upload / dialog, runtime emulation (resize / offline / geolocation / color_scheme / network + CPU throttle), multi-page support, gated JS eval, and impl of HAR / video / trace capture in `/verify-ui`.
 
  ## What it helps with
 
- - **Verify a UI change in seconds.** `/verify-ui` opens a real browser, runs your steps, checks your assertions, saves a screenshot + replay bundle.
+ - **Verify a UI change in seconds.** `/verify-ui` opens a real browser, runs your steps, checks your assertions, saves a screenshot + replay bundle (optionally HAR + video + trace + console logs).
+ - **Gate merges on "no regressions during this flow".** `/check-errors` runs a flow with strict `no_console_errors` + `no_failed_requests` assertions baked in. PR-gate or post-merge smoke check.
  - **Catch a11y regressions before merge.** `/audit-a11y` runs axe-core against WCAG-A / AA / AAA and returns issues grouped by severity, with WCAG references and fix links.
  - **Lock down the visual contract.** `/visual-diff` captures a screenshot and compares against a named baseline under `./.rolepod-uiproof/baselines/`. First call seeds; subsequent calls diff.
- - **Turn an interactive verify run into a real test file.** `/scaffold-e2e` transcribes a replay bundle into Playwright Test, Vitest+Playwright, or pytest+selenium.
+ - **Turn an interactive verify run into a real test file.** `/scaffold-e2e` transcribes a replay bundle into Playwright Test, Vitest+Playwright, or pytest+selenium — with first-class codegen for every step + expect kind.
  - **Reproduce + minimize a bug deterministically.** `/verify-ui` with `mode: "reproduce"` runs ddmin step-elimination to find the shortest still-reproducing sequence.
 
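The ddmin step-elimination mentioned in the last bullet is a general delta-debugging technique. The sketch below is a generic, simplified version of it and is not taken from the package: the function name `ddmin` and the `reproduces` callback are illustrative, and a real run would replay the candidate steps in a browser instead of calling a pure predicate.

```typescript
// Generic delta-debugging (ddmin) sketch. `reproduces` is a hypothetical
// callback that replays a candidate step list and reports whether the bug
// still occurs.
function ddmin<T>(steps: T[], reproduces: (candidate: T[]) => boolean): T[] {
  let current = steps.slice();
  let n = 2; // number of chunks to split into

  while (current.length >= 2) {
    const size = Math.ceil(current.length / n);
    const chunks: T[][] = [];
    for (let i = 0; i < current.length; i += size) {
      chunks.push(current.slice(i, i + size));
    }

    let reduced = false;

    // 1. Does any single chunk reproduce on its own?
    for (const chunk of chunks) {
      if (reproduces(chunk)) {
        current = chunk;
        n = 2;
        reduced = true;
        break;
      }
    }

    // 2. Otherwise, does removing one chunk still reproduce?
    if (!reduced) {
      for (let i = 0; i < chunks.length; i++) {
        const complement: T[] = [];
        chunks.forEach((c, j) => {
          if (j !== i) complement.push(...c);
        });
        if (reproduces(complement)) {
          current = complement;
          n = Math.max(n - 1, 2);
          reduced = true;
          break;
        }
      }
    }

    // 3. Otherwise split finer, or stop at single-step granularity.
    if (!reduced) {
      if (n >= current.length) break;
      n = Math.min(n * 2, current.length);
    }
  }
  return current;
}
```

The result is 1-minimal with respect to the predicate: removing any single remaining step stops the reproduction. It is not guaranteed to be the globally shortest sequence, and each predicate call costs one replay.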
- ## The four skills
+ ## The five skills
 
  | Skill | Wraps | What it does |
  |---|---|---|
- | `/verify-ui` | `rolepod_verify_ui_flow` | Drive a session through steps, evaluate assertions, save evidence + replay bundle. `mode: assert` (default) or `reproduce` with optional ddmin minimization. |
+ | `/verify-ui` | `rolepod_verify_ui_flow` | Drive a session through steps, evaluate assertions (incl. console errors / failed requests / specific request made / response status), save evidence (screenshot / console / HAR / video / trace / a11y_tree) + replay bundle. `mode: assert` or `reproduce` with optional ddmin minimization. |
+ | `/check-errors` | `rolepod_verify_ui_flow` | Thin wrapper with strict `no_console_errors` + `no_failed_requests` baked in. Use as PR-gate or post-merge smoke. |
  | `/audit-a11y` | `rolepod_audit_a11y` | axe-core audit at WCAG-A / AA / AAA. `scope: "page"` or `scope: { ref }`. Markdown or JSON report. |
  | `/visual-diff` | `rolepod_visual_diff` | Pixel diff against a named baseline. Auto-seeds on first call. Configurable threshold + pixelmatch sensitivity. |
- | `/scaffold-e2e` | `rolepod_scaffold_e2e` | Generate a runnable test file from a scenario + optional replay bundle. Three target frameworks. |
+ | `/scaffold-e2e` | `rolepod_scaffold_e2e` | Generate a runnable test file from a scenario + optional replay bundle. Three target frameworks. v0.5 codegen handles every step + expect kind. |
 
  Every skill is **single-backend** (D-024) — it calls the rolepod-uiproof server and only the rolepod-uiproof server. If the server is unavailable, the skill fails with a clear diagnostic. Multi-backend routing belongs in the parent [`rolepod`](https://github.com/nuttaruj/rolepod) plugin's phase skills, not here.
 
+ ## Standalone vs Combined
+
+ `rolepod-uiproof` works either as a **standalone** browser MCP for any project, or **combined** with the [`rolepod`](https://github.com/nuttaruj/rolepod) parent plugin (v2.7+) where it becomes the Verify phase provider for UI artifacts.
+
+ **Standalone** (default): use the 5 skills directly as atomic browser tools. Evidence saved under `./.rolepod-uiproof/artifacts/<run>/` with a `manifest.json` per Extension Protocol v1.
+
+ **Combined with rolepod parent**: when the parent's SessionStart hook sets `ROLEPOD_PARENT=1`, uiproof writes evidence to `./.rolepod/evidence/<ts>-rolepod-uiproof-<skill>/` instead, where parent's `check-work` skill auto-aggregates manifests into the verify report. No skill changes — same 26 tools, same 5 skills, smarter routing.
+
+ | Install | Unlocks |
+ |---|---|
+ | uiproof alone | Browser test, a11y audit, visual diff, e2e scaffold, error gate |
+ | uiproof + rolepod parent | + verify-phase aggregation, evidence handoff to `check-work` |
+
+ The `manifest.json` is written in BOTH modes, so installing the parent later still lets historic artifacts get picked up. Baselines for `/visual-diff` always live in `./.rolepod-uiproof/baselines/` regardless of mode — they are user-curated configuration, not per-run evidence.
+
  ## Install
 
  Pick your CLI. All install paths share the same MCP server (`@rolepod/uiproof` on npm) and the same skill set.