@rolepod/uiproof 0.4.0

package/README.md ADDED
@@ -0,0 +1,170 @@
1
+ # rolepod-uiproof
2
+
3
+ **rolepod-uiproof gives Claude Code, Cursor, Codex CLI, and Gemini CLI a real browser/mobile driver — so the AI can actually click through your UI, audit accessibility, diff screenshots, and scaffold e2e tests instead of guessing.**
4
+
5
+ One MCP server, one tool surface, four skills you invoke from chat. Web is production-ready via Playwright; iOS and Android use Appium (same client as alumnium — needs a local Appium daemon + simulator/emulator, or a real device). No internal LLM — your Lead agent drives every action.
6
+
7
+ ## What it helps with
8
+
9
+ - **Verify a UI change in seconds.** `/verify-ui` opens a real browser, runs your steps, checks your assertions, saves a screenshot + replay bundle.
10
+ - **Catch a11y regressions before merge.** `/audit-a11y` runs axe-core against WCAG-A / AA / AAA and returns issues grouped by severity, with WCAG references and fix links.
11
+ - **Lock down the visual contract.** `/visual-diff` captures a screenshot and compares against a named baseline under `./.rolepod-uiproof/baselines/`. First call seeds; subsequent calls diff.
12
+ - **Turn an interactive verify run into a real test file.** `/scaffold-e2e` transcribes a replay bundle into Playwright Test, Vitest+Playwright, or pytest+selenium.
13
+ - **Reproduce + minimize a bug deterministically.** `/verify-ui` with `mode: "reproduce"` runs ddmin step-elimination to find the shortest still-reproducing sequence.
14
+
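The step-elimination idea behind `mode: "reproduce"` can be illustrated with a small sketch. This is not the package's implementation: it is a simplified, complement-only variant of delta debugging (ddmin), and the `ddmin` function name and the `reproduces` callback are assumptions made for illustration. The callback stands in for "replay these steps and check whether the bug still appears".

```typescript
// Simplified ddmin sketch (complement-only variant), for illustration only.
// `reproduces` is a hypothetical callback: it replays a candidate step list
// and returns true when the bug still shows up.
function ddmin<T>(steps: T[], reproduces: (candidate: T[]) => boolean): T[] {
  let current = steps.slice();
  let chunks = 2; // how many chunks the current list is split into

  while (current.length >= 2) {
    const chunkSize = Math.ceil(current.length / chunks);
    let reduced = false;

    // Try dropping one chunk at a time and keep the rest (the complement).
    for (let start = 0; start < current.length; start += chunkSize) {
      const complement = current
        .slice(0, start)
        .concat(current.slice(start + chunkSize));
      if (complement.length > 0 && reproduces(complement)) {
        current = complement; // the dropped chunk was not needed
        chunks = Math.max(chunks - 1, 2);
        reduced = true;
        break;
      }
    }

    if (!reduced) {
      if (chunks >= current.length) break; // already at single-step granularity
      chunks = Math.min(chunks * 2, current.length); // split finer and retry
    }
  }
  return current;
}
```

When the loop stops at single-step granularity, removing any one remaining step no longer reproduces the bug, which is the sense in which the result is "shortest" here. Each call to `reproduces` costs one replay, so real runs trade minimality against time.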
15
+ ## The four skills
16
+
17
+ | Skill | Wraps | What it does |
18
+ |---|---|---|
19
+ | `/verify-ui` | `rolepod_verify_ui_flow` | Drive a session through steps, evaluate assertions, save evidence + replay bundle. `mode: assert` (default) or `reproduce` with optional ddmin minimization. |
20
+ | `/audit-a11y` | `rolepod_audit_a11y` | axe-core audit at WCAG-A / AA / AAA. `scope: "page"` or `scope: { ref }`. Markdown or JSON report. |
21
+ | `/visual-diff` | `rolepod_visual_diff` | Pixel diff against a named baseline. Auto-seeds on first call. Configurable threshold + pixelmatch sensitivity. |
22
+ | `/scaffold-e2e` | `rolepod_scaffold_e2e` | Generate a runnable test file from a scenario + optional replay bundle. Three target frameworks. |
23
+
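The seed-then-diff behaviour of `/visual-diff` can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: baselines live in an in-memory object rather than under `./.rolepod-uiproof/baselines/`, pixels are compared for exact equality instead of going through pixelmatch, and the names `RawImage`, `DiffResult`, and `visualDiff` are hypothetical.

```typescript
// Illustrative sketch only. RGBA image as a flat byte array, 4 bytes per pixel.
interface RawImage {
  width: number;
  height: number;
  data: Uint8Array;
}

type DiffResult =
  | { status: "seeded" }
  | { status: "compared"; mismatchRatio: number; passed: boolean };

// Hypothetical in-memory stand-in for the on-disk baseline directory.
const baselines: { [name: string]: RawImage } = {};

function visualDiff(name: string, shot: RawImage, threshold = 0.01): DiffResult {
  if (!Object.prototype.hasOwnProperty.call(baselines, name)) {
    baselines[name] = shot; // first call seeds the baseline
    return { status: "seeded" };
  }
  const baseline = baselines[name];
  if (baseline.width !== shot.width || baseline.height !== shot.height) {
    // A size change is treated as a full mismatch in this sketch.
    return { status: "compared", mismatchRatio: 1, passed: false };
  }

  const pixels = shot.width * shot.height;
  let mismatched = 0;
  for (let p = 0; p < pixels; p++) {
    const i = p * 4;
    const differs =
      baseline.data[i] !== shot.data[i] ||
      baseline.data[i + 1] !== shot.data[i + 1] ||
      baseline.data[i + 2] !== shot.data[i + 2] ||
      baseline.data[i + 3] !== shot.data[i + 3];
    if (differs) mismatched++;
  }
  const mismatchRatio = pixels === 0 ? 0 : mismatched / pixels;
  return { status: "compared", mismatchRatio, passed: mismatchRatio <= threshold };
}
```

Here `threshold` is the fraction of pixels allowed to differ before the comparison fails. A perceptual comparer such as pixelmatch adds a second, per-pixel sensitivity knob on top of that, which this sketch leaves out.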
24
+ Every skill is **single-backend** (D-024) — it calls the rolepod-uiproof server and only the rolepod-uiproof server. If the server is unavailable, the skill fails with a clear diagnostic. Multi-backend routing belongs in the parent [`rolepod`](https://github.com/nuttaruj/rolepod) plugin's phase skills, not here.
25
+
26
+ ## Install
27
+
28
+ Pick your CLI. All install paths share the same MCP server (`@rolepod/uiproof` on npm) and the same skill set.
29
+
30
+ ### Claude Code (recommended)
31
+
32
+ ```bash
33
+ # Install
34
+ claude plugin marketplace add nuttaruj/rolepod-uiproof
35
+ claude plugin install rolepod-uiproof@rolepod-uiproof
36
+
37
+ # Update
38
+ claude plugin marketplace update rolepod-uiproof
39
+ claude plugin install rolepod-uiproof@rolepod-uiproof
40
+
41
+ # Uninstall
42
+ claude plugin uninstall rolepod-uiproof@rolepod-uiproof
43
+ claude plugin marketplace remove rolepod-uiproof
44
+ ```
45
+
46
+ The plugin auto-registers the four `/verify-ui` / `/audit-a11y` / `/visual-diff` / `/scaffold-e2e` skills AND spawns the MCP server (`npx -y @rolepod/uiproof`) on session start.
47
+
48
+ ### Cursor IDE
49
+
50
+ Cursor's plugin marketplace is limited to Teams / Enterprise plans (Free / Pro plans cannot install marketplace plugins). For everyone else, drop the workspace MCP config:
51
+
52
+ ```bash
53
+ # Per project — copy from this repo, or run:
54
+ mkdir -p .cursor
55
+ curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o .cursor/mcp.json
56
+
57
+ # Or global (across every project)
58
+ mkdir -p ~/.cursor
59
+ curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o ~/.cursor/mcp.json
60
+ ```
61
+
62
+ Then **fully restart Cursor** — MCP servers load only at startup. Verify under **Settings → MCP**.
63
+
64
+ Skills are not auto-registered under Cursor (no unified plugin format for skills + MCP in one). The MCP tools are still available; invoke them by name in chat (`Use rolepod_verify_ui_flow to …`).
65
+
66
+ > **Teams / Enterprise:** add `https://github.com/nuttaruj/rolepod-uiproof` as a team marketplace under **Settings → Plugins** for one-click install with skills auto-registered.
67
+
68
+ ### Codex CLI
69
+
70
+ ```bash
71
+ # Install
72
+ codex plugin marketplace add nuttaruj/rolepod-uiproof
73
+ codex plugin install rolepod-uiproof@rolepod-uiproof
74
+
75
+ # Update
76
+ codex plugin marketplace upgrade rolepod-uiproof
77
+ codex plugin install rolepod-uiproof@rolepod-uiproof
78
+ ```
79
+
80
+ Codex reads the plugin from `.agents/plugins/marketplace.json` + `.codex-plugin/plugin.json` in this repo. Skills install to `~/.codex/skills/` (Codex's plugin loader handles registration).
81
+
82
+ ### Gemini CLI
83
+
84
+ Not yet shipped. The Gemini extension format is not yet stable enough to commit to; we plan to add `gemini-extension.json` in v0.4. Track [issue #N](https://github.com/nuttaruj/rolepod-uiproof/issues) if you need it.
85
+
86
+ ### Direct npm (any MCP-aware tool)
87
+
88
+ Use this when your tool reads a standard `mcpServers` config (most non-CLI MCP clients):
89
+
90
+ ```json
91
+ {
92
+ "mcpServers": {
93
+ "rolepod-uiproof": {
94
+ "command": "npx",
95
+ "args": ["-y", "@rolepod/uiproof"]
96
+ }
97
+ }
98
+ }
99
+ ```
100
+
101
+ 15 MCP tools (`rolepod_browser_*` + `rolepod_verify_ui_flow` + 4 composites) will appear in your client. Skills are not surfaced via this path — call the tools by name.
102
+
103
+ ## Quick start
104
+
105
+ After install, in your Claude Code / Cursor / Codex session:
106
+
107
+ ```
108
+ /verify-ui https://example.com
109
+ steps: []
110
+ expect: text_visible "Example Domain", text_visible "Learn more"
111
+ ```
112
+
113
+ Returns a `run_id`, `passed: true`, and a path under `./.rolepod-uiproof/artifacts/verify_<run_id>/`:
114
+
115
+ ```
116
+ .rolepod-uiproof/artifacts/verify_20260524T101512_a1b2c3d4/
117
+ ├── final.png screenshot at end of run
118
+ └── replay.json replay bundle — re-runnable via `npx rolepod-uiproof replay …`
119
+ ```
120
+
121
+ Convert that to a Playwright Test file:
122
+
123
+ ```
124
+ /scaffold-e2e from .rolepod-uiproof/artifacts/verify_…/replay.json using playwright-test
125
+ ```
126
+
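To make the transcription step concrete, here is a rough sketch of how a replay bundle could be turned into Playwright Test source. The bundle shape (`ReplayBundle`, `ReplayStep`) and the `scaffoldPlaywrightTest` function are assumptions for illustration; the actual replay format is documented in `docs/artifacts.md` and may differ. Only the emitted Playwright calls (`page.goto`, `getByText`, `getByLabel`, `toBeVisible`) are real API.

```typescript
// Hypothetical replay shapes; the real bundle format may differ.
type ReplayStep =
  | { action: "click"; text: string }
  | { action: "type"; label: string; value: string };

interface ReplayBundle {
  url: string;
  steps: ReplayStep[];
  expect: { kind: "text_visible"; text: string }[];
}

// Emits the source of a Playwright Test file as a string.
function scaffoldPlaywrightTest(title: string, bundle: ReplayBundle): string {
  const q = (s: string): string => JSON.stringify(s); // safe string literal
  const lines: string[] = [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${q(title)}, async ({ page }) => {`,
    `  await page.goto(${q(bundle.url)});`,
  ];

  for (const step of bundle.steps) {
    if (step.action === "click") {
      lines.push(`  await page.getByText(${q(step.text)}).click();`);
    } else {
      lines.push(`  await page.getByLabel(${q(step.label)}).fill(${q(step.value)});`);
    }
  }
  for (const check of bundle.expect) {
    lines.push(`  await expect(page.getByText(${q(check.text)})).toBeVisible();`);
  }

  lines.push(`});`, ``);
  return lines.join("\n");
}
```

Quoting every value with `JSON.stringify` keeps recorded text containing quotes or newlines from breaking the generated file.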
127
+ ## Verify your setup
128
+
129
+ ```bash
130
+ npx rolepod-uiproof doctor
131
+ ```
132
+
133
+ ```
134
+ ✓ Node ≥20 24.14.0
135
+ ✓ Playwright Chromium installed ~/Library/Caches/ms-playwright
136
+ ✓ webdriverio (mobile client, v0.3)
137
+ • Appium server (roadmap v0.3) Not reachable at http://127.0.0.1:4723/status
138
+ ✓ Xcode (iOS, roadmap v0.3) /Applications/Xcode.app
139
+ • Android SDK (roadmap v0.3) Set ANDROID_HOME — needed only for Android
140
+ • SeleniumEngine (roadmap v0.4) Not implemented — deferred to v0.4
141
+ ✓ Artifact root writable
142
+ ```
143
+
144
+ `✓` = ready · `•` = optional / deferred · `✗` = blocker.
145
+
146
+ ## What's inside
147
+
148
+ - **15 MCP tools** — 10 atomic browser/mobile primitives (`browser_open`, `_close`, `_snapshot`, `_click`, `_type`, `_key`, `_scroll`, `_wait_for`, `_screenshot`, `_navigate`) + 5 composites (`verify_ui_flow`, `audit_a11y`, `visual_diff`, `scaffold_e2e`, `extract_ui_state`). All prefixed `rolepod_*` to namespace away from other MCP servers.
149
+ - **2 engines behind one interface** — `PlaywrightEngine` for web (Chromium / Firefox / WebKit), `AppiumEngine` for iOS XCUITest + Android UIAutomator2. The Lead sees one unified `A11yNode` shape regardless of platform.
150
+ - **Stable refs with explicit invalidation (D-010)** — every state-changing call invalidates prior refs; the engine returns a structured `stale_ref` error if you try to reuse one. No silent locator drift.
151
+ - **Replay bundles** — every `/verify-ui` run writes a JSON replay you can re-run later with `npx rolepod-uiproof replay <bundle.json>`, agent-free.
152
+ - **No internal LLM (D-004)** — your Lead agent makes every decision. We don't double-bill you for inference.
153
+
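The stable-ref rule (D-010) can be pictured with a small registry sketch. It is an assumption-laden illustration rather than the engine's code: `RefRegistry`, its method names, and the `g<generation>-e<index>` ref format are all hypothetical. The point it shows is that a ref from before a state change resolves to a structured `stale_ref` error instead of silently pointing at a different element.

```typescript
// Illustrative sketch; class, method names, and ref format are hypothetical.
type Resolved<T> =
  | { ok: true; node: T }
  | { ok: false; error: { code: "stale_ref"; ref: string } };

class RefRegistry<T> {
  private generation = 0;
  private live: { [ref: string]: T } = {};

  // Issue fresh refs for the nodes of a new snapshot.
  snapshot(nodes: T[]): string[] {
    this.live = {};
    return nodes.map((node, index) => {
      const ref = `g${this.generation}-e${index}`;
      this.live[ref] = node;
      return ref;
    });
  }

  // Call after any state-changing action (click, type, navigate, ...).
  invalidate(): void {
    this.generation += 1;
    this.live = {};
  }

  // Look up a ref; refs from before the last invalidation are rejected.
  resolve(ref: string): Resolved<T> {
    if (!Object.prototype.hasOwnProperty.call(this.live, ref)) {
      return { ok: false, error: { code: "stale_ref", ref } };
    }
    return { ok: true, node: this.live[ref] };
  }
}
```

Baking the generation into the ref string means an old ref can never collide with a new one, so a caller that skips re-snapshotting gets an explicit error rather than a click on the wrong element.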
154
+ ## Use with parent rolepod
155
+
156
+ If you also use [`rolepod`](https://github.com/nuttaruj/rolepod) (the markdown plugin), its `check-work`, `debug-issue`, and `review-code` skills auto-route to `/verify-ui`, `/audit-a11y`, and `/visual-diff` when the rolepod-uiproof server is present. Nothing breaks if it isn't — parent falls back to Playwright MCP / Chrome DevTools MCP / manual verification.
157
+
158
+ The two are **independent**: install rolepod-uiproof standalone and get a complete experience via slash commands, or install both together and let parent's phase router pick the right backend automatically.
159
+
160
+ ## Docs
161
+
162
+ - [docs/sessions.md](docs/sessions.md) — session lifecycle, stale-ref semantics, multi-session
163
+ - [docs/artifacts.md](docs/artifacts.md) — `.rolepod-uiproof/` layout, run_id convention, replay bundle format
164
+ - [docs/recipes/](docs/recipes/) — `verify-a-checkout-flow`, `audit-a11y-during-review`, `visual-baseline-workflow`
165
+ - [CHANGELOG.md](CHANGELOG.md) — release history with per-version "Not yet verified" notes mapped to milestones
166
+ - [CONTRIBUTING.md](CONTRIBUTING.md), [SECURITY.md](SECURITY.md), [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
167
+
168
+ ---
169
+
170
+ MIT licensed — see [LICENSE](LICENSE) and [THIRD_PARTY.md](THIRD_PARTY.md). Mobile AT normalizers are alumnium-inspired ([UPSTREAM_TRACKING.md](UPSTREAM_TRACKING.md)). Feedback + runtime reports for Cursor / Codex / Gemini install paths especially welcome via [issues](https://github.com/nuttaruj/rolepod-uiproof/issues).
package/THIRD_PARTY.md ADDED
@@ -0,0 +1,104 @@
1
+ # Third-Party Notices
2
+
3
+ rolepod-uiproof depends on and (in later milestones) incorporates code from the
4
+ following third-party projects. Their licenses (MIT, ISC, Apache-2.0, and
5
+ MPL-2.0) are compatible with this project's MIT license (see `LICENSE`).
6
+
7
+ ---
8
+
9
+ ## alumnium
10
+
11
+ - **Project:** [alumnium-hq/alumnium](https://github.com/alumnium-hq/alumnium)
12
+ - **License:** MIT
13
+ - **Used for:** Driver abstraction and accessibility-tree extractors for
14
+ Chromium (web), XCUITest (iOS), and UIAutomator2 (Android).
15
+ - **Relationship:** Not an npm dependency. Originally specified as a
16
+ **fork** (copied with modification); as of v0.3 it is an inspired-by
17
+ reimplementation with no alumnium source copied. The LLM-driven `Alumni`
18
+ class, LangChain bindings, and OpenAI integration have no counterpart here.
19
+ See `UPSTREAM_TRACKING.md` for the rationale, the upstream commit referenced, and the cherry-pick policy.
20
+
21
+ ### Forked files
22
+
23
+ > **Note (v0.3):** After surveying alumnium during scaffolding we chose
24
+ > an **inspired-by** reimplementation rather than a verbatim fork. See
25
+ > [`UPSTREAM_TRACKING.md`](UPSTREAM_TRACKING.md) for the reasoning,
26
+ > the upstream commit SHA referenced, and the quarterly cherry-pick
27
+ > policy.
28
+ >
29
+ > The Chromium AT path uses Playwright 1.60's built-in
30
+ > `page.ariaSnapshot({mode:'ai'})` directly. The mobile AT extractors
31
+ > (`src/engine/a11y/xcuitest.ts`, `uiautomator2.ts`) are
32
+ > alumnium-inspired original code that parses Appium's XML page source via
33
+ > `fast-xml-parser`.
34
+ >
35
+ > Should literal alumnium source be copied in a future revision, each
36
+ > file will carry this header:
37
+ >
38
+ > ```
39
+ > /*
40
+ > * Originally from alumnium-hq/alumnium (MIT License).
41
+ > * Source commit: <SHA>
42
+ > * Modified for rolepod-uiproof.
43
+ > */
44
+ > ```
45
+
46
+ ### Upstream MIT notice
47
+
48
+ ```
49
+ MIT License
50
+
51
+ Copyright (c) alumnium-hq contributors
52
+
53
+ Permission is hereby granted, free of charge, to any person obtaining a copy
54
+ of this software and associated documentation files (the "Software"), to deal
55
+ in the Software without restriction, including without limitation the rights
56
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
57
+ copies of the Software, and to permit persons to whom the Software is
58
+ furnished to do so, subject to the following conditions:
59
+
60
+ The above copyright notice and this permission notice shall be included in all
61
+ copies or substantial portions of the Software.
62
+
63
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
64
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
65
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
66
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
67
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
68
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
69
+ SOFTWARE.
70
+ ```
71
+
72
+ ---
73
+
74
+ ## Runtime npm dependencies
75
+
76
+ The following npm packages are direct runtime dependencies. Each retains its
77
+ own license; this section is acknowledgement only.
78
+
79
+ - `@modelcontextprotocol/sdk` — MIT — MCP protocol implementation.
80
+ - `playwright` — Apache-2.0 — Web automation engine for the `web` platform.
81
+ - `zod` — MIT — Tool input/output schema validation.
82
+ - `js-yaml` — MIT — Parses Playwright's `ariaSnapshot({mode:'ai'})` YAML
83
+ output into the unified `A11yNode` tree.
84
+ - `@axe-core/playwright` — MPL-2.0 — Powers the `rolepod_audit_a11y`
85
+ composite. axe-core is licensed under MPL-2.0 (weak copyleft); using it
86
+ as an unmodified runtime dependency is compatible with this project's
87
+ MIT license. We do not modify axe-core source.
88
+ - `pixelmatch` — ISC — Pixel-level image comparison for
89
+ `rolepod_visual_diff`.
90
+ - `pngjs` — MIT — PNG encode/decode for baseline + diff images in
91
+ `rolepod_visual_diff`.
92
+ - `fast-xml-parser` — MIT — Parses Appium's XML page source in the
93
+ mobile AT normalizers (`xcuitest.ts`, `uiautomator2.ts`).
94
+
95
+ ## Optional npm dependencies
96
+
97
+ - `webdriverio` — MIT — Loaded lazily by `AppiumEngine` when a mobile
98
+ session is requested. Web-only installs skip it via npm
99
+ `optionalDependencies`.
100
+
101
+ ## Build-time-only dependencies
102
+
103
+ - `zod-to-json-schema` — ISC — Used by `npm run build:schemas` to emit
104
+ `dist/schemas/tools.json`. Not shipped at runtime.
@@ -0,0 +1,58 @@
1
+ # Upstream tracking
2
+
3
+ Records the upstream sources that have informed rolepod-uiproof's design and
4
+ implementation, so that future cherry-picks or audits can locate them.
5
+
6
+ ## alumnium-hq/alumnium (MIT)
7
+
8
+ - **Repo:** https://github.com/alumnium-hq/alumnium
9
+ - **Commit referenced:** `94dea1e6916c3fb8e38fc229a7c7c85aa6230d52`
10
+ - **Date referenced:** 2026-05-24
11
+ - **Used as:** Design reference for mobile accessibility-tree shape and
12
+ the XCUITest / UIAutomator2 XML → unified tree mapping.
13
+
14
+ ### Status
15
+
16
+ The original engine-layer design specified a *verbatim fork* of
17
+ alumnium's `packages/typescript/src/drivers/` and
18
+ `packages/typescript/src/accessibility/`. After surveying the source
19
+ during v0.3 scaffolding we instead chose an **inspired-by**
20
+ reimplementation:
21
+
22
+ - alumnium uses bun-style `.ts` import extensions throughout; our
23
+ Node + tsup setup uses `.js` resolution. Mass-renaming imports would have been
24
+ the bulk of any literal fork effort.
25
+ - alumnium pulls four runtime XML deps (`domhandler`, `htmlparser2`,
26
+ `dom-serializer`, `xml-formatter`) plus its internal `alwaysly`
27
+ helper. `fast-xml-parser` (MIT, single dep) covers our needs.
28
+ - The accessibility-tree types in alumnium serve their LLM `Alumni`
29
+ class; ours serve the unified `A11yNode` schema in
30
+ `src/schema/tools.ts`, so the field set differs anyway.
31
+
32
+ ### What we keep from alumnium
33
+
34
+ - The overall shape of the XCUITest and UIAutomator2 tree extractors —
35
+ walk the Appium XML page source, assign stable refs, map native
36
+ attributes (`name`, `label`, `value`, `content-desc`, `text`,
37
+ `resource-id`, etc.) into a normalized accessibility shape.
38
+ - The decision to use Appium's `getPageSource` as the AT entry point
39
+ for mobile (alumnium proved this is workable).
40
+
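The extractor shape described above (walk the page-source tree, assign refs, map native attributes into one normalized shape) can be sketched roughly as below. Everything named here is an assumption for illustration: `XmlElement` stands in for whatever an XML parser yields, `SketchNode` is not the real `A11yNode` schema, and the attribute priority order is a guess rather than the project's actual mapping.

```typescript
// Hypothetical parsed-XML element, as an XML parser might produce it.
interface XmlElement {
  tag: string;
  attrs: { [key: string]: string };
  children: XmlElement[];
}

// Hypothetical normalized node; NOT the project's real A11yNode schema.
interface SketchNode {
  ref: string;
  role: string;
  name: string;
  value?: string;
  children: SketchNode[];
}

function normalizeTree(root: XmlElement): SketchNode {
  let counter = 0;

  const visit = (el: XmlElement): SketchNode => {
    const ref = `e${counter++}`; // refs follow pre-order document position
    const a = el.attrs;
    // First non-empty attribute wins; this priority order is an assumption.
    const name =
      a["label"] || a["name"] || a["content-desc"] || a["text"] || a["resource-id"] || "";
    const node: SketchNode = { ref, role: el.tag, name, children: [] };
    if (a["value"] !== undefined) node.value = a["value"];
    node.children = el.children.map(visit);
    return node;
  };

  return visit(root);
}
```

Because iOS (`label`, `name`, `value`) and Android (`content-desc`, `text`, `resource-id`) attributes feed the same fields, one walker can serve both platforms, with the platform-specific part reduced to the attribute lookup.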
41
+ ### What we DO NOT keep
42
+
43
+ - The `Alumni` / LLM-driven action loop (incompatible with our
44
+ Lead-driven D-004 design).
45
+ - The `Xml` namespace + 4 XML parsing deps.
46
+ - The `pythonic*` polyfills.
47
+ - The CLI / MCP wrappers (we have our own).
48
+ - Literal source files.
49
+
50
+ ### Quarterly cherry-pick policy
51
+
52
+ Each quarter, review alumnium's commit log between the SHA above and
53
+ their `HEAD`. Cherry-pick *behavioral* fixes that apply (a UIAutomator2
54
+ attribute we missed, an XCUITest edge case, etc.). We do **not** commit
55
+ to staying current on every bug fix; we commit to staying *correct*.
56
+
57
+ When alumnium fixes a behavioral bug we share, update this file with
58
+ the new SHA + date + a one-line note on what changed.
@@ -0,0 +1 @@
1
+ #!/usr/bin/env node