pi-chrome 0.14.5 → 0.14.7

This diff shows the content of publicly available package versions as released to their respective public registries. It is provided for informational purposes only.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,14 @@
 
  All notable user-facing changes to `pi-chrome`.
 
+ ## 0.14.7
+
+ - Replace "30+ challenges" hand-wave in README + COMPARISON.md with the accurate framing from chrome-benchmark: **38 primitive challenges + 4 hermetic BrowserGym-style long-horizon tasks**, scored by **expected-outcome-by-mode** (not raw PASS count). Explains why a synthetic-events tool isn't supposed to satisfy a clipboard user-activation gate — matching that expectation is the pass.
+
+ ## 0.14.6
+
+ - Fix Browser Use license in `docs/COMPARISON.md`: MIT (not Apache-2.0). Confirmed against upstream LICENSE on GitHub.
+
  ## 0.14.5
 
  - `docs/COMPARISON.md` rewritten with a three-axis landscape (drivers / agent frameworks / cloud providers). Adds Browser Use, Stagehand, Skyvern, Magnitude, Alumnium, OpenAI Operator, Project Mariner, Surfer 2, Anthropic Computer Use, Browserbase, Steel.dev, Hyperbrowser, Anchor, Browserless. Adds Interop section, public-benchmark cheat sheet (WebArena, WorkArena++, BrowseComp, Mind2Web 2, WebChoreArena, MiniWoB++, BrowserGym).
package/README.md CHANGED
@@ -32,10 +32,10 @@ You: [keeps coding — agent never asked you to log in]
  | Multi-session safe | ✅ shared local bridge | ❌ port collisions | ❌ | ❌ |
  | Network/console capture | ✅ built-in | ✅ | ✅ | ⚠️ via extensions |
  | Honest result envelopes¹ | ✅ | ⚠️ | ❌ | ❌ |
- | Built-in benchmark suite² | ✅ 30+ challenges | n/a | n/a | n/a |
+ | Built-in benchmark suite² | ✅ 38 primitives + 4 long-horizon | n/a | n/a | n/a |
 
  ¹ Every action returns `pageMutated`, `defaultPrevented`, `elementVisible`, `occludedBy`, and `valueMatches` so the agent knows when a click didn't take effect — instead of looping blindly.
- ² See [`test-suite/`](./test-suite) — static pages that grade any browser-control tool on trusted clicks, pointer humanization, keyboard fidelity, drag/drop, clipboard, Shadow DOM, iframes, file uploads, network capture, and fingerprint leaks.
+ ² See [`test-suite/`](./test-suite) — 38 primitive challenges plus 4 hermetic BrowserGym-style tasks. Scoring is expected-outcome-by-mode (`synthetic` / `trusted` / `manual`), not raw PASS count. Pages grade any browser-control tool on trusted clicks, pointer humanization, keyboard fidelity, drag/drop, clipboard, Shadow DOM, iframes, file uploads, network capture, and fingerprint leaks.
 
  ---
 
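Footnote ¹ above names the envelope fields each action returns. As a rough TypeScript sketch of what such an envelope could look like (only the field names come from the README footnote; the shape, the types, and the `clickSucceeded` helper are illustrative assumptions, not pi-chrome's published API):

```typescript
// Illustrative sketch only: field names come from the README footnote;
// the exact shape and types are assumptions, not pi-chrome's actual API.
interface ActionResultEnvelope {
  pageMutated: boolean;       // did the action observably change the DOM?
  defaultPrevented: boolean;  // did the page cancel the dispatched event?
  elementVisible: boolean;    // was the target visible when acted on?
  occludedBy: string | null;  // selector of an element covering the target, if any
  valueMatches: boolean;      // for inputs: does the final value match what was typed?
}

// Hypothetical helper: an agent can branch on the envelope
// instead of looping blindly on a click that never landed.
function clickSucceeded(r: ActionResultEnvelope): boolean {
  return r.pageMutated && !r.defaultPrevented && r.elementVisible && r.occludedBy === null;
}
```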
@@ -240,7 +240,11 @@ There is no network exposure; the bridge binds to loopback only.
 
  ## Built-in benchmark suite
 
- [`test-suite/`](./test-suite) is a static page benchmark for **any** browser-control agent (not just pi-chrome). Each challenge exposes `window.__verdict` / `window.__reason` / `window.__events` and a manifest entry with expected results per mode (`synthetic`, `trusted`, `manual`).
+ [`test-suite/`](./test-suite) is a benchmark for **any** browser-control agent (not just pi-chrome). It includes **38 primitive challenges** plus **4 hermetic BrowserGym-style long-horizon tasks**.
+
+ Scoring is **expected-outcome-by-mode**, not raw PASS count: each challenge has an expected verdict per mode (`synthetic`, `trusted`, `manual`) and a tool grades itself by whether its actual outcome matches the expected one. This avoids false equivalence between modes — a synthetic-events tool isn't supposed to satisfy a clipboard user-activation gate; matching that expectation is the pass.
+
+ Each challenge exposes `window.__verdict` / `window.__reason` / `window.__events` and a manifest entry with expected results per mode.
 
  ```bash
  cd test-suite && python3 -m http.server 8765
package/docs/COMPARISON.md CHANGED
@@ -64,7 +64,7 @@ These wrap a driver with an LLM loop. They are **higher-level than pi-chrome** a
 
  | Framework | Driver underneath | Approach | Open source |
  | ------------------------ | ------------------------------ | --------------------------------------------------------------------------------------------- | --------------- |
- | **Browser Use** | Playwright | DOM + a11y tree → LLM → action JSON. Open-source leader; widely cited on WebVoyager. | Apache-2.0 (Python) |
+ | **Browser Use** | Playwright | DOM + a11y tree → LLM → action JSON. Open-source leader; widely cited on WebVoyager. | MIT (Python) |
  | **Stagehand** (Browserbase) | Playwright | Natural-language `.act()` / `.observe()` / `.extract()`; deterministic + AI mix. | MIT (TypeScript)|
  | **Skyvern** | Playwright + own DOM model | Vision-first + DOM; YAML workflows for form/workflow automation. | AGPL (Python) |
  | **Magnitude** | Playwright | NL test authoring; QA-focused. | open |
@@ -134,7 +134,7 @@ If your threat model excludes extensions with broad permissions, neither approac
 
  ## Public benchmarks worth knowing (for axis 2 / axis 3 comparison)
 
- Pi-chrome itself ships a per-primitive benchmark suite ([`../test-suite/`](../test-suite)) covering trusted-input, pointer humanization, keyboard fidelity, drag/drop, Shadow DOM, file uploads, network observability, fingerprint leaks, and agent-safety honeypots. That's **driver-level** grading.
+ Pi-chrome itself ships a benchmark suite ([`../test-suite/`](../test-suite)) of **38 primitive challenges** plus **4 hermetic BrowserGym-style long-horizon tasks** covering trusted-input, pointer humanization, keyboard fidelity, drag/drop, Shadow DOM, file uploads, network observability, fingerprint leaks, and agent-safety honeypots. Scoring is **expected-outcome-by-mode** (not raw PASS count): each challenge has expected verdicts per mode (`synthetic` / `trusted` / `manual`) and a tool grades itself by whether its actual outcome matches expectations. That's **driver-level** grading.
 
  For **agent-level** comparison (axis 2), the public benchmarks worth citing:
 
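The expected-outcome-by-mode rule described in the README and COMPARISON.md hunks above reduces to a small comparison. A minimal sketch (only `window.__verdict` and the three mode names appear in the package docs; `ChallengeManifest`, `gradeChallenge`, and the `Verdict` type are hypothetical names):

```typescript
// Sketch of expected-outcome-by-mode grading as described above.
// `ChallengeManifest` and `gradeChallenge` are hypothetical names; only
// window.__verdict and the mode names ("synthetic" / "trusted" / "manual")
// come from the package docs.
type Mode = "synthetic" | "trusted" | "manual";
type Verdict = "PASS" | "FAIL";

interface ChallengeManifest {
  id: string;
  expected: Record<Mode, Verdict>; // expected verdict per mode
}

// A challenge is scored as a match against expectations, not a raw PASS:
// a synthetic-events tool that FAILs a clipboard user-activation gate
// still scores correct when FAIL is the expected verdict for that mode.
function gradeChallenge(
  manifest: ChallengeManifest,
  mode: Mode,
  actual: Verdict, // read from window.__verdict on the challenge page
): boolean {
  return actual === manifest.expected[mode];
}
```

Under this rule, a suite score is the fraction of challenges whose actual verdict matches the expected verdict for the mode under test.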
@@ -1,7 +1,7 @@
  {
  "manifest_version": 3,
  "name": "Pi Chrome Connector",
- "version": "0.14.5",
+ "version": "0.14.7",
  "description": "Lets Pi control tabs in Chrome via a local connector at 127.0.0.1.",
  "permissions": ["tabs", "scripting", "storage", "activeTab", "alarms", "webNavigation", "debugger"],
  "host_permissions": ["<all_urls>", "http://127.0.0.1:17318/*"],
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pi-chrome",
- "version": "0.14.5",
+ "version": "0.14.7",
  "description": "The de-facto browser automation toolkit for Pi agents. Drive your existing logged-in Chrome — no re-login, no throwaway profile, no CDP. 20+ tools (click, type, navigate, screenshot, network capture, file upload, drag, scroll, touch) + honest result envelopes + a built-in benchmark suite.",
  "keywords": [
  "pi",