npm - pi-chrome - Versions diffs - 0.14.4 → 0.14.5 - Mend

pi-chrome 0.14.4 → 0.14.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5) hide show

package/CHANGELOG.md +6 -0
package/README.md +2 -0
package/docs/COMPARISON.md +129 -50
package/extensions/chrome-profile-bridge/browser-extension/manifest.json +1 -1
package/package.json +1 -1

package/CHANGELOG.md CHANGED Viewed

@@ -2,6 +2,12 @@
 All notable user-facing changes to `pi-chrome`.
+## 0.14.5
+- `docs/COMPARISON.md` rewritten with a three-axis landscape (drivers / agent frameworks / cloud providers). Adds Browser Use, Stagehand, Skyvern, Magnitude, Alumnium, OpenAI Operator, Project Mariner, Surfer 2, Anthropic Computer Use, Browserbase, Steel.dev, Hyperbrowser, Anchor, Browserless. Adds Interop section, public-benchmark cheat sheet (WebArena, WorkArena++, BrowseComp, Mind2Web 2, WebChoreArena, MiniWoB++, BrowserGym).
+- README gains a one-liner pointing at the new three-axis framing.
+- Sourced from `benchmark-search` session research. No code changes.
 ## 0.14.4
 - Sync `manifest.json` version to match `package.json` (0.14.3 shipped with stale manifest, would trigger spurious `/chrome doctor` drift warnings). No code or behavior changes.

package/README.md CHANGED Viewed

@@ -20,6 +20,8 @@ You:    [keeps coding — agent never asked you to log in]
 ## Why pi-chrome vs. everything else
+> Short version: **pi-chrome is primitives — "Playwright for the Chrome you're already signed into."** Not an agent loop. Plug it under any agent framework (Browser Use, Stagehand, LangGraph) or call its tools directly from a Pi agent. See [docs/COMPARISON.md](./docs/COMPARISON.md) for the full three-axis landscape (drivers, agents, cloud providers).
 |                                | **pi-chrome**                     | Playwright / Puppeteer        | CDP-based agents              | Selenium / WebDriver          |
 | ------------------------------ | --------------------------------- | ----------------------------- | ----------------------------- | ----------------------------- |
 | Uses your real signed-in Chrome | ✅ yes (extension in your profile) | ❌ throwaway profile            | ⚠️ requires `--remote-debug` | ❌ throwaway profile            |

package/docs/COMPARISON.md CHANGED Viewed

@@ -1,87 +1,166 @@
-# pi-chrome vs. other browser-automation stacks
+# pi-chrome vs. the rest of the browser-automation landscape
-This is the "should I use pi-chrome?" page. Honest comparisons. We benchmark in public — see [`../test-suite/`](../test-suite).
+This is the honest "which tool when" page. The browser-automation space has three different layers and people often compare across them — let's not do that.
+We benchmark in public — see [`../test-suite/`](../test-suite). Where exact scores matter (WebVoyager, WorkArena++, BrowseComp, Mind2Web 2), check the live leaderboards; they shift monthly.
+---
 ## TL;DR
 | You are…                                                       | Use…                            |
 | -------------------------------------------------------------- | ------------------------------- |
 | A Pi agent operator who wants the agent to use **your real Chrome** (logged-in tabs, cookies, extensions) | **pi-chrome**                   |
+| Building a Pi/LLM agent and want low-level browser primitives  | **pi-chrome**                   |
 | Writing deterministic end-to-end tests in CI                   | Playwright / Cypress            |
-| Building a hosted scraping fleet on isolated profiles          | Playwright / Puppeteer / Browserless |
-| Building a "general web agent" SaaS that fetches its own pages  | browser-use / Stagehand / Skyvern    |
+| Building a hosted scraping/agent fleet on isolated profiles    | Playwright / Puppeteer + Browserbase / Steel |
+| Want a turnkey "natural-language agent" with built-in loop     | Browser Use / Stagehand / Skyvern |
+| Want a hosted, vendor-managed agent                            | OpenAI Operator / Project Mariner / Surfer |
 | Debugging your own app from your editor without leaving your real session | **pi-chrome**                   |
-Different tools, different jobs. `pi-chrome` is the only one that says: *"don't make me sign in again to do work I'd do as myself."*
+`pi-chrome` is **primitives**, not an opinionated agent loop. Think of it as **"Playwright for the Chrome you're already signed into"** — and pluggable under any agent framework above the line.
+---
+## The three axes you should compare on
+| Axis                              | Examples                                                              | What it gives you             |
+| --------------------------------- | --------------------------------------------------------------------- | ----------------------------- |
+| **1. Driver / transport**         | Playwright, Puppeteer, Selenium, CDP raw, `puppeteer-stealth`, **pi-chrome** | low-level tools (click, type, navigate) |
+| **2. Agent framework**            | Browser Use, Stagehand, Skyvern, Magnitude, Alumnium, LangGraph-with-Playwright, Operator, Mariner, Surfer | LLM loop, planning, NL API    |
+| **3. Cloud browser provider**     | Browserbase, Steel.dev, Hyperbrowser, Anchor, Browserless             | managed Chromes, sessions, quotas |
+`pi-chrome` is **axis 1**. You can wrap it with any axis-2 agent framework, or run it directly from a Pi agent's `chrome_*` tool calls. It explicitly does NOT compete with axis 3 — it runs locally inside *your* Chrome.
+---
+## Axis 1 — drivers (where pi-chrome lives)
+| Tool                              | Transport                                  | Profile                            | Trusted events       | Banner when controlling            | Default detectable as bot |
+| --------------------------------- | ------------------------------------------ | ---------------------------------- | -------------------- | ----------------------------------- | ------------------------- |
+| Playwright                        | CDP (own driver)                           | throwaway by default               | always               | always ("controlled by test software") | yes (webdriver flag, automation flags) |
+| Puppeteer                         | CDP                                        | throwaway by default               | always               | always                              | yes                       |
+| Selenium                          | WebDriver / BiDi                           | throwaway                          | partial (BiDi improves) | always                          | most detectable           |
+| puppeteer-stealth / playwright-extra | CDP + patches                           | throwaway                          | always               | always                              | medium (patches flags)    |
+| Raw CDP                           | direct devtools protocol                   | either (needs `--remote-debugging-port`) | always           | always                              | yes                       |
+| **pi-chrome**                     | **Chrome extension bridge → local loopback** | **your real Chrome profile, signed-in cookies, extensions, history** | **opt-in** (`/chrome clicks on` or `trusted: true`) | **only when trusted mode is active** | **synthetic mode bypasses common detection signals**¹ |
+¹ pi-chrome synthetic mode dispatches DOM events with `isTrusted=false` — most sites don't check; some anti-bot defenses do. The [`test-suite/`](../test-suite) grades both modes against common detection signals. Trusted mode uses `chrome.debugger` and shows Chrome's banner like every other CDP-based tool.
+### What makes pi-chrome different on this axis
+1. **Profile attach, not driver launch.** Every other driver fights cookie persistence, login walls, MFA, and extension state. pi-chrome inherits all of it because it *is* your Chrome.
+2. **Synthetic-first, trusted-on-demand.** Two-tier event model — agents pick the right tradeoff per call. Competitors are all-trusted (CDP) and always show the banner. pi-chrome avoids it by default; you opt in when a site needs it (autoplay gate, clipboard, file picker).
+3. **Extension bridge transport.** No `--remote-debugging-port`, no throwaway Chromium. Survives Chrome auto-updates. Works alongside your normal Chrome usage.
+4. **Honest result envelopes.** Every action returns `pageMutated`, `defaultPrevented`, `elementVisible`, `occludedBy`, `valueMatches`. Competitors return `void` or generic acks; agents loop blindly on broken clicks.
+5. **Multi-session shared bridge.** Planner + worker + audit Pi sessions all drive the same Chrome concurrently.
+6. **Stable element uids.** `chrome_snapshot` returns deterministic uids you can pass to subsequent actions — similar to BrowserGym's `bid`, but built into the snapshot tool itself.
+---
+## Axis 2 — agent frameworks (built on top of axis 1)
+These wrap a driver with an LLM loop. They are **higher-level than pi-chrome** and **complementary**, not competitors.
+| Framework                | Driver underneath              | Approach                                                                                      | Open source     |
+| ------------------------ | ------------------------------ | --------------------------------------------------------------------------------------------- | --------------- |
+| **Browser Use**          | Playwright                     | DOM + a11y tree → LLM → action JSON. Open-source leader; widely cited on WebVoyager.          | Apache-2.0 (Python) |
+| **Stagehand** (Browserbase) | Playwright                  | Natural-language `.act()` / `.observe()` / `.extract()`; deterministic + AI mix.              | MIT (TypeScript)|
+| **Skyvern**              | Playwright + own DOM model     | Vision-first + DOM; YAML workflows for form/workflow automation.                              | AGPL (Python)   |
+| **Magnitude**            | Playwright                     | NL test authoring; QA-focused.                                                                | open            |
+| **Alumnium**             | Selenium / Playwright          | Test-author NL → agent. QA-focused.                                                           | open            |
+| **LangGraph / AutoGen + Playwright** | Playwright             | Generic agent graph + browser tools.                                                          | open            |
+| **OpenAI Operator**      | proprietary                    | OpenAI's own VLM + browser; ChatGPT-integrated.                                               | closed, hosted  |
+| **Project Mariner** (Google) | proprietary Chrome integration | Google's own VLM Chrome experiment.                                                       | closed          |
+| **Surfer 2 / Surfer-H** (H Company) | proprietary             | Hosted proprietary agent stack.                                                               | closed, hosted  |
+| **Anthropic Computer Use** | OS-level screenshots + mouse/keyboard | Broader than browser; trusted events at OS level.                                       | closed (API)    |
+**Why pi-chrome is not on this list:** it's intentionally **not an agent**. There's no LLM loop, no `.act("click the blue button")`. Pi handles the loop; pi-chrome provides the primitives. This means:
+- You can use pi-chrome under Browser Use, Stagehand, LangGraph, or any other agent framework (see [Interop](#interop)).
+- You don't pay for an opinion you don't want.
+- Agent improvements compound across all your workflows because the primitives stay stable.
 ---
-## Feature matrix
-|                                                | **pi-chrome**     | Playwright          | Puppeteer           | Stagehand           | browser-use         | Selenium            |
-| ---------------------------------------------- | ----------------- | ------------------- | ------------------- | ------------------- | ------------------- | ------------------- |
-| Drives your **real, signed-in** Chrome profile | ✅                | ❌ throwaway        | ❌ throwaway        | ❌ throwaway        | ❌ throwaway        | ❌ throwaway        |
-| Zero re-login                                   | ✅                | ❌                  | ❌                  | ❌                  | ❌                  | ❌                  |
-| First-class Pi tool integration                | ✅ 20 tools        | n/a                 | n/a                 | n/a                 | n/a                 | n/a                 |
-| Synthetic input (fast, no banner)               | ✅                | ❌ always trusted   | ❌                  | ❌                  | ❌                  | ❌                  |
-| Trusted (CDP) input on demand                  | ✅ opt-in          | ✅ always           | ✅ always           | ✅                  | ✅                  | ✅                  |
-| Honest result envelopes (`pageMutated`, `occludedBy`, `valueMatches`) | ✅                | ⚠️ partial          | ⚠️ partial          | ❌                  | ❌                  | ❌                  |
-| Network capture w/ response body               | ✅                | ✅                  | ✅                  | ⚠️                  | ⚠️                  | ⚠️                  |
-| Console capture                                 | ✅                | ✅                  | ✅                  | ⚠️                  | ⚠️                  | ⚠️                  |
-| Real touch events (mobile PWAs)                | ✅ `chrome_tap`    | ✅                  | ✅                  | ⚠️                  | ⚠️                  | ❌                  |
-| File upload through React/Vue controlled inputs | ✅ no native picker | ✅                  | ✅                  | ✅                  | ⚠️                  | ⚠️                  |
-| HTML5 drag/drop with `DataTransfer`            | ✅                | ✅                  | ⚠️                  | ⚠️                  | ⚠️                  | ⚠️                  |
-| Multi-session shared bridge                    | ✅                | ❌                  | ❌                  | ❌                  | ❌                  | ❌                  |
-| MIT-licensed                                   | ✅                | Apache-2            | Apache-2            | MIT                 | MIT                 | Apache-2            |
-| Watch the agent work, live                     | ✅ default         | ❌ headless / new   | ❌                  | ❌                  | ❌                  | ❌                  |
+## Axis 3 — cloud browser providers (orthogonal)
+| Provider          | What it sells                                            |
+| ----------------- | -------------------------------------------------------- |
+| **Browserbase**   | Managed browsers; pairs with Stagehand.                  |
+| **Steel.dev**     | Managed browsers + public agent leaderboards.            |
+| **Hyperbrowser**  | Managed browsers, session APIs.                          |
+| **Anchor Browser**| Managed browsers.                                        |
+| **Browserless**   | Managed Chrome, scraping focus.                          |
+**pi-chrome doesn't compete here.** It runs locally in *your* Chrome. The right framing if someone asks: *"no cloud cost, no session-handoff, no rate limits — and the agent runs against your real logged-in state."* If you need fleets of isolated Chromes in CI, you want one of these.
 ---
-## "But Playwright has a `storageState` option"
+## Interop
-Yes — you can export cookies and replay them. In practice for agent workflows that breaks down fast:
+`pi-chrome` exposes tools that any Pi agent can call. If you want to use it from outside Pi:
-1. **OAuth + SSO** providers (Okta, Google, GitHub) often pin sessions to TLS fingerprints, device IDs, and browser-extension state that doesn't survive replay.
-2. **MFA** tokens expire. Re-prompts in the middle of an agent run = dead end.
-3. **Internal admin tools** often hard-pin to your real device for security.
-4. **Your editor / IDE / colleagues** see your real Chrome, not a hidden one. Watching an agent in a separate window is jarring.
+1. The local bridge speaks HTTP JSON-RPC at `127.0.0.1:17318` (default). The API is internal but stable across patch versions.
+2. Tool surface mirrors Playwright closely (click/type/navigate/snapshot/screenshot/evaluate/wait_for) so adapter code is short.
+3. Honest envelopes (`pageMutated`, `valueMatches`, `occludedBy`) let agent harnesses skip retry/heal logic.
-`pi-chrome` sidesteps all of this by living **inside** the Chrome profile you're already logged into. The agent operates as you, because it literally is you (in your tabs).
+If you want a first-class pi-chrome adapter for Browser Use / Stagehand / LangGraph, file an issue with your use case.
+---
+## "But Playwright has `storageState` / Puppeteer has user-data-dir"
+Yes — you can export cookies and replay them, or point Playwright at your existing profile directory. In practice for agent workflows that breaks down fast:
+1. **OAuth + SSO** providers (Okta, Google, GitHub) often pin sessions to TLS fingerprints, device IDs, and browser-extension state that doesn't survive replay or a parallel Chrome instance.
+2. **MFA** tokens expire mid-run.
+3. **Internal admin tools** hard-pin to your real device.
+4. **Pointing Playwright at your real `user-data-dir`** requires closing your normal Chrome (Chrome won't share the profile lock). pi-chrome doesn't fight you for the profile because it lives *inside* it.
+5. **Watching the agent work** in your real window is a different UX than a hidden parallel Chrome. Demos, pair-driving, and confidence-building all want axis-1-with-attach.
 ---
 ## "Is this safer than CDP?"
-It's a different security boundary, not strictly safer. Trade-offs:
+Different security boundary, not strictly safer.
-- **CDP-based tools** require `chrome --remote-debugging-port`. That port is unauthenticated and exposes the whole browser to any local process. Easy to misconfigure.
-- **pi-chrome** runs through an extension you install yourself with broad permissions (tabs, scripting, debugger). The bridge listens on `127.0.0.1:17318` loopback only. **Only install the bundled extension if you trust the source you got the npm package from.**
+- **CDP-based tools** require `chrome --remote-debugging-port=...`. That port is unauthenticated and exposes the whole browser to any local process. Easy to misconfigure.
+- **pi-chrome** runs through an extension you install yourself with broad permissions (tabs, scripting, debugger, webNavigation). The bridge listens on `127.0.0.1:17318` loopback only. **Only install the bundled extension if you trust the source you got the npm package from.**
 If your threat model excludes extensions with broad permissions, neither approach is a fit — you want a sandboxed CI runner.
 ---
-## When NOT to use pi-chrome
+## Public benchmarks worth knowing (for axis 2 / axis 3 comparison)
+Pi-chrome itself ships a per-primitive benchmark suite ([`../test-suite/`](../test-suite)) covering trusted-input, pointer humanization, keyboard fidelity, drag/drop, Shadow DOM, file uploads, network observability, fingerprint leaks, and agent-safety honeypots. That's **driver-level** grading.
+For **agent-level** comparison (axis 2), the public benchmarks worth citing:
+| Benchmark        | What it measures                                  | Notes                                            |
+| ---------------- | ------------------------------------------------- | ------------------------------------------------ |
+| **WebArena** (CMU) | Hermetic, programmatic graders                  | Gold standard for reproducibility.               |
+| **WorkArena++** (ServiceNow) | Enterprise SaaS workflows             | Hardest realistic benchmark; <5% frontier.        |
+| **BrowseComp** (OpenAI) | Hard info-retrieval                        | Not saturated.                                    |
+| **Mind2Web 2** (NeurIPS '25) | Long-horizon, rubric-tree judge       | New, well-designed.                              |
+| **WebChoreArena** | Tedious cross-page workflows                     | Reflects real ops work.                          |
+| **WebVoyager**   | Live web tasks                                    | **Saturated** — 90%+ scores common; cite only as smoke test. |
+| **VisualWebArena** | Multimodal                                      |                                                  |
+| **MiniWoB++**    | Classic unit-task suite                           |                                                  |
+| **BrowserGym + AgentLab** | Research harness covering the above     | The de-facto research API; pi-chrome's snapshot uid is comparable to BrowserGym's `bid`. |
-- **Headless production scraping at scale** — you want isolated profiles, parallel containers, retries. Use Playwright or Browserless.
-- **Cross-browser testing** — pi-chrome is Chrome only by design (it's a Chrome extension). Use Playwright for Firefox/WebKit.
-- **Untrusted page execution** — same reason. Use a sandbox.
-- **You don't already use Chrome** — the whole value proposition is "your real Chrome". If your day is in Arc / Firefox / Brave, you'll get less out of it (though Brave/other Chromium browsers work fine technically).
+Cite live leaderboards rather than hard-coded numbers; agent scores shift monthly.
 ---
-## Reproducing this table
+## Reproducing pi-chrome's driver-level claims
-Run [`../test-suite/`](../test-suite) against your tool of choice and open a PR with your scores. The benchmark covers:
+Run [`../test-suite/`](../test-suite) against any browser-control tool. Each challenge exposes `window.__verdict` / `window.__reason` / `window.__events`, so any tool (Playwright, Puppeteer, Selenium, Stagehand, pi-chrome) can grade itself deterministically.
-- trusted-input gates (clipboard, fullscreen, file picker, autoplay)
-- pointer humanization (entropy, paths, timing)
-- keyboard fidelity (Tab flows, modifier chords, IME)
-- React-style value tracking and contenteditable
-- Shadow DOM + iframe targeting
-- file attachment to `<input type=file>`
-- console & network observability
-- fingerprint leaks (`navigator.webdriver`, runtime flags)
-- agent-safety honeypots
+```bash
+cd test-suite && python3 -m http.server 8765
+# open http://127.0.0.1:8765/ in the Chrome window the tool controls
+```
-`window.__verdict` / `window.__reason` / `window.__events` make it deterministic for any tool to grade itself.
+If you build a competing tool, please open a PR with your scores.

package/extensions/chrome-profile-bridge/browser-extension/manifest.json CHANGED Viewed

@@ -1,7 +1,7 @@
 {
   "manifest_version": 3,
   "name": "Pi Chrome Connector",
-  "version": "0.14.4",
+  "version": "0.14.5",
   "description": "Lets Pi control tabs in Chrome via a local connector at 127.0.0.1.",
   "permissions": ["tabs", "scripting", "storage", "activeTab", "alarms", "webNavigation", "debugger"],
   "host_permissions": ["<all_urls>", "http://127.0.0.1:17318/*"],

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
 	"name": "pi-chrome",
-	"version": "0.14.4",
+	"version": "0.14.5",
 	"description": "The de-facto browser automation toolkit for Pi agents. Drive your existing logged-in Chrome — no re-login, no throwaway profile, no CDP. 20+ tools (click, type, navigate, screenshot, network capture, file upload, drag, scroll, touch) + honest result envelopes + a built-in benchmark suite.",
 	"keywords": [
 		"pi",