pi-chrome 0.15.15 → 0.15.17

package/CHANGELOG.md CHANGED
@@ -2,6 +2,15 @@
 
  All notable user-facing changes to `pi-chrome`.
 
+ ## 0.15.17 — 2026-05-14
+
+ - **Docs accuracy pass.** Updated README, FAQ, comparison, contributing notes, and package metadata for the current real-input-only, terminal-authorized tool surface.
+ - **Input verification fix.** `includeSnapshot=true` now works for `chrome_click`, `chrome_type`, `chrome_fill`, and `chrome_key`, returning the Chrome-input result plus a fresh snapshot.
+
+ ## 0.15.16 — 2026-05-14
+
+ - **Visible `/chrome` loading state.** Bare `/chrome` and `/chrome status` now immediately say “Checking Chrome connection…” before probing the companion extension, so a slow Chrome bridge no longer looks like the command did nothing.
+
  ## 0.15.15 — 2026-05-14
 
  - **Terminal authorization restored.** `/chrome authorize` is back to terminal-based confirmation. Removed the browser-side Chrome consent page and companion-extension consent polling.
package/CONTRIBUTING.md CHANGED
@@ -5,8 +5,8 @@ Thanks for considering a contribution. pi-chrome aims to be the **de-facto brows
  ## Non-negotiables
 
  1. **No re-login.** Every change must keep working against the user's already-signed-in Chrome profile. Anything that requires a fresh profile or extra auth steps is out of scope.
- 2. **Honest result envelopes.** Every action tool returns `pageMutated`, `defaultPrevented`, `elementVisible`, `occludedBy` (when relevant), `valueMatches` (for input). Agents need to know **why** something didn't take effect.
- 3. **Quiet by default, trusted by opt-in.** Synthetic DOM events first. CDP/`chrome.debugger` only when explicitly requested (`trusted: true`) or when the smart-auto heuristic detects an obvious user-activation gate.
+ 2. **Verifiable action results.** Input tools must return structured details and support `includeSnapshot` where verification matters. Agents need enough evidence to avoid blind retries.
+ 3. **Chrome real input.** Interactive controls use Chrome's input layer through `chrome.debugger`; do not re-expose synthetic/untrusted input as public UX.
  4. **Benchmarks gate features.** Add a page in `test-suite/` that fails before your change and passes after. We accept PRs faster when there's a green/red verdict to point at.
 
  ## Local dev
@@ -25,7 +25,7 @@ python3 -m http.server 8765
 
  1. Register in `extensions/chrome-profile-bridge/index.ts` (the `register*Tool` calls near line 840+).
  2. Implement the handler in `extensions/chrome-profile-bridge/browser-extension/service_worker.js`.
- 3. Return a `pageMutated` + relevant fields.
+ 3. Return structured details and support `includeSnapshot` for user-visible state changes when relevant.
  4. Add a benchmark page under `test-suite/challenges/` and a manifest entry.
  5. Update `README.md` "What an agent gets" table.
  6. Add a `CHANGELOG.md` entry.
package/README.md CHANGED
@@ -1,7 +1,7 @@
  # pi-chrome
 
  > **The fastest way to give a [Pi](https://pi.dev) agent your real Chrome.**
- > No CDP. No throwaway profile. No re-login. Watch it work — or run silent.
+ > No remote-debug port. No throwaway profile. No re-login. Watch it work — or run silent.
 
  **MIT · 0 runtime deps · loopback-only bridge (`127.0.0.1:17318`) · inspect [`extensions/chrome-profile-bridge/browser-extension/`](./extensions/chrome-profile-bridge/browser-extension) before loading.** Verify connectivity in one command: `/chrome doctor`.
 
@@ -12,7 +12,7 @@ Agent: chrome_tab(list) → chrome_snapshot(uid:…) → chrome_screenshot(...)
  You: [keeps coding — agent never asked you to log in]
  ```
 
- `pi-chrome` ships **20+ browser tools** for Pi agents, backed by a small MIT-licensed Chrome extension that runs inside the Chrome profile **you already use** — including every site you're already signed into.
+ `pi-chrome` ships **19 browser tools** for Pi agents, backed by a small MIT-licensed Chrome extension that runs inside the Chrome profile **you already use** — including every site you're already signed into.
 
  ---
 
@@ -30,7 +30,7 @@ Then in Pi:
 
  On macOS this opens `chrome://extensions`, reveals the bundled `browser-extension/` folder in Finder, and copies its path to your clipboard. In Chrome: **Developer mode** → **Load unpacked** → paste the path. Done.
 
- Verify, then authorize current Pi session in Pi:
+ Verify, then authorize current Pi session from the terminal:
 
  ```text
  /chrome doctor
@@ -120,27 +120,23 @@ You: [files the ticket with the folder attached]
 
  ---
 
- ## Honest results
+ ## Verifiable actions
 
- Most browser-automation libraries return `void` or a generic ack. `pi-chrome` returns a structured envelope on every interaction:
+ Input tools return structured details such as the coordinates used, target tag, uploaded paths, key pressed, or scroll distance. For click/type/fill/key calls, pass `includeSnapshot: true` to get a fresh page snapshot in the same result:
 
  ```text
- chrome_click(occluded-button) →
- "Clicked el-3 pageMutated=false; occluded by <div#overlay>"
+ chrome_click(uid:"el-3", includeSnapshot:true) →
+ result: { input:"chrome", x:412, y:238, tag:"BUTTON" }
+ snapshot: { title, url, text, elements:[...] }
  ```
 
- ```text
- chrome_type(react-input, "hello") →
- "Typed into el-7 — valueMatches=true; pageMutated=true"
- ```
-
- This is why agents using pi-chrome don't get stuck in retry loops on broken sites. They get the **reason** the action didn't land and can fix course in one turn.
+ Agents can verify page state immediately instead of blindly retrying.
 
  ---
 
  ## What an agent gets
 
- **20 tools**, grouped by job. Every one runs against your already-open tabs.
+ **19 tools**, grouped by job. Every one runs against your already-open tabs.
 
  | Category | Tools |
  | --------------- | ---------------------------------------------------------------------------------------------- |
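
As a rough illustration of how an agent harness might consume the `{ result, snapshot }` shape from the "Verifiable actions" hunk above, here is a minimal sketch. `verifyAction` and its `expect` options are hypothetical names and not part of pi-chrome; only the snapshot fields `title`, `url`, `text`, and `elements` come from the README example, and the sample values are made up.

```javascript
// Hypothetical helper: decide whether an action landed by inspecting the fresh snapshot.
// Only the snapshot fields title/url/text/elements are taken from the README example.
function verifyAction(response, expect) {
  const snapshot = response && response.snapshot;
  if (!snapshot) {
    return { ok: false, reason: "no snapshot; call the tool with includeSnapshot: true" };
  }
  if (expect.urlIncludes && !String(snapshot.url || "").includes(expect.urlIncludes)) {
    return { ok: false, reason: `url does not include "${expect.urlIncludes}"` };
  }
  if (expect.textIncludes && !String(snapshot.text || "").includes(expect.textIncludes)) {
    return { ok: false, reason: `page text does not include "${expect.textIncludes}"` };
  }
  return { ok: true, reason: "snapshot matches expectations" };
}

// Made-up response in the shape shown in the README hunk.
const sampleResponse = {
  result: { input: "chrome", x: 412, y: 238, tag: "BUTTON" },
  snapshot: { title: "Cart", url: "https://example.test/cart", text: "1 item in cart", elements: [] },
};

console.log(verifyAction(sampleResponse, { urlIncludes: "/cart", textIncludes: "1 item" }).ok); // true
console.log(verifyAction(sampleResponse, { textIncludes: "Order placed" }).ok); // false
```

Returning a `reason` alongside `ok` gives the agent something to act on (take a screenshot, re-snapshot, or report) rather than simply repeating the same call.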
@@ -52,7 +52,7 @@ We benchmark in public — see [`../test-suite/`](../test-suite). Where exact sc
  1. **Profile attach, not driver launch.** Every other driver fights cookie persistence, login walls, MFA, and extension state. pi-chrome inherits all of it because it *is* your Chrome.
  2. **Chrome input against your real profile.** Interactive tools use CDP input for reliability while still controlling the Chrome profile you already use.
  3. **Extension bridge transport.** No `--remote-debugging-port`, no throwaway Chromium. Survives Chrome auto-updates. Works alongside your normal Chrome usage.
- 4. **Honest result envelopes.** Every action returns `pageMutated`, `defaultPrevented`, `elementVisible`, `occludedBy`, `valueMatches`. Competitors return `void` or generic acks; agents loop blindly on broken clicks.
+ 4. **Structured action results.** Input tools return target coordinates/tags and can include a fresh snapshot (`includeSnapshot`) so agents can verify state instead of blindly retrying.
  5. **Multi-session shared bridge.** Planner + worker + audit Pi sessions all drive the same Chrome concurrently.
  6. **Stable element uids.** `chrome_snapshot` returns deterministic uids you can pass to subsequent actions — similar to BrowserGym's `bid`, but built into the snapshot tool itself.
 
@@ -101,9 +101,9 @@ These wrap a driver with an LLM loop. They are **higher-level than pi-chrome** a
 
  `pi-chrome` exposes tools that any Pi agent can call. If you want to use it from outside Pi:
 
- 1. The local bridge speaks HTTP JSON-RPC at `127.0.0.1:17318` (default). The API is internal but stable across patch versions.
+ 1. The local bridge speaks HTTP JSON over `127.0.0.1:17318` (default). The API is internal; use the Pi tool surface unless you are building an adapter.
  2. Tool surface mirrors Playwright closely (click/type/navigate/snapshot/screenshot/evaluate/wait_for) so adapter code is short.
- 3. Honest envelopes (`pageMutated`, `valueMatches`, `occludedBy`) let agent harnesses skip retry/heal logic.
+ 3. `includeSnapshot` on input tools lets agent harnesses verify state after actions.
 
  If you want a first-class pi-chrome adapter for Browser Use / Stagehand / LangGraph, file an issue with your use case.
 
package/docs/EXAMPLES.md CHANGED
@@ -119,10 +119,8 @@ On my staging app:
  ### React controlled inputs
 
  ```text
- chrome_fill (not chrome_type) for React inputs it uses the
- framework-aware native value setter so the form's state actually updates.
- After each fill, the result envelope's valueMatches=true confirms the
- component re-rendered with the new value.
+ Use `chrome_fill` for React inputs when you want to replace the full value.
+ Pass `includeSnapshot=true` to verify the component re-rendered with the new value.
  ```
 
  ### File upload without the native picker
@@ -160,7 +158,7 @@ Interactive tools use Chrome's real input layer by default: clicks, typing, fill
  - sign-in flows
  - guarded buttons
  - audio/video controls
- - fullscreen / permission prompts
+ - fullscreen and other user-activation checks
  - pages with strict CSP or user-activation checks
 
  Chrome may show its debugger banner while pi-chrome is attached.
package/docs/FAQ.md CHANGED
@@ -32,9 +32,9 @@ Chrome control is also locked per Pi session until you run `/chrome authorize`;
 
  Yes. The first session opens the local bridge; later sessions detect it and pipe their commands through the same bridge. Each Pi session must be authorized with `/chrome authorize` before its chrome_* tools work.
 
- ## Why can't this be on the Chrome Web Store?
+ ## Why ship as an unpacked extension?
 
- Web Store extensions cannot communicate with a local process bridge controlled by another tool Google's policy. pi-chrome must ship as an unpacked extension you load yourself. The upside: you can read the source. The downside: each Chrome update may prompt you to re-confirm.
+ pi-chrome ships as an unpacked extension so the source and broad browser permissions are easy to inspect and update with the npm package. The downside: you load it manually from `chrome://extensions` and reload it after package updates.
 
  ## What happens when I update pi-chrome?
 
@@ -42,8 +42,8 @@ Web Store extensions cannot communicate with a local process bridge controlled b
 
  ## What's the install footprint?
 
- - Pi side: one extension that registers ~20 tools and a few slash commands.
- - Chrome side: one unpacked extension, ~5000 LOC of plain JavaScript, no dependencies.
+ - Pi side: one extension that registers 19 tools and a few slash commands.
+ - Chrome side: one unpacked extension, ~2000 LOC of plain JavaScript, no dependencies.
 
  ## Can I script it without Pi?
 
@@ -53,18 +53,11 @@ The Pi-facing tools are thin wrappers around an HTTP bridge at `127.0.0.1:17318`
 
  Not always. `chrome_evaluate` and `chrome_snapshot` run in the page's MAIN world through the Function constructor, so pages whose CSP blocks `'unsafe-eval'` can reject them. `chrome_screenshot`, `chrome_navigate`, tab tools, and real Chrome input still work because they use extension/browser APIs rather than page JavaScript.
 
- ## Why does my click return `pageMutated=false`?
+ ## How do I tell whether a click or type worked?
 
- Either:
- - The element was occluded (look for `occludedBy: <selector>` in the envelope).
- - The click handler called `event.preventDefault()` and the page intentionally ignored it.
- - The target changed after your snapshot; take a fresh snapshot or screenshot.
+ Use `includeSnapshot=true` on `chrome_click`, `chrome_type`, `chrome_fill`, or `chrome_key`. The tool returns the Chrome-input result plus a fresh snapshot, so the agent can verify text, URL, visible elements, or form values before continuing.
 
- The result envelope tells you which one. **Don't blind-retry.**
-
- ## Why does `chrome_type` return `valueMatches=false`?
-
- The field rejected or transformed the typed value. Common culprits: contenteditable rich-text editors, native date pickers, masked-input libraries, or masks. Try `chrome_fill`, then verify with `includeSnapshot=true`.
+ If the page did not change, take a fresh snapshot or screenshot and check for overlays, disabled controls, stale element uids, or app-side validation.
 
  ## How do I attach a file to a React file input?
 
@@ -1,7 +1,7 @@
  {
    "manifest_version": 3,
    "name": "Pi Chrome Connector",
-   "version": "0.15.15",
+   "version": "0.15.17",
    "description": "Lets Pi control tabs in Chrome via a local connector at 127.0.0.1.",
    "permissions": [
      "tabs",
@@ -712,7 +712,7 @@ async function dispatch(action, params) {
      case "page.evaluate":
        return evaluateInTab(params);
      case "page.click":
-       return chromeInputClick(params);
+       return withOptionalSnapshot(params, chromeInputClick);
      case "page.hover":
        return chromeInputHover(params);
      case "page.drag":
@@ -720,11 +720,11 @@ async function dispatch(action, params) {
      case "page.upload":
        return chromeInputUpload(params);
      case "page.type":
-       return chromeInputType(params);
+       return withOptionalSnapshot(params, chromeInputType);
      case "page.fill":
-       return chromeInputFill(params);
+       return withOptionalSnapshot(params, chromeInputFill);
      case "page.key":
-       return chromeInputKey(params);
+       return withOptionalSnapshot(params, chromeInputKey);
      case "page.scroll":
        return chromeInputScroll(params);
      case "page.tap":
@@ -932,8 +932,8 @@ async function evaluateInTab(params) {
    return v;
  }
 
- async function executeActionInTab(params, func, args) {
-   const result = await executeInTab(params, func, args);
+ async function withOptionalSnapshot(params, actionFn) {
+   const result = await actionFn(params);
    if (params.includeSnapshot) {
      const snapshot = await executeInTab({ ...params, foreground: false }, snapshotPage, [params.maxElements || 80, null, null, null]);
      return { result, snapshot };
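
The wrapper in this hunk can be exercised in isolation. The following is a minimal sketch under stated assumptions, not the package's code: `takeSnapshot` and `fakeClick` are hypothetical stand-ins for `executeInTab(..., snapshotPage, ...)` and a real Chrome-input handler, their return values are invented, and the branch taken when `includeSnapshot` is not set (which the hunk does not show) is assumed to return the bare action result.

```javascript
// Hypothetical stand-in for executeInTab(..., snapshotPage, ...): returns a canned snapshot.
async function takeSnapshot(params) {
  return {
    title: "Demo",
    url: "https://example.test/",
    text: "Saved",
    elements: [],
    maxElements: params.maxElements || 80,
  };
}

// Hypothetical stand-in for a Chrome-input handler such as chromeInputClick.
async function fakeClick(params) {
  return { input: "chrome", x: params.x, y: params.y, tag: "BUTTON" };
}

// Run the action first, then optionally attach a fresh snapshot.
// Assumption: without includeSnapshot the bare action result is returned.
async function withOptionalSnapshot(params, actionFn) {
  const result = await actionFn(params);
  if (params.includeSnapshot) {
    const snapshot = await takeSnapshot({ ...params, foreground: false });
    return { result, snapshot };
  }
  return result;
}

// Exercise both branches with the same action function.
async function demo() {
  const plain = await withOptionalSnapshot({ x: 412, y: 238 }, fakeClick);
  const verified = await withOptionalSnapshot({ x: 412, y: 238, includeSnapshot: true }, fakeClick);
  return { plain, verified };
}

demo().then(({ plain, verified }) => {
  console.log(Object.keys(plain));    // keys of the bare action result
  console.log(Object.keys(verified)); // [ 'result', 'snapshot' ]
});
```

Passing the action as a function keeps the snapshot logic in one place, which is why the dispatch cases for click, type, fill, and key can share it without each handler knowing about snapshots.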
@@ -8,9 +8,8 @@ import { dirname, join, resolve } from "node:path";
  /**
   * Existing-profile Chrome bridge for pi.
   *
-  * This is intentionally not a Chrome DevTools Protocol integration. CDP cannot attach to
-  * already-running normal Chrome windows and recent Chrome builds block default-profile
-  * remote debugging. Instead, install the companion Chrome extension from the
+  * This is intentionally not a remote-debugging-port integration. Chrome blocks default-profile
+  * remote debugging in many normal launches, so pi-chrome uses a companion extension from the
   * browser-extension folder bundled next to this Pi extension.
   *
   * The companion extension runs inside the user's real Chrome profile and polls this local
@@ -496,18 +495,18 @@ export default function (pi: ExtensionAPI): void {
    pi.on("before_agent_start", (event) => {
      const primer = `
  <chrome-profile-bridge>
- Chrome control is available through the chrome_* tools via a companion Chrome extension installed in the user's normal Chrome profile. Tools target the existing signed-in profile, no CDP, no throwaway profile.
+ Chrome control is available through the chrome_* tools via a companion Chrome extension installed in the user's normal Chrome profile. Tools target the existing signed-in profile: no remote-debug port, no throwaway profile.
 
  Capability model (important):
  - Interactive controls (click/type/fill/key/hover/drag/scroll/tap) use Chrome's real input layer via chrome.debugger / CDP. Events satisfy normal user-activation gates.
  - Input bypasses page CSP because it is injected at browser input layer, not page JavaScript. Chrome may show the “Pi Chrome Connector started debugging this browser” banner while attached.
  - \`chrome_evaluate\` and \`chrome_snapshot\` run in MAIN world via the **Function constructor**, which requires \`'unsafe-eval'\` in the page CSP. Pages with strict CSP (e.g. github.com, many bank/SaaS apps) will throw \`EvalError: ... 'unsafe-eval' is not an allowed source of script\` and chrome_snapshot will return empty. On those pages, drive the page with \`chrome_screenshot\` + viewport-coordinate \`chrome_click\`/\`chrome_type\`/\`chrome_key\`. \`chrome_navigate\`, \`chrome_screenshot\`, \`chrome_tab\`, and Chrome input all keep working under any CSP.
- - Tool results include \`pageMutated\`, \`defaultPrevented\`, \`elementVisible\`, \`occludedBy\`, and (for type/fill) \`valueMatches\`. If an action result indicates no page change or occlusion, inspect current page state instead of repeating blindly.
+ - Input tools return structured details and support \`includeSnapshot=true\` on click/type/fill/key. Use the fresh snapshot to verify state instead of repeating blindly.
 
  Usage rules:
  1. If a chrome_* tool says Chrome control is locked, ask the user to run \`/chrome authorize\` before retrying.
  2. \`chrome_snapshot\` before clicking/typing; pass \`uid\` over \`selector\`.
- 3. \`includeSnapshot=true\` on click/type/fill to verify in one round trip.
+ 3. \`includeSnapshot=true\` on click/type/fill/key to verify in one round trip.
  4. If \`chrome_evaluate\` returns null when you expected a value, the expression evaluated to null/undefined in the page; surface the value via \`JSON.stringify\` to confirm.
  5. \`chrome_navigate\` supports an optional \`initScript\` that runs at document_start in MAIN world for the next navigation (good for seeding localStorage or stubbing Date.now).
  6. By default chrome_* tools focus Chrome so the user can watch; pass \`background=true\` or run /chrome background on for session-wide background execution.
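
The capability model in the primer above implies a fallback: when `chrome_snapshot` or `chrome_evaluate` fails with the `'unsafe-eval'` EvalError, an agent should switch to `chrome_screenshot` plus viewport coordinates. One possible way to sketch that decision follows. `isCspEvalError`, `pickStrategy`, and the returned labels are hypothetical and not part of pi-chrome; only the tool names and the quoted error fragment come from the primer text, and the sample error message is made up.

```javascript
// Hypothetical classifier: matches the EvalError fragment quoted in the primer.
function isCspEvalError(message) {
  return /EvalError/.test(message) && /unsafe-eval/.test(message);
}

// Hypothetical planner step: choose how to observe and target the page
// after a snapshot attempt, given the error message (or null on success).
function pickStrategy(snapshotError) {
  if (snapshotError && isCspEvalError(snapshotError)) {
    // Strict CSP: page JavaScript is blocked, but screenshots and Chrome input still work.
    return { observe: "chrome_screenshot", target: "viewport coordinates" };
  }
  // Default path from usage rule 2: snapshot first, then pass uid.
  return { observe: "chrome_snapshot", target: "uid" };
}

// Made-up error text containing the fragment the primer quotes.
const strictCspError =
  "EvalError: Refused to evaluate a string as JavaScript because 'unsafe-eval' is not an allowed source of script";

console.log(pickStrategy(strictCspError).observe); // chrome_screenshot
console.log(pickStrategy(null).observe);           // chrome_snapshot
```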
@@ -684,6 +683,7 @@ Usage rules:
  };
 
  const statusHandler = async (ctx: ExtensionContext) => {
+   ctx.ui.notify("Checking Chrome connection…", "info");
    ctx.ui.notify(await statusSummary(), "info");
  };
 
@@ -723,6 +723,7 @@ Usage rules:
 
  const openCommandMenu = async (ctx: ExtensionContext): Promise<void> => {
    while (true) {
+     ctx.ui.notify("Checking Chrome connection…", "info");
      const choice = await ctx.ui.select(`pi-chrome\n${await statusSummary()}`, [
        "Authorize Chrome control…",
        "Lock Chrome control",
package/package.json CHANGED
@@ -1,11 +1,11 @@
  {
    "name": "pi-chrome",
-   "version": "0.15.15",
+   "version": "0.15.17",
    "scripts": {
      "version": "node scripts/sync-manifest-version.js",
      "prepublishOnly": "node scripts/sync-manifest-version.js"
    },
-   "description": "Give a Pi agent your real, signed-in Chrome. No CDP, no throwaway profile, no re-login. 20+ tools (click, type, navigate, screenshot, network capture, file upload, drag, touch) with honest result envelopes — and a built-in browser-control benchmark suite.",
+   "description": "Give a Pi agent your real, signed-in Chrome. No remote-debug port, no throwaway profile, no re-login. 19 tools for click, type, navigate, screenshot, network capture, file upload, drag, and touch.",
    "keywords": [
      "pi",
      "pi-package",