pi-agent-browser-native 0.2.31 → 0.2.32

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -5,6 +5,7 @@ Related docs:
5
5
  - [`REQUIREMENTS.md`](REQUIREMENTS.md)
6
6
  - [`ARCHITECTURE.md`](ARCHITECTURE.md)
7
7
  - [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md)
8
+ - [`ELECTRON.md`](ELECTRON.md)
8
9
  - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
9
10
 
10
11
  ## V1 tool
@@ -48,6 +49,7 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
48
49
  - For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, state save/load for portable test state, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields.
49
50
  - For batch chains that touch cookies, storage, auth, or other secret-bearing commands, use details.batchSteps for per-step artifacts, categories, spill paths, and full structured errors; top-level details.data on batch is only a compact redacted step matrix (success, argv-redacted command, redacted result or scrubbed error text) built from the same presentation rules as standalone calls.
50
51
  - For non-core families, pass current upstream commands through the native tool directly: network route/requests/har, diff snapshot/screenshot/url, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, and chat. For compact network requests output, prefer details.nextActions for request detail, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
52
+ - For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; leave shutdown/profile cleanup to the host owner.
51
53
  - For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
52
54
  - For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; when --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
53
55
  - If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load <state>, --url <matcher>, --fn <js>, or --text <matcher>.
@@ -64,7 +66,7 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
64
66
 
65
67
  ## Parameters
66
68
 
67
- Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup`):
69
+ Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`):
68
70
 
69
71
  ```json
70
72
  { "args": ["open", "https://example.com"], "stdin": "optional raw stdin content", "sessionMode": "auto" }
@@ -75,10 +77,15 @@ Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`
75
77
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
76
78
  ```
77
79
 
80
+ ```json
81
+ { "electron": { "action": "list", "query": "code" } }
82
+ { "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
83
+ ```
84
+
78
85
  ### `args`
79
86
 
80
87
  - type: `string[]`
81
- - required unless `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` is provided
88
+ - required unless `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` is provided
82
89
  - exact CLI args passed after `agent-browser`
83
90
  - no shell operators
84
91
  - do not include the binary name
@@ -95,16 +102,16 @@ Examples:
95
102
  ### `semanticAction`
96
103
 
97
104
  - type: object
98
- - optional; mutually exclusive with `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup` (omit all of them when using this field)
105
+ - optional; mutually exclusive with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` (omit all of them when using this field)
99
106
  - top-level tool input only: `batch` stdin remains upstream argv arrays; express find steps inside batch as string arrays such as `["find","role","button","click","--name","Export"]`, not nested `semanticAction` objects
100
107
  - thin intent schema compiled by this wrapper into existing upstream commands; locator actions compile to `find`, while native dropdown selection compiles to `select <selector> <value...>`; behavior and locator/selector semantics stay upstream-owned
101
108
  - supported actions: `click`, `fill`, `check`, `select`, `uncheck`
102
109
  - supported locators for `click` / `fill` / `check` / `uncheck`: `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`
103
- - for locator actions, `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim
110
+ - for locator actions, `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim; for `locator: "role"`, callers may provide `role` instead of redundant `value`
104
111
  - `fill` requires non-empty `text` (compiled as the trailing value argument to `find`)
105
112
  - `select` requires non-empty `selector` plus either `value` (single option value) or `values` (non-empty array of option values). `select` does not accept `locator`, `role`, `name`, or `text`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown targeting must first be resolved to a stable selector or current `@ref`.
106
113
  - optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` for `fill` when present)
107
- - optional `role` is accepted only when `locator` is `role` and must equal `value` if set (redundant with `value`; prefer `value` alone)
114
+ - optional `role` is accepted only when `locator` is `role`; it may replace `value`, and must equal `value` if both are set
108
115
  - optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before the compiled `find` or `select` command so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
109
116
 
110
117
  Compilation (then `--json` and session handling apply like any other call):
@@ -112,7 +119,7 @@ Compilation (then `--json` and session handling apply like any other call):
112
119
  | Fields | Compiled `args` (conceptually) |
113
120
  | --- | --- |
114
121
  | `click`, `check`, or `uncheck` + non-`role` locator | `["find", <locator>, <value>, <action>]` |
115
- | `click` / `check` / `uncheck` + `role` + optional `name` | `["find","role",<value>,<action>]` plus `["--name",<name>]` when `name` is set |
122
+ | `click` / `check` / `uncheck` + `role` or `value` + optional `name` | `["find","role",<role-or-value>,<action>]` plus `["--name",<name>]` when `name` is set |
116
123
  | `fill` | `["find",<locator>,<value>,"fill",<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
117
124
  | `select` + `selector` + `value` / `values` | `["select",<selector>,<value...>]` |
118
125
  | any supported action + `session` | prepends `["--session",<session>]` before the compiled argv |
@@ -131,6 +138,7 @@ Examples:
131
138
 
132
139
  ```json
133
140
  { "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Export" } }
141
+ { "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
134
142
  { "semanticAction": { "action": "click", "locator": "text", "value": "Close" } }
135
143
  { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
136
144
  { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
@@ -143,7 +151,7 @@ Examples:
143
151
  ### `job`
144
152
 
145
153
  - type: object with a non-empty `steps` array
146
- - optional; mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, and `networkSourceLookup`
154
+ - optional; mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`
147
155
  - top-level tool input only; do not nest `job` inside `batch` stdin
148
156
  - constrained orchestration only: every step compiles to existing upstream `batch` argv and the compiled plan is echoed as `details.compiledJob`
149
157
  - there is no separate reusable named “browser recipe” extension surface above `job`, `qa`, and raw `batch` yet; the closed `RQ-0068` decision, evidence bar, and revisit criteria are in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) and [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
@@ -189,10 +197,11 @@ Because `job` still executes as upstream `batch` with generated stdin, the same
189
197
 
190
198
  ### `qa`
191
199
 
192
- - type: object with required `url`
193
- - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup`
200
+ - type: object with either required `url` (normal URL-opening QA) or `attached: true` (current attached-session QA)
201
+ - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, `networkSourceLookup`, and `electron`
194
202
  - lightweight preset built on the same batch compiler path as `job`
195
- - clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <loadState>`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
203
+ - URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <loadState>`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
204
+ - attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`.
196
205
  - `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
197
206
  - `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
198
207
  - optional `screenshotPath` adds an evidence screenshot step
@@ -207,10 +216,147 @@ Example:
207
216
 
208
217
  Use custom `job` or raw `batch` for QA flows that need custom commands, flags, auth setup, HAR capture, or project-specific assertions.
209
218
 
219
+ ### `electron`
220
+
221
+ Workflow-oriented public guide: [`ELECTRON.md`](ELECTRON.md). This section remains the canonical field contract; the guide covers when and how to use these actions in practice.
222
+
223
+ - type: object with required `action`
224
+ - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`
225
+ - top-level wrapper shorthand for Electron desktop apps; do not nest it inside `batch` stdin
226
+ - `stdin` is rejected with `electron`; host-only actions manage their own local work and `launch` manages its own upstream `connect`
227
+ - supported actions: `list`, `launch`, `status`, `cleanup`, and `probe`
228
+
229
+ Action schemas:
230
+
231
+ | Action | Fields | Behavior |
232
+ | --- | --- | --- |
233
+ | `list` | `query?`, `maxResults?` | Scans supported platform app locations for Electron evidence and returns bounded app metadata in `details.electron.apps`. Likely-sensitive app annotations are advisory metadata only. Does not spawn upstream `agent-browser`; list output also warns that later wrapper launches are isolated and will not read existing signed-in desktop state. |
234
+ | `launch` | exactly one of `appPath`, `appName`, `bundleId`, or `executablePath`; optional `appArgs`, `handoff`, `targetType`, `timeoutMs`, `allow`, `deny` | Resolves and verifies an Electron target, launches it with a wrapper-owned isolated profile and OS-chosen CDP port, attaches through upstream `connect` using `sessionMode: "fresh"`, and records `details.electron.launch`. It does not reuse the app's normal signed-in profile or attach to an already-running authenticated app; when signed-in local app state is the goal, use a host debug-port launch plus raw `connect` instead. |
235
+ | `status` | optional `launchId` or `all`, optional `timeoutMs` | Inspects wrapper-tracked launches, debug-port liveness, and current CDP targets without mutating the app. With neither `launchId` nor `all`, selects the single active wrapper launch when unambiguous. |
236
+ | `cleanup` | optional `launchId` or `all`, optional `timeoutMs` | Closes the tracked upstream session when present, stops only the wrapper-tracked process, verifies debug-port shutdown, removes the wrapper-created `userDataDir`, and marks records cleaned or partial. |
237
+ | `probe` | optional `launchId`, optional `timeoutMs` | Runs bounded current-session or launch-scoped state reads (`get title`, `get url`, focused-element `eval --stdin`, `tab list`, compact `snapshot -i`) and reports `details.electron.probe`. Without `launchId`, requires an active attached managed session; with `launchId`, it resolves the tracked launch session and can report mismatch guidance. |
238
+
239
+ Validation and defaults:
240
+
241
+ - `launch` requires exactly one target field. `list` accepts only `query` and `maxResults`; `probe` accepts only `launchId` and `timeoutMs` beyond `action`; `status` / `cleanup` accept only `launchId`, `all`, and `timeoutMs`.
242
+ - Host install discovery (`electron.list` and resolving `appName` / `bundleId` through `discoverElectronApps` in `extensions/agent-browser/lib/electron/discovery.ts`) runs on **macOS** and **Linux** only. On **Windows** (and any other platform), `list` returns `platform: "unsupported"` with an empty `apps` array, and `launch` cannot resolve purely name-based targets without a prior scan—use `executablePath` or a host `appPath` that resolves to a verifiable Electron binary (`inspectElectronExecutablePath` still gates Windows executables).
243
+ - `launch.handoff` defaults to `"snapshot"`; supported values are `"connect"`, `"tabs"`, and `"snapshot"`. `"connect"` stops after attach, `"tabs"` adds a session-scoped `tab list` and is the safer diagnostic starting point when you do not want to capture refs/content yet, and `"snapshot"` adds `tab list` plus `snapshot -i` so current refs are immediately available. If the first Electron snapshot returns no refs while the app is still settling, the wrapper retries briefly before reporting no refs and tells agents to run `snapshot -i` once more before treating the UI as unusable.
244
+ - `launch.targetType` defaults to `"page"`; supported values are `"page"`, `"webview"`, and `"any"`. When a matching CDP target exposes a WebSocket URL, launch connects to that target; otherwise it falls back to the browser port.
245
+ - `appArgs` are passed to the Electron app, but wrapper-owned lifecycle/debug flags are rejected (`--user-data-dir`, `--remote-debugging-port`, `--remote-debugging-address`, `--remote-debugging-pipe`, and `--`).
246
+ - `allow` and `deny` are optional caller-owned policy lists. Entries match app name, bundle id, desktop id, app path, or executable path by substring. If `allow` is set, the target must match it; `deny` wins on conflict. With neither list, launch is permitted.
247
+ - `electron.status` / `electron.cleanup` accept optional `all` only as the boolean literal `true` to include every wrapper-tracked launch; `all` and `launchId` cannot both be set.
248
+ - `electron.launch` `timeoutMs` is an optional positive integer for the host CDP readiness window. The launcher normalizes missing or non-positive values to **15000 ms** and **caps** at **120000 ms** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`). Other actions use `timeoutMs` differently: **`status`** forwards it only to managed-session `get title` / `get url` reads for mismatch diagnostics (when omitted, those subprocess reads use the normal `runAgentBrowserProcess` default from `getAgentBrowserProcessTimeoutMs`), while localhost CDP liveness in `inspectElectronLaunchStatus` uses a fixed **1000 ms** fetch budget per HTTP probe (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`). **`cleanup`** applies one budget to upstream managed-session `close` plus host process exit, debug-port verification, and temp profile removal (`cleanupTrackedElectronLaunches` in `extensions/agent-browser/index.ts`); when omitted it defaults to **`PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS`** else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`). **`probe`** forwards it to each bounded upstream read (`get title`, `get url`, `eval --stdin`, `tab list`, `snapshot -i`); when omitted, those reads use the same `runAgentBrowserProcess` default as other tool calls.
249
+ - Non-Electron targets are rejected as a correctness failure; the wrapper does not blindly launch arbitrary executables as Electron.
250
+
251
+ Safety defaults and ownership:
252
+
253
+ - `launch` always uses a new isolated wrapper-created `userDataDir` and `--remote-debugging-port=0`, then reads `DevToolsActivePort`; callers cannot choose a fixed debug port, cannot reuse the app's normal signed-in profile, and cannot use this path to attach to an already-running authenticated app.
254
+ - The wrapper also passes `--disable-extensions`, `--no-first-run`, and `--no-default-browser-check` alongside sanitized caller `appArgs`.
255
+ - Remote debugging exposes app contents to the attached browser tool. The wrapper gives isolation defaults and optional `allow` / `deny`; the user still owns the decision to launch or attach to a sensitive desktop app.
256
+ - `electron.list` may annotate apps as likely sensitive (`sensitivity.level: "likely-sensitive"`, categories such as `notes`, `chat`, `mail`, `developer-workspace`, or `passwords-auth`) and print `[likely sensitive: …]`. These annotations are non-blocking hints, not enforcement; caller-owned `allow` / `deny` policy still controls launch decisions.
257
+ - Cleanup is wrapper-owned **only** for records created by `electron.launch`. `electron.cleanup` never targets manually launched apps, externally supplied debug ports, or arbitrary Electron processes. Explicit screenshots/downloads/HARs/traces remain host-file cleanup, not Electron cleanup.
258
+ - On Pi session shutdown, active wrapper-owned Electron launches are best-effort cleaned. Stale restored records are reported instead of guessed/killed when the wrapper lacks a live child process.
259
+
260
+ Details fields:
261
+
262
+ ```json
263
+ {
264
+ "compiledElectron": {
265
+ "action": "launch",
266
+ "appName": "My App",
267
+ "handoff": "snapshot",
268
+ "targetType": "page",
269
+ "appArgs": ["--safe-mode"]
270
+ },
271
+ "electron": {
272
+ "action": "launch",
273
+ "status": "succeeded",
274
+ "launch": {
275
+ "version": 1,
276
+ "launchId": "electron-…",
277
+ "launchedByWrapper": true,
278
+ "appName": "My App",
279
+ "bundleId": "com.example.my-app",
280
+ "appPath": "/Applications/My App.app",
281
+ "executablePath": "/Applications/My App.app/Contents/MacOS/My App",
282
+ "userDataDir": "/tmp/pi-agent-browser-electron-…",
283
+ "port": 12345,
284
+ "pid": 1234,
285
+ "sessionName": "pi-…",
286
+ "webSocketDebuggerUrl": "ws://127.0.0.1:12345/devtools/browser/…",
287
+ "createdAtMs": 1770000000000,
288
+ "cleanupState": "active"
289
+ },
290
+ "targets": [{ "type": "page", "title": "My App", "url": "app://index" }],
291
+ "handoff": { "handoff": "snapshot" },
292
+ "profileIsolation": {
293
+ "isolatedLaunch": true,
294
+ "reusesExistingSignedInProfile": false,
295
+ "attachesToAlreadyRunningApp": false,
296
+ "hostDebugLaunchExample": "macOS: open -a <App Name> --args --remote-debugging-port=9222 --remote-allow-origins='*'; then agent_browser connect 9222 with sessionMode=fresh"
297
+ }
298
+ }
299
+ }
300
+ ```
301
+
302
+ Action-specific `details.electron` fields:
303
+
304
+ - `list`: `{ action: "list", status: "succeeded", apps, platform, query?, maxResults, skippedCount, omittedCount?, sensitiveAppCount?, profileIsolation }`. Each app is platform-tagged and may include `name`, `bundleId`, `desktopId`, `appPath`, `executablePath`, `icon`, `packageSource`, and non-blocking `sensitivity` metadata depending on platform/discovery source.
305
+ - `launch`: `{ action: "launch", status, launch, targets?, version?, handoff?, cleanup?, identifiers?, profileIsolation }`. `profileIsolation` states that wrapper launches use a new temporary profile, do not reuse existing signed-in app state, and do not attach to already-running authenticated apps; it also includes host debug-launch guidance for the separate normal-app attach path. `identifiers` repeats the launch-scoped `launchId` and attached `sessionName` so agents distinguish Electron lifecycle actions from browser session/tab actions. `launch.cleanupState` is one of `"active"`, `"cleaned"`, `"dead"`, `"failed"`, or `"partial"`. Failed launches expose `details.electron.failure.diagnostics` when available, including `pid` / `pidAlive`, wrapper `userDataDir`, elapsed/timeout timing, `DevToolsActivePort` file state, discovered port, and whether CDP `/json/version` was reached.
306
+ - `status`: `{ action: "status", status: "succeeded", launches, statuses, targets, identifiers?, identifierList?, managedSession?, managedSessions?, sessionMismatch?, sessionMismatches? }`, where each status includes the tracked `launchId`, port/pid liveness, and bounded CDP target metadata. Mismatch fields explain when the current managed session or tab does not match a live wrapper launch target.
307
+ - `cleanup`: `{ action: "cleanup", status: "succeeded" | "partial", cleanup: { partial, records, results } }`. Partial cleanup is a failed tool result with `failureCategory: "cleanup-failed"` and retry next actions. Cleanup steps may include `managed-session`, `process`, `debug-port`, and `user-data-dir`; managed-session close failures are reported while host-owned process/profile cleanup still runs.
308
+ - `probe`: `{ action: "probe", status: "succeeded" | "partial", probe, probeContext, identifiers?, sessionMismatch?, statusTargets?, launchStatus? }`. `probeContext` records whether the probe inspected the current managed session or a specific `launchId`. `probe` includes bounded `title`, `url`, `focusedElement`, `activeTab`, `tabs`, compact `snapshot` metadata (`refCount`, `refIds`, optional text preview and omission counts), `errors?`, and `summary`. When launch status is known, visible probe output includes debug-port/pid liveness so `about:blank` plus a dead wrapper launch is unmistakable. It also updates the normal session target/ref tracking when a snapshot is collected.
309
+
310
+ Failure categories and next actions:
311
+
312
+ - `policy-blocked` is used when `electron.launch` is blocked by caller-supplied `allow` / `deny`; inspect `details.electron.failure.policy` for the matched list and entry when present.
313
+ - `cleanup-failed` is used when `electron.cleanup` only partially cleans tracked resources; inspect `details.electron.cleanup.results[].steps` for remaining process, port, or profile cleanup state.
314
+ - Launch timeout maps to `timeout`; non-Electron targets and input issues map to `validation-error`; launch/attach/spawn/CDP failures map to `upstream-error` unless a more specific category applies. If a successful-looking Electron mutation is followed by a dead process/debug port or an unrecoverable `about:blank` target, the wrapper upgrades the result to `failureCategory: "tab-drift"` and sets `details.electronPostCommandHealth` with the launch status and recovery action ids.
315
+ - Successful Electron `fill <selector> <text>` commands may run a read-only `get value <selector>` verification. If the value still differs, the wrapper keeps the tool successful but adds `details.fillVerification`, visible guidance to use snapshot/focus/keyboard typing for custom quick-input controls, and `inspect-after-fill-verification` / `verify-filled-value` next actions.
316
+ - Successful active launches/status/probe results may include exact `details.nextActions` with ids `status-electron-launch`, `probe-electron-launch`, `cleanup-electron-launch`, `list-electron-tabs`, and `snapshot-electron-session`. Electron status/probe mismatch diagnostics may also include `reattach-electron-launch` before fresh tab/snapshot inspection. Electron post-command health failures include status/probe/cleanup actions for the same `launchId`; Electron `@e…` mutations may add `refresh-electron-refs-after-rerender` because desktop apps often rerender without URL changes. Electron cleanup partial failures may include `status-electron-launch` and `retry-electron-cleanup`.
317
+
318
+ Next-action payload examples:
319
+
320
+ ```json
321
+ {
322
+ "tool": "agent_browser",
323
+ "id": "cleanup-electron-launch",
324
+ "reason": "Clean the wrapper-owned Electron process and isolated userDataDir when the run is complete.",
325
+ "safety": "Only operates on the launchId created by electron.launch; explicit artifacts and manually launched apps remain host-owned.",
326
+ "params": { "electron": { "action": "cleanup", "launchId": "electron-…" } }
327
+ }
328
+ ```
329
+
330
+ ```json
331
+ {
332
+ "tool": "agent_browser",
333
+ "id": "snapshot-electron-session",
334
+ "reason": "Refresh interactive refs for the attached Electron session.",
335
+ "safety": "Use current Electron refs only after a fresh snapshot for this session.",
336
+ "params": { "args": ["--session", "pi-…", "snapshot", "-i"] }
337
+ }
338
+ ```
339
+
340
+ Examples:
341
+
342
+ ```json
343
+ { "electron": { "action": "list", "query": "code" } }
344
+ { "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
345
+ { "electron": { "action": "probe" } }
346
+ { "qa": { "attached": true, "expectedText": "Explorer" } }
347
+ { "electron": { "action": "cleanup", "launchId": "electron-…" } }
348
+ ```
349
+
350
+ For an app you launched manually with remote debugging enabled, skip `electron.cleanup` and use the upstream path directly:
351
+
352
+ ```json
353
+ { "args": ["connect", "9222"], "sessionMode": "fresh" }
354
+ ```
355
+
210
356
  ### `sourceLookup`
211
357
 
212
358
  - type: object with at least one of `selector`, `reactFiberId`, or `componentName`
213
- - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `networkSourceLookup`
359
+ - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `networkSourceLookup`, and `electron`
214
360
  - experimental opt-in helper for local app debugging; it reports candidate source locations with confidence and evidence instead of claiming a guaranteed DOM-to-file mapping
215
361
  - compiles to existing upstream `batch` commands only:
216
362
  - `selector` adds `is visible <selector>` and, unless `includeDomHints: false`, adds `get html <selector>` for source-like DOM attributes (`data-source-file`, `data-file`, `data-component-file`, `data-source`, plus optional `data-source-line` / `data-line` and `data-source-column` / `data-column`) and for `.ts`/`.tsx`/`.js`/`.jsx` paths embedded in HTML text
@@ -218,10 +364,11 @@ Use custom `job` or raw `batch` for QA flows that need custom commands, flags, a
218
364
  - `componentName` runs `react tree` and performs a bounded local workspace scan under the Pi tool session **cwd** for matching component declarations in `.ts`, `.tsx`, `.js`, and `.jsx` files (skipping directories such as `.git`, `node_modules`, `dist`, `build`, `coverage`, `.next`, `out`, `tmp`, and `temp`); the walk stops after `maxWorkspaceFiles` files (default 2000, hard cap 5000) and records at most ten `workspace-search` candidates
219
365
  - optional `includeDomHints: false` skips the selector HTML read
220
366
  - optional `maxWorkspaceFiles` bounds the local component-name scan; default is 2000 source files and the hard maximum is 5000
221
- - reports `details.compiledSourceLookup` with the generated batch plan and `details.sourceLookup` with `{ status, candidates, limitations, summary }`
367
+ - reports `details.compiledSourceLookup` with the generated batch plan and `details.sourceLookup` with `{ status, candidates, limitations, summary, workspaceRoot?, electronContext? }`
222
368
  - each `candidates[]` entry includes `source` (`react-inspect`, `dom-attribute`, or `workspace-search`), `confidence` (`high`, `medium`, or `low`), `evidence` (string reasons), and optional `file`, `line`, `column`, and `componentName`
223
369
  - `details.sourceLookup.status` is one of `candidates-found`, `no-candidates`, or `unsupported`; `unsupported` applies only when **no** candidates were collected **and** at least one compiled `react` batch step failed (for example React DevTools not enabled, no renderer, or inspect errors). If DOM or workspace evidence still produced candidates, `status` stays `candidates-found` even when a `react` step failed
224
370
  - when analysis produces a `summary`, the wrapper prepends it to the primary visible text block (or inserts a leading text block) for quick scanning; unlike `qa`, it never flips the unified tool outcome to failed solely because diagnostics look noisy or because `status` is `no-candidates` / metadata was missing—failed upstream batch steps still surface as normal tool errors
371
+ - for wrapper-tracked packaged Electron sessions where `status` is `no-candidates`, the wrapper may add `workspaceRoot` and `electronContext` (`launchId?`, `appName?`, `appPath?`, `executablePath?`, `sessionName?`, `url?`) plus limitations explaining that the local workspace scan only covered the Pi tool cwd and did not unpack installed app resources or `app.asar`; it may also append `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions for live app inspection
225
372
 
226
373
  Example:
227
374
 
@@ -234,7 +381,7 @@ Use raw `args` for direct upstream React inspection when you already know the ex
234
381
  ### `networkSourceLookup`
235
382
 
236
383
  - type: object with at least one of `requestId`, `filter`, or `url`, plus optional `maxWorkspaceFiles`
237
- - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `sourceLookup`
384
+ - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `electron`
238
385
  - experimental failed-request source-hint helper; it reports failed network requests and candidate source hints with evidence instead of assigning blame
239
386
  - compiles to existing upstream `batch` commands only: `network request <requestId>` when provided plus `network requests` with `--filter <filter-or-url>` when a filter or URL is provided (if both are set, `filter` wins; when only `url` is set, it becomes the `--filter` argument); optional `session` prepends `--session <name>` before that generated `batch`
240
387
  - detects failed requests from `status >= 400`, `failed: true`, or an `error` field
@@ -362,7 +509,7 @@ Stable category fields are part of the machine-readable contract:
362
509
 
363
510
  - `resultCategory`: always either `"success"` or `"failure"`.
364
511
  - `successCategory`: present on successful results. Current values are `"completed"`, `"artifact-saved"`, `"artifact-unverified"`, and `"inspection"`. `artifact-unverified` means upstream reported success but the merged `artifactVerification` summary still reports missing or unverified rows (including manifest-backed spill rows), or the legacy artifact classifier still sees a non-pending file without confirmed disk presence; inspect `artifactVerification` (counts and per-entry `state` / optional `limitation`) before treating paths as durable.
365
- - `failureCategory`: present on failed results. Current values are `"aborted"`, `"confirmation-required"`, `"download-not-verified"`, `"missing-binary"`, `"parse-failure"`, `"qa-failure"`, `"selector-not-found"`, `"selector-unsupported"`, `"stale-ref"`, `"tab-drift"`, `"timeout"`, `"upstream-error"`, and `"validation-error"`.
512
+ - `failureCategory`: present on failed results. Current values are `"aborted"`, `"cleanup-failed"`, `"confirmation-required"`, `"download-not-verified"`, `"missing-binary"`, `"parse-failure"`, `"policy-blocked"`, `"qa-failure"`, `"selector-not-found"`, `"selector-unsupported"`, `"stale-ref"`, `"tab-drift"`, `"timeout"`, `"upstream-error"`, and `"validation-error"`.
366
513
 
367
514
  These categories are intentionally bounded and stable so agents can branch on them instead of parsing prose. They do not replace raw diagnostics: `details.error`, `details.stderr`, `details.parseError`, `details.validationError`, and visible content still preserve the specific upstream or wrapper message after normal redaction.
368
515
 
@@ -370,7 +517,7 @@ For `batch`, top-level `details` still carries `resultCategory` plus `successCat
370
517
 
371
518
  Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbatim replay of raw upstream batch JSON): each element is `{ success, command, result? | error? }` where `command` is argv-redacted the same way as echoed invocation args (including `cookies set` cookie values, `storage local|session set` values, and other sensitive flags/positionals), `result` is the presentation-layer data for that step after the same structured redaction as non-batch commands, and `error` is failure text with cookie/storage/password literals stripped when those values appeared in argv. Prefer `batchSteps[]` for full per-step `details` (artifacts, categories, spill paths); use the roll-up when you only need a redacted matrix of what ran.
372
519
 
373
- `details.refSnapshot` may appear after successful `snapshot` calls and subsequent same-session calls. It records the latest page-scoped ref ids known to the wrapper and the page target they came from so mutation-prone `@e…` commands can fail fast instead of silently hitting recycled refs after navigation.
520
+ `details.refSnapshot` may appear after successful `snapshot` calls and subsequent same-session calls. It records the latest page-scoped ref ids known to the wrapper and the page target they came from so mutation-prone `@e…` commands can fail fast instead of silently hitting recycled refs after navigation. For wrapper-tracked Electron sessions, `details.electronRefFreshness` may also appear after a successful `@e…` mutation as a softer same-URL rerender warning: run `snapshot -i` before reusing old refs even if the URL did not change.
374
521
 
375
522
  Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
376
523
 
@@ -379,7 +526,7 @@ Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
379
526
 
380
527
  **Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
381
528
 
382
- `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N`; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
529
+ `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`, optional `electron`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: Electron launches → wrapper-tracked `electron.status` / `electron.probe` / `electron.cleanup` actions plus session-scoped tab/snapshot inspection when attached; Electron status/probe mismatch diagnostics → `reattach-electron-launch` plus fresh tab/snapshot inspection; Electron post-command health failures → status/probe/cleanup for the same `launchId`; Electron fill verification mismatches → `inspect-after-fill-verification` and `verify-filled-value`; Electron same-URL ref freshness warnings → `refresh-electron-refs-after-rerender`; packaged-Electron `sourceLookup` no-candidate diagnostics → session snapshot, launch probe, and tab list; Electron cleanup partial failures → status plus retry-cleanup for the same wrapper-owned `launchId`; `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N`; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL non-Electron top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; `wait --text` assertion failures → `inspect-after-text-assertion-failure` with a read-only snapshot; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
383
530
 
384
531
  **Unknown-command getter hints (failure presentation):** `buildToolPresentation` in `extensions/agent-browser/lib/results/presentation.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
385
532
 
@@ -389,7 +536,7 @@ For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that
389
536
 
390
537
  `pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
391
538
 
392
- `overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
539
+ `overlayBlockers` may appear after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
393
540
 
394
541
  Example shape (fields vary by scenario):
395
542
 
@@ -442,14 +589,19 @@ Implementation and precedence:
442
589
  - Types, classifiers, and generic follow-up assembly live in `extensions/agent-browser/lib/results/shared.ts`: `classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails` (the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier), and `buildAgentBrowserNextActions`. Failed upstream `network requests` rows also flow through `classifyNetworkRequestFailure` / `summarizeNetworkFailures` there for QA analysis (`analyzeQaPresetResults` in `extensions/agent-browser/index.ts`) and for actionable-vs-benign lines plus request-specific nextActions in `network requests` presentation (`extensions/agent-browser/lib/results/presentation.ts`).
443
590
  - Artifact verification: `ArtifactVerificationSummary` / `ArtifactVerificationEntry` types live in `shared.ts`. `buildArtifactVerificationSummary`, `getArtifactVerificationEntry`, and `getManifestVerificationEntry` in `presentation.ts` merge each resolved file artifact with manifest rows whose `storageScope` is not `explicit-path` (those rows duplicate file artifacts) and whose `path` is in the current result’s spill path set. Successful presentation merges then run `classifyPresentationSuccessCategory` in `presentation.ts`, which forces `successCategory: "artifact-unverified"` when `artifactVerification.missingCount` or `artifactVerification.unverifiedCount` is greater than zero before delegating to `classifyAgentBrowserSuccessCategory`.
444
591
  - Inner success categories (`classifyAgentBrowserSuccessCategory` in `shared.ts`, after verification counts are clear): if `inspection` is true → `"inspection"`; else if any non-pending artifact lacks confirmed on-disk presence (`exists !== true`) → `"artifact-unverified"`; else if there is a `savedFile` or any `artifacts` → `"artifact-saved"`; else → `"completed"`.
445
- - Failure: the classifier walks a single ordered chain (first match wins): `confirmation-required` → `timeout` → `missing-binary` → `parse-failure` → `aborted` → `tab-drift` → `stale-ref` (including “unknown ref” text and a narrow `@eN` plus “element not found” heuristic) → `selector-unsupported` → `selector-not-found` → `download-not-verified` (download / wait-download style failures) → `validation-error` when a wrapper `validationError` is present → default `upstream-error`.
592
+ - Failure: the classifier walks a single ordered chain (first match wins): `confirmation-required` → `timeout` → `missing-binary` → `parse-failure` → `aborted` → `policy-blocked` → `cleanup-failed` → `tab-drift` → `stale-ref` (including “unknown ref” text and a narrow `@eN` plus “element not found” heuristic) → `selector-unsupported` → `selector-not-found` → `download-not-verified` (download / wait-download style failures) → `validation-error` when a wrapper `validationError` is present → default `upstream-error`.
446
593
  - The main tool implementation merges these fields into Pi-facing `details` from `extensions/agent-browser/index.ts` and from `extensions/agent-browser/lib/results/presentation.ts` for presentation-time failures.
447
594
 
448
595
  Additional structured fields can appear when relevant:
449
596
  - `compiledSemanticAction` when the call used `semanticAction` and the result includes the unified `details` merge: `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with the same redaction rules as `args` / `effectiveArgs`; omitted for plain `args`/`job` calls and omitted on some early error returns that omit this field (see the `semanticAction` section above)
450
597
  - `compiledJob` when the call used `job` or the job-backed `qa` preset: `{ args: ["batch"], stdin, steps: [{ action, args }] }`, with step args redacted the same way as other invocation details
451
- - `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object
598
+ - `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object. `checks.attached` is `true` for current-session QA and `checks.url` is present only for URL-opening QA.
599
+ - `compiledSourceLookup` when the call used `sourceLookup`: `{ args: ["batch"], stdin, steps, query }` with the generated local-evidence plan and original query fields (`selector?`, `reactFiberId?`, `componentName?`, `includeDomHints?`, `maxWorkspaceFiles?`).
600
+ - `sourceLookup` when the call used `sourceLookup`: `{ status, candidates, limitations, summary, workspaceRoot?, electronContext? }`; wrapper-tracked packaged Electron no-candidate diagnostics may carry `workspaceRoot` plus `electronContext` and live Electron nextActions without marking the successful batch as a tool failure.
601
+ - `compiledNetworkSourceLookup` / `networkSourceLookup` when the call used `networkSourceLookup`: the generated batch plan plus bounded failed-request/candidate evidence as described above.
452
602
  - `qaPreset` when the call used `qa`: `{ passed, failedChecks, warnings, summary }`. Network rows inside the `network requests` batch step use `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `shared.ts`: actionable failures appear in `failedChecks` (and fail the tool when the upstream batch still succeeded); benign icon-classified failures appear only in `warnings` and in `summary` as `QA preset passed with warnings: …` when nothing else failed.
603
+ - `compiledElectron` when the call used `electron`: redacted action plan for `list`, `launch`, `status`, `cleanup`, or `probe`.
604
+ - `electron` when the call used `electron`: action-specific lifecycle, discovery, probe, and cleanup data; see the `electron` section below.
453
605
  - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
454
606
  - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
455
607
  - `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
@@ -459,6 +611,7 @@ Additional structured fields can appear when relevant:
459
611
  - `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
460
612
  - `recordingDependencyWarning` after a successful `record start` or `record restart` when the wrapper cannot find an executable `ffmpeg` on the Pi process `PATH`. Shape: `{ reason: "ffmpeg-missing-for-recording", dependency: "ffmpeg", command, message, recommendations }`. Visible text appends `Recording dependency warning: ffmpeg not found on PATH`. This is a non-blocking preflight warning: upstream may start recording, but `record stop` needs `ffmpeg` to encode the WebM.
461
613
  - `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
614
+ - `electronGetTextScopeWarning` after a successful attached Electron `get text <selector>` (standalone or successful `batch`) when a broad non-ref CSS selector such as `body`, `html`, `main`, `div`, or `[role=application]` may read the whole app shell. Shape: `{ selector, summary, electronContext: { launchId?, sessionName?, url? } }`; multiple batched diagnostics use `electronGetTextScopeWarnings`. Visible text appends `Broad Electron get text selector warning`, and next actions use `snapshot-for-electron-text-scope` ids with session-scoped `snapshot -i` payloads.
462
615
  - `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is a plain empty object (`{}`). Arrays such as `[]` do not qualify. It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
463
616
  - `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
464
617
  - `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session` (omitted when the caller already set explicit upstream `--session` or for stateless inspection paths that skip injection). Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.