pi-agent-browser-native 0.2.31 → 0.2.33
- package/CHANGELOG.md +35 -0
- package/README.md +64 -18
- package/docs/ARCHITECTURE.md +13 -10
- package/docs/COMMAND_REFERENCE.md +71 -16
- package/docs/ELECTRON.md +387 -0
- package/docs/RELEASE.md +34 -4
- package/docs/REQUIREMENTS.md +5 -3
- package/docs/SUPPORT_MATRIX.md +36 -21
- package/docs/TOOL_CONTRACT.md +198 -40
- package/extensions/agent-browser/index.ts +1585 -3486
- package/extensions/agent-browser/lib/electron/cleanup.ts +287 -0
- package/extensions/agent-browser/lib/electron/discovery.ts +717 -0
- package/extensions/agent-browser/lib/electron/launch.ts +553 -0
- package/extensions/agent-browser/lib/input-modes/electron.ts +170 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +203 -0
- package/extensions/agent-browser/lib/input-modes/lookups.ts +447 -0
- package/extensions/agent-browser/lib/input-modes/params.ts +188 -0
- package/extensions/agent-browser/lib/input-modes/semantic-action.ts +107 -0
- package/extensions/agent-browser/lib/input-modes/shared.ts +46 -0
- package/extensions/agent-browser/lib/input-modes/types.ts +221 -0
- package/extensions/agent-browser/lib/input-modes.ts +41 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +696 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +450 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +46 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +711 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +386 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +868 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +476 -0
- package/extensions/agent-browser/lib/orchestration/browser-run.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/input-plan.ts +338 -0
- package/extensions/agent-browser/lib/playbook.ts +15 -13
- package/extensions/agent-browser/lib/process.ts +106 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +269 -0
- package/extensions/agent-browser/lib/results/artifact-manifest.ts +114 -0
- package/extensions/agent-browser/lib/results/artifact-state.ts +13 -0
- package/extensions/agent-browser/lib/results/categories.ts +106 -0
- package/extensions/agent-browser/lib/results/contracts.ts +220 -0
- package/extensions/agent-browser/lib/results/editable-ref-evidence.ts +72 -0
- package/extensions/agent-browser/lib/results/envelope.ts +2 -1
- package/extensions/agent-browser/lib/results/network.ts +64 -0
- package/extensions/agent-browser/lib/results/next-actions.ts +117 -0
- package/extensions/agent-browser/lib/results/presentation/artifacts.ts +506 -0
- package/extensions/agent-browser/lib/results/presentation/batch.ts +355 -0
- package/extensions/agent-browser/lib/results/presentation/common.ts +53 -0
- package/extensions/agent-browser/lib/results/presentation/content.ts +36 -0
- package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +730 -0
- package/extensions/agent-browser/lib/results/presentation/errors.ts +125 -0
- package/extensions/agent-browser/lib/results/presentation/large-output.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/navigation.ts +216 -0
- package/extensions/agent-browser/lib/results/presentation/registry.ts +154 -0
- package/extensions/agent-browser/lib/results/presentation/skills.ts +143 -0
- package/extensions/agent-browser/lib/results/presentation.ts +87 -2369
- package/extensions/agent-browser/lib/results/recovery-actions.ts +139 -0
- package/extensions/agent-browser/lib/results/recovery-next-actions.ts +71 -0
- package/extensions/agent-browser/lib/results/selector-recovery.ts +312 -0
- package/extensions/agent-browser/lib/results/shared.ts +17 -701
- package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +262 -0
- package/extensions/agent-browser/lib/results/snapshot-refs.ts +100 -0
- package/extensions/agent-browser/lib/results/snapshot-segments.ts +366 -0
- package/extensions/agent-browser/lib/results/snapshot-spill.ts +63 -0
- package/extensions/agent-browser/lib/results/snapshot.ts +37 -489
- package/extensions/agent-browser/lib/results/text.ts +40 -0
- package/extensions/agent-browser/lib/results.ts +16 -5
- package/extensions/agent-browser/lib/session-page-state.ts +486 -0
- package/extensions/agent-browser/lib/temp.ts +26 -0
- package/package.json +6 -4
@@ -4,6 +4,7 @@ Related docs:
 - [`../README.md`](../README.md)
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
 - [`ARCHITECTURE.md`](ARCHITECTURE.md)
+- [`ELECTRON.md`](ELECTRON.md)
 - [`RELEASE.md`](RELEASE.md)
 - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)

@@ -26,7 +27,7 @@ Use `npm run benchmark:agent-browser` or `npm run verify -- benchmark` before an

 ## Core mental model

-Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `
+Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`):

 ```json
 { "args": ["open", "https://example.com"], "sessionMode": "auto" }
@@ -53,13 +54,19 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 { "networkSourceLookup": { "requestId": "req-1", "url": "/api/fail" } }
 ```

-
-
+```json
+{ "electron": { "action": "list", "query": "code" } }
+{ "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
+```
+
+- `args`: exact `agent-browser` CLI tokens after the binary name. Omit when using `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` instead (mutually exclusive).
+- `semanticAction`: optional shorthand for common `find` flows and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
 - `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin and reports the compiled plan in `details.compiledJob`.
 - `qa`: optional lightweight QA preset; compiles to the same batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
 - `sourceLookup`: optional experimental helper for local UI-to-source *candidates*; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
 - `networkSourceLookup`: optional experimental helper for failed request-to-source *candidates*; compiles to generated `batch`, reports `details.compiledNetworkSourceLookup` and `details.networkSourceLookup`, and never assigns blame or edits files.
-- `
+- `electron`: optional Electron desktop-app shorthand. `list`, `status`, `cleanup`, and `probe` are wrapper-owned host/session helpers; `launch` starts a wrapper-owned isolated Electron profile and attaches through upstream `connect`.
+- `stdin`: only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other command/stdin combinations are rejected before `agent-browser` is launched. `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` generate or manage their own input.
 - `sessionMode`:
 - `"auto"` reuses the extension-managed session when possible.
 - `"fresh"` rotates that managed session to a fresh upstream launch so launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` apply.
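The "exactly one of" rule for the top-level tool parameters can be pictured as a small validation step. The TypeScript below is a minimal sketch under stated assumptions, not the wrapper's code: the mode names come from the diffed text, while `pickInputMode` and its error wording are hypothetical.

```typescript
// Hypothetical sketch: mode names are from the reference text; the function
// name and error message are illustrative assumptions.
const INPUT_MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
  "electron",
] as const;

type InputMode = (typeof INPUT_MODES)[number];

// Returns the single input mode present on a call, or throws when zero or
// several modes are supplied together.
function pickInputMode(params: Record<string, unknown>): InputMode {
  const present = INPUT_MODES.filter((mode) => params[mode] !== undefined);
  if (present.length !== 1) {
    const found = present.length === 0 ? "none" : present.join(", ");
    throw new Error(`Use exactly one input mode; found: ${found}`);
  }
  return present[0] as InputMode;
}
```

Fields that are not input modes, such as `sessionMode`, are ignored by this check, so a call like `{ "args": ["open", "https://example.com"], "sessionMode": "auto" }` still resolves to `args`.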
@@ -127,6 +134,7 @@ Examples:
 { "args": ["find", "text", "Close", "click"] }
 { "args": ["find", "label", "Email", "fill", "user@example.com"] }
 { "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Close" } }
+{ "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
 { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
 { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
 { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
@@ -135,11 +143,13 @@ Examples:
 { "args": ["snapshot", "-i"] }
 ```

-The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `
+The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). For `locator: "role"`, pass either `value: "button"` or `role: "button"`; if both are present they must match. It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` when that snapshot has exact visible role/name matches for the failed target. Non-fill matches can include direct `try-current-visible-ref*` next actions. Semantic click misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries such as `button`/`link` for a missed `text` click, each as a `try-*-candidate` entry carrying redacted `find role …` argv.
+
+For desktop or host-controlled rich inputs, treat a semantic `fill` miss differently. If the fresh snapshot finds an exact current editable ref (`searchbox` or `textbox`), `details.richInputRecovery` and visible `Rich input recovery` describe the candidate and append `focus-current-editable-ref*` / `click-current-editable-ref*` next actions. Those actions deliberately do **not** copy the fill text and never press `Enter` or submit. Use the safe ladder instead: refresh refs, choose the current editable `@ref`, focus or click it, then send the intended text with `keyboard inserttext` or `keyboard type` in a separate call. Do not auto-submit unless the user flow explicitly calls for it.

 Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.

-Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
+Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).

 A successful `click` result means upstream reported a target, not that the app definitely handled the event. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action. The wrapper avoids site-specific fallback clicks and keeps the verification burden explicit.

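The pre-spawn batch walk described for `@e…` refs (a latch set by navigating or mutating steps, cleared by `snapshot`) can be sketched as below. This is an illustration under loud assumptions, not the wrapper's implementation: the text does not list which first tokens navigate, mutate, or are guarded, so `LATCH_SETTING_TOKENS`, `GUARDED_TOKENS`, the ref pattern, and `findStaleRefStep` are all hypothetical placeholders. `fill` is left out of the latch-setting set so that the login-style `fill`, `fill`, `click` batch from the text passes.

```typescript
// Hypothetical sketch of the batch stale-ref latch. Token sets are assumed.
type BatchStep = string[];

const LATCH_SETTING_TOKENS = new Set(["open", "click", "press"]); // assumption
const GUARDED_TOKENS = new Set(["click", "select", "fill"]); // assumption

// Assumes refs look like `@e4`; the real ref grammar is not given in the text.
const mentionsRef = (step: BatchStep): boolean =>
  step.some((token) => /^@e\d+$/.test(token));

// Returns the index of the first guarded step that uses an `@e…` ref after an
// uncleared latch, or -1 when the batch passes the check.
function findStaleRefStep(steps: BatchStep[]): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const first = step[0] ?? "";
    if (first === "snapshot") {
      latched = false; // a fresh snapshot clears the latch for later rows
      continue;
    }
    if (latched && GUARDED_TOKENS.has(first) && mentionsRef(step)) {
      return i; // would be reported as `stale-ref` before spawning upstream
    }
    if (LATCH_SETTING_TOKENS.has(first)) {
      latched = true; // later ref-based steps need a fresh snapshot first
    }
  }
  return -1;
}
```

With these placeholder sets, two ref clicks in a row flag the second click, while inserting `["snapshot", "-i"]` between them lets the batch through.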
@@ -164,7 +174,7 @@ Prefer `get` and scoped `eval --stdin` for read-only extraction. Getter names ar

 Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.

-On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content.
+On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions. The warning names the matching `details.nextActions` id so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content.

 ### Run a multi-step flow in one browser invocation

@@ -190,11 +200,11 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`

 On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.

-Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `
+Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`; those modes generate or manage their own input.

-For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
+For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page. Failed QA presets report `details.resultCategory: "failure"`, `failureCategory: "qa-failure"`, and real Pi sessions treat the diagnostic as a failed tool result. Prose output also gets a model-visible result-category line including `Pi tool isError: true`; caller-requested `--json` output keeps the JSON string parseable and relies on the patched `isError` plus `details` fields.

-The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/
+The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/network.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
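The failed-row rule and the summary line quoted for `network requests` can be approximated as follows. This is a rough sketch, not the package's `classifyNetworkRequestFailure` or `summarizeNetworkFailures`: the `NetworkRow` shape, the helper names, and the choice to treat only `favicon.ico` misses as benign are assumptions made for illustration (the text names `favicon.ico` as one example of a low-impact icon asset, not as the full rule).

```typescript
// Hypothetical row shape; the real upstream payload may carry other fields.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
}

// A row counts as failed on HTTP status >= 400, `failed: true`, or a string `error`.
function isFailedRow(row: NetworkRow): boolean {
  return (
    (typeof row.status === "number" && row.status >= 400) ||
    row.failed === true ||
    typeof row.error === "string"
  );
}

// Assumption: only favicon.ico misses are considered benign in this sketch.
function isBenignIconMiss(row: NetworkRow): boolean {
  return /\/favicon\.ico(\?|$)/.test(row.url);
}

// Builds the leading summary line, or returns undefined when no row failed.
function summarizeFailures(rows: NetworkRow[]): string | undefined {
  const failed = rows.filter(isFailedRow);
  if (failed.length === 0) return undefined;
  const benign = failed.filter(isBenignIconMiss).length;
  const actionable = failed.length - benign;
  return `Network failure summary: ${actionable} actionable, ${benign} benign low-impact (${failed.length} total).`;
}
```

Keeping the failed-row predicate separate from the impact classification mirrors the text's split between "counts as failed" and "likely impact".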

 ```json
 { "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
@@ -202,9 +212,52 @@ The same classification drives plain `network requests` presentation: when any r

 Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true`; set a check to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.

+For attached Electron or manually connected CDP sessions, use `qa.attached` after the session exists. It does not open a URL and rejects `sessionMode: "fresh"` because it checks the current managed session.
+
+```json
+{ "qa": { "attached": true, "expectedText": "Explorer", "screenshotPath": ".dogfood/electron.png" } }
+```
+
 Use custom `job` or raw `batch` when you need a different check sequence.

-
+### Electron desktop apps
+
+Full public guide: [`ELECTRON.md`](ELECTRON.md). Use it as the entry point when Electron support is the task; this section keeps the inline workflow snippets for agents reading the broader command surface.
+
+Use top-level `electron` when the wrapper should discover, launch, attach to, probe, and clean up a desktop Electron app. The wrapper owns only launches it created. It uses an isolated temporary `userDataDir`, `--remote-debugging-port=0`, and safe launch defaults; it does **not** reuse the app's normal signed-in profile or attach to an already-running authenticated app. For already-authenticated desktop app content, do not stop at the isolated-launch warning: when host tools are available and the app is not already running, launch the normal app with a debug port (macOS example: `open -a Slack --args --remote-debugging-port=9222 --remote-allow-origins='*'`), verify the port, then attach with `{ "args": ["connect", "9222"], "sessionMode": "fresh" }`; if the app is already running without a debug port, ask before relaunching it. Remote debugging still exposes app content, so use caller-owned `allow` / `deny` lists for sensitive app policies when needed. `electron.list` may annotate common private-data apps as `[likely sensitive: …]`; this is advisory metadata only and does not block `launch` or replace caller policy.
+
+Install scans for `electron.list` (and resolving `appName` / `bundleId` targets) are implemented for **macOS and Linux** hosts only. On **Windows**, `list` returns `platform: "unsupported"` with no apps, so prefer `executablePath` (or a host `appPath` that points at the real Electron `.exe`) when launching there—the wrapper still runs Electron evidence checks on that path before spawn.
+
+Typical lifecycle:
+
+```json
+{ "electron": { "action": "list", "query": "code" } }
+{ "electron": { "action": "launch", "appName": "Visual Studio Code", "handoff": "snapshot" } }
+{ "args": ["snapshot", "-i"] }
+{ "electron": { "action": "probe", "timeoutMs": 5000 } }
+{ "electron": { "action": "cleanup", "launchId": "electron-…" } }
+```
+
+`electron.status` and `electron.cleanup` take either `launchId`, **`all: true`** (literal boolean) to walk every wrapper-tracked launch in one call, or neither when exactly one active launch exists—never both `launchId` and `all`. For `electron.launch`, `timeoutMs` bounds host CDP readiness with a **15s** default and **120s** cap in `extensions/agent-browser/lib/electron/launch.ts`. Optional `timeoutMs` on **`status`** applies to managed-session `get title` / `get url` reads (localhost CDP probes stay on a short fixed fetch budget). On **`cleanup`**, it caps upstream `close` **and** host teardown (process exit, debug-port idle check, isolated profile removal); when omitted it follows the implicit session close default (**5s** unless `PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS` overrides). On **`probe`**, it bounds each underlying upstream read subprocess—omit it to use the normal tool subprocess default, or raise it on slow desktops.
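The targeting rule for `electron.status` / `electron.cleanup` and the launch timeout bounds can be read as two small helpers. The sketch below is illustrative only: `resolveLaunchTargets`, `launchTimeoutMs`, and the error messages are hypothetical names, the sketch does not check that a given `launchId` is actually tracked, and the text does not say whether an over-cap `timeoutMs` is clamped or rejected (this sketch clamps).

```typescript
// Hypothetical selector shape mirroring the `launchId` / `all` fields in the text.
interface LaunchSelector {
  launchId?: string;
  all?: boolean;
}

// Picks which wrapper-tracked launches a status/cleanup call should walk.
function resolveLaunchTargets(selector: LaunchSelector, activeLaunchIds: string[]): string[] {
  if (selector.launchId !== undefined && selector.all !== undefined) {
    throw new Error("Pass launchId or all, never both");
  }
  if (selector.all === true) return [...activeLaunchIds];
  if (selector.launchId !== undefined) return [selector.launchId];
  if (activeLaunchIds.length === 1) return [...activeLaunchIds];
  throw new Error("Specify launchId or all: true unless exactly one launch is active");
}

const LAUNCH_TIMEOUT_DEFAULT_MS = 15_000; // 15s default from the text
const LAUNCH_TIMEOUT_CAP_MS = 120_000; // 120s cap from the text

// Assumption: values above the cap are clamped rather than rejected.
function launchTimeoutMs(requested?: number): number {
  return Math.min(requested ?? LAUNCH_TIMEOUT_DEFAULT_MS, LAUNCH_TIMEOUT_CAP_MS);
}
```

Calling `resolveLaunchTargets({}, ids)` with two or more active launches throws in this sketch, which matches the text's requirement to name a `launchId` or pass `all: true` in that situation.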
+
+`launch.handoff` defaults to `"snapshot"`, which attaches through upstream `connect`, lists targets, and captures a current `snapshot -i` in one call. Snapshot handoff retries briefly when the first Electron snapshot has no refs; if it still reports no refs, run `snapshot -i` once more before assuming the app is blank. Use `handoff: "tabs"` as the safer diagnostic starting point when you only need target discovery and do not want to snapshot app content yet, or `handoff: "connect"` when you want to attach first and run your own follow-up commands. `targetType` defaults to `"page"`; use `"webview"` or `"any"` for apps that expose useful webviews. When a matching CDP target exposes a WebSocket URL, launch connects to that target; otherwise it falls back to the browser port.
+
+After launch, prefer the exact `details.nextActions` payloads when present: `status-electron-launch` checks liveness, `probe-electron-launch` runs compact diagnostics for a tracked launch, `snapshot-electron-session` refreshes current refs, `list-electron-tabs` inspects targets, and `cleanup-electron-launch` removes the wrapper-owned process/profile when the run is done. If launch times out, inspect `details.electron.failure.diagnostics` for PID, wrapper profile, `DevToolsActivePort`, and timing evidence before retrying. If status/probe detects a session or target mismatch, follow `reattach-electron-launch` or a fresh snapshot action before using old refs. If a click/fill/type looks successful but the Electron PID or debug port dies, the wrapper now fails the result with `details.electronPostCommandHealth` and same-launch status/probe/cleanup next actions instead of leaving the agent on `about:blank`. If cleanup is partial (`failureCategory: "cleanup-failed"`), inspect `details.electron.cleanup.results` and use `retry-electron-cleanup` only for the same `launchId`.
+
+Manual path for externally launched apps: if you started the Electron app yourself with a debug port or DevTools URL, skip the wrapper lifecycle and attach directly with upstream `connect`. In this path you own app shutdown and profile cleanup; do not use `electron.cleanup`. `close` only closes the browser/CDP session and does not quit the manually launched app or remove explicit artifacts.
+
+```json
+{ "args": ["connect", "9222"], "sessionMode": "fresh" }
+{ "args": ["tab", "list"] }
+{ "args": ["tab", "t2"] }
+{ "args": ["snapshot", "-i"] }
+```
+
+A successful raw `connect` means the debug endpoint accepted the session, not that the app has an active ready page. Prefer `details.nextActions` when present: `list-connected-session-tabs` runs the session-scoped tab inspection. After that read-only list, select or confirm the stable `t<N>` target and run `snapshot -i` explicitly before trusting refs. If a `snapshot -i` says `No active page`, the wrapper clears any prior refs for that session; follow `list-tabs-after-no-active-page`, select the stable `t<N>` surface, then use a condition wait or retry `snapshot -i` before trusting refs.
+
+For current-session smoke checks after either path, use `qa.attached`; for compact state instead of separate title/url/focus/tab/snapshot calls, use `electron.probe`. `electron.probe.timeoutMs` bounds each underlying read subprocess; `electron.probe.launchId` ties the probe to a wrapper launch and can surface session or target mismatch guidance before you trust page refs. For VS Code-style quick inputs, treat a successful `fill` as tentative: the wrapper may append `details.fillVerification` if `get value` still reads empty or different, and Electron `@e…` mutations can append `refresh-electron-refs-after-rerender` because same-URL UI rerenders commonly churn refs.
+
+For local app debugging, top-level `sourceLookup` can gather candidate component/file locations for a visible element from selector DOM hints, React DevTools inspection, and a bounded workspace component-name search rooted at the Pi session working directory (`maxWorkspaceFiles` defaults to 2000 and cannot exceed 5000; the scan records at most ten `workspace-search` candidates). With a `selector`, the wrapper runs `is visible` and, unless `includeDomHints` is `false`, `get html` so DOM data attributes and embedded source-like paths can become `dom-attribute` candidates. It reports evidence and confidence in `details.sourceLookup` instead of claiming a guaranteed source file. React hints require a session opened with `--enable react-devtools`. The `details.sourceLookup.status` field reads `unsupported` only when no candidates were collected **and** a `react` batch step failed (inspect errors, missing renderer, and similar); it reads `no-candidates` when the batch succeeded but nothing matched. If selector or workspace hints still yield candidates, `status` remains `candidates-found` even when React inspection failed. Unlike `qa`, the wrapper does not downgrade a **fully successful** upstream batch to `isError` solely because those statuses appear—though failed batch steps still produce normal tool errors. For wrapper-tracked packaged Electron sessions with no candidates, `details.sourceLookup.workspaceRoot` and optional `details.sourceLookup.electronContext` explain that the scan only covered the Pi tool cwd; installed app resources or `app.asar` bundles are outside that scan and are not unpacked. Those results may add `snapshot-electron-session`, `probe-electron-launch`, and `list-electron-tabs` next actions so you can inspect the live packaged app before deciding whether to change the workspace or app bundle.
208
261
|
|
|
209
262
|
```json
|
|
210
263
|
{ "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
|
|
@@ -224,7 +277,9 @@ Top-level `networkSourceLookup` does the same for failed browser requests. When
|
|
|
224
277
|
{ "args": ["wait", "--download", "/tmp/report.pdf"] }
|
|
225
278
|
```
|
|
226
279
|
|
|
227
|
-
Do not
|
|
280
|
+
Do not omit the load state value; use `wait --load <state>` with `load`, `domcontentloaded`, or `networkidle`.
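A minimal sketch of an explicit load-state wait, assuming the same `args` payload shape as the other examples in this reference; `networkidle` is only one of the three accepted states named above:

```json
{ "args": ["wait", "--load", "networkidle"] }
```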
|
|
281
|
+
|
|
282
|
+
For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Fixed waits are a last resort: `wait 30000` is intentionally blocked by the wrapper IPC budget, and a successful fixed-wait payload such as `"waited":"timeout"` means elapsed time only, not proof that the desktop host finished. Verify with an observed condition, fresh snapshot, or screenshot before continuing.
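As a rough illustration of the raw-`connect` branch of that ladder, the three calls might look like the following, assuming the `args` payload shape used elsewhere in this reference. The tab id `t2` and the text `Ready` are placeholders chosen for the example, not values the wrapper guarantees.

List the tabs first:

```json
{ "args": ["tab", "list"] }
```

Select the intended stable tab id from that listing:

```json
{ "args": ["tab", "t2"] }
```

Then wait on an observable condition rather than a fixed sleep:

```json
{ "args": ["wait", "--text", "Ready"] }
```

If no real condition exists, a fresh `snapshot -i` on the selected tab is the other option the ladder names for this step.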
|
|
228
283
|
|
|
229
284
|
Use `wait --download [path]` after an earlier action has already started a browser download, such as a dashboard export button that responds asynchronously:
|
|
230
285
|
|
|
@@ -298,7 +353,7 @@ Oversized snapshots and oversized generic outputs are different: when a persiste
|
|
|
298
353
|
{ "args": ["snapshot", "-i"] }
|
|
299
354
|
```
|
|
300
355
|
|
|
301
|
-
Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab.
|
|
356
|
+
Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab. Generic tab-drift recovery lists tabs first; run `snapshot -i` only after selecting or confirming the intended stable target. When the wrapper already knows the target, `details.nextActions` may include recovery actions that list tabs, select the intended tab, and refresh refs in the right session.
|
|
302
357
|
|
|
303
358
|
### Recover from guarded-action confirmations
|
|
304
359
|
|
|
@@ -470,7 +525,7 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
|
|
|
470
525
|
| `snapshot -d <n>` / `snapshot --depth <n>` | Limit tree depth. |
|
|
471
526
|
| `snapshot -s <sel>` / `snapshot --selector <sel>` | Scope to a CSS selector. |
|
|
472
527
|
|
|
473
|
-
When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages can still hide actionable controls in omitted content;
|
|
528
|
+
When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, and primary action buttons, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
|
|
474
529
|
|
|
475
530
|
### Wait
|
|
476
531
|
|
|
@@ -481,7 +536,7 @@ When a snapshot is too large for inline output, the Pi wrapper renders a compact
|
|
|
481
536
|
| `wait --url <pattern>` | Wait for the URL to match a pattern. |
|
|
482
537
|
| `wait --load <state>` | Wait for load state: `load`, `domcontentloaded`, or `networkidle`. |
|
|
483
538
|
| `wait --fn <expression>` | Wait for a JavaScript expression to become truthy. |
|
|
484
|
-
| `wait --text <text>` | Wait for text to appear on the page. |
|
|
539
|
+
| `wait --text <text>` | Wait for text to appear on the page; failures may include `inspect-after-text-assertion-failure` with a session-scoped `snapshot -i` payload. |
|
|
485
540
|
| `wait --download [path]` | Wait for a download started by a previous action and optionally save it to `path`; successful wrapper results include upstream-reported `savedFilePath`/`savedFile`, while `details.artifacts[].exists` is the wrapper's on-disk verification signal. |
|
|
486
541
|
| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. In the native Pi wrapper, use `25000` ms or less per call to stay under the upstream CLI IPC budget. |
|
|
487
542
|
| `wait <selector> --state hidden` | Wait for an element to become hidden. |
|
|
@@ -624,7 +679,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
|
|
|
624
679
|
- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
|
|
625
680
|
- After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
|
|
626
681
|
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
|
|
627
|
-
- If a known session target unexpectedly reports about:blank, agent_browser
|
|
682
|
+
- If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
|
|
628
683
|
<!-- agent-browser-playbook:end wrapper-tab-recovery -->
|
|
629
684
|
- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 28s budget) so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path. When that watchdog fires, `details.timeoutPartialProgress` may include a planned step list for compiled `job` / `qa` plans or caller `batch` stdin, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/size checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
|
|
630
685
|
- Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
|