pi-agent-browser-native 0.2.32 → 0.2.34
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +36 -0
- package/README.md +61 -20
- package/docs/ARCHITECTURE.md +9 -2
- package/docs/COMMAND_REFERENCE.md +45 -14
- package/docs/ELECTRON.md +23 -4
- package/docs/RELEASE.md +15 -5
- package/docs/REQUIREMENTS.md +1 -1
- package/docs/SUPPORT_MATRIX.md +36 -22
- package/docs/TOOL_CONTRACT.md +90 -31
- package/extensions/agent-browser/index.ts +407 -4373
- package/extensions/agent-browser/lib/input-modes/electron.ts +170 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +265 -0
- package/extensions/agent-browser/lib/input-modes/lookups.ts +447 -0
- package/extensions/agent-browser/lib/input-modes/params.ts +188 -0
- package/extensions/agent-browser/lib/input-modes/semantic-action.ts +107 -0
- package/extensions/agent-browser/lib/input-modes/shared.ts +46 -0
- package/extensions/agent-browser/lib/input-modes/types.ts +221 -0
- package/extensions/agent-browser/lib/input-modes.ts +44 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +762 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +450 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +46 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +736 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +413 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +868 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +482 -0
- package/extensions/agent-browser/lib/orchestration/browser-run.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/input-plan.ts +338 -0
- package/extensions/agent-browser/lib/playbook.ts +22 -20
- package/extensions/agent-browser/lib/process.ts +106 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +269 -0
- package/extensions/agent-browser/lib/results/artifact-manifest.ts +114 -0
- package/extensions/agent-browser/lib/results/artifact-state.ts +13 -0
- package/extensions/agent-browser/lib/results/categories.ts +106 -0
- package/extensions/agent-browser/lib/results/contracts.ts +220 -0
- package/extensions/agent-browser/lib/results/editable-ref-evidence.ts +72 -0
- package/extensions/agent-browser/lib/results/envelope.ts +2 -1
- package/extensions/agent-browser/lib/results/network.ts +64 -0
- package/extensions/agent-browser/lib/results/next-actions.ts +117 -0
- package/extensions/agent-browser/lib/results/presentation/artifacts.ts +506 -0
- package/extensions/agent-browser/lib/results/presentation/batch.ts +355 -0
- package/extensions/agent-browser/lib/results/presentation/common.ts +53 -0
- package/extensions/agent-browser/lib/results/presentation/content.ts +36 -0
- package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +730 -0
- package/extensions/agent-browser/lib/results/presentation/errors.ts +125 -0
- package/extensions/agent-browser/lib/results/presentation/large-output.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/navigation.ts +216 -0
- package/extensions/agent-browser/lib/results/presentation/registry.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/semantic-action.ts +133 -0
- package/extensions/agent-browser/lib/results/presentation/skills.ts +143 -0
- package/extensions/agent-browser/lib/results/presentation.ts +96 -2403
- package/extensions/agent-browser/lib/results/recovery-actions.ts +139 -0
- package/extensions/agent-browser/lib/results/recovery-next-actions.ts +71 -0
- package/extensions/agent-browser/lib/results/selector-recovery.ts +312 -0
- package/extensions/agent-browser/lib/results/shared.ts +17 -789
- package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +262 -0
- package/extensions/agent-browser/lib/results/snapshot-refs.ts +100 -0
- package/extensions/agent-browser/lib/results/snapshot-segments.ts +366 -0
- package/extensions/agent-browser/lib/results/snapshot-spill.ts +63 -0
- package/extensions/agent-browser/lib/results/snapshot.ts +37 -489
- package/extensions/agent-browser/lib/results/text.ts +40 -0
- package/extensions/agent-browser/lib/results.ts +16 -5
- package/extensions/agent-browser/lib/session-page-state.ts +486 -0
- package/package.json +2 -1
package/docs/TOOL_CONTRACT.md CHANGED
@@ -33,12 +33,40 @@ The native command reference in `docs/COMMAND_REFERENCE.md` is driven by the sam
 
 Agent-facing efficiency claims are measured with `npm run benchmark:agent-browser` or `npm run verify -- benchmark`. The benchmark is deterministic and does not launch a browser; it tracks representative workflow success, tool calls, model-visible output size, stale-ref failures and recoveries, artifact success, failure-category coverage, and elapsed-time estimates so future abstractions can prove they reduce agent work before replacing raw tool use.
 
+## Input mode chooser
+
+Use exactly one top-level input per call:
+
+| When you need | Use | Notes |
+| --- | --- | --- |
+| Routine browse, click, fill, screenshots, upstream commands | `args` | Default path: `open` → `snapshot -i` → `click`/`fill` `@eN` → `snapshot -i` after navigation or DOM changes. Do not pass `--json`; the wrapper injects it (see [Wrapper `--json`](#wrapper-json)). |
+| Stable visible role/text/label/placeholder targets | `semanticAction` | Compiles to upstream `find` or `select`; optional `session` for a named upstream browser. |
+| Short multi-step smoke or evidence flows | `job` or `qa` | Both compile to `batch`; `qa` may reclassify diagnostics as failure. |
+| Desktop Electron apps (list/launch/probe/cleanup) | `electron` | Wrapper-owned lifecycle; not for ordinary websites. |
+| Local UI source hints (experimental) | `sourceLookup` | **Candidates only** with confidence/evidence; not guaranteed DOM-to-file mappings. |
+| Failed fetch/API source hints (experimental) | `networkSourceLookup` | **Candidates only** from initiator metadata and bounded workspace URL literals; not definitive blame. |
+
+For link and button text, use the **exact** visible label from the latest `snapshot -i` (or `semanticAction` locators), not guessed copy. The `https://example.com/` smoke page uses heading `Example Domain` and link `Learn more`; do not assume older example.com strings such as `More information...`.
+
+## Snapshot and getter batching
+
+- **`snapshot -i`**: default for interaction—interactive `@eN` refs, main-content-first trimming, and the usual click/fill workflow.
+- **`snapshot --compact`**: denser same-page tree when you still need refs but want less output than full interactive snapshot.
+- **Full `snapshot`** (no `-i`): use only when you need the complete accessibility tree; expect larger output and possible spill files.
+- Re-run `snapshot -i` after navigation, scrolling, rerendering, or other major DOM changes; refs are page-scoped.
+- When you need **three or more** `get title` / `get url` / `get text` / similar reads for known refs or selectors on the same page, prefer one `batch` stdin array (for example `[["get","text","@e1"],["get","text","@e2"]]`) instead of serial tool calls.
+
+## Wrapper `--json`
+
+The extension always plans normal browser commands with `--json` prepended in `effectiveArgs` so upstream returns structured JSON for presentation and `details`. **Do not** include `--json` in caller `args`; it is unnecessary and can confuse planning or transcript hooks that treat caller-requested JSON differently. Plain-text inspection (`--help`, `--version`) and read-only `skills list` / `skills get` / `skills path` keep their own output shapes and skip implicit session injection as documented under `sessionMode`.
+
 <!-- agent-browser-playbook:start shared-guidelines -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
 - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.
 - For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.
-- When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
+- Snapshot choice: prefer snapshot -i for routine clicks/fills (interactive @refs, main-content-first). Use snapshot --compact when you need a denser same-page tree without full spill; use full snapshot (no -i) only when you need the complete accessibility tree. Re-snapshot after navigation or major DOM changes. When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
 - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
+- For desktop or host-controlled rich inputs, if semanticAction fill misses, refresh refs and prefer a current editable @ref from details.richInputRecovery or the latest snapshot; focus or click that ref, then use keyboard inserttext or keyboard type with the intended text. Do not auto-submit with Enter or a submit button unless the user flow explicitly calls for it.
 - Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
 - For authenticated or user-specific content explicitly requested by the user, such as feeds, inboxes, account pages, or private dashboards, prefer --profile Default on the first browser call and let the implicit session carry continuity. Do not use a real profile for public pages just because they are dashboards. Treat visible page content from real profiles as model-visible transcript data; use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
 - Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.
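The one-input rule and the getter-batching advice in the hunk above can be sketched as follows. This is a minimal illustration, not the extension's validator: `resolveInputMode`, `buildGetterBatch`, and the plain `params` record are hypothetical; only the input key names and the `["get","text","@eN"]` step shape come from the text.

```typescript
// Hypothetical sketch, not the extension's validator. Key names come from the
// "Input mode chooser" table and the `qa` mutual-exclusion list.
const TOP_LEVEL_INPUTS = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "electron",
  "sourceLookup",
  "networkSourceLookup",
] as const;

type TopLevelInput = (typeof TOP_LEVEL_INPUTS)[number];

// Returns the one input key a call uses; throws when zero or several are set.
function resolveInputMode(params: Record<string, unknown>): TopLevelInput {
  const present = TOP_LEVEL_INPUTS.filter((key) => params[key] !== undefined);
  if (present.length !== 1) {
    const found = present.length > 0 ? present.join(", ") : "none";
    throw new Error(`expected exactly one top-level input, found: ${found}`);
  }
  return present[0] as TopLevelInput;
}

// Folds several `get text` reads for known refs into one `batch` call
// instead of one tool call per ref.
function buildGetterBatch(refs: readonly string[]): { args: string[]; stdin: string } {
  const steps = refs.map((ref) => ["get", "text", ref]);
  return { args: ["batch"], stdin: JSON.stringify(steps) };
}

// Usage: three getter reads on the same page become one `args` call.
const call = buildGetterBatch(["@e1", "@e2", "@e3"]);
console.log(resolveInputMode(call), call.stdin);
```

Rejecting ambiguous calls up front keeps the mode choice explicit rather than letting one input silently win over another.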
@@ -49,10 +77,10 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
 - For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, state save/load for portable test state, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields.
 - For batch chains that touch cookies, storage, auth, or other secret-bearing commands, use details.batchSteps for per-step artifacts, categories, spill paths, and full structured errors; top-level details.data on batch is only a compact redacted step matrix (success, argv-redacted command, redacted result or scrubbed error text) built from the same presentation rules as standalone calls.
 - For non-core families, pass current upstream commands through the native tool directly: network route/requests/har, diff snapshot/screenshot/url, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, and chat. For compact network requests output, prefer details.nextActions for request detail, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
-- For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; leave shutdown
+- For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; after connect, inspect tab list, select the stable tab id such as tab t2, then run a condition wait or snapshot -i before using refs. close only closes the browser/CDP session; leave manually launched app shutdown, profile cleanup, and explicit artifacts to the host owner.
 - For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
 - For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; when --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
-- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies.
+- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort, must stay below the wrapper IPC budget (wait 30000 is intentionally blocked), and a successful payload like "waited":"timeout" means elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.
 - For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
 - For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
 - For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
@@ -61,6 +89,9 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
 - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
 - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
 - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), use the user's exact requested paths when given and treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use or PASS/FAIL reporting.
+- For evidence-only screenshots, QA captures, or other audit artifacts, save to an explicit path and branch on details.artifactVerification plus details.artifacts before reporting PASS/FAIL; do not require vision review of inline image attachments unless the user asked for visual inspection.
+- Respect explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, do not click that final action.
+- Successful record stop needs ffmpeg on PATH; the wrapper may warn after record start when ffmpeg is missing.
 - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
 <!-- agent-browser-playbook:end shared-guidelines -->
 
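The artifact guideline in the hunk above asks callers to branch on `details.artifactVerification` before trusting saved files. A hedged sketch of that branching: the counter names (`missingCount`, `pendingCount`, `unverifiedCount`) and the optional `limitation` come from the text, while the `entries` array, its `path`/`state` fields, the `"verified"` state string, and `checkArtifacts` itself are assumptions about the payload, not its documented shape.

```typescript
// Assumed payload shape: the three counters and `limitation` are named in the
// guideline; `entries`, `path`, `state`, and the "verified" value are guesses.
interface ArtifactVerificationEntry {
  path: string;
  state: string;
  limitation?: string;
}

interface ArtifactVerification {
  missingCount: number;
  pendingCount: number;
  unverifiedCount: number;
  entries: ArtifactVerificationEntry[];
  limitation?: string;
}

type ArtifactVerdict = { ready: true } | { ready: false; reason: string };

// Treats files as provisional until every counter is zero and every row is
// verified. A reported limitation is conservatively treated as "not ready".
function checkArtifacts(v: ArtifactVerification | undefined): ArtifactVerdict {
  if (v === undefined) return { ready: false, reason: "no artifactVerification in details" };
  if (v.missingCount > 0) return { ready: false, reason: `${v.missingCount} missing` };
  if (v.pendingCount > 0) return { ready: false, reason: `${v.pendingCount} pending` };
  if (v.unverifiedCount > 0) return { ready: false, reason: `${v.unverifiedCount} unverified` };
  const notVerified = v.entries.filter((entry) => entry.state !== "verified");
  if (notVerified.length > 0) {
    return { ready: false, reason: `not verified: ${notVerified.map((e) => e.path).join(", ")}` };
  }
  const limited = v.limitation ?? v.entries.find((e) => e.limitation !== undefined)?.limitation;
  if (limited !== undefined) return { ready: false, reason: `limitation: ${limited}` };
  return { ready: true };
}
```

Returning a reason string instead of a bare boolean makes it easy to surface why a PASS/FAIL report was withheld.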
@@ -89,6 +120,8 @@ Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`
 - exact CLI args passed after `agent-browser`
 - no shell operators
 - do not include the binary name
+- do not include `--json`; the wrapper injects it (see [Wrapper `--json`](#wrapper-json))
+- first-call recipe: `open` → `snapshot -i` → `click` / `fill` with current `@eN` refs from that snapshot → `snapshot -i` again after navigation or DOM changes
 
 Examples:
 
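The `args` rules in the hunk above, together with the `--json` injection described under Wrapper `--json`, can be checked before a call. A sketch under assumptions: `lintCallerArgs` and `planEffectiveArgs` are hypothetical, the exact token set treated as shell operators is a guess (the contract only says "no shell operators"), session injection is left out, and `skills` is recognized only as the first token.

```typescript
// Hypothetical lint for caller `args`. The operator set is an assumption;
// the contract only says "no shell operators".
const SHELL_OPERATORS = new Set(["|", "||", "&&", ";", "&", ">", ">>", "<"]);

function lintCallerArgs(args: readonly string[]): string[] {
  const problems: string[] = [];
  if (args[0] === "agent-browser") problems.push("drop the binary name");
  if (args.includes("--json")) problems.push("drop --json; the wrapper injects it");
  for (const token of args) {
    if (SHELL_OPERATORS.has(token)) problems.push(`shell operator: ${token}`);
  }
  return problems;
}

// Rough model of effectiveArgs planning: prepend --json except for plain-text
// inspection and read-only `skills list|get|path`. Session injection is omitted.
function planEffectiveArgs(args: readonly string[]): string[] {
  const plainText =
    args.includes("--help") ||
    args.includes("--version") ||
    (args[0] === "skills" && ["list", "get", "path"].includes(args[1] ?? ""));
  return plainText ? [...args] : ["--json", ...args];
}

// Usage: a pasted shell line yields three problems; a clean call gets --json prepended.
console.log(lintCallerArgs(["agent-browser", "--json", "snapshot", "-i", "|", "head"]));
console.log(planEffectiveArgs(["snapshot", "-i"]));
```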
@@ -126,13 +159,13 @@ Compilation (then `--json` and session handling apply like any other call):
 
 When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
 
-If a raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, the wrapper may run one fresh session-scoped `snapshot -i` and add visible `Current snapshot ref fallback
+If a raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, the wrapper may run one fresh session-scoped `snapshot -i` and add visible `Current snapshot ref fallback` plus `details.visibleRefFallback` when that snapshot contains exact role/name matches for the failed target. Non-fill matches can also add `try-current-visible-ref` / `try-current-visible-ref-N` next actions. The matcher is bounded to current snapshot refs and exact normalized role/name matches: role locators require `--name`, text-click falls back only to exact-name `button`/`link` refs, label-fill to exact-name `textbox`, and placeholder-fill to exact-name `searchbox`/`textbox`. It never fuzzy-matches names such as prefixes; when several exact refs match, each action carries safety copy telling agents to inspect the snapshot and choose only if unambiguous. For `fill` matches, `visibleRefFallback.candidates[].args` and `visibleRefFallback.target.text` are omitted so recovery details do not repeat the fill text.
 
-If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content can also include an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for
+If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content can also include an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends click candidates for `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`. Fill misses no longer emit `find … fill <text>` retry actions because those would repeat potentially sensitive text. Instead, when the same selector-miss snapshot finds exact current editable refs (`searchbox` or `textbox`), the wrapper emits `details.richInputRecovery`, visible `Rich input recovery`, and `focus-current-editable-ref` / `click-current-editable-ref` (numbered when ambiguous) next actions. Those actions carry only focus/click argv for the candidate ref; they do not copy fill text, press `Enter`, or submit. Use `keyboard inserttext` or `keyboard type` with the intended text only after focusing the right current ref, and submit only when the user flow explicitly calls for it. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
 
-If a compiled `semanticAction` `find` action fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/
+If a compiled `semanticAction` `find` action fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/action-recommendations.ts` (re-exported from `shared.ts`; see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; `select` shorthands with stale `@e…` selectors and direct stale `@e…` commands still return refresh guidance instead of an unsafe blind retry.
 
-For direct page-scoped `@e…` refs, successful `snapshot` results record `details.refSnapshot` with the latest ref ids and page target for the session. Before mutation-prone ref commands such as `click`, `fill`, `check`, `select`, `download`, drag/upload/keyboard-style actions, or equivalent batch steps run, the wrapper rejects refs from an older page target
+For direct page-scoped `@e…` refs, successful `snapshot` results record `details.refSnapshot` with the latest ref ids and page target for the session. A failed session `snapshot` whose upstream error says `No active page` clears that session’s prior ref snapshot and records `details.refSnapshotInvalidation.reason: "no-active-page"`; mutation-prone `@e…` preflight then fails with `failureCategory: "stale-ref"` until a later successful `snapshot -i` records fresh refs. Before mutation-prone ref commands such as `click`, `fill`, `check`, `select`, `download`, drag/upload/keyboard-style actions, or equivalent batch steps run, the wrapper rejects refs from an older page target, refs absent from the latest same-page snapshot, or refs from an invalidated snapshot state. This is a best-effort wrapper guard against upstream ref-number recycling after navigation; it does not prove the DOM stayed unchanged after the snapshot. Refresh with the session-aware `refresh-interactive-refs` next action before retrying.
 
 Examples:
 
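The visible-ref fallback matcher described in the hunk above is bounded and exact. The sketch below mirrors the stated rules (role locators need a name; text-click maps to `button`/`link`; label-fill to `textbox`; placeholder-fill to `searchbox`/`textbox`; no prefix matching). The `SnapshotRef` and `Locator` types, the function names, and the normalization (trim plus whitespace collapse, case preserved) are assumptions; the wrapper's own normalization is not shown in this diff.

```typescript
// Hypothetical shapes; the wrapper's real snapshot-ref and locator types are not in this diff.
interface SnapshotRef {
  ref: string; // e.g. "@e3"
  role: string;
  name: string;
}

type Locator =
  | { kind: "role"; role: string; name?: string }
  | { kind: "text" | "label" | "placeholder"; value: string };

type FallbackAction = "click" | "fill";

// Assumed normalization: trim and collapse internal whitespace, keep case.
const normalize = (s: string): string => s.trim().replace(/\s+/g, " ");

// Which snapshot roles a locator/action pair may fall back to, per the rules above.
function fallbackRoles(locator: Locator, action: FallbackAction): string[] {
  switch (locator.kind) {
    case "role":
      return locator.name === undefined ? [] : [locator.role]; // role needs --name
    case "text":
      return action === "click" ? ["button", "link"] : [];
    case "label":
      return action === "fill" ? ["textbox"] : [];
    case "placeholder":
      return action === "fill" ? ["searchbox", "textbox"] : [];
    default:
      return [];
  }
}

// Returns only exact normalized role/name matches; prefixes never match.
function visibleRefCandidates(
  refs: readonly SnapshotRef[],
  locator: Locator,
  action: FallbackAction,
): SnapshotRef[] {
  const roles = fallbackRoles(locator, action);
  const target = locator.kind === "role" ? locator.name : locator.value;
  if (roles.length === 0 || target === undefined) return [];
  const wanted = normalize(target);
  return refs.filter((r) => roles.includes(r.role) && normalize(r.name) === wanted);
}
```

When more than one ref comes back, the text says each action carries safety copy; a caller should still inspect the snapshot rather than pick the first match.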
@@ -166,7 +199,9 @@ Examples:
 - `waitForDownload` with `path` (compiled as `wait --download <path>`)
 - `screenshot` with `path`
 
-
+**Navigation assertions are explicit only.** `job` never treats a successful `click` (or a `select` / submit-style interaction that may navigate) as proof that the expected next page loaded. Top-level `click` may still surface optional `details.navigationSummary` or `pageChangeSummary` hints for operators, but compiled `job` / `batch` steps do **not** auto-insert `assertUrl` or `assertText` after clicks—there is no deterministic expected URL source without caller intent. After any navigation-prone step (link/submit clicks, checkout or form flows, tab-sensitive UI), add an explicit `assertUrl` with the destination pattern you expect, `assertText` for on-page copy, or both, **before** screenshots or steps that assume the new page state.
+
+Example (static landing page):
 
 ```json
 {
@@ -180,12 +215,29 @@ Example:
 }
 ```
 
-
+Example (open → fill → click → assert destination → screenshot):
+
+```json
+{
+  "job": {
+    "steps": [
+      { "action": "open", "url": "https://shop.example/checkout" },
+      { "action": "fill", "selector": "#email", "text": "user@example.com" },
+      { "action": "click", "selector": "#continue" },
+      { "action": "assertUrl", "url": "**/shipping" },
+      { "action": "assertText", "text": "Shipping address" },
+      { "action": "screenshot", "path": ".dogfood/shipping.png" }
+    ]
+  }
+}
+```
+
+Compiled shape for the navigation example:
 
 ```json
 {
   "args": ["batch"],
-  "stdin": "[[\"open\",\"https://example.com\"],[\"wait\",\"--text\",\"
+  "stdin": "[[\"open\",\"https://shop.example/checkout\"],[\"fill\",\"#email\",\"user@example.com\"],[\"click\",\"#continue\"],[\"wait\",\"--url\",\"**/shipping\"],[\"wait\",\"--text\",\"Shipping address\"],[\"screenshot\",\".dogfood/shipping.png\"]]"
 }
 ```
 
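The navigation example compiles step by step into the `batch` stdin shown in its compiled shape. A minimal sketch of that mapping, covering only the step kinds that appear in these examples (including `waitForDownload` from the step list earlier in the diff). `JobStep`, `compileStep`, and `compileJob` are hypothetical names, and real `job` validation, redaction, and session handling are omitted. Note that nothing is inserted after `click`: navigation checks appear only where the caller wrote `assertUrl` or `assertText`.

```typescript
// Hypothetical step union covering only the step kinds used in these examples.
type JobStep =
  | { action: "open"; url: string }
  | { action: "fill"; selector: string; text: string }
  | { action: "click"; selector: string }
  | { action: "assertUrl"; url: string }
  | { action: "assertText"; text: string }
  | { action: "waitForDownload"; path: string }
  | { action: "screenshot"; path: string };

// One job step -> one batch argv, matching the compiled shapes shown above.
function compileStep(step: JobStep): string[] {
  switch (step.action) {
    case "open":
      return ["open", step.url];
    case "fill":
      return ["fill", step.selector, step.text];
    case "click":
      return ["click", step.selector]; // no implicit assertUrl/assertText afterwards
    case "assertUrl":
      return ["wait", "--url", step.url];
    case "assertText":
      return ["wait", "--text", step.text];
    case "waitForDownload":
      return ["wait", "--download", step.path];
    case "screenshot":
      return ["screenshot", step.path];
    default: {
      const unreachable: never = step;
      throw new Error(`unsupported step: ${JSON.stringify(unreachable)}`);
    }
  }
}

// A whole job becomes a single `batch` call whose stdin is the JSON argv list.
function compileJob(steps: readonly JobStep[]): { args: string[]; stdin: string } {
  return { args: ["batch"], stdin: JSON.stringify(steps.map(compileStep)) };
}
```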
@@ -200,13 +252,14 @@ Because `job` still executes as upstream `batch` with generated stdin, the same
 - type: object with either required `url` (normal URL-opening QA) or `attached: true` (current attached-session QA)
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, `networkSourceLookup`, and `electron`
 - lightweight preset built on the same batch compiler path as `job`
-- URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <loadState
-- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`.
+- URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <state>` using the resolved `loadState`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
+- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`. Before spawning the diagnostic batch, the wrapper preflights the attached session: `get url` must succeed and return an `http:` or `https:` page URL. Missing URLs, read failures, and non-http(s) surfaces fail fast with `failureCategory: "validation-error"`, `details.validationError`, and recovery `nextActions` such as `list-tabs-before-qa-attached` and `snapshot-before-qa-attached` instead of running the full QA batch.
 - `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
 - `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
 - optional `screenshotPath` adds an evidence screenshot step
 - reports `details.compiledQaPreset` with the compiled batch plan and resolved `loadState`, plus `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
--
+- on success with no failed checks, model-visible prose collapses to a compact pass summary (current page URL/title when known, checks run, optional screenshot path plus artifact verification, pointer to `details.qaPreset` and `details.batchSteps` for the full matrix). Failed QA and QA reclassified to `qa-failure` keep the verbose per-step batch output.
+- fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/network.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
|
|
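The attached-session preflight for `qa.attached` can be read as a pure check over the result of one `get url` read. The sketch below is hypothetical and is not the wrapper's code: `GetUrlRead`, `QaAttachedPreflightFailure`, and `preflightAttachedQa` are invented names, the message strings are illustrative, and the two recovery ids are the examples this contract lists, not necessarily the full set.

```typescript
// Hypothetical shape of a completed `get url` read against the attached session.
interface GetUrlRead {
  ok: boolean; // false when the read itself failed
  url?: string;
}

// Hypothetical failure payload mirroring the documented fields.
interface QaAttachedPreflightFailure {
  failureCategory: "validation-error";
  validationError: string;
  nextActionIds: string[];
}

function preflightFailure(validationError: string): QaAttachedPreflightFailure {
  return {
    failureCategory: "validation-error",
    validationError,
    // Recovery ids named in the contract; the real list may contain more.
    nextActionIds: ["list-tabs-before-qa-attached", "snapshot-before-qa-attached"],
  };
}

// Returns the URL scheme (e.g. "https:") or undefined when the string is not a URL.
function protocolOf(url: string): string | undefined {
  try {
    return new URL(url).protocol;
  } catch {
    return undefined;
  }
}

// Returns undefined when the full QA batch may run, or a failure to report instead.
function preflightAttachedQa(read: GetUrlRead): QaAttachedPreflightFailure | undefined {
  if (!read.ok) return preflightFailure("get url failed for the attached session");
  if (!read.url) return preflightFailure("attached session reported no page URL");
  const protocol = protocolOf(read.url);
  if (protocol !== "http:" && protocol !== "https:") {
    return preflightFailure(`qa.attached needs an http(s) page, got ${protocol ?? "an unparseable URL"}`);
  }
  return undefined;
}
```

Failing before the batch is spawned keeps a QA run against a DevTools or `about:blank` surface from producing a long, misleading diagnostic matrix.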
210
263
|
|
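The benign-asset rule in the last `qa` bullet combines three independent conditions: the row is already failed, its basename looks like a site icon, and the failure signal plus resource type are benign. A minimal sketch under stated assumptions: `NetworkRow`, `isFailedRow`, and `isBenignIconFailure` are hypothetical names, not the package's `classifyNetworkRequestFailure` / `isBenignAssetFailure`, and the regex is one reading of the icon-basename heuristic (case-sensitive by assumption).

```typescript
// Hypothetical subset of an upstream `network requests` row.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
  resourceType?: string;
  mimeType?: string;
}

// A row counts as failed when status >= 400, failed: true, or error is a string.
function isFailedRow(row: NetworkRow): boolean {
  return (
    (typeof row.status === "number" && row.status >= 400) ||
    row.failed === true ||
    typeof row.error === "string"
  );
}

// favicon + .ico/.png/.svg, or apple-touch-icon + .png, each with an optional [-.\w]* stem suffix.
const ICON_BASENAME = /^(?:favicon[-.\w]*\.(?:ico|png|svg)|apple-touch-icon[-.\w]*\.png)$/;

function lastPathSegment(url: string): string {
  let path: string;
  try {
    path = new URL(url).pathname;
  } catch {
    path = url.split(/[?#]/, 1)[0] ?? url; // assumption: fall back to the raw string minus query/fragment
  }
  const segments = path.split("/");
  return segments[segments.length - 1] ?? "";
}

// resourceType / mimeType (whichever is present) must be absent or image-like.
function looksImageLike(row: NetworkRow): boolean {
  const kind = row.resourceType ?? row.mimeType;
  if (kind === undefined) return true;
  return kind === "image" || kind === "img" || kind === "other" || kind.startsWith("image/");
}

function isBenignIconFailure(row: NetworkRow): boolean {
  if (!isFailedRow(row)) return false;
  if (!ICON_BASENAME.test(lastPathSegment(row.url))) return false;
  // A status-only failure such as 500 (no failed flag, no string error) stays actionable.
  const benignSignal = row.status === 404 || row.failed === true || typeof row.error === "string";
  return benignSignal && looksImageLike(row);
}
```

For example, a `404` on `/favicon.ico` would be counted as a warning, while a `500` on the same path would remain an actionable failure.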
|
211
264
|
Example:
|
|
212
265
|
|
|
@@ -357,7 +410,7 @@ For an app you launched manually with remote debugging enabled, skip `electron.c
|
|
|
357
410
|
|
|
358
411
|
- type: object with at least one of `selector`, `reactFiberId`, or `componentName`
|
|
359
412
|
- optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `networkSourceLookup`, and `electron`
|
|
360
|
-
-
|
|
413
|
+
- **EXPERIMENTAL — candidates only:** opt-in helper for local app debugging; it reports candidate source locations with confidence and evidence instead of claiming a guaranteed DOM-to-file mapping. Do not treat output as authoritative file ownership or edit targets without verification.
|
|
361
414
|
- compiles to existing upstream `batch` commands only:
|
|
362
415
|
- `selector` adds `is visible <selector>` and, unless `includeDomHints: false`, adds `get html <selector>` for source-like DOM attributes (`data-source-file`, `data-file`, `data-component-file`, `data-source`, plus optional `data-source-line` / `data-line` and `data-source-column` / `data-column`) and for `.ts`/`.tsx`/`.js`/`.jsx` paths embedded in HTML text
|
|
363
416
|
- `reactFiberId` runs `react inspect <id>`; this requires the page to have been launched with `--enable react-devtools` before first navigation and for the app build to expose source information
|
|
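To make the `sourceLookup` compilation concrete, here is a hypothetical sketch of how the `selector` and `reactFiberId` fields could become batch steps (argv arrays serialized as batch stdin, the same shape as the `job` example earlier). It assumes the steps appear in the order the bullets list them and leaves out `componentName`, whose compilation is not described here; `compileSourceLookupSteps` is not a wrapper export.

```typescript
// Hypothetical input subset; componentName is intentionally omitted.
interface SourceLookupInput {
  selector?: string;
  reactFiberId?: string;
  includeDomHints?: boolean; // treated as true unless explicitly false
}

function compileSourceLookupSteps(input: SourceLookupInput): string[][] {
  const steps: string[][] = [];
  if (input.selector !== undefined) {
    steps.push(["is", "visible", input.selector]);
    // DOM hints come from `get html`, scanned later for data-source-file and similar attributes.
    if (input.includeDomHints !== false) {
      steps.push(["get", "html", input.selector]);
    }
  }
  if (input.reactFiberId !== undefined) {
    // Only useful when the page was launched with `--enable react-devtools`.
    steps.push(["react", "inspect", input.reactFiberId]);
  }
  return steps;
}

// Batch stdin is a JSON array of argv arrays.
const sourceLookupStdin = JSON.stringify(compileSourceLookupSteps({ selector: "#checkout" }));
```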
@@ -382,7 +435,7 @@ Use raw `args` for direct upstream React inspection when you already know the ex
|
|
|
382
435
|
|
|
383
436
|
- type: object with at least one of `requestId`, `filter`, or `url`, plus optional `maxWorkspaceFiles`
|
|
384
437
|
- optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `electron`
|
|
385
|
-
-
|
|
438
|
+
- **EXPERIMENTAL — candidates only:** failed-request source-hint helper; it reports failed network requests and candidate source hints with evidence instead of assigning blame or proving root cause
|
|
386
439
|
- compiles to existing upstream `batch` commands only: `network request <requestId>` when provided plus `network requests` with `--filter <filter-or-url>` when a filter or URL is provided (if both are set, `filter` wins; when only `url` is set, it becomes the `--filter` argument); optional `session` prepends `--session <name>` before that generated `batch`
|
|
387
440
|
- detects failed requests from `status >= 400`, `failed: true`, or an `error` field
|
|
388
441
|
- candidate sources come from source-like initiator/stack metadata in upstream network results and bounded local workspace search for URL/path literals under the Pi session cwd
|
|
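The `networkSourceLookup` compilation rules (request id first, `filter` winning over `url`, optional `--session` before `batch`) fit in a few lines. This is a hypothetical sketch: `compileNetworkSourceLookup` is an invented name, and it assumes `network requests` is always appended with only the `--filter` pair conditional, which is one reading of the wording above.

```typescript
// Hypothetical input; at least one of requestId, filter, or url is expected.
interface NetworkSourceLookupInput {
  requestId?: string;
  filter?: string;
  url?: string;
  session?: string;
}

interface CompiledInvocation {
  args: string[]; // argv for the wrapper-generated upstream call
  stdin: string; // batch stdin: JSON array of argv arrays
}

function compileNetworkSourceLookup(input: NetworkSourceLookupInput): CompiledInvocation {
  const steps: string[][] = [];
  if (input.requestId !== undefined) {
    steps.push(["network", "request", input.requestId]);
  }
  // filter wins when both are set; a lone url becomes the --filter argument.
  const filter = input.filter ?? input.url;
  // Assumption: `network requests` is always listed; only --filter is conditional.
  steps.push(filter !== undefined ? ["network", "requests", "--filter", filter] : ["network", "requests"]);
  const args = input.session !== undefined ? ["--session", input.session, "batch"] : ["batch"];
  return { args, stdin: JSON.stringify(steps) };
}
```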
@@ -441,6 +494,8 @@ Recommended use:
|
|
|
441
494
|
|
|
442
495
|
## Wrapper behavior
|
|
443
496
|
|
|
497
|
+
Caller `args` should omit `--json`; the wrapper prepends it for normal execution so `details` and presentation stay structured. See [Wrapper `--json`](#wrapper-json).
|
|
498
|
+
|
|
444
499
|
The extension should:
|
|
445
500
|
- inject `--json`
|
|
446
501
|
- invoke `agent-browser` directly, not through a shell
|
|
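The first two wrapper duties (inject `--json`, spawn without a shell) can be illustrated with Node's `child_process.spawn`. This is a minimal sketch, not the extension's runtime: `buildInvocationArgs` and `runAgentBrowser` are hypothetical, and skipping the prepend when the caller already passed `--json` is an assumption based on the caller-requested `--json` case described later in this contract.

```typescript
import { spawn } from "node:child_process";

// Prepend --json unless the caller already asked for it (assumption: no duplicate flag).
function buildInvocationArgs(callerArgs: readonly string[]): string[] {
  return callerArgs.includes("--json") ? [...callerArgs] : ["--json", ...callerArgs];
}

// Spawn the binary directly: argv tokens are passed as-is and never re-parsed by a shell.
function runAgentBrowser(
  callerArgs: readonly string[],
  stdin?: string,
): Promise<{ code: number | null; stdout: string; stderr: string }> {
  return new Promise((resolve, reject) => {
    const child = spawn("agent-browser", buildInvocationArgs(callerArgs), { shell: false });
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr.on("data", (chunk: string) => {
      stderr += chunk;
    });
    child.on("error", reject); // e.g. binary not found on PATH
    child.on("close", (code) => resolve({ code, stdout, stderr }));
    if (stdin !== undefined) child.stdin.write(stdin); // batch steps travel on stdin
    child.stdin.end();
  });
}
```

Avoiding the shell matters because page text, selectors, and fill values routinely contain quotes, `$`, and spaces.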
@@ -471,11 +526,11 @@ Primary content should be:
|
|
|
471
526
|
- an image attachment when relevant
|
|
472
527
|
- browser-aware compacting for oversized snapshots so the model gets a concise actionable view before raw page noise
|
|
473
528
|
- compact snapshots should be main-content-first: prefer the primary content block and nearby sections over top-of-page chrome, ads, or unrelated sidebars when those can be distinguished from the snapshot tree
|
|
474
|
-
- when compacting hides actionable controls, snapshot output should add an `Omitted high-value controls` section for bounded
|
|
529
|
+
- when compacting hides actionable controls, snapshot output should add an `Omitted high-value controls` section for bounded editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and other useful controls such as checkboxes, radios, options, and menuitems that were not already shown in key refs
|
|
475
530
|
|
|
476
531
|
Examples:
|
|
477
532
|
- small `snapshot` results should include the actual snapshot text
|
|
478
|
-
- oversized `snapshot` results should switch to a compact view that preserves the primary content, nearby sections, a trimmed set of high-value refs, and a separate bounded list of omitted high-value controls when dense pages would otherwise hide
|
|
533
|
+
- oversized `snapshot` results should switch to a compact view that preserves the primary content, nearby sections, a trimmed set of high-value refs, and a separate bounded list of omitted high-value controls when dense pages or desktop host screens would otherwise hide editable inputs, named surfaces/tabs, or primary action buttons, while exposing the full raw snapshot path directly in the rendered tool text and via `details.fullOutputPath`
|
|
479
534
|
- successful navigation actions like `click`, `back`, `forward`, and `reload` should include a lightweight post-action title/url summary when the wrapper can address the active session
|
|
480
535
|
- `tab list` should include a readable tab summary
|
|
481
536
|
- `screenshot` should include the saved-path summary plus the inline image attachment when available
|
|
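The `Omitted high-value controls` section boils down to filtering snapshot refs by role, dropping refs already shown, and bounding the list. A hypothetical sketch: `SnapshotRef`, `selectOmittedHighValueControls`, the numeric priorities, the default limit of 8, and treating "primary action buttons" as named buttons are all assumptions, not the wrapper's compaction logic.

```typescript
// Hypothetical snapshot ref entry; real snapshot refs may carry more fields.
interface SnapshotRef {
  role: string;
  name?: string;
}

// Lower number = surfaced first. Role grouping follows the contract; the ranking is assumed.
const ROLE_PRIORITY = new Map<string, number>([
  ["textbox", 0],
  ["searchbox", 0],
  ["combobox", 0],
  ["tab", 1],
  ["button", 2],
  ["checkbox", 3],
  ["radio", 3],
  ["option", 3],
  ["menuitem", 3],
]);

// Tabs and buttons only count when they carry an accessible name.
const NAME_REQUIRED = new Set(["tab", "button"]);

function selectOmittedHighValueControls(
  refs: Record<string, SnapshotRef>,
  shownRefs: ReadonlySet<string>,
  limit = 8,
): string[] {
  return Object.entries(refs)
    .filter(([ref, node]) => !shownRefs.has(ref) && ROLE_PRIORITY.has(node.role))
    .filter(([, node]) => !NAME_REQUIRED.has(node.role) || Boolean(node.name))
    .sort(([, a], [, b]) => (ROLE_PRIORITY.get(a.role) ?? 99) - (ROLE_PRIORITY.get(b.role) ?? 99))
    .slice(0, limit) // bounded so dense pages cannot flood the compact view
    .map(([ref, node]) => `@${ref} ${node.role}${node.name ? ` "${node.name}"` : ""}`);
}
```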
@@ -513,6 +568,8 @@ Stable category fields are part of the machine-readable contract:
|
|
|
513
568
|
|
|
514
569
|
These categories are intentionally bounded and stable so agents can branch on them instead of parsing prose. They do not replace raw diagnostics: `details.error`, `details.stderr`, `details.parseError`, `details.validationError`, and visible content still preserve the specific upstream or wrapper message after normal redaction.
|
|
515
570
|
|
|
571
|
+
Real Pi custom tools only mark a tool result failed when the tool throws during `execute`; returned `isError` fields are not authoritative. The extension therefore also registers a `tool_result` handler that treats any `agent_browser` result with `details.resultCategory: "failure"` as a real Pi tool error. For normal prose output, it appends `Result category: failure; failureCategory: …; Pi tool isError: true.` to model-visible text. For caller-requested `--json` output, it only patches `isError` and preserves the visible JSON string unchanged so the content remains parseable. The hook treats `--json` as requested when echoed `details.args` or the original tool `input.args` includes that flag; it skips appending the prose notice only when a text content item is non-empty parseable JSON (so mixed or invalid JSON bodies still get the visible line). Implementation: `buildAgentBrowserToolResultPatch` in `extensions/agent-browser/index.ts`. This keeps Pi transcript semantics aligned with the machine-readable result contract, including wrapper-side reclassifications such as `qa-failure` after an upstream-successful batch.
|
|
572
|
+
|
|
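The `tool_result` patch rule reduces to a small pure function. This hypothetical sketch is not `buildAgentBrowserToolResultPatch`. `ToolResultLike`, `patchFailureResult`, appending the notice as a separate text item, the `"unknown"` fallback, and reading "a text content item" as *any* text item are assumptions.

```typescript
// Simplified Pi-style content items (assumption: text and image only).
type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResultLike {
  content: ContentItem[];
  details?: { resultCategory?: string; failureCategory?: string; args?: string[] };
  isError?: boolean;
}

function isNonEmptyJson(text: string): boolean {
  if (text.trim() === "") return false;
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

function patchFailureResult(result: ToolResultLike, inputArgs: readonly string[] = []): ToolResultLike {
  const details = result.details;
  if (!details || details.resultCategory !== "failure") return result; // successes pass through
  const jsonRequested = (details.args ?? []).includes("--json") || inputArgs.includes("--json");
  const hasJsonBody = result.content.some((item) => item.type === "text" && isNonEmptyJson(item.text));
  if (jsonRequested && hasJsonBody) {
    return { ...result, isError: true }; // keep the visible JSON parseable
  }
  const notice = `Result category: failure; failureCategory: ${details.failureCategory ?? "unknown"}; Pi tool isError: true.`;
  return { ...result, isError: true, content: [...result.content, { type: "text", text: notice }] };
}
```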
516
573
|
For `batch`, top-level `details` still carries `resultCategory` plus `successCategory` or `failureCategory` for the **aggregate** tool outcome: if any step fails, the overall result is a failure (`resultCategory: "failure"`) even when later steps succeed—inspect `batchSteps[]` for per-step outcomes. Each `batchSteps[]` entry includes its own `resultCategory` and either `successCategory` or `failureCategory` for that step. `batchFailure.failedStep` duplicates the first failing step’s details, including its `failureCategory` and any `nextActions`.
|
|
517
574
|
|
|
518
575
|
Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbatim replay of raw upstream batch JSON): each element is `{ success, command, result? | error? }` where `command` is argv-redacted the same way as echoed invocation args (including `cookies set` cookie values, `storage local|session set` values, and other sensitive flags/positionals), `result` is the presentation-layer data for that step after the same structured redaction as non-batch commands, and `error` is failure text with cookie/storage/password literals stripped when those values appeared in argv. Prefer `batchSteps[]` for full per-step `details` (artifacts, categories, spill paths); use the roll-up when you only need a redacted matrix of what ran.
|
|
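The aggregate rule for `batch` (any failing step fails the whole result, and the first failing step is duplicated) can be sketched as follows. `BatchStepOutcome`, `BatchAggregateSketch`, and `aggregateBatchOutcome` are hypothetical names covering only the fields discussed above.

```typescript
interface BatchStepOutcome {
  command: string[];
  resultCategory: "success" | "failure";
  successCategory?: string;
  failureCategory?: string;
}

interface BatchAggregateSketch {
  resultCategory: "success" | "failure";
  failedStep?: BatchStepOutcome; // stands in for batchFailure.failedStep
}

function aggregateBatchOutcome(steps: readonly BatchStepOutcome[]): BatchAggregateSketch {
  // One failing step makes the whole tool result a failure, even if later steps succeeded.
  const failedStep = steps.find((step) => step.resultCategory === "failure");
  return failedStep === undefined ? { resultCategory: "success" } : { resultCategory: "failure", failedStep };
}
```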
@@ -524,17 +581,17 @@ Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
|
|
|
524
581
|
- **URL alignment:** `refSnapshot.target.url` and the session’s current tab URL are compared via `targetsMatch` / `normalizeComparableUrl` in `extensions/agent-browser/index.ts`: values are trimmed, parsed as URLs when possible, compared **after dropping the `#fragment`**, and the query string remains significant. If either side lacks a `url`, `targetsMatch` treats the pair as matching so early-session calls are not blocked.
|
|
525
582
|
- **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token is in `REF_INVALIDATING_BATCH_COMMANDS` sets a latch that blocks later steps whose first token is in `REF_GUARDED_COMMANDS` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). The invalidating set includes navigation/mutation verbs such as `open`, `goto`, `reload`, `click`, and related upstream commands; same-snapshot `fill` rows stay guarded but do not set the latch, allowing ordinary form-fill batches before a click/submit step. The guarded set contains the commands that accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview`, and others enumerated next to those literals in source). Changing either set requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
|
|
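The URL-alignment rule can be sketched with the WHATWG `URL` class: clearing `hash` drops the fragment while the query string stays part of the comparison. `normalizeComparableUrlSketch` and `targetsMatchSketch` are hypothetical stand-ins for `normalizeComparableUrl` / `targetsMatch`; the handling of strings that do not parse as URLs is an assumption.

```typescript
function normalizeComparableUrlSketch(value: string): string {
  const trimmed = value.trim();
  try {
    const url = new URL(trimmed);
    url.hash = ""; // drop the #fragment; the query string stays significant
    return url.href;
  } catch {
    // Assumption: non-URL strings are compared as trimmed text minus any fragment.
    const hashIndex = trimmed.indexOf("#");
    return hashIndex === -1 ? trimmed : trimmed.slice(0, hashIndex);
  }
}

function targetsMatchSketch(snapshotUrl?: string, currentUrl?: string): boolean {
  // A missing URL on either side counts as a match so early-session calls are not blocked.
  if (!snapshotUrl || !currentUrl) return true;
  return normalizeComparableUrlSketch(snapshotUrl) === normalizeComparableUrlSketch(currentUrl);
}
```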
526
583
|
|
|
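The batch-ordering latch is a single left-to-right scan. In this hypothetical sketch the command sets are deliberately partial stand-ins for `REF_INVALIDATING_BATCH_COMMANDS` and `REF_GUARDED_COMMANDS`, the `@e\d+` pattern is an assumed reading of "`@e…` refs", and `findBlockedRefSteps` reports every blocked index rather than stopping at the first.

```typescript
// Partial, illustrative command sets; the real sets are larger.
const INVALIDATING = new Set(["open", "goto", "reload", "click"]);
const GUARDED = new Set(["click", "fill", "download", "scrollintoview"]);
const REF_TOKEN = /@e\d+/;

function findBlockedRefSteps(steps: readonly (readonly string[])[]): number[] {
  const blocked: number[] = [];
  let latched = false;
  steps.forEach((step, index) => {
    const command = step[0];
    if (command === undefined) return;
    // Check before updating the latch: a step is judged by what ran before it.
    if (latched && GUARDED.has(command) && step.some((token) => REF_TOKEN.test(token))) {
      blocked.push(index);
    }
    if (command === "snapshot") {
      latched = false; // pre-spawn intent only; upstream success is not awaited
    } else if (INVALIDATING.has(command)) {
      latched = true; // fill is guarded but never sets the latch
    }
  });
  return blocked;
}
```

Under these assumptions a fill-fill-click form batch passes, while any ref-bearing step after a navigation or click needs an intervening `snapshot`.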
527
|
-
**Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
|
|
584
|
+
**Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation/diagnostics.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
|
|
528
585
|
|
|
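The `cookies` / `storage` branch of presentation redaction is a recursive walk that replaces any key matching `value` case-insensitively. A minimal sketch; `Json` and `redactValueKeys` are hypothetical names, and the broader `redactStructuredPresentationValue` heuristics are not reproduced.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function redactValueKeys(data: Json): Json {
  if (Array.isArray(data)) return data.map((item) => redactValueKeys(item));
  if (data !== null && typeof data === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(data)) {
      // Any key spelled "value" in any case is masked; other keys are walked recursively.
      out[key] = key.toLowerCase() === "value" ? "[REDACTED]" : redactValueKeys(child);
    }
    return out;
  }
  return data; // primitives under non-value keys are kept
}
```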
529
|
-
`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`, optional `electron`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: Electron launches → wrapper-tracked `electron.status` / `electron.probe` / `electron.cleanup` actions plus session-scoped tab/snapshot inspection when attached; Electron status/probe mismatch diagnostics → `reattach-electron-launch` plus fresh tab/snapshot inspection; Electron post-command health failures → status/probe/cleanup for the same `launchId`; Electron fill verification mismatches → `inspect-after-fill-verification` and `verify-filled-value`; Electron same-URL ref freshness warnings → `refresh-electron-refs-after-rerender`; packaged-Electron `sourceLookup` no-candidate diagnostics → session snapshot, launch probe, and tab list; Electron cleanup partial failures → status plus retry-cleanup for the same wrapper-owned `launchId`; `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N
|
|
586
|
+
`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`, optional `electron`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Tab/session recovery id strings are centralized in `AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS`, while rich-input focus/click recovery ids are centralized in `AGENT_BROWSER_RICH_INPUT_RECOVERY_NEXT_ACTION_IDS` plus `getAgentBrowserRichInputRecoveryNextActionId(s)` in `extensions/agent-browser/lib/results/recovery-actions.ts` (both registries are also re-exported from `shared.ts`); docs and tests mirror those registries/helpers rather than inventing recovery ids in prose. Current recommendations include: raw `connect` success → session-scoped `list-connected-session-tabs` only, then the agent should inspect/select a stable `tab t<N>` target and run `snapshot -i` explicitly; `snapshot` failures whose upstream error says `No active page` and whose wrapper result has a known session → `list-tabs-after-no-active-page` only, because this path has no wrapper-observed safe tab id to select atomically; Electron launches → wrapper-tracked `electron.status` / `electron.probe` / `electron.cleanup` actions plus session-scoped tab/snapshot inspection when attached; Electron status/probe mismatch diagnostics → `reattach-electron-launch` plus fresh tab/snapshot inspection; Electron post-command health failures → status/probe/cleanup for the same `launchId`; Electron fill verification mismatches → `inspect-after-fill-verification` and `verify-filled-value`; Electron same-URL ref freshness warnings → `refresh-electron-refs-after-rerender`; packaged-Electron `sourceLookup` no-candidate diagnostics → session snapshot, launch probe, and tab list; Electron cleanup partial failures 
→ status plus retry-cleanup for the same wrapper-owned `launchId`; `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N` for non-fill targets; semantic `fill` selector misses with exact current editable refs → `focus-current-editable-ref` / `click-current-editable-ref` or numbered variants that do not include fill text or submit; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-button-name-candidate` or `try-link-name-candidate` after presentation `nextActions` only for the bounded click pair enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL non-Electron top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append 
`inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; generic tab drift → `list-tabs-for-recovery` with `tab list` first, then select or confirm the stable target before running `snapshot -i`; about:blank or tab-drift recovery with a wrapper-known target → `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, plus `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the wrapper already observed the stable `t<N>` tab id; `wait --text` assertion failures → `inspect-after-text-assertion-failure` with a read-only snapshot; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
|
|
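A `nextActions` entry and the session-prefix rule used throughout this contract can be sketched as below. The interface is a simplified hypothetical view of `AgentBrowserNextAction` (it omits `networkSourceLookup` and `electron` params), `sessionPrefixArgsSketch` only mirrors the documented `sessionPrefixArgs` behavior, and the `reason` wording is illustrative.

```typescript
interface AgentBrowserNextActionSketch {
  tool: "agent_browser";
  id: string;
  reason: string;
  safety?: string;
  // Either params or artifactPath is present.
  params?: { args: string[]; stdin?: string; sessionMode?: string };
  artifactPath?: string;
}

// Prefix --session <name> unless there is no session or argv already starts with --session.
function sessionPrefixArgsSketch(args: readonly string[], sessionName?: string): string[] {
  if (!sessionName || args[0] === "--session") return [...args];
  return ["--session", sessionName, ...args];
}

function refreshInteractiveRefsAction(sessionName?: string): AgentBrowserNextActionSketch {
  return {
    tool: "agent_browser",
    id: "refresh-interactive-refs",
    reason: "Refs may be stale; take a fresh interactive snapshot.", // illustrative wording
    params: { args: sessionPrefixArgsSketch(["snapshot", "-i"], sessionName) },
  };
}
```

An agent can pass `params` straight back to `agent_browser` instead of reconstructing argv from prose.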
530
587
|
|
|
531
|
-
**Unknown-command getter hints (failure presentation):** `
|
|
588
|
+
**Unknown-command getter hints (failure presentation):** `buildErrorPresentation` in `extensions/agent-browser/lib/results/presentation/errors.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
|
|
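The getter-hint gate can be sketched as a pure planner. `planGetterHint` and `GetterHintPlan` are hypothetical; the sketch assumes the error text is already redacted, and it keeps the `use-get-*` action even when an existing `Agent-browser hint:` suppresses the visible getter block, which the contract does not state either way.

```typescript
const UNKNOWN_COMMAND_PATTERN = /unknown command|unknown subcommand|unrecognized command/i;
const GETTER_SHORTCUTS = new Set(["attr", "count", "html", "text", "title", "url", "value"]);

interface GetterHintPlan {
  showGetterHint: boolean;
  nextAction?: { id: string; args: string[] };
}

function planGetterHint(errorText: string, primaryCommand: string, sessionName?: string): GetterHintPlan {
  if (!UNKNOWN_COMMAND_PATTERN.test(errorText) || !GETTER_SHORTCUTS.has(primaryCommand)) {
    return { showGetterHint: false };
  }
  // Avoid stacking a second "Agent-browser hint:" header on top of selector recovery text.
  const showGetterHint = !errorText.includes("Agent-browser hint:");
  if (primaryCommand !== "title" && primaryCommand !== "url") {
    return { showGetterHint }; // prose guidance only; no machine action for these tokens
  }
  const args = ["get", primaryCommand];
  return {
    showGetterHint,
    nextAction: {
      id: `use-get-${primaryCommand}`,
      args: sessionName ? ["--session", sessionName, ...args] : args,
    },
  };
}
```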
532
589
|
|
|
533
590
|
For `network requests`, `details.nextActions` is bounded to one selected safe request ID, preferring actionable failed rows, then API/fetch-like rows, then benign failed rows, then the first request with a safe ID. Detail/filter/HAR actions use `params.args` and preserve a known `--session <name>` prefix when the current presentation has `details.sessionName`; source-candidate actions use `params.networkSourceLookup` with the selected `requestId` plus `session` when known and are only emitted for actionable failed rows that the failed-request analyzer can correlate. URLs and query strings are not copied into action params; path filters are skipped when they look sensitive or too large.
|
|
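The request-selection preference for `network requests` follow-ups is a short fallback chain. In this hypothetical sketch the row flags are assumed to be precomputed by the failure classifier, and the `SAFE_REQUEST_ID` pattern is an invented stand-in for the wrapper's safe-id check.

```typescript
interface RequestRowSketch {
  id?: string;
  actionableFailure: boolean;
  benignFailure: boolean;
  apiLike: boolean; // e.g. fetch/xhr rows
}

// Hypothetical safe-id rule: short tokens of letters, digits, dot, dash, underscore.
const SAFE_REQUEST_ID = /^[A-Za-z0-9._-]{1,64}$/;

function selectFollowUpRequestId(rows: readonly RequestRowSketch[]): string | undefined {
  const safe = rows.filter((row) => row.id !== undefined && SAFE_REQUEST_ID.test(row.id));
  // Actionable failures, then API-like rows, then benign failures, then the first safe row.
  const chosen =
    safe.find((row) => row.actionableFailure) ??
    safe.find((row) => row.apiLike) ??
    safe.find((row) => row.benignFailure) ??
    safe[0];
  return chosen?.id;
}
```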
534
591
|
|
|
535
592
|
For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
|
|
536
593
|
|
|
537
|
-
`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
|
|
594
|
+
`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation/navigation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
|
|
538
595
|
|
|
539
596
|
`overlayBlockers` may appear after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. 
Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
|
|
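The overlay-candidate scan over the extra `snapshot -i` refs can be sketched as below: no strong modal role means no diagnostics, otherwise up to three dismiss-style controls. `findOverlayBlockerCandidates` and its types are hypothetical, and the exact-match `DISMISS_NAME` pattern only covers the example names listed above, not the wrapper's real pattern set.

```typescript
interface SnapshotRefSketch {
  role: string;
  name?: string;
}

interface OverlayCandidateSketch {
  ref: string;
  args: string[]; // exact click argv
}

const MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const DISMISS_ROLES = new Set(["button", "link", "menuitem"]);
// Illustrative dismiss-name pattern built from the documented examples.
const DISMISS_NAME = /^(?:close|dismiss|no thanks|×)$/i;

function findOverlayBlockerCandidates(refs: Record<string, SnapshotRefSketch>): OverlayCandidateSketch[] {
  const entries = Object.entries(refs);
  // Dismiss-like names alone are not enough: a strong modal role must also be present.
  if (!entries.some(([, node]) => MODAL_ROLES.has(node.role))) return [];
  return entries
    .filter(([, node]) => DISMISS_ROLES.has(node.role) && node.name !== undefined && DISMISS_NAME.test(node.name.trim()))
    .slice(0, 3)
    .map(([ref]) => ({ ref: `@${ref}`, args: ["click", `@${ref}`] }));
}
```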
540
597
|
|
|
@@ -552,7 +609,7 @@ Example shape (fields vary by scenario):
|
|
|
552
609
|
]
|
|
553
610
|
```
|
|
554
611
|
|
|
555
|
-
When `semanticAction` produced compiled `find` argv and the unified result is `failureCategory: "stale-ref"` with `details.compiledSemanticAction` still present, `nextActions` chains snapshot refresh then the compiled `find` retry; `select` shorthands with stale `@refs` stop at refresh guidance. `reason` / `safety` strings match `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/
|
|
612
|
+
When `semanticAction` produced compiled `find` argv and the unified result is `failureCategory: "stale-ref"` with `details.compiledSemanticAction` still present, `nextActions` chains snapshot refresh then the compiled `find` retry; `select` shorthands with stale `@refs` stop at refresh guidance. `reason` / `safety` strings match `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/action-recommendations.ts` and the append in `extensions/agent-browser/index.ts`:
```json
"nextActions": [
@@ -586,9 +643,9 @@ When `semanticAction` produced compiled `find` argv and the unified result is `f
Implementation and precedence:
-
-
-
- Artifact verification: `ArtifactVerificationSummary` / `ArtifactVerificationEntry` types live in `
-
- Inner success categories (`classifyAgentBrowserSuccessCategory` in `
+
- Shared machine-readable types are centralized in `extensions/agent-browser/lib/results/contracts.ts` (including re-exports such as `AgentBrowserNextAction` from `next-actions.ts`). Classifiers live in `categories.ts` (`classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails`—the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier). Generic follow-up assembly lives in `action-recommendations.ts` (`buildAgentBrowserNextActions`). Tab/session recovery ids live in `recovery-actions.ts` (`AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS`, `AGENT_BROWSER_RICH_INPUT_RECOVERY_NEXT_ACTION_IDS`, `getAgentBrowserRichInputRecoveryNextActionId`, `getAgentBrowserRichInputRecoveryNextActionIds`, `buildRecoveryNextActions`) and session-aware wrappers live in `recovery-next-actions.ts`. Selector miss and rich-input diagnostic shapes/actions live in `selector-recovery.ts`. `extensions/agent-browser/lib/results/shared.ts` re-exports focused modules for compatibility only. Failed upstream `network requests` rows flow through `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `network.ts` for QA analysis (`analyzeQaPresetResults` in `extensions/agent-browser/index.ts`) and for actionable-vs-benign lines plus request-specific nextActions in `network requests` presentation (`extensions/agent-browser/lib/results/presentation/diagnostics.ts`).
+
- Artifact verification: `ArtifactVerificationSummary` / `ArtifactVerificationEntry` types live in `contracts.ts`. `buildArtifactVerificationSummary`, `getArtifactVerificationEntry`, and `getManifestVerificationEntry` in `presentation/artifacts.ts` merge each resolved file artifact with manifest rows whose `storageScope` is not `explicit-path` (those rows duplicate file artifacts) and whose `path` is in the current result’s spill path set. Successful presentation merges then run `classifyPresentationSuccessCategory` in `presentation/artifacts.ts`, which forces `successCategory: "artifact-unverified"` when `artifactVerification.missingCount` or `artifactVerification.unverifiedCount` is greater than zero before delegating to `classifyAgentBrowserSuccessCategory`.
+
- Inner success categories (`classifyAgentBrowserSuccessCategory` in `categories.ts`, after verification counts are clear): if `inspection` is true → `"inspection"`; else if any non-pending artifact lacks confirmed on-disk presence (`exists !== true`) → `"artifact-unverified"`; else if there is a `savedFile` or any `artifacts` → `"artifact-saved"`; else → `"completed"`.
- Failure: the classifier walks a single ordered chain (first match wins): `confirmation-required` → `timeout` → `missing-binary` → `parse-failure` → `aborted` → `policy-blocked` → `cleanup-failed` → `tab-drift` → `stale-ref` (including “unknown ref” text and a narrow `@eN` plus “element not found” heuristic) → `selector-unsupported` → `selector-not-found` → `download-not-verified` (download / wait-download style failures) → `validation-error` when a wrapper `validationError` is present → default `upstream-error`.
- The main tool implementation merges these fields into Pi-facing `details` from `extensions/agent-browser/index.ts` and from `extensions/agent-browser/lib/results/presentation.ts` for presentation-time failures.
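The two-stage success classification described above (verification counts first, then the inner first-match chain) can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions: `classifySuccessSketch` and `classifyPresentationSuccessSketch` are hypothetical stand-ins for `classifyAgentBrowserSuccessCategory` and `classifyPresentationSuccessCategory`, the shapes are trimmed down, and reading "non-pending" as `status !== "pending"` is an assumption.

```typescript
type SuccessCategorySketch = "inspection" | "artifact-unverified" | "artifact-saved" | "completed";

// Hypothetical, trimmed-down shapes; the real result types carry many more fields.
interface ArtifactSketch {
  status?: string;
  exists?: boolean;
}

interface SuccessInputSketch {
  inspection?: boolean;
  savedFile?: unknown;
  artifacts?: ArtifactSketch[];
}

interface VerificationCountsSketch {
  missingCount: number;
  unverifiedCount: number;
}

// Inner chain, first match wins.
function classifySuccessSketch(input: SuccessInputSketch): SuccessCategorySketch {
  if (input.inspection === true) return "inspection";
  const artifacts = input.artifacts ?? [];
  // Assumption: "non-pending" means status !== "pending"; presence must be exactly true.
  if (artifacts.some((artifact) => artifact.status !== "pending" && artifact.exists !== true)) {
    return "artifact-unverified";
  }
  if (input.savedFile != null || artifacts.length > 0) return "artifact-saved";
  return "completed";
}

// Presentation-level wrapper: any missing or unverified count forces
// "artifact-unverified" before the inner chain runs.
function classifyPresentationSuccessSketch(
  input: SuccessInputSketch,
  verification?: VerificationCountsSketch,
): SuccessCategorySketch {
  if (verification && (verification.missingCount > 0 || verification.unverifiedCount > 0)) {
    return "artifact-unverified";
  }
  return classifySuccessSketch(input);
}
```

Because the verification check runs first, an inspection result with a missing artifact still reports `artifact-unverified` at the presentation level in this sketch.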
@@ -599,23 +656,25 @@ Additional structured fields can appear when relevant:
- `compiledSourceLookup` when the call used `sourceLookup`: `{ args: ["batch"], stdin, steps, query }` with the generated local-evidence plan and original query fields (`selector?`, `reactFiberId?`, `componentName?`, `includeDomHints?`, `maxWorkspaceFiles?`).
- `sourceLookup` when the call used `sourceLookup`: `{ status, candidates, limitations, summary, workspaceRoot?, electronContext? }`; wrapper-tracked packaged Electron no-candidate diagnostics may carry `workspaceRoot` plus `electronContext` and live Electron nextActions without marking the successful batch as a tool failure.
- `compiledNetworkSourceLookup` / `networkSourceLookup` when the call used `networkSourceLookup`: the generated batch plan plus bounded failed-request/candidate evidence as described above.
-
- `qaPreset` when the call used `qa`: `{ passed, failedChecks, warnings, summary }`. Network rows inside the `network requests` batch step use `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `
+
- `qaPreset` when the call used `qa`: `{ passed, failedChecks, warnings, summary }`. Network rows inside the `network requests` batch step use `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `network.ts`: actionable failures appear in `failedChecks` (and fail the tool when the upstream batch still succeeded); benign icon-classified failures appear only in `warnings` and in `summary` as `QA preset passed with warnings: …` when nothing else failed.
- `compiledElectron` when the call used `electron`: redacted action plan for `list`, `launch`, `status`, `cleanup`, or `probe`.
- `electron` when the call used `electron`: action-specific lifecycle, discovery, probe, and cleanup data; see the `electron` section below.
- `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
- `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
- `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
- `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
-
- `visibleRefFallback` after a raw `find` or compiled `semanticAction` fails with `selector-not-found` and a fresh snapshot finds exact role/name `@ref` matches. Shape follows `VisibleRefFallbackDiagnostic` in `extensions/agent-browser/
+
- `visibleRefFallback` after a raw `find` or compiled `semanticAction` fails with `selector-not-found` and a fresh snapshot finds exact role/name `@ref` matches. Shape follows `VisibleRefFallbackDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, snapshot, summary, target }`, where each candidate has `ref`, `role`, `name`, optional direct ref `args`, and `reason`; visible text appends `Current snapshot ref fallback`. Non-fill candidates with direct args add `try-current-visible-ref` or numbered `try-current-visible-ref-N` actions. Fill candidates omit direct args and target text so recovery details do not repeat potentially sensitive fill text.
+
- `refSnapshotInvalidation` after a session `snapshot` fails with `No active page`. Shape follows `SessionRefSnapshotInvalidation` in `extensions/agent-browser/lib/session-page-state.ts`: `{ reason: "no-active-page", summary }`. The wrapper deletes prior refs for that session, persists the invalidation for resume, and blocks mutation-prone `@e…` preflight with `failureCategory: "stale-ref"` until a successful fresh `snapshot -i` records refs again.
+
- `richInputRecovery` after a raw `find` or compiled `semanticAction` `fill` fails with `selector-not-found` and the same current-ref diagnostic finds exact editable `searchbox` / `textbox` candidates. Shape follows `RichInputRecoveryDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, inputMethodHint, nextActionIds, summary, target }`, where each candidate has `ref`, `role`, `name`, `focusArgs`, `clickArgs`, and `reason`. Visible text appends `Rich input recovery`, and `details.nextActions` gains ids from `getAgentBrowserRichInputRecoveryNextActionIds`: `focus-current-editable-ref` / `click-current-editable-ref` (or numbered variants). These actions are bounded to focus/click/inspect-style recovery: they do not include the fill text, do not press `Enter`, and do not submit. After the right current editable ref is focused, the agent should use `keyboard inserttext` or `keyboard type` with the intended text in a separate call and submit only when explicitly required by the flow.
- `scrollNoop` after a successful **top-level** `scroll` when wrapper-side read-only probes before and after the command show no change in `window.scrollX` / `window.scrollY` and no change in the sampled prominent scrollable containers. To avoid pre-launching a session without caller startup state, this probe is skipped when the invocation includes startup-scoped flags such as `--profile`, `--state`, `--session-name`, `--cdp`, providers, init scripts, or similar launch settings. Shape: `{ reason: "no-observed-scroll-position-change", message, before, after, recommendations }`; `before` / `after` include viewport dimensions, document scroll dimensions, and up to ten sampled container descriptors plus scroll offsets. Container descriptors use only sample index, tag name, and ARIA role; DOM ids/classes are intentionally not stored. This diagnostic is conservative evidence that the page-level scroll likely missed a nested pane, not proof that every app-specific region is unchanged. Visible text appends `Scroll diagnostic: no observed scroll movement`, and `details.nextActions` gains `inspect-after-noop-scroll` (`snapshot -i`) plus `verify-noop-scroll-visually` (`screenshot`), session-prefixed when applicable.
- `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
- `recordingDependencyWarning` after a successful `record start` or `record restart` when the wrapper cannot find an executable `ffmpeg` on the Pi process `PATH`. Shape: `{ reason: "ffmpeg-missing-for-recording", dependency: "ffmpeg", command, message, recommendations }`. Visible text appends `Recording dependency warning: ffmpeg not found on PATH`. This is a non-blocking preflight warning: upstream may start recording, but `record stop` needs `ffmpeg` to encode the WebM.
-
- `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
+
- `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended visible warning text names the matching `details.nextActions` id so model-facing transcripts can recover without guessing. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
- `electronGetTextScopeWarning` after a successful attached Electron `get text <selector>` (standalone or successful `batch`) when a broad non-ref CSS selector such as `body`, `html`, `main`, `div`, or `[role=application]` may read the whole app shell. Shape: `{ selector, summary, electronContext: { launchId?, sessionName?, url? } }`; multiple batched diagnostics use `electronGetTextScopeWarnings`. Visible text appends `Broad Electron get text selector warning`, and next actions use `snapshot-for-electron-text-scope` ids with session-scoped `snapshot -i` payloads.
- `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is a plain empty object (`{}`). Arrays such as `[]` do not qualify. It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
- `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session` (omitted when the caller already set explicit upstream `--session` or for stateless inspection paths that skip injection). Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.
-
- `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation.ts`).
+
- `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation/artifacts.ts`).
- `artifacts` for upstream saved files such as screenshots, `state save` outputs, `diff screenshot` diff images, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`/`artifactType`, optional `mediaType`, optional `extension`, best-effort disk metadata such as `exists` and `sizeBytes`, plus `requestedPath`, `status`, `cwd`, `session`, and `tempPath` when applicable.
- `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
- `batchSteps[].artifacts` for per-step artifacts in `batch` output; top-level `artifacts` aggregates all step artifacts in order
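The `evalStdinHint` gate described in this hunk combines a source-shape heuristic with a check that upstream `data.result` is a plain empty object. A rough TypeScript sketch, assuming simple regular expressions: `looksLikeFunctionEvalStdinSketch`, `isPlainEmptyObject`, and `shouldAttachEvalStdinHintSketch` are hypothetical names, and the wrapper's real `looksLikeFunctionEvalStdin` may use different patterns.

```typescript
// Leading `function` / `async function`.
const FUNCTION_KEYWORD = /^(?:async\s+)?function\b/;
// Parenthesized arrow `(…) =>` (nested parentheses in parameters are not handled).
const PAREN_ARROW = /^\([^)]*\)\s*=>/;
// Concise `name =>` / `async name =>`.
const CONCISE_ARROW = /^(?:async\s+)?[A-Za-z_$][\w$]*\s*=>/;

function looksLikeFunctionEvalStdinSketch(stdin: string): boolean {
  const source = stdin.trim();
  return FUNCTION_KEYWORD.test(source) || PAREN_ARROW.test(source) || CONCISE_ARROW.test(source);
}

// Plain `{}` only: arrays, null, and objects with keys or custom prototypes do not qualify.
function isPlainEmptyObject(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.getPrototypeOf(value) === Object.prototype &&
    Object.keys(value).length === 0
  );
}

// Both conditions must hold before the hint is attached.
function shouldAttachEvalStdinHintSketch(stdin: string, data: unknown): boolean {
  if (!looksLikeFunctionEvalStdinSketch(stdin)) return false;
  if (typeof data !== "object" || data === null) return false;
  return isPlainEmptyObject((data as { result?: unknown }).result);
}
```

As the contract notes, this is a heuristic and not a JavaScript parser: a plain expression such as `document.title` never matches, while a function-shaped body that legitimately returns `{}` would still trigger the hint.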
@@ -624,7 +683,7 @@ Additional structured fields can appear when relevant:
- `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
- `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct existing** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated after checking the filesystem); deleted/stale explicit paths are skipped. When the recent window has no existing explicit rows—for example only spill/ephemeral inventory or explicit paths already deleted—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
-
- compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` vs outline `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
+
- compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` vs outline `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible bounded `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
- `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"` while an implicit session is already active: includes `reason`, `recommendedSessionMode` (`"fresh"`), redacted `exampleArgs`, and `exampleParams` where `sessionMode` is `"fresh"` and `args` is the same redacted argv as `exampleArgs` (from `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`, merged through `redactRecoveryHint` in `extensions/agent-browser/index.ts`)
- `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
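The `artifactCleanup.explicitArtifactPaths` selection rule (explicit-path rows only, encounter order, existence check, de-duplication, at most ten) can be sketched in TypeScript. This is a minimal sketch: `ManifestEntrySketch` and `selectExplicitArtifactPaths` are hypothetical, `path` strings are compared as-is (an assumption; the wrapper may normalize or resolve them first), and the filesystem check is injected as `fileExists` so the selection logic stays pure.

```typescript
// Hypothetical, trimmed-down manifest row; the real entries carry more metadata.
interface ManifestEntrySketch {
  path: string;
  storageScope: "persistent-session" | "process-temp" | "explicit-path";
}

// Walk rows in encounter order, keep explicit-path rows whose file still exists,
// de-duplicate, and stop once `limit` paths have been collected.
function selectExplicitArtifactPaths(
  entries: ManifestEntrySketch[],
  fileExists: (path: string) => boolean,
  limit = 10,
): string[] {
  const selected: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (selected.length >= limit) break;
    if (entry.storageScope !== "explicit-path") continue;
    // Deleted or stale explicit paths are skipped rather than reported.
    if (!fileExists(entry.path)) continue;
    if (seen.has(entry.path)) continue;
    seen.add(entry.path);
    selected.push(entry.path);
  }
  return selected;
}
```

In a Node host the injected check could be `fs.existsSync`; when no explicit rows survive, the result is an empty array, matching the case where only `summary` / `note` remain useful.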
@@ -682,7 +741,7 @@ If `agent-browser` is not on `PATH`, fail with a message that:
- After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
- After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
- For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
-
- If a known session target unexpectedly reports about:blank, agent_browser
+
- If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
<!-- agent-browser-playbook:end wrapper-tab-recovery -->
- on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
- keep wrapper-spawned commands below the upstream CLI IPC read-timeout budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second retry path begins (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` configures the watchdog); timed-out compiled `job` / `qa` or caller `batch` calls may add `details.timeoutPartialProgress` and visible `Timeout partial progress` evidence with planned steps, current page title/URL, and declared artifact path checks
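The timeout budgeting in the last bullet can be sketched as two pure helpers plus an environment builder. This is a sketch under stated assumptions: `clampDefaultTimeoutMs`, `resolveWatchdogMs`, and `buildChildEnvSketch` are hypothetical names; treating `AGENT_BROWSER_DEFAULT_TIMEOUT` as milliseconds and falling back to the 25-second ceiling when it is unset or invalid are assumptions; and the watchdog default is passed in as `fallbackMs` because its real value is not stated here.

```typescript
// 25 seconds, per the text; millisecond units are an assumption.
const DEFAULT_TIMEOUT_CEILING_MS = 25_000;

// Clamp the upstream per-command timeout so it stays inside the IPC read-timeout budget.
// Assumption: unset, non-numeric, or non-positive values fall back to the ceiling.
function clampDefaultTimeoutMs(raw: string | undefined): string {
  const parsed = Number(raw ?? "");
  if (!Number.isFinite(parsed) || parsed <= 0) return String(DEFAULT_TIMEOUT_CEILING_MS);
  return String(Math.min(parsed, DEFAULT_TIMEOUT_CEILING_MS));
}

// Resolve the wrapper watchdog from PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS.
// `fallbackMs` is a hypothetical parameter standing in for the undocumented default.
function resolveWatchdogMs(raw: string | undefined, fallbackMs: number): number {
  const parsed = Number(raw ?? "");
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// Build the child environment with the clamped timeout, leaving other variables intact.
function buildChildEnvSketch(
  parentEnv: Record<string, string | undefined>,
): Record<string, string | undefined> {
  return {
    ...parentEnv,
    AGENT_BROWSER_DEFAULT_TIMEOUT: clampDefaultTimeoutMs(parentEnv.AGENT_BROWSER_DEFAULT_TIMEOUT),
  };
}
```

Stopping the stuck child itself (for example a timer that calls `kill()` on a Node `ChildProcess` once the watchdog elapses) is left out; the point of the sketch is that both limits are chosen so the wrapper acts before the upstream 30-second retry path begins.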