pi-agent-browser-native 0.2.25 → 0.2.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -34,7 +34,8 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
34
34
 
35
35
  <!-- agent-browser-playbook:start shared-guidelines -->
36
36
  <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
37
- - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs can become stale.
37
+ - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.
38
+ - When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
38
39
  - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
39
40
  - Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
40
41
  - For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
@@ -53,8 +54,8 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
53
54
  - For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
54
55
  - For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
55
56
  - When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
56
- - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel.
57
- - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone.
57
+ - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
58
+ - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction.
58
59
  - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.
59
60
  - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
60
61
  <!-- agent-browser-playbook:end shared-guidelines -->
@@ -100,6 +101,7 @@ Examples:
100
101
  - `fill` and `select` require non-empty `text` (compiled as the trailing value argument to `find`)
101
102
  - optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` when present)
102
103
  - optional `role` is accepted only when `locator` is `role` and must equal `value` if set (redundant with `value`; prefer `value` alone)
104
+ - optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before `find` so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
103
105
 
104
106
  Compilation (then `--json` and session handling apply like any other call):
105
107
 
@@ -108,10 +110,15 @@ Compilation (then `--json` and session handling apply like any other call):
108
110
  | `click`, `check`, or `uncheck` + non-`role` locator | `["find", <locator>, <value>, <action>]` |
109
111
  | `click` / `check` / `uncheck` + `role` + optional `name` | `["find","role",<value>,<action>]` plus `["--name",<name>]` when `name` is set |
110
112
  | `fill` or `select` | `["find",<locator>,<value>,<action>,<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
113
+ | any supported action + `session` | prepends `["--session",<session>]` before the compiled `find` argv |
111
114
 
112
115
  When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned.
113
116
 
114
- If a compiled `semanticAction` fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the exact compiled `find` argv in `params.args`. The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` (see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; direct stale `@e…` commands still return snapshot/find recovery guidance instead of an unsafe blind retry.
117
+ If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content includes an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` → `try-labeled-textbox-candidate`. `select` misses—including `select` + `placeholder`—do not append `try-*` entries even when a parallel `fill` would; other locator/action pairs omit this block too. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
118
+
119
+ If a compiled `semanticAction` fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` (see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; direct stale `@e…` commands still return snapshot/find recovery guidance instead of an unsafe blind retry.
120
+
121
+ For direct page-scoped `@e…` refs, successful `snapshot` results record `details.refSnapshot` with the latest ref ids and page target for the session. Before mutation-prone ref commands such as `click`, `fill`, `check`, `select`, `download`, drag/upload/keyboard-style actions, or equivalent batch steps run, the wrapper rejects refs from an older page target or refs absent from the latest same-page snapshot with `failureCategory: "stale-ref"`. This is a best-effort wrapper guard against upstream ref-number recycling after navigation; it does not prove the DOM stayed unchanged after the snapshot. Refresh with the session-aware `refresh-interactive-refs` next action before retrying.
115
122
 
116
123
  Examples:
117
124
 
@@ -122,6 +129,7 @@ Examples:
122
129
  { "semanticAction": { "action": "check", "locator": "label", "value": "Remember me" } }
123
130
  { "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
124
131
  { "semanticAction": { "action": "select", "locator": "label", "value": "Country", "text": "United States" } }
132
+ { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
125
133
  ```
126
134
 
127
135
  ### `job`
@@ -166,6 +174,8 @@ Compiled shape:
166
174
 
167
175
  Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, flags, stdin forms, or failure policies outside this constrained schema.
168
176
 
177
+ Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill` selectors after an `open` or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you.
178
+
169
179
  ### `qa`
170
180
 
171
181
  - type: object with required `url`
@@ -174,8 +184,8 @@ Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, fla
174
184
  - clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load networkidle`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
175
185
  - `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
176
186
  - optional `screenshotPath` adds an evidence screenshot step
177
- - reports `details.compiledQaPreset` with the compiled batch plan and `details.qaPreset` with `{ passed, failedChecks, summary }`
178
- - fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, failed network requests, or any batch step failure
187
+ - reports `details.compiledQaPreset` with the compiled batch plan and `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
188
+ - fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/shared.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
179
189
 
180
190
  Example:
181
191
 
@@ -260,7 +270,7 @@ Examples:
260
270
  - default: `"auto"`
261
271
 
262
272
  Behavior:
263
- - if `args` already include `--session`, upstream session choice wins
273
+ - if `args` already include `--session` (including argv compiled from optional `semanticAction.session`), upstream session choice wins
264
274
  - `"auto"` prepends the current extension-managed active session when appropriate
265
275
  - `"fresh"` rotates that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` apply and later default calls follow the new browser
266
276
  - stateless paths skip that injection even under `"auto"`: plain-text `--help` / `-h` / `--version` / `-V` (see the generated inspection playbook fragment below) and read-only `skills list`, `skills get …`, and `skills path …` keep `effectiveArgs` free of the implicit managed `--session` unless the caller supplied `--session` explicitly; successful results therefore omit `usedImplicitSession` and the extension-managed `sessionName` for those calls (`extensions/agent-browser/lib/runtime.ts`, `buildExecutionPlan`)
@@ -268,6 +278,7 @@ Behavior:
268
278
  Recommended use:
269
279
  - use `"auto"` for the common browse/snapshot/click flow inside one `pi` session
270
280
  - use `"fresh"` when switching from an already-active implicit session to a new profile/debug/auth/provider launch without inventing a fixed explicit session name
281
+ - when a fresh launch fails or times out before becoming current, check `details.managedSessionOutcome`: it states whether the prior managed session was preserved or whether the attempted fresh session was abandoned because no prior managed session existed; when `sessionMode` is `"fresh"` and the tool ultimately fails, the model-visible result also appends `Managed session outcome: …` (see `#details` below). Failures under `sessionMode: "auto"` still expose the struct on `details` when the extension injects a managed `--session`, but they do not add that extra prose line.
271
282
 
272
283
  ## Wrapper behavior
273
284
 
@@ -301,10 +312,11 @@ Primary content should be:
301
312
  - an image attachment when relevant
302
313
  - browser-aware compacting for oversized snapshots so the model gets a concise actionable view before raw page noise
303
314
  - compact snapshots should be main-content-first: prefer the primary content block and nearby sections over top-of-page chrome, ads, or unrelated sidebars when those can be distinguished from the snapshot tree
315
+ - when compacting hides actionable controls, snapshot output should add an `Omitted high-value controls` section for bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that were not already shown in key refs
304
316
 
305
317
  Examples:
306
318
  - small `snapshot` results should include the actual snapshot text
307
- - oversized `snapshot` results should switch to a compact view that preserves the primary content, nearby sections, and a trimmed set of high-value refs, while exposing the full raw snapshot path directly in the rendered tool text and via `details.fullOutputPath`
319
+ - oversized `snapshot` results should switch to a compact view that preserves the primary content, nearby sections, a trimmed set of high-value refs, and a separate bounded list of omitted high-value controls when dense pages would otherwise hide important inputs/tabs/buttons, while exposing the full raw snapshot path directly in the rendered tool text and via `details.fullOutputPath`
308
320
  - successful navigation actions like `click`, `back`, `forward`, and `reload` should include a lightweight post-action title/url summary when the wrapper can address the active session
309
321
  - `tab list` should include a readable tab summary
310
322
  - `screenshot` should include the saved-path summary plus the inline image attachment when available
@@ -346,14 +358,25 @@ For `batch`, top-level `details` still carries `resultCategory` plus `successCat
346
358
 
347
359
  Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbatim replay of raw upstream batch JSON): each element is `{ success, command, result? | error? }` where `command` is argv-redacted the same way as echoed invocation args (including `cookies set` cookie values, `storage local|session set` values, and other sensitive flags/positionals), `result` is the presentation-layer data for that step after the same structured redaction as non-batch commands, and `error` is failure text with cookie/storage/password literals stripped when those values appeared in argv. Prefer `batchSteps[]` for full per-step `details` (artifacts, categories, spill paths); use the roll-up when you only need a redacted matrix of what ran.
348
360
 
361
+ `details.refSnapshot` may appear after successful `snapshot` calls and subsequent same-session calls. It records the latest page-scoped ref ids known to the wrapper and the page target they came from so mutation-prone `@e…` commands can fail fast instead of silently hitting recycled refs after navigation.
362
+
363
+ Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
364
+
365
+ - **URL alignment:** `refSnapshot.target.url` and the session’s current tab URL are compared via `targetsMatch` / `normalizeComparableUrl` in `extensions/agent-browser/index.ts`: values are trimmed, parsed as URLs when possible, compared **after dropping the `#fragment`**, and the query string remains significant. If either side lacks a `url`, `targetsMatch` treats the pair as matching so early-session calls are not blocked.
366
+ - **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token is in `REF_INVALIDATING_BATCH_COMMANDS` sets a latch that blocks later steps whose first token is in `REF_GUARDED_COMMANDS` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). The invalidating set includes navigation/mutation verbs such as `open`, `goto`, `reload`, `click`, `fill`, and related upstream commands; the guarded set is the commands that accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview`, and others enumerated next to those literals in source). Changing either set requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
367
+
349
368
  **Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
350
369
 
351
- `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs`; semantic `stale-ref` failures that compiled from `semanticAction` may also include `retry-semantic-action-after-stale-ref` after that snapshot step (see the `semanticAction` section above); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
370
+ `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction` (not for `select`); semantic `stale-ref` failures that compiled from `semanticAction` may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
371
+
372
+ **Unknown-command getter hints (failure presentation):** `buildToolPresentation` in `extensions/agent-browser/lib/results/presentation.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
352
373
 
353
374
  For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
354
375
 
355
376
  `pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`.
356
377
 
378
+ `overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref looks like overlay/banner/dialog context (`dialog` / `alertdialog` roles or name text matching banner/modal/cookie/consent style patterns in `extensions/agent-browser/index.ts`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
379
+
357
380
  Example shape (fields vary by scenario):
358
381
 
359
382
  ```json
@@ -402,7 +425,7 @@ When `semanticAction` produced compiled argv but the unified result is `failureC
402
425
 
403
426
  Implementation and precedence:
404
427
 
405
- - Types, classifiers, and follow-up assembly live in `extensions/agent-browser/lib/results/shared.ts`: `classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails` (the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier), and `buildAgentBrowserNextActions`.
428
+ - Types, classifiers, and follow-up assembly live in `extensions/agent-browser/lib/results/shared.ts`: `classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails` (the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier), and `buildAgentBrowserNextActions`. Failed upstream `network requests` rows also flow through `classifyNetworkRequestFailure` / `summarizeNetworkFailures` there for QA analysis (`analyzeQaPresetResults` in `extensions/agent-browser/index.ts`) and for actionable-vs-benign lines in `network requests` presentation (`extensions/agent-browser/lib/results/presentation.ts`).
406
429
  - Artifact verification: `ArtifactVerificationSummary` / `ArtifactVerificationEntry` types live in `shared.ts`. `buildArtifactVerificationSummary`, `getArtifactVerificationEntry`, and `getManifestVerificationEntry` in `presentation.ts` merge each resolved file artifact with manifest rows whose `storageScope` is not `explicit-path` (those rows duplicate file artifacts) and whose `path` is in the current result’s spill path set. Successful presentation merges then run `classifyPresentationSuccessCategory` in `presentation.ts`, which forces `successCategory: "artifact-unverified"` when `artifactVerification.missingCount` or `artifactVerification.unverifiedCount` is greater than zero before delegating to `classifyAgentBrowserSuccessCategory`.
407
430
  - Inner success categories (`classifyAgentBrowserSuccessCategory` in `shared.ts`, after verification counts are clear): if `inspection` is true → `"inspection"`; else if any non-pending artifact lacks confirmed on-disk presence (`exists !== true`) → `"artifact-unverified"`; else if there is a `savedFile` or any `artifacts` → `"artifact-saved"`; else → `"completed"`.
408
431
  - Failure: the classifier walks a single ordered chain (first match wins): `confirmation-required` → `timeout` → `missing-binary` → `parse-failure` → `aborted` → `tab-drift` → `stale-ref` (including “unknown ref” text and a narrow `@eN` plus “element not found” heuristic) → `selector-unsupported` → `selector-not-found` → `download-not-verified` (download / wait-download style failures) → `validation-error` when a wrapper `validationError` is present → default `upstream-error`.
@@ -412,10 +435,15 @@ Additional structured fields can appear when relevant:
412
435
  - `compiledSemanticAction` when the call used `semanticAction` and the result includes the unified `details` merge: `{ action, locator, args }` with the same redaction rules as `args` / `effectiveArgs`; omitted for plain `args`/`job` calls and omitted on some early error returns that omit this field (see the `semanticAction` section above)
413
436
  - `compiledJob` when the call used `job` or the job-backed `qa` preset: `{ args: ["batch"], stdin, steps: [{ action, args }] }`, with step args redacted the same way as other invocation details
414
437
  - `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object
415
- - `qaPreset` when the call used `qa`: `{ passed, failedChecks, summary }`
438
+ - `qaPreset` when the call used `qa`: `{ passed, failedChecks, warnings, summary }`. Network rows inside the `network requests` batch step use `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `shared.ts`: actionable failures appear in `failedChecks` (and fail the tool when the upstream batch still succeeded); benign icon-classified failures appear only in `warnings` and in `summary` as `QA preset passed with warnings: …` when nothing else failed.
416
439
  - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
417
440
  - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
418
441
  - `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
442
+ - `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
443
+ - `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
444
+ - `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is an empty object (`{}`). It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
445
+ - `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
446
+ - `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session` (omitted when the caller already set explicit upstream `--session` or for stateless inspection paths that skip injection). Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.
419
447
  - `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation.ts`).
420
448
  - `artifacts` for upstream saved files such as screenshots, `state save` outputs, `diff screenshot` diff images, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`/`artifactType`, optional `mediaType`, optional `extension`, best-effort disk metadata such as `exists` and `sizeBytes`, plus `requestedPath`, `status`, `cwd`, `session`, and `tempPath` when applicable.
421
449
  - `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
@@ -424,6 +452,8 @@ Additional structured fields can appear when relevant:
424
452
  - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
425
453
  - `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
426
454
  - `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
455
+ - `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated); when the recent window has no such rows—for example only spill or ephemeral inventory—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
456
+ - compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` vs outline `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
427
457
  - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"` while an implicit session is already active: includes `reason`, `recommendedSessionMode` (`"fresh"`), redacted `exampleArgs`, and `exampleParams` where `sessionMode` is `"fresh"` and `args` is the same redacted argv as `exampleArgs` (from `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`, merged through `redactRecoveryHint` in `extensions/agent-browser/index.ts`)
428
458
  - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
429
459
 
@@ -450,7 +480,7 @@ Worth doing in v1:
450
480
  - navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
451
481
  - tab lists → compact summary/table
452
482
  - stream status → enabled/connected/port summary plus WebSocket URL and frame format when a port is known; if the caller explicitly passed `--json`, visible text is valid JSON instead of a prose summary
453
- - diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
483
+ - diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; network request lists include an actionable-vs-benign failed-request summary and mark low-impact browser icon failures separately; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
454
484
  - trace/profiler owner conflicts → when the wrapper has observed one owner active for a session, block conflicting starts/stops with "wrapper believes ..." wording because upstream or external CLI use can desynchronize wrapper-local state
455
485
 
456
486
  ## Missing binary behavior
@@ -472,6 +502,7 @@ If `agent-browser` is not on `PATH`, fail with a message that:
472
502
  - clean up process-private temp spill artifacts on shutdown, while keeping persisted-session snapshot spill files in a private session-scoped artifact directory so `details.fullOutputPath` survives reload/restart and the oldest spill files are evicted if the per-session artifact budget is exceeded
473
503
  - reconstruct the current extension-managed session and latest `artifactManifest` from persisted tool details on resume/reload so later default calls keep following the active managed browser and can continue reporting artifact retention state
474
504
  - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
505
+ - when an unnamed `sessionMode: "fresh"` launch fails or times out, preserve the previous managed session when one was active or report the attempted fresh session as abandoned when no managed session was active (`details.managedSessionOutcome`; visible `Managed session outcome: …` only when the final tool call used `sessionMode: "fresh"` and failed—see `#details`)
475
506
  - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
476
507
  - treat explicit caller-provided `--session` choices as user-managed; `--session` isolates a live browser session but is not a persisted tab/auth restore mechanism after `close`, so use `--profile`, `--session-name`, or `--state` when persisted auth/tab state is required
477
508
  - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
@@ -483,7 +514,7 @@ If `agent-browser` is not on `PATH`, fail with a message that:
483
514
  - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
484
515
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
485
516
  - on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
486
- - keep wrapper-spawned commands below the upstream CLI IPC read-timeout budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second retry path begins
517
+ - keep wrapper-spawned commands below the upstream CLI IPC read-timeout budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second retry path begins (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` configures the watchdog); timed-out compiled `job` / `qa` or caller `batch` calls may add `details.timeoutPartialProgress` and visible `Timeout partial progress` evidence with planned steps, current page title/URL, and declared artifact path checks
487
518
  - interactive or long-running upstream families such as `chat` without a prompt, `dashboard start`, `stream enable`, `trace start`, `profiler start`, `record start`, `inspect`, `install`, `upgrade`, `doctor --fix`, and `confirm-interactive` are passed through thinly but remain bounded by the same wrapper timeout/session planning rules; prefer explicit arguments, single-shot `chat <message>`, non-interactive flags like `doctor --offline --quick` or `doctor --json`, and cleanup pairs such as `dashboard stop`, `stream disable`, `trace stop`, `profiler stop`, and `record stop`
488
519
  - treat successful plain-text inspection commands like `--help` and `--version` as stateless: do not inject the implicit managed session and do not let those calls claim the managed-session slot
489
520
  - if startup-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`