npm - pi-agent-browser-native - Versions diffs - 0.2.44 → 0.2.45 - Mend

pi-agent-browser-native 0.2.44 → 0.2.45

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (64) hide show

package/docs/TOOL_CONTRACT.md CHANGED Viewed

@@ -140,7 +140,7 @@ The extension always plans normal browser commands with `--json` prepended in `e
 <!-- agent-browser-playbook:start shared-guidelines -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
-- Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.
+- Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element. On dense pages, use wrapper-side snapshot -i --search <text> or snapshot -i --filter role=<role> to render matching refs while preserving the full ref map in details.refSnapshot, add snapshot --viewport when scroll position or above/below-fold context matters, and add snapshot --diff when a quick before/after ref-map delta would prevent reading a full spill file.
 - For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.
 - Snapshot choice: prefer snapshot -i for routine clicks/fills (interactive @refs, main-content-first). Use snapshot --compact when you need a denser same-page tree without full spill; use full snapshot (no -i) only when you need the complete accessibility tree. Re-snapshot after navigation or major DOM changes. When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
 - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
@@ -154,21 +154,21 @@ The extension always plans normal browser commands with `--json` prepended in `e
 - For first-navigation setup, use open without a URL plus network route --resource-type <csv>, cookies set --curl <file>, or --init-script/--enable before navigate/opening the target page.
 - For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, auth list/show/delete/remove for local auth-profile maintenance, auth login when you need the browser to fill a saved profile, state save/load for portable test state, state list/show/rename/clear/clear -a/clean for saved-state lifecycle cleanup, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields while allowing benign primitive storage values when useful for local QA.
 - For batch chains that touch cookies, storage, auth, or other secret-bearing commands, use details.batchSteps for per-step artifacts, categories, spill paths, and full structured errors; top-level details.data on batch is only a compact redacted step matrix (success, argv-redacted command, redacted result or scrubbed error text) built from the same presentation rules as standalone calls.
-- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har (including request filters like --type/--method/--status), diff snapshot/screenshot/url with scoped/baseline options, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, device list for iOS simulator inventory, and chat. For compact network requests output, prefer details.nextActions for request detail, route-mock diagnostics, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done; stream enable already-enabled outcomes are treated as idempotent success with status/disable follow-ups.
+- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har (including request filters like --type/--method/--status), diff snapshot/screenshot/url with scoped/baseline options, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, device list for iOS simulator inventory, and chat. For compact network requests output, prefer details.nextActions for request detail, route-mock diagnostics, actionable failed-request networkSourceLookup, filtering, clearing the aggregate buffer before repro, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done; stream enable already-enabled outcomes are treated as idempotent success with status/disable follow-ups.
 - For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; after connect, inspect tab list, select the stable tab id such as tab t2, then run a condition wait or snapshot -i before using refs. close commands (`close`, `quit`, or `exit`) only close the browser/CDP session; leave manually launched app shutdown, profile cleanup, and explicit artifacts to the host owner.
 - For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool; add --full when you need references/templates, and use skills get --all only for broad skill audits. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
-- For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; when --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
+- For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; dialog commands and eval snippets that look like alert/confirm/prompt/dialog triggers are shorter-bounded than normal browser calls, and timed-out dialog-like interactions may add inspect-dialog-after-timeout, dismiss-dialog-after-timeout, or recover-fresh-session-after-dialog-timeout nextActions. When --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
 - If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For headed demos, put --headed on the first launch with sessionMode=fresh and verify with screenshot/tab/get-url evidence because tool success cannot prove the OS window is visible to the user. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; for disappearance checks in agent-browser 0.27.1, use wait --fn predicates instead of stale upstream-help examples like wait <selector> --state hidden. Use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort, must stay below the wrapper IPC budget (wait 30000 is intentionally blocked), and a successful payload like "waited":"timeout" means elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.
 - For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
 - For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
-- For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
-- On dashboards with nested scroll containers, verify scroll with a screenshot or fresh snapshot -i; if the viewport did not move, prefer scrollintoview <@ref> or target the actual scrollable region. For native selects, use select <selector> <value...> (or semanticAction/job select) instead of clicking option refs; for custom comboboxes, a click/semanticAction may only focus the field, so re-snapshot and fall back to type, press Enter/arrow keys, or visible option refs.
+- For downloads, prefer download <selector> <path> when an element click should save a file; simple loopback anchor downloads are saved to the requested path when the wrapper can resolve an HTTP(S) href. Do not rely on click alone when you need the downloaded file on disk.
+- On dashboards with nested scroll containers, verify scroll with a screenshot or fresh snapshot -i; if the viewport did not move, details.data.scrolled may be false/noMovement true and you should prefer scrollintoview <@ref> or target the actual scrollable region with scroll <selector> <dir> [px|percent]. For native selects, use select <selector> <value...> (or semanticAction/job select) instead of clicking option refs; for custom comboboxes, a click/semanticAction may only focus the field, so re-snapshot and fall back to type, press Enter/arrow keys, or visible option refs.
 - When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
-- When using eval --stdin for extraction, pass the JavaScript through the native tool stdin field, not as an extra args token after --stdin, and return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. On file:// pages, when upstream JSON returns result: null for non-trivial stdin, details.evalResultWarning may append Eval result warning without failing the tool—treat that as inconclusive DOM verification. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
+- When using eval --stdin for extraction, pass the JavaScript through the native tool stdin field, not as an extra args token after --stdin, and return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); use outputPath when the eval/get/snapshot data should be saved as a durable local file. If a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. On file:// pages, when upstream JSON returns result: null for non-trivial stdin, details.evalResultWarning may append Eval result warning without failing the tool—treat that as inconclusive DOM verification. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
 - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If details.clickDispatch reports no trusted DOM event, refresh/inspect/retry the real click first; for static local fixtures only, an explicit eval --stdin programmatic .click() can exercise app handlers, but treat it as an untrusted scripted workaround and never use it to bypass stop-before-submit/order/purchase boundaries. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
 - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), use the user's exact requested paths when given and treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use or PASS/FAIL reporting.
 - For evidence-only screenshots, QA captures, or other audit artifacts, save to an explicit path and branch on details.artifactVerification plus details.artifacts before reporting PASS/FAIL; do not require vision review of inline image attachments unless the user asked for visual inspection.
-- Respect explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, do not click that final action. If the wrapper returns details.promptGuard.reason=explicit-user-stop-boundary, gather evidence on the current page instead of retrying the blocked final action.
+- Respect explicit user stop boundaries yourself: if the user says to stop before order/post/purchase/submit, do not click that final action. The wrapper does not infer broad business intent from prompt text; details.promptGuard is reserved for concrete artifact-before-close checks.
 - Successful record stop needs ffmpeg on PATH; the wrapper may warn after record start when ffmpeg is missing.
 - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
 <!-- agent-browser-playbook:end shared-guidelines -->
@@ -178,11 +178,12 @@ The extension always plans normal browser commands with `--json` prepended in `e
 Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron`):
 ```json
-{ "args": ["open", "https://example.com"], "stdin": "optional raw stdin content", "sessionMode": "auto" }
+{ "args": ["open", "https://example.com"], "stdin": "optional raw stdin content", "outputPath": "logs/result.json", "timeoutMs": 35000, "sessionMode": "auto" }
 ```
 ```json
 { "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Export" }, "sessionMode": "auto" }
+{ "semanticAction": { "action": "fill", "selector": "@e1", "text": "prompt text" } }
 { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
 ```
@@ -217,16 +218,17 @@ Examples:
 - type: object
 - optional; mutually exclusive with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron` (omit all of them when using this field)
 - top-level tool input only: `batch` stdin remains upstream argv arrays; express find steps inside batch as string arrays such as `["find","role","button","click","--name","Export"]`, not nested `semanticAction` objects
-- thin intent schema compiled by this wrapper into existing upstream commands; locator actions compile to `find`, while native dropdown selection compiles to `select <selector> <value...>`; behavior and locator/selector semantics stay upstream-owned
+- thin intent schema compiled by this wrapper into existing upstream commands; locator actions compile to `find`, direct selector/ref `click` / `check` / `fill` compile to the matching upstream command, and native dropdown selection compiles to `select <selector> <value...>`; behavior and locator/selector semantics stay upstream-owned
 - supported actions: `click`, `fill`, `check`, `select`
 - supported locators for `click` / `fill` / `check`: `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`
+- optional `selector` is accepted for direct `click`, `check`, and `fill` targets (including current `@refs`); do not combine it with `locator`, `value`, `role`, or `name`. For `fill`, `text` is still required.
 - `semanticAction` does not expose `uncheck` while upstream `find ... uncheck` is not runtime-supported; use raw `args: ["uncheck", <selector-or-ref>]` after a stable selector or current snapshot ref
 - for locator actions, `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim; for `locator: "role"`, callers may provide `role` instead of redundant `value`
 - `fill` requires non-empty `text` (compiled as the trailing value argument to `find`)
 - `select` requires non-empty `selector` plus either `value` (single option value) or `values` (non-empty array of option values). `select` does not accept `locator`, `role`, `name`, or `text`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown targeting must first be resolved to a stable selector or current `@ref`.
 - optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` for `fill` when present)
 - optional `role` is accepted only when `locator` is `role`; it may replace `value`, and must equal `value` if both are set
-- optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before the compiled `find` or `select` command so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
+- optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before the compiled `find`, direct selector/ref command, or `select` command so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
 Compilation (then `--json` and session handling apply like any other call):
@@ -235,10 +237,11 @@ Compilation (then `--json` and session handling apply like any other call):
 | `click` or `check` + non-`role` locator | `["find", <locator>, <value>, <action>]` |
 | `click` / `check` / `fill` + `role` or `value` + optional `name` | `["find","role",<role-or-value>,<action>]` plus `["--name",<name>]` when `name` is set; active-session fill may pre-resolve to `fill @ref <text>` when one exact editable current ref matches |
 | `fill` | `["find",<locator>,<value>,"fill",<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
+| `click` / `check` / `fill` + `selector` | `[<action>,<selector>]` (plus `<text>` for `fill`) |
 | `select` + `selector` + `value` / `values` | `["select",<selector>,<value...>]` |
 | any supported action + `session` | prepends `["--session",<session>]` before the compiled argv |
-When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and guarded `fill` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; fill only resolves when one exact editable `combobox`, `searchbox`, or `textbox` ref matches. This avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
+When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` for `find` actions, `{ action, selector, args }` for direct selector/ref click/check/fill actions, or `{ action: "select", selector, values, args }` for `select`, with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and guarded `fill` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; fill only resolves when one exact editable `combobox`, `searchbox`, or `textbox` ref matches. This avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
 If a raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, the wrapper may run one fresh session-scoped `snapshot -i` and add visible `Current snapshot ref fallback` plus `details.visibleRefFallback` when that snapshot contains exact role/name matches for the failed target. Non-fill matches can also add `try-current-visible-ref` / `try-current-visible-ref-N` next actions. The matcher is bounded to current snapshot refs and exact normalized role/name matches: role locators require `--name`, text-click falls back only to exact-name `button`/`link` refs, label-fill to exact-name `textbox`, and placeholder-fill to exact-name `searchbox`/`textbox`. It never fuzzy-matches names such as prefixes; when several exact refs match, each action carries safety copy telling agents to inspect the snapshot and choose only if unambiguous. For post-failure `fill` matches, `visibleRefFallback.candidates[].args` and `visibleRefFallback.target.text` are omitted so recovery details do not repeat the fill text.
@@ -255,6 +258,8 @@ Examples:
 { "semanticAction": { "action": "click", "locator": "role", "role": "button", "name": "Continue without Signing In" } }
 { "semanticAction": { "action": "click", "locator": "text", "value": "Close" } }
 { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
+{ "semanticAction": { "action": "fill", "selector": "@e1", "text": "prompt text" } }
+{ "semanticAction": { "action": "click", "selector": "#submit" } }
 { "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
 { "semanticAction": { "action": "select", "selector": "#multi", "values": ["dark", "compact"] } }
 { "semanticAction": { "action": "check", "locator": "label", "value": "Remember me" } }
@@ -267,19 +272,22 @@ Examples:
 - optional; mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, `networkSourceLookup`, and `electron`
 - top-level tool input only; do not nest `job` inside `batch` stdin
 - constrained orchestration only: every step compiles to existing upstream `batch` argv and the compiled plan is echoed as `details.compiledJob`
+- optional `failFast` boolean; defaults to `true`, compiling to upstream `batch --bail` so later mutating job steps do not run after an earlier required step fails. Set `failFast: false` only when you explicitly want upstream batch's continue-after-error behavior.
 - there is no separate reusable named “browser recipe” extension surface above `job`, `qa`, and raw `batch` yet; the closed `RQ-0068` decision, evidence bar, and revisit criteria are in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) and [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
-- supported steps (each row becomes one upstream `batch` step; `click` / `fill` / `select` pass `selector` through as the same argv token shape standalone upstream commands would use, including `@refs`, not the `semanticAction` locator schema):
-  - `open` with `url`
+- supported steps (`open.loadState` can emit an extra readiness row immediately after the `open`; otherwise each row becomes one upstream `batch` step except paced `type`, which expands to existing focus/keyboard/wait/press rows; `click` / `fill` / `type` / `select` pass `selector` through as the same argv token shape standalone upstream commands would use, including `@refs`, not the `semanticAction` locator schema):
+  - `open` with `url`; optional `loadState` (`domcontentloaded`, `load`, or `networkidle`) inserts `wait --load <state>` immediately after the open when the next step needs page-readiness evidence
   - `click` with `selector`
   - `fill` with `selector` and `text`
+  - `type` with `text`; optional `selector` focuses the target first, optional `delayMs` emits per-character `keyboard type` plus `wait` rows, and optional `press` emits a final `press <key>` row such as `Enter`. Delayed typing is capped at 200 characters per step. Model-visible batch prose compacts generated per-character rows; `details.batchSteps` and `details.compiledJob.steps` retain the full bounded row list.
   - `select` with `selector` plus either `value` or `values` (one or more option values; compiled as `select <selector> <value...>`)
   - `wait` with positive integer `milliseconds`
   - `assertText` with `text` (compiled as passive `wait --text <text>`)
   - `assertUrl` with `url` pattern (compiled as `wait --url <pattern>`)
   - `waitForDownload` with `path` (compiled as `wait --download <path>`)
+  - `snapshot` (compiled as `snapshot -i`; useful between mutation-prone steps before reusing current refs)
   - `screenshot` with `path`
-**Navigation assertions are explicit only.** `job` never treats a successful `click` (or a `select` / submit-style interaction that may navigate) as proof that the expected next page loaded. Top-level `click` may still surface optional `details.navigationSummary` or `pageChangeSummary` hints for operators, but compiled `job` / `batch` steps do **not** auto-insert `assertUrl` or `assertText` after clicks—there is no deterministic expected URL source without caller intent. After any navigation-prone step (link/submit clicks, checkout or form flows, tab-sensitive UI), add an explicit `assertUrl` with the destination pattern you expect, `assertText` for on-page copy, or both, **before** screenshots or steps that assume the new page state.
+**Navigation assertions are explicit only.** `job` never treats a successful `click` (or a `select` / submit-style interaction that may navigate) as proof that the expected next page loaded. Top-level `click` may still surface optional `details.navigationSummary` or `pageChangeSummary` hints for operators, but compiled `job` / `batch` steps do **not** auto-insert `assertUrl` or `assertText` after clicks—there is no deterministic expected URL source without caller intent. Use `open.loadState` to wait for initial page readiness after an `open`; after any later navigation-prone step (link/submit clicks, checkout or form flows, tab-sensitive UI), add an explicit `assertUrl` with the destination pattern you expect, `assertText` for on-page copy, or both, **before** screenshots or steps that assume the new page state.
 Example (static landing page):
@@ -316,30 +324,30 @@ Compiled shape for the navigation example:
 ```json
 {
-  "args": ["batch"],
+  "args": ["batch", "--bail"],
   "stdin": "[[\"open\",\"https://shop.example/checkout\"],[\"fill\",\"#email\",\"user@example.com\"],[\"click\",\"#continue\"],[\"wait\",\"--url\",\"**/shipping\"],[\"wait\",\"--text\",\"Shipping address\"],[\"screenshot\",\".dogfood/shipping.png\"]]"
 }
 ```
-On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
+On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it. `click` and `fill` steps may use either `selector` or semantic locator fields. A semantic locator step mirrors the top-level `semanticAction` compiler and must include `locator` plus the locator value (`role`/`value` and optional `name` for role locators); it compiles to upstream `find`, for example `{ "action": "fill", "locator": "role", "role": "searchbox", "name": "Search", "text": "agent browser" }` → `find role searchbox fill "agent browser" --name Search` and `{ "action": "click", "locator": "role", "role": "button", "name": "Search" }` → `find role button click --name Search`. Supplying both `selector` and semantic locator fields is rejected.
-Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, flags, stdin forms, or failure policies outside this constrained schema.
+Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, flags, stdin forms, or failure policies outside this constrained schema. `job.failFast: false` keeps the constrained schema but removes `--bail` when collecting later diagnostic artifacts is more important than stopping on the first failure.
-Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill`/`select` selectors after an `open`, non-form `click`, or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you. Multiple same-snapshot `fill @e…` rows may run before the first click/submit-style step. Raw `args:["batch"]` stdin can also batch native form-control rows (`check`/`uncheck` checkbox or radio refs and `select` combobox refs) before that click.
+Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill`/`select` selectors after an `open`, non-form `click`, or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary emits a `snapshot` step only when you include `{ action: "snapshot" }` explicitly. Multiple same-snapshot `fill @e…` rows may run before the first click/submit-style step. Raw `args:["batch"]` stdin can also batch native form-control rows (`check`/`uncheck` checkbox or radio refs and `select` combobox refs) before that click.
 ### `qa`
 - type: object with either required `url` (normal URL-opening QA) or `attached: true` (current attached-session QA)
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, `networkSourceLookup`, and `electron`
-- lightweight preset built on the same batch compiler path as `job`
-- URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <state>` using the resolved `loadState`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
-- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`. Before spawning the diagnostic batch, the wrapper preflights the attached session: `get url` must succeed and return an `http:` or `https:` page URL. Missing URLs, read failures, and non-http(s) surfaces fail fast with `failureCategory: "validation-error"`, `details.validationError`, and recovery `nextActions` such as `list-tabs-before-qa-attached` and `snapshot-before-qa-attached` instead of running the full QA batch.
+- lightweight preset built on the same batch compiler path as `job`, using `batch --bail` so missing readiness/text/selector assertions stop before slower diagnostics can burn the wrapper watchdog
+- URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <state>` using the resolved `loadState`, optionally asserts `expectedText` (string or string array, compiled to bounded visible-text `wait --fn … --timeout 5000` predicates after load) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors` only if preceding batch steps pass
+- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`. Before spawning the diagnostic batch, the wrapper preflights the attached session: `get url` must succeed and return an `http:` or `https:` page URL. Missing URLs, read failures, and non-http(s) surfaces fail fast with `failureCategory: "validation-error"`, `details.validationError`, and recovery `nextActions` such as `list-tabs-before-qa-attached` and `snapshot-before-qa-attached` instead of running the full QA batch. Attached QA does **not** run `network requests --clear`, `console --clear`, or `errors --clear`; `details.compiledQaPreset.checks.diagnosticsResetAtStart` is `false`. Visible text warns that existing diagnostic buffers were preserved only when `checkNetwork`, `checkConsole`, or `checkErrors` is enabled, and those diagnostics may include events from before the QA check.
 - `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
-- `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
+- `checkNetwork`, `checkConsole`, and `checkErrors` default to `true` for URL-opening QA; for `qa.attached` they default to `false` because preserved upstream buffers may predate the current check. Set a field to `true` on `qa.attached` to opt into preserved-buffer diagnostics.
 - optional `screenshotPath` adds an evidence screenshot step
 - reports `details.compiledQaPreset` with the compiled batch plan and resolved `loadState`, plus `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
 - on success with no failed checks, model-visible prose collapses to a compact pass summary (current page URL/title when known, checks run, optional screenshot path plus artifact verification, pointer to `details.qaPreset` and `details.batchSteps` for the full matrix). Failed QA and QA reclassified to `qa-failure` keep the verbose per-step batch output.
-- fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/network.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
+- fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, any batch step failure, or missing expected text after the requested load state. Expected-text QA checks use bounded visible-element predicates plus fail-fast batch behavior instead of reading all body text, so dense pages can pass on visible headings/copy and missing text reports `expected text not found: …` rather than burning the full wrapper watchdog. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/network.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
 Example:
@@ -445,7 +453,7 @@ Failure categories and next actions:
 - `policy-blocked` is used when `electron.launch` is blocked by caller-supplied `allow` / `deny`; inspect `details.electron.failure.policy` for the matched list and entry when present.
 - `cleanup-failed` is used when `electron.cleanup` only partially cleans tracked resources; inspect `details.electron.cleanup.results[].steps` for remaining process, port, or profile cleanup state.
 - Launch timeout maps to `timeout`; non-Electron targets and input issues map to `validation-error`; launch/attach/spawn/CDP failures map to `upstream-error` unless a more specific category applies. If a successful-looking Electron mutation is followed by a dead process/debug port or an unrecoverable `about:blank` target, the wrapper upgrades the result to `failureCategory: "tab-drift"` and sets `details.electronPostCommandHealth` with the launch status and recovery action ids.
-- Successful Electron `fill <selector> <text>` commands may run a read-only `get value <selector>` verification. If the value still differs, the wrapper keeps the tool successful but adds `details.fillVerification`, visible guidance to use snapshot/focus/keyboard typing for custom quick-input controls, and `inspect-after-fill-verification` / `verify-filled-value` next actions.
+- Successful Electron `fill <selector> <text>` commands may run a read-only `get value <selector>` verification, and successful fills against refs whose latest snapshot metadata proves a contenteditable target may run `get text <ref>` verification. If the value/text still differs, the wrapper keeps the tool successful but adds `details.fillVerification`, visible guidance to use snapshot/focus/keyboard typing for custom quick-input or rich editor controls, and `inspect-after-fill-verification` / `verify-filled-value` next actions.
 - Successful active launches/status/probe results may include exact `details.nextActions` with ids `status-electron-launch`, `probe-electron-launch`, `cleanup-electron-launch`, `list-electron-tabs`, and `snapshot-electron-session`. Electron status/probe mismatch diagnostics may also include `reattach-electron-launch` before fresh tab/snapshot inspection. Electron post-command health failures include status/probe/cleanup actions for the same `launchId`; Electron `@e…` mutations may add `refresh-electron-refs-after-rerender` because desktop apps often rerender without URL changes. Electron cleanup partial failures may include `status-electron-launch` and `retry-electron-cleanup`.
 Next-action payload examples:
@@ -557,6 +565,34 @@ For `eval --stdin`, put the script in the top-level `stdin` field. The wrapper n
 { "args": ["auth", "save", "my-login", "--password-stdin"], "stdin": "password from the user-approved secret source" }
 ```
+### `outputPath`
+- type: `string`
+- optional; can be used with any successful browser result path, most often `eval --stdin`, `get text`, `get html`, `snapshot`, or diagnostic captures whose result should become a durable local file
+- workspace-relative paths resolve against the Pi session cwd; absolute paths are used as-is; a leading `@` is stripped for consistency with Pi file arguments
+- after the upstream command completes, the wrapper writes `details.data` when present, otherwise the model-facing text content; objects/arrays are written as pretty JSON with a trailing newline and strings are written as-is
+- successful writes append `details.outputFile = { status: "saved", path, absolutePath, source, bytes }`; they also append a visible `Output file: …` line except when the caller explicitly passed upstream `--json`, where parseable JSON content is preserved and the saved-file notice lives only in `details.outputFile`. Write failures append `details.outputFile.status: "failed"`, remove success-only category fields, and mark the tool result failed without rolling back browser session state.
+Example:
+```json
+{ "args": ["eval", "--stdin"], "stdin": "({ title: document.title, url: location.href })", "outputPath": "logs/page-state.json" }
+```
+### `timeoutMs`
+- type: positive integer milliseconds
+- optional per-call wrapper subprocess watchdog for browser CLI calls (`args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`); Electron actions use nested `electron.timeoutMs` instead
+- use for long opens, large snapshots, paced `job` typing, or captures that legitimately need more than the default watchdog
+- fixed `wait` steps still must stay under the upstream IPC wait budget; increasing `timeoutMs` does not make `wait 30000` valid
+- when the watchdog fires, `details.timeoutMs`, `details.timedOut`, and possibly `details.timeoutPartialProgress` explain what was recovered
+Example:
+```json
+{ "args": ["open", "https://slow.example.test/"], "timeoutMs": 45000 }
+```
 ### `sessionMode`
 - type: `"auto" | "fresh"`
@@ -607,8 +643,9 @@ Primary content should be:
 - useful result text for the model, not just a status line
 - an image attachment when relevant
 - browser-aware compacting for oversized snapshots so the model gets a concise actionable view before raw page noise
-- compact snapshots should be main-content-first: prefer the primary content block and nearby sections over top-of-page chrome, ads, or unrelated sidebars when those can be distinguished from the snapshot tree
-- when compacting hides actionable controls, snapshot output should add an `Omitted high-value controls` section for bounded editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and other useful controls such as checkboxes, radios, options, and menuitems that were not already shown in key refs
+- compact snapshots should be main-content-first: prefer the primary content block and nearby sections over top-of-page chrome, ads, or unrelated sidebars when those can be distinguished from the snapshot tree. They are DOM/signal-prioritized, not guaranteed viewport-first after scroll; compact output may include `details.snapshotCompaction.viewportOrdering: "dom-signal-prioritized"` and a visible viewport note when viewport context matters.
+- when compacting hides actionable controls, snapshot output should add an `Omitted high-value controls` section for bounded editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, high-signal named links such as repository search results, and other useful controls such as checkboxes, radios, options, and menuitems that were not already shown in key refs
+- wrapper-side `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` filters should strip those wrapper-only flags before upstream spawn, preserve the full latest ref map in `details.refSnapshot`, and render only matching snapshot refs/lines with `details.snapshotFilter` counts so dense-page agents can find controls without opening raw spill files; wrapper-side `--viewport` should also strip before upstream spawn, run one read-only viewport/scroll probe, and report `details.snapshotViewport`; wrapper-side `--diff` should strip before upstream spawn and report `details.snapshotDiff` against the previous wrapper-tracked ref map for that session
 Examples:
 - small `snapshot` results should include the actual snapshot text
@@ -663,15 +700,15 @@ Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbati
 Ref preflight details (command taxonomy in `extensions/agent-browser/lib/command-taxonomy.ts`, orchestration in `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`):
 - **URL alignment:** `refSnapshot.target.url` and the session’s current tab URL are compared via `targetsMatch` / `normalizeComparableUrl` in `extensions/agent-browser/index.ts`: values are trimmed, parsed as URLs when possible, compared **after dropping the `#fragment`**, and the query string remains significant. If either side lacks a `url`, `targetsMatch` treats the pair as matching so early-session calls are not blocked.
-- **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token satisfies `isRefInvalidatingBatchCommand` sets a latch that blocks later steps whose first token satisfies `isRefGuardedCommand` and that mention `@e…` refs, except for same-snapshot native form-control steps whose current snapshot role metadata identifies all refs as safe controls (`check`/`uncheck` on checkbox or radio refs and `select` on combobox refs). A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). These predicates read explicit command capability flags from `command-taxonomy.ts`: navigation/mutation verbs such as `open` / `goto`, `reload`, non-form `click`, and related upstream commands have `invalidatesBatchRefs`; same-snapshot `fill` rows and the role-checked native form-control rows stay guarded against missing/stale refs but do not set the latch, allowing ordinary form batches before a click/submit step; direct `click @e…` remains invalidating even when the snapshot role is checkbox or radio because role metadata alone does not prove native element semantics. Ref-guarded commands accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview` / `scrollinto`, and others centralized in the command taxonomy). Changing either capability requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
+- **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token satisfies `isRefInvalidatingBatchCommand` sets a latch that blocks later steps whose first token satisfies `isRefGuardedCommand` and that mention `@e…` refs, except for same-snapshot native form-control steps whose current snapshot role metadata identifies all refs as safe controls (`check`/`uncheck` or direct `click`/`tap` on checkbox or radio refs, and `select` on combobox refs). A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). These predicates read explicit command capability flags from `command-taxonomy.ts`: navigation/mutation verbs such as `open` / `goto`, `reload`, non-form `click`, and related upstream commands have `invalidatesBatchRefs`; same-snapshot `fill` rows and the role-checked native form-control rows stay guarded against missing/stale refs but do not set the latch, allowing ordinary form batches before a final click/submit step. Direct `click`/`tap @e…` is only treated as a safe form-control row when every ref in that step is a latest-snapshot checkbox or radio; other click/tap refs remain invalidating. Ref-guarded commands accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview` / `scrollinto`, and others centralized in the command taxonomy). Changing either capability requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
 **Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation/diagnostics.ts`: `cookies` still walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"`; `storage` redacts values when the key or value looks credential-like (token, cookie, auth, secret, JWT, bearer/basic credential, high-entropy token-like string, or nested sensitive JSON) but keeps low-risk primitive QA values such as booleans, numbers, and short strings visible. Redacted storage entries add `valueRedacted` plus `valueRedactionReason` in `details.data`; diagnostic formatters mirror the same decision. Every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
-`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`, optional `electron`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Tab/session recovery id strings are centralized in `AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS`, while rich-input focus/click recovery ids are centralized in `AGENT_BROWSER_RICH_INPUT_RECOVERY_NEXT_ACTION_IDS` plus `getAgentBrowserRichInputRecoveryNextActionId(s)` in `extensions/agent-browser/lib/results/recovery-actions.ts` (both registries are also re-exported from `shared.ts`); docs and tests mirror those registries/helpers rather than inventing recovery ids in prose. Current recommendations include: raw `connect` success → session-scoped `list-connected-session-tabs` only, then the agent should inspect/select a stable `tab t<N>` target and run `snapshot -i` explicitly; `snapshot` failures whose upstream error says `No active page` and whose wrapper result has a known session → `list-tabs-after-no-active-page` only, because this path has no wrapper-observed safe tab id to select atomically; browser profile/user-data-dir resolution failures → `inspect-browser-profiles` (`profiles`) and `run-agent-browser-doctor` (`doctor`) before retrying opens; Electron launches → wrapper-tracked `electron.status` / `electron.probe` / `electron.cleanup` actions plus session-scoped tab/snapshot inspection when attached; Electron status/probe mismatch diagnostics → `reattach-electron-launch` plus fresh tab/snapshot inspection; Electron post-command health failures → status/probe/cleanup for the same `launchId`; Electron fill verification mismatches → `inspect-after-fill-verification` and `verify-filled-value`; Electron same-URL ref freshness warnings → `refresh-electron-refs-after-rerender`; packaged-Electron `sourceLookup` no-candidate diagnostics → session snapshot, launch probe, and tab list; Electron cleanup partial failures → status plus retry-cleanup for the same wrapper-owned `launchId`; `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N` for non-fill targets; semantic `fill` selector misses with exact current editable refs → `focus-current-editable-ref` / `click-current-editable-ref` or numbered variants that do not include fill text or submit; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-button-name-candidate` or `try-link-name-candidate` after presentation `nextActions` only for the bounded click pair enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL non-Electron top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; generic tab drift → `list-tabs-for-recovery` with `tab list` first, then select or confirm the stable target before running `snapshot -i`; about:blank or tab-drift recovery with a wrapper-known target → `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, plus `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the wrapper already observed the stable `t<N>` tab id; `wait --text` assertion failures → `inspect-after-text-assertion-failure` with a read-only snapshot; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
+`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`, optional `electron`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Tab/session recovery id strings are centralized in `AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS`, while rich-input focus/click recovery ids are centralized in `AGENT_BROWSER_RICH_INPUT_RECOVERY_NEXT_ACTION_IDS` plus `getAgentBrowserRichInputRecoveryNextActionId(s)` in `extensions/agent-browser/lib/results/recovery-actions.ts` (both registries are also re-exported from `shared.ts`); docs and tests mirror those registries/helpers rather than inventing recovery ids in prose. Current recommendations include: raw `connect` success → session-scoped `list-connected-session-tabs` only, then the agent should inspect/select a stable `tab t<N>` target and run `snapshot -i` explicitly; `snapshot` failures whose upstream error says `No active page` and whose wrapper result has a known session → `list-tabs-after-no-active-page` only, because this path has no wrapper-observed safe tab id to select atomically; browser profile/user-data-dir resolution failures → `inspect-browser-profiles` (`profiles`) and `run-agent-browser-doctor` (`doctor`) before retrying opens; Electron launches → wrapper-tracked `electron.status` / `electron.probe` / `electron.cleanup` actions plus session-scoped tab/snapshot inspection when attached; Electron status/probe mismatch diagnostics → `reattach-electron-launch` plus fresh tab/snapshot inspection; Electron post-command health failures → status/probe/cleanup for the same `launchId`; Electron or contenteditable fill verification mismatches → `inspect-after-fill-verification` and `verify-filled-value`; Electron same-URL ref freshness warnings → `refresh-electron-refs-after-rerender`; packaged-Electron `sourceLookup` no-candidate diagnostics → session snapshot, launch probe, and tab list; Electron cleanup partial failures → status plus retry-cleanup for the same wrapper-owned `launchId`; `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N` for non-fill targets; semantic `fill` selector misses with exact current editable refs → `focus-current-editable-ref` / `click-current-editable-ref` or numbered variants that do not include fill text or submit; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-button-name-candidate` or `try-link-name-candidate` after presentation `nextActions` only for the bounded click pair enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; successful snapshots or qualifying same-URL non-Electron top-level clicks (see `overlayBlockers` below) with snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; generic tab drift → `list-tabs-for-recovery` with `tab list` first, then select or confirm the stable target before running `snapshot -i`; about:blank or tab-drift recovery with a wrapper-known target → `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, plus `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the wrapper already observed the stable `t<N>` tab id; `wait --text` assertion failures → `inspect-after-text-assertion-failure` with a read-only snapshot; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
 **Unknown-command getter hints (failure presentation):** `buildErrorPresentation` in `extensions/agent-browser/lib/results/presentation/errors.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
-For `network requests`, `details.nextActions` is bounded to one selected safe request ID, preferring actionable failed rows, then API/fetch-like rows, then benign failed rows, then the first request with a safe ID. Detail/filter/HAR actions use `params.args` and preserve a known `--session <name>` prefix when the current presentation has `details.sessionName`; source-candidate actions use `params.networkSourceLookup` with the selected `requestId` plus `session` when known and are only emitted for actionable failed rows that the failed-request analyzer can correlate. URLs and query strings are not copied into action params; path filters are skipped when they look sensitive or too large. If the wrapper has observed `network route` in the same session, matching pending fetch/XHR rows or CORS-looking errors also add `details.networkRouteDiagnostics[]` with `{ reason, routePattern, mode, requestId?, requestUrl?, summary }` and prepend executable route-mock next actions (`inspect-pending-routed-network-request`, `start-network-har-capture-for-route-mock`) before generic request follow-ups; same-origin/CORS fixture retry guidance stays in visible prose. The route tracker is wrapper-session-local, updated on successful `network route`/`network unroute`, and cleared when that session closes or is replaced.
+For `network requests`, `details.nextActions` is bounded to one selected safe request ID, preferring actionable failed rows, then API/fetch-like rows, then benign failed rows, then the first request with a safe ID. Detail/filter/HAR actions use `params.args` and preserve a known `--session <name>` prefix when the current presentation has `details.sessionName`; source-candidate actions use `params.networkSourceLookup` with the selected `requestId` plus `session` when known and are only emitted for actionable failed rows that the failed-request analyzer can correlate. A `clear-network-requests-before-repro` action can run `network requests --clear` so the next reproduction starts with a current-page-focused diagnostic buffer. Wrapper-side `network requests --current-page` / `--current-origin` renders only rows whose URL matches the active page origin, while `--current-url` renders exact active-document URL matches; those wrapper-only flags are removed before upstream spawn and reported in `details.networkRequestsPageFilter`. URLs and query strings are not copied into action params; path filters are skipped when they look sensitive or too large. If the wrapper has observed `network route` in the same session, matching failed, pending, or CORS-looking fetch/XHR rows also add `details.networkRouteDiagnostics[]` with `{ reason, routePattern, mode, requestId?, requestUrl?, summary }` and prepend executable route-mock next actions (`inspect-routed-network-request`, `start-network-har-capture-for-route-mock`) before generic request follow-ups; same-origin/CORS fixture retry guidance stays in visible prose. The route tracker is wrapper-session-local, updated on successful `network route`/`network unroute`, and cleared when that session closes or is replaced.
 For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
@@ -679,9 +716,9 @@ For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that
 `clickDispatch` may appear after a **top-level non-Electron** `click` when the wrapper installed a pre-click DOM-event probe, upstream reported success, and the post-click probe found no trusted DOM event reached that target. The wrapper probes exact CSS selectors, XPath selectors, and `@e…` refs when the latest wrapper-tracked snapshot has role/name metadata that resolves to one visible DOM candidate; it does **not** take a fresh pre-click snapshot because that could recycle upstream refs before the intended click. The wrapper does **not** replay clicks in-page. On a miss it marks the tool failed, appends `Click dispatch diagnostic: …`, and sets `clickDispatch.status` to `"no-native-event-observed"` with `reason: "native-click-produced-no-target-dom-event"`, `nativeEventCount`, and a redacted `target` descriptor (`kind: "selector" | "xpath"` plus `selector`, or `kind: "accessible"` plus `refId`, `role`, and redacted `name`). `details.nextActions` gains `inspect-click-dispatch-miss` (`snapshot -i`) and `retry-click-after-dispatch-miss` (same upstream `click` argv, session-prefixed when applicable). If a local static fixture must be exercised despite this diagnostic, a caller may explicitly run a programmatic activation via `eval --stdin` such as `document.querySelector(...).click()`, but that emits an untrusted scripted event and is only a debugging/workaround path; it must not be used as proof that real user-like clicking works or to bypass prompt stop boundaries. This diagnostic is only for standalone `click`; `batch`/`job`/`qa` click steps remain upstream-owned batch behavior.
-`promptGuard` may appear on wrapper-blocked calls when the latest user prompt contains machine-recognizable safety or evidence requirements. `reason: "explicit-user-stop-boundary"` is a **best-effort click-like and Enter/Return keypress guard**, not a complete stop-boundary enforcer: it blocks likely final order/payment/submit targets (for example `Finish`, `place order`, `submit payment`, or matching selectors/`@ref` metadata) on standalone `click`/`dblclick`/`tap`, `find … click`, and matching `batch` steps, and it blocks standalone or batch `press <key>` / `key <key>` when the key is Enter or Return (keypress submits do not require a final-action label match). It does **not** block `eval`, generic `fill`/`type`/`select`, `keyboard type`/`keyboard inserttext`, non-Enter keypresses, or other scripted activation. Implementation: `extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts` plus `prompt-guards.ts`. `reason: "requested-artifacts-missing-before-close"` blocks `close` / `quit` / `exit` when the prompt named exact required screenshot paths and the session artifact manifest has not verified those paths; optional recording paths are only required when recording appears available. Both guards return `failureCategory: "policy-blocked"` and `validationError` text instead of invoking upstream.
+`promptGuard` may appear on wrapper-blocked calls only for concrete, machine-checkable prompt requirements. `reason: "requested-artifacts-missing-before-close"` blocks `close` / `quit` / `exit` when the prompt named exact required screenshot paths and the session artifact manifest has not verified those paths; optional recording paths are only required when recording appears available. The wrapper does **not** parse broad user/business intent such as “do not place the order” or “do not post anything” into click/key blocks; agents must follow those instructions themselves. Prompt guards return `failureCategory: "policy-blocked"` and `validationError` text instead of invoking upstream.
-`overlayBlockers` may appear after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, no `clickDispatch` diagnostic fired for the same result, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
+`overlayBlockers` may appear on a successful `snapshot` whose own refs show strong modal context, or after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, no `clickDispatch` diagnostic fired for the same result, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. For post-click diagnostics, the wrapper then issues **one** extra session-scoped `snapshot -i`; for snapshot diagnostics, it scans that snapshot result directly. It only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence describing the snapshot/click evidence and likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
 Example shape (fields vary by scenario):
@@ -739,8 +776,8 @@ Implementation and precedence:
 Additional structured fields can appear when relevant:
 - `compiledSemanticAction` when the call used `semanticAction` and the result includes the unified `details` merge: `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with the same redaction rules as `args` / `effectiveArgs`; omitted for plain `args`/`job` calls and omitted on some early error returns that omit this field (see the `semanticAction` section above)
-- `compiledJob` when the call used `job` or the job-backed `qa` preset: `{ args: ["batch"], stdin, steps: [{ action, args }] }`, with step args redacted the same way as other invocation details
-- `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object. `checks.attached` is `true` for current-session QA and `checks.url` is present only for URL-opening QA.
+- `compiledJob` when the call used `job` or the job-backed `qa` preset: `{ args: ["batch"], stdin, steps: [{ action, args }] }`, with step args redacted the same way as other invocation details. Semantic `job` click/fill steps appear here as their compiled upstream `find … click|fill …` argv, not as the input object.
+- `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object. `args` is `batch --bail` and `failFast` is `true` for QA presets. `checks.attached` is `true` for current-session QA, `checks.url` is present only for URL-opening QA, and `checks.diagnosticsResetAtStart` is `true` only for URL-opening QA because `qa.attached` preserves existing session diagnostics.
 - `compiledSourceLookup` when the call used `sourceLookup`: `{ args: ["batch"], stdin, steps, query }` with the generated local-evidence plan and original query fields (`selector?`, `reactFiberId?`, `componentName?`, `includeDomHints?`, `maxWorkspaceFiles?`).
 - `sourceLookup` when the call used `sourceLookup`: `{ status, candidates, limitations, summary, workspaceRoot?, electronContext? }`; wrapper-tracked packaged Electron no-candidate diagnostics may carry `workspaceRoot` plus `electronContext` and live Electron nextActions without marking the successful batch as a tool failure.
 - `compiledNetworkSourceLookup` / `networkSourceLookup` when the call used `networkSourceLookup`: the generated batch plan plus bounded failed-request/candidate evidence as described above.
@@ -752,25 +789,29 @@ Additional structured fields can appear when relevant:
 - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
 - `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
 - `clickDispatch` when a top-level non-Electron `click` reported upstream success but the wrapper’s DOM-event probe found no trusted event reached the target; shape follows `ClickDispatchDiagnostic` in `extensions/agent-browser/lib/orchestration/browser-run/types.ts`
-- `promptGuard` when a prompt-derived guard blocks a likely final order/submit click or blocks browser close before required prompt artifact paths are verified; implementation lives in `extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts`
-- `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL, no `clickDispatch` diagnostic fired, and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
+- `promptGuard` when the requested-artifact-before-close guard blocks browser close before required prompt artifact paths are verified; implementation lives in `extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts`
+- `overlayBlockers` for conservative overlay/banner/dialog blocker candidates when a successful snapshot itself contains strong modal evidence, or after a direct click stays on the same URL, no `clickDispatch` diagnostic fired, and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
 - `visibleRefFallback` after a raw `find` or compiled `semanticAction` fails with `selector-not-found` and a fresh snapshot finds exact role/name `@ref` matches. Shape follows `VisibleRefFallbackDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, snapshot, summary, target }`, where each candidate has `ref`, `role`, `name`, optional direct ref `args`, and `reason`; visible text appends `Current snapshot ref fallback`. Non-fill candidates with direct args add `try-current-visible-ref` or numbered `try-current-visible-ref-N` actions. Fill candidates omit direct args and target text so recovery details do not repeat potentially sensitive fill text.
 - `refSnapshotInvalidation` after a session `snapshot` fails with `No active page`. Shape follows `SessionRefSnapshotInvalidation` in `extensions/agent-browser/lib/session-page-state.ts`: `{ reason: "no-active-page", summary }`. The wrapper deletes prior refs for that session, persists the invalidation for resume, and blocks mutation-prone `@e…` preflight with `failureCategory: "stale-ref"` until a successful fresh `snapshot -i` records refs again.
+- `snapshotFilter` after wrapper-side `snapshot -i --search <text>` or `snapshot -i --filter role=<role>`. Shape: `{ cleanArgs, search?, role?, matchedRefs, totalRefs, visibleLines, totalLines }`. The visible snapshot content is filtered, but `details.refSnapshot` still records the full upstream ref map for later stale-ref checks.
+- `snapshotViewport` after wrapper-side `snapshot --viewport` (with or without `-i`, `--search`, or `--filter`). Shape matches the scroll-position probe: viewport scroll offsets, inner/document dimensions, sampled scrollable-container count, and bounded container offsets. The wrapper strips `--viewport` before upstream spawn and gathers this with a read-only `eval --stdin` call.
+- `snapshotDiff` after wrapper-side `snapshot --diff` (with or without `-i`, `--search`, `--filter`, or `--viewport`). Shape: `{ addedRefs, removedRefs, changedRefs, unchangedRefs, summary }`, comparing ref ids plus role/name metadata from the previous wrapper-tracked snapshot for the session with the newly returned full ref map. It is a quick ref-map delta, not a visual diff.
+- `networkRequestsPageFilter` after wrapper-side `network requests --current-page`, `--current-origin`, or `--current-url`. Shape: `{ cleanArgs, currentUrl, mode, matchedRows, totalRows }`; the visible rows and `details.data.requests` / `items` / `entries` are filtered while the active session page target is read with `get url`.
 - `richInputRecovery` after a raw `find` or compiled `semanticAction` `fill` fails with `selector-not-found` and the same current-ref diagnostic finds exact editable `searchbox` / `textbox` candidates. Shape follows `RichInputRecoveryDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, inputMethodHint, nextActionIds, summary, target }`, where each candidate has `ref`, `role`, `name`, `focusArgs`, `clickArgs`, and `reason`. Visible text appends `Rich input recovery`, and `details.nextActions` gains ids from `getAgentBrowserRichInputRecoveryNextActionIds`: `focus-current-editable-ref` / `click-current-editable-ref` (or numbered variants). These actions are bounded to focus/click/inspect-style recovery: they do not include the fill text, do not press `Enter`, and do not submit. After the right current editable ref is focused, the agent should use `keyboard inserttext` or `keyboard type` with the intended text in a separate call and submit only when explicitly required by the flow.
-- `scrollNoop` after a successful **top-level** `scroll` when wrapper-side read-only probes before and after the command show no change in `window.scrollX` / `window.scrollY` and no change in the sampled prominent scrollable containers. To avoid pre-launching a session without caller startup state, this probe is skipped when the invocation includes startup-scoped flags such as `--profile`, `--state`, `--session-name`, `--cdp`, providers, init scripts, or similar launch settings. Shape: `{ reason: "no-observed-scroll-position-change", message, before, after, recommendations }`; `before` / `after` include viewport dimensions, document scroll dimensions, and up to ten sampled container descriptors plus scroll offsets. Container descriptors use only sample index, tag name, and ARIA role; DOM ids/classes are intentionally not stored. This diagnostic is conservative evidence that the page-level scroll likely missed a nested pane, not proof that every app-specific region is unchanged. Visible text appends `Scroll diagnostic: no observed scroll movement`, and `details.nextActions` gains `inspect-after-noop-scroll` (`snapshot -i`) plus `verify-noop-scroll-visually` (`screenshot`), session-prefixed when applicable.
+- `scrollNoop` after a successful **top-level** `scroll` when wrapper-side read-only probes before and after the command show no change in `window.scrollX` / `window.scrollY` and no change in the sampled prominent scrollable containers. To avoid pre-launching a session without caller startup state, this probe is skipped when the invocation includes startup-scoped flags such as `--profile`, `--state`, `--session-name`, `--cdp`, providers, init scripts, or similar launch settings. Shape: `{ reason: "no-observed-scroll-position-change", message, before, after, recommendations }`; `before` / `after` include viewport dimensions, document scroll dimensions, and up to ten sampled container descriptors plus scroll offsets. Container descriptors use only sample index, tag name, and ARIA role; DOM ids/classes are intentionally not stored. This diagnostic is conservative evidence that the page-level scroll likely missed a nested pane, not proof that every app-specific region is unchanged. Visible text starts with `Scroll completed with no observed movement`, appends `Scroll diagnostic: no observed scroll movement`, sets `details.data.scrolled` to `false` / `details.data.noMovement` to `true`, and `details.nextActions` gains `inspect-after-noop-scroll` (`snapshot -i`) plus `verify-noop-scroll-visually` (`screenshot`), session-prefixed when applicable. Explicit CSS-container calls of the form `scroll <selector> <up|down|left|right> [px|percent]` are handled before upstream page scroll and report `details.scrollContainer` with before/after offsets. Explicit page-position calls `scroll to end` / `scroll to top` are handled against `document.scrollingElement` before upstream page scroll and report `details.scrollPage` with before/after offsets.
 - `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
 - `recordingDependencyWarning` after a successful `record start` or `record restart` when the wrapper cannot find an executable `ffmpeg` on the Pi process `PATH`. Shape: `{ reason: "ffmpeg-missing-for-recording", dependency: "ffmpeg", command, message, recommendations }`. Visible text appends `Recording dependency warning: ffmpeg not found on PATH`. This is a non-blocking preflight warning: upstream may start recording, but `record stop` needs `ffmpeg` to encode the WebM.
 - `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), may include a redacted `firstVisibleTextPreview`, and may include up to eight `visibleCandidates` entries (`index` in `querySelectorAll`, `tagName`, optional `role`, optional redacted `textPreview`). It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended visible warning text names the matching `details.nextActions` id and may list visible candidate previews. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector. If the probe still leaves more than one visible candidate, it is only ambiguity evidence; agents should narrow the selector, use a current visible `@ref`, or run a targeted visible-element `eval --stdin` rather than trusting the broad selector.
 - `electronGetTextScopeWarning` after a successful wrapper-tracked attached Electron `get text <selector>` (standalone or successful `batch`) when a broad non-ref CSS selector such as `body`, `html`, `main`, `div`, or `[role=application]` may read the whole app shell. Ordinary browser pages, including `file://` fixtures, do not qualify without wrapper-owned Electron launch provenance. Shape: `{ selector, summary, electronContext: { launchId?, sessionName?, url? } }`; multiple batched diagnostics use `electronGetTextScopeWarnings`. Visible text appends `Broad Electron get text selector warning`, and next actions use `snapshot-for-electron-text-scope` ids with session-scoped `snapshot -i` payloads.
 - `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (in `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is a plain empty object (`{}`). Arrays such as `[]` do not qualify. It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data. Before this diagnostic path runs, the wrapper also recovers the common malformed native-tool call `args: ["eval", "--stdin", "..."]` with no top-level `stdin` by moving trailing `args` tokens after `--stdin` into the process stdin stream.
 - `evalResultWarning` after a successful `eval --stdin` when the current or prior page URL is `file:` (from navigation summary, session tab target, or persisted session page state), upstream JSON `data.result` is strictly `null`, and stdin is non-empty and not a trivial literal `null`/`undefined`. Fields: `reason`, `suggestion`. Visible output appends `Eval result warning` without failing the tool. Use snapshot -i, ref-based getters, screenshots, or http(s) fixtures when file:// null results are inconclusive.
-- `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
-- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session`, and also when a successful explicit `--session <current-wrapper-managed-session> close` closes the current managed session. It remains omitted for unrelated explicit user-managed sessions and for sessionless inspection/local paths that skip injection. Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary` (machine-oriented; may include generated session names). Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends action-oriented `Managed session outcome` and `Recovery` lines without repeating generated session ids in visible prose; session names remain in `details.managedSessionOutcome`. Failed fresh launches may also append `details.nextActions` such as `run-agent-browser-doctor`, `verify-current-managed-session`, `snapshot-current-managed-session`, or `retry-fresh-managed-session`. When other trailing diagnostic prose is also emitted in the same result, that block is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, eval hints/warnings, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/lib/orchestration/browser-run/final-result.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`). In that case the visible recovery says the fresh launch became current and points to `failureCategory` / `qaPreset` / `batchFailure` for the post-launch failure.
+- `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting recovered planned-step state and declared artifact paths, plus whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed). Generated rows such as `open.loadState` waits may include `generatedFrom`. Each step includes `status` (`completed`, `failed`, `pending`, or `unknown`) and optional `reason`; the first incomplete step becomes `retryStep`, but `retry.args` and top-level `retry-timeout-step` are emitted only for read-only or idempotent commands such as waits, snapshots, screenshots, navigation, and diagnostics. Mutating steps such as clicks, fills, keyboard typing, presses, selects, or checks are still identified as the first incomplete step but omit executable retry args because they may already have run. When a retryable step timed out during `sessionMode: "fresh"` and no live URL was recovered, `retry-timeout-step` uses top-level `sessionMode: "fresh"` instead of prefixing the abandoned generated session name. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan; `liveUrlRecovered` is true only when the wrapper recovered a live URL, so planned URLs are not treated as proof that the page actually opened. `openedButPostOpenTimedOut` is set when a live opened page was recovered and a later step appears to have timed out. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, `state`, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
+- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session`, and also when a successful explicit `--session <current-wrapper-managed-session> close` closes the current managed session. It remains omitted for unrelated explicit user-managed sessions and for sessionless inspection/local paths that skip injection. Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary` (machine-oriented; may include generated session names). Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends action-oriented `Managed session outcome` and `Recovery` lines without repeating generated session ids in visible prose; session names remain in `details.managedSessionOutcome`. Failed fresh launches may also append `details.nextActions` such as `run-agent-browser-doctor`, `verify-current-managed-session`, `snapshot-current-managed-session`, or `retry-fresh-managed-session`. When other trailing diagnostic prose is also emitted in the same result, that block is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, eval hints/warnings, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/lib/orchestration/browser-run/final-result.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, or a fresh `job`/batch opens the requested page and then a later step fails, the managed-session transition still reflects that the fresh browser became current. The visible recovery says the fresh launch became current and points to `failureCategory` / `qaPreset` / `batchFailure` for the post-launch failure instead of telling the agent that the old session was preserved.
 - `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation/artifacts.ts`).
-- `artifacts` for upstream saved files such as screenshots, `state save` outputs, `diff screenshot` diff images, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`/`artifactType`, optional `mediaType`, optional `extension`, best-effort disk metadata such as `exists` and `sizeBytes`, plus `requestedPath`, `status`, `cwd`, `session`, and `tempPath` when applicable.
-- `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
-- `batchSteps[].artifacts` for per-step artifacts in `batch` output; top-level `artifacts` aggregates all step artifacts in order
-- `artifactVerification` for a normalized verification summary on the unified result and on each successful `batchSteps[]` row (failed batch steps omit artifact rows). Top-level `batch` verification rolls up all step file artifacts; each step’s summary reflects that step’s nested tool presentation (including its spill paths and manifest slice). It reports `verified`, `verifiedCount`, `missingCount`, `pendingCount`, `unverifiedCount`, and `artifacts[]` entries with `path`, optional `absolutePath`, optional `requestedPath`, `kind` (a normal file artifact kind or `"spill"` for manifest-backed rows), optional `mediaType`, optional `exists`, optional `sizeBytes`, optional `status`, optional `retentionState` / `storageScope` on manifest-derived rows, `state` (`verified`, `missing`, `pending`, or `unverified`), and optional `limitation` (human-readable lifecycle or retention context, for example pending `record start`, missing or unverified files, ephemeral spill files, or evicted persisted spills). The summary `verified` boolean is true only when every entry is `verified`. `record start` is `pending` until `record stop`; `state load` may mention a path in command output but is not a saved artifact row.
+- `artifacts` for saved files such as screenshots, `state save` outputs, `diff screenshot` diff images, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start` / `record restart`. Non-file URL payloads such as `data:` / `blob:` / `http(s):` values are not treated as verified local artifacts. For direct artifact commands and batch artifact steps, the wrapper creates parent directories for requested paths before spawning upstream. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`/`artifactType`, optional `mediaType`, optional `extension`, best-effort disk metadata such as `exists` and `sizeBytes`, plus `requestedPath`, `status`, `cwd`, `session`, and `tempPath` when applicable. Pending `record start` / `record restart` artifacts use `status: "pending"`, omit `exists` rather than reporting false, and include `recordingState: "openRecording"` / `willExistOnStop: true`.
+- `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows when a host file path is reported or wrapper-verified. Batch results preserve the same fields on the relevant `batchSteps` entry. These fields are metadata only until `artifactVerification` verifies the file. For simple loopback `download <selector> <path>` anchors with a non-ref selector, `details.downloadRecovery.method: "direct-anchor-fetch"` means the wrapper resolved the anchor URL in-session and saved the in-page HTTP(S) response directly to the requested path before using upstream's click/download fallback; non-loopback/profile downloads stay upstream-owned so external provider behavior is preserved.
+- `batchSteps[].artifacts` for per-step artifacts in `batch` output; top-level `artifacts` aggregates all step artifacts in order. `record restart` can include both the previous recording that was finalized by the restart and the new pending recording.
+- `artifactVerification` for a normalized verification summary on the unified result and on each successful `batchSteps[]` row (failed batch steps omit artifact rows). Top-level `batch` verification rolls up all step file artifacts; each step’s summary reflects that step’s nested tool presentation (including its spill paths and manifest slice). It reports `verified`, `verifiedCount`, `missingCount`, `pendingCount`, `unverifiedCount`, and `artifacts[]` entries with `path`, optional `absolutePath`, optional `requestedPath`, `kind` (a normal file artifact kind or `"spill"` for manifest-backed rows), optional `mediaType`, optional `exists`, optional `sizeBytes`, optional `status`, optional `retentionState` / `storageScope` on manifest-derived rows, `state` (`verified`, `missing`, `pending`, or `unverified`), and optional `limitation` (human-readable lifecycle or retention context, for example pending `record start` / `record restart`, missing or unverified files, ephemeral spill files, or evicted persisted spills). The summary `verified` boolean is true only when every entry is `verified`. `record start` / `record restart` are `pending` until `record stop`; `state load` may mention a path in command output but is not a saved artifact row.
 - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
 - `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
 - `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
@@ -791,8 +832,8 @@ The TUI renderer is user-facing only. It may compact or colorize what the human
 Worth doing in v1:
 - screenshots → saved-path summary, visible artifact metadata, `details.artifacts` metadata, and inline image attachment when safe; screenshot paths that upstream would treat ambiguously, such as `.dogfood/run/foo.png`, are normalized to absolute paths before launch and repaired from upstream temp output when possible
-- file artifacts such as PDFs, downloads, `wait --download` files, `state save` state files, diff screenshot output images, traces, CPU profiles, completed WebM recordings, and path-bearing HAR captures → concise saved-path summaries plus metadata in `details.artifacts` and bounded recent metadata in `details.artifactManifest`; `record start` reports recording lifecycle state and the future output path without adding a missing manifest entry; upstream needs `ffmpeg` on `PATH` for `record stop` to encode the WebM, and successful `record start` / `record restart` calls may also expose `details.recordingDependencyWarning` when the wrapper cannot find `ffmpeg`; direct saved-file workflows also expose `details.savedFilePath` / `details.savedFile`; large or binary artifacts are not inlined into model context; the recent manifest cap can age out explicit-file metadata but does not remove explicit saved files from disk
-- `diff screenshot` → same file-artifact pattern as above for the **diff** image path only (summary text uses “Saved diff image”); baseline paths and other fields stay in the structured payload but are not echoed as separate saved artifacts in the visible artifact block, and there is no Pi inline image attachment for the diff output
+- file artifacts such as PDFs, downloads, `wait --download` files, `state save` state files, diff screenshot output images, traces, CPU profiles, completed WebM recordings, and path-bearing HAR captures → concise saved-path summaries plus metadata in `details.artifacts` and bounded recent metadata in `details.artifactManifest`; `record start` / `record restart` report recording lifecycle state and the future output path without adding a missing manifest entry, and `record restart` can also report the previous wrapper-known recording that was finalized by the restart; upstream needs `ffmpeg` on `PATH` for `record stop` to encode the WebM, and successful `record start` / `record restart` calls may also expose `details.recordingDependencyWarning` when the wrapper cannot find `ffmpeg`; direct saved-file workflows also expose `details.savedFilePath` / `details.savedFile`; large or binary artifacts are not inlined into model context; the recent manifest cap can age out explicit-file metadata but does not remove explicit saved files from disk
+- `diff screenshot` → same file-artifact pattern as above for the **diff** image path only (summary text uses “Saved diff image” only when the diff output exists; missing output says “Diff image reported; file not verified” and fails as `artifact-missing`); baseline paths and other fields stay in the structured payload but are not echoed as separate saved artifacts in the visible artifact block, and there is no Pi inline image attachment for the diff output
 - `state load` → completion text may mention the loaded path, but the wrapper does **not** treat that path as a new saved artifact (`artifacts` / `artifactManifest` stay unset) the way `state save` does
 - auth, cookies, storage, clipboard, dialog, frame, state, network, debug, diff, stream, dashboard, chat, and other structured results → concise summaries that avoid expanding secret-bearing payloads; credential-like keys, values, URLs, body snippets, bearer/basic credentials, clipboard write text, cookie values, and likely secret storage values are redacted before model-facing output and `details.data`, while benign primitive storage values may remain visible for local QA
 - TUI display → custom `agent_browser` call/result rendering with colorized command/output text and a built-in-style collapsed view for long visible output; `ctrl+o` expansion reveals the full rendered tool result without changing the model-facing content
@@ -802,7 +843,7 @@ Worth doing in v1:
 - navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
 - tab lists → compact summary/table
 - stream status → enabled/connected/port summary plus WebSocket URL and frame format when a port is known; `stream enable` errors that only say streaming is already enabled are normalized to a successful idempotent no-op with `details.data.alreadyEnabled: true` and status/disable nextActions; if the caller explicitly passed `--json`, visible text is valid JSON instead of a prose summary
-- diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; network request lists include an actionable-vs-benign failed-request summary and mark low-impact browser icon failures separately; active route mocks can add pending/CORS route diagnostics; request-detail URLs from `network request` remain diagnostic-only rather than session page targets; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
+- diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; `doctor` renders status/check/fix rows even when upstream puts those fields at the top level of its JSON envelope; `session list` and `tab list` keep names/labels/active markers/titles/URLs readable instead of opaque generated ids only; `network requests` and `console` previews label their scope as the upstream session aggregate unless upstream or a URL-opening QA preset explicitly cleared/filtered the buffers first; network request lists include an actionable-vs-benign failed-request summary and mark low-impact browser icon failures separately; active route mocks can add failed/pending/CORS route diagnostics; `data:image` artifact request rows are hidden from compact previews while preserved in raw details; request-detail URLs from `network request` remain diagnostic-only rather than session page targets; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
 - trace/profiler owner conflicts → when the wrapper has observed one owner active for a session, block conflicting starts/stops with "wrapper believes ..." wording because upstream or external CLI use can desynchronize wrapper-local state
 ## Missing binary behavior
@@ -839,7 +880,7 @@ If `agent-browser` is not on `PATH`, fail with a message that:
 - If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
 <!-- agent-browser-playbook:end wrapper-tab-recovery -->
 - on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
-- keep wrapper-spawned commands below the upstream CLI IPC read-timeout budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second retry path begins (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` configures the watchdog); timed-out compiled `job` / `qa` or caller `batch` calls may add `details.timeoutPartialProgress` and visible `Timeout partial progress` evidence with planned steps, current page title/URL, and declared artifact path checks
+- keep wrapper-spawned commands bounded by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds while the default wrapper child-process watchdog is 35 seconds (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides it, and top-level `timeoutMs` overrides it per call for browser CLI subprocesses). This lets normal calls survive the upstream 30-second IPC retry window while still stopping stuck children. Dialog commands use `PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS` (default 5000 ms), and click/tap/find refs or tokens plus `eval --stdin` snippets whose text looks like alert/confirm/prompt/dialog triggers use `PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS` (default 8000 ms). Timed-out compiled `job` / `qa` or caller `batch` calls may add `details.timeoutPartialProgress` and visible `Timeout partial progress` evidence with per-step status, retry payloads, current page title/URL, and declared artifact path checks; timed-out dialog-like commands may add dialog status/dismiss/fresh-session recovery next actions
 - interactive or long-running upstream families such as `chat` without a prompt, `dashboard start`, `stream enable`, `trace start`, `profiler start`, `record start`, `inspect`, `install`, `upgrade`, `doctor --fix`, and `confirm-interactive` are passed through thinly but remain bounded by the same wrapper timeout/session planning rules; prefer explicit arguments, single-shot `chat <message>`, non-interactive flags like `doctor --offline --quick` or `doctor --json`, and cleanup pairs such as `dashboard stop`, `stream disable`, `trace stop`, `profiler stop`, and `record stop`
 - treat successful plain-text inspection commands like `--help` and `--version` as stateless: do not inject the implicit managed session and do not let those calls claim the managed-session slot
 - if startup-scoped flags like `--profile`, `--executable-path`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` are supplied after the implicit session is already active while `sessionMode` is `"auto"`, return a validation error with a structured recovery hint that recommends `sessionMode: "fresh"`

package/docs/platform-smoke.md CHANGED Viewed

@@ -17,7 +17,7 @@ crabbox list --provider local-container
 crabbox list --provider parallels
 ```
-`smoke:platform:all` also runs `smoke:platform:doctor` before any target suite starts, so the explicit doctor step is a readable release checklist step rather than a hidden precondition. The canonical `npm run verify -- release` gate also runs the same platform doctor and full `macos,ubuntu,windows-native` matrix after default verification and packaged Pi smoke, so `npm publish` cannot pass `prepublishOnly` without the platform gate. After the matrix, inspect `.artifacts/platform-smoke/<run-id>/...` summaries and manifests; a green Crabbox exit without matching suite assertions is not release proof. Use provider-specific `crabbox list` commands for cleanup review because this host may have unrelated Crabbox providers configured that require credentials.
+`smoke:platform:all` also runs `smoke:platform:doctor` before any target suite starts, so the explicit doctor step is a readable release checklist step rather than a hidden precondition. The canonical `npm run verify -- release` gate also runs the configured-source lifecycle harness, then the same platform doctor and full `macos,ubuntu,windows-native` matrix after default verification and packaged Pi smoke, so `npm publish` cannot pass `prepublishOnly` without lifecycle and platform gates. After the matrix, inspect `.artifacts/platform-smoke/<run-id>/...` summaries and manifests; a green Crabbox exit without matching suite assertions is not release proof. Use provider-specific `crabbox list` commands for cleanup review because this host may have unrelated Crabbox providers configured that require credentials.
 Per-target commands are for diagnosis:
@@ -58,7 +58,8 @@ PLATFORM_SMOKE_MAC_WORK_ROOT="/Users/$USER/crabbox/pi-agent-browser-native"
 PLATFORM_SMOKE_MAC_PORT=22
 # Default local image built by npm run smoke:platform:ubuntu-image.
-PLATFORM_SMOKE_UBUNTU_IMAGE="pi-agent-browser-native-platform:node24-agent-browser0.27.1"
+# The tag suffix is derived from scripts/agent-browser-capability-baseline.mjs.
+PLATFORM_SMOKE_UBUNTU_IMAGE="pi-agent-browser-native-platform:node24-agent-browser<baseline-version>"
 PLATFORM_SMOKE_WINDOWS_VM="pi-extension-windows-template"
 PLATFORM_SMOKE_WINDOWS_SNAPSHOT="crabbox-ready"
@@ -69,7 +70,7 @@ PLATFORM_SMOKE_WINDOWS_WORK_ROOT="C:\\crabbox\\pi-agent-browser-native"
 PLATFORM_SMOKE_AUTH_ENV=""
 ```
-The Ubuntu target image is derived from `node:24-bookworm`, installs `agent-browser@0.27.1`, installs Debian Chromium through apt, creates a non-root `circleci` user, and sets `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium`. Rebuild it after upstream rebaselining, or override `PLATFORM_SMOKE_UBUNTU_IMAGE` with an equivalent prepared local image. Do not install `agent-browser` ad hoc inside the Ubuntu smoke command; a missing tool is image/template drift.
+The Ubuntu target image is derived from `node:24-bookworm`, installs the `agent-browser` version from [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs), installs Debian Chromium through apt, creates a non-root `circleci` user, and sets `AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/chromium`. Rebuild it after upstream rebaselining, or override `PLATFORM_SMOKE_UBUNTU_IMAGE` with an equivalent prepared local image. Do not install `agent-browser` ad hoc inside the Ubuntu smoke command; a missing tool is image/template drift.
 The configured upstream `agent-browser` baseline is imported from [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs). Target-local browser suites verify that exact `agent-browser` version before running. Bake the exact upstream CLI and browser runtime into the Windows template/snapshot for speed and reproducibility; missing or stale Windows `agent-browser` / browser readiness is a blocked setup, not something the smoke command repairs. The Windows browser suite checks the preinstalled browser cache and prewarms one short local file URL before the extension harness runs.