npm - pi-agent-browser-native - Versions diffs - 0.2.33 → 0.2.35 - Mend

pi-agent-browser-native 0.2.33 → 0.2.35

Files changed (44)

package/docs/TOOL_CONTRACT.md CHANGED

@@ -33,11 +33,38 @@ The native command reference in `docs/COMMAND_REFERENCE.md` is driven by the sam
 Agent-facing efficiency claims are measured with `npm run benchmark:agent-browser` or `npm run verify -- benchmark`. The benchmark is deterministic and does not launch a browser; it tracks representative workflow success, tool calls, model-visible output size, stale-ref failures and recoveries, artifact success, failure-category coverage, and elapsed-time estimates so future abstractions can prove they reduce agent work before replacing raw tool use.
+## Input mode chooser
+Use exactly one top-level input per call:
+| When you need | Use | Notes |
+| --- | --- | --- |
+| Routine browse, click, fill, screenshots, upstream commands | `args` | Default path: `open` → `snapshot -i` → `click`/`fill` `@eN` → `snapshot -i` after navigation or DOM changes. Do not pass `--json`; the wrapper injects it (see [Wrapper `--json`](#wrapper-json)). |
+| Stable visible role/text/label/placeholder targets | `semanticAction` | Compiles to upstream `find` or `select`; optional `session` for a named upstream browser. |
+| Short multi-step smoke or evidence flows | `job` or `qa` | Both compile to `batch`; `qa` may reclassify diagnostics as failure. |
+| Desktop Electron apps (list/launch/probe/cleanup) | `electron` | Wrapper-owned lifecycle; not for ordinary websites. |
+| Local UI source hints (experimental) | `sourceLookup` | **Candidates only** with confidence/evidence; not guaranteed DOM-to-file mappings. |
+| Failed fetch/API source hints (experimental) | `networkSourceLookup` | **Candidates only** from initiator metadata and bounded workspace URL literals; not definitive blame. |
+For link and button text, use the **exact** visible label from the latest `snapshot -i` (or `semanticAction` locators), not guessed copy. The `https://example.com/` smoke page uses heading `Example Domain` and link `Learn more`; do not assume older example.com strings such as `More information...`.
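As a rough illustration of the one-input-per-call rule in the chooser above, a caller-side guard might look like the sketch below. `INPUT_MODES` and `pickInputMode` are hypothetical names and not part of the package; only the seven mode keys are taken from this document.

```typescript
// Hypothetical caller-side guard; the wrapper's real validation may differ.
const INPUT_MODES = [
  "args", "semanticAction", "job", "qa",
  "electron", "sourceLookup", "networkSourceLookup",
] as const;
type InputMode = (typeof INPUT_MODES)[number];

// Returns the single input mode present on a call object, or throws
// when zero or several modes are set at once.
function pickInputMode(call: Record<string, unknown>): InputMode {
  const present = INPUT_MODES.filter((mode) => call[mode] !== undefined);
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one input mode, found ${present.length}`);
  }
  return only;
}
```

Other fields such as `stdin` or `sessionMode` are ignored by this check because they are not input modes.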
+## Snapshot and getter batching
+- **`snapshot -i`**: default for interaction—interactive `@eN` refs, main-content-first trimming, and the usual click/fill workflow.
+- **`snapshot --compact`**: denser same-page tree when you still need refs but want less output than full interactive snapshot.
+- **Full `snapshot`** (no `-i`): use only when you need the complete accessibility tree; expect larger output and possible spill files.
+- Re-run `snapshot -i` after navigation, scrolling, rerendering, or other major DOM changes; refs are page-scoped.
+- When you need **three or more** `get title` / `get url` / `get text` / similar reads for known refs or selectors on the same page, prefer one `batch` stdin array (for example `[["get","text","@e1"],["get","text","@e2"]]`) instead of serial tool calls.
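The batching advice above can be sketched as a small helper that turns several refs into one `batch` call. `batchGetText` is a hypothetical name; the `{ args: ["batch"], stdin }` shape follows the compiled examples elsewhere in this document.

```typescript
// Shape of a batch call as shown in this document's compiled examples.
type BatchCall = { args: string[]; stdin: string };

// Hypothetical helper: one `get text` step per ref, serialized as the
// JSON array-of-arrays that `batch` reads from stdin.
function batchGetText(refs: string[]): BatchCall {
  const steps = refs.map((ref) => ["get", "text", ref]);
  return { args: ["batch"], stdin: JSON.stringify(steps) };
}

const call = batchGetText(["@e1", "@e2"]);
// call.stdin is '[["get","text","@e1"],["get","text","@e2"]]'
```

One tool call then replaces several serial `get text` calls for refs taken from the same snapshot.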
+## Wrapper `--json`
+The extension always plans normal browser commands with `--json` prepended in `effectiveArgs` so upstream returns structured JSON for presentation and `details`. **Do not** include `--json` in caller `args`; it is unnecessary and can confuse planning or transcript hooks that treat caller-requested JSON differently. Plain-text inspection (`--help`, `--version`) keeps its own output shape. Read-only skills and local/setup commands such as `skills list` / `skills get` / `skills path`, local auth profile management (`auth save/list/show/delete/remove`), `profiles`, `dashboard`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all local saved-state maintenance including `state clear --all`, `state clear -a`, and named `state clear <session-name>` skip implicit session injection as documented under `sessionMode`.
 <!-- agent-browser-playbook:start shared-guidelines -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
 - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.
 - For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.
-- When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
+- Snapshot choice: prefer snapshot -i for routine clicks/fills (interactive @refs, main-content-first). Use snapshot --compact when you need a denser same-page tree without full spill; use full snapshot (no -i) only when you need the complete accessibility tree. Re-snapshot after navigation or major DOM changes. When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
 - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
 - For desktop or host-controlled rich inputs, if semanticAction fill misses, refresh refs and prefer a current editable @ref from details.richInputRecovery or the latest snapshot; focus or click that ref, then use keyboard inserttext or keyboard type with the intended text. Do not auto-submit with Enter or a submit button unless the user flow explicitly calls for it.
 - Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
@@ -47,13 +74,13 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
 - If you already used the implicit session and now need launch-scoped flags like --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.
 - For React introspection, launch the page with --enable react-devtools before first navigation, then use react tree, react inspect <fiberId>, sourceLookup candidates for local UI source hints, react renders start/stop, or react suspense; sourceLookup is experimental and reports confidence/evidence instead of guaranteed DOM-to-file mappings. For failed fetches and APIs, networkSourceLookup (experimental) correlates failed network requests with initiator metadata and bounded workspace URL literals—candidates only, not definitive blame. Use vitals [url] for Core Web Vitals and hydration timing, and pushstate <url> for client-side SPA navigation.
 - For first-navigation setup, use open without a URL plus network route --resource-type <csv>, cookies set --curl <file>, or --init-script/--enable before navigate/opening the target page.
-- For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, state save/load for portable test state, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields.
+- For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, auth list/show/delete/remove for local auth-profile maintenance, auth login when you need the browser to fill a saved profile, state save/load for portable test state, state list/show/rename/clear/clear -a/clean for saved-state lifecycle cleanup, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields.
 - For batch chains that touch cookies, storage, auth, or other secret-bearing commands, use details.batchSteps for per-step artifacts, categories, spill paths, and full structured errors; top-level details.data on batch is only a compact redacted step matrix (success, argv-redacted command, redacted result or scrubbed error text) built from the same presentation rules as standalone calls.
-- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har, diff snapshot/screenshot/url, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, and chat. For compact network requests output, prefer details.nextActions for request detail, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
-- For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; after connect, inspect tab list, select the stable tab id such as tab t2, then run a condition wait or snapshot -i before using refs. close only closes the browser/CDP session; leave manually launched app shutdown, profile cleanup, and explicit artifacts to the host owner.
-- For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
+- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har (including request filters like --type/--method/--status), diff snapshot/screenshot/url with scoped/baseline options, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, device list for iOS simulator inventory, and chat. For compact network requests output, prefer details.nextActions for request detail, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
+- For Electron desktop apps, prefer top-level electron for wrapper-owned discovery, isolated launch, status, compact probe, and cleanup: list first, treat likely-sensitive annotations as hints rather than enforcement, launch with the default snapshot handoff unless handoff: "tabs" is the safer diagnostic starting point, use electron.probe or snapshot -i/qa.attached for current-session state, and always cleanup the returned launchId when done. electron.launch uses an isolated temporary profile; it does not reuse the app's normal signed-in profile or attach to an already-running authenticated app. For signed-in local app state, host-launch the normal app with --remote-debugging-port when appropriate, then use raw args connect <port|url>; after connect, inspect tab list, select the stable tab id such as tab t2, then run a condition wait or snapshot -i before using refs. close commands (`close`, `quit`, or `exit`) only close the browser/CDP session; leave manually launched app shutdown, profile cleanup, and explicit artifacts to the host owner.
+- For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool; add --full when you need references/templates, and use skills get --all only for broad skill audits. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
 - For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; when --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
-- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort, must stay below the wrapper IPC budget (wait 30000 is intentionally blocked), and a successful payload like "waited":"timeout" means elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.
+- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. For desktop readiness, prefer real conditions first: wait --text, wait --url, wait --fn, wait --load <state>, wait --download, or qa.attached; for disappearance checks in agent-browser 0.27.0, use wait --fn predicates instead of stale upstream-help examples like wait <selector> --state hidden. Use electron.probe/status for wrapper-owned launch health or target mismatch. Fixed waits are a last resort, must stay below the wrapper IPC budget (wait 30000 is intentionally blocked), and a successful payload like "waited":"timeout" means elapsed time only—verify completion with an observed condition, fresh snapshot, or screenshot.
 - For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
 - For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
 - For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
@@ -62,6 +89,9 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
 - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
 - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
 - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), use the user's exact requested paths when given and treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use or PASS/FAIL reporting.
+- For evidence-only screenshots, QA captures, or other audit artifacts, save to an explicit path and branch on details.artifactVerification plus details.artifacts before reporting PASS/FAIL; do not require vision review of inline image attachments unless the user asked for visual inspection.
+- Respect explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, do not click that final action. If the wrapper returns details.promptGuard.reason=explicit-user-stop-boundary, gather evidence on the current page instead of retrying the blocked final action.
+- Successful record stop needs ffmpeg on PATH; the wrapper may warn after record start when ffmpeg is missing.
 - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
 <!-- agent-browser-playbook:end shared-guidelines -->
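To make the artifact guidance above concrete, here is a minimal sketch of branching on `details.artifactVerification` before reporting PASS/FAIL. The interface is an assumption: only the three count field names come from the guidelines, and per-entry `state` and `limitation` are not modeled. `artifactsReady` is a hypothetical helper.

```typescript
// Assumed shape: only these field names appear in the guidelines above.
interface ArtifactVerification {
  missingCount?: number;
  pendingCount?: number;
  unverifiedCount?: number;
}

// Conservative check: paths stay provisional unless every count is
// explicitly reported as zero. Absent data is treated as "not verified".
function artifactsReady(v: ArtifactVerification | undefined): boolean {
  if (v === undefined) return false;
  return v.missingCount === 0 && v.pendingCount === 0 && v.unverifiedCount === 0;
}
```

Treating absent counts as unverified is a design choice of this sketch, picked so that a missing field never produces a false PASS.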
@@ -87,9 +117,11 @@ Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`
 - type: `string[]`
 - required unless `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` is provided
-- exact CLI args passed after `agent-browser`
+- exact CLI args passed after `agent-browser`; this is the 1:1 upstream CLI coverage path for the targeted `agent-browser` version
 - no shell operators
 - do not include the binary name
+- do not include `--json`; the wrapper injects it (see [Wrapper `--json`](#wrapper-json))
+- first-call recipe: `open` → `snapshot -i` → `click` / `fill` with current `@eN` refs from that snapshot → `snapshot -i` again after navigation or DOM changes
 Examples:
@@ -98,6 +130,8 @@ Examples:
 { "args": ["snapshot", "-i"] }
 { "args": ["click", "@e2"] }
 { "args": ["tab", "list"] }
+{ "args": ["network", "unroute"] }
+{ "args": ["quit"] }
 ```
 ### `semanticAction`
@@ -167,7 +201,9 @@ Examples:
   - `waitForDownload` with `path` (compiled as `wait --download <path>`)
   - `screenshot` with `path`
-Example:
+**Navigation assertions are explicit only.** `job` never treats a successful `click` (or a `select` / submit-style interaction that may navigate) as proof that the expected next page loaded. Top-level `click` may still surface optional `details.navigationSummary` or `pageChangeSummary` hints for operators, but compiled `job` / `batch` steps do **not** auto-insert `assertUrl` or `assertText` after clicks—there is no deterministic expected URL source without caller intent. After any navigation-prone step (link/submit clicks, checkout or form flows, tab-sensitive UI), add an explicit `assertUrl` with the destination pattern you expect, `assertText` for on-page copy, or both, **before** screenshots or steps that assume the new page state.
+Example (static landing page):
 ```json
 {
@@ -181,12 +217,29 @@ Example:
 }
 ```
-Compiled shape:
+Example (open → fill → click → assert destination → screenshot):
+```json
+{
+  "job": {
+    "steps": [
+      { "action": "open", "url": "https://shop.example/checkout" },
+      { "action": "fill", "selector": "#email", "text": "user@example.com" },
+      { "action": "click", "selector": "#continue" },
+      { "action": "assertUrl", "url": "**/shipping" },
+      { "action": "assertText", "text": "Shipping address" },
+      { "action": "screenshot", "path": ".dogfood/shipping.png" }
+    ]
+  }
+}
+```
+Compiled shape for the navigation example:
 ```json
 {
   "args": ["batch"],
-  "stdin": "[[\"open\",\"https://example.com\"],[\"wait\",\"--text\",\"Example Domain\"],[\"screenshot\",\".dogfood/example.png\"]]"
+  "stdin": "[[\"open\",\"https://shop.example/checkout\"],[\"fill\",\"#email\",\"user@example.com\"],[\"click\",\"#continue\"],[\"wait\",\"--url\",\"**/shipping\"],[\"wait\",\"--text\",\"Shipping address\"],[\"screenshot\",\".dogfood/shipping.png\"]]"
 }
 ```
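The mapping from `job` steps to the compiled `batch` stdin above can be sketched as follows. This is an illustration inferred from the two JSON examples and the step list in this section, not the package's compiler; `JobStep`, `compileStep`, and `compileJob` are hypothetical names, and only the step kinds shown here are covered.

```typescript
// Step kinds limited to those visible in this section's examples.
type JobStep =
  | { action: "open"; url: string }
  | { action: "fill"; selector: string; text: string }
  | { action: "click"; selector: string }
  | { action: "assertUrl"; url: string }
  | { action: "assertText"; text: string }
  | { action: "waitForDownload"; path: string }
  | { action: "screenshot"; path: string };

// Each step becomes one upstream argv array; assertions compile to waits.
function compileStep(step: JobStep): string[] {
  switch (step.action) {
    case "open": return ["open", step.url];
    case "fill": return ["fill", step.selector, step.text];
    case "click": return ["click", step.selector];
    case "assertUrl": return ["wait", "--url", step.url];
    case "assertText": return ["wait", "--text", step.text];
    case "waitForDownload": return ["wait", "--download", step.path];
    case "screenshot": return ["screenshot", step.path];
  }
}

// The whole job runs as a single `batch` call with JSON stdin.
function compileJob(steps: JobStep[]): { args: string[]; stdin: string } {
  return { args: ["batch"], stdin: JSON.stringify(steps.map(compileStep)) };
}
```

Note that no step is added implicitly after `click`, which mirrors the rule that navigation assertions are explicit only.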
@@ -202,11 +255,12 @@ Because `job` still executes as upstream `batch` with generated stdin, the same
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, `networkSourceLookup`, and `electron`
 - lightweight preset built on the same batch compiler path as `job`
 - URL form: clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <state>` using the resolved `loadState`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
-- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`.
+- attached form: `qa: { attached: true, expectedText?, expectedSelector?, screenshotPath?, checkNetwork?, checkConsole?, checkErrors?, loadState? }` runs the same waits, optional assertions, diagnostics, and screenshot against the current attached managed session without opening a URL. It rejects `url` and cannot be used with `sessionMode: "fresh"`; attach first with `electron.launch` or raw `args: ["connect", "<port-or-url>"]`, then run `qa.attached`. Before spawning the diagnostic batch, the wrapper preflights the attached session: `get url` must succeed and return an `http:` or `https:` page URL. Missing URLs, read failures, and non-http(s) surfaces fail fast with `failureCategory: "validation-error"`, `details.validationError`, and recovery `nextActions` such as `list-tabs-before-qa-attached` and `snapshot-before-qa-attached` instead of running the full QA batch.
 - `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
 - `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
 - optional `screenshotPath` adds an evidence screenshot step
 - reports `details.compiledQaPreset` with the compiled batch plan and resolved `loadState`, plus `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
+- on success with no failed checks, model-visible prose collapses to a compact pass summary (current page URL/title when known, checks run, optional screenshot path plus artifact verification, pointer to `details.qaPreset` and `details.batchSteps` for the full matrix). Failed QA and QA reclassified to `qa-failure` keep the verbose per-step batch output.
 - fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/network.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
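The benign-failure rule above is dense, so a sketch reconstructed from that prose may help. It is not the code in `extensions/agent-browser/lib/results/network.ts`; the `NetworkRow` shape, the helper names, the case-sensitive match, and reading "whichever is present" as `resourceType ?? mimeType` are all assumptions.

```typescript
// Assumed row shape; only the field names come from the prose above.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
  resourceType?: string;
  mimeType?: string;
}

// favicon(.ico|.png|.svg) or apple-touch-icon(.png), with an optional
// [-.\w]* stem suffix before the extension.
const ICON_BASENAME =
  /^(?:favicon[-.\w]*\.(?:ico|png|svg)|apple-touch-icon[-.\w]*\.png)$/;

// Last path segment with any query string or fragment removed.
function lastPathSegment(url: string): string {
  const path = url.split(/[?#]/)[0] ?? "";
  return path.split("/").pop() ?? "";
}

function isFailed(row: NetworkRow): boolean {
  return (row.status !== undefined && row.status >= 400)
    || row.failed === true
    || typeof row.error === "string";
}

function looksBenign(row: NetworkRow): boolean {
  if (!isFailed(row)) return false;
  if (!ICON_BASENAME.test(lastPathSegment(row.url))) return false;
  // A status-only failure other than 404 stays actionable.
  const signal = row.status === 404 || row.failed === true
    || typeof row.error === "string";
  if (!signal) return false;
  const kind = row.resourceType ?? row.mimeType;
  return kind === undefined || kind === "image" || kind === "img"
    || kind === "other" || kind.startsWith("image/");
}
```

Under these assumptions a `404` on `/favicon.ico` is benign, while a bare `500` on the same path remains an actionable failure.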
 Example:
@@ -233,9 +287,9 @@ Action schemas:
 | --- | --- | --- |
 | `list` | `query?`, `maxResults?` | Scans supported platform app locations for Electron evidence and returns bounded app metadata in `details.electron.apps`. Likely-sensitive app annotations are advisory metadata only. Does not spawn upstream `agent-browser`; list output also warns that later wrapper launches are isolated and will not read existing signed-in desktop state. |
 | `launch` | exactly one of `appPath`, `appName`, `bundleId`, or `executablePath`; optional `appArgs`, `handoff`, `targetType`, `timeoutMs`, `allow`, `deny` | Resolves and verifies an Electron target, launches it with a wrapper-owned isolated profile and OS-chosen CDP port, attaches through upstream `connect` using `sessionMode: "fresh"`, and records `details.electron.launch`. It does not reuse the app's normal signed-in profile or attach to an already-running authenticated app; when signed-in local app state is the goal, use a host debug-port launch plus raw `connect` instead. |
-| `status` | optional `launchId` or `all`, optional `timeoutMs` | Inspects wrapper-tracked launches, debug-port liveness, and current CDP targets without mutating the app. With neither `launchId` nor `all`, selects the single active wrapper launch when unambiguous. |
-| `cleanup` | optional `launchId` or `all`, optional `timeoutMs` | Closes the tracked upstream session when present, stops only the wrapper-tracked process, verifies debug-port shutdown, removes the wrapper-created `userDataDir`, and marks records cleaned or partial. |
-| `probe` | optional `launchId`, optional `timeoutMs` | Runs bounded current-session or launch-scoped state reads (`get title`, `get url`, focused-element `eval --stdin`, `tab list`, compact `snapshot -i`) and reports `details.electron.probe`. Without `launchId`, requires an active attached managed session; with `launchId`, it resolves the tracked launch session and can report mismatch guidance. |
+| `status` | optional `launchId` or `all`, optional `timeoutMs` | Inspects wrapper-tracked launches, debug-port liveness, and current CDP targets without mutating the app. With neither `launchId` nor `all`, selects the single active wrapper launch when unambiguous. Runtime-owned launches remain visible by `launchId` after a Pi `session_tree` branch switch. Current branch-visible launches survive `/reload`; off-branch owned launches are cleaned on reload. |
+| `cleanup` | optional `launchId` or `all`, optional `timeoutMs` | Closes the tracked upstream session when present, stops only the wrapper-tracked process, verifies debug-port shutdown, removes the wrapper-created `userDataDir`, and marks records cleaned or partial. Cleanup is serialized with managed-session browser work, and a successful managed-session close step clears live/restore managed-session state even when host process/profile cleanup remains partial. |
+| `probe` | optional `launchId`, optional `timeoutMs` | Runs bounded current-session or launch-scoped state reads (`get title`, `get url`, focused-element `eval --stdin`, `tab list`, compact `snapshot -i`) and reports `details.electron.probe`. Without `launchId`, requires an active attached managed session; with `launchId`, it resolves the tracked launch session, including runtime-owned off-branch launch records, and can report mismatch guidance. |
 Validation and defaults:
@@ -245,8 +299,8 @@ Validation and defaults:
 - `launch.targetType` defaults to `"page"`; supported values are `"page"`, `"webview"`, and `"any"`. When a matching CDP target exposes a WebSocket URL, launch connects to that target; otherwise it falls back to the browser port.
 - `appArgs` are passed to the Electron app, but wrapper-owned lifecycle/debug flags are rejected (`--user-data-dir`, `--remote-debugging-port`, `--remote-debugging-address`, `--remote-debugging-pipe`, and `--`).
 - `allow` and `deny` are optional caller-owned policy lists. Entries match app name, bundle id, desktop id, app path, or executable path by substring. If `allow` is set, the target must match it; `deny` wins on conflict. With neither list, launch is permitted.
-- `electron.status` / `electron.cleanup` accept optional `all` only as the boolean literal `true` to include every wrapper-tracked launch; `all` and `launchId` cannot both be set.
-- `electron.launch` `timeoutMs` is an optional positive integer for the host CDP readiness window. The launcher normalizes missing or non-positive values to **15000 ms** and **caps** at **120000 ms** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`). Other actions use `timeoutMs` differently: **`status`** forwards it only to managed-session `get title` / `get url` reads for mismatch diagnostics (when omitted, those subprocess reads use the normal `runAgentBrowserProcess` default from `getAgentBrowserProcessTimeoutMs`), while localhost CDP liveness in `inspectElectronLaunchStatus` uses a fixed **1000 ms** fetch budget per HTTP probe (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`). **`cleanup`** applies one budget to upstream managed-session `close` plus host process exit, debug-port verification, and temp profile removal (`cleanupTrackedElectronLaunches` in `extensions/agent-browser/index.ts`); when omitted it defaults to **`PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS`** else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`). **`probe`** forwards it to each bounded upstream read (`get title`, `get url`, `eval --stdin`, `tab list`, `snapshot -i`); when omitted, those reads use the same `runAgentBrowserProcess` default as other tool calls.
+- `electron.status` / `electron.cleanup` accept optional `all` only as the boolean literal `true` to include every wrapper-tracked launch; `all` and `launchId` cannot both be set. Status and cleanup use the same runtime wrapper-tracked scope: current branch-visible records plus still-owned off-branch records. With no arguments, status/cleanup is intentionally treated as ambiguous when more than one active launch is in that merged scope; pass `launchId` or `all: true` to disambiguate.
+- `electron.launch` `timeoutMs` is an optional positive integer for the host CDP readiness window. The launcher normalizes missing or non-positive values to **15000 ms** and **caps** at **120000 ms** (`normalizeTimeoutMs` in `extensions/agent-browser/lib/electron/launch.ts`). Other actions use `timeoutMs` differently: **`status`** forwards it only to managed-session `get title` / `get url` reads for mismatch diagnostics (when omitted, those subprocess reads use the normal `runAgentBrowserProcess` default from `getAgentBrowserProcessTimeoutMs`), while localhost CDP liveness in `inspectElectronLaunchStatus` uses a fixed **1000 ms** fetch budget per HTTP probe (`ELECTRON_STATUS_FETCH_TIMEOUT_MS` in `extensions/agent-browser/lib/electron/cleanup.ts`). **`cleanup`** applies one budget to upstream managed-session `close` plus host process exit, debug-port verification, and temp profile removal (`cleanupTrackedElectronHostLaunches` in `extensions/agent-browser/lib/orchestration/electron-host/index.ts`); when omitted it defaults to **`PI_AGENT_BROWSER_IMPLICIT_SESSION_CLOSE_TIMEOUT_MS`** else **5000 ms** (`getImplicitSessionCloseTimeoutMs` in `extensions/agent-browser/lib/runtime.ts`). **`probe`** forwards it to each bounded upstream read (`get title`, `get url`, `eval --stdin`, `tab list`, `snapshot -i`); when omitted, those reads use the same `runAgentBrowserProcess` default as other tool calls.
 - Non-Electron targets are rejected as a correctness failure; the wrapper does not blindly launch arbitrary executables as Electron.
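The launch timeout and `allow` / `deny` rules above can be restated as a small sketch. It is not the package's source. The function and type names are hypothetical, substring matching is assumed to be case-sensitive, and an empty `allow` list is assumed to count as "not set".

```typescript
// Hypothetical restatement of the documented rules; not the package's source.
const DEFAULT_LAUNCH_TIMEOUT_MS = 15_000; // documented default
const MAX_LAUNCH_TIMEOUT_MS = 120_000; // documented cap

function sketchNormalizeLaunchTimeoutMs(timeoutMs?: number): number {
  // Missing, NaN, or non-positive values fall back to the default.
  if (timeoutMs === undefined || !(timeoutMs > 0)) {
    return DEFAULT_LAUNCH_TIMEOUT_MS;
  }
  return Math.min(timeoutMs, MAX_LAUNCH_TIMEOUT_MS);
}

interface ElectronTargetSketch {
  name?: string;
  bundleId?: string;
  desktopId?: string;
  appPath?: string;
  executablePath?: string;
}

// True when any policy entry is a substring of any identifying field.
// Assumption: matching is case-sensitive and empty entries are ignored.
function matchesPolicyList(target: ElectronTargetSketch, entries: string[]): boolean {
  const fields = [
    target.name,
    target.bundleId,
    target.desktopId,
    target.appPath,
    target.executablePath,
  ].filter((value): value is string => typeof value === "string");
  return entries
    .filter((entry) => entry.length > 0)
    .some((entry) => fields.some((field) => field.includes(entry)));
}

function sketchIsLaunchPermitted(
  target: ElectronTargetSketch,
  allow?: string[],
  deny?: string[],
): boolean {
  // `deny` wins on conflict, so it is checked first.
  if (deny !== undefined && matchesPolicyList(target, deny)) return false;
  // Assumption: an empty `allow` list is treated like an absent one.
  if (allow !== undefined && allow.length > 0) return matchesPolicyList(target, allow);
  // With neither list, launch is permitted.
  return true;
}
```

Checking `deny` before `allow` keeps the "deny wins" rule in one place instead of reconciling two match results afterwards.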
 Safety defaults and ownership:
@@ -255,8 +309,8 @@ Safety defaults and ownership:
 - The wrapper also passes `--disable-extensions`, `--no-first-run`, and `--no-default-browser-check` alongside sanitized caller `appArgs`.
 - Remote debugging exposes app contents to the attached browser tool. The wrapper gives isolation defaults and optional `allow` / `deny`; the user still owns the decision to launch or attach to a sensitive desktop app.
 - `electron.list` may annotate apps as likely sensitive (`sensitivity.level: "likely-sensitive"`, categories such as `notes`, `chat`, `mail`, `developer-workspace`, or `passwords-auth`) and print `[likely sensitive: …]`. These annotations are non-blocking hints, not enforcement; caller-owned `allow` / `deny` policy still controls launch decisions.
-- Cleanup is wrapper-owned **only** for records created by `electron.launch`. `electron.cleanup` never targets manually launched apps, externally supplied debug ports, or arbitrary Electron processes. Explicit screenshots/downloads/HARs/traces remain host-file cleanup, not Electron cleanup.
-- On Pi session shutdown, active wrapper-owned Electron launches are best-effort cleaned. Stale restored records are reported instead of guessed/killed when the wrapper lacks a live child process.
+- Cleanup is wrapper-owned **only** for records created by `electron.launch`. `electron.cleanup` never targets manually launched apps, externally supplied debug ports, or arbitrary Electron processes. Explicit screenshots/downloads/HARs/traces remain host-file cleanup, not Electron cleanup. If `electron.cleanup` closes the upstream managed session but process/profile cleanup remains partial, later shutdown cleanup does not close that managed session a second time; retry cleanup focuses on the remaining host resources.
+- On Pi `quit`, active wrapper-owned Electron launches are best-effort cleaned. On `/reload`, current branch-visible active Electron launches are preserved for reload continuity, including their isolated `userDataDir` profile directories, while off-branch owned launches are cleaned before process-local ownership is cleared. If cleanup is partial and deliberately skips or fails `user-data-dir` removal because the process or debug port is still live, generic temp cleanup preserves that profile path across reload, quit, later temp sweeps, process-exit cleanup, and stale temp-root pruning after restart instead of deleting it underneath the remaining host resource. Stale restored records are reported instead of guessed/killed when the wrapper lacks a live child process.
 Details fields:
@@ -358,7 +412,7 @@ For an app you launched manually with remote debugging enabled, skip `electron.c
 - type: object with at least one of `selector`, `reactFiberId`, or `componentName`
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `networkSourceLookup`, and `electron`
-- experimental opt-in helper for local app debugging; it reports candidate source locations with confidence and evidence instead of claiming a guaranteed DOM-to-file mapping
+- **EXPERIMENTAL — candidates only:** opt-in helper for local app debugging; it reports candidate source locations with confidence and evidence instead of claiming a guaranteed DOM-to-file mapping. Do not treat output as authoritative file ownership or edit targets without verification.
 - compiles to existing upstream `batch` commands only:
   - `selector` adds `is visible <selector>` and, unless `includeDomHints: false`, adds `get html <selector>` for source-like DOM attributes (`data-source-file`, `data-file`, `data-component-file`, `data-source`, plus optional `data-source-line` / `data-line` and `data-source-column` / `data-column`) and for `.ts`/`.tsx`/`.js`/`.jsx` paths embedded in HTML text
   - `reactFiberId` runs `react inspect <id>`; this requires the page to have been launched with `--enable react-devtools` before first navigation and for the app build to expose source information
@@ -383,7 +437,7 @@ Use raw `args` for direct upstream React inspection when you already know the ex
 - type: object with at least one of `requestId`, `filter`, or `url`, plus optional `maxWorkspaceFiles`
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, and `electron`
-- experimental failed-request source-hint helper; it reports failed network requests and candidate source hints with evidence instead of assigning blame
+- **EXPERIMENTAL — candidates only:** failed-request source-hint helper; it reports failed network requests and candidate source hints with evidence instead of assigning blame or proving root cause
 - compiles to existing upstream `batch` commands only: `network request <requestId>` when provided plus `network requests` with `--filter <filter-or-url>` when a filter or URL is provided (if both are set, `filter` wins; when only `url` is set, it becomes the `--filter` argument); optional `session` prepends `--session <name>` before that generated `batch`
 - detects failed requests from `status >= 400`, `failed: true`, or an `error` field
 - candidate sources come from source-like initiator/stack metadata in upstream network results and bounded local workspace search for URL/path literals under the Pi session cwd
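The failed-request detection and the `filter`-over-`url` precedence described above can be sketched as follows. The record shape and function names are hypothetical, and treating a `null` `error` as "no error field" is an assumption.

```typescript
// Hypothetical record shape; upstream network results may carry more fields.
interface NetworkRequestSketch {
  status?: number;
  failed?: boolean;
  error?: unknown;
}

// A request counts as failed on status >= 400, failed: true, or an error field.
// Assumption: a null `error` is treated as absent.
function sketchIsFailedRequest(request: NetworkRequestSketch): boolean {
  if (typeof request.status === "number" && request.status >= 400) return true;
  if (request.failed === true) return true;
  return request.error !== undefined && request.error !== null;
}

// If both are set, `filter` wins; when only `url` is set, it becomes the
// `--filter` argument. Returns undefined when neither is provided.
function sketchResolveFilterArg(input: { filter?: string; url?: string }): string | undefined {
  return input.filter ?? input.url;
}
```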
@@ -433,7 +487,7 @@ Behavior:
 - if `args` already include `--session` (including argv compiled from optional `semanticAction.session`), upstream session choice wins
 - `"auto"` prepends the current extension-managed active session when appropriate
 - `"fresh"` rotates that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` apply and later default calls follow the new browser
-- stateless paths skip that injection even under `"auto"`: plain-text `--help` / `-h` / `--version` / `-V` (see the generated inspection playbook fragment below) and read-only `skills list`, `skills get …`, and `skills path …` keep `effectiveArgs` free of the implicit managed `--session` unless the caller supplied `--session` explicitly; successful results therefore omit `usedImplicitSession` and the extension-managed `sessionName` for those calls (`extensions/agent-browser/lib/runtime.ts`, `buildExecutionPlan`)
+- sessionless paths skip that injection even under `"auto"`. They are: plain-text `--help` / `-h` / `--version` / `-V` (see the generated inspection playbook fragment below); read-only `skills list`, `skills get …`, and `skills path …`; local auth profile management (`auth save/list/show/delete/remove`); local/setup commands (`profiles`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `session list`); and targeted/all local saved-state maintenance (`state list/show`, `state clear --all`, `state clear -a`, `state clear <session-name>`, `state clean --older-than <days>`, `state rename`). These paths keep `effectiveArgs` free of the implicit managed `--session` unless the caller supplied `--session` explicitly, so successful results omit `usedImplicitSession` and the extension-managed `sessionName` for those calls. By contrast, root `session`, untargeted `state clear`, bare `state clean`, browser-backed `auth login`, and `state save/load` keep normal managed-session injection (`extensions/agent-browser/lib/command-policy.ts`, `needsManagedSession`; `extensions/agent-browser/lib/runtime.ts`, `buildExecutionPlan`)
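As a rough illustration of the sessionless-path rule in the last bullet, the sketch below classifies a few argv shapes. It is not the package's `needsManagedSession`. It assumes the command is the first argv token, that a help/version flag anywhere in argv counts, and that any command not named in the bullet keeps injection.

```typescript
// Hypothetical classifier; not the package's `needsManagedSession`.
const HELP_FLAGS = new Set(["--help", "-h", "--version", "-V"]);
const LOCAL_COMMANDS = new Set(["profiles", "doctor", "install", "upgrade"]);

function oneOf(value: string | undefined, options: string[]): boolean {
  return value !== undefined && options.includes(value);
}

function sketchNeedsManagedSession(args: string[]): boolean {
  // Assumption: a help/version flag anywhere in argv makes the call sessionless.
  if (args.some((arg) => HELP_FLAGS.has(arg))) return false;
  const [command, sub, ...rest] = args;
  if (command !== undefined && LOCAL_COMMANDS.has(command)) return false;
  switch (command) {
    case "skills":
      return !oneOf(sub, ["list", "get", "path"]);
    case "auth":
      // Browser-backed `auth login` keeps injection.
      return !oneOf(sub, ["save", "list", "show", "delete", "remove"]);
    case "dashboard":
      return !oneOf(sub, ["start", "stop"]);
    case "device":
    case "session":
      // Root `session` keeps injection; `session list` does not.
      return sub !== "list";
    case "state":
      if (oneOf(sub, ["list", "show", "rename"])) return false;
      // Targeted clears (`--all`, `-a`, `<session-name>`) are sessionless;
      // untargeted `state clear` keeps injection.
      if (sub === "clear") return rest.length === 0;
      // Only `state clean --older-than <days>` is sessionless.
      if (sub === "clean") return !rest.includes("--older-than");
      return true; // `state save` / `state load`
    default:
      return true; // ordinary browser commands
  }
}
```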
 Recommended use:
 - use `"auto"` for the common browse/snapshot/click flow inside one `pi` session
@@ -442,6 +496,8 @@ Recommended use:
 ## Wrapper behavior
+Caller `args` should omit `--json`; the wrapper prepends it for normal execution so `details` and presentation stay structured. See [Wrapper `--json`](#wrapper-json).
 The extension should:
 - inject `--json`
 - invoke `agent-browser` directly, not through a shell
@@ -520,12 +576,12 @@ For `batch`, top-level `details` still carries `resultCategory` plus `successCat
 Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbatim replay of raw upstream batch JSON): each element is `{ success, command, result? | error? }` where `command` is argv-redacted the same way as echoed invocation args (including `cookies set` cookie values, `storage local|session set` values, and other sensitive flags/positionals), `result` is the presentation-layer data for that step after the same structured redaction as non-batch commands, and `error` is failure text with cookie/storage/password literals stripped when those values appeared in argv. Prefer `batchSteps[]` for full per-step `details` (artifacts, categories, spill paths); use the roll-up when you only need a redacted matrix of what ran.
-`details.refSnapshot` may appear after successful `snapshot` calls and subsequent same-session calls. It records the latest page-scoped ref ids known to the wrapper and the page target they came from so mutation-prone `@e…` commands can fail fast instead of silently hitting recycled refs after navigation. For wrapper-tracked Electron sessions, `details.electronRefFreshness` may also appear after a successful `@e…` mutation as a softer same-URL rerender warning: run `snapshot -i` before reusing old refs even if the URL did not change.
+`details.refSnapshot` may appear after successful `snapshot` calls and subsequent same-session calls. It records the latest page-scoped ref ids known to the wrapper, optional per-ref accessible `role`/`name` metadata from the same snapshot, and the page target they came from so mutation-prone `@e…` commands can fail fast instead of silently hitting recycled refs after navigation. For wrapper-tracked Electron sessions, `details.electronRefFreshness` may also appear after a successful `@e…` mutation as a softer same-URL rerender warning: run `snapshot -i` before reusing old refs even if the URL did not change.
-Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
+Ref preflight details (command taxonomy in `extensions/agent-browser/lib/command-taxonomy.ts`, orchestration in `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`):
 - **URL alignment:** `refSnapshot.target.url` and the session’s current tab URL are compared via `targetsMatch` / `normalizeComparableUrl` in `extensions/agent-browser/index.ts`: values are trimmed, parsed as URLs when possible, compared **after dropping the `#fragment`**, and the query string remains significant. If either side lacks a `url`, `targetsMatch` treats the pair as matching so early-session calls are not blocked.
-- **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token is in `REF_INVALIDATING_BATCH_COMMANDS` sets a latch that blocks later steps whose first token is in `REF_GUARDED_COMMANDS` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). The invalidating set includes navigation/mutation verbs such as `open`, `goto`, `reload`, `click`, and related upstream commands; same-snapshot `fill` rows stay guarded but do not set the latch, allowing ordinary form-fill batches before a click/submit step. The guarded set is the commands that accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview`, and others enumerated next to those literals in source). Changing either set requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
+- **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token satisfies `isRefInvalidatingBatchCommand` sets a latch that blocks later steps whose first token satisfies `isRefGuardedCommand` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). These predicates read explicit command capability flags from `command-taxonomy.ts`: navigation/mutation verbs such as `open` / `goto`, `reload`, `click`, and related upstream commands have `invalidatesBatchRefs`; same-snapshot `fill` rows stay `guardsPageRefs` and mutation-summary eligible but intentionally do not set `invalidatesBatchRefs`, allowing ordinary form-fill batches before a click/submit step. Ref-guarded commands accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview` / `scrollinto`, and others centralized in the command taxonomy). Changing either capability requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
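The two preflight rules above can be illustrated with a minimal sketch. The taxonomy table here is a tiny hypothetical subset with invented names, the `@e<digits>` token pattern is an assumption, and the URL helper is a restatement of the described comparison rather than the package's `targetsMatch`.

```typescript
// Hypothetical subset of capability flags; the real table lives in the package.
type CapabilitySketch = { guardsPageRefs?: boolean; invalidatesBatchRefs?: boolean };
const SKETCH_TAXONOMY: Record<string, CapabilitySketch> = {
  open: { invalidatesBatchRefs: true },
  goto: { invalidatesBatchRefs: true },
  reload: { invalidatesBatchRefs: true },
  click: { guardsPageRefs: true, invalidatesBatchRefs: true },
  fill: { guardsPageRefs: true }, // guarded, but does not set the latch
};

const REF_TOKEN = /^@e\d+$/; // assumption about the ref token shape

// Returns 0-based indices of steps the latch would block before spawning.
function sketchFindBlockedBatchSteps(steps: string[][]): number[] {
  const blocked: number[] = [];
  let latched = false;
  steps.forEach((step, index) => {
    const command = step[0] ?? "";
    if (command === "snapshot") {
      latched = false; // a snapshot step clears the latch for later steps
      return;
    }
    const flags = SKETCH_TAXONOMY[command] ?? {};
    const mentionsRef = step.some((token) => REF_TOKEN.test(token));
    if (latched && flags.guardsPageRefs === true && mentionsRef) blocked.push(index);
    if (flags.invalidatesBatchRefs === true) latched = true;
  });
  return blocked;
}

// Trim, parse when possible, and drop the #fragment; the query stays significant.
function sketchComparableUrl(raw: string): string {
  const trimmed = raw.trim();
  try {
    const parsed = new URL(trimmed);
    parsed.hash = "";
    return parsed.toString();
  } catch {
    return trimmed; // not parseable as a URL: compare the trimmed string
  }
}

// A missing URL on either side is treated as a match.
function sketchUrlsAlign(a?: string, b?: string): boolean {
  if (!a || !b) return true;
  return sketchComparableUrl(a) === sketchComparableUrl(b);
}
```

Under this sketch, a batch of several `fill` steps followed by one `click` passes, while a `click` followed by a `fill` on an old ref is flagged unless a `snapshot` step sits between them.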
 **Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation/diagnostics.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
@@ -537,9 +593,13 @@ For `network requests`, `details.nextActions` is bounded to one selected safe re
 For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
-`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation/navigation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
+`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit `eligibleForPageChangeSummary` command capability through `isPageChangeSummaryCommand` in `extensions/agent-browser/lib/command-taxonomy.ts`: those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. That capability is independent from `invalidatesBatchRefs` and `triggersPostMutationSnapshot`, so artifact summaries like `download` / `screenshot` and guarded-but-non-invalidating `fill` are documented directly in the capability table instead of implied by broad set spreading. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
+`clickDispatch` may appear after a **top-level non-Electron** `click` when the wrapper installed a pre-click DOM-event probe for an exact CSS or XPath selector, upstream reported success, and the post-click probe found no trusted DOM event reached that target. The wrapper does **not** replay clicks in-page and does **not** install this probe for `@e…` refs because wrapper-side target resolution would diverge from upstream ref identity. On a miss it marks the tool failed, appends `Click dispatch diagnostic: …`, and sets `clickDispatch.status` to `"no-native-event-observed"` with `reason: "native-click-produced-no-target-dom-event"`, `nativeEventCount`, and a redacted `target` descriptor (`kind: "selector" | "xpath"`, plus the selector string). `details.nextActions` gains `inspect-click-dispatch-miss` (`snapshot -i`) and `retry-click-after-dispatch-miss` (same upstream `click` argv, session-prefixed when applicable). This diagnostic is only for standalone `click`; `batch`/`job`/`qa` click steps remain upstream-owned batch behavior.
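A caller-side sketch of reading this diagnostic might look like the following. The interface names, the `id` key on next actions, and the `selector` field name on `target` are assumptions made for illustration; only the literal status, reason, and action ids come from the text above.

```typescript
// Hypothetical shapes; field names beyond those quoted in the contract are assumptions.
interface ClickDispatchSketch {
  status: "no-native-event-observed";
  reason: "native-click-produced-no-target-dom-event";
  nativeEventCount: number;
  target: { kind: "selector" | "xpath"; selector: string };
}

interface NextActionSketch {
  id: string;
  params?: { args?: string[] };
}

interface DetailsSketch {
  clickDispatch?: ClickDispatchSketch;
  nextActions?: NextActionSketch[];
}

// Prefer inspecting with `snapshot -i` before retrying the same click.
function sketchPickClickDispatchFollowUp(details: DetailsSketch): NextActionSketch | undefined {
  if (details.clickDispatch?.status !== "no-native-event-observed") return undefined;
  const actions = details.nextActions ?? [];
  return (
    actions.find((action) => action.id === "inspect-click-dispatch-miss") ??
    actions.find((action) => action.id === "retry-click-after-dispatch-miss")
  );
}
```

Choosing the inspect action first is a design preference of this sketch, not something the contract mandates.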
+`promptGuard` may appear on wrapper-blocked calls when the latest user prompt contains machine-recognizable safety or evidence requirements. `reason: "explicit-user-stop-boundary"` is a **best-effort click-like and Enter/Return keypress guard**, not a complete stop-boundary enforcer: it blocks likely final order/payment/submit targets (for example `Finish`, `place order`, `submit payment`, or matching selectors/`@ref` metadata) on standalone `click`/`dblclick`/`tap`, `find … click`, and matching `batch` steps, and it blocks standalone or batch `press <key>` / `key <key>` when the key is Enter or Return (keypress submits do not require a final-action label match). It does **not** block `eval`, generic `fill`/`type`/`select`, `keyboard type`/`keyboard inserttext`, non-Enter keypresses, or other scripted activation. Implementation: `extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts` plus `prompt-guards.ts`. `reason: "requested-artifacts-missing-before-close"` blocks `close` / `quit` / `exit` when the prompt named exact required screenshot paths and the session artifact manifest has not verified those paths; optional recording paths are only required when recording appears available. Both guards return `failureCategory: "policy-blocked"` and `validationError` text instead of invoking upstream.
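The scope of the `explicit-user-stop-boundary` guard can be approximated with a short sketch. It is not the logic in `prompt-guards.ts`. The label pattern reuses only the examples quoted above, the `targetLabel` parameter is a hypothetical stand-in for selector or `@ref` metadata, and case-insensitive key matching is an assumption.

```typescript
// Hypothetical approximation of the documented guard scope; not the package's source.
const CLICK_LIKE_COMMANDS = new Set(["click", "dblclick", "tap"]);
// Illustrative labels from the contract's examples; the real matcher is not shown.
const FINAL_ACTION_LABEL = /\b(finish|place order|submit payment)\b/i;

function sketchStopBoundaryBlocks(argv: string[], targetLabel = ""): boolean {
  const [command, ...rest] = argv;
  if (command === "press" || command === "key") {
    // Enter/Return keypresses are blocked without any final-action label match.
    return /^(enter|return)$/i.test(rest[0] ?? "");
  }
  const isClickLike =
    (command !== undefined && CLICK_LIKE_COMMANDS.has(command)) ||
    (command === "find" && rest.includes("click"));
  // `eval`, `fill`, `type`, `select`, and `keyboard …` fall through unblocked.
  return isClickLike && FINAL_ACTION_LABEL.test(targetLabel);
}
```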
-`overlayBlockers` may appear after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. 
-Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
+`overlayBlockers` may appear after a successful **top-level non-Electron** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, no `clickDispatch` diagnostic fired for the same result, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. Wrapper-tracked Electron clicks prefer lifecycle health and ref-freshness diagnostics because desktop app chrome produced too many false overlay candidates in dogfood. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. 
+Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
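The snapshot scan described above (a strong modal role plus up to three dismiss-style controls) can be sketched like this. The names are hypothetical, the dismiss pattern covers only the examples quoted in the text, and the `refs` map is assumed to be keyed by ids such as `e1`.

```typescript
// Hypothetical shapes and patterns; not the package's `OverlayBlockerDiagnostic` logic.
interface SnapshotRefSketch {
  role?: string;
  name?: string;
}

interface OverlayCandidateSketch {
  ref: string;
  role?: string;
  name?: string;
  args: string[];
  reason: string;
}

const MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const DISMISS_ROLES = new Set(["button", "link", "menuitem"]);
// Illustrative only: the examples named in the contract text.
const DISMISS_NAME = /^(close|dismiss|no thanks|×)$/i;

function sketchFindOverlayCandidates(
  refs: Record<string, SnapshotRefSketch>,
): OverlayCandidateSketch[] {
  const entries = Object.entries(refs);
  // Without a dialog/alertdialog ref, page-wide text is not enough evidence.
  const hasModal = entries.some(([, ref]) => ref.role !== undefined && MODAL_ROLES.has(ref.role));
  if (!hasModal) return [];
  return entries
    .filter(
      ([, ref]) =>
        ref.role !== undefined &&
        DISMISS_ROLES.has(ref.role) &&
        DISMISS_NAME.test((ref.name ?? "").trim()),
    )
    .slice(0, 3) // at most three candidates
    .map(([id, ref]) => ({
      ref: `@${id}`, // assumption: map keys look like "e1"
      role: ref.role,
      name: ref.name,
      args: ["click", `@${id}`],
      reason: "dismiss-style control alongside a dialog role",
    }));
}
```

An empty result corresponds to the case where the wrapper would omit `overlayBlockers` entirely.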
 Example shape (fields vary by scenario):
@@ -608,7 +668,9 @@ Additional structured fields can appear when relevant:
 - `batchFailure` and `batchSteps` for `batch` rendering, including mixed-success runs
 - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
 - `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
-- `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
+- `clickDispatch` when a top-level non-Electron `click` reported upstream success but the wrapper’s DOM-event probe found no trusted event reached the target; shape follows `ClickDispatchDiagnostic` in `extensions/agent-browser/lib/orchestration/browser-run/types.ts`
+- `promptGuard` when a prompt-derived guard blocks a likely final order/submit click or an Enter/Return keypress, or blocks browser close before required prompt artifact paths are verified; implementation lives in `extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts` and `extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts`
+- `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL, no `clickDispatch` diagnostic fired, and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
 - `visibleRefFallback` after a raw `find` or compiled `semanticAction` fails with `selector-not-found` and a fresh snapshot finds exact role/name `@ref` matches. Shape follows `VisibleRefFallbackDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, snapshot, summary, target }`, where each candidate has `ref`, `role`, `name`, optional direct ref `args`, and `reason`; visible text appends `Current snapshot ref fallback`. Non-fill candidates with direct args add `try-current-visible-ref` or numbered `try-current-visible-ref-N` actions. Fill candidates omit direct args and target text so recovery details do not repeat potentially sensitive fill text.
 - `refSnapshotInvalidation` after a session `snapshot` fails with `No active page`. Shape follows `SessionRefSnapshotInvalidation` in `extensions/agent-browser/lib/session-page-state.ts`: `{ reason: "no-active-page", summary }`. The wrapper deletes prior refs for that session, persists the invalidation for resume, and blocks mutation-prone `@e…` preflight with `failureCategory: "stale-ref"` until a successful fresh `snapshot -i` records refs again.
 - `richInputRecovery` after a raw `find` or compiled `semanticAction` `fill` fails with `selector-not-found` and the same current-ref diagnostic finds exact editable `searchbox` / `textbox` candidates. Shape follows `RichInputRecoveryDiagnostic` in `extensions/agent-browser/lib/results/selector-recovery.ts`: `{ candidates, inputMethodHint, nextActionIds, summary, target }`, where each candidate has `ref`, `role`, `name`, `focusArgs`, `clickArgs`, and `reason`. Visible text appends `Rich input recovery`, and `details.nextActions` gains ids from `getAgentBrowserRichInputRecoveryNextActionIds`: `focus-current-editable-ref` / `click-current-editable-ref` (or numbered variants). These actions are bounded to focus/click/inspect-style recovery: they do not include the fill text, do not press `Enter`, and do not submit. After the right current editable ref is focused, the agent should use `keyboard inserttext` or `keyboard type` with the intended text in a separate call and submit only when explicitly required by the flow.
@@ -619,7 +681,7 @@ Additional structured fields can appear when relevant:
 - `electronGetTextScopeWarning` after a successful attached Electron `get text <selector>` (standalone or successful `batch`) when a broad non-ref CSS selector such as `body`, `html`, `main`, `div`, or `[role=application]` may read the whole app shell. Shape: `{ selector, summary, electronContext: { launchId?, sessionName?, url? } }`; multiple batched diagnostics use `electronGetTextScopeWarnings`. Visible text appends `Broad Electron get text selector warning`, and next actions use `snapshot-for-electron-text-scope` ids with session-scoped `snapshot -i` payloads.
 - `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is a plain empty object (`{}`). Arrays such as `[]` do not qualify. It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
 - `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
-- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session` (omitted when the caller already set explicit upstream `--session` or for stateless inspection paths that skip injection). Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.
+- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session`, and also when a successful explicit `--session <current-wrapper-managed-session> close` closes the current managed session. It remains omitted for unrelated explicit user-managed sessions and for sessionless inspection/local paths that skip injection. Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.
 - `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation/artifacts.ts`).
 - `artifacts` for upstream saved files such as screenshots, `state save` outputs, `diff screenshot` diff images, PDFs, downloads, `wait --download` files, traces, CPU profiles, completed WebM recordings, path-bearing HAR captures, and future recording output paths reported by `record start`. Each artifact includes the original saved or requested `path`, resolved `absolutePath`, `kind`/`artifactType`, optional `mediaType`, optional `extension`, best-effort disk metadata such as `exists` and `sizeBytes`, plus `requestedPath`, `status`, `cwd`, `session`, and `tempPath` when applicable.
 - `savedFilePath` / `savedFile` for direct `download`, `pdf`, and `wait --download` saved-file workflows; batch results preserve the same fields on the relevant `batchSteps` entry.
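The two `evalStdinHint` conditions above (function-shaped stdin and an empty-object `result`) can be illustrated with a short sketch. The helper names and regular expressions here are assumptions made for illustration; the actual `looksLikeFunctionEvalStdin` may use different patterns.

```typescript
// Hypothetical stand-in for the wrapper's lightweight detector.
function looksFunctionShaped(stdin: string): boolean {
  const source = stdin.trim();
  return (
    /^(async\s+)?function\b/.test(source) || // function / async function
    /^(async\s*)?\([^)]*\)\s*=>/.test(source) || // parenthesized arrow: (…) =>
    /^(async\s+)?[A-Za-z_$][\w$]*\s*=>/.test(source) // name => / async name =>
  );
}

// True only for a plain `{}`; arrays such as `[]` do not qualify.
function isPlainEmptyObject(value: unknown): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    !Array.isArray(value) &&
    Object.getPrototypeOf(value) === Object.prototype &&
    Object.keys(value).length === 0
  );
}

// Both conditions must hold before the hint is attached.
function shouldEmitEvalStdinHint(stdin: string, data: unknown): boolean {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return false;
  }
  const result = (data as Record<string, unknown>).result;
  return looksFunctionShaped(stdin) && isPlainEmptyObject(result);
}
```

Under these assumptions, `() => document.title` paired with `{ result: {} }` would trigger the hint, while the plain expression `document.title` or a `result` of `[]` would not.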
@@ -628,7 +690,7 @@ Additional structured fields can appear when relevant:
 - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
 - `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
 - `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
-- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct existing** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated after checking the filesystem); deleted/stale explicit paths are skipped. When the recent window has no existing explicit rows—for example only spill/ephemeral inventory or explicit paths already deleted—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
+- `artifactCleanup` after a successful close command (`close`, `quit`, or `exit`) when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close commands do not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct existing** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated after checking the filesystem); deleted/stale explicit paths are skipped. When the recent window has no existing explicit rows—for example only spill/ephemeral inventory or explicit paths already deleted—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
 - compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` or `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible bounded `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
 - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"` while an implicit session is already active: includes `reason`, `recommendedSessionMode` (`"fresh"`), redacted `exampleArgs`, and `exampleParams` where `sessionMode` is `"fresh"` and `args` is the same redacted argv as `exampleArgs` (from `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`, merged through `redactRecoveryHint` in `extensions/agent-browser/index.ts`)
 - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
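The `explicitArtifactPaths` selection rule for `artifactCleanup` (explicit-path rows only, existing files only, de-duplicated, encounter order, at most ten) can be sketched as follows. The `absolutePath` field name on manifest rows and the injected `exists` predicate are assumptions for illustration; in Node the predicate could be `fs.existsSync`.

```typescript
type StorageScope = "persistent-session" | "process-temp" | "explicit-path";

// Assumed minimal manifest row; real entries carry more metadata.
interface ManifestEntrySketch {
  absolutePath: string; // assumed field name
  storageScope: StorageScope;
}

function pickExplicitArtifactPaths(
  entries: ManifestEntrySketch[],
  exists: (path: string) => boolean, // injected filesystem check
  limit = 10,
): string[] {
  const picked: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (picked.length >= limit) break;
    if (entry.storageScope !== "explicit-path") continue; // spill/ephemeral rows are ignored
    if (!exists(entry.absolutePath)) continue; // deleted or stale paths are skipped
    if (seen.has(entry.absolutePath)) continue; // de-duplicate after the existence check
    seen.add(entry.absolutePath);
    picked.push(entry.absolutePath);
  }
  return picked;
}
```

An empty result is still meaningful in this model: per the contract, `summary` and `note` surface even when no existing explicit rows remain.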
@@ -672,15 +734,18 @@ If `agent-browser` is not on `PATH`, fail with a message that:
 - derive the base implicit session name from the official `pi` session id plus a cwd hash so same-named checkouts do not collide
 - respect explicit upstream `--session` with minimal interference
 - treat the extension-managed session as convenience state owned by the wrapper
-- preserve the current extension-managed session across `/reload` and resumable session transitions so persisted sessions can keep following the live browser on `/reload` or `/resume`
+- preserve the current branch-visible extension-managed session across `/reload`, exact-session relaunch, `/resume`, and Pi `session_tree` branch transitions so persisted sessions can keep following the live browser after lifecycle changes
 - close the active extension-managed session when the originating `pi` process quits, while leaving explicit caller-provided sessions alone
 - set an idle timeout on extension-managed sessions as a backstop for abnormal exits or cleanup failures
 - clean up process-private temp spill artifacts on shutdown, while keeping persisted-session snapshot spill files in a private session-scoped artifact directory so `details.fullOutputPath` survives reload/restart and the oldest spill files are evicted if the per-session artifact budget is exceeded
-- reconstruct the current extension-managed session and latest `artifactManifest` from persisted tool details on resume/reload so later default calls keep following the active managed browser and can continue reporting artifact retention state
+- reconstruct the current branch-visible extension-managed session, latest page-scoped refs, latest `artifactManifest`, and wrapper-tracked Electron launch records from the active transcript branch on `session_start` and Pi `session_tree` so later default calls keep following the active managed browser and can continue reporting artifact retention state; successful explicit wrapper-owned close rows and `electron.cleanup` managed-session steps are restore-visible close events
+- keep runtime cleanup ownership separate from branch-visible state: `session_tree` restore and wrapper-owned browser commands are serialized with managed-session work; independent caller-owned explicit-session commands can still run in parallel, but a branch-state generation guard prevents stale completions from overwriting a newer branch restore. Extension-managed sessions and wrapper-launched Electron records owned by the current process remain eligible for quit/cleanup, and fresh-session allocation stays monotonic across branch restores, including auto rows and close rows that reference wrapper-generated fresh names
+- when a close command or `electron.cleanup` successfully closes the current wrapper-managed session, clear live page/ref state, reserve the next generated fresh-session ordinal, and rotate the next default auto call to a fresh wrapper-generated session name rather than reusing the closed name
+- when `/reload` shuts down an extension instance, close off-branch owned managed sessions and off-branch owned Electron launches before clearing process-local ownership; preserve only the current branch-visible active managed session and active Electron launch plus its isolated `userDataDir` for reload continuity, and also persistently protect `userDataDir` paths when partial cleanup intentionally skips or fails profile removal so later temp cleanup, process exit, and stale temp-root pruning after restart do not violate Electron cleanup's safety decision; rebuild active branch state from the active branch on the next `session_start`
 - when an unnamed `sessionMode: "fresh"` launch succeeds, make it the new extension-managed session so later default calls keep using it
 - when an unnamed `sessionMode: "fresh"` launch fails or times out, preserve the previous managed session when one was active or report the attempted fresh session as abandoned when no managed session was active (`details.managedSessionOutcome`; visible `Managed session outcome: …` only when the final tool call used `sessionMode: "fresh"` and failed—see `#details`)
 - if that unnamed fresh launch replaced an already-active managed session, best-effort close the old managed session after the switch succeeds
-- treat explicit caller-provided `--session` choices as user-managed; `--session` isolates a live browser session but is not a persisted tab/auth restore mechanism after `close`, so use `--profile`, `--session-name`, or `--state` when persisted auth/tab state is required
+- treat explicit caller-provided `--session` choices as user-managed; `--session` isolates a live browser session but is not a persisted tab/auth restore mechanism after a close command (`close`, `quit`, or `exit`), so use `--profile`, `--session-name`, or `--state` when persisted auth/tab state is required
 - pass explicit `--profile` straight through to upstream `agent-browser`; no profile-cloning or isolation layer is added in v1
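The fresh-session failure handling above reduces to two small decisions, sketched here under stated assumptions. The function names are hypothetical, `hadActiveManagedSession` is an illustrative boolean rather than a documented field, and mapping the two failure cases onto the `preserved` and `abandoned` status values is an inference from the bullets, not confirmed implementation detail.

```typescript
type SessionMode = "auto" | "fresh";

// Subset of the documented `managedSessionOutcome` fields used by this sketch.
interface ManagedSessionOutcomeSketch {
  sessionMode: SessionMode;
  succeeded: boolean;
  summary: string;
}

// The visible line is appended only for a failed `sessionMode: "fresh"` call.
function managedSessionOutcomeLine(outcome: ManagedSessionOutcomeSketch): string | undefined {
  if (outcome.sessionMode === "fresh" && !outcome.succeeded) {
    return `Managed session outcome: ${outcome.summary}`;
  }
  return undefined; // "auto" failures keep the struct on `details` only
}

// Assumed mapping for a failed or timed-out unnamed fresh launch.
function failedFreshLaunchStatus(hadActiveManagedSession: boolean): "preserved" | "abandoned" {
  return hadActiveManagedSession ? "preserved" : "abandoned";
}
```

Callers that need the outcome for an `"auto"` failure would read `details.managedSessionOutcome` directly, since no visible line is produced in that case.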
 <!-- agent-browser-playbook:start wrapper-tab-recovery -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->