npm - pi-agent-browser-native - Versions diffs - 0.2.26 → 0.2.27 - Mend

pi-agent-browser-native 0.2.26 → 0.2.27

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

package/CHANGELOG.md +9 -0
package/README.md +3 -3
package/docs/COMMAND_REFERENCE.md +5 -5
package/docs/SUPPORT_MATRIX.md +4 -4
package/docs/TOOL_CONTRACT.md +7 -6
package/extensions/agent-browser/index.ts +112 -17
package/extensions/agent-browser/lib/playbook.ts +2 -2
package/extensions/agent-browser/lib/results/presentation.ts +17 -2
package/extensions/agent-browser/lib/results.ts +1 -0
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -1,5 +1,14 @@
 # Changelog
+## 0.2.27 - 2026-05-14
+### Fixed
+- `semanticAction` role/name click, check, and uncheck calls in active sessions now resolve through the current `snapshot -i` refs before execution, preventing hidden duplicate upstream `find` matches from stealing the action while preserving the original target in `details.compiledSemanticAction` and showing the executed ref in `details.effectiveArgs`.
+- QA presets now default to `loadState: "domcontentloaded"` and accept explicit `domcontentloaded`, `load`, or `networkidle`, avoiding wrapper watchdog timeouts on analytics-heavy or long-polling docs sites while keeping stricter waits opt-in.
+- Network request presentation now shows actionable and benign failed rows before successful rows, so late failures remain visible even when request previews are capped.
+- Overlay blocker diagnostics now require strong modal context (`dialog` / `alertdialog`) before suggesting close/dismiss candidates, eliminating noisy warnings after ordinary same-page menu opens or app button mutations.
+- Artifact lifecycle cleanup guidance now lists only explicit artifact paths that still exist on disk, skipping deleted/stale paths while preserving the close-does-not-delete reminder.
 ## 0.2.26 - 2026-05-14
 ### Added
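
The network presentation fix in the 0.2.27 entry (failed rows listed before successful rows so that a capped preview still shows late failures) can be illustrated with a minimal sketch. The `NetworkRow` shape, `isFailedRow`, and `previewRows` are hypothetical names for illustration only; the actual change lives in `extensions/agent-browser/lib/results/presentation.ts`, which this page does not show in full. The failure test assumes the criteria the package docs describe (HTTP status ≥ 400, `failed: true`, or a string `error`).

```typescript
// Hypothetical row shape; the wrapper's real types are not visible in this diff.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: string;
}

// Assumed failure test: HTTP status >= 400, failed: true, or a string error.
function isFailedRow(row: NetworkRow): boolean {
  return (
    (row.status !== undefined && row.status >= 400) ||
    row.failed === true ||
    typeof row.error === "string"
  );
}

// Put failed rows first, keep the original relative order inside each group,
// and only then apply the preview cap.
function previewRows(rows: NetworkRow[], cap: number): NetworkRow[] {
  const failed = rows.filter(isFailedRow);
  const succeeded = rows.filter((row) => !isFailedRow(row));
  return [...failed, ...succeeded].slice(0, cap);
}
```

With a cap of 2 and a failure arriving as the third request, the failure moves to the top of the preview instead of being cut off. Whether the real implementation also ranks actionable failures ahead of benign ones is not stated in the changelog, so the sketch does not attempt it.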

package/README.md CHANGED

@@ -191,11 +191,11 @@ Typical pitfalls:
 - Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` per call (not more, not none).
 - `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
 - Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
-- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions.
+- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
 - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
 - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
 - If the failure is `selector-not-found` for a compiled `semanticAction`, visible text may add `Agent-browser candidate fallbacks` and `details.nextActions` may list bounded `try-*-candidate` follow-ups (role/name retries only for `fill` + `placeholder`, `click` + `text`, or `fill` + `label`; `select` misses do not get these entries); prefer those payloads or a fresh snapshot over guessing new selectors (same contract link).
-- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows overlay/banner/dialog context and close/dismiss-like controls. The unchanged-URL check uses `details.navigationSummary`, which is only populated via follow-up `get url` / `get title` when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
+- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is only populated via follow-up `get url` / `get title` when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
 - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
 ### Constrained browser jobs
@@ -218,7 +218,7 @@ Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags,
 ### Lightweight QA preset
-For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
+For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
 ```json
 {

package/docs/COMMAND_REFERENCE.md CHANGED

@@ -133,13 +133,13 @@ Examples:
 { "args": ["snapshot", "-i"] }
 ```
-The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
+The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
 Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
 Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
-When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows overlay/banner/dialog context **and** up to three close/dismiss-like controls. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with extra `get url` / `get title` calls only when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
+When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with extra `get url` / `get title` calls only when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
 ### Extract page data
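
The tightened overlay check in the hunk above (strong modal context plus a bounded set of close/dismiss-like controls) might look roughly like the following sketch. Everything here is an assumption for illustration: the `SnapshotRef` shape, the `CLOSE_PATTERN` regex, and the function name are hypothetical, and the `button`/`link`/`menuitem` control roles are taken from the wording in `SUPPORT_MATRIX.md`. The real logic is in `extensions/agent-browser/index.ts`.

```typescript
// Hypothetical shape of one entry in a `snapshot -i` refs map.
interface SnapshotRef {
  role: string;
  name: string;
}

const MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const CONTROL_ROLES = new Set(["button", "link", "menuitem"]);
// Assumed pattern; the wrapper's actual close/dismiss matching is not shown here.
const CLOSE_PATTERN = /\b(close|dismiss)\b/i;
const MAX_CANDIDATES = 3;

// Return up to three ref ids of close/dismiss-like controls, but only when the
// snapshot contains a dialog or alertdialog role. Words like "banner" or
// "privacy" elsewhere on the page never trigger a result on their own.
function findOverlayBlockerCandidates(refs: Record<string, SnapshotRef>): string[] {
  const entries = Object.entries(refs);
  const hasModalContext = entries.some(([, ref]) => MODAL_ROLES.has(ref.role));
  if (!hasModalContext) {
    return [];
  }
  return entries
    .filter(([, ref]) => CONTROL_ROLES.has(ref.role) && CLOSE_PATTERN.test(ref.name))
    .slice(0, MAX_CANDIDATES)
    .map(([id]) => id);
}
```

Gating on the role first is what removes the noisy warnings: a page with a "Close menu" button but no dialog yields no candidates at all.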
@@ -180,7 +180,7 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
 Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
-For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. QA network diagnostics classify failed requests by likely impact: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
+For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
 The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/shared.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
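
As a rough illustration of the new QA defaults described in this hunk, the sketch below resolves an input object into concrete options. The `QaInput` and `ResolvedQa` types and the `resolveQaOptions` helper are hypothetical; only the field names (`loadState`, `checkNetwork`, `checkConsole`, `checkErrors`), the three accepted load states, and the default values come from the documentation text. Throwing on an unknown `loadState` is an assumption, since the diff does not show how the wrapper rejects invalid values.

```typescript
type LoadState = "domcontentloaded" | "load" | "networkidle";

// Hypothetical input shape mirroring the documented `qa` fields.
interface QaInput {
  url: string;
  loadState?: string;
  checkNetwork?: boolean;
  checkConsole?: boolean;
  checkErrors?: boolean;
}

interface ResolvedQa {
  url: string;
  loadState: LoadState;
  checkNetwork: boolean;
  checkConsole: boolean;
  checkErrors: boolean;
}

const LOAD_STATES: readonly LoadState[] = ["domcontentloaded", "load", "networkidle"];

function isLoadState(value: string): value is LoadState {
  return LOAD_STATES.some((state) => state === value);
}

// Apply the documented defaults: loadState "domcontentloaded", all checks true.
function resolveQaOptions(input: QaInput): ResolvedQa {
  const loadState = input.loadState ?? "domcontentloaded";
  if (!isLoadState(loadState)) {
    // Assumed behavior; the real validation path is not part of this diff.
    throw new Error(`Unsupported loadState: ${loadState}`);
  }
  return {
    url: input.url,
    loadState,
    checkNetwork: input.checkNetwork ?? true,
    checkConsole: input.checkConsole ?? true,
    checkErrors: input.checkErrors ?? true,
  };
}
```

Calling `resolveQaOptions({ url: "https://example.com" })` yields `loadState: "domcontentloaded"` with all three checks enabled, while passing `loadState: "networkidle"` opts into the stricter wait.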
@@ -188,7 +188,7 @@ The same classification drives plain `network requests` presentation: when any r
 { "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
 ```
-Optional `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set one to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
+Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true`; set a check to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
 Use custom `job` or raw `batch` when you need a different check sequence.
@@ -265,7 +265,7 @@ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent
 This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
-Browser `close` is also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful `close` appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; otherwise that array is empty and visible text may omit the “Explicit artifact paths” line even though the lifecycle block still reminds you that close does not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
+Browser `close` is also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful `close` appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths that still exist on disk appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; deleted/stale paths are skipped. Otherwise that array is empty and visible text may omit the “Explicit artifact paths” line even though the lifecycle block still reminds you that close does not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
 Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.

package/docs/SUPPORT_MATRIX.md CHANGED

@@ -64,17 +64,17 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
 `RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
-`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
 `RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
 `RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
-`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for overlay/banner/dialog context plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
 `RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
-`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
+`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
 `RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
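The `RQ-0075` ordering change in this hunk can be sketched as follows. This is a minimal illustration, not the package's implementation: `NetworkRow`, `classify`, and `previewRows` are hypothetical names, the benign check is reduced to the documented basename and failure-signal rules (the `resourceType`/`mimeType` condition is omitted), and the relative order of actionable versus benign rows is assumed to follow arrival order because the diff does not specify it.

```typescript
// Hypothetical row shape; real upstream request rows carry more fields.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
}

type Impact = "actionable" | "benign" | "ok";

// Documented failed-row signals: status >= 400, failed: true, or a string error.
function isFailed(row: NetworkRow): boolean {
  return (row.status ?? 0) >= 400 || row.failed === true || typeof row.error === "string";
}

// Simplified stand-in for the benign icon heuristic described in the contract.
function classify(row: NetworkRow): Impact {
  if (!isFailed(row)) return "ok";
  const basename = new URL(row.url).pathname.split("/").pop() ?? "";
  const looksLikeIcon = /^(favicon[-.\w]*\.(ico|png|svg)|apple-touch-icon[-.\w]*\.png)$/i.test(basename);
  const benignSignal = row.status === 404 || row.failed === true || typeof row.error === "string";
  return looksLikeIcon && benignSignal ? "benign" : "actionable";
}

// Failed rows (actionable or benign) sort ahead of successful rows, then the cap applies.
function previewRows(rows: NetworkRow[], cap: number): Array<NetworkRow & { impact: Impact }> {
  return rows
    .map((row, index) => ({ row, index, impact: classify(row) }))
    .sort((a, b) => Number(a.impact === "ok") - Number(b.impact === "ok") || a.index - b.index)
    .slice(0, cap)
    .map(({ row, impact }) => ({ ...row, impact }));
}
```

Because the sort runs before the cap, a failure that arrives after many successful requests still lands inside the preview window.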
@@ -82,4 +82,4 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
 `RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries an empty-object `data.result`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
-`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct `explicit-path` manifest paths (possibly empty when the recent window has no explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.
+`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.
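The `RQ-0079` filtering described above (distinct `explicit-path` rows, existence check, cap of ten) might look roughly like the sketch below. `ManifestEntry` and `listExistingExplicitPaths` are hypothetical names for illustration, and the injectable `exists` parameter is an assumption added so the example can run without touching real files; the actual logic lives in `getArtifactCleanupGuidance` and may differ.

```typescript
import { existsSync } from "node:fs";

// Hypothetical, reduced manifest row; real entries carry more metadata.
interface ManifestEntry {
  path: string;
  storageScope: "persistent-session" | "process-temp" | "explicit-path";
}

const MAX_EXPLICIT_PATHS = 10;

// Collects up to ten distinct explicit paths that still exist, in encounter order.
function listExistingExplicitPaths(
  entries: ManifestEntry[],
  exists: (path: string) => boolean = existsSync,
): string[] {
  const seen = new Set<string>();
  const paths: string[] = [];
  for (const entry of entries) {
    if (entry.storageScope !== "explicit-path" || seen.has(entry.path)) continue;
    seen.add(entry.path);
    if (!exists(entry.path)) continue; // stale path already removed by host tools
    paths.push(entry.path);
    if (paths.length === MAX_EXPLICIT_PATHS) break;
  }
  return paths;
}
```

An empty result is still meaningful here: per the contract text, `summary` and `note` surface even when no explicit path survives the existence check.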

package/docs/TOOL_CONTRACT.md CHANGED

@@ -55,7 +55,7 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
 - For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
 - When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
 - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
-- When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction.
+- When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
 - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.
 - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
 <!-- agent-browser-playbook:end shared-guidelines -->
@@ -112,7 +112,7 @@ Compilation (then `--json` and session handling apply like any other call):
 | `fill` or `select` | `["find",<locator>,<value>,<action>,<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
 | any supported action + `session` | prepends `["--session",<session>]` before the compiled `find` argv |
-When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned.
+When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
 If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content includes an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` → `try-labeled-textbox-candidate`. `select` misses—including `select` + `placeholder`—do not append `try-*` entries even when a parallel `fill` would; other locator/action pairs omit this block too. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
@@ -181,10 +181,11 @@ Because `job` still executes as upstream `batch` with generated stdin, the same
 - type: object with required `url`
 - optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup`
 - lightweight preset built on the same batch compiler path as `job`
-- clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load networkidle`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
+- clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <loadState>`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
+- `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
 - `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
 - optional `screenshotPath` adds an evidence screenshot step
-- reports `details.compiledQaPreset` with the compiled batch plan and `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
+- reports `details.compiledQaPreset` with the compiled batch plan and resolved `loadState`, plus `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
 - fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/shared.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
 Example:
@@ -375,7 +376,7 @@ For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that
 `pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`.
-`overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref looks like overlay/banner/dialog context (`dialog` / `alertdialog` roles or name text matching banner/modal/cookie/consent style patterns in `extensions/agent-browser/index.ts`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. 
The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
+`overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. 
The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
 Example shape (fields vary by scenario):
@@ -452,7 +453,7 @@ Additional structured fields can appear when relevant:
 - `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
 - `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
 - `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
-- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated); when the recent window has no such rows—for example only spill or ephemeral inventory—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
+- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct existing** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated after checking the filesystem); deleted/stale explicit paths are skipped. When the recent window has no existing explicit rows—for example only spill/ephemeral inventory or explicit paths already deleted—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
 - compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` vs outline `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
 - `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"` while an implicit session is already active: includes `reason`, `recommendedSessionMode` (`"fresh"`), redacted `exampleArgs`, and `exampleParams` where `sessionMode` is `"fresh"` and `args` is the same redacted argv as `exampleArgs` (from `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`, merged through `redactRecoveryHint` in `extensions/agent-browser/index.ts`)
 - `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
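The tightened `overlayBlockers` gate documented in this file (a `dialog` / `alertdialog` ref must be present before any close/dismiss candidate is suggested) can be illustrated with the sketch below. It is an assumption-laden approximation: `SnapshotRef`, `findOverlayCandidates`, and the `DISMISS_NAME` pattern are hypothetical, and the wrapper's real name patterns in `extensions/agent-browser/index.ts` are broader than this example.

```typescript
// Hypothetical, reduced shape of one entry in a snapshot's refs map.
interface SnapshotRef {
  role?: string;
  name?: string;
}

interface OverlayCandidate {
  ref: string;
  role: string;
  name: string;
  args: string[];
}

const STRONG_MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const DISMISS_ROLES = new Set(["button", "link", "menuitem"]);
// Illustrative pattern only, built from the examples named in the contract text.
const DISMISS_NAME = /^(close|dismiss|no thanks|×)$/i;

function findOverlayCandidates(refs: Record<string, SnapshotRef>): OverlayCandidate[] {
  const entries = Object.entries(refs);
  // Without a strong modal role, ordinary menus or page chrome produce no diagnostics.
  const hasModal = entries.some(([, entry]) => entry.role !== undefined && STRONG_MODAL_ROLES.has(entry.role));
  if (!hasModal) return [];
  return entries
    .flatMap(([ref, entry]) => {
      const { role, name } = entry;
      if (!role || !name || !DISMISS_ROLES.has(role) || !DISMISS_NAME.test(name.trim())) return [];
      return [{ ref: `@${ref}`, role, name, args: ["click", `@${ref}`] }];
    })
    .slice(0, 3); // at most three candidates, matching the documented limit
}
```

Under this gate, a page whose snapshot contains a `menu` and a "Close" button but no dialog role yields no candidates, which is the noise reduction the changelog describes.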

package/extensions/agent-browser/index.ts CHANGED

@@ -30,6 +30,7 @@ import {
 	buildAgentBrowserNextActions,
 	buildAgentBrowserResultCategoryDetails,
 	buildToolPresentation,
+	compareRefIds,
 	getAgentBrowserErrorText,
 	parseAgentBrowserEnvelope,
 	type AgentBrowserBatchResult,
@@ -84,6 +85,7 @@ const PACKAGE_NAME = "pi-agent-browser-native";
 const AGENT_BROWSER_SEMANTIC_ACTIONS = ["check", "click", "fill", "select", "uncheck"] as const;
 const AGENT_BROWSER_SEMANTIC_LOCATORS = ["alt", "label", "placeholder", "role", "testid", "text", "title"] as const;
 const AGENT_BROWSER_JOB_STEP_ACTIONS = ["open", "click", "fill", "wait", "assertText", "assertUrl", "waitForDownload", "screenshot"] as const;
+const AGENT_BROWSER_QA_LOAD_STATES = ["domcontentloaded", "load", "networkidle"] as const;
 const SOURCE_LOOKUP_WORKSPACE_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx"]);
 const SOURCE_LOOKUP_IGNORED_DIRECTORIES = new Set([".git", "node_modules", "dist", "build", "coverage", ".next", "out", "tmp", "temp"]);
 const SOURCE_LOOKUP_DEFAULT_MAX_WORKSPACE_FILES = 2_000;
@@ -92,6 +94,7 @@ const SOURCE_LOOKUP_MAX_WORKSPACE_FILES = 5_000;
 type AgentBrowserSemanticActionName = (typeof AGENT_BROWSER_SEMANTIC_ACTIONS)[number];
 type AgentBrowserSemanticLocator = (typeof AGENT_BROWSER_SEMANTIC_LOCATORS)[number];
 type AgentBrowserJobStepAction = (typeof AGENT_BROWSER_JOB_STEP_ACTIONS)[number];
+type AgentBrowserQaLoadState = (typeof AGENT_BROWSER_QA_LOAD_STATES)[number];
 type AgentBrowserSourceLookupStatus = "candidates-found" | "no-candidates" | "unsupported";
 type AgentBrowserNetworkSourceLookupStatus = "failed-requests-found" | "no-failed-requests" | "no-candidates";
@@ -127,6 +130,7 @@ interface CompiledAgentBrowserQaPreset extends CompiledAgentBrowserJob {
 		checkConsole: boolean;
 		checkErrors: boolean;
 		checkNetwork: boolean;
+		loadState: AgentBrowserQaLoadState;
 		expectedText: string[];
 		expectedSelector?: string;
 		screenshotPath?: string;
@@ -238,6 +242,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
 			checkConsole: Type.Optional(Type.Boolean({ description: "Whether to fail on console error messages. Defaults to true." })),
 			checkErrors: Type.Optional(Type.Boolean({ description: "Whether to fail on page errors. Defaults to true." })),
 			checkNetwork: Type.Optional(Type.Boolean({ description: "Whether to inspect network requests and fail on actionable request failures; benign icon misses warn. Defaults to true." })),
+			loadState: Type.Optional(StringEnum(AGENT_BROWSER_QA_LOAD_STATES, { description: "Page readiness state for the QA preset before assertions and diagnostics. Defaults to domcontentloaded; use networkidle only for pages without long-lived background requests." })),
 		}),
 	),
 	sourceLookup: Type.Optional(
@@ -450,16 +455,21 @@ function compileAgentBrowserQaPreset(input: unknown): { compiled?: CompiledAgent
 			return { error: `qa.${field} must be a boolean when provided.` };
 		}
 	}
+	const rawLoadState = input.loadState;
+	if (rawLoadState !== undefined && (typeof rawLoadState !== "string" || !AGENT_BROWSER_QA_LOAD_STATES.includes(rawLoadState as AgentBrowserQaLoadState))) {
+		return { error: `qa.loadState must be one of: ${AGENT_BROWSER_QA_LOAD_STATES.join(", ")}.` };
+	}
 	const checkConsole = input.checkConsole !== false;
 	const checkErrors = input.checkErrors !== false;
 	const checkNetwork = input.checkNetwork !== false;
+	const loadState = (rawLoadState as AgentBrowserQaLoadState | undefined) ?? "domcontentloaded";
 	const steps: CompiledAgentBrowserJobStep[] = [];
 	if (checkNetwork) steps.push({ action: "wait", args: ["network", "requests", "--clear"] });
 	if (checkConsole) steps.push({ action: "wait", args: ["console", "--clear"] });
 	if (checkErrors) steps.push({ action: "wait", args: ["errors", "--clear"] });
 	steps.push(
 		{ action: "open", args: ["open", url] },
-		{ action: "wait", args: ["wait", "--load", "networkidle"] },
+		{ action: "wait", args: ["wait", "--load", loadState] },
 	);
 	for (const text of expectedText) {
 		steps.push({ action: "assertText", args: ["wait", "--text", text] });
@@ -474,7 +484,7 @@ function compileAgentBrowserQaPreset(input: unknown): { compiled?: CompiledAgent
 	return {
 		compiled: {
 			args: ["batch"],
-			checks: { checkConsole, checkErrors, checkNetwork, expectedSelector, expectedText, screenshotPath, url },
+			checks: { checkConsole, checkErrors, checkNetwork, expectedSelector, expectedText, loadState, screenshotPath, url },
 			stdin: JSON.stringify(steps.map((step) => step.args)),
 			steps,
 		},
@@ -969,6 +979,61 @@ function buildSemanticActionCandidateActions(compiled: CompiledAgentBrowserSeman
 	return [];
 }
+function normalizeSemanticActionAccessibleName(name: string): string {
+	return name.replace(/\s+/g, " ").trim().toLowerCase();
+}
+function semanticActionNameMatches(candidateName: string, targetName: string): boolean {
+	const normalizedCandidate = normalizeSemanticActionAccessibleName(candidateName);
+	const normalizedTarget = normalizeSemanticActionAccessibleName(targetName);
+	return normalizedCandidate === normalizedTarget || normalizedCandidate.startsWith(`${normalizedTarget} `);
+}
+function getCompiledSemanticActionRoleTarget(compiled: CompiledAgentBrowserSemanticAction): { role: string; targetName: string } | undefined {
+	if (compiled.locator !== "role" || !["check", "click", "uncheck"].includes(compiled.action)) return undefined;
+	const findIndex = compiled.args.indexOf("find");
+	if (findIndex < 0 || compiled.args[findIndex + 1] !== "role") return undefined;
+	const role = compiled.args[findIndex + 2];
+	const nameFlagIndex = compiled.args.indexOf("--name");
+	const targetName = nameFlagIndex >= 0 ? compiled.args[nameFlagIndex + 1] : undefined;
+	if (!role || !targetName) return undefined;
+	return { role, targetName };
+}
+function findSemanticActionRefInSnapshot(compiled: CompiledAgentBrowserSemanticAction, snapshotData: unknown): string | undefined {
+	const target = getCompiledSemanticActionRoleTarget(compiled);
+	const refs = getSnapshotRefRecord(snapshotData);
+	if (!target || !refs) return undefined;
+	const candidates = Object.entries(refs).flatMap(([ref, entry]) => {
+		if (!/^e\d+$/.test(ref) || !isRecord(entry)) return [];
+		const role = typeof entry.role === "string" ? entry.role : undefined;
+		const name = typeof entry.name === "string" ? entry.name : undefined;
+		if (!role || !name || role.toLowerCase() !== target.role.toLowerCase() || !semanticActionNameMatches(name, target.targetName)) return [];
+		return [{ exact: normalizeSemanticActionAccessibleName(name) === normalizeSemanticActionAccessibleName(target.targetName), name, ref }];
+	});
+	candidates.sort((left, right) => Number(right.exact) - Number(left.exact) || left.name.length - right.name.length || compareRefIds(left.ref, right.ref));
+	return candidates[0]?.ref;
+}
+interface SemanticActionVisibleRefResolution {
+	args: string[];
+	snapshot: SessionRefSnapshot;
+}
+async function resolveSemanticActionVisibleRefArgs(options: {
+	compiled: CompiledAgentBrowserSemanticAction | undefined;
+	cwd: string;
+	sessionName?: string;
+	signal?: AbortSignal;
+}): Promise<SemanticActionVisibleRefResolution | undefined> {
+	if (!options.compiled || !options.sessionName || !getCompiledSemanticActionRoleTarget(options.compiled)) return undefined;
+	const snapshotData = await runSessionCommandData({ args: ["snapshot", "-i"], cwd: options.cwd, sessionName: options.sessionName, signal: options.signal });
+	const ref = findSemanticActionRefInSnapshot(options.compiled, snapshotData);
+	const snapshot = extractRefSnapshotFromData(snapshotData);
+	if (!ref || !snapshot) return undefined;
+	return { args: [...getCompiledSemanticActionSessionPrefix(options.compiled), options.compiled.action, `@${ref}`], snapshot };
+}
 function compileAgentBrowserSemanticAction(input: unknown): { compiled?: CompiledAgentBrowserSemanticAction; error?: string } {
 	if (!isRecord(input)) {
 		return { error: "semanticAction must be an object." };
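
The ref-resolution hunk above can be read as "normalize names, keep role/name matches, prefer the exact and shortest one". The following is a minimal sketch of that selection only, not the extension's code: the `RefEntry` shape, the `pickRef` name, and `compareRefNumbers` (a stand-in for `compareRefIds`, whose real ordering is not shown in this diff) are assumptions made for illustration.

```typescript
// Hypothetical snapshot entry shape: one role/name record per ref such as "e7".
interface RefEntry {
  role: string;
  name: string;
}

// Collapse whitespace runs, trim, and lowercase, as the diff's normalizer does.
function normalizeName(name: string): string {
  return name.replace(/\s+/g, " ").trim().toLowerCase();
}

// A candidate matches when it equals the target or starts with the target plus a space.
function nameMatches(candidate: string, target: string): boolean {
  const c = normalizeName(candidate);
  const t = normalizeName(target);
  return c === t || c.startsWith(`${t} `);
}

// Assumed stand-in for compareRefIds: order "e2" before "e10" numerically.
function compareRefNumbers(left: string, right: string): number {
  return Number(left.slice(1)) - Number(right.slice(1));
}

// Pick the best ref: exact name first, then shortest name, then lowest ref number.
function pickRef(refs: Record<string, RefEntry>, role: string, targetName: string): string | undefined {
  const candidates = Object.entries(refs)
    .filter(([ref, entry]) =>
      /^e\d+$/.test(ref) &&
      entry.role.toLowerCase() === role.toLowerCase() &&
      nameMatches(entry.name, targetName))
    .map(([ref, entry]) => ({
      exact: normalizeName(entry.name) === normalizeName(targetName),
      name: entry.name,
      ref,
    }));
  candidates.sort((l, r) =>
    Number(r.exact) - Number(l.exact) ||
    l.name.length - r.name.length ||
    compareRefNumbers(l.ref, r.ref));
  return candidates[0]?.ref;
}

// Made-up snapshot data for the example.
const refs: Record<string, RefEntry> = {
  e3: { role: "button", name: "Save   draft" },
  e7: { role: "button", name: "Save" },
  e12: { role: "link", name: "Save" },
  e15: { role: "button", name: "Saved items" },
};

const chosen = pickRef(refs, "button", "save");
console.log(chosen);
```

In this sample, `e3` matches only by prefix, `e12` has the wrong role, and `e15` does not match at all because "saved items" does not start with "save " followed by a space, so the exact match `e7` is selected.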
@@ -2627,7 +2692,6 @@ function getSnapshotRefRecord(data: unknown): Record<string, unknown> | undefine
 }
 const OVERLAY_CLOSE_NAME_PATTERN = /(?:\b(?:close|dismiss|no thanks|not now|maybe later|hide|skip|continue without|x)\b|^\s*×\s*$)/i;
-const OVERLAY_CONTEXT_NAME_PATTERN = /\b(?:banner|modal|dialog|popup|pop-up|overlay|donat(?:e|ion)|subscribe|sign in|login|cookie|privacy|consent)\b/i;
 const OVERLAY_CONTEXT_ROLES = new Set(["alertdialog", "dialog"]);
 const OVERLAY_ACTION_ROLES = new Set(["button", "link", "menuitem"]);
 const OVERLAY_BLOCKER_CANDIDATE_LIMIT = 3;
@@ -2638,8 +2702,7 @@ function getOverlayBlockerCandidates(snapshotData: unknown): OverlayBlockerCandi
 	const hasOverlayContext = Object.values(refs).some((entry) => {
 		if (!isRecord(entry)) return false;
 		const role = typeof entry.role === "string" ? entry.role : "";
-		const name = typeof entry.name === "string" ? entry.name : "";
-		return OVERLAY_CONTEXT_ROLES.has(role.toLowerCase()) || OVERLAY_CONTEXT_NAME_PATTERN.test(name);
+		return OVERLAY_CONTEXT_ROLES.has(role.toLowerCase());
 	});
 	if (!hasOverlayContext) return [];
 	const candidates: OverlayBlockerCandidate[] = [];
@@ -2810,16 +2873,27 @@ function formatEvalStdinHintText(hint: EvalStdinHint | undefined): string | unde
 	return hint ? `Eval stdin hint: ${hint.reason} ${hint.suggestion}` : undefined;
 }
-function getArtifactCleanupGuidance(options: { command?: string; manifest?: SessionArtifactManifest; succeeded: boolean }): ArtifactCleanupGuidance | undefined {
+async function getArtifactCleanupGuidance(options: { command?: string; cwd: string; manifest?: SessionArtifactManifest; succeeded: boolean }): Promise<ArtifactCleanupGuidance | undefined> {
 	if (!options.succeeded || options.command !== "close" || !options.manifest || options.manifest.entries.length === 0) return undefined;
-	const explicitArtifactPaths = options.manifest.entries
-		.filter((entry) => entry.storageScope === "explicit-path")
-		.map((entry) => entry.path)
-		.filter((path, index, paths) => paths.indexOf(path) === index)
-		.slice(0, 10);
+	const explicitEntries = options.manifest.entries.filter((entry) => entry.storageScope === "explicit-path");
+	const explicitArtifactPaths: string[] = [];
+	const seenPaths = new Set<string>();
+	for (const entry of explicitEntries) {
+		if (explicitArtifactPaths.length >= 10) break;
+		const displayPath = entry.path;
+		if (seenPaths.has(displayPath)) continue;
+		const absolutePath = entry.absolutePath ?? (isAbsolute(entry.path) ? entry.path : resolve(options.cwd, entry.path));
+		try {
+			await stat(absolutePath);
+		} catch {
+			continue;
+		}
+		seenPaths.add(displayPath);
+		explicitArtifactPaths.push(displayPath);
+	}
 	return {
 		explicitArtifactPaths,
-		note: "Closing the browser session does not delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings; clean those paths with host file tools when no longer needed.",
+		note: "Closing the browser session does not delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings; clean existing paths with host file tools when no longer needed.",
 		owner: "host-file-tools",
 		summary: formatSessionArtifactRetentionSummary(options.manifest),
 	};
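
As a rough illustration of the cleanup change above, the loop keeps only explicit-path entries that still exist, de-duplicates them, and stops at ten. This sketch makes several assumptions: the `ArtifactEntry` shape is reduced to two fields, the `"session-temp"` scope value is made up, and the existence check is an injected `pathExists` callback so the example runs without touching disk (the diff itself awaits `stat` on a resolved absolute path).

```typescript
// Hypothetical, reduced manifest entry; the real entry type has more fields.
interface ArtifactEntry {
  path: string;
  storageScope: string;
}

// Return up to `limit` unique explicit paths for which pathExists resolves true.
async function listExistingExplicitPaths(
  entries: ArtifactEntry[],
  pathExists: (path: string) => Promise<boolean>,
  limit = 10,
): Promise<string[]> {
  const kept: string[] = [];
  const seen = new Set<string>();
  for (const entry of entries) {
    if (kept.length >= limit) break;
    if (entry.storageScope !== "explicit-path") continue;
    if (seen.has(entry.path)) continue;
    // Skip deleted or stale paths instead of listing them.
    if (!(await pathExists(entry.path))) continue;
    seen.add(entry.path);
    kept.push(entry.path);
  }
  return kept;
}

// Made-up data: one live file listed twice, one deleted file, one non-explicit file.
const onDisk = new Set(["shots/home.png", "tmp/spill.txt"]);
const sampleEntries: ArtifactEntry[] = [
  { path: "shots/home.png", storageScope: "explicit-path" },
  { path: "shots/home.png", storageScope: "explicit-path" },
  { path: "shots/deleted.png", storageScope: "explicit-path" },
  { path: "tmp/spill.txt", storageScope: "session-temp" },
];

listExistingExplicitPaths(sampleEntries, async (p) => onDisk.has(p)).then((paths) => {
  console.log(paths);
});
```

Injecting the existence check is a choice made for this sketch only; it keeps the filtering order (scope, duplicate, existence, cap) visible without depending on the file system.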
@@ -3457,12 +3531,29 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 			const runTool = async (): Promise<AgentBrowserToolResult> => {
 				const sessionMode = params.sessionMode ?? DEFAULT_SESSION_MODE;
 				const freshSessionName = createFreshSessionName(managedSessionBaseName, ephemeralSessionSeed, freshSessionOrdinal + 1);
-				const executionPlan = buildExecutionPlan(preparedArgs.args, {
+				let executionPlan = buildExecutionPlan(preparedArgs.args, {
 					freshSessionName,
 					managedSessionActive,
 					managedSessionName,
 					sessionMode,
 				});
+				let semanticActionVisibleRefResolution: SemanticActionVisibleRefResolution | undefined;
+				if (!executionPlan.validationError && executionPlan.managedSessionName !== freshSessionName) {
+					semanticActionVisibleRefResolution = await resolveSemanticActionVisibleRefArgs({
+						compiled: compiledSemanticAction,
+						cwd: ctx.cwd,
+						sessionName: executionPlan.sessionName,
+						signal,
+					});
+					if (semanticActionVisibleRefResolution) {
+						executionPlan = buildExecutionPlan(semanticActionVisibleRefResolution.args, {
+							freshSessionName,
+							managedSessionActive,
+							managedSessionName,
+							sessionMode,
+						});
+					}
+				}
 				const redactedEffectiveArgs = redactInvocationArgs(executionPlan.effectiveArgs);
 				const redactedRecoveryHint = redactRecoveryHint(executionPlan.recoveryHint);
 				const compatibilityWorkaround: CompatibilityWorkaround | undefined = executionPlan.compatibilityWorkaround;
@@ -3490,7 +3581,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 					};
 				}
-				const commandTokens = extractCommandTokens(preparedArgs.args);
+				const commandTokens = semanticActionVisibleRefResolution ? extractCommandTokens(semanticActionVisibleRefResolution.args) : extractCommandTokens(preparedArgs.args);
 				const exactSensitiveValues = getExactSensitiveStdinValues({
 					command: executionPlan.commandInfo.command,
 					commandTokens,
@@ -3560,10 +3651,13 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 				const priorSessionTabTargetState = executionPlan.sessionName ? sessionTabTargets.get(executionPlan.sessionName) : undefined;
 				const priorSessionTabTarget = priorSessionTabTargetState?.target;
 				const priorRefSnapshotState = executionPlan.sessionName ? sessionRefSnapshots.get(executionPlan.sessionName) : undefined;
+				const resolvedSemanticActionRefSnapshot = semanticActionVisibleRefResolution?.snapshot
+					? { ...semanticActionVisibleRefResolution.snapshot, target: semanticActionVisibleRefResolution.snapshot.target ?? priorSessionTabTarget }
+					: undefined;
 				const staleRefPreflight = buildStaleRefPreflight({
 					commandTokens,
 					currentTarget: priorSessionTabTarget,
-					refSnapshot: priorRefSnapshotState,
+					refSnapshot: resolvedSemanticActionRefSnapshot ?? priorRefSnapshotState,
 					stdin: toolStdin,
 				});
 				if (staleRefPreflight) {
@@ -3937,7 +4031,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 				? extractRefSnapshotFromData(presentationEnvelope?.data)
 				: executionPlan.commandInfo.command === "batch"
 					? extractRefSnapshotFromBatchResults(presentationEnvelope?.data)
-					: overlayBlockerDiagnostic?.snapshot
+					: resolvedSemanticActionRefSnapshot ?? overlayBlockerDiagnostic?.snapshot
 			: undefined;
 						if (refSnapshot && shouldApplySessionTabTargetUpdate({ current: sessionRefSnapshots.get(executionPlan.sessionName), updateOrder: tabTargetUpdateOrder })) {
 							currentRefSnapshot = { ...refSnapshot, target: refSnapshot.target ?? currentSessionTabTarget };
@@ -4082,8 +4176,9 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 						stdin: toolStdin,
 					});
 					const resultArtifactManifest = presentation.artifactManifest ?? artifactManifest;
-					const artifactCleanup = getArtifactCleanupGuidance({
+					const artifactCleanup = await getArtifactCleanupGuidance({
 						command: executionPlan.commandInfo.command,
+						cwd: ctx.cwd,
 						manifest: resultArtifactManifest,
 						succeeded,
 					});

package/extensions/agent-browser/lib/playbook.ts CHANGED

@@ -17,7 +17,7 @@ export const QUICK_START_GUIDELINES = [
 	"Quick start mental model: use exactly one of args (exact agent-browser CLI args after the binary), semanticAction (a thin find-locator shorthand compiled to find argv), job (a constrained short-workflow schema compiled to batch), qa (a lightweight QA preset built on job/batch), or the experimental sourceLookup / networkSourceLookup helpers (each compiled to batch); stdin is only for batch, eval --stdin, auth save --password-stdin, and wrapper-generated batch stdin from job, qa, sourceLookup, or networkSourceLookup, and other command/stdin combinations are rejected before launch; sessionMode=fresh switches the extension-managed pi-scoped session to a fresh upstream launch when you need new --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device state.",
 	"There is no first-class reusable named browser recipe runtime above top-level job, the qa preset, and raw batch stdin; keep recurring flows in documentation examples or those inputs (closed RQ-0068; see docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet).",
 	"Common first calls: { args: [\"open\", \"https://example.com\"] } then { args: [\"snapshot\", \"-i\"] }; after navigation, use { args: [\"click\", \"@e2\"] } then { args: [\"snapshot\", \"-i\"] }.",
-	"Locator-first clicks and fills without hand-building find argv: { semanticAction: { action: \"click\", locator: \"text\", value: \"Close\" } } or { semanticAction: { action: \"fill\", locator: \"label\", value: \"Email\", text: \"user@example.com\" } }; add semanticAction.session when targeting a named upstream browser session; details.compiledSemanticAction shows the derived find command; selector-not-found failures may append bounded try-*-candidate next actions (and an Agent-browser candidate fallbacks prose block) for specific placeholder/text/label shapes, and stale-ref failures can return retry-semantic-action-after-stale-ref when retry safety is provable.",
+	"Locator-first clicks and fills without hand-building find argv: { semanticAction: { action: \"click\", locator: \"text\", value: \"Close\" } } or { semanticAction: { action: \"fill\", locator: \"label\", value: \"Email\", text: \"user@example.com\" } }; add semanticAction.session when targeting a named upstream browser session; details.compiledSemanticAction shows the semantic target, while details.effectiveArgs may show a resolved current @ref for active-session role/name click/check/uncheck actions to avoid hidden duplicate matches; selector-not-found failures may append bounded try-*-candidate next actions (and an Agent-browser candidate fallbacks prose block) for specific placeholder/text/label shapes, and stale-ref failures can return retry-semantic-action-after-stale-ref when retry safety is provable.",
 	"Common advanced calls: { args: [\"batch\"], stdin: \"[[\\\"open\\\",\\\"https://example.com\\\"],[\\\"snapshot\\\",\\\"-i\\\"]]\" }, { job: { steps: [{ action: \"open\", url: \"https://example.com\" }, { action: \"assertText\", text: \"Example Domain\" }, { action: \"screenshot\", path: \".dogfood/example.png\" }] } }, { qa: { url: \"https://example.com\", expectedText: \"Example Domain\", screenshotPath: \".dogfood/qa-example.png\" } }, { args: [\"eval\", \"--stdin\"], stdin: \"document.title\" }, { args: [\"auth\", \"save\", \"name\", \"--password-stdin\"], stdin: \"<password from user-approved secret source>\" }, { args: [\"--profile\", \"Default\", \"open\", \"https://example.com/account\"], sessionMode: \"fresh\" }, and { args: [\"open\", \"--enable\", \"react-devtools\", \"https://example.com\"], sessionMode: \"fresh\" }.",
 	"High-value command reference: download <selector> <path> saves a file triggered by a click; get title/url/text/html/value/attr/count reads page state; screenshot [path] captures an image; pdf <path> saves a PDF; tab list and tab <tab-id-or-label> inspect or recover the active tab; react tree/inspect/renders/suspense introspect React after --enable react-devtools; vitals [url] measures Core Web Vitals; pushstate <url> performs SPA navigation.",
 	"For artifact-producing commands, read the visible artifact block and details.artifactVerification before using files: check requested path, absolute path, existence, size bytes, artifact kind, optional mediaType, status, optional limitation, and verified/missing/pending/unverified counts. details.artifacts contains per-file metadata. Browser close does not delete explicit saved files; if close reports details.artifactCleanup, use host file tools to remove paths listed in explicitArtifactPaths (when non-empty) after inspection. For annotated screenshots inside batch, put --annotate in top-level args (for example { args: [\"--annotate\", \"batch\"], stdin: \"[[\\\"screenshot\\\",\\\"/tmp/page.png\\\"]]\" }) rather than inside the screenshot step.",
@@ -49,7 +49,7 @@ export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
 	"For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.",
 	"When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.",
 	"When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.",
-	"When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction.",
+	"When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.",
 	"When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.",
 	"Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.",
 ] as const;

package/extensions/agent-browser/lib/results/presentation.ts CHANGED

@@ -609,12 +609,27 @@ function formatNetworkRequestsText(data: Record<string, unknown>): string | unde
 	const shown = networkFailureSummary.totalCount > 0
 		? [`Network failure summary: ${networkFailureSummary.actionableCount} actionable, ${networkFailureSummary.benignCount} benign low-impact (${networkFailureSummary.totalCount} total).`]
 		: [];
-	shown.push(...requests.slice(0, DIAGNOSTIC_REQUEST_PREVIEW_LIMIT).flatMap((item, index) => {
+	const indexedRequests = requests.map((item, index) => ({ index, item }));
+	const failedRequests: typeof indexedRequests = [];
+	const normalRequests: typeof indexedRequests = [];
+	for (const indexed of indexedRequests) {
+		if (isRecord(indexed.item) && classifyNetworkRequestFailure(indexed.item)) failedRequests.push(indexed);
+		else normalRequests.push(indexed);
+	}
+	failedRequests.sort((left, right) => {
+		const leftClassification = isRecord(left.item) ? classifyNetworkRequestFailure(left.item) : undefined;
+		const rightClassification = isRecord(right.item) ? classifyNetworkRequestFailure(right.item) : undefined;
+		const leftRank = leftClassification?.impact === "actionable" ? 0 : 1;
+		const rightRank = rightClassification?.impact === "actionable" ? 0 : 1;
+		return leftRank - rightRank || left.index - right.index;
+	});
+	const prioritizedRequests = [...failedRequests, ...normalRequests];
+	shown.push(...prioritizedRequests.slice(0, DIAGNOSTIC_REQUEST_PREVIEW_LIMIT).flatMap(({ item, index }) => {
 		if (!isRecord(item)) return [`${index + 1}. ${stringifyModelFacing(item)}`];
 		return formatNetworkRequestLine(item, index);
 	}));
 	if (requests.length > DIAGNOSTIC_REQUEST_PREVIEW_LIMIT) {
-		shown.push(`... (${requests.length - DIAGNOSTIC_REQUEST_PREVIEW_LIMIT} additional requests omitted from preview)`);
+		shown.push(`... (${requests.length - DIAGNOSTIC_REQUEST_PREVIEW_LIMIT} additional requests omitted from preview; failed requests are shown first when present)`);
 	}
 	return shown.join("\n");
 }
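
The reordering in this hunk can be sketched as a stable partition followed by a ranked sort of the failed rows. This is a simplified illustration under stated assumptions: the `RequestRow` shape and the status-code thresholds in `classifyFailure` are invented for the example and are not the rules used by `classifyNetworkRequestFailure`.

```typescript
type Impact = "actionable" | "benign";

// Hypothetical request row; the real rows are untyped records.
interface RequestRow {
  url: string;
  status: number;
}

// Assumed classifier for illustration only: 5xx is actionable, 4xx is benign.
function classifyFailure(row: RequestRow): Impact | undefined {
  if (row.status >= 500) return "actionable";
  if (row.status >= 400) return "benign";
  return undefined;
}

// Put failed rows first (actionable before benign), keep original order within
// each group, then cap the preview. Original indexes are kept for numbering.
function prioritize(rows: RequestRow[], limit: number): Array<{ index: number; row: RequestRow }> {
  const indexed = rows.map((row, index) => ({ index, row }));
  const failed = indexed.filter(({ row }) => classifyFailure(row) !== undefined);
  const normal = indexed.filter(({ row }) => classifyFailure(row) === undefined);
  const rank = (row: RequestRow): number => (classifyFailure(row) === "actionable" ? 0 : 1);
  failed.sort((l, r) => rank(l.row) - rank(r.row) || l.index - r.index);
  return [...failed, ...normal].slice(0, limit);
}

// Made-up traffic: the late 503 would be cut off by a plain slice(0, 3).
const rows: RequestRow[] = [
  { url: "/app.js", status: 200 },
  { url: "/style.css", status: 200 },
  { url: "/favicon.ico", status: 404 },
  { url: "/logo.png", status: 200 },
  { url: "/api/data", status: 503 },
];

const preview = prioritize(rows, 3);
console.log(preview.map(({ index, row }) => `${index + 1}. ${row.status} ${row.url}`));
```

Carrying the original index alongside each row is what lets the preview keep the upstream numbering even though the display order changes.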

package/extensions/agent-browser/lib/results.ts CHANGED

@@ -13,6 +13,7 @@ export {
 	buildAgentBrowserResultCategoryDetails,
 	classifyAgentBrowserFailureCategory,
 	classifyAgentBrowserSuccessCategory,
+	compareRefIds,
 } from "./results/shared.js";
 export type {
 	AgentBrowserBatchResult,

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "pi-agent-browser-native",
-  "version": "0.2.26",
+  "version": "0.2.27",
   "description": "pi extension that exposes agent-browser as a native tool for browser automation",
   "type": "module",
   "author": "Mitch Fultz (https://github.com/fitchmultz)",