pi-agent-browser-native 0.2.29 → 0.2.30

package/CHANGELOG.md CHANGED
@@ -1,5 +1,20 @@
1
1
  # Changelog
2
2
 
3
+ ## 0.2.30 - 2026-05-18
4
+
5
+ ### Added
6
+ - Current-snapshot ref fallback for locator misses: raw `find` and compiled `semanticAction` selector misses can now surface exact visible `@ref` retry actions when a fresh snapshot shows the intended control.
7
+ - Public Sauce Demo checkout smoke guidance for validating natural browser workflows, artifact paths, and final-action stop boundaries before release.
8
+ - Efficiency benchmark coverage for multi-ref extraction workflows.
9
+
10
+ ### Changed
11
+ - Reduced wrapper-induced click fragility by replacing serial post-click title/URL probes with one read-only navigation summary eval and limiting tab-list pinning/correction probes to sessions with observed drift risk.
12
+ - Allowed same-snapshot form-fill batching: `fill @e…` remains stale-ref guarded but no longer invalidates later same-snapshot fills before the first click/submit/navigation row.
13
+ - Tightened browser playbook guidance for signed-in profile use, multi-value extraction, exact requested artifact paths, and explicit order/post/purchase/submit stop boundaries.
14
+
15
+ ### Fixed
16
+ - Removed stale release/support documentation notes after the post-`v0.2.29` review and kept command-reference, support-matrix, README, and tool-contract guidance aligned with the current wrapper behavior.
17
+
3
18
  ## 0.2.29 - 2026-05-18
4
19
 
5
20
  ### Changed
package/README.md CHANGED
@@ -59,7 +59,7 @@ The result is optimized for agent work:
59
59
  | Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows, constrained `job` / `qa` presets and experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
60
60
  | Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages hide inputs and tabs from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
61
61
  | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
62
- | Profile restores and tab drift confuse agents | Tracks managed sessions, pins intended tabs, and re-selects target tabs after drift | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
62
+ | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
63
63
  | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-validation.test.ts` |
64
64
  | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation.test.ts` |
65
65
  | Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
@@ -169,7 +169,7 @@ Run a multi-step flow in one tool call:
169
169
  { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
170
170
  ```
171
171
 
172
- If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, `click`, `fill`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
172
+ If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, `click`, `reload`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. Multiple same-snapshot `fill @e…` steps may be batched before a click/submit step; dynamic or autosubmit forms should still use stable locators or split with a fresh snapshot. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
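The ordering rule in the paragraph above can be sketched as a pre-spawn walk over the batch rows. This is a minimal illustration, not the wrapper's implementation: the name `findStaleRefStep` and the exact sets of invalidating and guarded commands are assumptions, since the README names only `open`, `click`, `reload` "and similar".

```typescript
// Hypothetical sketch of the batch ordering check; not the wrapper's actual code.
type BatchStep = string[];

// Assumed command sets: the README lists only `open`, `click`, `reload` "and similar".
const INVALIDATING = new Set(["open", "click", "reload"]);
const GUARDED = new Set(["click", "fill"]);

// True when any argument after the command token is an `@e…` ref.
const usesRef = (step: BatchStep): boolean =>
  step.slice(1).some((token) => token.startsWith("@e"));

// Returns the index of the first step that would be rejected as stale-ref, or -1.
function findStaleRefStep(steps: BatchStep[]): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const command = step[0] ?? "";
    if (command === "snapshot") {
      latched = false; // a snapshot row clears the latch for following rows
      continue;
    }
    if (latched && GUARDED.has(command) && usesRef(step)) return i;
    // `fill @e…` is guarded but does not set the latch itself.
    if (INVALIDATING.has(command)) latched = true;
  }
  return -1;
}

const loginBatch: BatchStep[] = [
  ["fill", "@e1", "user"],
  ["fill", "@e2", "value"],
  ["click", "@e3"],
];
const unsafeBatch: BatchStep[] = [
  ["click", "@e3"],
  ["fill", "@e4", "text"],
];
```

Under these assumptions, `findStaleRefStep(loginBatch)` reports no unsafe step, while `findStaleRefStep(unsafeBatch)` flags the `fill` that follows the `click` without an intervening `snapshot` row.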
173
173
 
174
174
  Evaluate page JavaScript through stdin. Return the value you want as an expression; `eval --stdin` may warn with `details.evalStdinHint` when a function-shaped snippet serializes to `{}` instead of being invoked:
175
175
 
@@ -178,6 +178,12 @@ Evaluate page JavaScript through stdin. Return the value you want as an expressi
178
178
  { "args": ["eval", "--stdin"], "stdin": "({ title: document.title, url: location.href })" }
179
179
  ```
180
180
 
181
+ Extract several known refs or selectors in one `batch` call instead of many serial getter calls:
182
+
183
+ ```json
184
+ { "args": ["batch"], "stdin": "[[\"get\",\"text\",\"@e64\"],[\"get\",\"text\",\"@e65\"]]" }
185
+ ```
186
+
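When the ref list is computed rather than hand-written, the escaped `stdin` string can be generated instead of typed. The helper below is a hypothetical sketch (`buildGetTextBatch` is not part of the package); it assumes only the `batch` payload shape shown in the JSON example above.

```typescript
// Hypothetical helper: builds an `agent_browser` batch payload for several refs or selectors.
interface BatchCall {
  args: string[];
  stdin: string;
}

function buildGetTextBatch(targets: string[]): BatchCall {
  // One `get text <target>` row per ref or selector.
  const steps = targets.map((target) => ["get", "text", target]);
  return { args: ["batch"], stdin: JSON.stringify(steps) };
}

const call = buildGetTextBatch(["@e64", "@e65"]);
```

`JSON.stringify` handles the quoting, so the resulting `stdin` matches the hand-escaped form in the example.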
181
187
  Save an auth profile without putting the password in `args`:
182
188
 
183
189
  ```json
@@ -208,8 +214,9 @@ Typical pitfalls:
208
214
  - Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
209
215
  - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
210
216
  - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
211
- - If the failure is `selector-not-found` for a compiled `semanticAction`, visible text may add `Agent-browser candidate fallbacks` and `details.nextActions` may list bounded `try-*-candidate` follow-ups (role/name retries only for `fill` + `placeholder`, `click` + `text`, or `fill` + `label`; `select` misses do not get these entries); prefer those payloads or a fresh snapshot over guessing new selectors (same contract link).
212
- - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is only populated via follow-up `get url` / `get title` when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
217
+ - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` plus `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. It still adds `Agent-browser candidate fallbacks` for bounded semanticAction role/name retries (`fill` + `placeholder`, `click` + `text`, or `fill` + `label`); prefer these payloads or a fresh snapshot over guessing new selectors (same contract link).
218
+ - A successful upstream `click` is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
219
+ - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
213
220
  - If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
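For the `selector-not-found` pitfall in the list above, a caller might filter `details.nextActions` for the wrapper-provided follow-ups rather than guessing new selectors. The sketch below is illustrative only: the `id` field, the sample ids, and the ordering (current-snapshot ref retries before bounded candidates) are assumptions, not documented wrapper behavior.

```typescript
// Hypothetical shape: the real `details.nextActions` entries may carry different fields.
interface NextAction {
  id: string;
}

// Keeps current-snapshot ref retries first, then bounded `try-*-candidate` entries.
// The priority is an illustrative assumption.
function pickFallbackActions<T extends NextAction>(nextActions: T[]): T[] {
  const refRetries = nextActions.filter((a) => a.id.startsWith("try-current-visible-ref"));
  const candidates = nextActions.filter((a) => /^try-.+-candidate$/.test(a.id));
  return [...refRetries, ...candidates];
}

// Sample ids are made up to match the `try-current-visible-ref*` / `try-*-candidate` patterns.
const picked = pickFallbackActions([
  { id: "refresh-interactive-refs" },
  { id: "try-button-candidate" },
  { id: "try-current-visible-ref-1" },
]);
```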
214
221
 
215
222
  ### Constrained browser jobs
@@ -269,7 +276,7 @@ For asynchronous exports, click first and then wait for the download:
269
276
  { "args": ["wait", "--download", "/tmp/report.csv"] }
270
277
  ```
271
278
 
272
- With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
279
+ When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute a different path in the final report. With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
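The check described above could look like the following sketch. Only `details.savedFilePath` and `details.artifacts[].exists` are named in the README; the `path` field on each artifact entry and the helper name `requestedArtifactExists` are assumptions for illustration.

```typescript
// Hypothetical result shape; only `savedFilePath` and `artifacts[].exists` come from the README.
interface ArtifactEntry {
  path: string; // assumed field name
  exists: boolean;
}
interface ResultDetails {
  savedFilePath?: string;
  artifacts?: ArtifactEntry[];
}

// True only when the exact requested path is reported as present on disk.
function requestedArtifactExists(details: ResultDetails, requestedPath: string): boolean {
  return (details.artifacts ?? []).some(
    (artifact) => artifact.path === requestedPath && artifact.exists,
  );
}

const missing: ResultDetails = {
  savedFilePath: "/tmp/report.csv",
  artifacts: [{ path: "/tmp/report.csv", exists: false }],
};
```

Here `savedFilePath` alone is not treated as proof: `requestedArtifactExists(missing, "/tmp/report.csv")` is false because the matching artifact entry reports `exists: false`.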
273
280
 
274
281
  Artifact cleanup is host-owned, not a browser command. `close` shuts down the browser session but does **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful `close` appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
275
282
 
@@ -283,7 +290,7 @@ After a successful unnamed fresh launch, later default `sessionMode: "auto"` cal
283
290
 
284
291
  ## Authenticated/profile workflows
285
292
 
286
- The wrapper does not clone profiles or hide what upstream Chrome profile you chose. Passing `--profile` is an explicit upstream `agent-browser` choice.
293
+ The wrapper does not clone profiles or hide what upstream Chrome profile you chose. Passing `--profile` is an explicit upstream `agent-browser` choice. Visible page content from real profiles is model-visible and may persist in transcripts or saved artifacts; redaction protects credential-like cookie/storage/auth values, not ordinary page text you asked the browser to read.
287
294
 
288
295
  Use these rules:
289
296
 
@@ -469,8 +476,8 @@ These calls return plain text and stay stateless: the extension does not inject
469
476
  <!-- agent-browser-playbook:start wrapper-tab-recovery -->
470
477
  <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
471
478
  - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
472
- - After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
473
- - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
479
+ - After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
480
+ - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
474
481
  - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
475
482
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
476
483
 
@@ -116,9 +116,9 @@ Practical policy:
116
116
  - if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
117
117
  - leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
118
118
  - after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
119
- - once the wrapper knows which tab the agent is operating on, later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; this stays thin while avoiding reconnect-time drift on profile-restored sessions
120
- - after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns
121
- - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text
119
+ - once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
120
+ - for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
121
+ - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text
122
122
  - after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
123
123
  - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
124
124
  - keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins
@@ -133,13 +133,15 @@ Examples:
133
133
  { "args": ["snapshot", "-i"] }
134
134
  ```
135
135
 
136
- The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
136
+ The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` with `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed target. Semantic misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
137
137
 
138
138
  Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
139
139
 
140
- Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
140
+ Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
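A login-style batch from a single snapshot, as described above, might be assembled like this. The ref ids and field values are hypothetical and would come from a prior `snapshot -i`; the `fill <ref> <value>` row shape is assumed from upstream usage rather than stated in this document.

```typescript
// Illustrative only: ref ids and values are hypothetical placeholders.
const formSteps: string[][] = [
  ["fill", "@e1", "Ada Lovelace"],
  ["fill", "@e2", "ada@example.com"],
  ["click", "@e3"], // the first click/submit row ends the same-snapshot fill run
];

// Payload in the same shape as the other `batch` examples.
const formCall = { args: ["batch"], stdin: JSON.stringify(formSteps) };
```

If a fill rerenders the form (dynamic or autosubmit behavior), the same rows would need to be split with a fresh snapshot between them.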
141
141
 
142
- When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with extra `get url` / `get title` calls only when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
142
+ A successful `click` result means upstream reported a target, not that the app definitely handled the event. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action. The wrapper avoids site-specific fallback clicks and keeps the verification burden explicit.
143
+
144
+ When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with one read-only `eval` when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
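The fragment-insensitive URL comparison mentioned above can be approximated as follows. This is a minimal sketch using the standard `URL` API; `sameUrlIgnoringFragment` is a hypothetical name, and the wrapper's real normalization may cover more than the `#fragment`.

```typescript
// Hypothetical helper: drops the `#fragment` before comparing two absolute URLs.
function sameUrlIgnoringFragment(before: string, after: string): boolean {
  const strip = (value: string): string => {
    const url = new URL(value);
    url.hash = ""; // removes the fragment, including the leading `#`
    return url.href;
  };
  return strip(before) === strip(after);
}
```

With this helper, a click that only changes `#top` to `#details` counts as an unchanged URL, while a path change such as `/cart` to `/checkout` does not.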
143
145
 
144
146
  ### Extract page data
145
147
 
@@ -150,6 +152,12 @@ When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`jo
150
152
  { "args": ["eval", "--stdin"], "stdin": "document.title" }
151
153
  ```
152
154
 
155
+ When you already know several visible refs or selectors, extract them in one `batch` call instead of many serial getter calls:
156
+
157
+ ```json
158
+ { "args": ["batch"], "stdin": "[[\"get\",\"text\",\"@e64\"],[\"get\",\"text\",\"@e65\"],[\"get\",\"text\",\"@e66\"]]" }
159
+ ```
160
+
153
161
  Prefer `get` and scoped `eval --stdin` for read-only extraction. Getter names are grouped under `get`: use `get title`, `get url`, or `get text <selector>`, not shortcut commands such as `title` or `url`. When upstream reports an unknown command, unknown subcommand, or unrecognized command for a single-token shortcut (`attr`, `count`, `html`, `text`, `title`, `url`, or `value`), the wrapper adds a visible grouped-`get` hint; only `title` and `url` also get exact read-only `details.nextActions` (`use-get-title` / `use-get-url`, with `--session` preserved when the failed call named a session). If another `Agent-browser hint:` (selector dialect or stale-ref recovery) was already appended to the same error text, the getter hint is omitted.
154
162
 
155
163
  Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.
@@ -239,7 +247,7 @@ A successful wait-based download renders a readable summary such as `Download co
239
247
  { "args": ["pdf", "/tmp/page.pdf"] }
240
248
  ```
241
249
 
242
- The upstream screenshot aliases are `screenshot --full` for full-page capture and `screenshot --annotate` for labeled screenshots.
250
+ The upstream screenshot aliases are `screenshot --full` for full-page capture and `screenshot --annotate` for labeled screenshots. When a user gives exact artifact paths for screenshots, recordings, downloads, PDFs, traces, or HAR files, use those paths or explicitly report why the artifact was unavailable; do not silently substitute another path in the final report.
243
251
 
244
252
  Prefer `download <selector> <path>` when the target element itself is the downloadable link/control. Use `click` plus `wait --download [path]` when a previous action starts the download indirectly.
245
253
 
@@ -323,6 +331,7 @@ Stateful commands are native `agent_browser` calls, not shell commands. Keep sec
323
331
 
324
332
  Operational notes:
325
333
 
334
+ - Visible page content from real authenticated profiles is still model-visible and may persist in transcripts or saved artifacts. The wrapper redacts credential-like cookie/storage/auth data, not the ordinary page text you asked it to read.
326
335
  - `stdin` is accepted only for `batch`, `eval --stdin`, and `auth save --password-stdin`; other stdin-bearing calls are rejected before launch.
327
336
  - `auth list/show/save/login/delete` summaries avoid expanding profile secrets. Prefer `auth save --password-stdin` over `--password <value>`.
328
337
  - `state save <path>` is a verified file-artifact workflow; inspect `details.artifactVerification` before relying on the file. `state load <path>` is not treated as a newly saved artifact.
@@ -609,8 +618,8 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
609
618
  <!-- agent-browser-playbook:start wrapper-tab-recovery -->
610
619
  <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
611
620
  - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
612
- - After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
613
- - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
621
+ - After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
622
+ - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
614
623
  - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
615
624
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
616
625
  - Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 28s budget) so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path. When that watchdog fires, `details.timeoutPartialProgress` may include a planned step list for compiled `job` / `qa` plans or caller `batch` stdin, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/size checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
package/docs/RELEASE.md CHANGED
@@ -69,6 +69,47 @@ Minimum pass:
69
69
 
70
70
  Record release evidence as a short note with: date, package/checkout source, target URL, browser command families exercised, artifacts collected and cleaned up, known Grafana-side noise observed, and any product findings converted into CueLoop tasks. Do not commit private dogfood scripts, VFR harness files, raw browser profiles, HARs, videos, or `.dogfood/` run output as product docs.
71
71
 
72
+ ## Public Sauce Demo checkout smoke prompt
73
+
74
+ Use this validation prompt after changing click enrichment, tab pinning, ref preflight, form-fill batching, artifact handling, recording, or prompt guidance. It is intentionally more stateful than `example.com` and uses a natural user-style request so the transcript shows what the agent chooses on its own. Do **not** mention `agent_browser`, snapshots, refs, `batch`, `eval`, or upstream command names in the prompt; those are evaluator expectations, not user instructions.
75
+
76
+ Run it in an isolated checkout session. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep that detail out of the user prompt:
77
+
78
+ ```bash
79
+ pi --no-extensions -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
80
+ ```
81
+
82
+ Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward.
83
+
84
+ Copy/paste prompt, replacing the two artifact placeholders with exact absolute paths:
85
+
86
+ ```text
87
+ Please do an end-to-end QA pass on the public Sauce Demo store.
88
+
89
+ Site: https://www.saucedemo.com/
90
+ Demo credentials: standard_user / secret_sauce
91
+
92
+ Use a clean browser context, not my personal Chrome profile.
93
+
94
+ Scenario:
95
+ - Log in.
96
+ - Sort products by price low to high.
97
+ - Add at least two products to the cart.
98
+ - Open the cart.
99
+ - Start checkout with a fake name and postal code.
100
+ - Stop on the checkout overview page; do not place the order.
101
+
102
+ Please gather enough evidence to support the QA result:
103
+ - Save a screenshot here: <ABSOLUTE_SCREENSHOT_PATH>.png
104
+ - Save a short screen recording here if recording is available: <ABSOLUTE_RECORDING_PATH>.webm
105
+ - Include the final page title/URL, the selected sort order, cart contents, item total/tax/total, and any browser-side network, console, or page-error issues you see.
106
+ - Clean up by closing the browser when finished.
107
+
108
+ Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed.
109
+ ```
110
+
111
+ Evaluator expectations after the queued Sauce Demo fixes: the agent should independently choose efficient, safe browser operations; native add-to-cart clicks should mutate cart state without JavaScript fallback; same-snapshot form fills may be batched safely when the agent chooses that route; the selected sort order should be verified; checkout must stop before Finish and must not place the order; screenshot and recording must use the requested paths or be explicitly reported unavailable; `network requests` may show public-demo telemetry 401s; `console` may report offline-cache logs; `errors` should show no page errors; and the browser session plus temp artifacts should be cleaned up after evidence is recorded. A run that clicks Finish despite the stop instruction or silently substitutes artifact paths is a workflow failure even if the store flow itself works.
112
+
72
113
  ## Deterministic agent efficiency benchmark
73
114
 
74
115
  [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) is an accounting-only benchmark: it does not shell out to `agent-browser`, launch a browser, or read or write Pi sessions. It models representative `agent_browser` call shapes (including optional `stdin` for `batch` and top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` objects that compile to batch) and aggregates success rate, tool-call counts, UTF-8 size of model-visible strings, stale-ref failure and recovery counts, artifact success, distinct failure-category coverage, and summed elapsed-time estimates. When extending scenarios, keep them aligned with the closed `RQ-0068` “no reusable recipe layer” rationale in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) (benchmark ids cited there are the canonical inventory for that evidence bar).
@@ -28,7 +28,7 @@ When upstream ships a new `agent-browser` or the inventory changes:
28
28
  - Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
29
29
  - Status: supported for the current wrapper contract.
30
30
  - High-priority support gaps: none identified in the baseline audit.
31
- - Remaining queued work: only `RQ-0084` remains active, covering the `0.2.28` npm/GitHub release after npm authentication is restored. Dogfood-driven improvements `RQ-0080` through `RQ-0083` and `RQ-0085` are implemented and are beyond the current baseline support promise for thin upstream command coverage. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
31
+ - Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. CueLoop validation was idle/valid on 2026-05-18 after those tasks were marked done. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
32
32
 
33
33
  ## Verification evidence
34
34
 
@@ -36,12 +36,12 @@ Re-run the gates below before each release; this table records what the closure
36
36
 
37
37
  | Gate | Evidence | Status |
38
38
  | --- | --- | --- |
39
- | Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-05-15 (`npm run verify`, `agent-browser 0.27.0` on `PATH`). |
40
- | Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-05-14 (`npm run verify -- real-upstream`). |
41
- | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads exactly one packaged `agent_browser` tool, and executes fake-upstream `--version`. | Pass on 2026-05-15 as part of `npm run verify -- release`. |
42
- | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits lifecycle, real-upstream, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-05-15; `prepublishOnly` also passed during the blocked `npm publish` attempt before npm returned `ENEEDAUTH`. |
43
- | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Passthrough flags are defined in `validatePassthrough` in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--verbose`, and `--timeout-ms` plus a separate positive integer value (for example `npm run verify -- lifecycle --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-14 (`npm run verify -- lifecycle --keep-artifacts --verbose --timeout-ms 600000`) during release cleanup; retained temp artifacts were removed after inspection. Treat any future unexplained red lifecycle gate as a release blocker. |
44
- | Quick isolated Pi smoke | `pi --no-extensions -e .` from repo root; native `agent_browser` only. | Covered version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055; RQ-0056 cleanup spot-check found no lingering tmux or repo-local smoke artifacts. |
39
+ | Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-05-18 (`npm run verify`, `agent-browser 0.27.0` on `PATH`). |
40
+ | Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-05-18 (`npm run verify -- real-upstream`). |
41
+ | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads exactly one packaged `agent_browser` tool, and executes fake-upstream `--version`. | Pass on 2026-05-18 (`npm run verify -- package-pi`). |
42
+ | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits lifecycle, real-upstream, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-05-18 (`npm run verify -- release`). `prepublishOnly` still needs a fresh run during actual publish. |
43
+ | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Passthrough flags are defined in `validatePassthrough` in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--verbose`, and `--timeout-ms` plus a separate positive integer value (for example `npm run verify -- lifecycle --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-18 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
44
+ | Quick isolated Pi smoke | `pi --no-extensions -e .` from repo root; native `agent_browser` only. | Pass on 2026-05-18 for a fresh interactive tmux smoke: the agent opened `https://example.com`, waited for `Example Domain`, saved `/tmp/piab-isolated-smoke.png` with verified `image/png` artifact metadata, closed the browser session, and reported PASS. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
45
45
 
46
46
  ## Baseline checklist by inventory section
47
47
 
@@ -64,13 +64,21 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
64
64
 
65
65
  `RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
66
66
 
67
- `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
67
+ `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label`. Other locator/action pairs omit this block; top-level `semanticAction` does not expose upstream `select`, so select workflows use explicit `args`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
68
68
 
69
69
  `RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
70
70
 
71
- `RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
71
+ `RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, look for exact normalized role/name matches for the failed target, emit `details.visibleRefFallback` plus visible `Current snapshot ref fallback`, and append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`). The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
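The narrow matching rules described for `RQ-0088` can be sketched as follows. This is an illustration under stated assumptions, not the code in `extensions/agent-browser/index.ts`: the type names, the `findVisibleRefFallback` function, the normalization (trim, collapse whitespace, lowercase), and the reduction of "textbox/searchbox-style" to exactly two roles are all hypothetical.

```typescript
// Illustrative sketch only; names, normalization, and role lists are assumptions.
interface SnapshotRef {
  ref: string;
  role: string;
  name: string;
}

type FailedTarget =
  | { kind: "role"; role: string; name?: string }
  | { kind: "text-click"; text: string }
  | { kind: "fill"; locator: "label" | "placeholder"; value: string };

interface VisibleRefFallback {
  refs: string[];
  ambiguous: boolean; // duplicate exact matches should carry safety copy
}

// Assumed normalization: trim, collapse inner whitespace, lowercase.
const normalizeName = (s: string): string => s.trim().replace(/\s+/g, " ").toLowerCase();

function findVisibleRefFallback(
  target: FailedTarget,
  refs: SnapshotRef[],
): VisibleRefFallback | undefined {
  let roles: string[];
  let wanted: string;
  if (target.kind === "role") {
    if (target.name === undefined) return undefined; // role locators require --name
    roles = [target.role];
    wanted = target.name;
  } else if (target.kind === "text-click") {
    roles = ["button", "link"]; // text-click maps only to exact-name button/link refs
    wanted = target.text;
  } else {
    roles = ["textbox", "searchbox"]; // label/placeholder fill maps to text inputs
    wanted = target.value;
  }
  const key = normalizeName(wanted);
  // Exact equality after normalization: prefixes and fuzzy matches are ignored.
  const matches = refs.filter(
    (r) => roles.indexOf(r.role) !== -1 && normalizeName(r.name) === key,
  );
  if (matches.length === 0) return undefined;
  return { refs: matches.map((m) => m.ref), ambiguous: matches.length > 1 };
}
```

Returning every exact match with an `ambiguous` flag, rather than picking the first one, mirrors the stated intent that duplicate matches are reported with safety copy instead of being resolved silently.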
72
72
 
73
- `RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
73
+ `RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
74
+
75
+ `RQ-0087` keeps the RQ-0072 guard but removes `fill` from the batch invalidation latch: `fill @e…` rows remain guarded against stale/missing refs, yet multiple same-snapshot form fills can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `click`, `open`, `reload`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first. This improves login/checkout efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
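The ordering rule for `RQ-0087` can be pictured with a small sketch. It covers only the batch latch, not the separate stale/missing-ref check against the latest snapshot, and it is not the wrapper's implementation: `firstStaleRefRow` is a hypothetical name, and the invalidating set lists only the commands named above (`click`, `open`, `reload`), whereas the real list is longer.

```typescript
// Sketch of the batch ordering latch; command list and names are assumptions.
type BatchRow = string[];

// Rows that can navigate or mutate set the latch. `fill` is deliberately absent.
const INVALIDATING_COMMANDS: string[] = ["click", "open", "reload"];

// A row is ref-bearing when any argument after the command starts with "@e".
const usesRef = (row: BatchRow): boolean =>
  row.slice(1).some((arg) => arg.indexOf("@e") === 0);

// Returns the index of the first row that reuses a ref after an invalidating
// row without an intervening `snapshot`, or -1 when the ordering is acceptable.
function firstStaleRefRow(rows: BatchRow[]): number {
  let latched = false;
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    const command = row[0];
    if (command === "snapshot") {
      latched = false; // a fresh snapshot resets the latch
      continue;
    }
    if (latched && usesRef(row)) return i;
    if (INVALIDATING_COMMANDS.indexOf(command) !== -1) latched = true;
  }
  return -1;
}
```

Under this sketch a login form passes as `fill`, `fill`, `click`, while `click` followed by `fill @e…` is flagged unless a `snapshot` row sits between them.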
76
+
77
+ `RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
78
+
79
+ `RQ-0086` reduces wrapper-induced click fragility found during Sauce Demo smokes: navigation-summary enrichment for click/back/forward/reload/dblclick now uses one read-only `eval` (`({ title: document.title, url: location.href })`) instead of serial `get title` plus `get url` probes, including tab-pinned batch wrappers. Tab pinning/post-command tab correction now runs only after the wrapper has evidence of tab-drift risk (profile restore correction, overlapping stale opens, or restored session state), so ordinary same-session clicks no longer get repeated `tab list` probes. This keeps `details.navigationSummary`, overlay blocker checks, and drift recovery intact while avoiding the upstream `agent-browser 0.27.0` sequence that could report later clicks as successful without dispatching pointer/click events after repeated getter/tab/snapshot probes. Fake coverage: `agentBrowserExtension enriches click results with a post-navigation title and url summary`, `agentBrowserExtension pins the intended tab inside a follow-up command when reconnect drift would otherwise steal focus`, and about-blank/tab overlap assertions in [`test/agent-browser.extension-tabs.test.ts`](../test/agent-browser.extension-tabs.test.ts); manual validation source: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt).
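As a rough illustration of the single-probe approach in `RQ-0086`, the sketch below builds one read-only `eval --stdin` call carrying the expression quoted above and validates the returned value. The function names and the `NavigationSummary` type are hypothetical, and placing `--session` before the command is an assumption based on how other session-aware calls are described in these notes.

```typescript
// Hypothetical sketch; not the wrapper's actual code.
interface NavigationSummary {
  title: string;
  url: string;
}

// The plain object expression quoted in the RQ-0086 note.
const NAVIGATION_SUMMARY_EXPRESSION = "({ title: document.title, url: location.href })";

function buildNavigationSummaryCall(session?: string): { args: string[]; stdin: string } {
  // Assumption: a named session is passed as a leading `--session <name>` pair.
  const prefix: string[] = session ? ["--session", session] : [];
  return { args: prefix.concat(["eval", "--stdin"]), stdin: NAVIGATION_SUMMARY_EXPRESSION };
}

// Accepts the evaluated value only when both fields are strings.
function parseNavigationSummary(value: unknown): NavigationSummary | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const record = value as Record<string, unknown>;
  if (typeof record.title !== "string" || typeof record.url !== "string") return undefined;
  return { title: record.title, url: record.url };
}
```

One call that returns both fields replaces two serial getter probes, which is the reduction the note describes.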
80
+
81
+ `RQ-0089` investigated remaining Sauce Demo no-op clicks after RQ-0086. Minimal direct-upstream probes against `agent-browser 0.27.0` reproduced the residual `Finish` behavior without the wrapper: both CSS `click [data-test="finish"]` and `find role button click --name Finish` returned success, but a page-level click listener recorded no click event and the URL stayed on `checkout-step-two.html` after a 1s wait; a separate cart-link check showed normal trusted click events and navigation when the app handled the event. Conclusion: the residual issue is upstream/site interaction rather than wrapper post-click probes. Runtime behavior stays thin, with no site-specific DOM-click fallback; docs now state that click success is attempted-action evidence only and agents must verify important mutations with URL/text/state checks or fresh snapshots before continuing. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`pageChangeSummary`, `nextActions`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; manual evidence: direct-upstream RQ-0089 development probe plus [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) smoke caveats.
74
82
 
75
83
  `RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
76
84
 
@@ -35,10 +35,11 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
35
35
  <!-- agent-browser-playbook:start shared-guidelines -->
36
36
  <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
37
37
  - Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.
38
+ - For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.
38
39
  - When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.
39
40
  - When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.
40
41
  - Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.
41
- - For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
42
+ - For authenticated or user-specific content explicitly requested by the user, such as feeds, inboxes, account pages, or private dashboards, prefer --profile Default on the first browser call and let the implicit session carry continuity. Do not use a real profile for public pages just because they are dashboards. Treat visible page content from real profiles as model-visible transcript data; use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.
42
43
  - Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.
43
44
  - When using --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.
44
45
  - If you already used the implicit session and now need launch-scoped flags like --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.
@@ -57,7 +58,7 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
57
58
  - When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
58
59
  - When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
59
60
  - When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
60
- - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.
61
+ - When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), use the user's exact requested paths when given and treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use or PASS/FAIL reporting.
61
62
  - Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
62
63
  <!-- agent-browser-playbook:end shared-guidelines -->
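The guideline above about matching `nextActionIds` to `details.nextActions` can be sketched as a small lookup. This is a minimal sketch, not the wrapper's code: the interface names and `resolveFollowUps` are hypothetical, and only the field names (`id`, `tool`, `reason`, `safety`, `params`, `artifactPath`, `changeType`, `summary`, `nextActionIds`) come from the contract text.

```typescript
// Hypothetical shapes for illustration; field names follow the documented contract.
interface NextActionSketch {
  tool: "agent_browser";
  id: string;
  reason: string;
  safety?: string;
  params?: { args: string[]; stdin?: string; sessionMode?: string };
  artifactPath?: string;
}

interface DetailsSketch {
  pageChangeSummary?: {
    changeType: "navigation" | "mutation" | "artifact" | "confirmation";
    summary: string;
    nextActionIds?: string[];
  };
  nextActions?: NextActionSketch[];
}

// Resolve the ids named by pageChangeSummary to concrete follow-up payloads,
// preserving the order of nextActionIds and skipping ids with no matching entry.
function resolveFollowUps(details: DetailsSketch): NextActionSketch[] {
  const byId = new Map<string, NextActionSketch>();
  for (const action of details.nextActions ?? []) {
    byId.set(action.id, action);
  }
  const resolved: NextActionSketch[] = [];
  for (const id of details.pageChangeSummary?.nextActionIds ?? []) {
    const action = byId.get(id);
    if (action !== undefined) {
      resolved.push(action);
    }
  }
  return resolved;
}
```

An agent would then run the `params` of a resolved entry rather than reconstructing a command from the prose `summary`.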
63
64
 
@@ -96,11 +97,11 @@ Examples:
96
97
  - optional; mutually exclusive with `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup` (omit all of them when using this field)
97
98
  - top-level tool input only: `batch` stdin remains upstream argv arrays; express find steps inside batch as string arrays such as `["find","role","button","click","--name","Export"]`, not nested `semanticAction` objects
98
99
  - thin intent schema compiled by this wrapper into existing upstream `find` commands; behavior and locator semantics stay upstream-owned
99
- - supported actions: `click`, `fill`, `select`, `check`, `uncheck`
100
+ - supported actions: `click`, `fill`, `check`, `uncheck`
100
101
  - supported locators: `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`
101
102
  - `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim
102
- - `fill` and `select` require non-empty `text` (compiled as the trailing value argument to `find`)
103
- - optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` when present)
103
+ - `fill` requires non-empty `text` (compiled as the trailing value argument to `find`)
104
+ - optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` for `fill` when present)
104
105
  - optional `role` is accepted only when `locator` is `role` and must equal `value` if set (redundant with `value`; prefer `value` alone)
105
106
  - optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before `find` so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
106
107
 
@@ -110,12 +111,14 @@ Compilation (then `--json` and session handling apply like any other call):
110
111
  | --- | --- |
111
112
  | `click`, `check`, or `uncheck` + non-`role` locator | `["find", <locator>, <value>, <action>]` |
112
113
  | `click` / `check` / `uncheck` + `role` + optional `name` | `["find","role",<value>,<action>]` plus `["--name",<name>]` when `name` is set |
113
- | `fill` or `select` | `["find",<locator>,<value>,<action>,<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
114
+ | `fill` | `["find",<locator>,<value>,"fill",<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
114
115
  | any supported action + `session` | prepends `["--session",<session>]` before the compiled `find` argv |
115
116
 
116
117
  When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
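The compilation table above can be read as a small pure function. The following is a minimal sketch under stated assumptions, not the wrapper's implementation: `SemanticActionSketch` and `compileSemanticActionSketch` are hypothetical names, and the sketch passes `value` through untrimmed because the contract only says it must be non-empty after trim.

```typescript
// Hypothetical types for illustration; the real schema lives in the wrapper source.
type SemanticActionKind = "click" | "fill" | "check" | "uncheck";
type SemanticLocator = "role" | "text" | "label" | "placeholder" | "alt" | "title" | "testid";

interface SemanticActionSketch {
  action: SemanticActionKind;
  locator: SemanticLocator;
  value: string;
  text?: string;
  name?: string;
  session?: string;
}

function compileSemanticActionSketch(input: SemanticActionSketch): string[] {
  if (input.value.trim() === "") {
    throw new Error("value must be a non-empty string after trim");
  }
  if (input.name !== undefined && input.locator !== "role") {
    throw new Error("name is only valid with locator \"role\"");
  }
  const argv: string[] = [];
  // An explicit session is prepended before the compiled find argv.
  if (input.session !== undefined) {
    argv.push("--session", input.session);
  }
  argv.push("find", input.locator, input.value, input.action);
  // fill carries its text as the trailing value argument.
  if (input.action === "fill") {
    if (input.text === undefined || input.text === "") {
      throw new Error("fill requires non-empty text");
    }
    argv.push(input.text);
  }
  // --name comes after the action, and after text when present.
  if (input.locator === "role" && input.name !== undefined) {
    argv.push("--name", input.name);
  }
  return argv;
}
```

For example, `{ action: "click", locator: "role", value: "button", name: "Export" }` compiles in this sketch to the same argv shape the contract shows for batch rows: `["find","role","button","click","--name","Export"]`.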
117
118
 
118
- If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content includes an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` `try-labeled-textbox-candidate`. `select` misses—including `select` + `placeholder`—do not append `try-*` entries even when a parallel `fill` would; other locator/action pairs omit this block too. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
119
+ If a raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, the wrapper may run one fresh session-scoped `snapshot -i` and add visible `Current snapshot ref fallback`, `details.visibleRefFallback`, and `try-current-visible-ref` / `try-current-visible-ref-N` next actions when that snapshot contains exact role/name matches for the failed target. This direct-ref fallback is bounded to current snapshot refs and exact normalized role/name matches: role locators require `--name`, text-click falls back only to exact-name `button`/`link` refs, label-fill to exact-name `textbox`, and placeholder-fill to exact-name `searchbox`/`textbox`. It never fuzzy-matches names such as prefixes; when several exact refs match, each action carries safety copy telling agents to inspect the snapshot and choose only if unambiguous.
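The bounded matching rules described above can be sketched as follows. This is an illustrative sketch only: the type and function names are hypothetical, the sketch covers just `click` and `fill`, and the normalization shown (trim plus whitespace collapse) is an assumption because the contract does not spell out what "normalized" means.

```typescript
// Hypothetical shapes; the real snapshot ref map and failed-target model may differ.
interface SnapshotRefSketch {
  ref: string; // for example "@e3"
  role: string;
  name: string;
}

interface FailedTargetSketch {
  action: "click" | "fill";
  locator: "role" | "text" | "label" | "placeholder";
  value: string;
  name?: string;
}

// Assumed normalization: trim and collapse internal whitespace. No prefix or fuzzy matching.
const normalizeNameSketch = (raw: string): string => raw.trim().replace(/\s+/g, " ");

// Which roles a failed target may fall back to, per the documented pairs.
function allowedFallbackRoles(target: FailedTargetSketch): string[] {
  switch (target.locator) {
    case "role":
      return target.name !== undefined ? [target.value] : []; // role locators require --name
    case "text":
      return target.action === "click" ? ["button", "link"] : [];
    case "label":
      return target.action === "fill" ? ["textbox"] : [];
    case "placeholder":
      return target.action === "fill" ? ["searchbox", "textbox"] : [];
  }
}

function findExactVisibleRefs(target: FailedTargetSketch, refs: SnapshotRefSketch[]): string[] {
  const roles = allowedFallbackRoles(target);
  const wanted = normalizeNameSketch(target.locator === "role" ? target.name ?? "" : target.value);
  if (roles.length === 0 || wanted === "") {
    return [];
  }
  return refs
    .filter((entry) => roles.includes(entry.role) && normalizeNameSketch(entry.name) === wanted)
    .map((entry) => entry.ref);
}
```

When the returned list has more than one ref, the documented behavior is to attach safety copy rather than pick one, so a caller of this sketch should treat multiple results as ambiguous.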
120
+
121
+ If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content can also include an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` → `try-labeled-textbox-candidate`. Other locator/action pairs omit this block. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
119
122
 
120
123
  If a compiled `semanticAction` fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` (see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; direct stale `@e…` commands still return snapshot/find recovery guidance instead of an unsafe blind retry.
121
124
 
@@ -129,7 +132,6 @@ Examples:
129
132
  { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
130
133
  { "semanticAction": { "action": "check", "locator": "label", "value": "Remember me" } }
131
134
  { "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
132
- { "semanticAction": { "action": "select", "locator": "label", "value": "Country", "text": "United States" } }
133
135
  { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
134
136
  ```
135
137
 
@@ -175,7 +177,7 @@ Compiled shape:
175
177
 
176
178
  Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, flags, stdin forms, or failure policies outside this constrained schema.
177
179
 
178
- Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill` selectors after an `open` or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you.
180
+ Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill` selectors after an `open`, `click`, or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you. Multiple same-snapshot `fill @e…` rows may run before the first click/submit-style step.
179
181
 
180
182
  ### `qa`
181
183
 
@@ -365,19 +367,19 @@ Top-level `details.data` on `batch` is a compact per-step roll-up (not a verbati
365
367
  Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
366
368
 
367
369
  - **URL alignment:** `refSnapshot.target.url` and the session’s current tab URL are compared via `targetsMatch` / `normalizeComparableUrl` in `extensions/agent-browser/index.ts`: values are trimmed, parsed as URLs when possible, compared **after dropping the `#fragment`**, and the query string remains significant. If either side lacks a `url`, `targetsMatch` treats the pair as matching so early-session calls are not blocked.
368
- - **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token is in `REF_INVALIDATING_BATCH_COMMANDS` sets a latch that blocks later steps whose first token is in `REF_GUARDED_COMMANDS` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). The invalidating set includes navigation/mutation verbs such as `open`, `goto`, `reload`, `click`, `fill`, and related upstream commands; the guarded set is the commands that accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview`, and others enumerated next to those literals in source). Changing either set requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
370
+ - **Batch stdin ordering:** user `batch` JSON is scanned in order. Any step whose first token is in `REF_INVALIDATING_BATCH_COMMANDS` sets a latch that blocks later steps whose first token is in `REF_GUARDED_COMMANDS` and that mention `@e…` refs. A step whose first token is `snapshot` clears that latch for subsequent steps (pre-spawn intent only; it does not wait for upstream success). The invalidating set includes navigation/mutation verbs such as `open`, `goto`, `reload`, `click`, and related upstream commands; same-snapshot `fill` rows stay guarded but do not set the latch, allowing ordinary form-fill batches before a click/submit step. The guarded set is the commands that accept page-scoped refs for interaction (`click`, `fill`, `download`, `scrollintoview`, and others enumerated next to those literals in source). Changing either set requires updating this contract, [`docs/SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) `RQ-0072`/`RQ-0087` notes, README and command-reference pitfalls, and `test/agent-browser.extension-validation.test.ts`.
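The latch behavior in the batch-ordering bullet above can be sketched as a single in-order scan. This is a hedged sketch, not the wrapper's code: the two sets below are illustrative subsets of `REF_INVALIDATING_BATCH_COMMANDS` and `REF_GUARDED_COMMANDS` (the full lists live in source), `findBlockedSteps` is a hypothetical name, and the `@e<digits>` token pattern is an assumption about how refs are detected.

```typescript
// Illustrative subsets only; the real sets are enumerated in the wrapper source.
const INVALIDATING_SKETCH = new Set(["open", "goto", "reload", "click"]);
const GUARDED_SKETCH = new Set(["click", "fill", "download", "scrollintoview"]);
const REF_TOKEN = /^@e\d+$/; // assumed ref token shape

// Returns the indexes of batch rows that would be blocked before spawn.
function findBlockedSteps(steps: string[][]): number[] {
  const blocked: number[] = [];
  let latched = false;
  steps.forEach((step, index) => {
    const command = step[0];
    if (command === undefined) {
      return;
    }
    if (command === "snapshot") {
      latched = false; // a snapshot row clears the latch for subsequent rows
      return;
    }
    const mentionsRef = step.slice(1).some((token) => REF_TOKEN.test(token));
    if (latched && GUARDED_SKETCH.has(command) && mentionsRef) {
      blocked.push(index);
    }
    // fill is guarded but not invalidating, so same-snapshot fills do not set the latch.
    if (INVALIDATING_SKETCH.has(command)) {
      latched = true;
    }
  });
  return blocked;
}
```

Under these assumptions, a batch of two `fill @e…` rows followed by one `click @e…` row passes, while a ref-based `fill` after a `click` is blocked unless a `snapshot` row sits between them.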
369
371
 
370
372
  **Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
371
373
 
372
- `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction` (not for `select`); semantic `stale-ref` failures that compiled from `semanticAction` may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
374
+ `nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N`; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
373
375
 
374
376
  **Unknown-command getter hints (failure presentation):** `buildToolPresentation` in `extensions/agent-browser/lib/results/presentation.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
375
377
 
376
378
  For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
377
379
 
378
- `pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`.
380
+ `pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
379
381
 
380
- `overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. 
The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
382
+ `overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills with one read-only `eval` summary (`({ title: document.title, url: location.href })`) only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. 
The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
381
383
 
382
384
  Example shape (fields vary by scenario):
383
385
 
@@ -442,8 +444,9 @@ Additional structured fields can appear when relevant:
442
444
  - `navigationSummary` for navigation-style commands like `click`, `back`, `forward`, and `reload`
443
445
  - `pageChangeSummary` for compact mutation/artifact/navigation summaries on commands that can change browser state
444
446
  - `overlayBlockers` for conservative post-click overlay/banner/dialog blocker candidates when a direct click stays on the same URL and a fresh snapshot provides evidence (`candidates`, `summary`, and `snapshot` per `OverlayBlockerDiagnostic` in `extensions/agent-browser/index.ts`)
447
+ - `visibleRefFallback` after a raw `find` or compiled `semanticAction` fails with `selector-not-found` and a fresh snapshot finds exact role/name `@ref` matches. Shape follows `VisibleRefFallbackDiagnostic` in `extensions/agent-browser/index.ts`: `{ candidates, snapshot, summary, target }`, where each candidate has `ref`, `role`, `name`, direct ref `args`, and `reason`; visible text appends `Current snapshot ref fallback`, and `details.nextActions` gains `try-current-visible-ref` or numbered `try-current-visible-ref-N` actions.
445
448
  - `scrollNoop` after a successful **top-level** `scroll` when wrapper-side read-only probes before and after the command show no change in `window.scrollX` / `window.scrollY` and no change in the sampled prominent scrollable containers. To avoid pre-launching a session without caller startup state, this probe is skipped when the invocation includes startup-scoped flags such as `--profile`, `--state`, `--session-name`, `--cdp`, providers, init scripts, or similar launch settings. Shape: `{ reason: "no-observed-scroll-position-change", message, before, after, recommendations }`; `before` / `after` include viewport dimensions, document scroll dimensions, and up to ten sampled container descriptors plus scroll offsets. Container descriptors use only sample index, tag name, and ARIA role; DOM ids/classes are intentionally not stored. This diagnostic is conservative evidence that the page-level scroll likely missed a nested pane, not proof that every app-specific region is unchanged. Visible text appends `Scroll diagnostic: no observed scroll movement`, and `details.nextActions` gains `inspect-after-noop-scroll` (`snapshot -i`) plus `verify-noop-scroll-visually` (`screenshot`), session-prefixed when applicable.
446
- - `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill|select` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
449
+ - `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
447
450
  - `recordingDependencyWarning` after a successful `record start` or `record restart` when the wrapper cannot find an executable `ffmpeg` on the Pi process `PATH`. Shape: `{ reason: "ffmpeg-missing-for-recording", dependency: "ffmpeg", command, message, recommendations }`. Visible text appends `Recording dependency warning: ffmpeg not found on PATH`. This is a non-blocking preflight warning: upstream may start recording, but `record stop` needs `ffmpeg` to encode the WebM.
448
451
  - `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
449
452
  - `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is an empty object (`{}`). It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
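For orientation, the `visibleRefFallback` field documented in this hunk could look roughly like the sketch below. The types follow the `{ candidates, snapshot, summary, target }` shape named above, with `snapshot` left out because its exact shape is not shown in this section. The page content (`Checkout`, `@e12`) and the helper name `nextActionIds` are hypothetical.

```typescript
// Sketch only: mirrors the documented candidate/target fields; `snapshot` is omitted.
type SemanticAction = "check" | "click" | "fill" | "uncheck";

interface VisibleRefFallbackCandidate {
  action: SemanticAction;
  args: string[];
  name: string;
  reason: string;
  ref: string;
  role: string;
}

interface VisibleRefFallbackSketch {
  candidates: VisibleRefFallbackCandidate[];
  summary: string;
  target: { action: SemanticAction; roles: string[]; targetName: string; text?: string };
}

// Hypothetical payload: a `find text Checkout click` miss where a fresh
// snapshot shows exactly one matching button.
const exampleFallback: VisibleRefFallbackSketch = {
  candidates: [{
    action: "click",
    args: ["click", "@e12"],
    name: "Checkout",
    reason: 'Current snapshot shows button "Checkout" at @e12, matching the failed click locator exactly.',
    ref: "@e12",
    role: "button",
  }],
  summary: 'Current snapshot has one exact visible ref match for click "Checkout".',
  target: { action: "click", roles: ["button", "link"], targetName: "Checkout" },
};

// Hypothetical helper: ids stay unnumbered for a single candidate and become
// numbered when several refs share the same exact role/name.
function nextActionIds(diagnostic: VisibleRefFallbackSketch): string[] {
  const ambiguous = diagnostic.candidates.length > 1;
  return diagnostic.candidates.map((_, index) =>
    ambiguous ? `try-current-visible-ref-${index + 1}` : "try-current-visible-ref");
}
```

A caller would typically prefer the single unnumbered action and fall back to inspecting the snapshot when the ids are numbered.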
@@ -514,8 +517,8 @@ If `agent-browser` is not on `PATH`, fail with a message that:
514
517
  <!-- agent-browser-playbook:start wrapper-tab-recovery -->
515
518
  <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
516
519
  - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
517
- - After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
518
- - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
520
+ - After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.
521
+ - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
519
522
  - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
520
523
  <!-- agent-browser-playbook:end wrapper-tab-recovery -->
521
524
  - on local Unix launches, set a short private socket directory for wrapper-spawned `agent-browser` processes so extension-generated session names do not fail the upstream Unix socket-path length limit in longer cwd/session-name combinations
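The drift-risk gating described in the playbook bullets above can be sketched as follows. This is a simplified illustration, not the wrapper's code: `shouldPin` is a hypothetical helper, the excluded-command list contains only the names visible elsewhere in this diff, and the rule "any recorded reason enables pinning" is an assumption.

```typescript
type PinningReason = "drift" | "restore";

// Only the excluded commands visible in this diff; the real set may differ.
const PINNING_EXCLUDED_COMMANDS = new Set(["close", "goto", "navigate", "open", "session", "tab"]);

// Hypothetical helper: pin the target tab only for sessions with observed drift risk.
function shouldPin(
  reasons: Map<string, PinningReason>,
  sessionName: string | undefined,
  command: string | undefined,
): boolean {
  if (sessionName === undefined || command === undefined) return false;
  // Assumption: a recorded "drift" or "restore" reason is what marks a session as risky.
  const pinningRequired = reasons.has(sessionName);
  return pinningRequired && !PINNING_EXCLUDED_COMMANDS.has(command);
}
```

Under this sketch a routine session with no recorded reason never triggers the extra tab-list probe, which is the cost the change aims to remove.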
@@ -84,7 +84,7 @@ const DEFAULT_SESSION_MODE = "auto" as const;
84
84
  const DIRECT_AGENT_BROWSER_BASH_BYPASS_ENV = "PI_AGENT_BROWSER_ALLOW_DIRECT_BASH";
85
85
  const PACKAGE_NAME = "pi-agent-browser-native";
86
86
 
87
- const AGENT_BROWSER_SEMANTIC_ACTIONS = ["check", "click", "fill", "select", "uncheck"] as const;
87
+ const AGENT_BROWSER_SEMANTIC_ACTIONS = ["check", "click", "fill", "uncheck"] as const;
88
88
  const AGENT_BROWSER_SEMANTIC_LOCATORS = ["alt", "label", "placeholder", "role", "testid", "text", "title"] as const;
89
89
  const AGENT_BROWSER_JOB_STEP_ACTIONS = ["open", "click", "fill", "wait", "assertText", "assertUrl", "waitForDownload", "screenshot"] as const;
90
90
  const AGENT_BROWSER_QA_LOAD_STATES = ["domcontentloaded", "load", "networkidle"] as const;
@@ -271,7 +271,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
271
271
  description: "Upstream find locator family to use.",
272
272
  }),
273
273
  value: Type.String({ description: "Locator value, such as visible text, label text, placeholder text, test id, title, alt text, or role." }),
274
- text: Type.Optional(Type.String({ description: "Text/value argument for fill or select actions." })),
274
+ text: Type.Optional(Type.String({ description: "Text/value argument for fill actions." })),
275
275
  role: Type.Optional(Type.String({ description: "Role locator value; when set it must match value for locator=role." })),
276
276
  name: Type.Optional(Type.String({ description: "Accessible name filter for locator=role; compiles to --name <name>." })),
277
277
  session: Type.Optional(Type.String({ description: "Optional upstream session name; prepends --session <name> before the compiled find command." })),
@@ -945,7 +945,7 @@ async function analyzeNetworkSourceLookupResults(data: unknown, compiled: Compil
945
945
  }
946
946
 
947
947
  function appendSemanticActionTextArg(args: string[], action: string, text: string | undefined): void {
948
- if ((action === "fill" || action === "select") && text) {
948
+ if (action === "fill" && text) {
949
949
  args.push(text);
950
950
  }
951
951
  }
@@ -955,7 +955,7 @@ function getCompiledSemanticActionCommandIndex(compiled: CompiledAgentBrowserSem
955
955
  }
956
956
 
957
957
  function getCompiledSemanticActionTextArg(compiled: CompiledAgentBrowserSemanticAction): string | undefined {
958
- if (compiled.action !== "fill" && compiled.action !== "select") return undefined;
958
+ if (compiled.action !== "fill") return undefined;
959
959
  const commandIndex = getCompiledSemanticActionCommandIndex(compiled);
960
960
  if (commandIndex < 0) return undefined;
961
961
  const markerIndex = compiled.args.indexOf("--name");
@@ -1023,6 +1023,121 @@ function buildSemanticActionCandidateActions(compiled: CompiledAgentBrowserSeman
1023
1023
  return [];
1024
1024
  }
1025
1025
 
1026
+ function isAgentBrowserSemanticActionName(value: string | undefined): value is AgentBrowserSemanticActionName {
1027
+ return typeof value === "string" && AGENT_BROWSER_SEMANTIC_ACTIONS.includes(value as AgentBrowserSemanticActionName);
1028
+ }
1029
+
1030
+ function getFindNameFlagValue(args: string[], startIndex: number): string | undefined {
1031
+ const nameFlagIndex = args.indexOf("--name", startIndex);
1032
+ const name = nameFlagIndex >= 0 ? args[nameFlagIndex + 1] : undefined;
1033
+ return name && !name.startsWith("-") ? name : undefined;
1034
+ }
1035
+
1036
+ function getFindVisibleRefFallbackTarget(args: string[]): VisibleRefFallbackTarget | undefined {
1037
+ const findIndex = args[0] === "--session" ? 2 : args.indexOf("find");
1038
+ if (findIndex < 0) return undefined;
1039
+ const locator = args[findIndex + 1];
1040
+ const value = args[findIndex + 2];
1041
+ const action = args[findIndex + 3];
1042
+ if (!locator || !value || !isAgentBrowserSemanticActionName(action)) return undefined;
1043
+ const text = action === "fill" ? args[findIndex + 4] : undefined;
1044
+ if (action === "fill" && (!text || text.startsWith("-"))) return undefined;
1045
+ if (locator === "role") {
1046
+ const targetName = getFindNameFlagValue(args, findIndex + 4);
1047
+ return targetName ? { action, roles: [value], targetName, text } : undefined;
1048
+ }
1049
+ if (locator === "text" && action === "click") {
1050
+ return { action, roles: ["button", "link"], targetName: value };
1051
+ }
1052
+ if (locator === "label" && action === "fill") {
1053
+ return { action, roles: ["textbox"], targetName: value, text };
1054
+ }
1055
+ if (locator === "placeholder" && action === "fill") {
1056
+ return { action, roles: ["searchbox", "textbox"], targetName: value, text };
1057
+ }
1058
+ return undefined;
1059
+ }
1060
+
1061
+ function getVisibleRefFallbackTarget(options: {
1062
+ commandTokens: string[];
1063
+ compiledSemanticAction?: CompiledAgentBrowserSemanticAction;
1064
+ }): VisibleRefFallbackTarget | undefined {
1065
+ return getFindVisibleRefFallbackTarget(options.commandTokens) ?? (options.compiledSemanticAction ? getFindVisibleRefFallbackTarget(options.compiledSemanticAction.args) : undefined);
1066
+ }
1067
+
1068
+ const VISIBLE_REF_FALLBACK_CANDIDATE_LIMIT = 3;
1069
+
1070
+ function getVisibleRefFallbackCandidates(target: VisibleRefFallbackTarget, snapshotData: unknown): VisibleRefFallbackCandidate[] {
1071
+ const refs = getSnapshotRefRecord(snapshotData);
1072
+ if (!refs) return [];
1073
+ const roleOrder = target.roles.map((role) => role.toLowerCase());
1074
+ const targetName = normalizeSemanticActionAccessibleName(target.targetName);
1075
+ const candidates = Object.entries(refs).flatMap(([ref, entry]): VisibleRefFallbackCandidate[] => {
1076
+ if (!/^e\d+$/.test(ref) || !isRecord(entry)) return [];
1077
+ const role = typeof entry.role === "string" ? entry.role : undefined;
1078
+ const name = typeof entry.name === "string" ? entry.name : undefined;
1079
+ if (!role || !name || !roleOrder.includes(role.toLowerCase()) || normalizeSemanticActionAccessibleName(name) !== targetName) return [];
1080
+ const args = [target.action, `@${ref}`];
1081
+ appendSemanticActionTextArg(args, target.action, target.text);
1082
+ return [{
1083
+ action: target.action,
1084
+ args,
1085
+ name,
1086
+ reason: `Current snapshot shows ${role} ${JSON.stringify(name)} at @${ref}, matching the failed ${target.action} locator exactly.`,
1087
+ ref: `@${ref}`,
1088
+ role,
1089
+ }];
1090
+ });
1091
+ candidates.sort((left, right) => roleOrder.indexOf(left.role.toLowerCase()) - roleOrder.indexOf(right.role.toLowerCase()) || compareRefIds(left.ref.slice(1), right.ref.slice(1)));
1092
+ return candidates.slice(0, VISIBLE_REF_FALLBACK_CANDIDATE_LIMIT);
1093
+ }
1094
+
1095
+ async function collectVisibleRefFallbackDiagnostic(options: {
1096
+ commandTokens: string[];
1097
+ compiledSemanticAction?: CompiledAgentBrowserSemanticAction;
1098
+ cwd: string;
1099
+ sessionName?: string;
1100
+ signal?: AbortSignal;
1101
+ }): Promise<VisibleRefFallbackDiagnostic | undefined> {
1102
+ if (!options.sessionName) return undefined;
1103
+ const target = getVisibleRefFallbackTarget({ commandTokens: options.commandTokens, compiledSemanticAction: options.compiledSemanticAction });
1104
+ if (!target) return undefined;
1105
+ const snapshotData = await runSessionCommandData({ args: ["snapshot", "-i"], cwd: options.cwd, sessionName: options.sessionName, signal: options.signal });
1106
+ const snapshot = extractRefSnapshotFromData(snapshotData);
1107
+ if (!snapshot) return undefined;
1108
+ const candidates = getVisibleRefFallbackCandidates(target, snapshotData);
1109
+ if (candidates.length === 0) return undefined;
1110
+ return {
1111
+ candidates,
1112
+ snapshot,
1113
+ summary: candidates.length === 1
1114
+ ? `Current snapshot has one exact visible ref match for ${target.action} ${JSON.stringify(target.targetName)}.`
1115
+ : `Current snapshot has ${candidates.length} exact visible ref matches for ${target.action} ${JSON.stringify(target.targetName)}; choose only if the intended control is unambiguous.`,
1116
+ target,
1117
+ };
1118
+ }
1119
+
1120
+ function buildVisibleRefFallbackNextActions(options: { diagnostic: VisibleRefFallbackDiagnostic; sessionName?: string }): AgentBrowserNextAction[] {
1121
+ const ambiguous = options.diagnostic.candidates.length > 1;
1122
+ return options.diagnostic.candidates.map((candidate, index) => ({
1123
+ id: ambiguous ? `try-current-visible-ref-${index + 1}` : "try-current-visible-ref",
1124
+ params: { args: sessionPrefixArgs(options.sessionName, candidate.args) },
1125
+ reason: candidate.reason,
1126
+ safety: ambiguous
1127
+ ? "Several current refs share the same exact role/name. Inspect the snapshot and use only the ref that clearly matches the intended target."
1128
+ : "Use only while this current snapshot still represents the page; refresh refs first if the page changed.",
1129
+ tool: "agent_browser" as const,
1130
+ }));
1131
+ }
1132
+
1133
+ function formatVisibleRefFallbackText(diagnostic: VisibleRefFallbackDiagnostic | undefined): string | undefined {
1134
+ if (!diagnostic) return undefined;
1135
+ return [
1136
+ "Current snapshot ref fallback:",
1137
+ ...diagnostic.candidates.map((candidate) => `- ${candidate.ref}${candidate.role ? ` ${candidate.role}` : ""} ${JSON.stringify(candidate.name)}: ${candidate.reason}`),
1138
+ ].join("\n");
1139
+ }
1140
+
1026
1141
  function normalizeSemanticActionAccessibleName(name: string): string {
1027
1142
  return name.replace(/\s+/g, " ").trim().toLowerCase();
1028
1143
  }
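The matching rule added in this hunk (exact normalized name, role drawn from an ordered list, at most three candidates) can be exercised in isolation with a simplified sketch. `matchVisibleRefs` and `refNumber` are hypothetical names, and the numeric ref ordering stands in for `compareRefIds`, whose implementation is not shown here.

```typescript
interface RefEntry { role?: string; name?: string }
interface Candidate { ref: string; role: string; name: string; args: string[] }

const CANDIDATE_LIMIT = 3;

// Same normalization as normalizeSemanticActionAccessibleName in the diff.
function normalizeName(name: string): string {
  return name.replace(/\s+/g, " ").trim().toLowerCase();
}

// Assumption: refs order by their numeric suffix (e2 before e10).
function refNumber(ref: string): number {
  return Number(ref.replace(/^@?e/, ""));
}

function matchVisibleRefs(
  refs: Record<string, RefEntry>,
  target: { action: string; roles: string[]; targetName: string; text?: string },
): Candidate[] {
  const roleOrder = target.roles.map((role) => role.toLowerCase());
  const wanted = normalizeName(target.targetName);
  const candidates: Candidate[] = [];
  for (const [ref, entry] of Object.entries(refs)) {
    if (!/^e\d+$/.test(ref)) continue; // skip keys that are not eN refs
    const { role, name } = entry;
    if (!role || !name) continue;
    if (!roleOrder.includes(role.toLowerCase()) || normalizeName(name) !== wanted) continue;
    const args = [target.action, `@${ref}`];
    if (target.action === "fill" && target.text) args.push(target.text);
    candidates.push({ ref: `@${ref}`, role, name, args });
  }
  // Preferred roles first, then lower ref numbers.
  candidates.sort((a, b) =>
    roleOrder.indexOf(a.role.toLowerCase()) - roleOrder.indexOf(b.role.toLowerCase())
    || refNumber(a.ref) - refNumber(b.ref));
  return candidates.slice(0, CANDIDATE_LIMIT);
}
```

Because the comparison is exact after normalization, a heading or a partially matching label never becomes a retry candidate.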
@@ -1101,11 +1216,11 @@ function compileAgentBrowserSemanticAction(input: unknown): { compiled?: Compile
1101
1216
  if (text !== undefined && typeof text !== "string") {
1102
1217
  return { error: "semanticAction.text must be a string when provided." };
1103
1218
  }
1104
- if ((action === "fill" || action === "select") && (typeof text !== "string" || text.length === 0)) {
1219
+ if (action === "fill" && (typeof text !== "string" || text.length === 0)) {
1105
1220
  return { error: `semanticAction.text is required for ${action}.` };
1106
1221
  }
1107
- if (action !== "fill" && action !== "select" && text !== undefined) {
1108
- return { error: `semanticAction.text is only supported for fill and select actions.` };
1222
+ if (action !== "fill" && text !== undefined) {
1223
+ return { error: "semanticAction.text is only supported for fill actions." };
1109
1224
  }
1110
1225
  if (role !== undefined && (locator !== "role" || role !== value)) {
1111
1226
  return { error: "semanticAction.role is only supported for locator=role and must match value." };
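After this change only `fill` carries a text argument. A reduced sketch of the validation and argv assembly might look like the block below; `compileSketch` is a hypothetical stand-in for `compileAgentBrowserSemanticAction`, it skips the other checks, and the `--name` handling is assumed from the schema description ("compiles to --name <name>").

```typescript
type SemanticAction = "check" | "click" | "fill" | "uncheck";

interface SemanticActionInput {
  action: SemanticAction;
  locator: string;
  value: string;
  text?: string;
  name?: string;
  session?: string;
}

function compileSketch(input: SemanticActionInput): { args?: string[]; error?: string } {
  const { action, locator, value, text, name, session } = input;
  if (action === "fill" && (typeof text !== "string" || text.length === 0)) {
    return { error: `semanticAction.text is required for ${action}.` };
  }
  if (action !== "fill" && text !== undefined) {
    return { error: "semanticAction.text is only supported for fill actions." };
  }
  const args = typeof session === "string"
    ? ["--session", session, "find", locator, value, action]
    : ["find", locator, value, action];
  if (action === "fill") args.push(text as string);
  // Assumption: the accessible-name filter is appended after the action and text.
  if (locator === "role" && typeof name === "string") args.push("--name", name);
  return { args };
}
```

For example, a `fill` on a labeled field compiles to `find label Email fill <text>`, while passing `text` with `click` is rejected before any command runs.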
@@ -1117,7 +1232,7 @@ function compileAgentBrowserSemanticAction(input: unknown): { compiled?: Compile
1117
1232
  return { error: "semanticAction.session must be a non-empty string when provided." };
1118
1233
  }
1119
1234
  const args = typeof session === "string" ? ["--session", session, "find", locator, value, action] : ["find", locator, value, action];
1120
- if (action === "fill" || action === "select") {
1235
+ if (action === "fill") {
1121
1236
  args.push(text as string);
1122
1237
  }
1123
1238
  if (locator === "role" && typeof name === "string") {
@@ -1498,6 +1613,7 @@ async function isDirectAgentBrowserBashAllowed(cwd: string): Promise<boolean> {
1498
1613
  }
1499
1614
 
1500
1615
  const NAVIGATION_SUMMARY_COMMANDS = new Set(["back", "click", "dblclick", "forward", "reload"]);
1616
+ const NAVIGATION_SUMMARY_EVAL = `({ title: document.title, url: location.href })`;
1501
1617
 
1502
1618
  interface NavigationSummary {
1503
1619
  title?: string;
@@ -1518,6 +1634,34 @@ interface OverlayBlockerDiagnostic {
1518
1634
  summary: string;
1519
1635
  }
1520
1636
 
1637
+ interface VisibleRefFallbackCandidate {
1638
+ action: AgentBrowserSemanticActionName;
1639
+ args: string[];
1640
+ name: string;
1641
+ reason: string;
1642
+ ref: string;
1643
+ role: string;
1644
+ }
1645
+
1646
+ interface VisibleRefFallbackDiagnostic {
1647
+ candidates: VisibleRefFallbackCandidate[];
1648
+ snapshot: SessionRefSnapshot;
1649
+ summary: string;
1650
+ target: {
1651
+ action: AgentBrowserSemanticActionName;
1652
+ roles: string[];
1653
+ text?: string;
1654
+ targetName: string;
1655
+ };
1656
+ }
1657
+
1658
+ interface VisibleRefFallbackTarget {
1659
+ action: AgentBrowserSemanticActionName;
1660
+ roles: string[];
1661
+ text?: string;
1662
+ targetName: string;
1663
+ }
1664
+
1521
1665
  interface SelectorTextVisibilityDiagnostic {
1522
1666
  firstMatchVisible?: boolean;
1523
1667
  firstVisibleTextPreview?: string;
@@ -1985,6 +2129,13 @@ function extractStringResultField(data: unknown, fieldName: "result" | "title" |
1985
2129
  return text.length > 0 ? text : undefined;
1986
2130
  }
1987
2131
 
2132
+ function extractNavigationSummaryFromData(data: unknown): NavigationSummary | undefined {
2133
+ const result = isRecord(data) && isRecord(data.result) ? data.result : data;
2134
+ const title = extractStringResultField(result, "title");
2135
+ const url = extractStringResultField(result, "url");
2136
+ return title || url ? { title, url } : undefined;
2137
+ }
2138
+
1988
2139
  const SESSION_TAB_PINNING_EXCLUDED_COMMANDS = new Set(["close", "goto", "navigate", "open", "session", "tab"]);
1989
2140
  const SESSION_TAB_POST_COMMAND_CORRECTION_EXCLUDED_COMMANDS = new Set(["batch", "close", "session", "tab"]);
1990
2141
 
@@ -2139,7 +2290,6 @@ function extractSessionTabTargetFromBatchResults(data: unknown): SessionTabTarge
2139
2290
  pendingTitle = undefined;
2140
2291
  continue;
2141
2292
  }
2142
-
2143
2293
  const resultTarget = extractSessionTabTargetFromData(result);
2144
2294
  if (resultTarget) {
2145
2295
  currentTarget = resultTarget;
@@ -2334,10 +2484,12 @@ function supportsPinnedStdinCommand(options: { command?: string; commandTokens:
2334
2484
  function shouldPinSessionTabForCommand(options: {
2335
2485
  command?: string;
2336
2486
  commandTokens: string[];
2487
+ pinningRequired?: boolean;
2337
2488
  sessionName?: string;
2338
2489
  stdin?: string;
2339
2490
  }): boolean {
2340
2491
  return (
2492
+ options.pinningRequired === true &&
2341
2493
  options.sessionName !== undefined &&
2342
2494
  options.command !== undefined &&
2343
2495
  !SESSION_TAB_PINNING_EXCLUDED_COMMANDS.has(options.command) &&
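The `REF_INVALIDATING_BATCH_COMMANDS` hunk that follows drops `fill` from the set, so several fills against one snapshot can share a batch. A rough way to picture the effect is to find the first step that would make earlier `@ref`s stale. The helper name is hypothetical and the set below lists only the members visible in that hunk.

```typescript
// Partial list: only the members visible in the hunk; the real set contains more commands.
const REF_INVALIDATING = new Set(["click", "dblclick", "drag", "forward", "goto", "keyboard"]);

// Returns the index of the first batch step after which earlier @refs should be
// treated as stale, or -1 when no step in the batch invalidates refs.
function firstRefInvalidatingStep(steps: string[][]): number {
  return steps.findIndex((step) => {
    const command = step[0];
    return command !== undefined && REF_INVALIDATING.has(command);
  });
}
```

With `fill` out of the set, a form batch such as fill, fill, click keeps its refs valid up to the final click row.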
@@ -2403,7 +2555,6 @@ const REF_INVALIDATING_BATCH_COMMANDS = new Set([
2403
2555
  "click",
2404
2556
  "dblclick",
2405
2557
  "drag",
2406
- "fill",
2407
2558
  "forward",
2408
2559
  "goto",
2409
2560
  "keyboard",
@@ -2567,7 +2718,7 @@ function buildPinnedBatchPlan(options: {
2567
2718
  const includeNavigationSummary = options.command !== undefined && NAVIGATION_SUMMARY_COMMANDS.has(options.command);
2568
2719
  const tabSelectionStep: BatchCommandStep = ["tab", options.selectedTab];
2569
2720
  const commandStep = options.commandTokens as BatchCommandStep;
2570
- const navigationSummarySteps: BatchCommandStep[] = includeNavigationSummary ? [["get", "title"], ["get", "url"]] : [];
2721
+ const navigationSummarySteps: BatchCommandStep[] = includeNavigationSummary ? [["eval", NAVIGATION_SUMMARY_EVAL]] : [];
2571
2722
  return {
2572
2723
  includeNavigationSummary,
2573
2724
  steps: [tabSelectionStep, commandStep, ...navigationSummarySteps],
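As a compact illustration of the plan this hunk builds, the sketch below produces the step list for a pinned command, with one `eval` step replacing the former `get title` / `get url` pair. `planPinnedBatch` is a hypothetical reduction of `buildPinnedBatchPlan`, and treating the first token as the command name is an assumption.

```typescript
const NAVIGATION_SUMMARY_COMMANDS = new Set(["back", "click", "dblclick", "forward", "reload"]);
const NAVIGATION_SUMMARY_EVAL = "({ title: document.title, url: location.href })";

// Builds [tab selection, command, optional navigation summary eval].
function planPinnedBatch(selectedTab: string, commandTokens: string[]): string[][] {
  const command = commandTokens[0]; // assumption: first token is the command name
  const includeSummary = command !== undefined && NAVIGATION_SUMMARY_COMMANDS.has(command);
  const summarySteps: string[][] = includeSummary ? [["eval", NAVIGATION_SUMMARY_EVAL]] : [];
  return [["tab", selectedTab], commandTokens, ...summarySteps];
}
```

A pinned `click` therefore costs three batch rows instead of four, and the summary always sits at index 2 when present.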
@@ -2575,8 +2726,9 @@ function buildPinnedBatchPlan(options: {
2575
2726
  };
2576
2727
  }
2577
2728
 
2578
- function shouldCorrectSessionTabAfterCommand(options: { command?: string; sessionName?: string }): boolean {
2729
+ function shouldCorrectSessionTabAfterCommand(options: { command?: string; pinningRequired?: boolean; sessionName?: string }): boolean {
2579
2730
  return (
2731
+ options.pinningRequired === true &&
2580
2732
  options.sessionName !== undefined &&
2581
2733
  options.command !== undefined &&
2582
2734
  !SESSION_TAB_POST_COMMAND_CORRECTION_EXCLUDED_COMMANDS.has(options.command)
@@ -2655,12 +2807,8 @@ function unwrapPinnedSessionBatchEnvelope(options: {
2655
2807
  };
2656
2808
  }
2657
2809
 
2658
- const titleStep = options.includeNavigationSummary ? steps[2] : undefined;
2659
- const urlStep = options.includeNavigationSummary ? steps[3] : undefined;
2660
- const navigationSummary = normalizeSessionTabTarget({
2661
- title: extractStringResultField(titleStep?.result, "title"),
2662
- url: extractStringResultField(urlStep?.result, "url"),
2663
- });
2810
+ const navigationSummaryStep = options.includeNavigationSummary ? steps[2] : undefined;
2811
+ const navigationSummary = normalizeSessionTabTarget(extractNavigationSummaryFromData(navigationSummaryStep?.result));
2664
2812
  return {
2665
2813
  envelope: {
2666
2814
  success: commandStep.success !== false,
@@ -2711,17 +2859,13 @@ async function collectNavigationSummary(options: {
2711
2859
  sessionName?: string;
2712
2860
  signal?: AbortSignal;
2713
2861
  }): Promise<NavigationSummary | undefined> {
2714
- const { cwd, sessionName, signal } = options;
2715
- const title = extractStringResultField(
2716
- await runSessionCommandData({ args: ["get", "title"], cwd, sessionName, signal }),
2717
- "title",
2718
- );
2719
- const url = extractStringResultField(
2720
- await runSessionCommandData({ args: ["get", "url"], cwd, sessionName, signal }),
2721
- "url",
2722
- );
2723
- if (!title && !url) return undefined;
2724
- return { title, url };
2862
+ return extractNavigationSummaryFromData(await runSessionCommandData({
2863
+ args: ["eval", "--stdin"],
2864
+ cwd: options.cwd,
2865
+ sessionName: options.sessionName,
2866
+ signal: options.signal,
2867
+ stdin: NAVIGATION_SUMMARY_EVAL,
2868
+ }));
2725
2869
  }
2726
2870
 
2727
2871
  function extractScrollPositionSnapshot(data: unknown): ScrollPositionSnapshot | undefined {
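The single-eval summary above returns title and URL together, so one extraction step replaces two. The sketch below shows how such a payload could be unwrapped. `readString` is a hypothetical stand-in for `extractStringResultField` (its full behavior is not shown in this diff), and the payload shapes in the usage note are assumptions about what upstream returns.

```typescript
interface NavigationSummary { title?: string; url?: string }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical stand-in: returns a non-empty string field, otherwise undefined.
function readString(data: unknown, field: "title" | "url"): string | undefined {
  if (!isRecord(data)) return undefined;
  const value = data[field];
  return typeof value === "string" && value.length > 0 ? value : undefined;
}

// Accepts either `{ result: { title, url } }` or a bare `{ title, url }` object.
function summaryFromEval(data: unknown): NavigationSummary | undefined {
  const nested = isRecord(data) ? data["result"] : undefined;
  const result = isRecord(nested) ? nested : data;
  const title = readString(result, "title");
  const url = readString(result, "url");
  return title || url ? { title, url } : undefined;
}
```

Given `{ result: { title: "Cart", url: "https://example.test/cart" } }` the sketch yields both fields, and it yields `undefined` when neither is a non-empty string.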
@@ -2914,7 +3058,7 @@ function isComboboxFocusDiagnosticCommand(command: string | undefined, commandTo
2914
3058
  const explicitlyTargetsCombobox = commandTokens.some((token) => /^(?:combobox|listbox)$/i.test(token));
2915
3059
  if (!explicitlyTargetsCombobox) return false;
2916
3060
  if (command === "click" || command === "fill") return true;
2917
- return command === "find" && commandTokens.some((token) => ["click", "fill", "select"].includes(token));
3061
+ return command === "find" && commandTokens.some((token) => ["click", "fill"].includes(token));
2918
3062
  }
2919
3063
 
2920
3064
  function getCompiledSemanticActionRoleValue(compiled: CompiledAgentBrowserSemanticAction): string | undefined {
@@ -2925,7 +3069,7 @@ function getCompiledSemanticActionRoleValue(compiled: CompiledAgentBrowserSemant
2925
3069
  }
2926
3070
 
2927
3071
  function isComboboxFocusDiagnosticSemanticAction(compiled: CompiledAgentBrowserSemanticAction | undefined): boolean {
2928
- if (!compiled || !["click", "fill", "select"].includes(compiled.action)) return false;
3072
+ if (!compiled || !["click", "fill"].includes(compiled.action)) return false;
2929
3073
  return /^(?:combobox|listbox)$/i.test(getCompiledSemanticActionRoleValue(compiled) ?? "");
2930
3074
  }
2931
3075
 
@@ -3733,6 +3877,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
3733
3877
  let freshSessionOrdinal = 0;
3734
3878
  let sessionTabTargets = new Map<string, OrderedSessionTabTarget>();
3735
3879
  let sessionRefSnapshots = new Map<string, OrderedSessionRefSnapshot>();
3880
+ let sessionTabPinningReasons = new Map<string, "drift" | "restore">();
3736
3881
  let sessionTabTargetUpdateOrder = 0;
3737
3882
  let traceOwners = new Map<string, TraceOwner>();
3738
3883
  let artifactManifest: SessionArtifactManifest | undefined;
@@ -3747,6 +3892,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  freshSessionOrdinal = restoredState.freshSessionOrdinal;
  sessionTabTargets = restoreSessionTabTargetsFromBranch(ctx.sessionManager.getBranch());
  sessionRefSnapshots = restoreSessionRefSnapshotsFromBranch(ctx.sessionManager.getBranch());
+ sessionTabPinningReasons = new Map([...sessionTabTargets.keys()].map((sessionName) => [sessionName, "restore"]));
  sessionTabTargetUpdateOrder = Math.max(getLatestSessionTabTargetOrder(sessionTabTargets), getLatestSessionTabTargetOrder(sessionRefSnapshots));
  artifactManifest = restoreArtifactManifestFromBranch(ctx.sessionManager.getBranch());
  });
@@ -3765,6 +3911,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  managedSessionActive = false;
  sessionTabTargets = new Map<string, OrderedSessionTabTarget>();
  sessionRefSnapshots = new Map<string, OrderedSessionRefSnapshot>();
+ sessionTabPinningReasons = new Map<string, "drift" | "restore">();
  sessionTabTargetUpdateOrder = 0;
  traceOwners = new Map<string, TraceOwner>();
  artifactManifest = undefined;
@@ -4013,6 +4160,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
 
  const priorSessionTabTargetState = executionPlan.sessionName ? sessionTabTargets.get(executionPlan.sessionName) : undefined;
  const priorSessionTabTarget = priorSessionTabTargetState?.target;
+ const sessionTabPinningReason = executionPlan.sessionName ? sessionTabPinningReasons.get(executionPlan.sessionName) : undefined;
  const priorRefSnapshotState = executionPlan.sessionName ? sessionRefSnapshots.get(executionPlan.sessionName) : undefined;
  const resolvedSemanticActionRefSnapshot = semanticActionVisibleRefResolution?.snapshot
  ? { ...semanticActionVisibleRefResolution.snapshot, target: semanticActionVisibleRefResolution.snapshot.target ?? priorSessionTabTarget }
@@ -4051,6 +4199,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  shouldPinSessionTabForCommand({
  command: executionPlan.commandInfo.command,
  commandTokens,
+ pinningRequired: sessionTabPinningReason !== undefined,
  sessionName: executionPlan.sessionName,
  stdin: toolStdin,
  })
@@ -4336,6 +4485,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  observedSessionTabTarget &&
  shouldCorrectSessionTabAfterCommand({
  command: executionPlan.commandInfo.command,
+ pinningRequired: sessionTabPinningReason !== undefined,
  sessionName: executionPlan.sessionName,
  })
  ) {
@@ -4418,9 +4568,14 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  if (executionPlan.commandInfo.command === "close" && succeeded) {
  sessionTabTargets.delete(executionPlan.sessionName);
  sessionRefSnapshots.delete(executionPlan.sessionName);
+ sessionTabPinningReasons.delete(executionPlan.sessionName);
  } else if (currentSessionTabTarget) {
  sessionTabTargets.set(executionPlan.sessionName, { order: tabTargetUpdateOrder, target: currentSessionTabTarget });
  }
+ } else if (succeeded && currentSessionTabTarget) {
+ // A stale overlapping command may have moved browser focus even though its older target
+ // must not replace the newer logical target. Require tab pinning on the next call.
+ sessionTabPinningReasons.set(executionPlan.sessionName, "drift");
  }
  const refSnapshot = succeeded
  ? executionPlan.commandInfo.command === "snapshot"
@@ -4464,9 +4619,18 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  if (executionPlan.managedSessionName && succeeded) {
  managedSessionCwd = ctx.cwd;
  }
+ if (executionPlan.sessionName && succeeded) {
+ if (openResultTabCorrection || sessionTabCorrection || aboutBlankSessionMismatch?.recoveryApplied) {
+ sessionTabPinningReasons.set(executionPlan.sessionName, "drift");
+ } else if (sessionTabPinningReason === "restore") {
+ sessionTabPinningReasons.delete(executionPlan.sessionName);
+ }
+ }
+
  if (replacedManagedSessionName) {
  sessionTabTargets.delete(replacedManagedSessionName);
  sessionRefSnapshots.delete(replacedManagedSessionName);
+ sessionTabPinningReasons.delete(replacedManagedSessionName);
  await closeManagedSession({
  cwd: priorManagedSessionCwd,
  sessionName: replacedManagedSessionName,
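Taken together, the `sessionTabPinningReasons` hunks above describe a small per-session state machine: restored sessions start as `"restore"`, an observed correction marks the session `"drift"`, and a clean successful command clears a pending `"restore"`. The following is a simplified sketch of that lifecycle, not the wrapper's code: the helper names and the `outcome` shape are hypothetical, and the real handler updates the map inline and compares against the reason captured at the start of the call.

```typescript
type PinningReason = "drift" | "restore";

// Hypothetical outcome summary; the real handler reads several separate locals.
interface CommandOutcome {
  succeeded: boolean;
  closed: boolean;        // a successful `close` command
  driftObserved: boolean; // tab correction, about:blank recovery, or a stale overlapping update
}

// Seed every restored session with "restore" so its first command is pinned.
function seedRestoredSessions(sessionNames: string[]): Map<string, PinningReason> {
  const reasons = new Map<string, PinningReason>();
  sessionNames.forEach((name) => reasons.set(name, "restore"));
  return reasons;
}

// Pinning and post-command correction probes run only while a reason is recorded.
function pinningRequired(reasons: Map<string, PinningReason>, sessionName: string): boolean {
  return reasons.get(sessionName) !== undefined;
}

function updatePinningReason(reasons: Map<string, PinningReason>, sessionName: string, outcome: CommandOutcome): void {
  if (!outcome.succeeded) return;
  if (outcome.closed) {
    reasons.delete(sessionName);
  } else if (outcome.driftObserved) {
    reasons.set(sessionName, "drift");
  } else if (reasons.get(sessionName) === "restore") {
    // One clean command after a restore is enough to stop pinning.
    reasons.delete(sessionName);
  }
}
```

In this sketch a `"drift"` reason is sticky until the session closes, which matches the visible hunks: only the `"restore"` reason is deleted after a clean command.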
@@ -4622,10 +4786,28 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  timedOut: processResult.timedOut,
  validationError: undefined,
  });
+ let visibleRefFallbackDiagnostic: VisibleRefFallbackDiagnostic | undefined;
+ const visibleRefFallbackSessionName = executionPlan.sessionName ?? extractExplicitSessionName(toolArgs);
+ if (categoryDetails.failureCategory === "selector-not-found") {
+ visibleRefFallbackDiagnostic = await collectVisibleRefFallbackDiagnostic({
+ commandTokens,
+ compiledSemanticAction,
+ cwd: ctx.cwd,
+ sessionName: visibleRefFallbackSessionName,
+ signal,
+ });
+ if (visibleRefFallbackDiagnostic && visibleRefFallbackSessionName && shouldApplySessionTabTargetUpdate({ current: sessionRefSnapshots.get(visibleRefFallbackSessionName), updateOrder: tabTargetUpdateOrder })) {
+ currentRefSnapshot = { ...visibleRefFallbackDiagnostic.snapshot, target: visibleRefFallbackDiagnostic.snapshot.target ?? currentSessionTabTarget };
+ sessionRefSnapshots.set(visibleRefFallbackSessionName, { ...currentRefSnapshot, order: tabTargetUpdateOrder });
+ }
+ }
  let nextActions = presentation.nextActions ? [...presentation.nextActions] : undefined;
  if (categoryDetails.failureCategory === "stale-ref") {
  nextActions = sessionAwareStaleRefNextActions(executionPlan.sessionName);
  }
+ if (visibleRefFallbackDiagnostic) {
+ (nextActions ??= []).push(...buildVisibleRefFallbackNextActions({ diagnostic: visibleRefFallbackDiagnostic, sessionName: visibleRefFallbackSessionName }));
+ }
  if (categoryDetails.failureCategory === "selector-not-found" && redactedCompiledSemanticAction) {
  const candidateActions = buildSemanticActionCandidateActions(redactedCompiledSemanticAction);
  if (candidateActions.length > 0) {
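The hunk above appends visible-ref fallback actions with `(nextActions ??= []).push(...)`, so the array is created only when no earlier step produced one. A small sketch of that append pattern follows; the `NextAction` shape and the `mergeNextActions` helper are hypothetical, since the real action payloads are not shown in this diff.

```typescript
// Hypothetical, simplified action shape for illustration only.
interface NextAction {
  id: string;
}

function mergeNextActions(
  presentation: NextAction[] | undefined,
  fallback: NextAction[] | undefined,
): NextAction[] | undefined {
  // Copy first so the presentation's own array is never mutated.
  let nextActions = presentation ? [...presentation] : undefined;
  if (fallback) {
    // `??=` assigns a fresh array only when nextActions is still undefined.
    (nextActions ??= []).push(...fallback);
  }
  return nextActions;
}
```

With this ordering, fallback retry actions land after any actions the presentation already supplied, and the result stays `undefined` when neither source has anything to offer.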
@@ -4691,6 +4873,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  nextActions,
  pageChangeSummary,
  overlayBlockers: overlayBlockerDiagnostic,
+ visibleRefFallback: visibleRefFallbackDiagnostic,
  comboboxFocus: comboboxFocusDiagnostic,
  recordingDependencyWarning,
  scrollNoop: scrollNoopDiagnostic,
@@ -4718,6 +4901,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  timeoutMs: processResult.timeoutMs,
  };
 
+ const visibleRefFallbackText = formatVisibleRefFallbackText(visibleRefFallbackDiagnostic);
  const semanticActionCandidateText = nextActions ? formatSemanticActionCandidateText(nextActions) : undefined;
  const overlayBlockerText = overlayBlockerDiagnostic ? formatOverlayBlockerText(overlayBlockerDiagnostic) : undefined;
  const selectorTextVisibilityText = formatSelectorTextVisibilityText(selectorTextVisibilityDiagnostics);
@@ -4728,7 +4912,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
  const artifactCleanupText = formatArtifactCleanupGuidanceText(artifactCleanup);
  const timeoutPartialProgressText = timeoutPartialProgress ? formatTimeoutPartialProgressText(timeoutPartialProgress) : undefined;
  const managedSessionOutcomeText = formatManagedSessionOutcomeText(managedSessionOutcome);
- const rawAppendedDiagnosticText = [semanticActionCandidateText, overlayBlockerText, selectorTextVisibilityText, scrollNoopDiagnosticText, comboboxFocusDiagnosticText, recordingDependencyWarningText, evalStdinHintText, artifactCleanupText, timeoutPartialProgressText, managedSessionOutcomeText].filter((item): item is string => item !== undefined).join("\n\n");
+ const rawAppendedDiagnosticText = [visibleRefFallbackText, semanticActionCandidateText, overlayBlockerText, selectorTextVisibilityText, scrollNoopDiagnosticText, comboboxFocusDiagnosticText, recordingDependencyWarningText, evalStdinHintText, artifactCleanupText, timeoutPartialProgressText, managedSessionOutcomeText].filter((item): item is string => item !== undefined).join("\n\n");
  const appendedDiagnosticText = redactSensitiveText(redactExactSensitiveText(rawAppendedDiagnosticText, exactSensitiveValues));
  const shouldAppendDiagnosticText = appendedDiagnosticText.length > 0 && (!userRequestedJson || plainTextInspection);
  const content = shouldAppendDiagnosticText && redactedContent[0]?.type === "text"
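The changed line in the hunk above builds the appended diagnostic text by dropping `undefined` sections with a type-guard filter and joining the rest with blank lines; 0.2.30 places the visible-ref fallback text first in that list. A minimal sketch of the pattern, where `joinDiagnosticSections` is a hypothetical helper name (the real code inlines the expression):

```typescript
// Hypothetical helper mirroring the inline filter/join expression in the diff.
function joinDiagnosticSections(sections: Array<string | undefined>): string {
  return sections
    // The type guard narrows Array<string | undefined> to string[].
    .filter((item): item is string => item !== undefined)
    .join("\n\n");
}
```

Because only `undefined` is filtered, an empty-string section would still contribute a separator in this sketch; the downstream `length > 0` check in the diff guards only the fully empty case.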
@@ -33,10 +33,11 @@ export const BRAVE_SEARCH_PROMPT_GUIDELINE =
 
  export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
  "Standard workflow: open the page, snapshot -i, interact using current @refs from that snapshot, and re-snapshot after navigation, scrolling, rerendering, or other major DOM changes because refs are page-scoped; the wrapper fails mutation-prone stale/recycled refs before upstream can silently target a different current-page element.",
+ "For ordinary forms from one snapshot, batch multiple fill @refs before the submit/click step to avoid serial tool calls; if a fill may autosubmit, navigate, or rerender later fields, split the flow and refresh refs first.",
  "When snapshot -i compacts because the tree is oversized, scan visible output for Omitted high-value controls and optional details.data.highValueControlRefIds before opening the spill file: those list bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that did not fit the key/other ref previews.",
  "When a visible text or accessible-name target should survive ref churn, prefer find locators such as role, text, label, placeholder, alt, title, or testid with the intended action instead of guessing a CSS selector.",
  "Do not assume Playwright selector dialects such as text=Close or button:has-text('Close') are supported wrapper syntax unless current upstream agent-browser behavior has been verified.",
- "For authenticated or user-specific content like feeds, inboxes, dashboards, and accounts, prefer --profile Default on the first browser call and let the implicit session carry continuity. Use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.",
+ "For authenticated or user-specific content explicitly requested by the user, such as feeds, inboxes, account pages, or private dashboards, prefer --profile Default on the first browser call and let the implicit session carry continuity. Do not use a real profile for public pages just because they are dashboards. Treat visible page content from real profiles as model-visible transcript data; use --auto-connect only if profile-based reuse is unavailable or the task is specifically about attaching to a running debug-enabled browser.",
  "Do not invent fixed explicit session names for routine tasks. Use the implicit session unless you truly need multiple isolated browser sessions in the same conversation.",
  "When using --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device, put them on the first command for that session. If you intentionally use an explicit --session, keep using that same explicit session for follow-ups.",
  "If you already used the implicit session and now need launch-scoped flags like --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device, retry with sessionMode set to fresh or pass an explicit --session for the new launch. After a successful unnamed fresh launch, later auto calls follow that new session.",
@@ -55,7 +56,7 @@ export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
  "When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.",
  "When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.",
  "When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.",
- "When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.",
+ "When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), use the user's exact requested paths when given and treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use or PASS/FAIL reporting.",
  "Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.",
  ] as const;
 
@@ -75,8 +76,8 @@ export const INSPECTION_TOOL_CALL_EXAMPLES = [
 
  export const WRAPPER_TAB_RECOVERY_BEHAVIOR = [
  "After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.",
- "After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.",
- "After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.",
+ "After the wrapper observes tab-drift risk for a session (for example profile restore correction, overlapping stale opens, or resumed session state), later active-tab commands best-effort pin that tab inside the same upstream invocation. Routine same-session commands are not preflighted with tab list just because a target tab is known.",
+ "For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.",
  "If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.",
  ] as const;
 
@@ -90,14 +91,14 @@ export function buildSharedBrowserPlaybookGuidelines(options: { includeBraveSear
 
  const RUNTIME_PROMPT_GUIDELINES = [
  "Use exactly one input mode: args, semanticAction, job, qa, sourceLookup, or networkSourceLookup. Use stdin only for batch, eval --stdin, auth save --password-stdin, or wrapper-generated batch modes.",
- "Common flow: open, snapshot -i, interact with current @refs or semanticAction, then re-snapshot after navigation, scrolling, rerenders, or DOM changes.",
+ "Common flow: open, snapshot -i, interact with current @refs or semanticAction, then re-snapshot after navigation, scrolling, rerenders, or DOM changes. For ordinary forms, batch same-snapshot fill @refs before the submit/click step; split if a fill may autosubmit, navigate, or rerender later fields. Respect explicit stop boundaries: if the user says to stop before order/post/purchase/submit, do not click that final action.",
  "Prefer stable locators for visible text/names: semanticAction or upstream find with role/text/label/placeholder/alt/title/testid. Use current @refs only from the latest same-page snapshot.",
- "Use sessionMode=fresh for launch-scoped state such as --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, providers, or iOS devices; otherwise let the implicit session carry continuity.",
- "For artifacts, read visible metadata and details.artifactVerification before using files. record stop needs ffmpeg on PATH. close does not delete saved files; cleanup is host-owned.",
+ "For tasks that explicitly require the user's signed-in/account-specific content, start with --profile Default plus sessionMode=fresh unless the user asks otherwise; visible page content is model-visible. Use sessionMode=fresh for other launch-scoped state such as --session-name, --cdp, --state, --auto-connect, --init-script, --enable, providers, or iOS devices; otherwise let the implicit session carry continuity.",
+ "For requested screenshots, recordings, downloads, PDFs, or HARs, save the exact user path and read details.artifactVerification before claiming success; report unavailable/missing artifacts instead of silently substituting paths. record stop needs ffmpeg on PATH. close does not delete saved files; cleanup is host-owned.",
  "When details.nextActions is present, prefer those exact follow-up payloads over prose or guessed selectors.",
  "For dense snapshots, check Omitted high-value controls and details.data.highValueControlRefIds before opening large spill files.",
  "For dashboards, verify scroll with screenshot/snapshot; if nothing moved, use scrollintoview <@ref> or target the real scroll region. Combobox clicks may only focus; re-snapshot and fall back to type, Enter/arrows, select, or option refs.",
- "For extraction, prefer get title/url/text/html/value/attr/count or eval --stdin that returns a value; do not rely on console.log. If selector visibility warnings appear, prefer visible @refs or nextActions.",
+ "For extraction, prefer get title/url/text/html/value/attr/count or eval --stdin with a plain expression in the tool stdin field; do not rely on console.log. When reading several known refs/selectors, use batch with JSON-array stdin (for example [[\"get\",\"text\",\"@e1\"]]) or eval --stdin instead of many serial get calls. If selector visibility warnings appear, prefer visible @refs or nextActions.",
  "For non-core debugging, pass upstream commands through args: network, diff, trace/profiler/record, console/errors, stream, dashboard, chat, react, vitals, pushstate, dialog, frame, tab.",
  ] as const;
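The extraction guideline above recommends one `batch` call with JSON-array stdin, such as `[["get","text","@e1"]]`, over many serial `get` calls. A minimal sketch of building that stdin payload for several refs follows. The helper name is hypothetical, and the `@e<number>` validation is an assumption drawn from the `@e1` example rather than a documented ref grammar.

```typescript
// Hypothetical helper: builds the JSON-array stdin for a multi-ref `get text` batch.
function buildGetTextBatchStdin(refs: string[]): string {
  for (const ref of refs) {
    // Assumed ref shape, based only on the "@e1" example in the guideline.
    if (!/^@e\d+$/.test(ref)) throw new Error(`not a snapshot ref: ${ref}`);
  }
  // Each row is one upstream command expressed as an array of tokens.
  return JSON.stringify(refs.map((ref) => ["get", "text", ref]));
}
```

For refs `@e1` and `@e2` this yields `[["get","text","@e1"],["get","text","@e2"]]`, which would be passed as the tool's `stdin` alongside a `batch` command; all refs should come from the same current snapshot.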
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pi-agent-browser-native",
- "version": "0.2.29",
+ "version": "0.2.30",
  "description": "pi extension that exposes agent-browser as a native tool for browser automation",
  "type": "module",
  "author": "Mitch Fultz (https://github.com/fitchmultz)",