npm - pi-agent-browser-native - Versions diffs - 0.2.25 → 0.2.26 - Mend

pi-agent-browser-native 0.2.25 → 0.2.26

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/CHANGELOG.md +22 -1
package/README.md +19 -8
package/docs/ARCHITECTURE.md +7 -1
package/docs/COMMAND_REFERENCE.md +22 -5
package/docs/RELEASE.md +5 -1
package/docs/REQUIREMENTS.md +2 -1
package/docs/SUPPORT_MATRIX.md +20 -0
package/docs/TOOL_CONTRACT.md +44 -13
package/extensions/agent-browser/index.ts +1003 -19
package/extensions/agent-browser/lib/playbook.ts +6 -5
package/extensions/agent-browser/lib/results/presentation.ts +83 -5
package/extensions/agent-browser/lib/results/shared.ts +72 -0
package/extensions/agent-browser/lib/results/snapshot.ts +69 -9
package/extensions/agent-browser/lib/runtime.ts +7 -2
package/package.json +1 -1

package/CHANGELOG.md CHANGED Viewed

@@ -1,6 +1,27 @@
 # Changelog
-## Unreleased
+## 0.2.26 - 2026-05-14
+### Added
+- artifact lifecycle cleanup guidance (`RQ-0079`): successful `close` results now include `details.artifactCleanup` and a visible `Artifact lifecycle` note when recent artifact metadata exists, making explicit that browser close does not delete user-chosen screenshot/download/PDF/trace/HAR/recording paths and listing paths for host-tool cleanup.
+- getter/eval discoverability diagnostics (`RQ-0078`): common unknown getter shortcuts such as `title`, `url`, and `text` now get grouped-`get` guidance (with exact `use-get-title` / `use-get-url` next actions where unambiguous), and function-shaped `eval --stdin` snippets that serialize to `{}` now add visible `Eval stdin hint` plus `details.evalStdinHint` so agents know to pass a plain expression or invoke the function explicitly.
+- managed-session outcome diagnostics for failed `sessionMode: "fresh"` calls (`RQ-0077`): failed or timed-out fresh launches now report `details.managedSessionOutcome` and, when the plan used `sessionMode: "fresh"` with a failing outcome, append visible `Managed session outcome: …` text so agents know whether the prior managed session was preserved or no managed session became current.
+- timeout partial-progress evidence for long `job` / `qa` / `batch` calls (`RQ-0076`): wrapper watchdog timeouts now add best-effort `details.timeoutPartialProgress` with planned steps, current page title/URL, and declared artifact path checks, and append visible `Timeout partial progress` recovery text.
+- QA/network failed-request impact classification (`RQ-0075`): `extensions/agent-browser/lib/results/shared.ts` now separates actionable network failures from benign low-impact browser icon misses such as missing `favicon.ico`, `qa` preserves benign misses as `qaPreset.warnings` instead of failing otherwise healthy smoke checks, and `network requests` presentation includes actionable/benign summary lines plus per-row impact tags. Contract and operator notes live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`README.md`](README.md), and [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md); regression coverage is in `test/agent-browser.extension-validation.test.ts` and `test/agent-browser.presentation.test.ts`.
+- post-success `get text <selector>` visibility diagnostics (`RQ-0074`): after a successful `get text` on a non-`@ref` CSS selector, `extensions/agent-browser/index.ts` may run an extra read-only `eval --stdin` probe (`buildVisibleTextProbeScript`), merge `details.selectorTextVisibility` (and `selectorTextVisibilityAll` when several batched selectors qualify), prepend `Selector text visibility warning` lines to visible text, and append `inspect-visible-text-candidates` next actions carrying the same probe script in `stdin`; skipped for `@e…` refs and for selectors whose string would leak secrets after redaction or match sensitive attribute-literal heuristics. Contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), maintainer checklist in [`AGENTS.md`](AGENTS.md); regression `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](test/agent-browser.extension-validation.test.ts)
+- optional `semanticAction.session` on native `agent_browser`: compiles to a leading `--session <name>` pair before upstream `find` argv so the locator shorthand targets a named upstream browser session instead of the extension-managed default; `buildExecutionPlan` skips implicit `--session` injection when argv already starts with `--session`; successful unified results echo `details.sessionName`; `retry-semantic-action-after-stale-ref` copies the compiled argv including that prefix, and bounded `try-*-candidate` next actions preserve the same session prefix from `getCompiledSemanticActionSessionPrefix` in `extensions/agent-browser/index.ts`. Contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction) and [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#sessionmode); operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#direct-subprocess-execution); playbook line in `extensions/agent-browser/lib/playbook.ts`; regression coverage in `test/agent-browser.extension-validation.test.ts`
+- bounded `selector-not-found` recovery for top-level `semanticAction`: when the wrapper still has `details.compiledSemanticAction`, `extensions/agent-browser/index.ts` may append `try-*-candidate` entries to `details.nextActions` and an `Agent-browser candidate fallbacks` block in visible text for specific `fill`/`click` locator pairs (`placeholder`, `text`, `label` only; not `select`); contract in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#semanticaction) and [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), operator notes in [`README.md`](README.md), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#direct-subprocess-execution), playbook line in `extensions/agent-browser/lib/playbook.ts`, regression coverage in `test/agent-browser.extension-validation.test.ts`
+- compact oversized `snapshot` output now documents the `Omitted high-value controls` prose block and matching `details.data.highValueControlRefIds` (plus related compact snapshot metadata) in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#snapshot), [`README.md`](README.md), and generated playbook guidance from `extensions/agent-browser/lib/playbook.ts`, with implementation in `extensions/agent-browser/lib/results/snapshot.ts` and regression coverage in `test/agent-browser.presentation.test.ts`
+### Changed
+- documentation for `RQ-0077` managed-session outcomes now matches `buildManagedSessionOutcome` / `formatManagedSessionOutcomeText`: when the visible `Managed session outcome: …` line is emitted (including ordering after other diagnostic tails per `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`, and the missing-binary-only case), how `details.managedSessionOutcome` behaves after **`qa`** reclassification, and that `"auto"` failures can populate `details` without the extra prose line; updates in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`README.md`](README.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md#session-model), [`AGENTS.md`](AGENTS.md), and this changelog.
+- documentation for `RQ-0076` timeout partial progress now matches `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` in code: compiled `qa` shares the `job` step-list path; otherwise planned steps come from JSON-array `batch` stdin (caller-provided or wrapper-generated for `sourceLookup` / `networkSourceLookup`), only when each element is a string[] argv row; optional planned-URL fallback and `PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` watchdog tuning are called out in [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md), [`docs/SUPPORT_MATRIX.md`](docs/SUPPORT_MATRIX.md), [`README.md`](README.md), and [`AGENTS.md`](AGENTS.md); [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) now documents `timeoutPartialProgress.summary`, the batch stdin parse rule, the six-step cap on visible `Planned steps` lines, and the `0/0` declared-path count when only page context is recovered.
+- batch stdin page-scoped ref preflight clears the ref-invalidating latch when a later `snapshot` step appears in the same JSON plan (`getBatchRefInvalidationMessage` in `extensions/agent-browser/index.ts`), matching documented `snapshot -i` spacing inside `batch`; contract expanded under [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details) (`refSnapshot`); regression `agentBrowserExtension allows batch stdin ref steps after snapshot following an invalidating step` in [`test/agent-browser.extension-validation.test.ts`](test/agent-browser.extension-validation.test.ts)
+### Fixed
+- explicit `--json` calls keep machine-readable visible content parseable even when the wrapper attaches extra diagnostic structs such as `details.evalStdinHint`; diagnostic guidance remains available on `details` without being appended after the JSON payload.
+- overlay blocker candidate actions (`try-overlay-blocker-candidate-*`) no longer appear under the semantic-action `Agent-browser candidate fallbacks` heading; that prose is now limited to the bounded semantic locator fallback ids.
+- packaged Pi smoke now forces its temporary `npm pack` to write a tarball even when invoked from `npm publish --dry-run`, so the release lifecycle dry run validates the same smoke path instead of inheriting npm's outer dry-run mode.
 ## 0.2.25 - 2026-05-14

package/README.md CHANGED Viewed

@@ -57,14 +57,14 @@ The result is optimized for agent work:
 | Pain | Native wrapper capability | Proof surface |
 |---|---|---|
 | Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows, constrained `job` / `qa` presets and experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
-| Page snapshots are too large | Shows compact, main-content-first summaries and stores full raw output in spill files when needed | `test/agent-browser.presentation.test.ts` |
+| Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages hide inputs and tabs from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
 | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
 | Profile restores and tab drift confuse agents | Tracks managed sessions, pins intended tabs, and re-selects target tabs after drift | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
 | Auth/profile workflows can leak secrets | Supports `auth save --password-stdin` and redacts sensitive args, URLs, stdout/stderr, details, and parse-failure spills | `test/agent-browser.extension-validation.test.ts` |
 | Stateful cookies/storage/auth output bloats or leaks context | Presentation layer redacts `details.data` for cookies and storage (field-aware values) and recursively scrubs other structured upstream JSON (network, diff, trace/profiler, stream, dashboard, chat, auth, dialog, frame, state, and similar) using sensitive key names plus string heuristics; masks sensitive argv flags and positionals; scrubs secrets from failed batch step errors; and exposes a compact redacted `batch` matrix on top-level `details.data` | `extensions/agent-browser/lib/results/presentation.ts`, `extensions/agent-browser/lib/runtime.ts`, `test/agent-browser.presentation.test.ts` |
-| Stale `@eN` refs fail mysteriously | Adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `test/agent-browser.results.test.ts` |
+| Stale `@eN` refs fail mysteriously | Records per-session `details.refSnapshot`, rejects mismatched URLs / unknown refs / unsafe `batch` stdin ordering before spawn, adds recovery guidance to rerun `snapshot -i` or use stable `find` locators | `extensions/agent-browser/index.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.extension-validation.test.ts` |
 | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/shared.ts`, `test/agent-browser.results.test.ts` |
-| Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation.ts`, `test/agent-browser.presentation.test.ts` |
+| Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation.ts`, `test/agent-browser.presentation.test.ts` |
 | Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
 | Agents need bundled `skills` text without touching the live session | Treats `skills list`, `skills get …`, and `skills path …` as stateless JSON reads: no implicit managed `--session` under default `sessionMode: "auto"` (same session-ownership goal as plain-text `--help` / `--version`), while provider workflows stay thin passthroughs that require upstream setup and credentials | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), `extensions/agent-browser/lib/runtime.ts` |
@@ -155,10 +155,13 @@ Run a multi-step flow in one tool call:
 { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
 ```
-Evaluate page JavaScript through stdin:
+If the same `batch` stdin later uses `@e…` on interaction commands after a step that can navigate or mutate the page (`open`, `click`, `fill`, and similar), insert a `snapshot` step whose first argv token is `snapshot` (for example `["snapshot","-i"]`) between those phases. The wrapper rejects unsafe ordering with `failureCategory: "stale-ref"` before upstream runs; full rules are under `refSnapshot` in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details).
+Evaluate page JavaScript through stdin. Return the value you want as an expression; `eval --stdin` may warn with `details.evalStdinHint` when a function-shaped snippet serializes to `{}` instead of being invoked:
 ```json
 { "args": ["eval", "--stdin"], "stdin": "document.title" }
+{ "args": ["eval", "--stdin"], "stdin": "({ title: document.title, url: location.href })" }
 ```
 Save an auth profile without putting the password in `args`:
@@ -180,18 +183,24 @@ For supported upstream `find` flows you can omit hand-built `args` and pass a to
 ```json
 { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" } }
 { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
+{ "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
 ```
 Typical pitfalls:
 - Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` per call (not more, not none).
 - `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
-- Commands or locators outside the supported shorthand still require explicit `args`.
+- Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
+- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions.
+- Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
 - If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
+- If the failure is `selector-not-found` for a compiled `semanticAction`, visible text may add `Agent-browser candidate fallbacks` and `details.nextActions` may list bounded `try-*-candidate` follow-ups (role/name retries only for `fill` + `placeholder`, `click` + `text`, or `fill` + `label`; `select` misses do not get these entries); prefer those payloads or a fresh snapshot over guessing new selectors (same contract link).
+- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows overlay/banner/dialog context and close/dismiss-like controls. The unchanged-URL check uses `details.navigationSummary`, which is only populated via follow-up `get url` / `get title` when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
+- If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
 ### Constrained browser jobs
-For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
+For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover planned steps, current page title/URL, and declared artifact paths that already exist on disk (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
 ```json
 {
@@ -209,7 +218,7 @@ Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags,
 ### Lightweight QA preset
-For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read.
+For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
 ```json
 {
@@ -248,13 +257,15 @@ For asynchronous exports, click first and then wait for the download:
 With upstream `agent-browser 0.27.0`, treat `details.savedFilePath` as upstream-reported metadata and confirm `details.artifacts[].exists` before relying on the requested `wait --download <path>` file being present on disk.
+Artifact cleanup is host-owned, not a browser command. `close` shuts down the browser session but does **not** delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings saved to paths you chose. When the session’s non-empty `details.artifactManifest` is in scope, a successful `close` appends an `Artifact lifecycle` note and sets `details.artifactCleanup` with the same retention summary as `details.artifactRetentionSummary`, a fixed `note` about host-owned cleanup, and `explicitArtifactPaths`: up to ten distinct paths from manifest rows whose `storageScope` is `explicit-path` (this list can be empty if the recent window only holds spills or other non-explicit inventory). Remove any listed paths with normal file tools after inspection.
 Start a fresh profiled browser after the implicit public-browsing session already exists:
 ```json
 { "args": ["--profile", "Default", "open", "https://example.com/account"], "sessionMode": "fresh" }
 ```
-After a successful unnamed fresh launch, later default `sessionMode: "auto"` calls follow that browser automatically.
+After a successful unnamed fresh launch, later default `sessionMode: "auto"` calls follow that browser automatically. If the fresh launch fails or times out, `details.managedSessionOutcome` records whether the previous managed session was preserved or the attempted fresh session was abandoned before any managed session became current; a `Managed session outcome: …` line is appended only when the failing call used `sessionMode: "fresh"`.
 ## Authenticated/profile workflows

package/docs/ARCHITECTURE.md CHANGED Viewed

@@ -33,12 +33,13 @@ The extension should:
 - invoke it directly, not through a shell
 - inject `--json`
 - support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch
-- accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call, compile it into upstream `find` argv, and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
+- accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call, compile it into upstream `find` argv (with optional `semanticAction.session` expanding to a leading `--session <name>` before `find` when targeting a named upstream browser instead of the managed default), and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
 - accept an optional native `job` object (mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, and `networkSourceLookup` on the same call) with a small fixed step vocabulary that compiles only to existing upstream `batch` argv rows, generates the JSON batch stdin string internally, and echoes `details.compiledJob` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job))
 - accept an optional native `qa` object (mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path as `job`, runs a fixed diagnostic smoke sequence, and echoes `details.compiledQaPreset` plus structured `details.qaPreset` pass/fail evidence (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa))
 - accept an optional native `sourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path, gathers evidence-backed local source *candidates* for a selector/fiber/component name, and echoes `details.compiledSourceLookup` plus structured `details.sourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup)); unlike `qa`, it never applies a second pass/fail layer that marks the tool failed when upstream already reported batch success—failed upstream steps still fail the invocation normally, and `details.sourceLookup` may still be present for partial evidence
 - accept an optional native `networkSourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `sourceLookup` on the same call) that compiles to the same `batch` path, correlates failed network requests with initiator metadata and bounded workspace URL literals, and echoes `details.compiledNetworkSourceLookup` plus structured `details.networkSourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup)); like `sourceLookup`, it never flips a successful upstream batch to failed solely because no source candidates were found
 - when that compiled path fails as `stale-ref`, optionally append a `retry-semantic-action-after-stale-ref` entry to `details.nextActions` after the usual `refresh-interactive-refs` snapshot step so agents can re-issue the same compiled `find` argv only when the failure implies the interaction did not run (contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
+- when the same compiled path fails as `selector-not-found` for the bounded locator/action pairs documented there, optionally append `try-*-candidate` entries to `details.nextActions` and mirror them in visible text as `Agent-browser candidate fallbacks` so agents can retry role/name `find` variants without hand-rebuilding argv (`select` misses are intentionally excluded)
 ### Agent-first UX
@@ -110,12 +111,15 @@ Practical policy:
 - close the active extension-managed session when the originating `pi` process quits, while leaving explicit caller-provided sessions alone
 - set an idle timeout on extension-managed sessions as a backstop for abnormal exits or cleanup failures
 - clean up process-private temp spill artifacts on shutdown, but keep persisted-session snapshot spill files in a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` stays usable after reload/resume without unbounded growth
+- keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream `close`; when the bounded `details.artifactManifest` has entries, successful `close` also surfaces `details.artifactCleanup` and an `Artifact lifecycle` note (including up to ten distinct `explicit-path` manifest paths when present) so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - reconstruct the current extension-managed session from persisted tool details on resume/reload so later default calls keep following the active managed browser
 - if an unnamed fresh launch replaces an active extension-managed session, best-effort close the old managed session after the switch succeeds
 - leave explicit caller-provided `--session` choices alone unless the caller closes them explicitly
 - after profiled `open` / `goto` / `navigate` calls, verify the active tab still matches the returned page URL and best-effort switch back when restored profile tabs steal focus
 - once the wrapper knows which tab the agent is operating on, later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; this stays thin while avoiding reconnect-time drift on profile-restored sessions
 - after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns
+- keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading or resuming, drop it on successful `close`, and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text
+- after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
 - keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins
@@ -144,6 +148,8 @@ Implementation detail lives in `extensions/agent-browser/lib/runtime.ts` (`findC
 A successful unnamed `sessionMode: "fresh"` launch should become the new extension-managed session so later default calls follow that browser instead of silently snapping back to the older managed session.
+When a managed implicit or fresh `--session` plan reaches process execution, `details.managedSessionOutcome` summarizes the managed-session transition: on **success**, statuses such as `created`, `replaced`, `unchanged`, or `closed` describe what became current (including successful `close`); on **failure** (launch error, timeout, missing binary, **`qa`** reclassification after a nominally successful batch, failed `close`, and similar), `preserved` vs `abandoned` captures whether a prior managed session stayed current or no managed session ended up active, plus related names and booleans. Failing calls that used `sessionMode: "fresh"` also append a short `Managed session outcome: …` line to model-visible text so the next default `sessionMode: "auto"` hop is obvious; `"auto"` failures may still populate the struct without that extra line. Implementation and field semantics live in `extensions/agent-browser/index.ts` (`buildManagedSessionOutcome`, `formatManagedSessionOutcomeText`); agent contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); checklist row `RQ-0077` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
 ## Preferring the native tool
 Keep the handling simple:

package/docs/COMMAND_REFERENCE.md CHANGED Viewed

@@ -62,6 +62,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 - `sessionMode`:
   - `"auto"` reuses the extension-managed session when possible.
   - `"fresh"` rotates that managed session to a fresh upstream launch so launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `-p` / `--provider`, or iOS `--device` apply.
+  - If a fresh launch fails or times out, read `details.managedSessionOutcome` for `preserved` vs `abandoned` (and related fields). A model-visible `Managed session outcome: …` line is appended only for failing calls that used `sessionMode: "fresh"`; `"auto"` failures can still populate the struct without that extra line.
 ### Debug, diff, stream, dashboard, and chat families
@@ -126,15 +127,20 @@ Examples:
 { "args": ["find", "label", "Email", "fill", "user@example.com"] }
 { "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Close" } }
 { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
+{ "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
 { "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
 { "args": ["scrollintoview", "@e12"] }
 { "args": ["snapshot", "-i"] }
 ```
-The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays.
+The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
 Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
+Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
+When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows overlay/banner/dialog context **and** up to three close/dismiss-like controls. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with extra `get url` / `get title` calls only when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
 ### Extract page data
 ```json
@@ -144,7 +150,11 @@ Do not assume Playwright selector dialects such as `text=Close` or `button:has-t
 { "args": ["eval", "--stdin"], "stdin": "document.title" }
 ```
-Prefer `get` and scoped `eval --stdin` for read-only extraction. Return the intended JavaScript value instead of relying on `console.log`.
+Prefer `get` and scoped `eval --stdin` for read-only extraction. Getter names are grouped under `get`: use `get title`, `get url`, or `get text <selector>`, not shortcut commands such as `title` or `url`. When upstream reports an unknown command, unknown subcommand, or unrecognized command for a single-token shortcut (`attr`, `count`, `html`, `text`, `title`, `url`, or `value`), the wrapper adds a visible grouped-`get` hint; only `title` and `url` also get exact read-only `details.nextActions` (`use-get-title` / `use-get-url`, with `--session` preserved when the failed call named a session). If another `Agent-browser hint:` (selector dialect or stale-ref recovery) was already appended to the same error text, the getter hint is omitted.
+Return the intended JavaScript value from `eval --stdin` instead of relying on `console.log`. For object-shaped extraction, pass a plain expression such as `({ title: document.title, url: location.href })`; if you send a function-shaped snippet, invoke it explicitly, for example `(() => ({ title: document.title }))()`. When upstream serializes a function result to `{}`, the wrapper can append `Eval stdin hint` and `details.evalStdinHint`.
+On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected match, which may be hidden even when a later match is visible. For non-`@ref` CSS selectors with multiple matches, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (and `details.selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions so agents know to use a fresher `snapshot -i`, a visible `@ref`, or a more specific selector instead of trusting hidden tab content.
 ### Run a multi-step flow in one browser invocation
@@ -170,7 +180,9 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
 Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
-For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot.
+For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. QA network diagnostics classify failed requests by likely impact: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
+The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/shared.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
 ```json
 { "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
@@ -253,6 +265,8 @@ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent
 This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
+Browser `close` is also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful `close` appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; otherwise that array is empty and visible text may omit the “Explicit artifact paths” line even though the lifecycle block still reminds you that close does not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
 Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.
 ### Switch from an already-active implicit session to a fresh profiled launch
@@ -439,6 +453,8 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
 | `snapshot -d <n>` / `snapshot --depth <n>` | Limit tree depth. |
 | `snapshot -s <sel>` / `snapshot --selector <sel>` | Scope to a CSS selector. |
+When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages can still hide actionable controls in omitted content; in that case, look for `Omitted high-value controls` to find bounded searchboxes, textboxes, comboboxes, buttons, tabs, checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details)).
 ### Wait
 | Mode | Purpose |
@@ -514,7 +530,7 @@ Long-running or lifecycle commands should be explicitly paired with cleanup call
 | `doctor [--fix]` | Diagnose install issues and optionally auto-clean stale files. Use `doctor --offline --quick` for a fast local-only check and `doctor --json` for structured output. |
 | `profiles` | List available Chrome profiles. |
-When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
+When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
 ## Important global flags, config, and environment
@@ -585,6 +601,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
 - The extension may keep following one implicit managed session across later tool calls.
 - If launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, `--auto-connect`, `--init-script`, `--enable`, `--provider` / `-p`, or provider device flags like `--device` would be ignored because that implicit session is already active, retry with `sessionMode: "fresh"`.
+- If a `sessionMode: "fresh"` call fails (including upstream failure, timeout, missing binary, or **`qa`** reclassification after a nominally successful batch), read `details.managedSessionOutcome` before assuming where the next default call will go: `preserved` means the prior managed session remains current, while `abandoned` means no managed session became current. When the failure reason is not the fresh launch itself—for example `failureCategory: "qa-failure"`—`status`/`summary` may still describe the managed-session transition while `succeeded` on this object matches the final tool outcome.
 <!-- agent-browser-playbook:start wrapper-tab-recovery -->
 <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
 - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
@@ -592,7 +609,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
 - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
 - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
 <!-- agent-browser-playbook:end wrapper-tab-recovery -->
-- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path.
+- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 28-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 28s budget) so one upstream CLI call does not cross the upstream 30-second IPC read-timeout/retry path. When that watchdog fires, `details.timeoutPartialProgress` may include a planned step list for compiled `job` / `qa` plans or caller `batch` stdin, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/size checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
 - Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
 - The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.

package/docs/RELEASE.md CHANGED Viewed

@@ -7,6 +7,8 @@ Related docs:
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
 - [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - Bounded `agent_browser` outcome metadata on `details` (`resultCategory`, `successCategory`, `failureCategory`, optional `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`): contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); maintainer checklists under “Tool result categories” and “Page-change summaries” in [`../AGENTS.md`](../AGENTS.md)
+- Post-success `get text` selector visibility (`RQ-0074`): optional `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warnings, and `inspect-visible-text-candidates*` next actions after read-only visibility probes—[`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
+- Managed-session outcomes (`RQ-0077`): after extension-managed implicit or fresh `--session` injection reaches process execution, `details.managedSessionOutcome` records the transition (`created` / `replaced` / `unchanged` / `closed` on success; `preserved` / `abandoned` when a plan fails before a new session becomes current). Failing `sessionMode: "fresh"` calls also append model-visible `Managed session outcome: …`—[`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md), and [`../AGENTS.md`](../AGENTS.md) maintainer checklist
 - Stateful context commands (`cookies`, `storage`, `auth`, `dialog`, `frame`, `state`) and aggregate `batch` results: model-facing `details.data` is summarized or redacted per [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); aggregate `batch` replaces top-level `details.data` with a compact per-step matrix (`success`, argv-redacted `command`, redacted `result` or scrubbed `error`) while full per-step payloads, artifacts, and categories remain on `batchSteps[]`—operational notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#use-stateful-browser-context-commands-safely), assembly in `extensions/agent-browser/lib/results/presentation.ts`
 ## Purpose
@@ -101,10 +103,12 @@ Before publishing, validate both local-checkout modes without mixing their assum
 4. Run a smoke prompt that exercises `agent_browser`.
 5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
-For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
+For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, top-level `semanticAction` (locator shorthand compiled to upstream `find`, optionally with `semanticAction.session` when you need the same named upstream session as a prior explicit `--session` call), `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
 This checklist assumes a real `agent-browser` on `PATH`. It complements, but does not overlap, `npm run verify -- lifecycle`: that harness swaps in a fake upstream binary and focuses on `/reload`, full restart, `/resume`, managed-session continuity, and spill-path persistence (`scripts/verify-lifecycle.mjs`), not the full command matrix above.
+When a smoke or dogfood run fails after `sessionMode: "fresh"` (missing binary, timeout, upstream error, or **`qa`** preset reclassification), read `details.managedSessionOutcome` before assuming which managed session the next default `sessionMode: "auto"` call will follow; the same struct can appear without the extra `Managed session outcome: …` prose line on `"auto"` failures. Field-level semantics and append ordering relative to other diagnostic tails are documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and the session-mode notes in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
 ### Configured-source lifecycle validation
 Run the automated harness for deterministic configured-source lifecycle regression coverage (required before publish together with the other [Pre-release checks](#pre-release-checks)):

package/docs/REQUIREMENTS.md CHANGED Viewed

@@ -65,7 +65,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
 - Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name) or top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv). Supplying both or neither is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`).
 - `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
-- Supported actions, locators, exclusivity rules, and when `details.compiledSemanticAction` appears are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
+- Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
 ### Documentation standard
@@ -121,6 +121,7 @@ The design should comfortably support workflows such as:
 - On local Unix launches, extension-generated session names should not fail just because the upstream default socket path is too long; the wrapper should choose a shorter socket directory when needed.
 - Provider selection flags (`-p`, `--provider`) and provider device flags (`--device`) are launch-scoped like profile, CDP, and persisted state: if an extension-managed implicit session is already active, the planner must fail fast with the same recovery guidance as other startup-scoped flags instead of silently forwarding argv upstream would ignore; contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode) and session model in [`ARCHITECTURE.md`](ARCHITECTURE.md).
 - Read-only upstream `skills list`, `skills get …`, and `skills path …` must stay free of implicit managed `--session` under default `sessionMode: "auto"` (still with `--json`), matching plain-text `--help` / `--version` inspection semantics so bundled skill text does not pin or rotate the active browser session; new `skills` subcommands pick up that behavior only after allowlisting in `extensions/agent-browser/lib/runtime.ts` with regression coverage.
+- Optional `semanticAction.session` on native `agent_browser` must compile to a leading `--session <name>` pair before upstream `find` argv so the locator shorthand can target a named upstream browser without hand-built `args`, while `buildExecutionPlan` still skips double-injecting the extension-managed implicit session whenever planned argv already starts with `--session`; stale-ref retries and bounded `try-*` candidate `nextActions` must preserve that same prefix. Contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) / [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode); implementation in `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/runtime.ts`.
 ## Open design questions

package/docs/SUPPORT_MATRIX.md CHANGED Viewed

@@ -63,3 +63,23 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
 `RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.
 `RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
+`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for overlay/banner/dialog context plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
+`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries an empty-object `data.result`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
+`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct `explicit-path` manifest paths (possibly empty when the recent window has no explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.