npm - pi-agent-browser-native - Versions diffs - 0.2.46 → 0.2.48 - Mend

pi-agent-browser-native 0.2.46 → 0.2.48


Files changed (38)

package/docs/ARCHITECTURE.md CHANGED

@@ -2,7 +2,7 @@
 Related docs:
 - [`../README.md`](../README.md)
-- [`../AGENTS.md`](../AGENTS.md) (maintainer workflows, including upstream capability baseline)
+- [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) (maintainer workflows, including upstream capability baseline)
 - [`REQUIREMENTS.md`](REQUIREMENTS.md)
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
 - [`ELECTRON.md`](ELECTRON.md)
@@ -68,9 +68,9 @@ Pi docs use `settings.json` for package/resource loading and filtering, not arbi
 - project-local: `.pi/config/pi-agent-browser-native/config.json`
 - explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
-Config layers merge in that order: global, project, override. The shared policy module (`extensions/agent-browser/lib/config-policy.js`) owns provider descriptors, environment variable names, config keys, project-local credential safety, layer validation/merge, redacted status projection, and credential summaries for both runtime config loading and the package config helper. The config reader accepts v1 fields for `webSearch.enabled`, `webSearch.preferredProvider`, `webSearch.exaApiKey`, `webSearch.braveApiKey`, and conservative browser defaults such as `browser.defaultProfile` and `browser.executablePath`. Web-search key fields follow Pi model/provider-style value resolution for trusted global/override config: literal values, `$ENV_VAR` / `${ENV_VAR}` interpolation, escapes (`$$`, `$!`), and leading `!command` resolved at request time. Project-local plaintext, interpolation-literal, malformed, and command-backed web-search keys are rejected because project config can be copied, committed, or supplied by a repository; project config may use only the matching provider env ref (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa, `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave), so a repository cannot redirect search credentials to arbitrary host env vars. `EXA_API_KEY` and `BRAVE_API_KEY` remain environment fallbacks when no config credential source exists for that provider. Browser default values keep their source scope; profile/executable prompt guidance is emitted only from trusted global or explicit override config, not from project-local config that could steer host profile or executable choices.
+Config layers merge in that order: global, project, override. The shared policy module (`extensions/agent-browser/lib/config-policy.js`) owns provider descriptors, environment variable names, config keys, project-local credential safety, developer-trusted project layer inclusion, layer validation/merge, redacted status projection, and credential summaries for both runtime config loading and the package config helper. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads `.pi/config/pi-agent-browser-native/config.json` by default and skips that project layer when Pi reports the project is untrusted or when launched with `--no-approve`. Global config and explicit `PI_AGENT_BROWSER_CONFIG` overrides remain available either way. The config reader accepts v1 fields for `webSearch.enabled`, `webSearch.preferredProvider`, `webSearch.exaApiKey`, `webSearch.braveApiKey`, and conservative browser defaults such as `browser.defaultProfile` and `browser.executablePath`. Web-search key fields follow Pi model/provider-style value resolution for trusted global/override config: literal values, `$ENV_VAR` / `${ENV_VAR}` interpolation, escapes (`$$`, `$!`), and leading `!command` resolved at request time. Project-local plaintext, interpolation-literal, malformed, and command-backed web-search keys are rejected because project config can be copied, committed, or supplied by a repository; project config may use only the matching provider env ref (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa, `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave), so a repository cannot redirect search credentials to arbitrary host env vars. `EXA_API_KEY` and `BRAVE_API_KEY` remain environment fallbacks when no config credential source exists for that provider. Browser default values keep their source scope; profile/executable prompt guidance is emitted only from trusted global or explicit override config, not from project-local config that could steer host profile or executable choices.
-`agent_browser_web_search` registration is conditional. `webSearch.enabled: false` disables registration even when environment keys are present, but it is evaluated after config merge. A global disable is the normal user default and can still be overridden by project config or `PI_AGENT_BROWSER_CONFIG`; a project disable applies to one repo; an explicit `PI_AGENT_BROWSER_CONFIG` file with `webSearch.enabled: false` is the highest-priority hard disable for that run. Literal and env-backed sources must resolve at startup; command-backed sources are considered configured without running the command until tool execution, so secret managers do not slow startup or prompt unexpectedly. The tool resolves the selected key lazily, chooses Exa or Brave from available credentials (preferring Exa by default unless `webSearch.preferredProvider` says otherwise), then follows one provider-agnostic execution path through provider adapters for request building, HTTP JSON fetch, response normalization, and provider-specific detail fields. It calls Exa `/search` with highlights or Brave Search and returns compact result details without exposing keys.
+`agent_browser_web_search` registration is conditional. `webSearch.enabled: false` disables registration even when environment keys are present, but it is evaluated after the available config layers merge. A global disable is the normal user default and can still be overridden by project config or `PI_AGENT_BROWSER_CONFIG`; a project disable applies to one repo; an explicit `PI_AGENT_BROWSER_CONFIG` file with `webSearch.enabled: false` is the highest-priority hard disable for that run. Literal and env-backed sources must resolve at startup; command-backed sources are considered configured without running the command until tool execution, so secret managers do not slow startup or prompt unexpectedly. The tool resolves the selected key lazily, chooses Exa or Brave from available credentials (preferring Exa by default unless `webSearch.preferredProvider` says otherwise), then follows one provider-agnostic execution path through provider adapters for request building, HTTP JSON fetch, response normalization, and provider-specific detail fields. It calls Exa `/search` with highlights or Brave Search and returns compact result details without exposing keys.
 Browser default config is intentionally conservative. It can add prompt guidance for signed-in/account-specific tasks and alternate Chromium-compatible executables, but current releases do not auto-inject `--profile` or `--executable-path` into every launch. Project-local browser config is status-only for launch guidance unless the profile policy is `explicit-only`; executable guidance must come from global or explicit override config. Automatic launch-default mutation would affect privacy, browser state, and host executable choice, so it needs a separate explicit design and test pass.
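
The layer order and project-trust rule in the hunk above can be sketched as follows. This is a minimal illustration, not the package's `config-policy.js` API: the type names, `mergeLayers`, `isAllowedProjectKeyRef`, and the `projectTrusted` flag are all hypothetical, and only the merge order (global, project, override), the skipped untrusted project layer, and the provider env-var names come from the text.

```typescript
// Hypothetical shapes for illustration; the real module may differ.
type Provider = "exa" | "brave";
type WebSearchConfig = { enabled?: boolean; preferredProvider?: Provider };
type ConfigLayer = { webSearch?: WebSearchConfig };

interface LayerInputs {
  global?: ConfigLayer;
  project?: ConfigLayer;
  override?: ConfigLayer;
  // Assumption: false models both "Pi reports the project is untrusted"
  // and a launch with --no-approve.
  projectTrusted: boolean;
}

// Merge in the documented order: global, project, override.
// The project layer is dropped entirely when the project is not trusted.
function mergeLayers(inputs: LayerInputs): ConfigLayer {
  const layers = [
    inputs.global,
    inputs.projectTrusted ? inputs.project : undefined,
    inputs.override,
  ];
  const merged: ConfigLayer = {};
  for (const layer of layers) {
    if (!layer) continue;
    merged.webSearch = { ...merged.webSearch, ...layer.webSearch };
  }
  return merged;
}

// Env var names are taken from the text above.
const PROVIDER_ENV: Record<Provider, string> = {
  exa: "EXA_API_KEY",
  brave: "BRAVE_API_KEY",
};

// Project config may reference only the matching provider env var,
// written as $NAME or ${NAME}; anything else is rejected.
function isAllowedProjectKeyRef(provider: Provider, value: string): boolean {
  const name = PROVIDER_ENV[provider];
  return value === "$" + name || value === "${" + name + "}";
}
```

Because `webSearch.enabled` is read only after the merge, a global `enabled: false` can still be flipped by a trusted project layer, while an override file has the last word.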
@@ -85,7 +85,7 @@ Tier B guidance lives in `SHARED_BROWSER_PLAYBOOK_GUIDELINES`, generated README/
 Do **not** add reusable browser recipes as a first-class runtime surface yet.
 Current evidence does not justify another source of truth for workflows:
-- the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), one `networkSourceLookup` (`network-source-lookup-failed-request`), plus deterministic `electron` lifecycle/probe scenarios (`electron-lifecycle`, `electron-probe`) rather than repeated named job patterns that agents keep re-specifying
+- the deterministic efficiency benchmark in [`scripts/agent-browser-efficiency-benchmark.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/agent-browser-efficiency-benchmark.mjs) models one native `job` scenario (`job-open-assert-screenshot`), one `qa` preset (`qa-open-diagnostics`), one `sourceLookup` (`source-lookup-visible-element`), one `networkSourceLookup` (`network-source-lookup-failed-request`), plus deterministic `electron` lifecycle/probe scenarios (`electron-lifecycle`, `electron-probe`) rather than repeated named job patterns that agents keep re-specifying
 - repo-local dogfood evidence does not show repeated project-specific job recipes that need versioning or ownership
 - `qa` already covers the only repeated smoke-test shape with a stable top-level preset
 - docs and prompt guidance can carry examples without adding recipe state, migration rules, or another schema
@@ -98,8 +98,8 @@ The published package should load from the `pi` manifest in `package.json`.
 Local checkout validation has two intentional modes:
-- **Quick isolated mode:** use explicit CLI loading such as `pi --no-extensions -e .` from the repository root. This bypasses Pi settings and extension discovery, avoids duplicate `agent_browser` registrations when another source is installed globally, and is the right mode for checkout smoke tests.
-- **Configured-source lifecycle mode:** configure exactly one active checkout or package source in Pi settings and launch plain `pi`. This is the right mode for validating `/reload` and exact-session relaunch because those lifecycle checks exercise discovered/configured resources. Focused extension harness tests validate branch-backed `session_tree` rehydration and cleanup ownership. Before shipping, maintainers also run `npm run verify -- lifecycle` (same semantics under automation, using Pi 0.78 `--session-id` to reopen the exact JSONL session) plus the live-site checks in [`RELEASE.md`](RELEASE.md#pre-release-checks); `npm publish` enforces `npm run verify -- release` via `prepublishOnly` unless scripts are skipped.
+- **Quick isolated mode:** use explicit CLI loading such as `pi --approve --no-extensions -e .` from the repository root when this checkout is intentionally trusted. This bypasses Pi settings and extension discovery, avoids duplicate `agent_browser` registrations when another source is installed globally, and is the right mode for checkout smoke tests; omit `--approve` only when deliberately testing Pi's Project Trust prompt.
+- **Configured-source lifecycle mode:** configure exactly one active checkout or package source in Pi settings and launch plain `pi` for manual validation, or run the automated harness that launches with `--approve`. This is the right mode for validating `/reload` and exact-session relaunch because those lifecycle checks exercise discovered/configured resources. Focused extension harness tests validate branch-backed `session_tree` rehydration and cleanup ownership. Before shipping, maintainers also run `npm run verify -- lifecycle` (same semantics under automation, using Pi 0.79 `--approve --session-id` to reopen the exact JSONL session) plus the live-site checks in [`RELEASE.md`](RELEASE.md#pre-release-checks); `npm publish` enforces `npm run verify -- release` via `prepublishOnly` unless scripts are skipped.
 The repo should not add a repo-local `.pi/extensions/` autoload shim as the documented checkout path.
@@ -109,7 +109,7 @@ Why:
 - keeps reload and exact-session relaunch validation tied to Pi's configured-source lifecycle instead of an isolated quick-test path, while `session_tree` state changes stay covered by focused extension harness tests
 - keeps the published tarball focused on the package manifest, extension code, canonical docs, and license
-The published package should exclude agent-only and superseded repo materials such as `AGENTS.md`, `docs/v1-tool-contract.md`, `docs/native-integration-design.md`, and other internal planning notes.
+The published package should exclude agent-only and superseded repo materials such as `AGENTS.md`, archived drafts under `docs/archive/`, and other internal planning notes.
 ## Session model
@@ -137,11 +137,11 @@ V1 ownership rule:
 - extension-managed sessions should be reusable during an active `pi` session and across `/reload`, exact-session relaunch, `/resume`, and Pi branch-tree transitions, while still being cleaned up predictably
 Practical policy:
-- preserve the current branch-visible extension-managed session across `/reload`, exact-session relaunch, `/resume`, and Pi 0.78 `session_tree` branch transitions so persisted sessions can keep following the live browser after lifecycle changes
+- preserve the current branch-visible extension-managed session across `/reload`, exact-session relaunch, `/resume`, and Pi 0.79 `session_tree` branch transitions so persisted sessions can keep following the live browser after lifecycle changes
 - close the active extension-managed session when the originating `pi` process quits, while leaving explicit caller-provided sessions alone
 - set an idle timeout on extension-managed sessions as a backstop for abnormal exits or cleanup failures
 - clean up process-private temp spill artifacts on shutdown, but keep persisted-session snapshot spill files in a private session-scoped artifact directory with a bounded per-session budget so `details.fullOutputPath` stays usable after reload/resume without unbounded growth
-- keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream close command (`close`, `quit`, or `exit`); before artifact-producing commands run, create missing parent directories for requested host paths, and for simple loopback HTML anchor downloads with resolvable HTTP(S) hrefs the wrapper may save directly to the requested path before upstream fallback. When the bounded `details.artifactManifest` has entries, successful close commands also surface `details.artifactCleanup` and an `Artifact lifecycle` note (including up to ten distinct `explicit-path` manifest paths when present) so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
+- keep explicit screenshots, downloads, PDFs, traces, HAR captures, and recordings written to caller-chosen paths on disk after a successful upstream close command (`close`, `quit`, or `exit`); before artifact-producing commands run, create missing parent directories for requested host paths, and for simple loopback HTML anchor downloads with resolvable HTTP(S) hrefs the wrapper may save directly to the requested path before upstream fallback. When the bounded `details.artifactManifest` has entries, successful close commands also surface `details.artifactCleanup` and a compact `Artifact lifecycle` note pointing to structured explicit paths so operators remove files with normal host tools—the native tool does not delete arbitrary user paths (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`); contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), checklist `RQ-0079` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - reconstruct the current branch-visible extension-managed session, page-scoped refs, artifact manifest, and Electron launch records from the active transcript branch on `session_start` and `session_tree` so later default calls keep following the active managed browser after resume/reload or branch switching; restore also honors successful explicit `--session <wrapper-owned> close` rows and `electron.cleanup` managed-session steps so closed wrapper-owned sessions are not resurrected
 - keep process-owned cleanup registries for extension-managed sessions and wrapper-launched Electron records separate from the current branch-visible view; `session_tree` restore and wrapper-owned browser commands are serialized with managed-session work, while independent caller-owned explicit-session commands keep their parallel tab-target behavior but use a branch-state generation guard so stale completions cannot overwrite newer branch-visible managed/artifact state after a branch switch; branch switches still must not drop resources the current Pi process owns and must keep fresh-session allocation monotonic
 - when a successful close targets the current extension-managed session, including an explicit `--session <current> close` or an `electron.cleanup` managed-session step, clear page/ref state, mark that session inactive, untrack cleanup ownership, and rotate the next default auto call to a fresh wrapper-generated session name rather than reusing the closed name
@@ -153,11 +153,11 @@ Practical policy:
 - once the wrapper observes tab-drift risk for a session (profile restore correction, overlapping stale opens, or restored session state), later active-tab commands may synthesize a tiny upstream `batch` that re-selects that tab and then runs the requested command in the same upstream invocation; routine same-session commands avoid `tab list` preflights to reduce probes that can perturb upstream click behavior
 - for sessions with observed tab-drift risk, after a successful command on a known tab target, the wrapper may best-effort restore that same target again if restored/background tabs steal focus after the command returns; routine same-session commands skip this post-command `tab list` probe
 - keep a per-session `refSnapshot` aligned with the last successful `snapshot` (including refs merged from a successful `batch` by taking the last successful `snapshot` step in batch result order): restore it from persisted tool `details` when reloading, resuming, or moving to a different Pi session-tree branch, store bounded ref role/name metadata from the same snapshot for wrapper-side current-ref diagnostics, drop it on successful close commands (`close`, `quit`, or `exit`), and refuse mutation-prone `@e…` argv before spawn when the active tab URL no longer matches the snapshot URL, when a ref id was never in that snapshot, or when `batch` stdin would reuse `@e…` on a guarded step after an earlier invalidating step without a later `snapshot` step in the same stdin array. Same-snapshot `fill @e…` rows are guarded but do not themselves set that invalidation latch, so ordinary form fills can precede a click/submit row in one batch—see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) for the agent-visible contract and failure text; typed per-session tab/ref/pinning state lives in `extensions/agent-browser/lib/session-page-state.ts` and is updated from `extensions/agent-browser/index.ts` after each tool result
-- for top-level non-Electron `click` commands, install a bounded in-page event probe before upstream runs; if upstream reports success but no trusted pointer/mouse/click event reached the target, fail the tool and report `details.clickDispatch` with explicit retry/inspect next actions (the wrapper does not replay clicks in-page). The probe is intentionally skipped for `batch`/`job`/`qa` click steps. For `@e…` targets it uses the stored `refSnapshot.refs` metadata above instead of taking a fresh pre-click snapshot that could recycle upstream refs
+- for top-level non-Electron direct `click` commands, install a bounded in-page target-specific event probe before upstream runs; if upstream reports success but no trusted pointer/mouse/click event reached the resolved target, fail the tool and report `details.clickDispatch` with explicit retry/inspect next actions (the wrapper does not replay clicks in-page). The probe is intentionally skipped for unresolved `find … click` locators and for `batch`/`job`/`qa` click steps. For `@e…` targets it only covers current refs whose latest stored `refSnapshot.refs` role is `button`, `checkbox`, `menuitem`, `radio`, `switch`, or `tab`; it uses that role/name metadata, including snapshot-order `duplicateIndex` for duplicate-name refs, instead of taking a fresh pre-click snapshot that could recycle upstream refs
 - derive narrow prompt guards only for concrete evidence invariants: exact required screenshot paths block browser close until the artifact manifest verifies those paths. The wrapper intentionally does not infer broad business/user intent from prompt text such as order/payment/post boundaries; agents must follow those instructions themselves. The artifact guard is bounded preflight policy (`details.promptGuard`, `failureCategory: "policy-blocked"`), not a reusable browser recipe layer
 - after successful `get text` on a non-ref CSS selector, optionally issue one read-only `eval --stdin` probe per qualifying selector when multiple DOM matches or a hidden first match with visible peers could misread tabbed or off-screen content; merge `details.selectorTextVisibility` / `selectorTextVisibilityAll`, visible warning lines, and `inspect-visible-text-candidates*` next actions as documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and `RQ-0074` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
 - for local Unix launches, set a short private socket directory so extension-generated session names do not fail on the upstream Unix socket-path length limit
-- keep wrapper-spawned upstream CLI calls inside the upstream IPC budget by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and stopping a stuck child process before the upstream 30-second read-timeout retry loop begins; dialog commands, likely dialog-trigger clicks/taps/finds, and `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers use shorter wrapper subprocess budgets so blocking JavaScript prompts surface recovery actions before the full default watchdog
+- keep wrapper-spawned upstream CLI calls bounded by clamping `AGENT_BROWSER_DEFAULT_TIMEOUT` to the upstream documented 25-second default while deriving a longer subprocess watchdog for explicit long `wait <ms>` / `wait --timeout <ms>` calls; dialog commands, likely dialog-trigger clicks/taps/finds, and `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers use shorter wrapper subprocess budgets so blocking JavaScript prompts surface recovery actions before the full default watchdog
 This is primarily about ownership clarity and avoiding surprise, not adding a heavy safety wrapper. If the extension invented the session, the extension should own its lifecycle without breaking reload, resume, or branch-tree semantics. If the caller explicitly chose the upstream session model, the extension should stay out of the way.
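
The 0.2.48 timeout wording in the hunk above (clamp the default to 25 seconds, derive a longer watchdog for explicit long waits) could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the 5-second grace margin, the reading of "clamp" as an upper bound, and the simplification that the command word is the first argv token. The wrapper's real command discovery is more involved.

```typescript
// 25 s is the upstream documented default named in the text.
const UPSTREAM_DEFAULT_TIMEOUT_MS = 25_000;
// Hypothetical margin so the watchdog fires after upstream's own timeout.
const WATCHDOG_GRACE_MS = 5_000;

// Cap a requested default timeout at the upstream default.
function clampDefaultTimeout(requestedMs: number | undefined): number {
  if (requestedMs === undefined || !Number.isFinite(requestedMs) || requestedMs <= 0) {
    return UPSTREAM_DEFAULT_TIMEOUT_MS;
  }
  return Math.min(requestedMs, UPSTREAM_DEFAULT_TIMEOUT_MS);
}

// Extract an explicit duration from `wait <ms>` or `wait --timeout <ms>`.
// Simplification: assumes args[0] is the command word.
function explicitWaitMs(args: string[]): number | undefined {
  if (args[0] !== "wait") return undefined;
  const flagIndex = args.indexOf("--timeout");
  const raw = flagIndex >= 0 ? args[flagIndex + 1] : args[1];
  if (raw === undefined || !/^\d+$/.test(raw)) return undefined;
  return Number(raw);
}

// The subprocess watchdog outlasts whichever is longer: the default budget
// or the explicitly requested wait.
function watchdogMs(args: string[]): number {
  const waitMs = explicitWaitMs(args) ?? 0;
  return Math.max(UPSTREAM_DEFAULT_TIMEOUT_MS, waitMs) + WATCHDOG_GRACE_MS;
}
```

Under these assumptions a plain `click` keeps the default budget, while `wait 60000` gets a watchdog long enough that the wrapper does not kill a wait the caller asked for.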
@@ -177,7 +177,7 @@ That failure should include a structured recovery hint pointing to `sessionMode:
 Implementation detail lives in `extensions/agent-browser/lib/launch-scoped-flags.ts` (canonical flag metadata shared with playbook/docs assertions), `extensions/agent-browser/lib/argv-descriptor.ts` and `extensions/agent-browser/lib/argv-grammar.ts` (command discovery, `VALUE_FLAGS`, `parseArgvDescriptor`) plus `extensions/agent-browser/lib/runtime.ts` (`getStartupScopedFlags`, `buildExecutionPlan`):
 - **Command discovery:** Leading argv is scanned with a value-taking allowlist so known global flags and documented command flags consume their values before the upstream command word is identified. Missing-value prevalidation is intentionally limited to upstream global value flags; command-scoped flags and literal text are left to upstream parsing so values like `fill #field --password` are not rejected by wrapper heuristics before the CLI sees them. When upstream adds new global flags that take values ahead of the command, extend both the command-discovery and prevalidation allowlists; when it adds command-specific flags, extend only command discovery/redaction as needed. A smaller set of global boolean flags may be followed by an optional `true`/`false` literal; when present, that literal is consumed as the flag value before command discovery continues.
-- **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.1`, the parser does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
+- **`--state` disambiguation:** Persisted browser `--state` before the command participates in launch-scoped validation and tab-correction hints. The same flag spelling after a `wait` command is excluded from startup-scoped detection so upstream help examples such as `wait @ref --state hidden` do not spuriously require `sessionMode: "fresh"` while an implicit session is active. As of upstream `agent-browser 0.27.2`, the parser still does not implement those `wait --state` examples as distinct wait modes, so agent-facing docs recommend `wait --fn` predicates for disappearance checks instead.
 - **`--auto-connect`:** Treated as launch-scoped only when enabled (`--auto-connect` bare or `true`). `--auto-connect false` is ignored for startup-scoped blocking so disabled attach hints do not force a fresh launch.
 **Sessionless inspection and local commands:** Plain-text global help and version probes (`--help`, `-h`, `--version`, `-V`) must never allocate or bind the extension-managed session. The same session-ownership rule applies to read-only upstream `skills list`, `skills get …`, and `skills path …`, local auth profile management (`auth save/list/show/delete/remove`), plus local/setup surfaces such as `profiles`, `dashboard start/stop`, `device list`, `doctor`, `install`, `upgrade`, `session list`, and targeted/all local saved-state maintenance (`state list/show`, `state clear --all`, `state clear -a`, `state clear <session-name>`, `state clean --older-than <days>`, `state rename`). Non-plain-text sessionless commands still run with `--json` for machine-readable output, but the planner does not prepend the implicit managed `--session`, so an agent can inspect local capabilities or start/stop the standalone dashboard without consuming the implicit session slot before a real `open`. Browser-backed, context-dependent, or incomplete commands such as root `session`, untargeted `state clear`, bare `state clean`, `auth login`, `state save`, and `state load` keep normal managed-session injection. Command-shape allowlisting lives in `extensions/agent-browser/lib/command-policy.ts` (`needsManagedSession`), while `extensions/agent-browser/lib/runtime.ts` (`isPlainTextInspectionArgs`, `buildExecutionPlan`) applies that decision to execution planning.
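
The `--auto-connect` rule from the list above (launch-scoped only when bare or `true`, ignored when `false`) reduces to a small predicate. The helper name is hypothetical and the sketch skips the wrapper's real argv grammar, such as distinguishing flags that appear before the command word from those after it.

```typescript
// Returns true when --auto-connect is present and enabled.
// Bare `--auto-connect` and `--auto-connect true` count as enabled;
// `--auto-connect false` does not, so it should not force a fresh launch.
function isAutoConnectEnabled(argv: string[]): boolean {
  const index = argv.indexOf("--auto-connect");
  if (index < 0) return false;
  return argv[index + 1] !== "false";
}
```

For example, `["--auto-connect", "open", "https://example.com/"]` counts as enabled because the next token is not the literal `false`.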
@@ -206,7 +206,7 @@ This keeps the product centered on native tool usage instead of auxiliary skill
 - inline screenshots/images for the plain `screenshot` command; other image-like saves (for example `diff screenshot`) still appear in `details.artifacts` and summaries but are not auto-inlined as Pi image attachments (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details))
 - lightweight session convenience
 - docs, including a repo-readable command reference that mirrors the blocked direct-binary help path closely enough for normal agent work
-- a deterministic **agent efficiency benchmark** (`scripts/agent-browser-efficiency-benchmark.mjs`) used to quantify representative agent-facing workflows without invoking upstream; maintainer commands and constraints are in [`AGENTS.md`](../AGENTS.md) under “Agent browser efficiency benchmark”
+- a deterministic **agent efficiency benchmark** (`scripts/agent-browser-efficiency-benchmark.mjs`) used to quantify representative agent-facing workflows without invoking upstream; maintainer commands and constraints are in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) under “Agent browser efficiency benchmark”
 ### Upstream `agent-browser` owns
@@ -226,7 +226,7 @@ The extension does not ship `agent-browser`, but it does ship maintainer-owned d
 3. **Live help verification** is `scripts/verify-command-reference.mjs`, invoked via `npm run verify -- command-reference` (and included in the default `npm run verify` gate). It runs the baseline’s help commands against `agent-browser` on `PATH` and fails when the installed upstream surface does not match the declared target version or expected tokens.
-This mirrors the playbook contract pattern described in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md): canonical TypeScript source and Markdown fragments stay paired through `npm run docs` / `npm run verify`, with deeper step-by-step notes in [`AGENTS.md`](../AGENTS.md), release checklist items in [`RELEASE.md`](RELEASE.md), and the baseline inventory-to-gates matrix in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
+This mirrors the playbook contract pattern described in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md): canonical TypeScript source and Markdown fragments stay paired through `npm run docs` / `npm run verify`, with deeper step-by-step notes in [`AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md), release checklist items in [`RELEASE.md`](RELEASE.md), and the baseline inventory-to-gates matrix in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md).
 ## Not the right design

package/docs/COMMAND_REFERENCE.md CHANGED

@@ -18,13 +18,24 @@ This project intentionally blocks normal `agent-browser` bash usage in most agen
 <!-- agent-browser-capability-baseline:start upstream-baseline -->
 <!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
-This reference is baselined to the locally installed `agent-browser 0.27.1` command/help surface, audited against vercel-labs/agent-browser@90050f2913159875e2c3719e424746396ccb3cbf. Upstream `agent-browser` remains the source of truth for command semantics; this file is the local fallback for Pi agent sessions where direct binary help is blocked or discouraged.
+This reference is baselined to the locally installed `agent-browser 0.27.2` command/help surface, audited against vercel-labs/agent-browser@5185339ca3fdab9848e11b8ec676eecfdec3733f. Upstream `agent-browser` remains the source of truth for command semantics; this file is the local fallback for Pi agent sessions where direct binary help is blocked or discouraged.
 The lightweight drift check is `npm run verify -- command-reference`. Run it whenever the installed upstream `agent-browser` version changes or this reference is edited.
 Use `npm run benchmark:agent-browser` or `npm run verify -- benchmark` before and after agent-facing workflow abstractions to measure task success, tool calls, model-visible output size, stale-ref behavior, artifact success, failure-category coverage, and elapsed-time estimates.
 <!-- agent-browser-capability-baseline:end upstream-baseline -->
+### Upstream 0.27.2 changelog support
+The 0.27.2 rebaseline is a passthrough-first compatibility update, not a compatibility shim for older upstream releases. The wrapper must not hide these upstream fixes:
+- click reliability: upstream now scrolls off-viewport elements before coordinate resolution, handles JavaScript dialogs promptly, recovers mouse state after dialog-opening clicks, and reports overlay interception before dispatching input
+- frame-scoped CSS selectors and waits, including cross-process iframe click-coordinate translation
+- wait timeout handling: documented 25s default, honored `--timeout` across wait variants, and appropriate client read budgets for long waits; the native wrapper forwards explicit long waits and derives a subprocess watchdog when top-level `timeoutMs` is omitted
+- form commands: `find label` matches `aria-label` / `aria-labelledby`, `select` errors when no option matches, and `type` parses `--clear` / `--delay` instead of typing them as literal text
+- warm CLI command latency and batch daemon respawn/retry improvements
+- GNU Linux release artifacts pinned to glibc 2.28
 ## Core mental model
 Input mode chooser (one per call): **`args`** for the default open → snapshot -i → click/fill `@refs` flow; **`semanticAction`** for stable role/text/label targets; **`job`** / **`qa`** for multi-step checks; **`electron`** for desktop apps only; **`sourceLookup`** / **`networkSourceLookup`** are **experimental candidates-only** helpers (not authoritative mappings). Do not pass `--json` in `args`—the wrapper injects it. Match link and button text to the latest snapshot (on `https://example.com/` the main link is `Learn more`, not legacy `More information...` copy). See [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#input-mode-chooser) for snapshot variants (`-i` vs `--compact` vs full) and batching three or more getters.
@@ -64,7 +75,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 - `args`: exact `agent-browser` CLI tokens after the binary name. Omit when using `semanticAction`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` instead (mutually exclusive).
 - `semanticAction`: optional shorthand for common `find` flows, direct selector/ref click/check/fill, and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, `networkSourceLookup`, or `electron` on the same call.
-- `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin, defaults to `batch --bail` (`failFast: true`), and reports the compiled plan in `details.compiledJob`.
+- `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin, defaults to `batch --bail` (`failFast: true`), and reports the compiled plan in `details.compiledJob`. Keep stateful jobs short around navigation, click, and rerender boundaries on dynamic apps.
 - `qa`: optional lightweight QA preset; compiles to the same fail-fast batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
 - `sourceLookup`: **EXPERIMENTAL — candidates only** for local UI-to-source hints; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
 - `networkSourceLookup`: **EXPERIMENTAL — candidates only** for failed request-to-source hints; compiles to generated `batch`, reports `details.compiledNetworkSourceLookup` and `details.networkSourceLookup`, and never assigns blame or edits files.
@@ -107,7 +118,7 @@ Treat headed success as browser-context success, not proof that a window is visi
 For local fixtures, remember that `localhost` and `127.0.0.1` are resolved from the browser host, which may differ from the shell that started a temporary HTTP server. `net::ERR_EMPTY_RESPONSE` on `http://localhost:<port>` usually means the browser could not reach that server, not that the page itself rendered blank; the wrapper appends a local fixture hint for common loopback navigation failures. Prefer a host-reachable address when your environment provides one; otherwise use `file://` only for static fixtures and note its limits. `file://` does not provide HTTP headers and may change MIME/CORS/storage/debugger behavior. If `eval --stdin` on a `file://` page returns `null` for even simple DOM expressions, first make sure the JavaScript is in the native tool `stdin` field rather than trailing after `--stdin` in `args`; then treat the result as inconclusive and verify with `snapshot -i`, `get text` on current refs, or screenshots until the fixture can run over reachable HTTP.
-Temporary HTTP servers and their port/process lifecycle stay outside the native tool. Extension maintainers running real-upstream contract tests can reuse `startAgentBrowserContractFixtureServer()` in [`test/helpers/agent-browser-harness.ts`](../test/helpers/agent-browser-harness.ts) instead of ad-hoc `python3 -m http.server` processes.
+Temporary HTTP servers and their port/process lifecycle stay outside the native tool. Extension maintainers running real-upstream contract tests can reuse `startAgentBrowserContractFixtureServer()` in [`test/helpers/agent-browser-harness.ts`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/test/helpers/agent-browser-harness.ts) instead of ad-hoc `python3 -m http.server` processes.
 ### React, SPA, and Web Vitals flows
@@ -129,7 +140,7 @@ Use `vitals [url]` for Core Web Vitals plus React hydration timing when availabl
 { "args": ["pushstate", "/dashboard?tab=settings"] }
 ```
-For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.1 surfaces are `network route <url> [--abort|--body <json>] [--resource-type <csv>]` and `cookies set --curl <file>`:
+For first-navigation setup, start on `about:blank`, then stage routes, cookies, or init scripts before navigating. The relevant v0.27.2 surfaces are `network route <url> [--abort|--body <json>] [--resource-type <csv>]` and `cookies set --curl <file>`:
 ```json
 { "args": ["open"], "sessionMode": "fresh" }
@@ -172,7 +183,7 @@ Do not assume Playwright selector dialects such as `text=Close` or `button:has-t
 Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as non-form `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. If a session `snapshot -i` fails with `No active page`, the wrapper invalidates prior refs for that session; later mutation-prone `@e…` calls fail before upstream until a successful fresh `snapshot -i` records refs again. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills and native form-control steps are allowed before a click or submit step, so `fill`, `check`/`uncheck` checkbox or radio refs, checkbox/radio `click`/`tap` refs, `select` combobox refs, then a final submit `click` can run from one snapshot. Split dynamic or autosubmit forms with a fresh snapshot if a control interaction rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`).
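The in-order batch walk described above can be sketched in TypeScript. This is a simplified illustration, not the wrapper's implementation. The `LATCHING_TOKENS` set is hypothetical (the text does not enumerate which first tokens count as navigating or mutating), every step that mentions an `@e…` ref is treated as guarded, and the role-based allowance for checkbox/radio clicks is left out.

```typescript
// Hypothetical token set: the documentation does not list the real one.
const LATCHING_TOKENS: ReadonlySet<string> = new Set(["open", "click", "back", "forward", "reload"]);
const REF_PATTERN = /^@e\d+$/;

type BatchStep = string[];

interface StaleRefFinding {
  stepIndex: number;
  ref: string;
}

// Walk batch rows in order and report the first step that still mentions an
// @e ref after a navigate/mutate step with no snapshot in between.
function findStaleRefStep(steps: BatchStep[]): StaleRefFinding | undefined {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const [command, ...rest] = steps[i];
    if (command === "snapshot") {
      latched = false; // a fresh snapshot clears the latch for following rows
      continue;
    }
    const ref = rest.find((token) => REF_PATTERN.test(token));
    if (latched && ref !== undefined) {
      return { stepIndex: i, ref };
    }
    if (LATCHING_TOKENS.has(command)) {
      latched = true; // later ref steps need a fresh snapshot first
    }
  }
  return undefined;
}
```

Under these assumptions, `fill` rows followed by one final `click` pass from a single snapshot, while two consecutive ref clicks flag the second one unless a `snapshot` row sits between them.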
-A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron clicks, the wrapper installs a bounded DOM-event probe; when upstream reports success but no trusted event reaches the target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). If the probe evidence shows the target is outside a nested scroll container or viewport, `details.clickDispatch.scrollContainer` and `scroll-target-into-view-after-dispatch-miss` point to `scrollintoview <target>` before retry. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Respect explicit user stop boundaries yourself: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper does not infer broad business intent from prompt text; `details.promptGuard` is reserved for concrete artifact-before-close checks. `press`, `key`, `keydown`, and `keyup` accept exactly one key token; focus or click the target first, then run `press Enter` or another single-key command.
+A successful `click` result means upstream reported a target, not that the app definitely handled the event. For top-level non-Electron direct clicks on selectors, `xpath=` targets, and eligible current `@e…` refs, the wrapper installs a bounded target-specific DOM-event probe when it can; when upstream reports success but no trusted event reaches the resolved target, it fails the tool and exposes `details.clickDispatch` plus a `Click dispatch diagnostic` line with explicit retry/inspect next actions (no in-page click replay). Raw `find … click` locator calls are not probed because the wrapper has no concrete element before upstream resolves the locator, and document-level probes can falsely fail frame-scoped clicks. Direct `@e…` click probes are role-gated to current snapshot refs whose accessible role is `button`, `checkbox`, `menuitem`, `radio`, `switch`, or `tab`; duplicate names use snapshot order. If the probe evidence shows the target is outside a nested scroll container or viewport, `details.clickDispatch.scrollContainer` and `scroll-target-into-view-after-dispatch-miss` point to `scrollintoview <target>` before retry. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. For static local fixtures or debugging where the user explicitly accepts scripted activation, `eval --stdin` can call `document.querySelector(...).click()` to exercise inline handlers and app code; treat that as an untrusted programmatic event, not as evidence that CDP/user-like clicking works. Respect explicit user stop boundaries yourself: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action or use scripted activation to bypass the stop. The wrapper does not infer broad business intent from prompt text; `details.promptGuard` is reserved for concrete artifact-before-close checks. `press`, `key`, `keydown`, and `keyup` accept exactly one key token; focus or click the target first, then run `press Enter` or another single-key command.
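The role gate for direct `@e…` click probes can be pictured as a simple set-membership check. The sketch below assumes a hypothetical `SnapshotRef` shape for one entry of the latest snapshot ref map; only the six role names come from the text above.

```typescript
// Roles named in the documentation as eligible for direct @e click probes.
const PROBE_ELIGIBLE_ROLES: ReadonlySet<string> = new Set([
  "button",
  "checkbox",
  "menuitem",
  "radio",
  "switch",
  "tab",
]);

// Hypothetical shape for a ref recorded from the latest snapshot.
interface SnapshotRef {
  role: string;
  name?: string;
}

// A ref is probe-eligible only when it exists in the current snapshot
// and its accessible role is one of the listed roles.
function isClickProbeEligible(ref: SnapshotRef | undefined): boolean {
  return ref !== undefined && PROBE_ELIGIBLE_ROLES.has(ref.role);
}
```

A `link` or `textbox` ref, or a ref missing from the current snapshot, would fall outside the gate in this sketch and the click would proceed without a probe.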
 Successful `snapshot -i` results can also surface `Possible overlay blockers` when their own refs already show dialog/alertdialog context plus close/dismiss controls, so agents can detect likely obstruction before clicking. When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, no `details.clickDispatch` diagnostic fired for the same result, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with one read-only `eval` when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
@@ -207,7 +218,7 @@ Use `batch --bail` when later steps should stop after the first failed command.
 For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `type`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, `snapshot`, and `screenshot`. `open` can include `loadState: "domcontentloaded" | "load" | "networkidle"` to insert a `wait --load …` row immediately after navigation before the next click/read step. `click` and `fill` accept either a stable `selector` or the same semantic locator fields as top-level `semanticAction` (`locator`, plus `role`/`name` or `value` as appropriate) and compile locator steps to upstream `find` argv. `type` focuses an optional selector, sends text through upstream keyboard typing, can insert `wait` rows via `delayMs` for human-paced input, and can append a final `press` key such as `Enter`; delayed typing is capped at 200 characters per step, and generated per-character rows are compacted in model-visible batch text while remaining available in `details.batchSteps`. `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. By default the wrapper compiles steps to upstream `batch --bail` so a failed setup/fill/assertion step stops later mutating clicks; set `failFast: false` only when you explicitly need continue-after-error diagnostics. The wrapper records `details.compiledJob.steps[]` plus `details.compiledJob.failFast`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
-**Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the URL pattern or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps.
+**Job navigation is explicit.** A `click` step (or other navigation-prone interaction) does not prove the next page loaded. The wrapper does not auto-insert `assertUrl` or `assertText` after clicks inside `job`; add those steps yourself with the exact URL, a `*` / `**` glob-style URL pattern, or on-page text you expect, especially after forms, checkout, tabs, or submit buttons, before screenshots or later steps. Exact `assertUrl` values without `*` compile to `wait --url` unchanged, including query strings and literal `?`. Glob-style values compile to a `wait --fn` predicate: single `*` matches within one path segment only, while `**` or longer star runs match across `/`; regex metacharacters such as `.`, `?`, `+`, `[`, `]`, and `$` stay literal. Literal `*` exact URLs are not supported by `assertUrl`; use raw `wait --url` only after verifying upstream behavior. Do not put a whole dynamic checkout into one long job: split around login, sorting/cart mutations, checkout navigation, and final evidence capture so refs and app state can be rechecked between phases. Glob-style `assertUrl` values compile this way so `**/shipping` works even when upstream `wait --url` pattern matching is narrower than its help text implies.
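The glob rules stated above (single `*` stays inside one path segment, `**` or longer star runs cross `/`, regex metacharacters stay literal) can be illustrated with a small matcher. This is a sketch of the matching semantics only, not the `wait --fn` predicate the wrapper actually generates; the function name `globUrlToRegExp` is hypothetical, and it assumes `/` is the only segment boundary.

```typescript
// Convert a glob-style URL pattern into an anchored RegExp.
function globUrlToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    if (pattern.charAt(i) === "*") {
      // Count the whole run of stars before deciding how far it may match.
      let run = 0;
      while (pattern.charAt(i) === "*") {
        run++;
        i++;
      }
      source += run === 1 ? "[^/]*" : ".*";
    } else {
      // Escape regex metacharacters so `.`, `?`, `+`, `[`, `]`, `$` stay literal.
      source += pattern.charAt(i).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
      i++;
    }
  }
  return new RegExp(`^${source}$`);
}
```

With this sketch, `**/shipping` matches `https://shop.example/checkout/shipping`, while `https://shop.example/*/shipping` does not match `https://shop.example/a/b/shipping` because the single `*` cannot cross a `/`.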
 ```json
 {
@@ -327,13 +338,13 @@ Top-level `networkSourceLookup` does the same for failed browser requests. When
 ```json
 { "args": ["wait", "--load", "networkidle"] }
-{ "args": ["wait", "--url", "**/dashboard"] }
+{ "args": ["wait", "--url", "https://app.example/dashboard"] }
 { "args": ["wait", "--download", "/tmp/report.pdf"] }
 ```
 Do not omit the load state value; use `wait --load <state>` with `load`, `domcontentloaded`, or `networkidle`.
-For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Fixed waits are a last resort: `wait 30000` is intentionally blocked by the wrapper IPC budget, and a successful fixed-wait payload such as `"waited":"timeout"` means elapsed time only, not proof that the desktop host finished. Verify with an observed condition, fresh snapshot, or screenshot before continuing.
+For desktop-host readiness, prefer condition waits over fixed sleeps. Use this ladder: `wait --text` / exact `wait --url` / `wait --fn` / `wait --load <state>` / `wait --download` when a real condition exists; after raw `connect`, run `tab list` → `tab t<N>` → condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` for launch health or target mismatch; use `qa.attached` when expected text or selector plus diagnostics can express the check. Upstream help labels `wait --url` as a pattern matcher, but dogfood found glob forms such as `**/learn` can time out on the current baseline; use an exact URL there, or use `job.assertUrl` for `*` / `**` glob-style matching. Fixed waits are a last resort: use explicit `--timeout` or top-level `timeoutMs` for legitimately slow waits, and treat a successful fixed-wait payload such as `"waited":"timeout"` as elapsed time only, not proof that the desktop host finished. Verify with an observed condition, fresh snapshot, or screenshot before continuing.
 Use `wait --download [path]` after an earlier action has already started a browser download, such as a dashboard export button that responds asynchronously:
@@ -348,7 +359,7 @@ For one-call flows, put the click and wait in `batch`; the wait step keeps the s
 { "args": ["batch"], "stdin": "[[\"click\",\"@export\"],[\"wait\",\"--download\",\"/tmp/report.csv\"]]" }
 ```
-A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.1`, `wait --download <path>` may report the requested path before this environment can verify that the file was persisted there. Treat `details.savedFilePath` as upstream-reported metadata unless `details.artifacts[].exists` is true. Upstream tracking: [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300).
+A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.27.2`, `wait --download <path>` may report the requested path before this environment can verify that the file was persisted there. Treat `details.savedFilePath` as upstream-reported metadata unless `details.artifacts[].exists` is true. Upstream tracking: [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300).
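A caller that wants to act on this distinction could classify a result before trusting the path. The sketch below models only the fields named above; the `DownloadDetails` and `ArtifactEntry` interfaces are hypothetical subsets of the real payload, and the optional `path` field is an assumed name.

```typescript
// Hypothetical subset of the wrapper's `details` payload.
interface ArtifactEntry {
  path?: string; // assumed field name, not confirmed by the documentation
  exists?: boolean;
}

interface DownloadDetails {
  savedFilePath?: string;
  artifacts?: ArtifactEntry[];
}

type DownloadStatus = "verified" | "reported-only" | "none";

// Treat savedFilePath as upstream-reported metadata unless some artifact
// entry carries the wrapper's on-disk verification signal.
function classifyDownload(details: DownloadDetails): DownloadStatus {
  if (details.savedFilePath === undefined) {
    return "none";
  }
  const verified = (details.artifacts ?? []).some((artifact) => artifact.exists === true);
  return verified ? "verified" : "reported-only";
}
```

A `reported-only` result is the case the paragraph warns about: the path was echoed by upstream, but nothing confirms the file is on disk yet.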
 ### Download, screenshot, and PDF files
@@ -388,7 +399,7 @@ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent
 This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
-Browser close commands (`close`, `quit`, or `exit`) are also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful close command appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths that still exist on disk appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; deleted/stale paths are skipped. Otherwise that array is empty and visible text may omit the “Explicit artifact paths” line even though the lifecycle block still reminds you that close commands do not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
+Browser close commands (`close`, `quit`, or `exit`) are also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful close command appends a compact `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths that still exist on disk appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; deleted/stale paths are skipped. Otherwise that array is empty and the visible text stays compact while the structured detail still reminds you that close commands do not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
 Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.
@@ -613,22 +624,22 @@ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `doc
 When a snapshot is too large for inline output, the Pi wrapper renders a compact view before spilling the full raw snapshot to `details.fullOutputPath`. Compact snapshots are main-content-first, but dense pages and desktop host screens can still hide actionable controls in omitted content; scan `Omitted high-value controls` before opening the spill file. That bounded section favors editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and high-signal named links such as repository search results, then includes other useful controls such as checkboxes, radios, options, and menuitems that were not already listed under key refs or other refs. When that section appears, `details.data.highValueControlRefIds` repeats the same visible ref ids for programmatic follow-up alongside fields such as `previewMode`, `previewSections`, and counts on `details.data` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details)).
-For dense pages, the wrapper also accepts `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` as wrapper-side filters. It runs upstream `snapshot` without those wrapper-only flags, records the full returned ref map in `details.refSnapshot` for stale-ref safety, and renders only matching refs/lines in the model-visible snapshot with `details.snapshotFilter` counts. Add wrapper-side `--viewport` when scroll position, viewport size, document size, and sampled scroll-container offsets matter; it runs one read-only `eval --stdin` probe and reports `details.snapshotViewport`. Add wrapper-side `--diff` to compare the current ref map with the previous wrapper-tracked snapshot for that session and report `details.snapshotDiff` added/removed/changed refs. Use these flags when you need controls like checkout buttons, all comboboxes, above/below-fold context, or a quick before/after ref delta without reading a full spill file.
+For dense pages, the wrapper also accepts `snapshot -i --search <text>` and `snapshot -i --filter role=<role>` as wrapper-side filters. It runs upstream `snapshot` without those wrapper-only flags, records the full returned ref map in `details.refSnapshot` for stale-ref safety, and renders matching direct refs plus surrounding snapshot context in the model-visible snapshot with `details.snapshotFilter` counts. The visible summary distinguishes direct ref matches from surrounding lines so contextual/nested output does not look like a ref-count mismatch. Add wrapper-side `--viewport` when scroll position, viewport size, document size, and sampled scroll-container offsets matter; it runs one read-only `eval --stdin` probe and reports `details.snapshotViewport`. Add wrapper-side `--diff` to compare the current ref map with the previous wrapper-tracked snapshot for that session and report `details.snapshotDiff` added/removed/changed refs. Use these flags when you need controls like checkout buttons, all comboboxes, above/below-fold context, or a quick before/after ref delta without reading a full spill file.
 ### Wait
 | Mode | Purpose |
 | --- | --- |
 | `wait <selector>` | Wait for an element to appear. |
-| `wait <ms>` | Wait for a fixed number of milliseconds. In the native Pi wrapper, keep each fixed wait at `25000` ms or less and split longer waits into multiple tool calls. |
+| `wait <ms>` | Wait for a fixed number of milliseconds. The native Pi wrapper now forwards long waits and derives a subprocess watchdog from the explicit wait duration when the caller does not provide top-level `timeoutMs`. |
 | `wait --url <pattern>` | Wait for the URL to match a pattern. |
 | `wait --load <state>` | Wait for load state: `load`, `domcontentloaded`, or `networkidle`. |
 | `wait --fn <expression>` | Wait for a JavaScript expression to become truthy. |
 | `wait --text <text>` | Wait for text to appear on the page; failures may include `inspect-after-text-assertion-failure` with a session-scoped `snapshot -i` payload. |
 | `wait --download [path]` | Wait for a download started by a previous action and optionally save it to `path`; successful wrapper results include upstream-reported `savedFilePath`/`savedFile`, while `details.artifacts[].exists` is the wrapper's on-disk verification signal. |
-| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. In the native Pi wrapper, use `25000` ms or less per call to stay under the upstream CLI IPC budget. |
+| `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. The native Pi wrapper forwards explicit wait timeouts and extends the subprocess watchdog unless the caller supplies top-level `timeoutMs`. |
-Current v0.27.1 source does not parse `wait <selector> --state hidden` / `wait <selector> --state detached` as distinct wait modes even though upstream help mentions those examples. Use `wait --fn "!document.querySelector('#spinner')"` or another explicit JavaScript predicate for disappearance/detach checks until upstream parser support exists.
+Current v0.27.2 source still does not parse `wait <selector> --state hidden` / `wait <selector> --state detached` as distinct wait modes even though upstream help mentions those examples. Use `wait --fn "!document.querySelector('#spinner')"` or another explicit JavaScript predicate for disappearance/detach checks until upstream parser support exists.
 ### Diff, debug, and streaming
@@ -701,7 +712,7 @@ When these commands are invoked through the native `agent_browser` tool, structu
 - project-local: `.pi/config/pi-agent-browser-native/config.json`
 - explicit override: `PI_AGENT_BROWSER_CONFIG=/path/to/config.json`
-Get an Exa API key from the [Exa dashboard](https://dashboard.exa.ai/api-keys) or a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). If both keys are available, `agent_browser_web_search` prefers Exa by default because its `/search` endpoint returns token-efficient highlights and agent-oriented search modes; set `webSearch.preferredProvider` to `"brave"` when Brave Search is preferred. You can also disable this package's search tool with `webSearch.enabled: false` when another search tool should win. Config merges global → project → `PI_AGENT_BROWSER_CONFIG` override, so `enabled` is read from the final merged config: a global disable can be re-enabled by project or override config, while an override file with `enabled: false` is the highest-priority hard disable for that run.
+Get an Exa API key from the [Exa dashboard](https://dashboard.exa.ai/api-keys) or a Brave Search API key from the [Brave Search API dashboard](https://api-dashboard.search.brave.com/). If both keys are available, `agent_browser_web_search` prefers Exa by default because its `/search` endpoint returns token-efficient highlights and agent-oriented search modes; set `webSearch.preferredProvider` to `"brave"` when Brave Search is preferred. You can also disable this package's search tool with `webSearch.enabled: false` when another search tool should win. Config merges global → project → `PI_AGENT_BROWSER_CONFIG` override, so `enabled` is read from the final loaded config: a global disable can be re-enabled by project or override config, while an override file with `enabled: false` is the highest-priority hard disable for that run. Under Pi 0.79+, globally installed or CLI-loaded extensions are developer-trusted code, so this extension reads project-local config under `.pi/config/...` by default and skips that project layer when Pi reports the project is untrusted or when launched with `--no-approve`.
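The layer order for `webSearch.enabled` can be sketched as a last-writer-wins walk over global, project, and override config. The `ConfigLayer` type, the `resolveWebSearchEnabled` name, and the `projectTrusted` flag are hypothetical; the real loader also validates layers and handles credentials, which this illustration ignores.

```typescript
// Hypothetical minimal view of one config layer.
interface ConfigLayer {
  webSearch?: { enabled?: boolean };
}

interface ConfigLayers {
  global?: ConfigLayer;
  project?: ConfigLayer;
  override?: ConfigLayer;
}

// Later layers win; the project layer is skipped when the project is untrusted.
function resolveWebSearchEnabled(layers: ConfigLayers, projectTrusted = true): boolean {
  const ordered = [layers.global, projectTrusted ? layers.project : undefined, layers.override];
  let enabled: boolean | undefined;
  for (const layer of ordered) {
    const value = layer?.webSearch?.enabled;
    if (value !== undefined) {
      enabled = value;
    }
  }
  // Search stays available unless the final value is an explicit false.
  return enabled !== false;
}
```

In this sketch a global `enabled: false` is re-enabled by a trusted project layer with `enabled: true`, and an override layer with `enabled: false` disables search regardless of the other two.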
 `pi install npm:pi-agent-browser-native` loads the extension, but it does **not** usually put the package helper on your shell `PATH`. The clearest setup is to write the config file directly and keep actual keys in the environment that launches `pi`:
@@ -737,7 +748,7 @@ npm exec --yes --package pi-agent-browser-native@latest -- pi-agent-browser-conf
 npm exec --yes --package pi-agent-browser-native@latest -- pi-agent-browser-config browser executable set "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"
 ```
-The optional `agent_browser_web_search` tool is registered only when Exa or Brave credentials are available and the final merged config has not set `webSearch.enabled` to `false`. It is a separate custom tool, not an `agent_browser` input mode, and does not launch a browser. Use it when current/live external web information would help; use `agent_browser` for browser interaction, screenshots, authenticated/profile pages, and DOM inspection. Disable scope is explicit: `web-search disable --global` sets the normal user default, `web-search disable --project` disables it for one repo, and a `PI_AGENT_BROWSER_CONFIG` override containing `{ "version": 1, "webSearch": { "enabled": false } }` wins over both for a hard per-run disable. Project-local plaintext, custom env aliases, interpolation-literal, malformed, and command-backed web-search keys are refused; project config may only use the matching provider env refs (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa and `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave). `web-search set-key`, `set-command`, and `clear` require `--provider`; `set-env` infers Exa/Brave from `EXA_API_KEY` or `BRAVE_API_KEY` unless you pass `--provider`. For Exa, the tool defaults to `searchType: "auto"` with `contents.highlights: true`; use `fast`, `instant`, `deep-lite`, `deep`, or `deep-reasoning` only when the task needs that latency/depth tradeoff.
+The optional `agent_browser_web_search` tool is registered only when Exa or Brave credentials are available and the final available config has not set `webSearch.enabled` to `false`. It is a separate custom tool, not an `agent_browser` input mode, and does not launch a browser. Use it when current/live external web information would help; use `agent_browser` for browser interaction, screenshots, authenticated/profile pages, and DOM inspection. Disable scope is explicit: `web-search disable --global` sets the normal user default, `web-search disable --project` disables it for one repo, and a `PI_AGENT_BROWSER_CONFIG` override containing `{ "version": 1, "webSearch": { "enabled": false } }` wins over both for a hard per-run disable. Project-local plaintext, custom env aliases, interpolation-literal, malformed, and command-backed web-search keys are refused; project config may only use the matching provider env refs (`$EXA_API_KEY` / `${EXA_API_KEY}` for Exa and `$BRAVE_API_KEY` / `${BRAVE_API_KEY}` for Brave). `web-search set-key`, `set-command`, and `clear` require `--provider`; `set-env` infers Exa/Brave from `EXA_API_KEY` or `BRAVE_API_KEY` unless you pass `--provider`. For Exa, the tool defaults to `searchType: "auto"` with `contents.highlights: true`; use `fast`, `instant`, `deep-lite`, `deep`, or `deep-reasoning` only when the task needs that latency/depth tradeoff.
 Example config:
@@ -760,7 +771,7 @@ Example config:
 }
 ```
-Browser default config is conservative: it adds agent guidance for signed-in/account-specific tasks and alternate Chromium-compatible executables; current releases do not auto-inject `--profile` or `--executable-path` for every launch. Configure profile/executable guidance globally or through `PI_AGENT_BROWSER_CONFIG`; project-local browser config is not trusted to steer host executable/profile prompt guidance. Ask the agent to run `profiles` and `doctor` when profile resolution fails, then use the reported Chrome profile directory name, a full profile/user-data directory path if upstream accepts one, or the configured `browser.executablePath` with top-level `sessionMode: "fresh"`.
+Browser default config is conservative: it adds agent guidance for signed-in/account-specific tasks and alternate Chromium-compatible executables; current releases do not auto-inject `--profile` or `--executable-path` for every launch. Configure profile/executable guidance globally or through `PI_AGENT_BROWSER_CONFIG`; project-local browser config is loaded by default but is never trusted to steer host executable/profile prompt guidance. Ask the agent to run `profiles` and `doctor` when profile resolution fails, then use the reported Chrome profile directory name, a full profile/user-data directory path if upstream accepts one, or the configured `browser.executablePath` with top-level `sessionMode: "fresh"`.
 ## Important global flags, config, and environment
@@ -808,7 +819,7 @@ Browser default config is conservative: it adds agent guidance for signed-in/acc
 - `--confirm-interactive`: interactive confirmations; auto-denies when stdin is not a TTY. Environment: `AGENT_BROWSER_CONFIRM_INTERACTIVE`.
 - `-p, --provider <name>`: provider such as `ios`, `browserbase`, `kernel`, `browseruse`, `browserless`, or `agentcore`. Environment: `AGENT_BROWSER_PROVIDER`.
 - `--device <name>`: iOS device name. Environment: `AGENT_BROWSER_IOS_DEVICE`.
-- Provider-specific iOS examples from upstream include `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, and `agent-browser -p ios tap @e1`; in pi, pass those tokens through `args` rather than bash. iOS requires external Xcode/Appium setup, and cloud providers (`browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`) require their upstream accounts, credentials, and provider-specific environment variables. Common forwarded provider variables include `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `BROWSERLESS_API_KEY`, `BROWSERLESS_API_URL`, `BROWSERLESS_BROWSER_TYPE`, `BROWSERLESS_STEALTH`, `BROWSERLESS_TTL`, `BROWSER_USE_API_KEY`, `KERNEL_API_KEY`, `KERNEL_HEADLESS`, `KERNEL_STEALTH`, `KERNEL_TIMEOUT_SECONDS`, `KERNEL_PROFILE_NAME`, `AGENTCORE_API_KEY`, `AGENTCORE_REGION`, `AGENTCORE_BROWSER_ID`, `AGENTCORE_PROFILE_ID`, `AGENTCORE_SESSION_TIMEOUT`, plus AWS names used by AgentCore such as `AWS_PROFILE`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`. The wrapper forwards provider flags/env and stays thin; it does not emulate provider setup or cloud browser behavior.
+- Provider-specific iOS examples from upstream include `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, and `agent-browser -p ios tap @e1`; in pi, pass those tokens through `args` rather than bash. iOS requires external Xcode/Appium setup, and cloud providers (`browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`) require their upstream accounts, credentials, and provider-specific environment variables. Common forwarded provider variables include `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`, `BROWSERLESS_API_KEY`, `BROWSERLESS_API_URL`, `BROWSERLESS_BROWSER_TYPE`, `BROWSERLESS_STEALTH`, `BROWSERLESS_TTL`, `BROWSER_USE_API_KEY`, `KERNEL_API_KEY`, `KERNEL_HEADLESS`, `KERNEL_STEALTH`, `KERNEL_TIMEOUT_SECONDS`, `KERNEL_PROFILE_NAME`, `AGENTCORE_API_KEY`, `AGENTCORE_REGION`, `AGENTCORE_BROWSER_ID`, `AGENTCORE_PROFILE_ID`, `AGENTCORE_SESSION_TIMEOUT`, plus AWS names used by AgentCore such as `AWS_PROFILE`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_REGION`, and `AWS_DEFAULT_REGION`. The wrapper forwards provider flags/env and stays thin; it does not emulate provider setup or cloud browser behavior.
 - `--model <name>`: AI model for `chat`. Environment: `AI_GATEWAY_MODEL`.
 - `-v, --verbose`: show tool commands and raw output.
 - `-q, --quiet`: show only AI text responses.
@@ -840,7 +851,7 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
 - For sessions with observed tab-drift risk, after a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes. Routine same-session commands skip this post-command tab-list probe.
 - If a known session target unexpectedly reports about:blank, agent_browser best-effort re-selects the prior intended target when it still exists; if recovery fails, it records the observed about:blank target and reports exact recovery guidance instead of treating the prior page as active.
 <!-- agent-browser-playbook:end wrapper-tab-recovery -->
-- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to 25 seconds and use a 35-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 35s budget; top-level `timeoutMs` overrides it per browser CLI call). The default now lets ordinary calls survive the upstream 30-second IPC retry window while still bounding wedged children. Dialog commands are additionally bounded to 5 seconds (`PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS`), and click/tap/find refs or tokens plus `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers are bounded to 8 seconds (`PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS`). When any watchdog fires, `details.timeoutPartialProgress` may include a planned step list with per-step status (including `generatedFrom` labels for wrapper-inserted rows such as `open.loadState`) and a `retry-timeout-step` next action only when the first incomplete step is read-only or idempotent, current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), an `openedButPostOpenTimedOut` classification only when a live page URL was recovered before a later step hung, and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/state checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
+- Wrapper-spawned commands clamp `AGENT_BROWSER_DEFAULT_TIMEOUT` to the upstream documented 25-second default and use a 35-second child-process watchdog (`PI_AGENT_BROWSER_PROCESS_TIMEOUT_MS` overrides the default 35s budget; top-level `timeoutMs` overrides it per browser CLI call). Explicit `wait <ms>` or `wait --timeout <ms>` calls can exceed that default; when top-level `timeoutMs` is omitted, the wrapper derives a subprocess watchdog from the requested wait duration plus a small grace window. Dialog commands are additionally bounded to 5 seconds (`PI_AGENT_BROWSER_DIALOG_PROCESS_TIMEOUT_MS`), and click/tap/find refs or tokens plus `eval --stdin` snippets that look like alert/confirm/prompt/dialog triggers are bounded to 8 seconds (`PI_AGENT_BROWSER_DIALOG_TRIGGER_PROCESS_TIMEOUT_MS`). When any watchdog fires, `details.timeoutPartialProgress` may include a planned step list with per-step status (including `generatedFrom` labels for wrapper-inserted rows such as `open.loadState`) and a `retry-timeout-step` next action only when the first incomplete step is read-only or idempotent, or `inspect-current-page-after-timeout` when the session is still inspectable but the incomplete step may be mutating and should not be blindly retried. It also includes current page title/URL from best-effort session `get url` / `get title` (or a planned URL inferred from the step list when the session cannot answer), an `openedButPostOpenTimedOut` classification only when a live page URL was recovered before a later step hung, and declared artifact paths such as `screenshot`, `pdf`, `download`, or `wait --download` outputs with existence/state checks; the same evidence is appended under `Timeout partial progress` in visible text with URL/path redaction.
 - Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
 - The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.
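The next-action rule in the updated watchdog bullet above can be read as a small decision function. The sketch below is illustrative only: the `PlannedStep` shape, its status labels, and the `readOnlyOrIdempotent` flag are hypothetical, and how the real wrapper classifies a step is not shown in this diff. Only the two action names come from the documented text.

```typescript
// Hypothetical step model; the docs mention a planned step list with per-step status
// but do not spell out its fields.
type StepStatus = "completed" | "incomplete";

interface PlannedStep {
  command: string;
  status: StepStatus;
  readOnlyOrIdempotent: boolean; // assumed to be precomputed by the caller
}

type NextAction =
  | "retry-timeout-step"
  | "inspect-current-page-after-timeout"
  | undefined;

// Suggest a retry only when the first incomplete step is safe to repeat;
// otherwise suggest inspection if the session can still answer.
function chooseNextAction(
  steps: PlannedStep[],
  sessionInspectable: boolean,
): NextAction {
  const firstIncomplete = steps.find((step) => step.status !== "completed");
  if (!firstIncomplete) return undefined;
  if (firstIncomplete.readOnlyOrIdempotent) return "retry-timeout-step";
  if (sessionInspectable) return "inspect-current-page-after-timeout";
  return undefined;
}
```

The ordering matters: a possibly mutating step such as a click is never offered for blind retry, even when the session is healthy.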
@@ -849,14 +860,14 @@ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGE
 <!-- agent-browser-capability-baseline:start capability-token-baseline -->
 <!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
 <details>
-<summary>Generated verifier capability baseline for agent-browser 0.27.1</summary>
+<summary>Generated verifier capability baseline for agent-browser 0.27.2</summary>
 This generated block is review data for maintainers. The human-authored reference sections above remain the readable command guide.
 #### Source evidence
 - repository: `vercel-labs/agent-browser`
-- upstream HEAD: `90050f2913159875e2c3719e424746396ccb3cbf`
-- upstream package version: `0.27.1`
+- upstream HEAD: `5185339ca3fdab9848e11b8ec676eecfdec3733f`
+- upstream package version: `0.27.2`
 - inspected: `agent-browser --version`
 - inspected: `agent-browser --help`
 - inspected: `selected agent-browser <command> --help output`
@@ -925,7 +936,7 @@ This generated block is review data for maintainers. The human-authored referenc
 - Sessions, state, tabs, frames, dialogs, and windows: 20 human-doc token(s), 16 upstream token(s)
 - Network, storage, artifacts, diagnostics, and performance: 43 human-doc token(s), 53 upstream token(s)
 - Batch, auth, confirmations, setup, dashboard, devices, and AI commands: 24 human-doc token(s), 24 upstream token(s)
-- Global flags, config, providers, policy, and environment: 117 human-doc token(s), 90 upstream token(s)
+- Global flags, config, providers, policy, and environment: 120 human-doc token(s), 90 upstream token(s)
 #### Human-authored doc tokens required
 ##### Built-in skills
@@ -1230,6 +1241,9 @@ This generated block is review data for maintainers. The human-authored referenc
 - `AWS_PROFILE`
 - `AWS_ACCESS_KEY_ID`
 - `AWS_SECRET_ACCESS_KEY`
+- `AWS_SESSION_TOKEN`
+- `AWS_REGION`
+- `AWS_DEFAULT_REGION`
 #### Upstream help tokens expected
 ##### Built-in skills

package/docs/ELECTRON.md CHANGED

@@ -2,7 +2,7 @@
 Related docs:
 - [`../README.md`](../README.md)
-- [`../AGENTS.md`](../AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
+- [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) — maintainer verification (`npm run verify`, lifecycle), Pi `tmux` smoke expectations, and upstream rebaselining
 - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) — full `electron` and `qa.attached` field contracts
 - [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) — workflow snippets in the broader native command surface
 - [`ARCHITECTURE.md`](ARCHITECTURE.md) — wrapper design and the closed `RQ-0068` recipe-layer decision
@@ -107,7 +107,7 @@ Use this ladder for desktop-host readiness instead of blind sleep loops:
 2. After raw `connect`, inspect targets with `tab list`, select the stable `tab t<N>` app surface, then use a condition wait or `snapshot -i` on that selected surface.
 3. After wrapper-owned `electron.launch`, use `electron.probe` or `electron.status` when launch health, debug-port liveness, or target mismatch matters.
 4. Use `qa.attached` when the readiness check can be expressed as expected text or selector plus diagnostics against the current managed session.
-5. Use fixed waits only as a last resort, and keep each fixed wait below the wrapper IPC budget. `wait 30000` is intentionally blocked; use `25000` ms or less per call when a fixed wait is unavoidable.
+5. Use fixed waits only as a last resort. For legitimately slow waits, pass an explicit upstream wait timeout and let the wrapper derive the subprocess watchdog, or set top-level `timeoutMs` to at least the wait duration plus a small grace window.
 6. Treat a fixed-wait payload such as `"waited":"timeout"` as elapsed time, not proof that the host finished. Verify with an observed condition, fresh `snapshot -i`, or screenshot before continuing.
 This project is not adding a first-class host-idle primitive yet. Revisit that only if repeated desktop smokes show that condition waits, `qa.attached`, `electron.probe`, snapshots, and screenshots cannot cover the workflow.
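For step 5 of the ladder above, the relationship between a fixed wait and the subprocess watchdog can be sketched as below. This is an assumption-laden illustration rather than the wrapper's code: `requestedWaitMs`, `watchdogMs`, and the 5-second `ASSUMED_GRACE_MS` are hypothetical (the docs only say "a small grace window"), and keeping the 35-second default as a floor for short waits is also an assumption.

```typescript
const DEFAULT_WATCHDOG_MS = 35_000; // documented default child-process budget
const ASSUMED_GRACE_MS = 5_000; // hypothetical value for the "small grace window"

// Extract the requested duration from `wait <ms>` or `wait --timeout <ms>`.
// Returns undefined for other commands and for non-numeric wait targets.
function requestedWaitMs(args: string[]): number | undefined {
  if (args[0] !== "wait") return undefined;
  const flagIndex = args.indexOf("--timeout");
  const raw = flagIndex >= 0 ? args[flagIndex + 1] : args[1];
  if (raw === undefined || !/^\d+$/.test(raw)) return undefined;
  return Number(raw);
}

// An explicit top-level timeoutMs wins; otherwise derive the budget from the wait.
function watchdogMs(args: string[], timeoutMs?: number): number {
  if (timeoutMs !== undefined) return timeoutMs;
  const waitMs = requestedWaitMs(args);
  if (waitMs === undefined) return DEFAULT_WATCHDOG_MS;
  return Math.max(DEFAULT_WATCHDOG_MS, waitMs + ASSUMED_GRACE_MS);
}
```

With these assumed numbers, `["wait", "40000"]` would get a 45-second budget, while a caller who sets `timeoutMs` explicitly should choose at least the wait duration plus the grace window.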
@@ -365,7 +365,7 @@ Electron support is gated by the same release evidence as the rest of the wrappe
 - `RQ-0096` in [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) records the contract, runtime, test, and verification coverage.
 - `electron-lifecycle` and `electron-probe` scenarios in `scripts/agent-browser-efficiency-benchmark.mjs` track the token-efficiency claim deterministically (no real browser, no real launches).
 - Fake-upstream coverage for Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity lives in `test/agent-browser.extension-validation.test.ts`.
-- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded at the end of [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md).
+- Real-app validation is a manual `tmux` smoke pass per the maintainer notes in `AGENTS.md`; the 2026-05-21 dogfood result is recorded in the repo-local `docs/plans/electron-extension-2026-05-20.md` plan.
 Run the local gate the same way as the rest of the project: