pi-agent-browser-native 0.2.46 → 0.2.48

Files changed (38)
  1. package/CHANGELOG.md +64 -20
  2. package/README.md +45 -20
  3. package/docs/ARCHITECTURE.md +14 -14
  4. package/docs/COMMAND_REFERENCE.md +37 -23
  5. package/docs/ELECTRON.md +3 -3
  6. package/docs/RELEASE.md +33 -24
  7. package/docs/REQUIREMENTS.md +4 -4
  8. package/docs/SUPPORT_MATRIX.md +34 -106
  9. package/docs/TOOL_CONTRACT.md +24 -22
  10. package/docs/platform-smoke.md +2 -2
  11. package/extensions/agent-browser/index.ts +20 -2
  12. package/extensions/agent-browser/lib/config-policy.js +16 -5
  13. package/extensions/agent-browser/lib/config.ts +17 -4
  14. package/extensions/agent-browser/lib/input-modes/job.ts +138 -62
  15. package/extensions/agent-browser/lib/input-modes/params.ts +2 -2
  16. package/extensions/agent-browser/lib/orchestration/browser-run/artifact-paths.ts +44 -0
  17. package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +42 -19
  18. package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +6 -4
  19. package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +18 -9
  20. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/direct-anchor-download.ts +158 -0
  21. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/network-page-filter.ts +116 -0
  22. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/scroll-shims.ts +147 -0
  23. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/snapshot-filter.ts +183 -0
  24. package/extensions/agent-browser/lib/orchestration/browser-run/prepare/wait-timeouts.ts +58 -0
  25. package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +19 -653
  26. package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +1 -6
  27. package/extensions/agent-browser/lib/orchestration/browser-run/session-artifacts.ts +8 -0
  28. package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +1 -0
  29. package/extensions/agent-browser/lib/pi-tool-rendering.ts +34 -19
  30. package/extensions/agent-browser/lib/playbook.ts +4 -4
  31. package/extensions/agent-browser/lib/results/action-recommendations.ts +3 -3
  32. package/extensions/agent-browser/lib/web-search.ts +11 -4
  33. package/package.json +4 -4
  34. package/scripts/agent-browser-capability-baseline.mjs +6 -3
  35. package/scripts/doctor.mjs +12 -11
  36. package/scripts/platform-smoke/platform-build-windows.ps1 +2 -2
  37. package/scripts/platform-smoke/targets.mjs +7 -3
  38. package/scripts/platform-smoke.mjs +2 -2
@@ -2,7 +2,7 @@
 
  Related docs:
  - [`../README.md`](../README.md)
- - [`../AGENTS.md`](../AGENTS.md) (rebaselining and verification stack)
+ - [`../AGENTS.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/AGENTS.md) (rebaselining and verification stack)
  - [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md)
  - [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
  - [`ELECTRON.md`](ELECTRON.md)
@@ -26,52 +26,42 @@ When upstream ships a new `agent-browser` or the inventory changes:
 
  ## Audit result
 
- - Target upstream: `agent-browser 0.27.1` (must match `CAPABILITY_BASELINE.targetVersion` in [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs)).
+ - Target upstream: `agent-browser 0.27.2` (must match `CAPABILITY_BASELINE.targetVersion` in [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs)).
  - Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
  - Status: supported for the current wrapper contract after the 2026-05-26 all-command audit.
- - High-priority support gaps: 2026-05-26 audit found sessionless local commands and command-scoped value flags needed sharper wrapper handling; runtime/tests/docs now cover those paths. Remaining upstream-owned caveat: `agent-browser 0.27.1` help mentions `wait <selector> --state hidden`, but source parsing does not implement that distinct wait mode, so wrapper docs steer agents to `wait --fn` predicates.
+ - High-priority support gaps: 2026-05-26 audit found sessionless local commands and command-scoped value flags needed sharper wrapper handling; runtime/tests/docs now cover those paths. The 0.27.2 rebaseline preserves thin support for upstream click reliability, frame-scoped selectors/waits, form-command fixes, daemon retry improvements, and glibc-pinned release artifacts; wrapper wait planning now forwards explicit long `wait <ms>` / `wait --timeout <ms>` calls instead of rejecting them before spawn. Remaining upstream-owned caveat: `agent-browser 0.27.2` help mentions `wait <selector> --state hidden`, but source parsing does not implement that distinct wait mode, so wrapper docs steer agents to `wait --fn` predicates.
  - Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), the experimental `networkSourceLookup` helper (`RQ-0067`), optional Exa/Brave-backed `agent_browser_web_search` with Pi-scoped package config (`RQ-0121`), and agent recovery for search/profile configuration failures (`RQ-0122`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#optional-companion-web-search). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
 
  ## Open UX/reliability follow-ups from 2026-05-29 agent feedback
 
- Phase 1 triage (2026-05-29): IDs **RQ-0110–RQ-0117** track the first feedback batch. Second/third-round follow-up adds **RQ-0118–RQ-0120**. **Do not reuse RQ-0101** here—that id is already shipped for compact-snapshot high-value controls (see closure section below).
-
- These rows track this feedback batch. Some rows are docs-only or environment-owned; rows marked shipped have code/tests in this change but still need release-gate evidence before being treated as release closure.
-
- | ID | Feedback | Owner | Phase 1 classification | Evidence (2026-05-29, `agent-browser 0.27.0` on maintainer macOS unless noted) | Next implementation action | Likely files / tests |
- | --- | --- | --- | --- | --- | --- | --- |
- | RQ-0110 | Headed demos are hard to discover and hard to verify. | Wrapper + upstream (`--headed`) | **docs/playbook-mitigated** (`README`, `TOOL_CONTRACT`, `COMMAND_REFERENCE`, generated playbook guidance); visibility proof **out-of-scope/host-owned** until upstream exposes a portable signal. | `--headed open https://example.com` succeeds with JSON success; no upstream field proves an OS window is visible. Docs/playbook now document `sessionMode: "fresh"` and screenshot/tab/get-url evidence. | No further wrapper action planned for this batch without an upstream/OS portable visibility signal. | `extensions/agent-browser/lib/playbook.ts` (`npm run docs -- playbook write`), README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
- | RQ-0111 | Local `localhost` / `127.0.0.1` fixture servers can fail with `ERR_EMPTY_RESPONSE` from the browser host. | Environment + upstream (navigation) | **docs-mitigated** (loopback host mismatch + `ERR_EMPTY_RESPONSE` meaning); browser-host reachability remains **environment-owned**. | Reproduced loopback navigation failures on 2026-05-29 maintainer macOS: accept-then-close without HTTP can surface as `net::ERR_EMPTY_RESPONSE` or `net::ERR_SOCKET_NOT_CONNECTED`; nothing listening yields `net::ERR_CONNECTION_REFUSED`. Same-machine `python3 -m http.server` (or harness `SimpleHTTPRequestHandler`) + `open http://127.0.0.1:<port>/fixture.html` succeeds. `npm run verify -- real-upstream` already uses localhost fixtures successfully on this host. | No wrapper server manager or classifier in this batch: failures are not specific enough to prove browser-host loopback mismatch. Keep guidance on host-reachable addresses, `file://` static fallback, and harness-owned servers. | README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, `test/helpers/agent-browser-harness.ts`. |
- | RQ-0112 | `eval --stdin` can silently return `null` on `file://` pages, blocking DOM verification. | Upstream (eval channel) + wrapper (warning UX) | **docs-mitigated** (treat `file://` null as inconclusive); **wrapper-owned shipped** (`details.evalResultWarning` + visible `Eval result warning` on `file:` + `result === null`; upstream null channel remains environment/upstream-owned). | Reproduced on `file://` fixture: expressions `null`, `undefined`, `(() => null)()`, and missing-element queries return `"success":true` with `"result":null`; `JSON.stringify(null)` returns the string `"null"`. Simple DOM reads (`document.getElementById(...).textContent`) return real values on the same page. Focused fake coverage asserts the warning without failing the tool. | No further wrapper action planned unless real upstream exposes a richer error. Keep release validation focused on the non-failing warning and redaction-safe visible copy. | `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `process-output.ts`, `final-result.ts`, `types.ts`, `docs/TOOL_CONTRACT.md`, `test/agent-browser.extension-errors-artifacts.test.ts`. |
- | RQ-0113 | A successful click may not lead to the expected DOM mutation. | Wrapper + upstream (click semantics) | **docs-mitigated / existing-runtime-mitigated** by `RQ-0089` click-dispatch, `RQ-0073` overlay diagnostics, and stronger verification guidance; arbitrary app no-op handlers remain **app/upstream semantics**, not proof of wrapper failure. | Direct upstream can correctly report `click #noop` success while app state stays unchanged; `click #mutate` updates DOM. This shows click success is target activation evidence, not expected-state proof. Wrapper already probes missing trusted DOM events and overlay blockers, but cannot infer arbitrary expected mutations without task-specific assertions. | No additional generic post-click probe in this batch to avoid false positives. Use task-specific verification (`snapshot`, `wait --text`, `assertText`, screenshot, `pageChangeSummary`) after state-changing clicks. | Existing `clickDispatch`/overlay tests plus README / `docs/COMMAND_REFERENCE.md` verification guidance. |
- | RQ-0114 | `get text` selector ambiguity remains hard to resolve when several matches are visible. | Wrapper + upstream (first-match `get text`) | **wrapper-owned shipped** (`visibleCandidates` on selector probe + visible previews); first-match behavior remains upstream semantics. `RQ-0074` warning path already shipped. | Upstream CLI: `get text ".item"` with two visible matches returns only `Alpha`. Wrapper `RQ-0074` already warns when `matchCount > 1` (including all-visible cases) and now exposes bounded visible candidate previews/indexes for safer narrowing. | No further wrapper action planned for this batch. Future improvement: derive safe selector suggestions only if redaction rules can keep them non-sensitive. | `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts`, `test/agent-browser.extension-errors-artifacts.test.ts`, `docs/TOOL_CONTRACT.md`. |
- | RQ-0115 | Temporary local HTTP server port management is manual and leaked processes block later runs. | Environment (host/process lifecycle) | **out-of-scope/host-owned** (no fixture-server runtime per architecture); **docs-mitigated** (harness pointer in `COMMAND_REFERENCE`). | By design outside `agent_browser` per architecture no-recipe policy. Repo test harness already exposes `startAgentBrowserContractFixtureServer()` for deterministic localhost pages; leaked `python3 -m http.server` / Node listeners are operator or CI cleanup. Phase 1 added a maintainer pointer from [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#headed-demo-and-local-page-checks) to the harness. | Phase 2: **no wrapper server manager** without new design evidence. Parent decision if a separate npm script (`verify` helper) is wanted—out of scope for thin integration. | `test/helpers/agent-browser-harness.ts`, `docs/COMMAND_REFERENCE.md`, `docs/ARCHITECTURE.md` (if explicit anti-scope note is needed). |
- | RQ-0116 | Fresh-session failure prose is opaque and exposes internal generated session ids without clear recovery. | Wrapper | **wrapper-owned shipped** (action-oriented visible recovery + `nextActions`; `attemptedSessionName` remains in `details`). Struct + visible line already exist (`RQ-0077`). | `buildManagedSessionOutcome` still keeps full generated-session transition details in `details.managedSessionOutcome`, while visible failure prose now summarizes preserved/abandoned/replaced outcomes without repeating generated ids. Focused fake coverage covers preserved, missing-binary, abandoned, and QA-reclassification paths. | No further wrapper action planned for this batch unless reviewer finds recovery actions unsafe or insufficient. | `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`, `final-result.ts`, `docs/TOOL_CONTRACT.md`, `test/agent-browser.extension-errors-artifacts.test.ts`, `test/agent-browser.extension-input-modes.test.ts`. |
- | RQ-0117 | There is no machine-readable confirmation that headed mode is visible to the user. | Wrapper gap + environment (display) | **documented unsupported** for this batch; true OS visibility is **out-of-scope/host-owned** until upstream exposes a portable signal. Pairs with RQ-0110. | Same root cause as RQ-0110: no portable upstream/wrapper field observed. Headed launch success is not visibility proof, and adding a constant `details.headedVisibility: "unsupported"` would add noise without a decision signal. | No runtime field in this batch. Keep the explicit contract limitation and independent screenshot/tab/get-url evidence guidance. | README, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, generated playbook guidance. |
- | RQ-0118 | Second-round report says every `eval --stdin` expression on `file://` returned `null`, including `1 + 1` and `document.title`. | Wrapper UX + caller-shape recovery | **wrapper-owned shipped in follow-up branch**: direct upstream and native-tool checks on maintainer macOS show `eval --stdin` works on `file://` when the script is supplied through the top-level native tool `stdin` field. The reported all-null behavior is reproduced by the malformed native-tool shape `args: ["eval", "--stdin", "document.title"]` with no top-level `stdin`, which upstream treats as empty stdin and returns `null`. | Direct probes: `agent-browser --session ... open file:///tmp/page.html` then `printf '1+1' | agent-browser --session ... eval --stdin` returns `2`; native tool `{ args: ["eval", "--stdin"], stdin: "document.title" }` returns the fixture title; native tool `{ args: ["eval", "--stdin", "1+1"] }` reproduced `result: null` before normalization. | Normalize the common malformed native-tool call by moving trailing args after `--stdin` into process stdin before launch; keep docs/playbook explicit that top-level `stdin` is canonical. | `extensions/agent-browser/lib/orchestration/input-plan.ts`, `test/agent-browser.extension-errors-artifacts.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, generated playbook guidance. |
- | RQ-0123 | Stress testing found artifact and navigation safety contracts could report success after failure evidence: missing explicit artifact files and `--allowed-domains` click escapes. | Wrapper result contract + navigation policy | **wrapper-owned in this branch**: non-pending resolved file artifacts with `exists:false` fail closed with `failureCategory: "artifact-missing"`, missing downloads are labeled as reported-but-not-verified rather than completed, and non-file URL payloads such as `data:` downloads are not treated as host artifacts; argv-supplied `--allowed-domains` is remembered for the managed session and successful-looking browser commands whose final observed `http(s)` URL escapes the allowlist fail with `failureCategory: "policy-blocked"`. | Reproduced in issue #68/#69 audits: `wait --download` and `diff screenshot --output` reported success with missing files; `example.com` opened under `--allowed-domains example.com` but clicking `Learn more` reached `www.iana.org`, while direct outside-domain open was blocked by upstream. | Preserve verified and pending artifact success; preserve valid in-domain navigation and direct upstream blocks; keep allowlist matching exact host plus subdomain suffix and skip non-`http(s)` URLs. | `extensions/agent-browser/lib/results/presentation/artifacts.ts`, `presentation.ts`, `batch.ts`, `contracts.ts`, `action-recommendations.ts`, `extensions/agent-browser/lib/navigation-policy.ts`, `process-output.ts`, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, focused artifact/navigation tests. |
- | RQ-0124 | 2026-06-05 stress report found several agent-facing ergonomics gaps: `doctor` rendered as `(see attached image)`, simple anchor `download <selector> <path>` could time out and save a random-named file, `qa.attached` could false-pass by clearing diagnostics, failed fresh `job` prose implied the old session was still active, constrained `job` lacked semantic locators, dense annotated screenshots were too noisy, `tab list` hid labels, `state save` required pre-created parents, and `session list` names were cryptic. | Wrapper presentation, input modes, artifact preflight, managed-session state | **wrapper-owned in this branch**: top-level no-`data` success envelopes are preserved for presentation; `doctor`, `session list`, and `tab list` render stable readable fields; non-ref simple loopback anchor downloads are fetched directly to the requested path when an HTTP(S) href is resolvable; direct/batch artifact parent directories are created before launch including `state save`; `qa.attached` preserves diagnostics and exposes `diagnosticsResetAtStart:false`; post-launch fresh batch/job failures keep the fresh session current and visible recovery points to the failed step; `job` click/fill support semantic locator fields; annotated screenshot results add density guidance. | Stress report path `/tmp/agent-browser-stress-20260605T174431Z/reports/agent-browser-stress-report.md`; focused fake coverage added for direct anchor downloads, `qa.attached`, fresh job failure, job semantic locators, `state save` parents, tab/session/doctor presentation, and top-level envelope preservation. | Preserve RQ-0123 fail-closed artifact semantics, stale-ref guards, compact dense snapshot high-value refs, normal upstream `download @ref` behavior, and URL-opening QA buffer clearing. Do not add a reusable recipe layer or broad screenshot-label post-processing without new design evidence. 
| `extensions/agent-browser/lib/results/envelope.ts`, `presentation/diagnostics.ts`, `presentation.ts`, `input-modes/job.ts`, `params.ts`, `types.ts`, `orchestration/browser-run/prepare.ts`, `process-output.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused artifact/input/passthrough/presentation/results tests. |
- | RQ-0125 | Follow-up 2026-06-05 stress report found route mocks could look fulfilled when requests failed, failed `qa.expectedText` checks could collapse into watchdog timeouts, semantic/current-ref recovery missed exact visible controls from nested/open-shadow contexts, compact snapshots after scroll could hide viewport context, batch/job stale-ref guidance lacked an in-schema refresh step, attached QA defaults surprised by preserved buffers, `data:image` request rows polluted diagnostics, `vitals` output lacked readable metrics/unavailable states, and close/session lifecycle prose was ambiguous. The report also expected prompt-derived final-action blocking; that expectation is **rejected** as out of scope except for exact artifact-before-close invariants. | Wrapper diagnostics, input modes, prompt policy, presentation, managed-session lifecycle | **wrapper-owned in this branch**: prompt-derived stop-boundary action blocking was removed while exact requested-artifact-before-close remains; route diagnostics flag failed/pending/CORS routed rows as unfulfilled and expose `inspect-routed-network-request`; QA expected text originally read body text after load and reported missing text as `qa-failure` (superseded by RQ-0126 visible-text predicates); `qa.attached` diagnostic reads default off unless opted in; `job` adds explicit `snapshot` steps; selector recovery uses the failed batch step command for exact current-ref fallbacks; compact snapshots include a viewport-ordering note/field; `network requests` hides `data:image` artifact noise while preserving raw details; `vitals`/`web-vitals` render metric summaries or unavailable reasons; successful managed-session close explains next-session state and `session list` includes active fields. 
| Stress report path `/private/tmp/piab-project-mUkv1r/dogfood-output/agent-browser-stress-20260605T190651Z/report.md`; focused coverage added for prompt non-blocking, artifact close guard retention, route failures, QA expected text body-check fallback/defaults, batch semantic ref fallback, compact snapshot viewport note, data-image filtering, vitals summaries, and close/session prose. | Keep broad user/business intent enforcement as agent responsibility; do not reintroduce prompt-derived click/key blocking without an explicit machine-checkable invariant. Preserve stale-ref safety by requiring explicit `snapshot` refresh rows rather than weakening ref guards after mutation-prone steps. | `prompt-policy.ts`, `prompt-guards.ts`, `input-modes/job.ts`, `results/network.ts`, `network-routes.ts`, `presentation/diagnostics.ts`, `presentation/registry.ts`, `snapshot.ts`, `orchestration/browser-run/final-result.ts`, `session-state.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/prompt/passthrough/semantic/presentation/snapshot/resume tests. |
- | RQ-0126 | Later 2026-06-05 stress report found JavaScript prompt/dialog flows could hit the full watchdog, same-page rerenders could recycle stale refs into false click success, fresh `job`/`qa` post-launch failures could still read like launch failures after timeouts, broad QA body text reads timed out on dense MDN pages, constrained jobs continued after failed setup fills, `diff screenshot` missing outputs still used “saved” wording, virtualized/nested scrolling needed eval workarounds, aggregate network buffers were noisy across navigations, no-op page scrolls surfaced `scrolled:true`, and follow-up feedback asked for better timeout recovery, record-start artifact state, direct semantic selectors, output files, human-paced typing, and timeout knobs. | Wrapper timeouts, ref preflight, input modes, artifact/diagnostic presentation, scroll helpers | **wrapper-owned in this branch**: dialog commands and likely dialog-trigger clicks get shorter wrapper process timeouts plus dialog recovery next actions; ordinary calls use a 35s child watchdog with per-call `timeoutMs`; direct `@ref` mutations compare against a fresh same-page snapshot before spawn and fail stale when role/name identity changes; fresh batch timeouts with recovered page context keep the new session as current in `managedSessionOutcome`; timeout partial progress exposes per-step status and `retry-timeout-step`; QA expected text uses bounded visible-text `wait --fn` predicates; `job.failFast` defaults to `true`, compiles to `batch --bail`, and supports bounded paced `type`; `semanticAction` supports direct selector/ref click/check/fill; `outputPath` writes successful payloads to local files; `record start` / `record restart` artifacts are pending/open instead of missing; missing diff images are labeled reported-but-not-verified; explicit CSS-container `scroll <selector> <dir> [amount]` is handled before page scroll; network previews add a clear-buffer-before-repro next action; no-op scrolls set 
`details.data.scrolled:false` / `noMovement:true`. | Stress report path `/private/tmp/piab-project-TJbDcs/dogfood-output/agent-browser-stress/reports/agent-browser-stress-report.md`; focused coverage added for dialog timeouts, same-page rerender ref preflight, job `--bail`/bounded `type` compaction, semantic selector compilation, outputPath, timeout retry evidence, visible QA text predicates, record-start/restart pending artifacts, artifact missing wording, explicit container scroll, no-op scroll data, and network clear next actions. | Preserve thin upstream semantics for ordinary commands; do not implement a broad recipe layer or prompt-derived business-action blocker. Dialog recovery remains bounded best-effort, not a guarantee that upstream can accept every wedged prompt; fresh-session recovery is always offered. | `input-modes/job.ts`, `semantic-action.ts`, `params.ts`, `types.ts`, `orchestration/input-plan.ts`, `orchestration/output-file.ts`, `orchestration/browser-run/prepare.ts`, `process-output.ts`, `final-result.ts`, `diagnostics.ts`, `results/presentation/artifacts.ts`, `presentation/diagnostics.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/ref/validation/artifact/process/results tests. |
- | RQ-0127 | Broad stress report rooted at `/private/tmp/piab-project-gfkJ2A` found follow-up friction: missing QA expected text could still let later diagnostics push the whole batch into the wrapper watchdog, `eval --stdin` snippets that open JavaScript dialogs used the full default watchdog, `scroll to end` could contradict no-movement diagnostics on long pages, and `keyboard press Enter` was an unsupported upstream shape without targeted guidance. The report also requested larger product/design improvements such as viewport-first snapshots, snapshot search/diff, semantic extraction, page-only network filters, batch retry with fresh refs, and smarter ref lifecycle. | QA compiler, dialog timeout planning, scroll helpers, error presentation, product backlog | **partially wrapper-owned in this branch**: QA now compiles to `batch --bail` so failed text/selector assertions stop before slower diagnostics; dialog-like `eval --stdin` is bounded by the dialog-trigger watchdog and gets dialog recovery next actions on timeout; `scroll to end` / `scroll to top` are wrapper-handled against `document.scrollingElement` with `details.scrollPage`; unsupported `keyboard press` errors now explain `keyboard type` / `inserttext` and Enter usage. Wrapper-side snapshot search/filter is now implemented for `snapshot -i --search <text>` and `--filter role=<role>` while preserving the full `details.refSnapshot`, `snapshot --viewport` adds opt-in viewport/scroll metadata, `snapshot --diff` reports ref-map deltas against the previous tracked snapshot, and wrapper-side `network requests --current-page` / `--current-origin` / `--current-url` filters aggregate buffers by the active page; job `open.loadState` can insert an explicit readiness wait after navigation. 
Automatic viewport-first snapshots, automatic batch retry with fresh refs, weaker stale-ref lifecycle rules, and broad additional Electron CDP recovery were triaged as resolved-by-existing-controls or intentionally rejected rather than implemented: the wrapper now provides opt-in viewport/search/diff evidence, preserves machine-checkable stale-ref guardrails instead of replaying mutating batches, and keeps the existing Electron status/probe/reattach/No-active-page recovery paths that already have lifecycle coverage without adding speculative CDP heuristics. | Focused coverage added for QA `batch --bail`, dialog-trigger eval timeout, wrapper-handled page scroll-to-end, wrapper-side snapshot search/filter/viewport/diff metadata, network current-page filtering, job open readiness waits, and keyboard press guidance. Root report path currently exists but has no files; inline user report is the source evidence for this row. | Preserve QA pass diagnostics for successful assertions; fail fast only after failed batch steps. Do not weaken stale-ref safety to reduce snapshot tax without a machine-checkable validity invariant; batch auto-retry after a click/open can repeat mutating steps and is intentionally not added. Do not pretend upstream-only CDP/keyboard API gaps are fixed without either wrapper recovery or explicit blocked evidence. | `input-modes/job.ts`, `orchestration/browser-run/prepare.ts`, `final-result.ts`, `results/presentation/errors.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/validation/presentation tests. |
- | RQ-0119 | Second-round localhost failures still show `ERR_EMPTY_RESPONSE` even when shell `curl` succeeds. | Environment + wrapper diagnostics | **diagnostic-mitigated**: direct maintainer repro shows localhost HTTP succeeds with a normal same-host Python server, so the wrapper still cannot prove or bridge an environment-specific browser-host namespace/proxy mismatch. Add error presentation guidance specifically for loopback navigation failures so agents do not misread `ERR_EMPTY_RESPONSE` as blank page content. | Direct probe: `python3 -m http.server --bind 127.0.0.1 8766` + `agent-browser open http://127.0.0.1:8766/page.html` succeeds; previous first-batch evidence still shows accept-then-close servers can produce `ERR_EMPTY_RESPONSE`. | Append a local fixture hint on loopback `open`/navigation failures with `net::ERR_EMPTY_RESPONSE`, `ERR_CONNECTION_REFUSED`, `ERR_ADDRESS_UNREACHABLE`, `ERR_TIMED_OUT`, or `ERR_CONNECTION_RESET`; do not add server lifecycle management in the native browser tool. | `extensions/agent-browser/lib/results/presentation/errors.ts`, `test/agent-browser.presentation-skills-recovery.test.ts`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
- | RQ-0120 | Third-round report says ref/semantic clicks can report success while inline `onclick="…"` handlers do not run, though programmatic `.click()` does. | Wrapper diagnostics + upstream/browser hit testing | **diagnostic-mitigated in follow-up branch**: simple direct upstream probes show inline `onclick` handlers fire for selector and `@ref` clicks on file pages, so the reported case is likely a hit-target/overlay/ref-resolution miss rather than inline attributes generally. Extend the click-dispatch probe to `@e…` refs using the latest snapshot role/name metadata so ref or semanticAction→ref clicks that never deliver a trusted event to the intended element fail with `details.clickDispatch` instead of silently reporting success. Fourth-round external testing confirmed this diagnostic now catches the failure. | Direct probe: minimal `<button onclick="showGraph('rps')">` fixture updates DOM via selector click, `@e1` click, and programmatic `.click()`. Existing wrapper probe covered CSS/XPath only; semantic visible-ref resolution and raw `@e…` clicks skipped dispatch diagnostics. Follow-up tester confirmed programmatic `.click()` remains a useful static-fixture workaround when CDP/user-like click dispatch fails. | Probe standalone `click @e…` when the latest snapshot maps that ref to a unique visible role/name DOM candidate; keep no in-page replay policy. Document programmatic `eval --stdin` `.click()` as an explicit debugging/static-fixture workaround only, not proof of real user click behavior and not a way around stop boundaries. | `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `types.ts`, `prepare.ts`, `test/agent-browser.extension-click-dispatch.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
+ Detailed feedback triage and implementation notes live in the repository source at [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) under the 2026-05-29 agent feedback section. Keep this active matrix limited to release-critical status and gates.
+
+ Current summary:
+
+ | Range | Status | Source of truth |
+ | --- | --- | --- |
+ | RQ-0110–RQ-0120 | Agent feedback triage resolved or documented; remaining unsupported areas are environment/upstream-owned. | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | RQ-0123–RQ-0127 | Stress-report wrapper fixes shipped; prompt-derived business-action blocking remains intentionally out of scope. | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | RQ-0101 | Upstream `agent-browser 0.27.2` rebaseline shipped. | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
 
  ## Verification evidence
 
- Re-run the gates below before each release; this table records what the closure audit exercised.
+ Re-run the gates below before each release; this table records what the closure audit exercised. Rows marked **Current for 0.27.2** were rerun after the `agent-browser 0.27.2` rebaseline. Rows marked **Historical / pending refresh** are useful prior evidence but must not be treated as current release proof until rerun under the named condition.
 
  | Gate | Evidence | Status |
  | --- | --- | --- |
- | Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | Pass on 2026-06-03 as part of `npm run verify -- release` (`agent-browser 0.27.1` on `PATH`). |
- | Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-06-03 (`npm run verify -- real-upstream`, `agent-browser 0.27.1` on `PATH`; updated Web Vitals shape assertions for upstream 0.27.1 structured output). |
- | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads the packaged `agent_browser` tool without requiring optional Brave config, and executes fake-upstream `--version`. | Pass on 2026-06-03 as part of `npm run verify -- release` (`npm run verify -- package-pi` slice). |
- | Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against a local file fixture through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close with the real `agent-browser` on `PATH`. | Pass on 2026-06-03 (`npm run verify -- dogfood`, `agent-browser 0.27.1`; artifacts cleaned by the harness). |
- | Efficiency benchmark | `npm run verify -- benchmark` runs deterministic browser workflow accounting plus focused benchmark tests, including JSONL sampling fixtures and job/qa/sourceLookup/networkSourceLookup/Electron scenario coverage. | Pass on 2026-05-29 (`npm run verify -- benchmark`). |
- | Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:ubuntu-image` builds the project-owned Linux image, `npm run smoke:platform:doctor` checks Crabbox 0.26.0+ and local target readiness, and `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). Target artifacts include Crabbox/provider/work-root metadata, and release review also checks provider-specific `crabbox list` commands for leftover leases/clones. | Pass on 2026-06-03 (`npm run check:platform-smoke`, `npm run smoke:platform:ubuntu-image`, and `npm run verify -- release`, whose platform slice ran the macOS/Ubuntu/native-Windows Crabbox matrix; artifacts cleaned after evidence capture). |
- | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with the configured-source lifecycle harness, packaged Pi smoke, and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone real-upstream, host-only dogfood, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-06-03 (`npm run verify -- release`, including macOS/Ubuntu/native-Windows Crabbox matrix). |
- | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, closes and relaunches Pi with the same exact `--session-id`, checks the JSONL session header id, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-06-03 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
- | Quick isolated Pi smoke | `pi --no-extensions --no-skills -e . --tools agent_browser` from repo root; native `agent_browser` only. | Last interactive tmux checkout smoke pass on 2026-05-29 (`agent-browser 0.27.0` at the time). The 2026-06-03 Crabbox matrix now covers clean packed Pi install plus deterministic wrapper dogfood on all required platforms for `agent-browser 0.27.1`; run a new manual tmux smoke before publish when human-readable transcript evidence is required. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
+ | Default local gate | `npm run verify` checks generated playbook drift, `tsc --noEmit`, unit/fake tests, generated command-reference blocks, and live command-reference sampling. | **Current for 0.27.2:** pass on 2026-06-11 inside `npm run verify -- release`; 561 passed, 1 skipped, then command-reference generated blocks and live sampling passed with `agent-browser 0.27.2` on `PATH`. |
+ | Pre-PR local gate | `npm run verify -- pre-pr` composes the default gate with package-content verification. Use before larger local handoffs or PR-ready claims when lifecycle/platform/live dogfood cost is not warranted. | Added 2026-06-10; orchestration is locked by `test/project-verify.test.ts` and does not change release mode. |
+ | Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | **Current for 0.27.2:** pass on 2026-06-11 (`npm run verify -- real-upstream`, `agent-browser 0.27.2` on `PATH`; includes 0.27.2 off-viewport click, frame-scoped selector/wait/click, form command, and wait-download artifact coverage). |
+ | Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads the packaged `agent_browser` tool without requiring optional Brave config, and executes fake-upstream `--version`. | **Current for 0.27.2:** pass on 2026-06-11 as part of `npm run verify -- release` (`verify-package.mjs --smoke-pi`; packed 117 files, packaged `agent_browser --version` invocation passed). |
+ | Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against a local file fixture through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close with the real `agent-browser` on `PATH`. | **Current for 0.27.2:** pass on 2026-06-11 (`npm run verify -- dogfood`, `agent-browser 0.27.2`; `qa-url`, fresh/current opens, semantic click, job screenshot artifact, and close all passed). |
+ | Efficiency benchmark | `npm run verify -- benchmark` runs deterministic browser workflow accounting plus focused benchmark tests, including JSONL sampling fixtures and job/qa/sourceLookup/networkSourceLookup/Electron scenario coverage. | **Historical / pending refresh:** pass on 2026-05-29 (`npm run verify -- benchmark`). This deterministic gate is not upstream-version-specific, but rerun before claiming current benchmark evidence after benchmark or workflow-scenario edits. |
+ | Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:ubuntu-image` builds the project-owned Linux image, `npm run smoke:platform:doctor` checks Crabbox 0.26.0+ and local target readiness, and `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). Target artifacts include Crabbox/provider/work-root metadata, and release review also checks provider-specific `crabbox list` commands for leftover leases/clones. | **Current for 0.27.2:** pass on 2026-06-11 inside `npm run verify -- release`; rebuilt Ubuntu image `pi-agent-browser-native-platform:node24-agent-browser0.27.2`, refreshed the Windows `crabbox-ready` template snapshot to `agent-browser 0.27.2`, doctor passed, then Crabbox platform smoke passed for macOS, Ubuntu, and native Windows. |
+ | `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with the configured-source lifecycle harness, packaged Pi smoke, and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone real-upstream, host-only dogfood, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | **Current for 0.27.2:** pass on 2026-06-11 (`npm run verify -- release`), including default unit/fake gate, generated docs checks, live command-reference sampling, lifecycle harness, packaged Pi smoke, and macOS/Ubuntu/native-Windows Crabbox platform smoke. |
+ | Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, closes and relaunches Pi with the same exact `--session-id`, checks the JSONL session header id, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | **Current for 0.27.2:** pass on 2026-06-11 inside `npm run verify -- release`; exact session `piab-lifecycle-16278`, managed browser session `piab-pi-agent-browser-piablifecycl-186485dc`, persisted full output verified before cleanup. |
+ | Quick isolated Pi smoke | `pi --approve --no-extensions --no-skills -e . --tools agent_browser` from trusted repo root; native `agent_browser` only. | **Current for 0.27.2:** pass on 2026-06-11 via tmux with `pi --approve --no-extensions --no-skills -e .`; native `agent_browser` only. Covered `qa` with `sessionMode: "fresh"` against `https://example.com`, `open` and compact `snapshot -i` on `https://react.dev`, `semanticAction` link click to `https://react.dev/learn`, screenshot artifact verification at `/tmp/piab-release-smoke-react.png`, and `close`; explicit screenshot and temporary session artifacts were removed after evidence capture. Broader historical coverage also includes version/help/skills, eval stdin, batch stdin, explicit session, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
+
+ Runtime floor note: package metadata keeps Pi core package peer ranges wildcard per installed Pi package docs, but `pi-agent-browser-doctor` / `npm run doctor` treats `pi --version` below 0.79.0 as a setup failure. This keeps package dependency shape aligned with Pi package loading while still making unsupported host Pi versions a release and first-run blocker.
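
The version floor in the note above can be pictured as a numeric comparison on the `pi --version` output. The following is a minimal sketch under stated assumptions: `parseVersion`, `meetsFloor`, and `PI_FLOOR` are hypothetical names, and this is not the actual `pi-agent-browser-doctor` code. Only the 0.79.0 floor comes from the note.

```typescript
// Hypothetical sketch of a version-floor check; not the actual doctor implementation.
type Version = [number, number, number];

// Floor taken from the runtime floor note (0.79.0).
const PI_FLOOR: Version = [0, 79, 0];

// Extract the first "major.minor.patch" triple from text such as `pi --version` output.
function parseVersion(text: string): Version | null {
  const match = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `version` is at or above `floor`; components are compared as numbers,
// so 0.100.0 correctly ranks above 0.79.0 (a string comparison would not).
function meetsFloor(version: Version, floor: Version = PI_FLOOR): boolean {
  for (let i = 0; i < 3; i++) {
    if (version[i] !== floor[i]) return version[i] > floor[i];
  }
  return true;
}
```

A doctor-style caller would treat a `null` parse or a `false` result as a setup failure rather than a warning, which matches the "first-run blocker" wording in the note.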
 
  ## Baseline checklist by inventory section
 
@@ -82,78 +72,16 @@ Re-run the gates below before each release; this table records what the closure
  | Sessions, state, tabs, frames, dialogs, and windows | 20 canonical tokens from baseline section `state-tabs-frames-dialogs`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful summaries/redaction, state artifact handling, sessionless local command planning, managed-session restore, tab target pinning, and close alias cleanup. | Extension-validation stateful matrix, runtime session/resume tests, presentation redaction tests, lifecycle harness. | Supported. External profile/auth state remains operator-owned. |
  | Network, storage, artifacts, diagnostics, and performance | 42 canonical tokens from baseline section `network-storage-artifacts-diagnostics`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage), diagnostic sections, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus compact diagnostics, route-mock warnings, useful-but-redacted storage output, stream idempotency normalization, artifact metadata, missing-ffmpeg warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix and safe real-upstream coverage for network/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Environment-sensitive operations need suitable local/browser state. |
  | Batch, auth, confirmations, setup, dashboard, devices, and AI commands | 24 canonical tokens from baseline section `batch-auth-setup-ai`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-devices-and-setup), README security notes, release docs. | Native-tool batch stdin, generated `job`/`qa`/lookup batch plans, auth/confirmation redaction, sessionless local auth/setup/dashboard/doctor planning, timeout/cleanup guidance. | Unit/fake batch/auth/confirmation/dashboard/chat/doctor tests; extension-validation for structured input modes; efficiency benchmark scenarios. | Supported. Interactive side-effecting setup/auth/chat remains upstream-owned. |
- | Global flags, config, providers, policy, and environment | 117 canonical tokens from baseline section `options-and-env`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles command discovery, value-flag prevalidation, launch-scoped flags, redacted echoes, fresh-session recovery hints, explicit sessions, provider/device launch-scoping, curated env forwarding, subprocess completion, and package-owned Pi-scoped config for optional companion features. | Runtime tests for flags/planning/redaction/session behavior; process tests for env and stdio-linger completion; config/web-search/CLI tests; fake provider/specialized-skill matrix; package doctor. | Supported. Provider clouds, iOS/Appium, proxies, profiles, and credentials require external setup. |
+ | Global flags, config, providers, policy, and environment | 120 canonical tokens from baseline section `options-and-env`; see [`scripts/agent-browser-capability-baseline.mjs`](../scripts/agent-browser-capability-baseline.mjs) and generated [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment). | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles command discovery, value-flag prevalidation, launch-scoped flags, redacted echoes, fresh-session recovery hints, explicit sessions, provider/device launch-scoping, curated env forwarding, subprocess completion, and package-owned Pi-scoped config for optional companion features. | Runtime tests for flags/planning/redaction/session behavior; process tests for env and stdio-linger completion; config/web-search/CLI tests; fake provider/specialized-skill matrix; package doctor. | Supported. Provider clouds, iOS/Appium, proxies, profiles, and credentials require external setup. |
 
  ## Follow-up decision after closure
 
- Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`, first-class Electron lifecycle/probe support, and optional Exa/Brave-backed companion web search are shipped.
-
- `RQ-0121` adds Pi-scoped package config plus optional Exa/Brave web search without turning search into an `agent_browser` input mode. Config lives at `~/.pi/config/pi-agent-browser-native/config.json`, `.pi/config/pi-agent-browser-native/config.json`, or an explicit `PI_AGENT_BROWSER_CONFIG` override, with global → project → override merge order and `EXA_API_KEY` / `BRAVE_API_KEY` as fallbacks only when no config credential source exists for that provider. `webSearch.exaApiKey` and `webSearch.braveApiKey` support Pi model/provider-style literal, `$ENV_VAR` / `${ENV_VAR}`, escape, and `!command` values in trusted global/override config; project-local config rejects plaintext, custom env aliases, interpolation-literal, malformed, and command-backed keys and allows only matching provider env refs. `webSearch.enabled: false` disables the tool even when environment keys exist after the final config merge: global disable is a user default, project disable is repo-scoped, and `PI_AGENT_BROWSER_CONFIG` disable wins for a hard per-run off switch. `webSearch.preferredProvider` chooses the default provider when both credentials resolve; Exa is the default because `/search` with highlights is the more agent-oriented result shape. `agent_browser_web_search` registers only when a usable credential source is available, resolves command secrets lazily at execution, calls Exa `/search` with highlights or Brave Search, returns compact normalized result details, and never exposes the key. Browser default profile/executable config records conservative prompt guidance only from trusted global or explicit override config; project-local browser config is not trusted to steer host profile/executable prompt guidance, and no config auto-injects launch args. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#optional-companion-web-search); human workflow: README optional package config and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#optional-package-config-and-companion-web-search); implementation: `extensions/agent-browser/lib/config.ts`, `extensions/agent-browser/lib/web-search.ts`, `scripts/config.mjs`, and conditional registration in `extensions/agent-browser/index.ts`; fake coverage: `test/agent-browser.config.test.ts`, `test/agent-browser.web-search.test.ts`, and `test/agent-browser.config-cli.test.ts`.
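
The merge order and environment fallback described for `RQ-0121` can be sketched as follows. This is an illustration under stated assumptions, not the code in `extensions/agent-browser/lib/config.ts`: `WebSearchConfig`, `mergeWebSearch`, and `resolveExaSource` are hypothetical names, and the sketch ignores `$ENV_VAR` / `!command` resolution and the project-local rejection rules.

```typescript
// Hypothetical sketch of the merge order described above; not the code in config.ts.
interface WebSearchConfig {
  enabled?: boolean;
  exaApiKey?: string;
  braveApiKey?: string;
}

// Later layers win: global, then project, then the explicit override file.
function mergeWebSearch(
  globalLayer: WebSearchConfig,
  projectLayer: WebSearchConfig,
  overrideLayer: WebSearchConfig,
): WebSearchConfig {
  return { ...globalLayer, ...projectLayer, ...overrideLayer };
}

// The environment key is a fallback only when the merged config names no
// credential source for the provider; a final `enabled: false` wins over both.
function resolveExaSource(
  merged: WebSearchConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  if (merged.enabled === false) return undefined;
  return merged.exaApiKey ?? env.EXA_API_KEY;
}
```

Because the override layer is spread last, an `enabled: false` in the override file acts as the per-run off switch even when a lower layer or the environment supplies a key.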
-
- `RQ-0122` tracks the private shared-session finding reported by @deepakness: agents should not fan out web searches until provider rate limits are hit, and browser/profile config failures should lead to diagnostics plus user-facing setup recommendations instead of repeated failing opens. The wrapper now serializes companion web-search calls with a small request gate, adds agent-visible 429 guidance, rejects `--session-mode` inside native `args` in favor of top-level `sessionMode`, surfaces profile/user-data-dir failures with `inspect-browser-profiles` and `run-agent-browser-doctor` next actions, treats `--executable-path` as launch-scoped for active implicit sessions, and adds conservative config/prompt/docs support for non-default Chromium-compatible browser executables. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode); human workflow: README authenticated/profile workflows and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#switch-from-an-already-active-implicit-session-to-a-fresh-profiled-or-alternate-browser-launch); implementation: `extensions/agent-browser/lib/web-search.ts`, `extensions/agent-browser/lib/config-policy.js`, `extensions/agent-browser/lib/results/presentation/browser-profile-recovery.ts`, `extensions/agent-browser/lib/results/presentation/errors.ts`, `extensions/agent-browser/lib/launch-scoped-flags.ts`, `extensions/agent-browser/lib/runtime.ts`, `extensions/agent-browser/lib/config.ts`, `scripts/config.mjs`, and `extensions/agent-browser/lib/playbook.ts`; fake coverage: `test/agent-browser.web-search.test.ts`, `test/agent-browser.presentation-skills-recovery.test.ts`, `test/agent-browser.results.test.ts`, `test/agent-browser.runtime.test.ts`, `test/agent-browser.config.test.ts`, `test/agent-browser.config-cli.test.ts`, and prompt-guidance assertions in `test/agent-browser.extension-validation.test.ts`.
-
- `RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification). Wrapper-tracked packaged Electron no-candidate results now add bounded `workspaceRoot` / `electronContext` when available, limitations that the scan only covers the Pi cwd and does not unpack installed app resources or `app.asar`, and live Electron `snapshot` / `probe` / `tab list` next actions. Fake coverage: `agentBrowserExtension explains packaged Electron sourceLookup no-candidate boundaries` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
-
- `RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.
-
- `RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
-
- `RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Route-mock diagnostics (#73) now track successful `network route` / `network unroute` patterns per session and, on later `network requests`, surface `details.networkRouteDiagnostics` plus executable `inspect-routed-network-request` and `start-network-har-capture-for-route-mock` follow-ups when a matching fetch/XHR row is failed, pending, or CORS/preflight-looking; same-origin/CORS fixture guidance stays in prose rather than a non-runnable next action. Compact network previews hide `data:image` screenshot/artifact rows by default while preserving raw rows in `details.data.requests`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews`, `buildToolPresentation returns bounded network request next actions for benign and successful API rows`, `buildToolPresentation adds routed pending network diagnostics`, `buildToolPresentation flags routed requests that return failed statuses`, and `agentBrowserExtension reports unfulfilled routed network mocks`.
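
The "at most one safe request ID" selection order in `RQ-0095` can be sketched as a chain of fallbacks. Everything here is hypothetical: the `RequestRow` shape and its boolean flags are assumptions made for illustration, and the real classification in `extensions/agent-browser/lib/results/presentation/diagnostics.ts` is not shown in this document.

```typescript
// Hypothetical sketch of the selection order described above; not the code in diagnostics.ts.
interface RequestRow {
  id: string;
  safeId: boolean;     // assumed flag: the ID is safe to echo in a next action
  failed: boolean;     // assumed flag: the request failed
  actionable: boolean; // assumed flag: the failure is worth investigating
  apiLike: boolean;    // assumed flag: API/fetch-like request
}

// Priority: actionable failed row, then API/fetch-like row, then any failed row
// (the benign case), then the first safe ID. Rows with unsafe IDs are never picked.
function pickRequestId(rows: RequestRow[]): string | undefined {
  const safe = rows.filter((row) => row.safeId);
  const pick =
    safe.find((row) => row.failed && row.actionable) ??
    safe.find((row) => row.apiLike) ??
    safe.find((row) => row.failed) ??
    safe[0];
  return pick?.id;
}
```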
-
- `RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
-
- `RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance keeps stop-before-order/post/purchase/submit as agent-responsibility guidance and exact-artifact-before-close as a wrapper-checkable contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).
-
- `RQ-0090` keeps prompt-derived preflight guards limited to machine-checkable exact artifact requirements. `buildPromptPolicy` in `extensions/agent-browser/lib/prompt-policy.ts` extracts exact requested artifact paths from the latest user message; `prompt-guards.ts` blocks browser `close` / `quit` / `exit` with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until required prompt screenshot paths are verified in `details.artifactManifest` (optional recording paths are required only when recording appears available). The wrapper intentionally does not infer broad business/user intent from prompt text such as “stop before checkout” or “do not post anything”; those stop boundaries remain agent responsibility and are documented as guidance, not runtime action blocks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`promptGuard`); human workflow: README stop-boundary/artifact notes and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md); fake coverage: `agentBrowserExtension does not turn prompt stop-boundary text into click blocks`, `agentBrowserExtension blocks close until required prompt screenshot artifacts are saved`, and `buildPromptPolicy detects requested artifact paths without deriving semantic action blockers`.
-
- `RQ-0097` keeps upstream subprocess completion reliable when detached descendants inherit the child’s stdio handles: `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts` uses `watchSpawnedChildCompletion` to observe both Node `exit` and `close`, leaves piped stdio intact during the short post-`exit` grace (`EXIT_STDIO_GRACE_MS`, currently **100 ms**) so normal `close` can still win, destroys those streams only if the fallback resolves, and resolves with exit-code precedence `close` → wrapper timeout (**124**) → post-`exit` fallback for the direct child → spawn failure (**127**) when `close` is still delayed so the Pi tool cannot hang after `agent-browser` has already exited. Human context: [`ARCHITECTURE.md`](ARCHITECTURE.md#direct-subprocess-execution) (subprocess bullet) and [`AGENTS.md`](../AGENTS.md) (**Runtime planning** → **Upstream subprocess completion**); fake coverage: `runAgentBrowserProcess resolves after exit when descendants keep stdio handles open` asserts the post-exit fallback returns near the 100 ms grace window instead of the process timeout, and `runAgentBrowserProcess returns timeout exit code when descendants keep stdio handles open` in [`test/agent-browser.process.test.ts`](../test/agent-browser.process.test.ts).
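
The exit-code precedence stated for `RQ-0097` (`close`, then wrapper timeout 124, then post-`exit` fallback, then spawn failure 127) can be sketched as a pure function over the observed signals. This is a hedged illustration, not the code in `extensions/agent-browser/lib/process.ts`: `CompletionSignals` and `resolveExitCode` are hypothetical names, and the sketch skips `null` codes (signal-terminated children) because the document does not say how those are handled.

```typescript
// Hypothetical sketch of the exit-code precedence described above; not the code in process.ts.
interface CompletionSignals {
  closeCode?: number | null; // code from the Node `close` event, if it fired
  timedOut?: boolean;        // the wrapper timeout elapsed
  exitCode?: number | null;  // code from the Node `exit` event for the direct child
  spawnFailed?: boolean;     // the child could not be spawned
}

const TIMEOUT_EXIT_CODE = 124;       // value stated in the RQ-0097 note
const SPAWN_FAILURE_EXIT_CODE = 127; // value stated in the RQ-0097 note

// Walk the precedence list in order and return the first code that applies.
function resolveExitCode(signals: CompletionSignals): number | undefined {
  if (typeof signals.closeCode === "number") return signals.closeCode;
  if (signals.timedOut) return TIMEOUT_EXIT_CODE;
  if (typeof signals.exitCode === "number") return signals.exitCode;
  if (signals.spawnFailed) return SPAWN_FAILURE_EXIT_CODE;
  return undefined; // nothing observed yet
}
```

Keeping the precedence in one function makes the ordering testable without spawning a real child; the timing side (the 100 ms post-`exit` grace) would live in whatever code gathers the signals.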
-
- `RQ-0096` ships first-class Electron desktop-app support without adding a generic recipe runtime: top-level `electron` covers wrapper-owned `list`, isolated `launch` with snapshot/tabs/connect handoff, `status`, `cleanup`, and compact current-session or launch-scoped `probe`; `qa.attached` extends the existing QA preset for attached Electron/CDP sessions without introducing `electron.qa`. `launch.handoff` still defaults to `"snapshot"`, while `handoff: "tabs"` is documented as the safer diagnostic starting point when refs/content capture is not needed yet. Host install discovery (`discoverElectronApps`) is macOS/Linux-only today: on Windows `electron.list` reports `platform: "unsupported"` with an empty catalog and name/bundle targets cannot resolve from scans—use `executablePath` (or a host path to the Electron binary) for Windows launch targeting. Discovery adds non-blocking likely-sensitive app annotations plus visible isolated-profile/auth-state warnings; launch output and `details.electron.profileIsolation` state that wrapper launches do not reuse existing signed-in app profiles or attach to already-running authenticated apps, and point agents to the host debug-port launch plus raw `connect` path when signed-in local app state is the goal; launch timeout failures include PID/profile/DevToolsActivePort/timing diagnostics; status/probe add launch/session identifiers, liveness, mismatch/reattach next actions, and dead-launch context for `about:blank`; post-mutation Electron death is upgraded to `tab-drift` with `details.electronPostCommandHealth`; Electron fills can add `details.fillVerification`; Electron `@e…` mutations can add same-URL ref freshness guidance; broad Electron `get text` selectors add scope warnings; cleanup ownership is bounded to wrapper-created launch records and temp profiles; externally launched debug ports stay on the manual `args: ["connect", "<port-or-url>"]` path and remain host-owned. 
Runtime-owned off-branch launch records remain visible to `electron.status { launchId }`, `electron.status { all: true }`, `electron.probe { launchId }`, and `electron.cleanup`; default current-session `electron.probe` stays scoped to the active managed session, and no-arg status/cleanup reports ambiguity when multiple active branch/off-branch records are still owned. Explicit cleanup is serialized with managed-session work, records managed-session close success independently from partial process/profile cleanup, clears live/restore managed-session state for the closed wrapper session, updates branch-visible Electron state only with selected cleanup records instead of unrelated off-branch lookup records, and rotates the next default auto browser call away from that closed name. `/reload` preserves the current branch-visible active Electron launch and its isolated temp `userDataDir` for continuity, cleans off-branch owned Electron launches before clearing process-local ownership, and durably protects profile dirs from generic temp cleanup, quit cleanup, process-exit cleanup, and stale temp-root pruning after restart when partial Electron cleanup deliberately skips or fails `user-data-dir` removal. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) plus [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) for `qa.attached`; human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps) and README common calls; implementation: `extensions/agent-browser/lib/input-modes/electron.ts`, `extensions/agent-browser/lib/orchestration/electron-host/`, `extensions/agent-browser/lib/orchestration/browser-run/`, `extensions/agent-browser/lib/electron/`, and dispatch/state wiring in `extensions/agent-browser/index.ts`; deterministic efficiency evidence: `electron-lifecycle` and `electron-probe` in `scripts/agent-browser-efficiency-benchmark.mjs`; fake coverage includes Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity and packaged-sourceLookup cases in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus off-branch Electron status/probe/cleanup, targeted cleanup without unrelated branch promotion, explicit cleanup serialization, current and restored cleanup managed-session retirement, active-Electron reload preservation, off-branch Electron reload cleanup, durable partial off-branch reload/quit profile preservation, protected temp-root process-exit and stale-prune cleanup, and partial-cleanup managed-session untracking in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts). This plan is the `RQ-0068` revisit evidence for Electron specifically: [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md) documents repeated failure-prone discover/launch/attach/cleanup and multi-call state-probe sequences, plus bounded owner/versioning/test/docs artifacts.
-
- `RQ-0097` completes manual CDP attach recovery without making manually launched apps wrapper-owned: successful raw `connect` results append the session-scoped safe tab-list action `list-connected-session-tabs`; `snapshot -i` failures whose upstream error says `No active page` append the safe tab-list action `list-tabs-after-no-active-page` when a session is known. Agents then choose a stable `tab t<N>` target and run `snapshot -i` explicitly; the wrapper does not emit raw-connect or no-active-page snapshot retry ids without a wrapper-observed safe tab id. The runtime source of truth for these recovery ids is `AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS` in `extensions/agent-browser/lib/results/recovery-actions.ts` (re-exported from `shared.ts`). The guidance keeps manual signed-in desktop apps and explicit artifacts host-owned while `close` remains a browser/CDP-session close and `electron.cleanup` remains limited to wrapper-created `electron.launch` records. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`ELECTRON.md`](ELECTRON.md#manual-host-launch-pattern) and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps); fake coverage: raw connect and no-active snapshot assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus central next-action helper coverage in [`test/agent-browser.results.test.ts`](../test/agent-browser.results.test.ts).
-
- `RQ-0068` remains closed with a no-adopt decision for a reusable named browser recipe runtime. The Electron evidence above justified a narrow typed shorthand and compact probe, not an open-ended recipe layer; future reusable recipes still require concrete repeated workflow evidence and a defined owner/versioning/test plan.
-
- `RQ-0098` completes the docs/playbook groundwork for desktop readiness and wait orchestration without adding a runtime primitive or reusable recipe layer. The accepted ladder is: prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`) when a real condition exists; after raw manual CDP `connect`, inspect `tab list`, select a stable `tab t<N>` surface, then run a condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` when launch health or target mismatch matters; use `qa.attached` for current-session text/selector diagnostics; keep fixed waits as a last resort below the wrapper IPC budget; and treat fixed-wait payloads such as `"waited":"timeout"` as elapsed time rather than completion evidence. Manual signed-in attach docs now also restate that `connect` readiness is not immediate readiness, close commands (`close`, `quit`, or `exit`) only close the browser/CDP session, `electron.cleanup` remains wrapper-owned, and manually launched apps plus explicit artifacts stay host-owned. Human workflow: [`ELECTRON.md`](ELECTRON.md#readiness-and-waits), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#wait-for-page-readiness-or-downloads), README Electron section, and generated playbook text from `extensions/agent-browser/lib/playbook.ts`. Revisit a first-class host-idle primitive only with repeated desktop smoke evidence that condition waits, `qa.attached`, `electron.probe`, snapshots, and screenshots cannot cover the workflow. Verification: `npm run docs` keeps generated playbook fragments aligned; no runtime `details.nextActions` are part of this RQ.
-
- `RQ-0100` makes desktop tab/surface drift recovery machine-readable without adding routine tab-list probes for normal clicks. When existing wrapper state already identifies a target tab, about:blank and tab-drift paths append `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, then `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the stable `t<N>` id is known. The implementation reuses `priorSessionTabTarget`, `aboutBlankSessionMismatch`, `sessionTabCorrection`, `openResultTabCorrection`, and existing tab-correction outputs; it does not probe tabs for ordinary clicks beyond the RQ-0086-gated drift paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#tabs) and [`ELECTRON.md`](ELECTRON.md#troubleshooting); fake coverage: about:blank recovery and explicit-about:blank negatives in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts), early tab-drift failure assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), and central next-action helper coverage in [`test/agent-browser.results.test.ts`](../test/agent-browser.results.test.ts).
-
- `RQ-0099` makes semantic fill misses on host-controlled rich inputs recoverable without changing upstream `find` semantics or adding a recipe runtime. Active-session role/name `semanticAction.fill` first gets a guarded pre-execution current-ref pass: one fresh `snapshot -i`, one exact editable `combobox` / `searchbox` / `textbox` match, then direct `fill @ref <text>` while preserving the original semantic target in `details.compiledSemanticAction`. When a later `selector-not-found` recovery already collected an exact current editable `searchbox` / `textbox` ref, `extensions/agent-browser/lib/results/selector-recovery.ts` defines `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions; `extensions/agent-browser/index.ts` only probes the current session snapshot and merges the result. Those next actions never copy the fill text and never press `Enter` or submit; agents should refresh refs, choose the current editable `@ref`, focus/click it, then use `keyboard inserttext` or `keyboard type` with the intended text only after the right input is focused. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: README locator shorthand, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), and generated playbook text from `extensions/agent-browser/lib/playbook.ts`; fake coverage: `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and `agentBrowserExtension returns rich input recovery when semanticAction fill misses current editable refs` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
+ Detailed closure notes live in the repository source at [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) under the follow-up decision section. Keep this section as the active index of shipped follow-up areas and their canonical contracts.
 
- `RQ-0101` improves compact snapshot usefulness for dense desktop host screens without adding a new mode or dumping all refs inline. `extensions/agent-browser/lib/results/snapshot.ts` still emits the existing visible `Omitted high-value controls` section and `details.data.highValueControlRefIds`, while `snapshot-high-value-controls.ts` selects omitted controls with bounded diversity so editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and high-signal named links such as repository search results remain discoverable even when many utility buttons and dense host rows compete for the trimmed ref budget. Human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#snapshot-refs-and-current-page-state), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and README; fake coverage: `buildToolPresentation keeps dense desktop host high-value controls discoverable in compact snapshots` in [`test/agent-browser.snapshot-presentation.test.ts`](../test/agent-browser.snapshot-presentation.test.ts).
-
- `RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `click`+`text` (`try-button-name-candidate` and `try-link-name-candidate`). Other locator/action pairs omit this block; fill recovery now goes through the RQ-0099 current-editable-ref ladder so candidate nextActions do not repeat fill text. `semanticAction` `select` uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`; `semanticAction.uncheck` is intentionally not exposed while upstream `find … uncheck` is not runtime-supported, and raw `uncheck <selector-or-ref>` remains available. Active-session role/name click/check/fill shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; fill requires one exact editable current ref. The original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: semantic selector-miss assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus current-ref assertions and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` / `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
-
- `RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, direct selector/ref commands, or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
-
- `RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, then `extensions/agent-browser/lib/results/selector-recovery.ts` looks for exact normalized role/name matches for the failed target and emits `details.visibleRefFallback` plus visible `Current snapshot ref fallback`. Non-fill matches append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`); fill matches omit direct args/text and feed the RQ-0099 rich-input recovery path when the ref is editable. The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
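The narrow matcher described above (exact normalized role/name, no prefix or fuzzy matching, ambiguity flagged on duplicates) might look roughly like this. The `VisibleRef` shape, the `findExactRefMatches` name, and the normalization rule (trim, collapse whitespace, lowercase) are assumptions for illustration; the real rules live in `selector-recovery.ts`.

```typescript
// Hypothetical shape for one ref row from a `snapshot -i` result.
interface VisibleRef {
  id: string;
  role: string;
  name: string;
}

// Assumed normalization: trim, collapse internal whitespace, lowercase.
function normalizeName(value: string): string {
  return value.trim().replace(/\s+/g, " ").toLowerCase();
}

// Return only exact role/name matches and flag duplicates as ambiguous.
function findExactRefMatches(refs: VisibleRef[], role: string, name: string) {
  const wantedRole = role.toLowerCase();
  const wantedName = normalizeName(name);
  const matches = refs.filter(
    (ref) => ref.role.toLowerCase() === wantedRole && normalizeName(ref.name) === wantedName,
  );
  return { matches, ambiguous: matches.length > 1 };
}
```

A target such as role `button` with name `Save` would match a `Save` button but not `Save draft`, which is the point of ignoring prefixes.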
-
- `RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/lib/session-page-state.ts` replays per-session snapshots and `refSnapshotInvalidation` markers from the active transcript branch on `session_start` and Pi 0.78 `session_tree` branch changes, clears them on successful close commands (`close`, `quit`, or `exit`), invalidates prior refs when a session `snapshot` fails with `No active page`, rejects mutation-prone ref argv before spawn when the tab URL diverges, a ref id is missing from the latest snapshot, or the session refs are invalidated, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). The entrypoint also serializes `session_tree` restore and wrapper-owned browser commands with managed-session work, guards independent caller-owned explicit-session completions with a branch-state generation check, keeps process-owned cleanup registries for managed sessions and wrapper-launched Electron records separate from the branch-visible view, treats explicit wrapper-owned close rows and Electron cleanup managed-session steps as restore-visible close events, closes off-branch owned managed sessions and Electron launches on non-quit reload shutdown, preserves current branch-visible active managed/Electron sessions and active Electron temp profiles for reload continuity, and preserves fresh-session allocation monotonicity across branch restores. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension recommends tab recovery after No active page snapshot failures` and `agentBrowserExtension invalidates refs after No active page snapshot failures inside batch` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus `agentBrowserExtension blocks page-scoped ref reuse…`, `…rehydrates page-scoped refs from the current tree branch`, `…rehydrates managed browser session state from the current tree branch`, `…rehydrates artifact manifest state from the current tree branch`, `…keeps Electron cleanup ownership after session_tree switches away from the launch branch`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows same-snapshot form control batches before a hard invalidating click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts); managed-session reload cleanup, explicit close untracking/state rotation/restore, generated fresh-name reservation after repeated explicit closes, explicit-session command versus `session_tree` generation-guard coverage, explicit close versus in-flight implicit command serialization, and fresh-ordinal coverage lives in [`test/agent-browser.resume-state.test.ts`](../test/agent-browser.resume-state.test.ts).
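As a rough sketch of the pre-spawn ref check described above, the three rejection conditions (invalidated refs, diverged tab URL, ref id missing from the latest snapshot) can be expressed as one function. The `RefSnapshotState` shape, the reason strings, and the bare `e1`-style ids are hypothetical; fragment-insensitive URL comparison is the only normalization assumed here.

```typescript
// Hypothetical per-session state recorded from the latest successful snapshot.
interface RefSnapshotState {
  url: string;
  refIds: string[];
  invalidated: boolean;
}

type RefPreflight =
  | { ok: true }
  | { ok: false; reason: "refs-invalidated" | "url-diverged" | "ref-missing" };

// Drop the `#fragment` so same-page anchor changes do not count as divergence.
function stripFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// Decide, before spawning upstream, whether the requested refs are still safe to use.
function checkRefPreflight(state: RefSnapshotState, currentUrl: string, refs: string[]): RefPreflight {
  if (state.invalidated) return { ok: false, reason: "refs-invalidated" };
  if (stripFragment(state.url) !== stripFragment(currentUrl)) return { ok: false, reason: "url-diverged" };
  for (const ref of refs) {
    if (state.refIds.indexOf(ref) === -1) return { ok: false, reason: "ref-missing" };
  }
  return { ok: true };
}
```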
-
- `RQ-0087` keeps the RQ-0072 guard but removes safe same-snapshot form work from the batch invalidation latch: `fill @e…` rows and role-checked native form-control rows (`check`/`uncheck` or direct `click`/`tap` on checkbox or radio refs, and `select` on combobox refs) remain guarded against stale/missing refs, yet can run before the first hard click/submit/navigation step in one upstream `batch`. A later guarded ref after `open`, `reload`, non-form `click`/`tap`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first; checkbox/radio clicks are only allowed when every ref in the step has latest-snapshot checkbox/radio role evidence. This improves login/checkout/static-form efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` and `agentBrowserExtension allows same-snapshot form control batches before a hard invalidating click` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
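The batch latch can be pictured as a single pass over the stdin rows. The sketch below is a simplification under stated assumptions: rows are plain argv arrays, ref roles come from a hypothetical `roles` map keyed by `@e…` id, and only `open`, `reload`, and non-form `click`/`tap` are treated as invalidating (the real guard covers more row types).

```typescript
type BatchStep = string[]; // one argv row, e.g. ["fill", "@e3", "alice"]
type RefRoles = Record<string, string>; // hypothetical: latest-snapshot role per ref id

const REF_PATTERN = /^@e\d+$/;

function refsIn(step: BatchStep): string[] {
  return step.filter((arg) => REF_PATTERN.test(arg));
}

// Same-snapshot form work that may run before the first hard invalidating step.
function isSafeFormStep(step: BatchStep, roles: RefRoles): boolean {
  const command = step[0];
  const refs = refsIn(step);
  if (command === "fill") return refs.length > 0;
  if (command === "check" || command === "uncheck" || command === "click" || command === "tap") {
    return refs.length > 0 && refs.every((ref) => roles[ref] === "checkbox" || roles[ref] === "radio");
  }
  if (command === "select") return refs.length > 0 && refs.every((ref) => roles[ref] === "combobox");
  return false;
}

function isInvalidating(step: BatchStep, roles: RefRoles): boolean {
  const command = step[0];
  if (command === "open" || command === "reload") return true;
  if (command === "click" || command === "tap") return !isSafeFormStep(step, roles);
  return false;
}

// Index of the first ref-using step that follows an invalidating step without a
// fresh `snapshot` in between, or -1 when the batch may be spawned.
function firstBlockedStep(steps: BatchStep[], roles: RefRoles): number {
  let latched = false;
  let blocked = -1;
  steps.forEach((step, index) => {
    if (blocked !== -1) return;
    if (step[0] === "snapshot") {
      latched = false;
      return;
    }
    if (latched && refsIn(step).length > 0) {
      blocked = index;
      return;
    }
    if (isInvalidating(step, roles)) latched = true;
  });
  return blocked;
}
```

Under these assumptions, a login batch of two `fill` rows followed by one button `click` passes, while any `@e…` row placed after that click is blocked until a `snapshot` row appears.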
-
- `RQ-0073` surfaces likely overlay blockers in snapshot output and after no-navigation clicks without inventing blind targets. Successful `snapshot` results can emit blocker candidates when their own refs show strong modal evidence. The post-click path applies to **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) that meet three conditions: the upstream JSON includes `data.clicked`; the prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight; and the same unified result did **not** already apply session tab correction, apply about-blank mismatch recovery, or record `details.clickDispatch`. In that case `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i` (or uses the successful snapshot result directly), scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks.
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces overlay blockers in snapshot actionability metadata`, `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
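The ref scan described above reduces to two checks: require a `dialog` / `alertdialog` role somewhere in the snapshot, then keep at most three close/dismiss-style controls. The sketch below assumes a hypothetical `OverlayRef` shape and an illustrative `CLOSE_PATTERN`; the wrapper's actual name patterns are not documented here.

```typescript
// Hypothetical shape for one ref row from a snapshot.
interface OverlayRef {
  id: string;
  role: string;
  name: string;
}

// Illustrative close/dismiss wording only; the real pattern list is an assumption.
const CLOSE_PATTERN = /\b(close|dismiss|no thanks|not now|got it)\b/i;
const CONTROL_ROLES = ["button", "link", "menuitem"];
const MAX_CANDIDATES = 3;

function findOverlayBlockerCandidates(refs: OverlayRef[]): OverlayRef[] {
  // Without strong modal context, report nothing (page-wide banner text is ignored).
  const hasModal = refs.some((ref) => ref.role === "dialog" || ref.role === "alertdialog");
  if (!hasModal) return [];
  return refs
    .filter((ref) => CONTROL_ROLES.indexOf(ref.role) !== -1 && CLOSE_PATTERN.test(ref.name))
    .slice(0, MAX_CANDIDATES);
}
```

Gating on the modal role first is what keeps an ordinary page with a "Close" link in its footer from producing warnings.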
-
- `RQ-0086` reduces wrapper-induced click fragility found during Sauce Demo smokes: navigation-summary enrichment for click/back/forward/reload/dblclick now uses one read-only `eval` (`({ title: document.title, url: location.href })`) instead of serial `get title` plus `get url` probes, including tab-pinned batch wrappers. Tab pinning/post-command tab correction now runs only after the wrapper has evidence of tab-drift risk (profile restore correction, overlapping stale opens, or restored session state), so ordinary same-session clicks no longer get repeated `tab list` probes. This keeps `details.navigationSummary`, overlay blocker checks, and drift recovery intact while avoiding the upstream `agent-browser 0.27.0` sequence that could report later clicks as successful without dispatching pointer/click events after repeated getter/tab/snapshot probes. Fake coverage: `agentBrowserExtension enriches click results with a post-navigation title and url summary` in [`test/agent-browser.extension-tabs.test.ts`](../test/agent-browser.extension-tabs.test.ts), plus `agentBrowserExtension pins the intended tab inside a follow-up command when reconnect drift would otherwise steal focus` and about-blank/tab overlap assertions in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts); manual validation source: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt).
-
- `RQ-0089` investigated Sauce Demo no-op clicks after RQ-0086, and the 2026-05-26 release smoke reproduced the failure against direct upstream `agent-browser 0.27.0`: CSS `click [data-test=add-to-cart-sauce-labs-backpack]` and current `@ref` clicks returned success, but a page-level listener recorded no trusted pointer/mouse/click events and the cart stayed unchanged; an in-page `element.click()` did mutate the cart. The wrapper now adds a bounded top-level non-Electron `click` dispatch probe before standalone clicks. If upstream reports success but no trusted DOM event reached the target, it fails the tool, records `details.clickDispatch.status: "no-native-event-observed"`, and appends `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions; when the probe observes nested-scroll/offscreen evidence, it also records `details.clickDispatch.scrollContainer` and appends `scroll-target-into-view-after-dispatch-miss`. It does **not** replay clicks in-page. This is not site-specific and does not alter `batch`/`job`/`qa` click steps. For `@e…` refs, the probe uses role/name metadata persisted in `details.refSnapshot` from the latest snapshot instead of running a pre-click snapshot that could recycle upstream refs. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`clickDispatch`, `refSnapshot`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; fake coverage: `agentBrowserExtension reports click dispatch diagnostic when upstream reports success without dispatching DOM events` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
-
- `RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
-
- `RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from `shared.ts`) split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Because real Pi ignores returned `isError` fields from custom tool `execute`, `extensions/agent-browser/index.ts` also realigns `details.resultCategory: "failure"` outcomes to Pi-visible tool errors through a `tool_result` handler; it appends the exact failure category plus `Pi tool isError: true` to prose output and preserves caller-requested `--json` output as parseable JSON while patching `isError`. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts); model-free real-Pi pipeline coverage in [`test/agent-browser.pi-pipeline.test.ts`](../test/agent-browser.pi-pipeline.test.ts) asserts both in-memory and persisted JSONL tool results for QA prose patching, parseable caller-requested `--json` failures, and strict public-schema rejection before upstream spawn; `npm run verify -- lifecycle` asserts the QA failure-patch line in a saved JSONL session.
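One possible reading of the actionable/benign split is sketched below. It is not the code in `network.ts`: the `FailedRequestRow` shape, the `classifyFailureSketch` and `qaShouldFail` names, and the exact regexes are assumptions, and the input rows are assumed to already count as failed. In particular, treating a present but non-image `mimeType` as actionable is only one interpretation of "image-like when present".

```typescript
// Hypothetical row shape; real upstream network rows may carry other fields.
interface FailedRequestRow {
  url: string;
  status?: number | string;
  error?: string;
  resourceType?: string;
  mimeType?: string;
}

type FailureImpact = "actionable" | "benign";

// Last path segment of a URL with any query string or fragment removed.
function urlBasename(url: string): string {
  const path = url.replace(/[?#].*$/, "");
  return path.slice(path.lastIndexOf("/") + 1).toLowerCase();
}

function classifyFailureSketch(row: FailedRequestRow): FailureImpact {
  const iconName = /^(favicon|apple-touch-icon)/.test(urlBasename(row.url));
  const failedSignal = row.status === 404 || row.status === "failed" || typeof row.error === "string";
  // Type hints only count against "benign" when they are present and not image-like.
  const imageLike =
    (row.resourceType === undefined || /image/i.test(row.resourceType)) &&
    (row.mimeType === undefined || /^image\//i.test(row.mimeType));
  return iconName && failedSignal && imageLike ? "benign" : "actionable";
}

// QA fails only when at least one failed row is actionable.
function qaShouldFail(rows: FailedRequestRow[]): boolean {
  return rows.some((row) => classifyFailureSketch(row) === "actionable");
}
```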
-
- `RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size/state checks. Planned steps include `completed` / `failed` / `pending` / `unknown` statuses, the first incomplete step is exposed as `retryStep`, and `details.nextActions` can include `retry-timeout-step` with the exact retry payload; opened pages recovered after a post-open hang set `openedButPostOpenTimedOut`. The visible `Timeout partial progress` block repeats the same redacted URLs/paths and retry guidance. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
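Selecting `retryStep` from the planned steps is straightforward once each step carries a status. The sketch below assumes that "incomplete" means any status other than `completed`; the `PlannedStep` shape and `pickRetryStep` name are hypothetical.

```typescript
type StepStatus = "completed" | "failed" | "pending" | "unknown";

// Hypothetical planned-step record built from a compiled job/qa list or batch stdin.
interface PlannedStep {
  index: number;
  args: string[];
  status: StepStatus;
}

// The first step that is not known to have completed is the retry candidate.
function pickRetryStep(steps: PlannedStep[]): PlannedStep | undefined {
  return steps.filter((step) => step.status !== "completed")[0];
}
```

For a plan of `open` (completed), `click` (unknown), `screenshot` (pending), this picks the `click` step, which matches the idea of resuming at the first point where progress is uncertain.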
-
- `RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts) and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
-
- `RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation/errors.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries a plain empty-object `data.result`; empty arrays such as `[]` are valid eval results and are not warned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
-
- `RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts` builds `details.artifactCleanup`, surfaced by process-output with visible `Artifact lifecycle` copy on successful close commands (`close`, `quit`, or `exit`) when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close commands do not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts), plus close-alias unit coverage in [`test/agent-browser.runtime.test.ts`](../test/agent-browser.runtime.test.ts) and [`test/agent-browser.session-page-state.test.ts`](../test/agent-browser.session-page-state.test.ts).
-
- `RQ-0080` adds no-op scroll recovery for dense dashboards and nested panes, plus click-dispatch nested-scroll recovery when a click target appears outside its scroll container: for successful top-level `scroll`, `extensions/agent-browser/index.ts` samples viewport and prominent scroll-container positions before and after execution with read-only session-scoped `eval --stdin` probes. If no sampled position changes, it emits `details.scrollNoop`, appends visible `Scroll diagnostic: no observed scroll movement`, appends exact `inspect-after-noop-scroll` / `verify-noop-scroll-visually` next actions, and updates `pageChangeSummary.nextActionIds` so agents can branch without parsing prose. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`scrollNoop`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) scroll note; fake coverage: `agentBrowserExtension reports no-op scroll diagnostics with recovery next actions`.
-
- `RQ-0081` adds focused-combobox recovery for dense dashboard controls: after successful explicit combobox-targeted actions (for example `semanticAction` role `combobox` click), `extensions/agent-browser/index.ts` runs a read-only focused-element probe and emits `details.comboboxFocus` plus visible `Combobox diagnostic` text when a combobox-like control is focused, has explicit `aria-expanded` state, and no visible listbox/options are open. It appends exact `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter` next actions, all session-prefixed when applicable. The probe is gated to explicit combobox targets to avoid ordinary-click false positives and preserves the original combobox semantic target even when active-session visible-ref resolution rewrites execution to `click @ref`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`comboboxFocus`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) combobox note; fake coverage: `agentBrowserExtension reports focused combobox diagnostics with option-opening next actions` and `agentBrowserExtension preserves combobox diagnostics after semanticAction visible-ref resolution`.
-
- `RQ-0082` adds early recording dependency warnings: after successful `record start` / `record restart`, `extensions/agent-browser/index.ts` checks whether executable `ffmpeg` is visible on the Pi process `PATH`. If not, it emits non-blocking `details.recordingDependencyWarning` plus visible `Recording dependency warning: ffmpeg not found on PATH` text so agents can install `ffmpeg` or fix PATH before `record stop` needs to encode the WebM. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`recordingDependencyWarning`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) recording notes and README dependency table; fake coverage: `agentBrowserExtension warns after record start when ffmpeg is missing`.
-
- `RQ-0083` documents a repeatable public Grafana stress checklist in [`RELEASE.md`](RELEASE.md#public-grafana-stress-checklist) instead of bundling private dogfood/VFR skills or adding a recipe runtime. The checklist uses Grafana Play Node Exporter Full to manually exercise dense snapshots, no-op scroll diagnostics, combobox recovery, screenshots/artifacts, optional short recording, network/console/error summaries, and cleanup. Treat known Grafana Play noise (analytics/Sentry requests, public-demo 403s, console errors) as site noise unless the wrapper leaks secrets, hides actionable rows, mishandles artifacts, or suggests unsafe follow-ups. Evidence should be a short release note or CueLoop task, not committed `.dogfood/` outputs, raw HARs, videos, or private scripts. Validation on 2026-05-15 used the native tool against Grafana Play: fresh open, dense `snapshot -i`, scroll, combobox semantic click, screenshots with verified artifacts, `network requests`, `console`, `close`, and host cleanup of `/tmp/pi-agent-browser-grafana-rq0083*.png`; observed 11 public-demo 403 request rows and Grafana console noise as expected site noise.
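For illustration, the `RQ-0078` getter matching described in the removed notes above could look roughly like the TypeScript sketch below. The function name `suggestGroupedGetter`, the `GetterHint` and `NextAction` shapes, the hint wording, and the placement of `--session` before the command are assumptions made for this example. Only the three matched phrases, the seven getter tokens, the `use-get-title` / `use-get-url` ids, and the `Agent-browser hint:` skip rule come from the text. It is a sketch under those assumptions, not the code in `errors.ts`.

```typescript
// Hypothetical sketch: names and result shapes are illustrative only.
const GETTER_TOKENS = new Set(["attr", "count", "html", "text", "title", "url", "value"]);

// Case-insensitive match on the three upstream failure phrases named in RQ-0078.
const UNKNOWN_COMMAND_PATTERN = /unknown command|unknown subcommand|unrecognized command/i;

interface NextAction {
  id: string;
  args: string[];
}

interface GetterHint {
  text: string;
  nextActions: NextAction[];
}

function suggestGroupedGetter(
  commandToken: string,
  errorText: string,
  sessionName?: string,
): GetterHint | undefined {
  // Only the listed getter shortcuts qualify.
  if (!GETTER_TOKENS.has(commandToken)) return undefined;
  if (!UNKNOWN_COMMAND_PATTERN.test(errorText)) return undefined;
  // Selector recovery already injected a hint, so skip the getter block.
  if (errorText.includes("Agent-browser hint:")) return undefined;

  // Assumed wording; the real prose is not quoted in the notes.
  const text = `Try the grouped getter: get ${commandToken}`;

  const nextActions: NextAction[] = [];
  // Only title and url also emit a read-only next action.
  if (commandToken === "title" || commandToken === "url") {
    // Assumed argument order: session flag first, then the grouped getter.
    const sessionArgs = sessionName ? ["--session", sessionName] : [];
    nextActions.push({
      id: `use-get-${commandToken}`,
      args: [...sessionArgs, "get", commandToken],
    });
  }
  return { text, nextActions };
}
```

Under these assumptions, `suggestGroupedGetter("title", "Unknown command: title", "s1")` yields one `use-get-title` action carrying the session flag, while `suggestGroupedGetter("text", "unrecognized command")` yields prose only and an empty `nextActions` list.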
+ | Area | Active contract | Detail archive |
+ | --- | --- | --- |
+ | Native structured input modes (`job`, `qa`, `sourceLookup`, `networkSourceLookup`, `semanticAction`) | [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | Electron lifecycle, manual CDP attach, desktop readiness, and tab/surface recovery | [`ELECTRON.md`](ELECTRON.md), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps) | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | Ref lifecycle, click dispatch, selector recovery, rich inputs, and dense snapshots | [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), README pitfalls | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | Diagnostics, artifacts, QA/network classification, timeout recovery, scroll/combobox/recording guidance | [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md), [`RELEASE.md`](RELEASE.md) | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |
+ | Package config and optional web search | [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#optional-companion-web-search), README optional package config, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#optional-package-config-and-companion-web-search) | [`docs/support-notes.md`](https://github.com/fitchmultz/pi-agent-browser-native/blob/main/docs/support-notes.md) |