pi-agent-browser-native 0.2.31 → 0.2.33
- package/CHANGELOG.md +35 -0
- package/README.md +64 -18
- package/docs/ARCHITECTURE.md +13 -10
- package/docs/COMMAND_REFERENCE.md +71 -16
- package/docs/ELECTRON.md +387 -0
- package/docs/RELEASE.md +34 -4
- package/docs/REQUIREMENTS.md +5 -3
- package/docs/SUPPORT_MATRIX.md +36 -21
- package/docs/TOOL_CONTRACT.md +198 -40
- package/extensions/agent-browser/index.ts +1585 -3486
- package/extensions/agent-browser/lib/electron/cleanup.ts +287 -0
- package/extensions/agent-browser/lib/electron/discovery.ts +717 -0
- package/extensions/agent-browser/lib/electron/launch.ts +553 -0
- package/extensions/agent-browser/lib/input-modes/electron.ts +170 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +203 -0
- package/extensions/agent-browser/lib/input-modes/lookups.ts +447 -0
- package/extensions/agent-browser/lib/input-modes/params.ts +188 -0
- package/extensions/agent-browser/lib/input-modes/semantic-action.ts +107 -0
- package/extensions/agent-browser/lib/input-modes/shared.ts +46 -0
- package/extensions/agent-browser/lib/input-modes/types.ts +221 -0
- package/extensions/agent-browser/lib/input-modes.ts +41 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +696 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +450 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +46 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +711 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +386 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +868 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +476 -0
- package/extensions/agent-browser/lib/orchestration/browser-run.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/input-plan.ts +338 -0
- package/extensions/agent-browser/lib/playbook.ts +15 -13
- package/extensions/agent-browser/lib/process.ts +106 -4
- package/extensions/agent-browser/lib/results/action-recommendations.ts +269 -0
- package/extensions/agent-browser/lib/results/artifact-manifest.ts +114 -0
- package/extensions/agent-browser/lib/results/artifact-state.ts +13 -0
- package/extensions/agent-browser/lib/results/categories.ts +106 -0
- package/extensions/agent-browser/lib/results/contracts.ts +220 -0
- package/extensions/agent-browser/lib/results/editable-ref-evidence.ts +72 -0
- package/extensions/agent-browser/lib/results/envelope.ts +2 -1
- package/extensions/agent-browser/lib/results/network.ts +64 -0
- package/extensions/agent-browser/lib/results/next-actions.ts +117 -0
- package/extensions/agent-browser/lib/results/presentation/artifacts.ts +506 -0
- package/extensions/agent-browser/lib/results/presentation/batch.ts +355 -0
- package/extensions/agent-browser/lib/results/presentation/common.ts +53 -0
- package/extensions/agent-browser/lib/results/presentation/content.ts +36 -0
- package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +730 -0
- package/extensions/agent-browser/lib/results/presentation/errors.ts +125 -0
- package/extensions/agent-browser/lib/results/presentation/large-output.ts +182 -0
- package/extensions/agent-browser/lib/results/presentation/navigation.ts +216 -0
- package/extensions/agent-browser/lib/results/presentation/registry.ts +154 -0
- package/extensions/agent-browser/lib/results/presentation/skills.ts +143 -0
- package/extensions/agent-browser/lib/results/presentation.ts +87 -2369
- package/extensions/agent-browser/lib/results/recovery-actions.ts +139 -0
- package/extensions/agent-browser/lib/results/recovery-next-actions.ts +71 -0
- package/extensions/agent-browser/lib/results/selector-recovery.ts +312 -0
- package/extensions/agent-browser/lib/results/shared.ts +17 -701
- package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +262 -0
- package/extensions/agent-browser/lib/results/snapshot-refs.ts +100 -0
- package/extensions/agent-browser/lib/results/snapshot-segments.ts +366 -0
- package/extensions/agent-browser/lib/results/snapshot-spill.ts +63 -0
- package/extensions/agent-browser/lib/results/snapshot.ts +37 -489
- package/extensions/agent-browser/lib/results/text.ts +40 -0
- package/extensions/agent-browser/lib/results.ts +16 -5
- package/extensions/agent-browser/lib/session-page-state.ts +486 -0
- package/extensions/agent-browser/lib/temp.ts +26 -0
- package/package.json +6 -4
package/docs/SUPPORT_MATRIX.md
CHANGED
|
@@ -5,6 +5,7 @@ Related docs:
|
|
|
5
5
|
- [`../AGENTS.md`](../AGENTS.md) (rebaselining and verification stack)
|
|
6
6
|
- [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md)
|
|
7
7
|
- [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md)
|
|
8
|
+
- [`ELECTRON.md`](ELECTRON.md)
|
|
8
9
|
- [`RELEASE.md`](RELEASE.md)
|
|
9
10
|
- [`REQUIREMENTS.md`](REQUIREMENTS.md)
|
|
10
11
|
|
|
@@ -28,7 +29,7 @@ When upstream ships a new `agent-browser` or the inventory changes:
|
|
|
28
29
|
- Source of truth: `CAPABILITY_BASELINE.inventorySections` in the same file (stable `id` keys: `skills`, `core-commands`, `state-tabs-frames-dialogs`, `network-storage-artifacts-diagnostics`, `batch-auth-setup-ai`, `options-and-env`).
|
|
29
30
|
- Status: supported for the current wrapper contract.
|
|
30
31
|
- High-priority support gaps: none identified in the baseline audit.
|
|
31
|
-
- Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance.
|
|
32
|
+
- Post-`v0.2.29` review state: commits `eb55320` through `86abbfb` add browser guidance/smoke coverage plus `RQ-0086` click-probe reduction, `RQ-0087` same-snapshot form fill batching, `RQ-0088` current-ref fallback on locator misses, `RQ-0089` direct-upstream click mutation investigation, and `RQ-0090` stop-boundary/artifact-path guidance. Verification gates below were rerun on 2026-05-18 after those tasks landed. Constrained `job` (`RQ-0064`), the lightweight `qa` preset (`RQ-0065`), the experimental `sourceLookup` helper (`RQ-0066`), and the experimental `networkSourceLookup` helper (`RQ-0067`) are implemented; see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup), and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup). Reusable browser recipes (`RQ-0068`) are intentionally not adopted as a runtime surface; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet).
|
|
32
33
|
|
|
33
34
|
## Verification evidence
|
|
34
35
|
|
|
@@ -40,7 +41,7 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
40
41
|
| Real upstream contract | `npm run verify -- real-upstream` runs the localhost fixture matrix against the real installed `agent-browser` matching the baseline. | Pass on 2026-05-18 (`npm run verify -- real-upstream`). |
|
|
41
42
|
| Packaged Pi smoke | `npm run verify -- package-pi` validates package contents, loads exactly one packaged `agent_browser` tool, and executes fake-upstream `--version`. | Pass on 2026-05-18 (`npm run verify -- package-pi`). |
|
|
42
43
|
| `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits lifecycle, real-upstream, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-05-18 (`npm run verify -- release`). `prepublishOnly` still needs a fresh run during actual publish. |
|
|
43
|
-
| Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`.
|
|
44
|
+
| Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, restart, `/resume`, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), and persisted spill reachability with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-05-18 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
|
|
44
45
|
| Quick isolated Pi smoke | `pi --no-extensions -e .` from repo root; native `agent_browser` only. | Pass on 2026-05-18 for a fresh interactive tmux smoke: the agent opened `https://example.com`, waited for `Example Domain`, saved `/tmp/piab-isolated-smoke.png` with verified `image/png` artifact metadata, closed the browser session, and reported PASS. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
|
|
45
46
|
|
|
46
47
|
## Baseline checklist by inventory section
|
|
@@ -52,53 +53,67 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
52
53
|
| Sessions, state, tabs, frames, dialogs, and windows | `session`, `session list`, `state save <path>`, `state load <path>`, `tab list`, `tab new --label <name> [url]`, `tab <t<N>|label>`, `frame <selector|main>`, `dialog accept [text]`, `dialog dismiss`, `dialog status`, `window new` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands) (session/state/tabs/frames/dialogs/windows), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful presentation summaries/redaction; state save artifact handling; explicit/implicit session restore; tab target pinning; frame/dialog/window passthrough. | `test/agent-browser.extension-validation.test.ts` stateful matrix; runtime session/resume tests; presentation stateful redaction tests; lifecycle harness for reload/resume. | Supported. External profile/auth state remains operator-owned and documented. |
|
|
53
54
|
| Network, storage, artifacts, diagnostics, and performance | `network <action>`, `network route <url> [--abort|--body <json>] [--resource-type <csv>]`, `network request <requestId>`, `cookies [get|set|clear]`, `cookies set --curl <file>`, `storage <local|session>`, `diff snapshot`, `diff screenshot --baseline`, `diff url <u1> <u2>`, `trace start|stop [path]`, `profiler start|stop [path]`, `record start <path> [url]`, `record restart <path> [url]`, `record stop`, `console [--clear]`, `errors [--clear]`, `highlight <sel>`, `inspect`, `clipboard <op> [text]`, `stream enable [--port <n>]`, `stream disable`, `stream status`, `react tree`, `react inspect <id>`, `react renders start`, `react renders stop [--json]`, `react suspense [--only-dynamic] [--json]`, `vitals [url] [--json]`, `removeinitscript <id>` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage) and diagnostic sections; [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus command-specific compact diagnostic summaries, artifact metadata for HAR/diff/trace/profile/record, early missing-ffmpeg recording warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix covers network/diff/trace/profiler/record/console/errors/highlight/inspect/clipboard/stream/dashboard/chat JSON shapes and redaction; real-upstream covers safe network requests/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Browser-opening or environment-sensitive operations (`inspect`, OS clipboard, full React app inspection) are delegated thinly and documented as needing suitable local/browser state. |
|
|
54
55
|
| Batch, auth, confirmations, setup, dashboard, and AI commands | `batch [--bail]`, `auth save <name>`, `auth save <name> --password-stdin`, `auth login <name>`, `auth list`, `auth show <name>`, `auth delete <name>`, `confirm <id>`, `deny <id>`, `chat <message>`, `dashboard start --port <n>`, `dashboard stop`, `install`, `install --with-deps`, `upgrade`, `doctor [--fix]`, `doctor --offline --quick`, `doctor --json`, `profiles` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-and-setup), README security notes, release docs. | Batch stdin is native-tool-only; top-level `job`, `qa`, and experimental `sourceLookup` / `networkSourceLookup` compile to `batch` with generated stdin (caller `stdin` rejected for those modes); job `select` compiles to upstream `select <selector> <value...>`; auth/confirmation details are redacted; dashboard/chat/setup/doctor are passed through thinly with timeout/cleanup guidance; package doctor remains separate and read-only. | Unit/fake tests cover batch, auth password stdin, confirmations, dashboard/chat summaries, and doctor diagnostics; extension-validation covers `job`, `qa`, `sourceLookup`, and `networkSourceLookup` compilation plus `details.sourceLookup` / `details.networkSourceLookup` evidence, including job `select`; [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) includes `source-lookup-visible-element` and `network-source-lookup-failed-request` scenarios; quick isolated Pi smoke covered dashboard start/stop and chat credential-failure pass-through. | Supported. `install`, `upgrade`, `doctor --fix`, and interactive auth/chat/setup flows are upstream-owned and should be run only when the operator intends those side effects. |
|
|
55
|
-
| Global flags, config, providers, policy, and environment | `--profile <name|path>`, `AGENT_BROWSER_PROFILE`, `--session <name>`, `AGENT_BROWSER_SESSION`, `--session-name <name>`, `AGENT_BROWSER_SESSION_NAME`, `--state <path>`, `AGENT_BROWSER_STATE`, `--auto-connect`, `AGENT_BROWSER_AUTO_CONNECT`, `--headers <json>`, `--init-script <path>`, `AGENT_BROWSER_INIT_SCRIPTS`, `--enable <feature>`, `AGENT_BROWSER_ENABLE`, `--executable-path <path>`, `AGENT_BROWSER_EXECUTABLE_PATH`, `--extension <path>`, `AGENT_BROWSER_EXTENSIONS`, `--args <args>`, `AGENT_BROWSER_ARGS`, `--user-agent <ua>`, `AGENT_BROWSER_USER_AGENT`, `--proxy <server>`, `AGENT_BROWSER_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `--proxy-bypass <hosts>`, `AGENT_BROWSER_PROXY_BYPASS`, `NO_PROXY`, `--ignore-https-errors`, `AGENT_BROWSER_IGNORE_HTTPS_ERRORS`, `--allow-file-access`, `AGENT_BROWSER_ALLOW_FILE_ACCESS`, `--headed`, `AGENT_BROWSER_HEADED`, `--cdp <port>`, `--color-scheme <scheme>`, `AGENT_BROWSER_COLOR_SCHEME`, `--download-path <path>`, `AGENT_BROWSER_DOWNLOAD_PATH`, `--engine <name>`, `AGENT_BROWSER_ENGINE`, `--no-auto-dialog`, `AGENT_BROWSER_NO_AUTO_DIALOG`, `--json`, `AGENT_BROWSER_JSON`, `--annotate`, `AGENT_BROWSER_ANNOTATE`, `--screenshot-dir <path>`, `AGENT_BROWSER_SCREENSHOT_DIR`, `--screenshot-quality <n>`, `AGENT_BROWSER_SCREENSHOT_QUALITY`, `--screenshot-format <fmt>`, `AGENT_BROWSER_SCREENSHOT_FORMAT`, `--content-boundaries`, `AGENT_BROWSER_CONTENT_BOUNDARIES`, `--max-output <chars>`, `AGENT_BROWSER_MAX_OUTPUT`, `--allowed-domains <list>`, `AGENT_BROWSER_ALLOWED_DOMAINS`, `--action-policy <path>`, `AGENT_BROWSER_ACTION_POLICY`, `--confirm-actions <list>`, `AGENT_BROWSER_CONFIRM_ACTIONS`, `--confirm-interactive`, `AGENT_BROWSER_CONFIRM_INTERACTIVE`, `-p, --provider <name>`, `AGENT_BROWSER_PROVIDER`, `browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`, `--device <name>`, `AGENT_BROWSER_IOS_DEVICE`, `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, 
`agent-browser -p ios tap @e1`, `--model <name>`, `AI_GATEWAY_MODEL`, `-v, --verbose`, `-q, --quiet`, `--debug`, `AGENT_BROWSER_DEBUG`, `AGENT_BROWSER_CONFIG`, `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGENT_BROWSER_STREAM_PORT`, `AGENT_BROWSER_IDLE_TIMEOUT_MS`, `AGENT_BROWSER_ENCRYPTION_KEY`, `AGENT_BROWSER_STATE_EXPIRE_DAYS`, `AGENT_BROWSER_IOS_UDID`, `AI_GATEWAY_URL`, `AI_GATEWAY_API_KEY` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles value flags, launch-scoped flags, redacted invocation echoes, `sessionMode: "fresh"` recovery hints, explicit sessions, and provider/device launch-scoping. Process env forwards a curated allowlist/prefix set for upstream/provider credentials without cloning the whole parent env. | Runtime tests cover launch-scoped flags, provider/device planning, redaction, stateless inspections, and explicit/fresh sessions. Process tests cover provider env prefixes. Fake provider/specialized-skill matrix covers provider argv/env passthrough. Package doctor checks version/source drift. | Supported. Provider clouds, iOS/Appium, Browserbase/Kernel/BrowserUse/Browserless/AgentCore, proxies, profiles, and credentials require external setup; the wrapper documents and forwards them thinly rather than emulating provider behavior. |
|
|
56
|
+
| Global flags, config, providers, policy, and environment | `--profile <name|path>`, `AGENT_BROWSER_PROFILE`, `--session <name>`, `AGENT_BROWSER_SESSION`, `--session-name <name>`, `AGENT_BROWSER_SESSION_NAME`, `--state <path>`, `AGENT_BROWSER_STATE`, `--auto-connect`, `AGENT_BROWSER_AUTO_CONNECT`, `--headers <json>`, `--init-script <path>`, `AGENT_BROWSER_INIT_SCRIPTS`, `--enable <feature>`, `AGENT_BROWSER_ENABLE`, `--executable-path <path>`, `AGENT_BROWSER_EXECUTABLE_PATH`, `--extension <path>`, `AGENT_BROWSER_EXTENSIONS`, `--args <args>`, `AGENT_BROWSER_ARGS`, `--user-agent <ua>`, `AGENT_BROWSER_USER_AGENT`, `--proxy <server>`, `AGENT_BROWSER_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `--proxy-bypass <hosts>`, `AGENT_BROWSER_PROXY_BYPASS`, `NO_PROXY`, `--ignore-https-errors`, `AGENT_BROWSER_IGNORE_HTTPS_ERRORS`, `--allow-file-access`, `AGENT_BROWSER_ALLOW_FILE_ACCESS`, `--headed`, `AGENT_BROWSER_HEADED`, `--cdp <port>`, `--color-scheme <scheme>`, `AGENT_BROWSER_COLOR_SCHEME`, `--download-path <path>`, `AGENT_BROWSER_DOWNLOAD_PATH`, `--engine <name>`, `AGENT_BROWSER_ENGINE`, `--no-auto-dialog`, `AGENT_BROWSER_NO_AUTO_DIALOG`, `--json`, `AGENT_BROWSER_JSON`, `--annotate`, `AGENT_BROWSER_ANNOTATE`, `--screenshot-dir <path>`, `AGENT_BROWSER_SCREENSHOT_DIR`, `--screenshot-quality <n>`, `AGENT_BROWSER_SCREENSHOT_QUALITY`, `--screenshot-format <fmt>`, `AGENT_BROWSER_SCREENSHOT_FORMAT`, `--content-boundaries`, `AGENT_BROWSER_CONTENT_BOUNDARIES`, `--max-output <chars>`, `AGENT_BROWSER_MAX_OUTPUT`, `--allowed-domains <list>`, `AGENT_BROWSER_ALLOWED_DOMAINS`, `--action-policy <path>`, `AGENT_BROWSER_ACTION_POLICY`, `--confirm-actions <list>`, `AGENT_BROWSER_CONFIRM_ACTIONS`, `--confirm-interactive`, `AGENT_BROWSER_CONFIRM_INTERACTIVE`, `-p, --provider <name>`, `AGENT_BROWSER_PROVIDER`, `browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`, `--device <name>`, `AGENT_BROWSER_IOS_DEVICE`, `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, 
`agent-browser -p ios tap @e1`, `--model <name>`, `AI_GATEWAY_MODEL`, `-v, --verbose`, `-q, --quiet`, `--debug`, `AGENT_BROWSER_DEBUG`, `AGENT_BROWSER_CONFIG`, `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGENT_BROWSER_STREAM_PORT`, `AGENT_BROWSER_IDLE_TIMEOUT_MS`, `AGENT_BROWSER_ENCRYPTION_KEY`, `AGENT_BROWSER_STATE_EXPIRE_DAYS`, `AGENT_BROWSER_IOS_UDID`, `AI_GATEWAY_URL`, `AI_GATEWAY_API_KEY` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles value flags, launch-scoped flags, redacted invocation echoes, `sessionMode: "fresh"` recovery hints, explicit sessions, and provider/device launch-scoping. Process env forwards a curated allowlist/prefix set for upstream/provider credentials without cloning the whole parent env. Subprocess completion uses `watchSpawnedChildCompletion` so inherited stdio from detached descendants cannot stall the tool after the direct `agent-browser` child exits (`RQ-0097`). | Runtime tests cover launch-scoped flags, provider/device planning, redaction, stateless inspections, and explicit/fresh sessions. Process tests cover provider env prefixes, exit-vs-close completion when descendants keep stdio open, and timeout (`124`) resolution under the same stdio-inheritance pattern (`RQ-0097`). Fake provider/specialized-skill matrix covers provider argv/env passthrough. Package doctor checks version/source drift. | Supported. Provider clouds, iOS/Appium, Browserbase/Kernel/BrowserUse/Browserless/AgentCore, proxies, profiles, and credentials require external setup; the wrapper documents and forwards them thinly rather than emulating provider behavior. |
|
|
56
57
|
|
|
57
58
|
## Follow-up decision after closure
|
|
58
59
|
|
|
59
|
-
Native `job`, `qa`, experimental `sourceLookup`,
|
|
60
|
+
Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLookup`, and first-class Electron lifecycle/probe support are shipped.
|
|
60
61
|
|
|
61
|
-
`RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification).
|
|
62
|
+
`RQ-0066` shipped as the bounded evidence model in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup): it compiles to upstream `batch` steps (`is visible`, `get html`, `react inspect`, `react tree` as applicable), merges `details.sourceLookup` into the tool `details` alongside batch presentation, and never reclassifies an upstream-successful batch to failed solely because no candidates were found (unlike `qa` diagnostic reclassification). Wrapper-tracked packaged Electron no-candidate results now add bounded `workspaceRoot` / `electronContext` when available, limitations that the scan only covers the Pi cwd and does not unpack installed app resources or `app.asar`, and live Electron `snapshot` / `probe` / `tab list` next actions. Fake coverage: `agentBrowserExtension explains packaged Electron sourceLookup no-candidate boundaries` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
62
63
|
|
|
63
64
|
`RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.
|
|
64
65
|
|
|
65
66
|
`RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
66
67
|
|
|
67
|
-
`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews` and `buildToolPresentation returns bounded network request next actions for benign and successful API rows` in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
|
|
68
|
+
`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews` and `buildToolPresentation returns bounded network request next actions for benign and successful API rows` in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
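The tiered request-ID choice described in the `RQ-0095` row can be pictured with a short TypeScript sketch. It is not the code in `presentation/diagnostics.ts`: the `RequestRow` shape, its field names, and `pickRequestId` are hypothetical and exist only for illustration. It assumes each row has already been classified as safe, failed, actionable, or API/fetch-like.

```typescript
// Hypothetical row shape; the real presentation code may track different fields.
interface RequestRow {
  id: string;
  safeId: boolean; // the ID contains nothing sensitive-looking
  failed: boolean;
  actionable: boolean; // failed in a way that is worth following up
  apiLike: boolean; // API/fetch-like request
}

// Return at most one request ID, trying each priority tier in order.
function pickRequestId(rows: RequestRow[]): string | undefined {
  const safe = rows.filter((row) => row.safeId);
  const tiers: Array<(row: RequestRow) => boolean> = [
    (row) => row.failed && row.actionable, // 1. actionable failed row
    (row) => row.apiLike, // 2. API/fetch-like row
    (row) => row.failed, // 3. benign failed row
    () => true, // 4. first safe ID
  ];
  for (const matches of tiers) {
    const hit = safe.find(matches);
    if (hit) return hit.id;
  }
  return undefined; // no safe ID, so no follow-up action is suggested
}
```

Filtering unsafe IDs before any tier runs means a sensitive-looking ID can never be picked, whatever its priority.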
|
|
68
69
|
|
|
69
|
-
`RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
|
|
70
|
+
`RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
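As a rough illustration of the `RQ-0092` compile step, the sketch below turns a `select` semantic action into upstream `select <selector> <value...>` argv. The `SelectAction` interface and `compileSelect` are hypothetical names, not the wrapper's actual types. How the wrapper handles a call that supplies both `value` and `values` is not stated in the source, so preferring `values` here is an assumption.

```typescript
// Hypothetical input shape modelled on the description above.
interface SelectAction {
  action: "select";
  selector?: string;
  value?: string;
  values?: string[];
}

// Compile a select action into argv of the form: select <selector> <value...>
function compileSelect(input: SelectAction): string[] {
  if (!input.selector) {
    throw new Error("select requires `selector`");
  }
  // Assumption: `values` wins when both fields are supplied.
  const values = input.values ?? (input.value !== undefined ? [input.value] : []);
  if (values.length === 0) {
    throw new Error("select requires `value` or `values`");
  }
  return ["select", input.selector, ...values];
}
```

For example, `compileSelect({ action: "select", selector: "#country", values: ["us", "ca"] })` yields `["select", "#country", "us", "ca"]`.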
|
|
70
71
|
|
|
71
72
|
`RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance remains the concise stop-boundary and exact-artifact-path contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).
|
|
72
73
|
|
|
73
|
-
`RQ-
|
|
74
|
+
`RQ-0097` keeps upstream subprocess completion reliable when detached descendants inherit the child’s stdio handles: `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts` uses `watchSpawnedChildCompletion` to observe both Node `exit` and `close`, leaves piped stdio intact during the short post-`exit` grace (`EXIT_STDIO_GRACE_MS`, currently **100 ms**) so normal `close` can still win, destroys those streams only if the fallback resolves, and resolves with exit-code precedence `close` → wrapper timeout (**124**) → post-`exit` fallback for the direct child → spawn failure (**127**) when `close` is still delayed so the Pi tool cannot hang after `agent-browser` has already exited. Human context: [`ARCHITECTURE.md`](ARCHITECTURE.md#direct-subprocess-execution) (subprocess bullet) and [`AGENTS.md`](../AGENTS.md) (**Runtime planning** → **Upstream subprocess completion**); fake coverage: `runAgentBrowserProcess resolves after exit when descendants keep stdio handles open` and `runAgentBrowserProcess returns timeout exit code when descendants keep stdio handles open` in [`test/agent-browser.process.test.ts`](../test/agent-browser.process.test.ts).
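One way to read the exit-code precedence in the `RQ-0097` row is as a fall-through: use the `close` code if `close` fired, otherwise the wrapper timeout code, otherwise the direct child's `exit` code, otherwise the spawn-failure code. The TypeScript sketch below encodes that reading only. `CompletionObservation` and `resolveExitCode` are hypothetical names. The real `watchSpawnedChildCompletion` also manages the grace timer and stream teardown, which are omitted here, and Node can report a `null` code for signal-terminated children, which this sketch does not model.

```typescript
const TIMEOUT_EXIT_CODE = 124; // wrapper timeout, per the row above
const SPAWN_FAILURE_EXIT_CODE = 127; // spawn failure, per the row above

// Hypothetical record of what a watcher observed before it resolved.
interface CompletionObservation {
  closeCode?: number; // code delivered by Node's `close` event, if it fired
  timedOut: boolean; // the wrapper timeout elapsed first
  exitCode?: number; // code delivered by Node's `exit` event, if it fired
}

// Apply the precedence: close, then timeout, then post-exit fallback, then spawn failure.
function resolveExitCode(seen: CompletionObservation): number {
  if (seen.closeCode !== undefined) return seen.closeCode;
  if (seen.timedOut) return TIMEOUT_EXIT_CODE;
  if (seen.exitCode !== undefined) return seen.exitCode;
  return SPAWN_FAILURE_EXIT_CODE;
}
```

Under this reading, a child that exits with code 0 while a detached descendant keeps its stdio open still resolves with 0 through the post-`exit` fallback, even though `close` never arrives.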
|
|
74
75
|
|
|
75
|
-
`RQ-
|
|
76
|
+
`RQ-0096` ships first-class Electron desktop-app support without adding a generic recipe runtime: top-level `electron` covers wrapper-owned `list`, isolated `launch` with snapshot/tabs/connect handoff, `status`, `cleanup`, and compact current-session or launch-scoped `probe`; `qa.attached` extends the existing QA preset for attached Electron/CDP sessions without introducing `electron.qa`. `launch.handoff` still defaults to `"snapshot"`, while `handoff: "tabs"` is documented as the safer diagnostic starting point when refs/content capture is not needed yet. Host install discovery (`discoverElectronApps`) is macOS/Linux-only today: on Windows `electron.list` reports `platform: "unsupported"` with an empty catalog and name/bundle targets cannot resolve from scans—use `executablePath` (or a host path to the Electron binary) for Windows launch targeting. Discovery adds non-blocking likely-sensitive app annotations plus visible isolated-profile/auth-state warnings; launch output and `details.electron.profileIsolation` state that wrapper launches do not reuse existing signed-in app profiles or attach to already-running authenticated apps, and point agents to the host debug-port launch plus raw `connect` path when signed-in local app state is the goal; launch timeout failures include PID/profile/DevToolsActivePort/timing diagnostics; status/probe add launch/session identifiers, liveness, mismatch/reattach next actions, and dead-launch context for `about:blank`; post-mutation Electron death is upgraded to `tab-drift` with `details.electronPostCommandHealth`; Electron fills can add `details.fillVerification`; Electron `@e…` mutations can add same-URL ref freshness guidance; broad Electron `get text` selectors add scope warnings; cleanup ownership is bounded to wrapper-created launch records and temp profiles; externally launched debug ports stay on the manual `args: ["connect", "<port-or-url>"]` path and remain host-owned. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#electron) plus [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) for `qa.attached`; human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps) and README common calls; implementation: `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/electron/`; deterministic efficiency evidence: `electron-lifecycle` and `electron-probe` in `scripts/agent-browser-efficiency-benchmark.mjs`; fake coverage includes Electron schema/probe/mismatch/post-command-health/fill-verification/broad-text/discovery-sensitivity and packaged-sourceLookup cases in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts). This plan is the `RQ-0068` revisit evidence for Electron specifically: [`docs/plans/electron-extension-2026-05-20.md`](plans/electron-extension-2026-05-20.md) documents repeated failure-prone discover/launch/attach/cleanup and multi-call state-probe sequences, plus bounded owner/versioning/test/docs artifacts.
|
|
76
77
|
|
|
77
|
-
`RQ-
|
|
78
|
+
`RQ-0097` completes manual CDP attach recovery without making manually launched apps wrapper-owned: successful raw `connect` results append the session-scoped safe tab-list action `list-connected-session-tabs`; `snapshot -i` failures whose upstream error says `No active page` append the safe tab-list action `list-tabs-after-no-active-page` when a session is known. Agents then choose a stable `tab t<N>` target and run `snapshot -i` explicitly; the wrapper does not emit raw-connect or no-active-page snapshot retry ids without a wrapper-observed safe tab id. The runtime source of truth for these recovery ids is `AGENT_BROWSER_RECOVERY_NEXT_ACTION_IDS` in `extensions/agent-browser/lib/results/recovery-actions.ts` (re-exported from `shared.ts`). The guidance keeps manual signed-in desktop apps and explicit artifacts host-owned while `close` remains a browser/CDP-session close and `electron.cleanup` remains limited to wrapper-created `electron.launch` records. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`ELECTRON.md`](ELECTRON.md#manual-host-launch-pattern) and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#electron-desktop-apps); fake coverage: raw connect and no-active snapshot assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus central next-action helper coverage in [`test/agent-browser.results.test.ts`](../test/agent-browser.results.test.ts).
|
|
78
79
|
|
|
79
|
-
`RQ-
|
|
80
|
+
`RQ-0068` remains closed with a no-adopt decision for a reusable named browser recipe runtime. The Electron evidence above justified a narrow typed shorthand and compact probe, not an open-ended recipe layer; future reusable recipes still require concrete repeated workflow evidence and a defined owner/versioning/test plan.
|
|
80
81
|
|
|
81
|
-
`RQ-
|
|
82
|
+
`RQ-0098` completes the docs/playbook groundwork for desktop readiness and wait orchestration without adding a runtime primitive or reusable recipe layer. The accepted ladder is: prefer condition waits (`wait --text`, `wait --url`, `wait --fn`, `wait --load <state>`, `wait --download`) when a real condition exists; after raw manual CDP `connect`, inspect `tab list`, select a stable `tab t<N>` surface, then run a condition wait or `snapshot -i`; after wrapper-owned `electron.launch`, use `electron.probe` / `electron.status` when launch health or target mismatch matters; use `qa.attached` for current-session text/selector diagnostics; keep fixed waits as a last resort below the wrapper IPC budget; and treat fixed-wait payloads such as `"waited":"timeout"` as elapsed time rather than completion evidence. Manual signed-in attach docs now also restate that `connect` readiness is not immediate readiness, `close` only closes the browser/CDP session, `electron.cleanup` remains wrapper-owned, and manually launched apps plus explicit artifacts stay host-owned. Human workflow: [`ELECTRON.md`](ELECTRON.md#readiness-and-waits), [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#wait-for-page-readiness-or-downloads), README Electron section, and generated playbook text from `extensions/agent-browser/lib/playbook.ts`. Revisit a first-class host-idle primitive only with repeated desktop smoke evidence that condition waits, `qa.attached`, `electron.probe`, snapshots, and screenshots cannot cover the workflow. Verification: `npm run docs` keeps generated playbook fragments aligned; no runtime `details.nextActions` are part of this RQ.
|
|
82
83
|
|
|
83
|
-
`RQ-
|
|
84
|
+
`RQ-0100` makes desktop tab/surface drift recovery machine-readable without adding routine tab-list probes for normal clicks. When existing wrapper state already identifies a target tab, about:blank and tab-drift paths append `list-tabs-for-about-blank-recovery` or `list-tabs-for-tab-drift-recovery`, then `select-intended-tab-after-drift` and `snapshot-after-tab-recovery` when the stable `t<N>` id is known. The implementation reuses `priorSessionTabTarget`, `aboutBlankSessionMismatch`, `sessionTabCorrection`, `openResultTabCorrection`, and existing tab-correction outputs; it does not probe tabs for ordinary clicks beyond the RQ-0086-gated drift paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#tabs) and [`ELECTRON.md`](ELECTRON.md#troubleshooting); fake coverage: about:blank recovery and explicit-about:blank negatives in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts), early tab-drift failure assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), and central next-action helper coverage in [`test/agent-browser.results.test.ts`](../test/agent-browser.results.test.ts).
|
|
84
85
|
|
|
85
|
-
`RQ-
|
|
86
|
+
`RQ-0099` makes semantic fill misses on host-controlled rich inputs recoverable without changing upstream `find` semantics or adding a recipe runtime. When `selector-not-found` recovery already collected an exact current editable `searchbox` / `textbox` ref, `extensions/agent-browser/lib/results/selector-recovery.ts` defines `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions; `extensions/agent-browser/index.ts` only probes the current session snapshot and merges the result. Those next actions never copy the fill text and never press `Enter` or submit; agents should refresh refs, choose the current editable `@ref`, focus/click it, then use `keyboard inserttext` or `keyboard type` with the intended text only after the right input is focused. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: README locator shorthand, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), and generated playbook text from `extensions/agent-browser/lib/playbook.ts`; fake coverage: `agentBrowserExtension returns rich input recovery when semanticAction fill misses current editable refs` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
86
87
|
|
|
87
|
-
`RQ-
|
|
88
|
+
`RQ-0101` improves compact snapshot usefulness for dense desktop host screens without adding a new mode or dumping all refs inline. `extensions/agent-browser/lib/results/snapshot.ts` still emits the existing visible `Omitted high-value controls` section and `details.data.highValueControlRefIds`, while `snapshot-high-value-controls.ts` selects omitted controls with bounded diversity so editable/searchbox/textbox/combobox controls, named tab/surface controls, and primary action buttons remain discoverable even when many utility buttons and dense host rows compete for the trimmed ref budget. Human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#snapshot-refs-and-current-page-state), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and README; fake coverage: `buildToolPresentation keeps dense desktop host high-value controls discoverable in compact snapshots` in [`test/agent-browser.snapshot-presentation.test.ts`](../test/agent-browser.snapshot-presentation.test.ts).
|
|
89
|
+
|
|
90
|
+
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `click`+`text` (`try-button-name-candidate` and `try-link-name-candidate`). Other locator/action pairs omit this block; fill recovery now goes through the RQ-0099 current-editable-ref ladder so candidate nextActions do not repeat fill text. `semanticAction` `select` uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: semantic selector-miss assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus current-ref assertions and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
|
|
91
|
+
|
|
92
|
+
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find` or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
|
|
93
|
+
|
|
94
|
+
`RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, then `extensions/agent-browser/lib/results/selector-recovery.ts` looks for exact normalized role/name matches for the failed target and emits `details.visibleRefFallback` plus visible `Current snapshot ref fallback`. Non-fill matches append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`); fill matches omit direct args/text and feed the RQ-0099 rich-input recovery path when the ref is editable. The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
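
A rough illustration of the narrow matcher described here: only exact role plus exact normalized name counts, and more than one hit signals ambiguity. The `SnapshotRef` shape, the function names, and the normalization rule (trim, collapse whitespace, lowercase) are assumptions for this sketch, not the behavior of `selector-recovery.ts`.

```typescript
// Hypothetical ref shape; the real snapshot payload may differ.
interface SnapshotRef {
  id: string;   // e.g. "e12"
  role: string; // e.g. "button"
  name: string; // accessible name
}

// Assumed normalization: trim, collapse runs of whitespace, lowercase.
function normalizeName(value: string): string {
  return value.trim().replace(/\s+/g, " ").toLowerCase();
}

// Return refs whose role matches exactly and whose normalized name is equal.
// Prefix and fuzzy matches are deliberately not considered.
function findExactRefMatches(
  refs: SnapshotRef[],
  role: string,
  name: string
): SnapshotRef[] {
  const wanted = normalizeName(name);
  return refs.filter(
    (ref) => ref.role === role && normalizeName(ref.name) === wanted
  );
}

// More than one exact match means the caller should attach ambiguity copy.
function isAmbiguous(matches: SnapshotRef[]): boolean {
  return matches.length > 1;
}
```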
|
|
95
|
+
|
|
96
|
+
`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/lib/session-page-state.ts` replays per-session snapshots and `refSnapshotInvalidation` markers from the transcript on reload/resume, clears them on successful `close`, invalidates prior refs when a session `snapshot` fails with `No active page`, rejects mutation-prone ref argv before spawn when the tab URL diverges, a ref id is missing from the latest snapshot, or the session refs are invalidated, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension recommends tab recovery after No active page snapshot failures` and `agentBrowserExtension invalidates refs after No active page snapshot failures inside batch` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
|
|
97
|
+
|
|
98
|
+
`RQ-0087` keeps the RQ-0072 guard but removes `fill` from the batch invalidation latch: `fill @e…` rows remain guarded against stale/missing refs, yet multiple same-snapshot form fills can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `click`, `open`, `reload`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first. This improves login/checkout efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
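
The batch ordering rule can be sketched as a single pass over the stdin rows. This is an illustrative sketch only: `findBlockedRefStep` is a hypothetical name, the list of invalidating commands is a partial assumption taken from the examples named above (`click`, `open`, `reload`), and any row with an `@e…` argument is treated as guarded.

```typescript
type BatchStep = string[]; // one argv row from batch stdin

// Assumption: a partial list of invalidating commands. `fill` is deliberately absent.
const INVALIDATING_COMMANDS = ["click", "open", "reload"];

// Returns the index of the first row that reuses an @e ref after an
// invalidating row without a fresh snapshot in between, or -1 if none.
function findBlockedRefStep(steps: BatchStep[]): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const command = step[0] ?? "";
    if (command === "snapshot") {
      latched = false; // a fresh snapshot resets the latch
      continue;
    }
    const usesRef = step.slice(1).some((arg) => arg.indexOf("@e") === 0);
    if (latched && usesRef) return i;
    if (INVALIDATING_COMMANDS.indexOf(command) !== -1) latched = true;
  }
  return -1;
}
```

Under these assumptions, several `fill @e…` rows followed by one `click @e…` pass, while a ref used after the click is flagged unless a `snapshot` row comes first.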
|
|
99
|
+
|
|
100
|
+
`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
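
The candidate scan described in this entry (strong modal context first, then at most three dismiss-style controls) might look roughly like the sketch below. The `SnapshotRef` shape, the function name, and the `close`/`dismiss` name pattern are assumptions; the text does not spell out the exact pattern the wrapper uses.

```typescript
// Hypothetical ref shape; the real snapshot payload may differ.
interface SnapshotRef {
  id: string;
  role: string;
  name: string;
}

const MODAL_ROLES = ["dialog", "alertdialog"];
const CANDIDATE_ROLES = ["button", "link", "menuitem"];
const DISMISS_PATTERN = /\b(close|dismiss)\b/i; // assumed pattern
const MAX_CANDIDATES = 3;

// Only report candidates when a dialog-like role is present, so ordinary
// page chrome (banners, sign-in text) never produces a warning on its own.
function findOverlayCandidates(refs: SnapshotRef[]): SnapshotRef[] {
  const hasModal = refs.some((ref) => MODAL_ROLES.indexOf(ref.role) !== -1);
  if (!hasModal) return [];
  return refs
    .filter(
      (ref) =>
        CANDIDATE_ROLES.indexOf(ref.role) !== -1 &&
        DISMISS_PATTERN.test(ref.name)
    )
    .slice(0, MAX_CANDIDATES);
}
```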
|
|
101
|
+
|
|
102
|
+
`RQ-0086` reduces wrapper-induced click fragility found during Sauce Demo smokes: navigation-summary enrichment for click/back/forward/reload/dblclick now uses one read-only `eval` (`({ title: document.title, url: location.href })`) instead of serial `get title` plus `get url` probes, including tab-pinned batch wrappers. Tab pinning/post-command tab correction now runs only after the wrapper has evidence of tab-drift risk (profile restore correction, overlapping stale opens, or restored session state), so ordinary same-session clicks no longer get repeated `tab list` probes. This keeps `details.navigationSummary`, overlay blocker checks, and drift recovery intact while avoiding the upstream `agent-browser 0.27.0` sequence that could report later clicks as successful without dispatching pointer/click events after repeated getter/tab/snapshot probes. Fake coverage: `agentBrowserExtension enriches click results with a post-navigation title and url summary` in [`test/agent-browser.extension-tabs.test.ts`](../test/agent-browser.extension-tabs.test.ts), plus `agentBrowserExtension pins the intended tab inside a follow-up command when reconnect drift would otherwise steal focus` and about-blank/tab overlap assertions in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts); manual validation source: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt).
|
|
88
103
|
|
|
89
104
|
`RQ-0089` investigated remaining Sauce Demo no-op clicks after RQ-0086. Minimal direct-upstream probes against `agent-browser 0.27.0` reproduced the residual `Finish` behavior without the wrapper: both CSS `click [data-test="finish"]` and `find role button click --name Finish` returned success, but a page-level click listener recorded no click event and the URL stayed on `checkout-step-two.html` after a 1 s wait; a separate cart-link check showed normal trusted click events and navigation when the app handled the event. Conclusion: the residual issue is upstream/site interaction rather than wrapper post-click probes. Runtime behavior stays thin, with no site-specific DOM-click fallback; docs now state that click success is attempted-action evidence only and agents must verify important mutations with URL/text/state checks or fresh snapshots before continuing. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`pageChangeSummary`, `nextActions`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; manual evidence: direct-upstream RQ-0089 development probe plus [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) smoke caveats.
|
|
90
105
|
|
|
91
|
-
`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-
|
|
106
|
+
`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
|
|
92
107
|
|
|
93
|
-
`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-
|
|
108
|
+
`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from `shared.ts`) split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Because real Pi ignores returned `isError` fields from custom tool `execute`, `extensions/agent-browser/index.ts` also realigns `details.resultCategory: "failure"` outcomes to Pi-visible tool errors through a `tool_result` handler; it appends the exact failure category plus `Pi tool isError: true` to prose output and preserves caller-requested `--json` output as parseable JSON while patching `isError`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts); real-Pi evidence should use saved JSONL sessions per [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow).
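
As a loose illustration of the actionable-versus-benign split, the sketch below classifies a request that has already been judged as failed. The `FailedRequest` shape, the function name, and the exact basename regular expression are assumptions; the text only says the real check uses favicon/apple-touch-icon basename patterns and image-like resource hints when present.

```typescript
// Hypothetical row shape for a request that already counts as failed.
interface FailedRequest {
  url: string;
  resourceType?: string; // e.g. "image", "script"; may be absent
}

// Assumed pattern for low-impact browser icon assets.
const ICON_BASENAME = /^(favicon|apple-touch-icon)[^/]*\.(ico|png|svg)$/i;

function classifyFailedRequest(req: FailedRequest): "benign" | "actionable" {
  // Drop query string and fragment, then take the last path segment.
  const path = req.url.split(/[?#]/)[0] ?? "";
  const basename = path.substring(path.lastIndexOf("/") + 1);
  const looksLikeIcon = ICON_BASENAME.test(basename);
  // When a resource type is present it must be image-like; absence is tolerated.
  const imageLike =
    req.resourceType === undefined || req.resourceType === "image";
  return looksLikeIcon && imageLike ? "benign" : "actionable";
}
```

A QA preset built on a split like this would fail only when at least one row is classified `actionable`, and keep benign rows as warnings.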
|
|
94
109
|
|
|
95
|
-
`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-
|
|
110
|
+
`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
|
|
96
111
|
|
|
97
|
-
`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-
|
|
112
|
+
`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts) and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
|
|
98
113
|
|
|
99
|
-
`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries a plain empty-object `data.result`; empty arrays such as `[]` are valid eval results and are not warned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
+
`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation/errors.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries a plain empty-object `data.result`; empty arrays such as `[]` are valid eval results and are not warned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
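The getter-shortcut matching in `RQ-0078` could look roughly like the sketch below. It assumes a hypothetical `buildGetterHint` helper and `GetterHint` shape; the hint wording and the argument order in `args` are illustrative and not taken from `presentation/errors.ts`.

```typescript
// Command tokens and failure phrases are the ones listed in the requirement.
const GETTER_TOKENS = new Set(["attr", "count", "html", "text", "title", "url", "value"]);
const UNKNOWN_COMMAND_PATTERN = /unknown command|unknown subcommand|unrecognized command/i;

// Hypothetical result shape for illustration only.
interface GetterHint {
  prose: string;
  nextActionId?: string;
  args?: string[];
}

function buildGetterHint(
  commandToken: string,
  errorText: string,
  sessionName?: string,
): GetterHint | undefined {
  if (!GETTER_TOKENS.has(commandToken)) return undefined;
  if (!UNKNOWN_COMMAND_PATTERN.test(errorText)) return undefined;
  // Selector recovery already added its own hint, so skip the getter block.
  if (errorText.includes("Agent-browser hint:")) return undefined;

  const hint: GetterHint = { prose: `Use the grouped form: get ${commandToken}` };
  // Only title and url also get a read-only next action.
  if (commandToken === "title" || commandToken === "url") {
    hint.nextActionId = `use-get-${commandToken}`;
    hint.args = [
      ...(sessionName ? ["--session", sessionName] : []),
      "get",
      commandToken,
    ];
  }
  return hint;
}
```

For example, `buildGetterHint("title", "Unknown command: title", "s1")` yields a hint with `nextActionId` set to `use-get-title`, while `buildGetterHint("text", "unknown subcommand")` yields prose only.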
-
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close
+
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
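One way to read the `explicitArtifactPaths` rule in `RQ-0079` is sketched below. The `ManifestEntry` shape, its `kind` field name, and the `collectExplicitArtifactPaths` helper are assumptions for illustration; the existence check is injected as a parameter so that a caller could pass something like Node's `fs.existsSync`.

```typescript
// Hypothetical manifest entry; the `kind` field name is an assumption.
interface ManifestEntry {
  path: string;
  kind: string; // "explicit-path" marks artifacts written to caller-chosen paths
}

// Collects up to `limit` distinct explicit paths that still exist on disk.
function collectExplicitArtifactPaths(
  entries: ManifestEntry[],
  exists: (path: string) => boolean,
  limit = 10,
): string[] {
  const seen = new Set<string>();
  for (const entry of entries) {
    if (seen.size >= limit) break;
    if (entry.kind !== "explicit-path") continue;
    if (seen.has(entry.path)) continue;
    // Skip stale rows whose files were already removed by host tools.
    if (!exists(entry.path)) continue;
    seen.add(entry.path);
  }
  return Array.from(seen);
}
```

The result may be empty even when the manifest is non-empty, which is consistent with the "possibly empty" note in the requirement.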
`RQ-0080` adds no-op scroll recovery for dense dashboards and nested panes: for successful top-level `scroll`, `extensions/agent-browser/index.ts` samples viewport and prominent scroll-container positions before and after execution with read-only session-scoped `eval --stdin` probes. If no sampled position changes, it emits `details.scrollNoop`, appends visible `Scroll diagnostic: no observed scroll movement`, appends exact `inspect-after-noop-scroll` / `verify-noop-scroll-visually` next actions, and updates `pageChangeSummary.nextActionIds` so agents can branch without parsing prose. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`scrollNoop`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) scroll note; fake coverage: `agentBrowserExtension reports no-op scroll diagnostics with recovery next actions`.
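The before/after comparison behind `RQ-0080` can be sketched as follows. The `ScrollSample` shape and `detectScrollNoop` helper are hypothetical, and treating a container that is missing from the second sample as "no observed change" is an assumption of this sketch; only the two next-action ids come from the requirement text.

```typescript
// Hypothetical sample of one scroll position (viewport or a scroll container).
interface ScrollSample {
  id: string; // e.g. "viewport" or an illustrative container key
  scrollTop: number;
  scrollLeft: number;
}

interface ScrollNoopResult {
  scrollNoop: true;
  nextActionIds: string[];
}

// Returns a diagnostic only when no sampled position changed.
function detectScrollNoop(
  before: ScrollSample[],
  after: ScrollSample[],
): ScrollNoopResult | undefined {
  const afterById = new Map<string, ScrollSample>();
  for (const sample of after) afterById.set(sample.id, sample);

  const moved = before.some((b) => {
    const a = afterById.get(b.id);
    // Assumption: a container absent from the second sample counts as unchanged.
    return a !== undefined && (a.scrollTop !== b.scrollTop || a.scrollLeft !== b.scrollLeft);
  });
  if (moved) return undefined;

  return {
    scrollNoop: true,
    nextActionIds: ["inspect-after-noop-scroll", "verify-noop-scroll-visually"],
  };
}
```

Returning ids rather than prose mirrors the stated goal that agents can branch on `nextActionIds` without parsing the visible diagnostic.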