pi-agent-browser-native 0.2.44 → 0.2.45
- package/CHANGELOG.md +26 -0
- package/README.md +20 -15
- package/docs/ARCHITECTURE.md +12 -10
- package/docs/COMMAND_REFERENCE.md +49 -27
- package/docs/ELECTRON.md +1 -1
- package/docs/RELEASE.md +6 -5
- package/docs/REQUIREMENTS.md +6 -3
- package/docs/SUPPORT_MATRIX.md +17 -13
- package/docs/TOOL_CONTRACT.md +87 -46
- package/docs/platform-smoke.md +4 -3
- package/extensions/agent-browser/index.ts +29 -445
- package/extensions/agent-browser/lib/bash-guard.ts +205 -0
- package/extensions/agent-browser/lib/electron/cdp.ts +69 -0
- package/extensions/agent-browser/lib/electron/cleanup.ts +5 -58
- package/extensions/agent-browser/lib/electron/discovery.ts +2 -9
- package/extensions/agent-browser/lib/electron/launch.ts +11 -65
- package/extensions/agent-browser/lib/electron/text.ts +13 -0
- package/extensions/agent-browser/lib/fs-utils.ts +18 -0
- package/extensions/agent-browser/lib/input-modes/job.ts +207 -21
- package/extensions/agent-browser/lib/input-modes/params.ts +17 -7
- package/extensions/agent-browser/lib/input-modes/semantic-action.ts +22 -2
- package/extensions/agent-browser/lib/input-modes/types.ts +5 -1
- package/extensions/agent-browser/lib/input-modes.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts +82 -11
- package/extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts +153 -30
- package/extensions/agent-browser/lib/orchestration/browser-run/final-result.ts +53 -2
- package/extensions/agent-browser/lib/orchestration/browser-run/index.ts +1 -0
- package/extensions/agent-browser/lib/orchestration/browser-run/prepare.ts +751 -32
- package/extensions/agent-browser/lib/orchestration/browser-run/process-output.ts +38 -7
- package/extensions/agent-browser/lib/orchestration/browser-run/prompt-guards.ts +0 -46
- package/extensions/agent-browser/lib/orchestration/browser-run/session-state.ts +10 -1
- package/extensions/agent-browser/lib/orchestration/browser-run/types.ts +28 -1
- package/extensions/agent-browser/lib/orchestration/electron-host/index.ts +1 -6
- package/extensions/agent-browser/lib/orchestration/input-plan.ts +15 -3
- package/extensions/agent-browser/lib/orchestration/output-file.ts +86 -0
- package/extensions/agent-browser/lib/pi-tool-rendering.ts +231 -0
- package/extensions/agent-browser/lib/playbook.ts +26 -26
- package/extensions/agent-browser/lib/process.ts +1 -1
- package/extensions/agent-browser/lib/prompt-policy.ts +1 -18
- package/extensions/agent-browser/lib/results/artifact-manifest.ts +1 -4
- package/extensions/agent-browser/lib/results/artifact-state.ts +7 -3
- package/extensions/agent-browser/lib/results/contracts.ts +6 -2
- package/extensions/agent-browser/lib/results/envelope.ts +11 -2
- package/extensions/agent-browser/lib/results/network-routes.ts +7 -4
- package/extensions/agent-browser/lib/results/network.ts +7 -1
- package/extensions/agent-browser/lib/results/presentation/artifacts.ts +88 -20
- package/extensions/agent-browser/lib/results/presentation/batch.ts +84 -12
- package/extensions/agent-browser/lib/results/presentation/diagnostics.ts +81 -26
- package/extensions/agent-browser/lib/results/presentation/errors.ts +13 -0
- package/extensions/agent-browser/lib/results/presentation/registry.ts +60 -0
- package/extensions/agent-browser/lib/results/presentation.ts +10 -1
- package/extensions/agent-browser/lib/results/snapshot-high-value-controls.ts +16 -5
- package/extensions/agent-browser/lib/results/snapshot.ts +2 -0
- package/extensions/agent-browser/lib/runtime.ts +10 -1
- package/extensions/agent-browser/lib/session-page-state.ts +15 -6
- package/extensions/agent-browser/lib/web-search.ts +1 -1
- package/package.json +2 -2
- package/platform-smoke.config.mjs +5 -2
- package/scripts/platform-smoke/build-ubuntu-image.mjs +25 -0
- package/scripts/platform-smoke/crabbox-runner.mjs +5 -1
- package/scripts/platform-smoke/doctor.mjs +6 -2
- package/scripts/platform-smoke/linux-image/Dockerfile +3 -5
- package/scripts/platform-smoke/targets.mjs +2 -1
- package/extensions/agent-browser/lib/orchestration/browser-run/browser-action-model.ts +0 -154
package/docs/SUPPORT_MATRIX.md
CHANGED
|
@@ -49,7 +49,11 @@ These rows track this feedback batch. Some rows are docs-only or environment-own
|
|
|
49
49
|
| RQ-0116 | Fresh-session failure prose is opaque and exposes internal generated session ids without clear recovery. | Wrapper | **wrapper-owned shipped** (action-oriented visible recovery + `nextActions`; `attemptedSessionName` remains in `details`). Struct + visible line already exist (`RQ-0077`). | `buildManagedSessionOutcome` still keeps full generated-session transition details in `details.managedSessionOutcome`, while visible failure prose now summarizes preserved/abandoned/replaced outcomes without repeating generated ids. Focused fake coverage covers preserved, missing-binary, abandoned, and QA-reclassification paths. | No further wrapper action planned for this batch unless reviewer finds recovery actions unsafe or insufficient. | `extensions/agent-browser/lib/orchestration/browser-run/session-state.ts`, `final-result.ts`, `docs/TOOL_CONTRACT.md`, `test/agent-browser.extension-errors-artifacts.test.ts`, `test/agent-browser.extension-input-modes.test.ts`. |
|
|
50
50
|
| RQ-0117 | There is no machine-readable confirmation that headed mode is visible to the user. | Wrapper gap + environment (display) | **documented unsupported** for this batch; true OS visibility is **out-of-scope/host-owned** until upstream exposes a portable signal. Pairs with RQ-0110. | Same root cause as RQ-0110: no portable upstream/wrapper field observed. Headed launch success is not visibility proof, and adding a constant `details.headedVisibility: "unsupported"` would add noise without a decision signal. | No runtime field in this batch. Keep the explicit contract limitation and independent screenshot/tab/get-url evidence guidance. | README, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, generated playbook guidance. |
|
|
51
51
|
| RQ-0118 | Second-round report says every `eval --stdin` expression on `file://` returned `null`, including `1 + 1` and `document.title`. | Wrapper UX + caller-shape recovery | **wrapper-owned shipped in follow-up branch**: direct upstream and native-tool checks on maintainer macOS show `eval --stdin` works on `file://` when the script is supplied through the top-level native tool `stdin` field. The reported all-null behavior is reproduced by the malformed native-tool shape `args: ["eval", "--stdin", "document.title"]` with no top-level `stdin`, which upstream treats as empty stdin and returns `null`. | Direct probes: `agent-browser --session ... open file:///tmp/page.html` then `printf '1+1' \| agent-browser --session ... eval --stdin` returns `2`; native tool `{ args: ["eval", "--stdin"], stdin: "document.title" }` returns the fixture title; native tool `{ args: ["eval", "--stdin", "1+1"] }` reproduced `result: null` before normalization. | Normalize the common malformed native-tool call by moving trailing args after `--stdin` into process stdin before launch; keep docs/playbook explicit that top-level `stdin` is canonical. | `extensions/agent-browser/lib/orchestration/input-plan.ts`, `test/agent-browser.extension-errors-artifacts.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, generated playbook guidance. |
|
|
52
|
-
| RQ-0123 | Stress testing found artifact and navigation safety contracts could report success after failure evidence: missing explicit artifact files and `--allowed-domains` click escapes. | Wrapper result contract + navigation policy | **wrapper-owned in this branch**: non-pending resolved file artifacts with `exists:false` fail closed with `failureCategory: "artifact-missing"
|
|
52
|
+
| RQ-0123 | Stress testing found artifact and navigation safety contracts could report success after failure evidence: missing explicit artifact files and `--allowed-domains` click escapes. | Wrapper result contract + navigation policy | **wrapper-owned in this branch**: non-pending resolved file artifacts with `exists:false` fail closed with `failureCategory: "artifact-missing"`, missing downloads are labeled as reported-but-not-verified rather than completed, and non-file URL payloads such as `data:` downloads are not treated as host artifacts; argv-supplied `--allowed-domains` is remembered for the managed session and successful-looking browser commands whose final observed `http(s)` URL escapes the allowlist fail with `failureCategory: "policy-blocked"`. | Reproduced in issue #68/#69 audits: `wait --download` and `diff screenshot --output` reported success with missing files; `example.com` opened under `--allowed-domains example.com` but clicking `Learn more` reached `www.iana.org`, while direct outside-domain open was blocked by upstream. | Preserve verified and pending artifact success; preserve valid in-domain navigation and direct upstream blocks; keep allowlist matching exact host plus subdomain suffix and skip non-`http(s)` URLs. | `extensions/agent-browser/lib/results/presentation/artifacts.ts`, `presentation.ts`, `batch.ts`, `contracts.ts`, `action-recommendations.ts`, `extensions/agent-browser/lib/navigation-policy.ts`, `process-output.ts`, `docs/TOOL_CONTRACT.md`, `docs/COMMAND_REFERENCE.md`, focused artifact/navigation tests. |
|
|
53
|
+
| RQ-0124 | 2026-06-05 stress report found several agent-facing ergonomics gaps: `doctor` rendered as `(see attached image)`, simple anchor `download <selector> <path>` could time out and save a random-named file, `qa.attached` could false-pass by clearing diagnostics, failed fresh `job` prose implied the old session was still active, constrained `job` lacked semantic locators, dense annotated screenshots were too noisy, `tab list` hid labels, `state save` required pre-created parents, and `session list` names were cryptic. | Wrapper presentation, input modes, artifact preflight, managed-session state | **wrapper-owned in this branch**: top-level no-`data` success envelopes are preserved for presentation; `doctor`, `session list`, and `tab list` render stable readable fields; non-ref simple loopback anchor downloads are fetched directly to the requested path when an HTTP(S) href is resolvable; direct/batch artifact parent directories are created before launch including `state save`; `qa.attached` preserves diagnostics and exposes `diagnosticsResetAtStart:false`; post-launch fresh batch/job failures keep the fresh session current and visible recovery points to the failed step; `job` click/fill support semantic locator fields; annotated screenshot results add density guidance. | Stress report path `/tmp/agent-browser-stress-20260605T174431Z/reports/agent-browser-stress-report.md`; focused fake coverage added for direct anchor downloads, `qa.attached`, fresh job failure, job semantic locators, `state save` parents, tab/session/doctor presentation, and top-level envelope preservation. | Preserve RQ-0123 fail-closed artifact semantics, stale-ref guards, compact dense snapshot high-value refs, normal upstream `download @ref` behavior, and URL-opening QA buffer clearing. Do not add a reusable recipe layer or broad screenshot-label post-processing without new design evidence. | `extensions/agent-browser/lib/results/envelope.ts`, `presentation/diagnostics.ts`, `presentation.ts`, `input-modes/job.ts`, `params.ts`, `types.ts`, `orchestration/browser-run/prepare.ts`, `process-output.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused artifact/input/passthrough/presentation/results tests. |
|
|
54
|
+
| RQ-0125 | Follow-up 2026-06-05 stress report found route mocks could look fulfilled when requests failed, failed `qa.expectedText` checks could collapse into watchdog timeouts, semantic/current-ref recovery missed exact visible controls from nested/open-shadow contexts, compact snapshots after scroll could hide viewport context, batch/job stale-ref guidance lacked an in-schema refresh step, attached QA defaults surprised by preserved buffers, `data:image` request rows polluted diagnostics, `vitals` output lacked readable metrics/unavailable states, and close/session lifecycle prose was ambiguous. The report also expected prompt-derived final-action blocking; that expectation is **rejected** as out of scope except for exact artifact-before-close invariants. | Wrapper diagnostics, input modes, prompt policy, presentation, managed-session lifecycle | **wrapper-owned in this branch**: prompt-derived stop-boundary action blocking was removed while exact requested-artifact-before-close remains; route diagnostics flag failed/pending/CORS routed rows as unfulfilled and expose `inspect-routed-network-request`; QA expected text originally read body text after load and reported missing text as `qa-failure` (superseded by RQ-0126 visible-text predicates); `qa.attached` diagnostic reads default off unless opted in; `job` adds explicit `snapshot` steps; selector recovery uses the failed batch step command for exact current-ref fallbacks; compact snapshots include a viewport-ordering note/field; `network requests` hides `data:image` artifact noise while preserving raw details; `vitals`/`web-vitals` render metric summaries or unavailable reasons; successful managed-session close explains next-session state and `session list` includes active fields. | Stress report path `/private/tmp/piab-project-mUkv1r/dogfood-output/agent-browser-stress-20260605T190651Z/report.md`; focused coverage added for prompt non-blocking, artifact close guard retention, route failures, QA expected text body-check fallback/defaults, batch semantic ref fallback, compact snapshot viewport note, data-image filtering, vitals summaries, and close/session prose. | Keep broad user/business intent enforcement as agent responsibility; do not reintroduce prompt-derived click/key blocking without an explicit machine-checkable invariant. Preserve stale-ref safety by requiring explicit `snapshot` refresh rows rather than weakening ref guards after mutation-prone steps. | `prompt-policy.ts`, `prompt-guards.ts`, `input-modes/job.ts`, `results/network.ts`, `network-routes.ts`, `presentation/diagnostics.ts`, `presentation/registry.ts`, `snapshot.ts`, `orchestration/browser-run/final-result.ts`, `session-state.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/prompt/passthrough/semantic/presentation/snapshot/resume tests. |
|
|
55
|
+
| RQ-0126 | Later 2026-06-05 stress report found JavaScript prompt/dialog flows could hit the full watchdog, same-page rerenders could recycle stale refs into false click success, fresh `job`/`qa` post-launch failures could still read like launch failures after timeouts, broad QA body text reads timed out on dense MDN pages, constrained jobs continued after failed setup fills, `diff screenshot` missing outputs still used “saved” wording, virtualized/nested scrolling needed eval workarounds, aggregate network buffers were noisy across navigations, no-op page scrolls surfaced `scrolled:true`, and follow-up feedback asked for better timeout recovery, record-start artifact state, direct semantic selectors, output files, human-paced typing, and timeout knobs. | Wrapper timeouts, ref preflight, input modes, artifact/diagnostic presentation, scroll helpers | **wrapper-owned in this branch**: dialog commands and likely dialog-trigger clicks get shorter wrapper process timeouts plus dialog recovery next actions; ordinary calls use a 35s child watchdog with per-call `timeoutMs`; direct `@ref` mutations compare against a fresh same-page snapshot before spawn and fail stale when role/name identity changes; fresh batch timeouts with recovered page context keep the new session as current in `managedSessionOutcome`; timeout partial progress exposes per-step status and `retry-timeout-step`; QA expected text uses bounded visible-text `wait --fn` predicates; `job.failFast` defaults to `true`, compiles to `batch --bail`, and supports bounded paced `type`; `semanticAction` supports direct selector/ref click/check/fill; `outputPath` writes successful payloads to local files; `record start` / `record restart` artifacts are pending/open instead of missing; missing diff images are labeled reported-but-not-verified; explicit CSS-container `scroll <selector> <dir> [amount]` is handled before page scroll; network previews add a clear-buffer-before-repro next action; no-op scrolls set `details.data.scrolled:false` / `noMovement:true`. | Stress report path `/private/tmp/piab-project-TJbDcs/dogfood-output/agent-browser-stress/reports/agent-browser-stress-report.md`; focused coverage added for dialog timeouts, same-page rerender ref preflight, job `--bail`/bounded `type` compaction, semantic selector compilation, outputPath, timeout retry evidence, visible QA text predicates, record-start/restart pending artifacts, artifact missing wording, explicit container scroll, no-op scroll data, and network clear next actions. | Preserve thin upstream semantics for ordinary commands; do not implement a broad recipe layer or prompt-derived business-action blocker. Dialog recovery remains bounded best-effort, not a guarantee that upstream can accept every wedged prompt; fresh-session recovery is always offered. | `input-modes/job.ts`, `semantic-action.ts`, `params.ts`, `types.ts`, `orchestration/input-plan.ts`, `orchestration/output-file.ts`, `orchestration/browser-run/prepare.ts`, `process-output.ts`, `final-result.ts`, `diagnostics.ts`, `results/presentation/artifacts.ts`, `presentation/diagnostics.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/ref/validation/artifact/process/results tests. |
|
|
56
|
+
| RQ-0127 | Broad stress report rooted at `/private/tmp/piab-project-gfkJ2A` found follow-up friction: missing QA expected text could still let later diagnostics push the whole batch into the wrapper watchdog, `eval --stdin` snippets that open JavaScript dialogs used the full default watchdog, `scroll to end` could contradict no-movement diagnostics on long pages, and `keyboard press Enter` was an unsupported upstream shape without targeted guidance. The report also requested larger product/design improvements such as viewport-first snapshots, snapshot search/diff, semantic extraction, page-only network filters, batch retry with fresh refs, and smarter ref lifecycle. | QA compiler, dialog timeout planning, scroll helpers, error presentation, product backlog | **partially wrapper-owned in this branch**: QA now compiles to `batch --bail` so failed text/selector assertions stop before slower diagnostics; dialog-like `eval --stdin` is bounded by the dialog-trigger watchdog and gets dialog recovery next actions on timeout; `scroll to end` / `scroll to top` are wrapper-handled against `document.scrollingElement` with `details.scrollPage`; unsupported `keyboard press` errors now explain `keyboard type` / `inserttext` and Enter usage. Wrapper-side snapshot search/filter is now implemented for `snapshot -i --search <text>` and `--filter role=<role>` while preserving the full `details.refSnapshot`, `snapshot --viewport` adds opt-in viewport/scroll metadata, `snapshot --diff` reports ref-map deltas against the previous tracked snapshot, and wrapper-side `network requests --current-page` / `--current-origin` / `--current-url` filters aggregate buffers by the active page; job `open.loadState` can insert an explicit readiness wait after navigation. Automatic viewport-first snapshots, automatic batch retry with fresh refs, weaker stale-ref lifecycle rules, and broad additional Electron CDP recovery were triaged as resolved-by-existing-controls or intentionally rejected rather than implemented: the wrapper now provides opt-in viewport/search/diff evidence, preserves machine-checkable stale-ref guardrails instead of replaying mutating batches, and keeps the existing Electron status/probe/reattach/No-active-page recovery paths that already have lifecycle coverage without adding speculative CDP heuristics. | Focused coverage added for QA `batch --bail`, dialog-trigger eval timeout, wrapper-handled page scroll-to-end, wrapper-side snapshot search/filter/viewport/diff metadata, network current-page filtering, job open readiness waits, and keyboard press guidance. Root report path currently exists but has no files; inline user report is the source evidence for this row. | Preserve QA pass diagnostics for successful assertions; fail fast only after failed batch steps. Do not weaken stale-ref safety to reduce snapshot tax without a machine-checkable validity invariant; batch auto-retry after a click/open can repeat mutating steps and is intentionally not added. Do not pretend upstream-only CDP/keyboard API gaps are fixed without either wrapper recovery or explicit blocked evidence. | `input-modes/job.ts`, `orchestration/browser-run/prepare.ts`, `final-result.ts`, `results/presentation/errors.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, focused input/validation/presentation tests. |
|
|
53
57
|
| RQ-0119 | Second-round localhost failures still show `ERR_EMPTY_RESPONSE` even when shell `curl` succeeds. | Environment + wrapper diagnostics | **diagnostic-mitigated**: direct maintainer repro shows localhost HTTP succeeds with a normal same-host Python server, so the wrapper still cannot prove or bridge an environment-specific browser-host namespace/proxy mismatch. Add error presentation guidance specifically for loopback navigation failures so agents do not misread `ERR_EMPTY_RESPONSE` as blank page content. | Direct probe: `python3 -m http.server --bind 127.0.0.1 8766` + `agent-browser open http://127.0.0.1:8766/page.html` succeeds; previous first-batch evidence still shows accept-then-close servers can produce `ERR_EMPTY_RESPONSE`. | Append a local fixture hint on loopback `open`/navigation failures with `net::ERR_EMPTY_RESPONSE`, `ERR_CONNECTION_REFUSED`, `ERR_ADDRESS_UNREACHABLE`, `ERR_TIMED_OUT`, or `ERR_CONNECTION_RESET`; do not add server lifecycle management in the native browser tool. | `extensions/agent-browser/lib/results/presentation/errors.ts`, `test/agent-browser.presentation-skills-recovery.test.ts`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
|
|
54
58
|
| RQ-0120 | Third-round report says ref/semantic clicks can report success while inline `onclick="…"` handlers do not run, though programmatic `.click()` does. | Wrapper diagnostics + upstream/browser hit testing | **diagnostic-mitigated in follow-up branch**: simple direct upstream probes show inline `onclick` handlers fire for selector and `@ref` clicks on file pages, so the reported case is likely a hit-target/overlay/ref-resolution miss rather than inline attributes generally. Extend the click-dispatch probe to `@e…` refs using the latest snapshot role/name metadata so ref or semanticAction→ref clicks that never deliver a trusted event to the intended element fail with `details.clickDispatch` instead of silently reporting success. Fourth-round external testing confirmed this diagnostic now catches the failure. | Direct probe: minimal `<button onclick="showGraph('rps')">` fixture updates DOM via selector click, `@e1` click, and programmatic `.click()`. Existing wrapper probe covered CSS/XPath only; semantic visible-ref resolution and raw `@e…` clicks skipped dispatch diagnostics. Follow-up tester confirmed programmatic `.click()` remains a useful static-fixture workaround when CDP/user-like click dispatch fails. | Probe standalone `click @e…` when the latest snapshot maps that ref to a unique visible role/name DOM candidate; keep no in-page replay policy. Document programmatic `eval --stdin` `.click()` as an explicit debugging/static-fixture workaround only, not proof of real user click behavior and not a way around stop boundaries. | `extensions/agent-browser/lib/orchestration/browser-run/click-dispatch.ts`, `types.ts`, `prepare.ts`, `test/agent-browser.extension-click-dispatch.test.ts`, README, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`. |
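
The RQ-0123 row describes allowlist matching as "exact host plus subdomain suffix" that skips non-`http(s)` URLs. A minimal sketch of that rule follows. The function name `isUrlAllowed` and its signature are hypothetical, and treating unparseable URLs as "nothing to check" is an assumption of this sketch; the shipped logic in `navigation-policy.ts` may differ.

```typescript
// Hypothetical helper illustrating the RQ-0123 matching rule; not the wrapper's actual code.
function isUrlAllowed(finalUrl: string, allowedDomains: string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(finalUrl);
  } catch {
    // Assumption: an unparseable URL is not an http(s) escape, so it is not flagged here.
    return true;
  }
  // Non-http(s) URLs (file:, data:, about:) are skipped by the policy check.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return true;
  }
  const host = parsed.hostname.toLowerCase();
  return allowedDomains.some((entry) => {
    const domain = entry.trim().toLowerCase();
    if (domain.length === 0) return false;
    // Exact host match, or a true subdomain (the leading dot prevents
    // "notexample.com" from matching "example.com").
    return host === domain || host.endsWith("." + domain);
  });
}
```

Under this sketch, the audit case from the row (opening `example.com` and then landing on `www.iana.org` after a click) would be classified as an escape, while `docs.example.com` would stay in policy.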
|
|
55
59
|
|
|
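
The RQ-0118 row describes normalizing the malformed call shape `args: ["eval", "--stdin", "<script>"]` by moving trailing args after `--stdin` into process stdin. The sketch below illustrates that idea only. The `ToolCall` type, the `normalizeEvalStdin` name, and the choice to join trailing args with a single space are assumptions, not the contents of `input-plan.ts`.

```typescript
// Hypothetical shapes for illustration; the real native-tool input type has more fields.
interface ToolCall {
  args: string[];
  stdin?: string;
}

function normalizeEvalStdin(call: ToolCall): ToolCall {
  const evalIndex = call.args.indexOf("eval");
  const flagIndex = call.args.indexOf("--stdin");
  const trailing = flagIndex >= 0 ? call.args.slice(flagIndex + 1) : [];
  // Leave the call untouched unless it is `eval ... --stdin <extra args>` with no
  // top-level stdin, which is the shape that upstream reads as empty stdin.
  if (
    evalIndex < 0 ||
    flagIndex < evalIndex ||
    trailing.length === 0 ||
    call.stdin !== undefined
  ) {
    return call;
  }
  return {
    args: call.args.slice(0, flagIndex + 1),
    // Assumption: trailing args are joined with spaces to rebuild the script text.
    stdin: trailing.join(" "),
  };
}
```

The canonical shape, `{ args: ["eval", "--stdin"], stdin: "document.title" }`, passes through unchanged, which matches the row's note that top-level `stdin` remains the documented form.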
@@ -65,7 +69,7 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
65
69
|
| Deterministic dogfood smoke | `npm run verify -- dogfood` (`scripts/verify-agent-browser-dogfood.ts`) drives the native wrapper against a local file fixture through top-level `qa`, `semanticAction`, constrained `job`, screenshot artifact verification, and session close with the real `agent-browser` on `PATH`. | Pass on 2026-06-03 (`npm run verify -- dogfood`, `agent-browser 0.27.1`; artifacts cleaned by the harness). |
|
|
66
70
|
| Efficiency benchmark | `npm run verify -- benchmark` runs deterministic browser workflow accounting plus focused benchmark tests, including JSONL sampling fixtures and job/qa/sourceLookup/networkSourceLookup/Electron scenario coverage. | Pass on 2026-05-29 (`npm run verify -- benchmark`). |
|
|
67
71
|
| Crabbox platform smoke | `npm run check:platform-smoke` syntax-checks the harness and cheap invariants. `npm run smoke:platform:ubuntu-image` builds the project-owned Linux image, `npm run smoke:platform:doctor` checks Crabbox 0.26.0+ and local target readiness, and `npm run smoke:platform:all` runs doctor first, then fast target-local `platform-build` (`npm run verify -- platform-target`, pack, clean Pi install) plus `browser-dogfood-smoke` on Crabbox `macos`, `ubuntu`, and `windows-native`; see [`platform-smoke.md`](platform-smoke.md). Target artifacts include Crabbox/provider/work-root metadata, and release review also checks provider-specific `crabbox list` commands for leftover leases/clones. | Pass on 2026-06-03 (`npm run check:platform-smoke`, `npm run smoke:platform:ubuntu-image`, and `npm run verify -- release`, whose platform slice ran the macOS/Ubuntu/native-Windows Crabbox matrix; artifacts cleaned after evidence capture). |
|
|
68
|
-
| `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with packaged Pi smoke and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone
|
|
72
|
+
| `verify -- release` / `prepublishOnly` | `npm run verify -- release` chains the default gate with the configured-source lifecycle harness, packaged Pi smoke, and the release-blocking Crabbox platform matrix (`verifySteps` `release` in [`scripts/project.mjs`](../scripts/project.mjs)). `package.json` `prepublishOnly` runs that compose before `npm pack --dry-run` during `npm publish`. It intentionally omits standalone real-upstream, host-only dogfood, and benchmark modes—see [`RELEASE.md`](RELEASE.md#pre-release-checks). | Pass on 2026-06-03 (`npm run verify -- release`, including macOS/Ubuntu/native-Windows Crabbox matrix). |
|
|
69
73
|
| Configured-source lifecycle | `npm run verify -- lifecycle` (`scripts/verify-lifecycle.mjs`) drives `/reload`, closes and relaunches Pi with the same exact `--session-id`, checks the JSONL session header id, session continuity, slash-command sentinel tokens (`v1` then `v2` after rewriting the packaged extension to simulate pickup), persisted spill reachability, and real Pi `tool_result` failure-patch semantics for a QA reclassification with a fake upstream on `PATH`. Default Pi model is `zai/glm-5.1`; default per-step wait is **180000 ms** (`DEFAULT_TIMEOUT_MS`); override model with `--model <id>` and waits with `--timeout-ms <ms>`. Passthrough flags in [`scripts/project.mjs`](../scripts/project.mjs): `--keep-artifacts`, `--model`, `--verbose`, and `--timeout-ms` plus a value (for example `npm run verify -- lifecycle --model openai-codex/gpt-5.5:minimal --keep-artifacts --verbose --timeout-ms 600000`). | Pass on 2026-06-03 (`npm run verify -- lifecycle`). Treat any future unexplained red lifecycle gate as a release blocker. |
|
|
70
74
|
| Quick isolated Pi smoke | `pi --no-extensions --no-skills -e . --tools agent_browser` from repo root; native `agent_browser` only. | Last interactive tmux checkout smoke pass on 2026-05-29 (`agent-browser 0.27.0` at the time). The 2026-06-03 Crabbox matrix now covers clean packed Pi install plus deterministic wrapper dogfood on all required platforms for `agent-browser 0.27.1`; run a new manual tmux smoke before publish when human-readable transcript evidence is required. Broader historical coverage also includes version/help/skills, open/snapshot/click, eval stdin, batch stdin, screenshot, explicit session, `sessionMode: "fresh"`, network requests, console/errors, diff snapshot, stream status/disable, dashboard start/stop, and chat credential-failure pass-through during RQ-0055. |
|
|
71
75
|
|
|
@@ -94,13 +98,13 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
|
|
|
94
98
|
|
|
95
99
|
`RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
96
100
|
|
|
97
|
-
`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Route-mock diagnostics (#73) now track successful `network route` / `network unroute` patterns per session and, on later `network requests`, surface `details.networkRouteDiagnostics` plus executable `inspect-
|
|
101
|
+
`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation/diagnostics.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Route-mock diagnostics (#73) now track successful `network route` / `network unroute` patterns per session and, on later `network requests`, surface `details.networkRouteDiagnostics` plus executable `inspect-routed-network-request` and `start-network-har-capture-for-route-mock` follow-ups when a matching fetch/XHR row is failed, pending, or CORS/preflight-looking; same-origin/CORS fixture guidance stays in prose rather than a non-runnable next action. Compact network previews hide `data:image` screenshot/artifact rows by default while preserving raw rows in `details.data.requests`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews`, `buildToolPresentation returns bounded network request next actions for benign and successful API rows`, `buildToolPresentation adds routed pending network diagnostics`, `buildToolPresentation flags routed requests that return failed statuses`, and `agentBrowserExtension reports unfulfilled routed network mocks`.
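
The request-ID priority in the `RQ-0095` paragraph (actionable failed row, then API/fetch-like row, then benign failed row, then first safe ID) can be sketched as a chain of fallbacks. The `RequestRow` fields below are hypothetical booleans standing in for whatever classification `presentation/diagnostics.ts` actually performs; only the ordering is taken from the text.

```typescript
// Hypothetical row shape; the real rows carry URLs, statuses, and resource types.
interface RequestRow {
  id: string;
  safeId: boolean; // ID is not sensitive-looking and may appear in action params
  failed: boolean;
  actionable: boolean; // failure that is worth a follow-up, as opposed to benign
  apiLike: boolean; // fetch/XHR-style request
}

// Returns at most one request ID, following the documented priority order.
function pickRequestId(rows: RequestRow[]): string | undefined {
  const safe = rows.filter((row) => row.safeId);
  const picked =
    safe.find((row) => row.failed && row.actionable) ??
    safe.find((row) => row.apiLike) ??
    safe.find((row) => row.failed) ??
    safe[0];
  return picked?.id;
}
```

Filtering to safe IDs first means a sensitive-looking ID is never selected, even when its row would otherwise rank highest.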
|
|
98
102
|
|
|
99
103
|
`RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
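A minimal sketch of how a `select` shorthand might compile to upstream argv, assuming the documented rule (`selector` plus `value` or `values`) and the optional `--session` prefix described for RQ-0071. `SelectAction` and `compileSelect` are hypothetical names, and preferring `values` when both fields are present is an assumption of this sketch.

```typescript
// Hypothetical input shape modeled on the semanticAction description.
interface SelectAction {
  action: "select";
  selector: string;
  value?: string;
  values?: string[];
  session?: string;
}

// Compiles to: [--session <name>] select <selector> <value...>
function compileSelect(a: SelectAction): string[] {
  // Assumption: a non-empty `values` array wins over a single `value`.
  const values =
    a.values && a.values.length > 0
      ? a.values
      : a.value !== undefined
        ? [a.value]
        : [];
  if (a.selector.length === 0 || values.length === 0) {
    throw new Error("select requires selector plus value or values");
  }
  const prefix = a.session ? ["--session", a.session] : [];
  return [...prefix, "select", a.selector, ...values];
}
```

For example, `compileSelect({ action: "select", selector: "#size", values: ["m", "l"] })` yields `["select", "#size", "m", "l"]`.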
|
|
100
104
|
|
|
101
|
-
`RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance
|
|
105
|
+
`RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance keeps stop-before-order/post/purchase/submit as agent-responsibility guidance and exact-artifact-before-close as a wrapper-checkable contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).
|
|
102
106
|
|
|
103
|
-
`RQ-0090`
|
|
107
|
+
`RQ-0090` keeps prompt-derived preflight guards limited to machine-checkable exact artifact requirements. `buildPromptPolicy` in `extensions/agent-browser/lib/prompt-policy.ts` extracts exact requested artifact paths from the latest user message; `prompt-guards.ts` blocks browser `close` / `quit` / `exit` with `details.promptGuard.reason: "requested-artifacts-missing-before-close"` until required prompt screenshot paths are verified in `details.artifactManifest` (optional recording paths are required only when recording appears available). The wrapper intentionally does not infer broad business/user intent from prompt text such as “stop before checkout” or “do not post anything”; those stop boundaries remain agent responsibility and are documented as guidance, not runtime action blocks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`promptGuard`); human workflow: README stop-boundary/artifact notes and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md); fake coverage: `agentBrowserExtension does not turn prompt stop-boundary text into click blocks`, `agentBrowserExtension blocks close until required prompt screenshot artifacts are saved`, and `buildPromptPolicy detects requested artifact paths without deriving semantic action blockers`.
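The close guard described for RQ-0090 might look roughly like the sketch below. Only the reason string and the `close` / `quit` / `exit` command set come from the text above; `checkCloseGuard`, its parameters, and the result shape are hypothetical, and the optional-recording rule is left out.

```typescript
// Hypothetical result shape; the real details.promptGuard object may carry more fields.
interface CloseGuardResult {
  blocked: boolean;
  reason?: "requested-artifacts-missing-before-close";
  missing?: string[];
}

const CLOSE_COMMANDS = new Set(["close", "quit", "exit"]);

// Blocks close-like commands until every required screenshot path
// is present among the verified manifest paths.
function checkCloseGuard(
  command: string,
  requiredPaths: string[],
  verifiedManifestPaths: Set<string>,
): CloseGuardResult {
  if (!CLOSE_COMMANDS.has(command)) return { blocked: false };
  const missing = requiredPaths.filter((p) => !verifiedManifestPaths.has(p));
  if (missing.length === 0) return { blocked: false };
  return {
    blocked: true,
    reason: "requested-artifacts-missing-before-close",
    missing,
  };
}
```

The guard inspects only exact paths, which matches the stated intent of not inferring broader user intent from prompt text.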
|
|
104
108
|
|
|
105
109
|
`RQ-0097` keeps upstream subprocess completion reliable when detached descendants inherit the child’s stdio handles: `runAgentBrowserProcess` in `extensions/agent-browser/lib/process.ts` uses `watchSpawnedChildCompletion` to observe both Node `exit` and `close`, leaves piped stdio intact during the short post-`exit` grace (`EXIT_STDIO_GRACE_MS`, currently **100 ms**) so normal `close` can still win, destroys those streams only if the fallback resolves, and resolves with exit-code precedence `close` → wrapper timeout (**124**) → post-`exit` fallback for the direct child → spawn failure (**127**) when `close` is still delayed so the Pi tool cannot hang after `agent-browser` has already exited. Human context: [`ARCHITECTURE.md`](ARCHITECTURE.md#direct-subprocess-execution) (subprocess bullet) and [`AGENTS.md`](../AGENTS.md) (**Runtime planning** → **Upstream subprocess completion**); fake coverage: `runAgentBrowserProcess resolves after exit when descendants keep stdio handles open` asserts the post-exit fallback returns near the 100 ms grace window instead of the process timeout, and `runAgentBrowserProcess returns timeout exit code when descendants keep stdio handles open` in [`test/agent-browser.process.test.ts`](../test/agent-browser.process.test.ts).
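The exit-code precedence stated for RQ-0097 can be written as a small pure function. This is a sketch, not the body of `runAgentBrowserProcess`: `ChildObservation` and `resolveExitCode` are hypothetical names, and the event wiring and the 100 ms grace timer are omitted.

```typescript
// Hypothetical snapshot of what the watcher has observed so far.
interface ChildObservation {
  closeCode?: number; // code delivered by the `close` event, if seen
  timedOut: boolean; // wrapper watchdog fired
  exitCode?: number; // code from `exit` after the grace window elapsed
  spawnFailed: boolean; // the child never started
}

const TIMEOUT_EXIT_CODE = 124;
const SPAWN_FAILURE_EXIT_CODE = 127;

// Precedence: close -> wrapper timeout (124) -> post-exit fallback -> spawn failure (127).
// Returns undefined while nothing conclusive has been observed.
function resolveExitCode(o: ChildObservation): number | undefined {
  if (o.closeCode !== undefined) return o.closeCode;
  if (o.timedOut) return TIMEOUT_EXIT_CODE;
  if (o.exitCode !== undefined) return o.exitCode;
  if (o.spawnFailed) return SPAWN_FAILURE_EXIT_CODE;
  return undefined;
}
```

Checking `close` first lets a normal shutdown win whenever it arrives inside the grace window, so the fallback only matters when descendants keep stdio handles open.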
|
|
106
110
|
|
|
@@ -116,29 +120,29 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
|
|
|
116
120
|
|
|
117
121
|
`RQ-0099` makes semantic fill misses on host-controlled rich inputs recoverable without changing upstream `find` semantics or adding a recipe runtime. Active-session role/name `semanticAction.fill` first gets a guarded pre-execution current-ref pass: one fresh `snapshot -i`, one exact editable `combobox` / `searchbox` / `textbox` match, then direct `fill @ref <text>` while preserving the original semantic target in `details.compiledSemanticAction`. When a later `selector-not-found` recovery already collected an exact current editable `searchbox` / `textbox` ref, `extensions/agent-browser/lib/results/selector-recovery.ts` defines `details.richInputRecovery`, visible `Rich input recovery`, and bounded `focus-current-editable-ref*` / `click-current-editable-ref*` next actions; `extensions/agent-browser/index.ts` only probes the current session snapshot and merges the result. Those next actions never copy the fill text and never press `Enter` or submit; agents should refresh refs, choose the current editable `@ref`, focus/click it, then use `keyboard inserttext` or `keyboard type` with the intended text only after the right input is focused. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: README locator shorthand, [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#selector-strategy), and generated playbook text from `extensions/agent-browser/lib/playbook.ts`; fake coverage: `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) and `agentBrowserExtension returns rich input recovery when semanticAction fill misses current editable refs` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
118
122
|
|
|
119
|
-
`RQ-0101` improves compact snapshot usefulness for dense desktop host screens without adding a new mode or dumping all refs inline. `extensions/agent-browser/lib/results/snapshot.ts` still emits the existing visible `Omitted high-value controls` section and `details.data.highValueControlRefIds`, while `snapshot-high-value-controls.ts` selects omitted controls with bounded diversity so editable/searchbox/textbox/combobox controls, named tab/surface controls,
|
|
123
|
+
`RQ-0101` improves compact snapshot usefulness for dense desktop host screens without adding a new mode or dumping all refs inline. `extensions/agent-browser/lib/results/snapshot.ts` still emits the existing visible `Omitted high-value controls` section and `details.data.highValueControlRefIds`, while `snapshot-high-value-controls.ts` selects omitted controls with bounded diversity so editable/searchbox/textbox/combobox controls, named tab/surface controls, primary action buttons, and high-signal named links such as repository search results remain discoverable even when many utility buttons and dense host rows compete for the trimmed ref budget. Human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#snapshot-refs-and-current-page-state), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), and README; fake coverage: `buildToolPresentation keeps dense desktop host high-value controls discoverable in compact snapshots` in [`test/agent-browser.snapshot-presentation.test.ts`](../test/agent-browser.snapshot-presentation.test.ts).
|
|
120
124
|
|
|
121
125
|
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `click`+`text` (`try-button-name-candidate` and `try-link-name-candidate`). Other locator/action pairs omit this block; fill recovery now goes through the RQ-0099 current-editable-ref ladder so candidate nextActions do not repeat fill text. `semanticAction` `select` uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`; `semanticAction.uncheck` is intentionally not exposed while upstream `find … uncheck` is not runtime-supported, and raw `uncheck <selector-or-ref>` remains available. Active-session role/name click/check/fill shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; fill requires one exact editable current ref. The original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: semantic selector-miss assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus current-ref assertions and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` / `agentBrowserExtension resolves semantic role fills through one exact current editable ref` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
|
|
122
126
|
|
|
123
|
-
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find
|
|
127
|
+
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, direct selector/ref commands, or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
|
|
124
128
|
|
|
125
129
|
`RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, then `extensions/agent-browser/lib/results/selector-recovery.ts` looks for exact normalized role/name matches for the failed target and emits `details.visibleRefFallback` plus visible `Current snapshot ref fallback`. Non-fill matches append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`); fill matches omit direct args/text and feed the RQ-0099 rich-input recovery path when the ref is editable. The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
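As a rough illustration of the narrow matcher described for RQ-0088, the sketch below accepts only exact normalized role/name matches and flags duplicates as ambiguous. `SnapshotRef`, `normalizeName`, and `matchVisibleRefs` are hypothetical, and the normalization itself (trim, collapse whitespace, lowercase) is an assumption rather than the rule used in `selector-recovery.ts`.

```typescript
// Hypothetical ref entry from a fresh `snapshot -i`.
interface SnapshotRef {
  ref: string;
  role: string;
  name: string;
}

interface RefMatchResult {
  matches: SnapshotRef[];
  ambiguous: boolean; // true when more than one exact match exists
}

// Assumed normalization: trim, collapse whitespace, lowercase.
function normalizeName(value: string): string {
  return value.trim().replace(/\s+/g, " ").toLowerCase();
}

// Exact role + exact normalized name only; prefixes and fuzzy matches are ignored.
function matchVisibleRefs(
  refs: SnapshotRef[],
  role: string,
  name: string,
): RefMatchResult {
  const wanted = normalizeName(name);
  const matches = refs.filter(
    (r) => r.role === role && normalizeName(r.name) === wanted,
  );
  return { matches, ambiguous: matches.length > 1 };
}
```

Returning the ambiguity flag alongside the matches lets the caller attach safety copy instead of silently picking the first duplicate.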
|
|
126
130
|
|
|
127
|
-
`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/lib/session-page-state.ts` replays per-session snapshots and `refSnapshotInvalidation` markers from the active transcript branch on `session_start` and Pi 0.78 `session_tree` branch changes, clears them on successful close commands (`close`, `quit`, or `exit`), invalidates prior refs when a session `snapshot` fails with `No active page`, rejects mutation-prone ref argv before spawn when the tab URL diverges, a ref id is missing from the latest snapshot, or the session refs are invalidated, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). The entrypoint also serializes `session_tree` restore and wrapper-owned browser commands with managed-session work, guards independent caller-owned explicit-session completions with a branch-state generation check, keeps process-owned cleanup registries for managed sessions and wrapper-launched Electron records separate from the branch-visible view, treats explicit wrapper-owned close rows and Electron cleanup managed-session steps as restore-visible close events, closes off-branch owned managed sessions and Electron launches on non-quit reload shutdown, preserves current branch-visible active managed/Electron sessions and active Electron temp profiles for reload continuity, and preserves fresh-session allocation monotonicity across branch restores. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension recommends tab recovery after No active page snapshot failures` and `agentBrowserExtension invalidates refs after No active page snapshot failures inside batch` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus `agentBrowserExtension blocks page-scoped ref reuse…`, `…rehydrates page-scoped refs from the current tree branch`, `…rehydrates managed browser session state from the current tree branch`, `…rehydrates artifact manifest state from the current tree branch`, `…keeps Electron cleanup ownership after session_tree switches away from the launch branch`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts); managed-session reload cleanup, explicit close untracking/state rotation/restore, generated fresh-name reservation after repeated explicit closes, explicit-session command versus `session_tree` generation-guard coverage, explicit close versus in-flight implicit command serialization, and fresh-ordinal coverage lives in [`test/agent-browser.resume-state.test.ts`](../test/agent-browser.resume-state.test.ts).
|
|
131
|
+
`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/lib/session-page-state.ts` replays per-session snapshots and `refSnapshotInvalidation` markers from the active transcript branch on `session_start` and Pi 0.78 `session_tree` branch changes, clears them on successful close commands (`close`, `quit`, or `exit`), invalidates prior refs when a session `snapshot` fails with `No active page`, rejects mutation-prone ref argv before spawn when the tab URL diverges, a ref id is missing from the latest snapshot, or the session refs are invalidated, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). The entrypoint also serializes `session_tree` restore and wrapper-owned browser commands with managed-session work, guards independent caller-owned explicit-session completions with a branch-state generation check, keeps process-owned cleanup registries for managed sessions and wrapper-launched Electron records separate from the branch-visible view, treats explicit wrapper-owned close rows and Electron cleanup managed-session steps as restore-visible close events, closes off-branch owned managed sessions and Electron launches on non-quit reload shutdown, preserves current branch-visible active managed/Electron sessions and active Electron temp profiles for reload continuity, and preserves fresh-session allocation monotonicity across branch restores. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `refSnapshotInvalidation`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension recommends tab recovery after No active page snapshot failures` and `agentBrowserExtension invalidates refs after No active page snapshot failures inside batch` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts), plus `agentBrowserExtension blocks page-scoped ref reuse…`, `…rehydrates page-scoped refs from the current tree branch`, `…rehydrates managed browser session state from the current tree branch`, `…rehydrates artifact manifest state from the current tree branch`, `…keeps Electron cleanup ownership after session_tree switches away from the launch branch`, `…blocks stale refs after page-changing steps inside a batch`, `…allows same-snapshot form fills before a batch click`, `…allows same-snapshot form control batches before a hard invalidating click`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts); managed-session reload cleanup, explicit close untracking/state rotation/restore, generated fresh-name reservation after repeated explicit closes, explicit-session command versus `session_tree` generation-guard coverage, explicit close versus in-flight implicit command serialization, and fresh-ordinal coverage lives in [`test/agent-browser.resume-state.test.ts`](../test/agent-browser.resume-state.test.ts).
|
|
128
132
|
|
|
129
|
-
`RQ-0087` keeps the RQ-0072 guard but removes safe same-snapshot form work from the batch invalidation latch: `fill @e…` rows and role-checked native form-control rows (`check`/`uncheck` on checkbox or radio refs and `select` on combobox refs) remain guarded against stale/missing refs, yet can run before the first click/submit/navigation step in one upstream `batch`. A later guarded ref after `open`, `reload`,
|
|
133
|
+
`RQ-0087` keeps the RQ-0072 guard but removes safe same-snapshot form work from the batch invalidation latch: `fill @e…` rows and role-checked native form-control rows (`check`/`uncheck` or direct `click`/`tap` on checkbox or radio refs, and `select` on combobox refs) remain guarded against stale/missing refs, yet can run before the first hard click/submit/navigation step in one upstream `batch`. A later guarded ref after `open`, `reload`, non-form `click`/`tap`, or other invalidating rows still fails before spawn unless the batch includes a fresh `snapshot` step first; checkbox/radio clicks are only allowed when every ref in the step has latest-snapshot checkbox/radio role evidence. This improves login/checkout/static-form efficiency without permitting likely post-navigation ref reuse. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`Batch stdin ordering`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) ref notes; fake coverage: `agentBrowserExtension allows same-snapshot form fills before a batch click` and `agentBrowserExtension allows same-snapshot form control batches before a hard invalidating click` in [`test/agent-browser.extension-ref-guards.test.ts`](../test/agent-browser.extension-ref-guards.test.ts).
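The batch ordering rule can be pictured as a latch that is set by the first hard invalidating step and reset by a `snapshot` step. The sketch below is a simplified model under several assumptions: refs look like `@e` followed by digits, only `open`, `reload`, `click`, and `tap` are treated as invalidating, every step that names a ref counts as guarded, and role evidence comes from a hypothetical `roles` map keyed by ref. It is not the wrapper's preflight code.

```typescript
type BatchStep = string[]; // argv for one batch row
type RoleMap = Record<string, string>; // ref -> role from the latest snapshot

// Assumed invalidating commands; the real set is likely larger.
const INVALIDATING = new Set(["open", "reload", "click", "tap"]);

function refsOf(step: BatchStep): string[] {
  return step.filter((arg) => /^@e\d+$/.test(arg));
}

// Form-safe rows: fill on any ref, check/uncheck/click/tap only on
// checkbox or radio refs, select only on combobox refs.
function isFormSafe(step: BatchStep, roles: RoleMap): boolean {
  const cmd = step[0] ?? "";
  const refs = refsOf(step);
  if (refs.length === 0) return false;
  const everyRole = (allowed: string[]): boolean =>
    refs.every((r) => allowed.includes(roles[r] ?? ""));
  if (cmd === "fill") return true;
  if (["check", "uncheck", "click", "tap"].includes(cmd)) {
    return everyRole(["checkbox", "radio"]);
  }
  if (cmd === "select") return everyRole(["combobox"]);
  return false;
}

// Returns the index of the first step that would be rejected, or -1.
function firstBlockedStep(steps: BatchStep[], roles: RoleMap): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const cmd = step[0] ?? "";
    if (cmd === "snapshot") {
      latched = false; // a later snapshot resets the latch
      continue;
    }
    if (latched && refsOf(step).length > 0) return i;
    if (INVALIDATING.has(cmd) && !isFormSafe(step, roles)) latched = true;
  }
  return -1;
}
```

In this model a fill, a checkbox check, and a final button click pass in one batch, while a ref step placed after that click is rejected unless a `snapshot` row comes first.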
|
|
130
134
|
|
|
131
|
-
`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction, about-blank mismatch recovery, or `details.clickDispatch` fired for the same result, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i
|
|
135
|
+
`RQ-0073` surfaces likely overlay blockers in snapshot output and after no-navigation clicks without inventing blind targets: successful `snapshot` results can emit the same blocker candidates when their own refs show strong modal evidence; for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, gathered by one read-only `eval` summary when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction, about-blank mismatch recovery, or `details.clickDispatch` fired for the same result, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i` (or uses the successful snapshot result directly), scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces overlay blockers in snapshot actionability metadata`, `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
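A minimal sketch of the overlay scan described for RQ-0073: require a `dialog` / `alertdialog` ref, then collect up to three close/dismiss-style controls. `OverlayRef` and `findOverlayCandidates` are hypothetical names, and the close/dismiss name pattern is an assumption; the wrapper's actual pattern is not given above.

```typescript
// Hypothetical ref entry from a snapshot's refs map.
interface OverlayRef {
  ref: string;
  role: string;
  name: string;
}

const MODAL_ROLES = new Set(["dialog", "alertdialog"]);
const CONTROL_ROLES = new Set(["button", "link", "menuitem"]);
// Assumed pattern; the real close/dismiss matching may differ.
const DISMISS_PATTERN = /\b(close|dismiss)\b/i;
const MAX_CANDIDATES = 3;

// Returns no candidates unless a strong modal role is present.
function findOverlayCandidates(refs: OverlayRef[]): OverlayRef[] {
  const hasModal = refs.some((r) => MODAL_ROLES.has(r.role));
  if (!hasModal) return [];
  return refs
    .filter((r) => CONTROL_ROLES.has(r.role) && DISMISS_PATTERN.test(r.name))
    .slice(0, MAX_CANDIDATES);
}
```

Gating on the modal role first is what keeps page-wide banner text without a dialog role from producing warnings.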
|
|
132
136
|
|
|
133
137
|
`RQ-0086` reduces wrapper-induced click fragility found during Sauce Demo smokes: navigation-summary enrichment for click/back/forward/reload/dblclick now uses one read-only `eval` (`({ title: document.title, url: location.href })`) instead of serial `get title` plus `get url` probes, including tab-pinned batch wrappers. Tab pinning/post-command tab correction now runs only after the wrapper has evidence of tab-drift risk (profile restore correction, overlapping stale opens, or restored session state), so ordinary same-session clicks no longer get repeated `tab list` probes. This keeps `details.navigationSummary`, overlay blocker checks, and drift recovery intact while avoiding the upstream `agent-browser 0.27.0` sequence that could report later clicks as successful without dispatching pointer/click events after repeated getter/tab/snapshot probes. Fake coverage: `agentBrowserExtension enriches click results with a post-navigation title and url summary` in [`test/agent-browser.extension-tabs.test.ts`](../test/agent-browser.extension-tabs.test.ts), plus `agentBrowserExtension pins the intended tab inside a follow-up command when reconnect drift would otherwise steal focus` and about-blank/tab overlap assertions in [`test/agent-browser.extension-tab-recovery.test.ts`](../test/agent-browser.extension-tab-recovery.test.ts); manual validation source: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt).
|
|
134
138
|
|
|
135
|
-
`RQ-0089` investigated Sauce Demo no-op clicks after RQ-0086, and the 2026-05-26 release smoke reproduced the failure against direct upstream `agent-browser 0.27.0`: CSS `click [data-test=add-to-cart-sauce-labs-backpack]` and current `@ref` clicks returned success, but a page-level listener recorded no trusted pointer/mouse/click events and the cart stayed unchanged; an in-page `element.click()` did mutate the cart. The wrapper now adds a bounded top-level non-Electron `click` dispatch probe before standalone clicks. If upstream reports success but no trusted DOM event reached the target, it fails the tool, records `details.clickDispatch.status: "no-native-event-observed"`, and appends `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions; it does **not** replay clicks in-page. This is not site-specific and does not alter `batch`/`job`/`qa` click steps. For `@e…` refs, the probe uses role/name metadata persisted in `details.refSnapshot` from the latest snapshot instead of running a pre-click snapshot that could recycle upstream refs. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`clickDispatch`, `refSnapshot`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; fake coverage: `agentBrowserExtension reports click dispatch diagnostic when upstream reports success without dispatching DOM events` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
|
|
139
|
+
`RQ-0089` investigated Sauce Demo no-op clicks after RQ-0086, and the 2026-05-26 release smoke reproduced the failure against direct upstream `agent-browser 0.27.0`: CSS `click [data-test=add-to-cart-sauce-labs-backpack]` and current `@ref` clicks returned success, but a page-level listener recorded no trusted pointer/mouse/click events and the cart stayed unchanged; an in-page `element.click()` did mutate the cart. The wrapper now adds a bounded top-level non-Electron `click` dispatch probe before standalone clicks. If upstream reports success but no trusted DOM event reached the target, it fails the tool, records `details.clickDispatch.status: "no-native-event-observed"`, and appends `inspect-click-dispatch-miss` / `retry-click-after-dispatch-miss` next actions; when the probe observes nested-scroll/offscreen evidence, it also records `details.clickDispatch.scrollContainer` and appends `scroll-target-into-view-after-dispatch-miss`. It does **not** replay clicks in-page. This is not site-specific and does not alter `batch`/`job`/`qa` click steps. For `@e…` refs, the probe uses role/name metadata persisted in `details.refSnapshot` from the latest snapshot instead of running a pre-click snapshot that could recycle upstream refs. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`clickDispatch`, `refSnapshot`); human workflow: README and [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) click verification notes; fake coverage: `agentBrowserExtension reports click dispatch diagnostic when upstream reports success without dispatching DOM events` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
|
|
136
140
|
|
|
137
141
|
`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
|
|
138
142
|
|
|
139
143
|
`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/network.ts` (re-exported from `shared.ts`) split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Because real Pi ignores returned `isError` fields from custom tool `execute`, `extensions/agent-browser/index.ts` also realigns `details.resultCategory: "failure"` outcomes to Pi-visible tool errors through a `tool_result` handler; it appends the exact failure category plus `Pi tool isError: true` to prose output and preserves caller-requested `--json` output as parseable JSON while patching `isError`. 
Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts); model-free real-Pi pipeline coverage in [`test/agent-browser.pi-pipeline.test.ts`](../test/agent-browser.pi-pipeline.test.ts) asserts both in-memory and persisted JSONL tool results for QA prose patching, parseable caller-requested `--json` failures, and strict public-schema rejection before upstream spawn; `npm run verify -- lifecycle` asserts the QA failure-patch line in a saved JSONL session.
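As a rough sketch of the actionable/benign split in `RQ-0075`, the following assumes a simplified row shape and a simplified failure test (status 400 or above, or a string `error`). The `NetworkRow` type, the `classify` and `orderForPreview` names, and the regular expression are assumptions for illustration; the real `classifyNetworkRequestFailure` may use different signals.

```typescript
// Hypothetical row shape; the real types in lib/results/network.ts may differ.
interface NetworkRow {
  url: string;
  status?: number;
  error?: string;
  resourceType?: string;
  mimeType?: string;
}

type Impact = "actionable" | "benign" | "ok";

// Assumed basename patterns for low-impact browser icon assets.
const ICON_BASENAME = /^(favicon|apple-touch-icon)[^/]*\.(ico|png|svg)$/i;

// Simplified failure test (assumption): HTTP error status or a string error.
function isFailed(row: NetworkRow): boolean {
  return (row.status !== undefined && row.status >= 400) || typeof row.error === "string";
}

function isBenignIconMiss(row: NetworkRow): boolean {
  const path = row.url.split("?")[0] ?? "";
  const basename = path.split("/").pop() ?? "";
  if (!ICON_BASENAME.test(basename)) return false;
  const typeHint = row.resourceType ?? row.mimeType;
  // A type hint, when present, must look image-like; otherwise the name decides.
  return typeHint === undefined || /image/i.test(typeHint);
}

function classify(row: NetworkRow): Impact {
  if (!isFailed(row)) return "ok";
  return isBenignIconMiss(row) ? "benign" : "actionable";
}

// Failed rows first (actionable, then benign) so capped previews still show them.
function orderForPreview(rows: NetworkRow[]): NetworkRow[] {
  const rank: Record<Impact, number> = { actionable: 0, benign: 1, ok: 2 };
  return [...rows].sort((a, b) => rank[classify(a)] - rank[classify(b)]);
}
```

Under this sketch a QA run would fail only when at least one row classifies as `actionable`, while `benign` rows would be kept as warnings.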
|
|
140
144
|
|
|
141
|
-
`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks,
|
|
145
|
+
`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size/state checks. Planned steps include `completed` / `failed` / `pending` / `unknown` statuses, the first incomplete step is exposed as `retryStep`, and `details.nextActions` can include `retry-timeout-step` with the exact retry payload; opened pages recovered after a post-open hang set `openedButPostOpenTimedOut`. The visible `Timeout partial progress` block repeats the same redacted URLs/paths and retry guidance. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts).
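The `retryStep` selection described for `RQ-0076` can be illustrated as follows. This sketch assumes that "incomplete" means any status other than `completed`; the `PlannedStep` and action shapes and the helper names are hypothetical, and only the status values and the `retry-timeout-step` id come from the text above.

```typescript
type StepStatus = "completed" | "failed" | "pending" | "unknown";

// Hypothetical planned-step shape.
interface PlannedStep {
  index: number;
  command: string;
  status: StepStatus;
}

// Assumption: the first step not known to be completed is the retry candidate.
function findRetryStep(steps: PlannedStep[]): PlannedStep | undefined {
  return steps.find((step) => step.status !== "completed");
}

// Hypothetical next-action shape carrying the retry payload.
function buildRetryAction(steps: PlannedStep[]): { id: string; command: string } | undefined {
  const retry = findRetryStep(steps);
  return retry ? { id: "retry-timeout-step", command: retry.command } : undefined;
}
```

When every planned step is `completed`, no retry action is produced.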
|
|
142
146
|
|
|
143
147
|
`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts) and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-input-modes.test.ts`](../test/agent-browser.extension-input-modes.test.ts).
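The visibility rule for the `Managed session outcome` text in `RQ-0077` reduces to a two-part check. The sketch below is illustrative only: the `ManagedSessionOutcome` shape is a guess based on the fields named above, and `shouldShowOutcomeText` is a hypothetical helper.

```typescript
// Hypothetical outcome shape based on the fields named in the requirement.
interface ManagedSessionOutcome {
  status: string; // e.g. "preserved" or "abandoned"
  succeeded: boolean;
  previousSessionName?: string;
  attemptedSessionName?: string;
}

// Visible text is appended only for failed outcomes under sessionMode "fresh".
function shouldShowOutcomeText(sessionMode: string, outcome: ManagedSessionOutcome): boolean {
  return sessionMode === "fresh" && !outcome.succeeded;
}
```

Because `succeeded` may be realigned after the fact (for example by `qa` reclassification), a check like this would need to run on the final outcome.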
|
|
144
148
|
|
|
@@ -146,7 +150,7 @@ Native `job`, `qa`, experimental `sourceLookup`, experimental `networkSourceLook
|
|
|
146
150
|
|
|
147
151
|
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/lib/orchestration/browser-run/diagnostics.ts` builds `details.artifactCleanup`, surfaced by process-output with visible `Artifact lifecycle` copy on successful close commands (`close`, `quit`, or `exit`) when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close commands do not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close` in [`test/agent-browser.extension-errors-artifacts.test.ts`](../test/agent-browser.extension-errors-artifacts.test.ts), plus close-alias unit coverage in [`test/agent-browser.runtime.test.ts`](../test/agent-browser.runtime.test.ts) and [`test/agent-browser.session-page-state.test.ts`](../test/agent-browser.session-page-state.test.ts).
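The `explicitArtifactPaths` selection in `RQ-0079` (up to ten distinct, still-existing `explicit-path` entries) might look roughly like this. The `ManifestEntry` shape and the function name are assumptions, and the existence check is injected as a predicate so the sketch stays self-contained; in Node it could be `fs.existsSync`.

```typescript
// Hypothetical manifest entry shape; only the "explicit-path" kind is from the text.
interface ManifestEntry {
  path: string;
  kind: string;
}

// Collects up to `limit` distinct explicit paths that still exist on disk.
function collectExplicitArtifactPaths(
  entries: ManifestEntry[],
  exists: (path: string) => boolean,
  limit = 10,
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const entry of entries) {
    if (entry.kind !== "explicit-path" || seen.has(entry.path)) continue;
    seen.add(entry.path);
    if (!exists(entry.path)) continue; // skip stale paths already removed elsewhere
    result.push(entry.path);
    if (result.length >= limit) break;
  }
  return result;
}
```

The result may be empty when no explicit entry still exists, which matches the "possibly empty" note above.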
|
|
148
152
|
|
|
149
|
-
`RQ-0080` adds no-op scroll recovery for dense dashboards and nested panes: for successful top-level `scroll`, `extensions/agent-browser/index.ts` samples viewport and prominent scroll-container positions before and after execution with read-only session-scoped `eval --stdin` probes. If no sampled position changes, it emits `details.scrollNoop`, appends visible `Scroll diagnostic: no observed scroll movement`, appends exact `inspect-after-noop-scroll` / `verify-noop-scroll-visually` next actions, and updates `pageChangeSummary.nextActionIds` so agents can branch without parsing prose. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`scrollNoop`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) scroll note; fake coverage: `agentBrowserExtension reports no-op scroll diagnostics with recovery next actions`.
|
|
153
|
+
`RQ-0080` adds no-op scroll recovery for dense dashboards and nested panes, plus click-dispatch nested-scroll recovery when a click target appears outside its scroll container: for successful top-level `scroll`, `extensions/agent-browser/index.ts` samples viewport and prominent scroll-container positions before and after execution with read-only session-scoped `eval --stdin` probes. If no sampled position changes, it emits `details.scrollNoop`, appends visible `Scroll diagnostic: no observed scroll movement`, appends exact `inspect-after-noop-scroll` / `verify-noop-scroll-visually` next actions, and updates `pageChangeSummary.nextActionIds` so agents can branch without parsing prose. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`scrollNoop`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) scroll note; fake coverage: `agentBrowserExtension reports no-op scroll diagnostics with recovery next actions`.
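The no-op check in `RQ-0080` compares sampled positions before and after the scroll. A minimal sketch, assuming each sample carries a stable identifier plus `top`/`left` offsets; the `ScrollSample` shape and `isScrollNoop` name are hypothetical, and treating an empty sample set as "no claim" is a design choice of this sketch rather than documented behavior.

```typescript
// Hypothetical sample of one scrollable target (viewport or container).
interface ScrollSample {
  id: string;
  top: number;
  left: number;
}

// True only when every sampled position is unchanged after the scroll.
function isScrollNoop(before: ScrollSample[], after: ScrollSample[]): boolean {
  if (before.length === 0) return false; // nothing observed, so make no claim
  const afterById = new Map<string, ScrollSample>();
  for (const sample of after) afterById.set(sample.id, sample);
  return before.every((b) => {
    const a = afterById.get(b.id);
    // A target missing afterwards counts as a change, not as a no-op.
    return a !== undefined && a.top === b.top && a.left === b.left;
  });
}
```

A caller could emit the diagnostic and the recovery next actions only when this returns `true`.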
|
|
150
154
|
|
|
151
155
|
`RQ-0081` adds focused-combobox recovery for dense dashboard controls: after successful explicit combobox-targeted actions (for example `semanticAction` role `combobox` click), `extensions/agent-browser/index.ts` runs a read-only focused-element probe and emits `details.comboboxFocus` plus visible `Combobox diagnostic` text when a combobox-like control is focused, has explicit `aria-expanded` state, and no visible listbox/options are open. It appends exact `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter` next actions, all session-prefixed when applicable. The probe is gated to explicit combobox targets to avoid ordinary-click false positives and preserves the original combobox semantic target even when active-session visible-ref resolution rewrites execution to `click @ref`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`comboboxFocus`, `nextActions`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) combobox note; fake coverage: `agentBrowserExtension reports focused combobox diagnostics with option-opening next actions` and `agentBrowserExtension preserves combobox diagnostics after semanticAction visible-ref resolution`.
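The gating described for `RQ-0081` combines four conditions. The sketch below assumes a simplified probe result; `FocusProbe`, its fields, and `shouldEmitComboboxDiagnostic` are hypothetical names, and the real probe may detect "combobox-like" controls more broadly than a plain role check.

```typescript
// Hypothetical result of a read-only focused-element probe.
interface FocusProbe {
  role?: string;
  ariaExpanded?: string | null; // null or undefined when the attribute is absent
  visibleOptionCount: number;
}

function shouldEmitComboboxDiagnostic(targetedCombobox: boolean, probe: FocusProbe): boolean {
  return (
    targetedCombobox && // gate: only explicit combobox targets, avoiding ordinary-click false positives
    probe.role === "combobox" && // a combobox-like control is focused (simplified)
    probe.ariaExpanded != null && // explicit aria-expanded state is present
    probe.visibleOptionCount === 0 // no visible listbox/options are open
  );
}
```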
|
|
152
156
|
|