pi-agent-browser-native 0.2.26 → 0.2.27
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +9 -0
- package/README.md +3 -3
- package/docs/COMMAND_REFERENCE.md +5 -5
- package/docs/SUPPORT_MATRIX.md +4 -4
- package/docs/TOOL_CONTRACT.md +7 -6
- package/extensions/agent-browser/index.ts +112 -17
- package/extensions/agent-browser/lib/playbook.ts +2 -2
- package/extensions/agent-browser/lib/results/presentation.ts +17 -2
- package/extensions/agent-browser/lib/results.ts +1 -0
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,14 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## 0.2.27 - 2026-05-14
|
|
4
|
+
|
|
5
|
+
### Fixed
|
|
6
|
+
- `semanticAction` role/name click, check, and uncheck calls in active sessions now resolve through the current `snapshot -i` refs before execution, preventing hidden duplicate upstream `find` matches from stealing the action while preserving the original target in `details.compiledSemanticAction` and showing the executed ref in `details.effectiveArgs`.
|
|
7
|
+
- QA presets now default to `loadState: "domcontentloaded"` and accept explicit `domcontentloaded`, `load`, or `networkidle`, avoiding wrapper watchdog timeouts on analytics-heavy or long-polling docs sites while keeping stricter waits opt-in.
|
|
8
|
+
- Network request presentation now shows actionable and benign failed rows before successful rows, so late failures remain visible even when request previews are capped.
|
|
9
|
+
- Overlay blocker diagnostics now require strong modal context (`dialog` / `alertdialog`) before suggesting close/dismiss candidates, eliminating noisy warnings after ordinary same-page menu opens or app button mutations.
|
|
10
|
+
- Artifact lifecycle cleanup guidance now lists only explicit artifact paths that still exist on disk, skipping deleted/stale paths while preserving the close-does-not-delete reminder.
|
|
11
|
+
|
|
3
12
|
## 0.2.26 - 2026-05-14
|
|
4
13
|
|
|
5
14
|
### Added
|
package/README.md
CHANGED
|
@@ -191,11 +191,11 @@ Typical pitfalls:
|
|
|
191
191
|
- Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` per call (not more, not none).
|
|
192
192
|
- `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
|
|
193
193
|
- Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
|
|
194
|
-
- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions.
|
|
194
|
+
- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
|
|
195
195
|
- Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
|
|
196
196
|
- If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
|
|
197
197
|
- If the failure is `selector-not-found` for a compiled `semanticAction`, visible text may add `Agent-browser candidate fallbacks` and `details.nextActions` may list bounded `try-*-candidate` follow-ups (role/name retries only for `fill` + `placeholder`, `click` + `text`, or `fill` + `label`; `select` misses do not get these entries); prefer those payloads or a fresh snapshot over guessing new selectors (same contract link).
|
|
198
|
-
- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows
|
|
198
|
+
- If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is only populated via follow-up `get url` / `get title` when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
|
|
199
199
|
- If `get text <selector>` reads a non-ref CSS selector with multiple matches or a hidden first match while visible matches exist, including successful `batch` steps, the wrapper may add `Selector text visibility warning`, `details.selectorTextVisibility` (plus `selectorTextVisibilityAll` for multiple batched warnings), and `inspect-visible-text-candidates` next actions; prefer a visible `@ref`, a scoped selector, or a targeted `eval --stdin` over hidden tab content.
|
|
200
200
|
|
|
201
201
|
### Constrained browser jobs
|
|
@@ -218,7 +218,7 @@ Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags,
|
|
|
218
218
|
|
|
219
219
|
### Lightweight QA preset
|
|
220
220
|
|
|
221
|
-
For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
|
|
221
|
+
For a quick smoke/QA pass, use top-level `qa`. It compiles to the same batch path as `job`, clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks optional expected text or selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. `loadState` defaults to `"domcontentloaded"`; set it to `"load"` or `"networkidle"` only when the stricter state is useful and the site is not expected to keep background requests alive. `checkNetwork`, `checkConsole`, and `checkErrors` default to true; set one to `false` to skip that diagnostic read. Network failures are classified by likely impact and failed rows are listed first in network previews: actionable document/script/API-style failures still fail QA, while some low-impact browser icon asset misses (for example certain `favicon` or `apple-touch-icon` paths when upstream marks the row failed and resource metadata looks image-like) surface only as warnings instead of failing an otherwise healthy smoke check (`details.qaPreset.warnings`, with human-readable `details.qaPreset.summary` when the preset still passes). Exact predicates live in [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#qa) and `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts`.
|
|
222
222
|
|
|
223
223
|
```json
|
|
224
224
|
{
|
|
@@ -133,13 +133,13 @@ Examples:
|
|
|
133
133
|
{ "args": ["snapshot", "-i"] }
|
|
134
134
|
```
|
|
135
135
|
|
|
136
|
-
The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
|
|
136
|
+
The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. If a semantic action misses with `selector-not-found`, visible output may include `Agent-browser candidate fallbacks`, while `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
|
|
137
137
|
|
|
138
138
|
Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
|
|
139
139
|
|
|
140
140
|
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
|
|
141
141
|
|
|
142
|
-
When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows
|
|
142
|
+
When a **top-level** `click` succeeds (not a `click` hidden inside a `batch`/`job` tool call—the unified command must be `click`), the upstream payload includes `data.clicked`, and the wrapper sees the active tab URL unchanged after the same normalization it uses for ref guards (**`#fragment` ignored**), it may run one extra `snapshot -i` and surface `Possible overlay blockers` plus `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can refresh `refSnapshot`) when that snapshot shows strong modal context (`dialog` / `alertdialog`) **and** up to three close/dismiss-like controls; page-wide words such as privacy, sign in, or banner alone do not trigger it. The URL check compares the session’s prior pinned tab target to `details.navigationSummary.url` after the click; that summary is gathered with extra `get url` / `get title` calls only when the click JSON omits **both** string `data.url` and `data.title`—if upstream already echoes either field, overlay diagnostics are skipped on this path. The diagnostic is skipped if the wrapper already applied tab-focus correction or about-blank recovery on that result. Appended `inspect-overlay-state` / `try-overlay-blocker-candidate-*` entries in `details.nextActions` include `--session <name>` when the session is named, same as other session-scoped follow-ups. Treat `inspect-overlay-state` as the safe first follow-up; only use a `try-overlay-blocker-candidate-*` next action when the candidate is clearly the control you intend to close.
|
|
143
143
|
|
|
144
144
|
### Extract page data
|
|
145
145
|
|
|
@@ -180,7 +180,7 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
|
|
|
180
180
|
|
|
181
181
|
Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
|
|
182
182
|
|
|
183
|
-
For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. QA network diagnostics classify failed requests by likely impact: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
|
|
183
|
+
For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
|
|
184
184
|
|
|
185
185
|
The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/shared.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
|
|
186
186
|
|
|
@@ -188,7 +188,7 @@ The same classification drives plain `network requests` presentation: when any r
|
|
|
188
188
|
{ "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
|
|
189
189
|
```
|
|
190
190
|
|
|
191
|
-
Optional `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set
|
|
191
|
+
Optional `loadState`, `checkNetwork`, `checkConsole`, and `checkErrors` default to `"domcontentloaded"`, `true`, `true`, and `true`; set a check to `false` to skip that diagnostic. Omit `expectedText` and `expectedSelector` when you only need load plus diagnostics.
|
|
192
192
|
|
|
193
193
|
Use custom `job` or raw `batch` when you need a different check sequence.
|
|
194
194
|
|
|
@@ -265,7 +265,7 @@ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent
|
|
|
265
265
|
|
|
266
266
|
This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
|
|
267
267
|
|
|
268
|
-
Browser `close` is also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful `close` appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window;
|
|
268
|
+
Browser `close` is also not file cleanup. If `details.artifactManifest` is present with a non-empty `entries` list, a successful `close` appends an `Artifact lifecycle` note and reports `details.artifactCleanup` with the current retention summary and the same host-owned cleanup `note` as the contract (`extensions/agent-browser/index.ts`, `getArtifactCleanupGuidance`). Up to ten distinct user-chosen paths that still exist on disk appear in `explicitArtifactPaths` when matching `explicit-path` manifest rows exist in the recent window; deleted/stale paths are skipped. Otherwise that array is empty and visible text may omit the “Explicit artifact paths” line even though the lifecycle block still reminds you that close does not delete saved files. Delete any paths you care about with host file tools after inspection; the native browser tool intentionally does not remove arbitrary user-chosen filesystem paths.
|
|
269
269
|
|
|
270
270
|
Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.
|
|
271
271
|
|
package/docs/SUPPORT_MATRIX.md
CHANGED
|
@@ -64,17 +64,17 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
|
|
|
64
64
|
|
|
65
65
|
`RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
|
|
66
66
|
|
|
67
|
-
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
67
|
+
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label` (not `select`, even with the same locators). Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
68
68
|
|
|
69
69
|
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
70
70
|
|
|
71
71
|
`RQ-0072` guards page-scoped `@e…` refs against silent recycling: successful `snapshot` (or the last `snapshot` step inside a successful `batch`) records `details.refSnapshot` with ref ids and the snapshot page URL; `extensions/agent-browser/index.ts` replays per-session snapshots from the transcript on reload/resume, clears them on successful `close`, rejects mutation-prone ref argv before spawn when the tab URL diverges or a ref id is missing from the latest snapshot, blocks `batch` stdin that uses `@e…` on a guarded command after an earlier step that can navigate or mutate until a `snapshot` step appears later in the same stdin array (pre-spawn latch reset only), and prefixes `refresh-interactive-refs` with `--session` when the call names a session (including upstream-classified `stale-ref` outcomes). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`, `stale-ref`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) snapshot/ref notes and README pitfalls; fake coverage: `agentBrowserExtension blocks page-scoped ref reuse…`, `…blocks stale refs after page-changing steps inside a batch`, `…allows batch stdin ref steps after snapshot following an invalidating step`, `…records snapshot refs returned inside a successful batch`, and `…rejects refs absent from the latest same-page snapshot` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
72
72
|
|
|
73
|
-
`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for
|
|
73
|
+
`RQ-0073` surfaces likely overlay blockers after no-navigation clicks without inventing blind targets: for **top-level** `click` results (unified command `click`, not `batch`-wrapped steps) whose upstream JSON includes `data.clicked`, whose prior pinned tab URL and post-click URL (from `details.navigationSummary`, normally via `get url` when the click payload omits **both** string `data.url` and `data.title`) stay equal after the same fragment-insensitive normalization used for ref preflight, and where the same unified result did **not** already apply session tab correction or about-blank mismatch recovery, `extensions/agent-browser/index.ts` takes one fresh session-scoped `snapshot -i`, scans `refs` for strong modal context (`dialog` / `alertdialog`) plus up to three close/dismiss-pattern `button`/`link`/`menuitem` controls, and only then emits `details.overlayBlockers` (`candidates`, `summary`, and a `snapshot` map that can advance `refSnapshot`), visible `Possible overlay blockers`, and `inspect-overlay-state` / `try-overlay-blocker-candidate-*` next actions (with `--session` prefix when the session is named) appended after presentation follow-ups such as `inspect-after-mutation`. Page-wide privacy/sign-in/banner text without a dialog role is deliberately ignored to avoid warnings after ordinary same-page clicks. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`overlayBlockers`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) no-navigation click note and README pitfalls; fake coverage: `agentBrowserExtension surfaces likely overlay blockers after a no-op click` and `agentBrowserExtension does not report overlay blockers from unrelated page chrome after a successful same-page click` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
74
74
|
|
|
75
75
|
`RQ-0074` warns when `get text <selector>` may read hidden or tabbed DOM content: for non-ref CSS selectors, `extensions/agent-browser/index.ts` runs a read-only `eval --stdin` visibility probe after successful text reads, emits `details.selectorTextVisibility` plus visible warning text when the first match is hidden while visible matches exist or when multiple matches make the upstream first-match choice ambiguous, preserves multiple batched warnings in `details.selectorTextVisibilityAll`, and appends `inspect-visible-text-candidates` next actions. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`selectorTextVisibility`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README pitfalls; fake coverage: `agentBrowserExtension warns when get text may read hidden selector matches` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
76
76
|
|
|
77
|
-
`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
|
|
77
|
+
`RQ-0075` classifies QA and diagnostic network failures by likely impact: `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `extensions/agent-browser/lib/results/shared.ts` split rows that already count as failed (`isFailedNetworkRequest`) into actionable versus benign low-impact browser icon asset misses (`isBenignAssetFailure`: favicon/apple-touch-icon basename patterns, 404/`failed`/string `error` signals, and image-like `resourceType`/`mimeType` when present). `analyzeQaPresetResults` fails `qa` only for actionable network failures while preserving benign rows in `qaPreset.warnings`, and network request presentation adds a compact actionable/benign summary plus per-row impact tags, ordered with actionable/benign failed rows before successful rows so late failures are visible even in capped previews. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) QA and network diagnostic notes; fake coverage: `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts) plus network presentation assertions in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
|
|
78
78
|
|
|
79
79
|
`RQ-0076` adds best-effort timeout recovery when the wrapper watchdog kills a stuck upstream process: `extensions/agent-browser/index.ts` calls `collectTimeoutPartialProgress` / `formatTimeoutPartialProgressText` to build `details.timeoutPartialProgress` from the compiled `job` or `qa` step list or parsed caller `batch` stdin, session-scoped `get url` / `get title` (plus optional planned-URL fallback from `open`/`navigate`/`pushstate` steps), and declared artifact paths (`screenshot`, `pdf`, `download`, `wait --download`) with existence/size checks, then appends a visible `Timeout partial progress` block with redacted URLs/paths. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) wrapper timeout note and README job section; fake coverage: `agentBrowserExtension reports partial progress and artifacts after job timeout` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
80
80
|
|
|
@@ -82,4 +82,4 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
|
|
|
82
82
|
|
|
83
83
|
`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries an empty-object `data.result`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
|
|
84
84
|
|
|
85
|
-
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct `explicit-path` manifest paths (possibly empty when the recent window has no explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.
|
|
85
|
+
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.
|
package/docs/TOOL_CONTRACT.md
CHANGED
|
@@ -55,7 +55,7 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
|
|
|
55
55
|
- For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
|
|
56
56
|
- When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
|
|
57
57
|
- When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
|
|
58
|
-
- When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction.
|
|
58
|
+
- When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
|
|
59
59
|
- When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.
|
|
60
60
|
- Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.
|
|
61
61
|
<!-- agent-browser-playbook:end shared-guidelines -->
|
|
@@ -112,7 +112,7 @@ Compilation (then `--json` and session handling apply like any other call):
|
|
|
112
112
|
| `fill` or `select` | `["find",<locator>,<value>,<action>,<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
|
|
113
113
|
| any supported action + `session` | prepends `["--session",<session>]` before the compiled `find` argv |
|
|
114
114
|
|
|
115
|
-
When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned.
|
|
115
|
+
When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
|
|
116
116
|
|
|
117
117
|
If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content includes an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` → `try-labeled-textbox-candidate`. `select` misses—including `select` + `placeholder`—do not append `try-*` entries even when a parallel `fill` would; other locator/action pairs omit this block too. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
|
|
118
118
|
|
|
@@ -181,10 +181,11 @@ Because `job` still executes as upstream `batch` with generated stdin, the same
|
|
|
181
181
|
- type: object with required `url`
|
|
182
182
|
- optional; mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup`
|
|
183
183
|
- lightweight preset built on the same batch compiler path as `job`
|
|
184
|
-
- clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load
|
|
184
|
+
- clears enabled diagnostic buffers first (`network requests --clear`, `console --clear`, `errors --clear`), then opens `url`, waits with `wait --load <loadState>`, optionally asserts `expectedText` (string or string array) and/or `expectedSelector` (each may be omitted for a load-plus-diagnostics-only smoke), then runs enabled diagnostics: `network requests`, `console`, and `errors`
|
|
185
|
+
- `loadState` is optional and must be `domcontentloaded`, `load`, or `networkidle`; it defaults to `domcontentloaded` so analytics-heavy or long-polling pages do not hang routine QA. Use `networkidle` only when the site is expected to go fully quiet.
|
|
185
186
|
- `checkNetwork`, `checkConsole`, and `checkErrors` default to `true`; set a field to `false` to omit that diagnostic
|
|
186
187
|
- optional `screenshotPath` adds an evidence screenshot step
|
|
187
|
-
- reports `details.compiledQaPreset` with the compiled batch plan and `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
|
|
188
|
+
- reports `details.compiledQaPreset` with the compiled batch plan and resolved `loadState`, plus `details.qaPreset` with `{ passed, failedChecks, warnings, summary }`
|
|
188
189
|
- fails the native tool result with `failureCategory: "qa-failure"` when diagnostics report page errors, console error messages, actionable failed network requests, or any batch step failure. Benign classification (implementation: `classifyNetworkRequestFailure` → `isBenignAssetFailure` in `extensions/agent-browser/lib/results/shared.ts`) applies only when the row is already treated as failed (`status >= 400`, `failed: true`, or a string `error`—see `isFailedNetworkRequest`), the URL path’s last segment matches the icon basename heuristic (`favicon` plus `.ico`/`.png`/`.svg`, or `apple-touch-icon` plus `.png`, each allowing an optional `[-.\w]*` stem suffix before the extension), **and** at least one of `status === 404`, `failed === true`, or `typeof request.error === "string"` holds (so a **status-only** failure such as `500` on that path with neither `failed` nor a string `error` stays actionable). It also requires the upstream `resourceType` / `mimeType` (whichever is present) to be absent or look image-like: `image`, `img`, `other`, or a value starting with `image/`. Those rows are counted in `qaPreset.warnings` (for example `N benign network request failure(s) ignored`) and omitted from the actionable failed-network tally; every other failed request stays actionable.
|
|
189
190
|
|
|
190
191
|
Example:
|
|
@@ -375,7 +376,7 @@ For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that
|
|
|
375
376
|
|
|
376
377
|
`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`.
|
|
377
378
|
|
|
378
|
-
`overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref
|
|
379
|
+
`overlayBlockers` may appear after a successful **top-level** `click` (the unified `details.command` is `click`, not `batch`/`job`/`qa` flows that compile to `batch`) only when upstream JSON includes a string `data.clicked` ref, the session’s prior pinned tab URL (`priorSessionTabTarget.url`) and the post-click active tab URL both exist and stay equal after the same URL normalization used for ref preflight (trimmed hosts/paths; **`#fragment` dropped** while the query string stays significant), and the wrapper did not apply session tab correction or an about-blank mismatch recovery in the same result. The post-click side comes from `details.navigationSummary.url`, which the wrapper fills via follow-up `get url` / `get title` only when upstream click JSON omits **both** string `data.url` and `data.title` (`shouldCaptureNavigationSummary` in `extensions/agent-browser/index.ts`). If either field is present as a string on the click payload, that probe is skipped, `navigationSummary` stays unset here, and overlay diagnostics are omitted even when the page did not navigate. The wrapper then issues **one** extra session-scoped `snapshot -i`, scans that snapshot’s `refs` map, and only emits diagnostics when **both** are true: at least one ref has a strong modal role (`dialog` or `alertdialog`), and there are up to **three** separate `button`/`link`/`menuitem` refs whose names match close/dismiss-style patterns (for example “Close”, “Dismiss”, “No thanks”, or a lone `×`). Page-wide text such as “privacy”, “sign in”, or “banner” without a dialog role is not enough, which avoids warning on ordinary same-page menu opens or app button mutations. Each candidate carries `ref` (`@eN`), optional `role`/`name`, exact `click` argv in `args`, and a short evidence `reason`. The struct also includes a `summary` string (one sentence stating that the click left the tab on the same normalized URL and the fresh snapshot shows likely dismiss controls) plus a `snapshot` object (same shape as `details.refSnapshot` after a normal snapshot): on success the wrapper may treat that snapshot as the session’s latest ref map for subsequent calls, so agents should assume refs can move to match this post-diagnostic tree. Visible text appends the same bullets under `Possible overlay blockers`, and `details.nextActions` gains `inspect-overlay-state` plus `try-overlay-blocker-candidate-1`…`3` after any presentation `nextActions` (for example `inspect-after-mutation`); when `details.sessionName` is set, those appended actions use `sessionPrefixArgs` so `params.args` begin with `--session <name>` unless argv already starts with `--session`. This is conservative evidence, not proof the candidate should be clicked; prefer `inspect-overlay-state` first unless the dismiss control is clearly safe.
|
|
379
380
|
|
|
380
381
|
Example shape (fields vary by scenario):
|
|
381
382
|
|
|
@@ -452,7 +453,7 @@ Additional structured fields can appear when relevant:
|
|
|
452
453
|
- `fullOutputPath` / `fullOutputPaths` when large snapshot output or other oversized tool output is compacted and spilled to a private file; persisted sessions keep that path under a private session-scoped artifact directory with a bounded per-session budget so it survives reload/resume without unbounded growth
|
|
453
454
|
- `artifactManifest` for a bounded, metadata-only inventory of recent session artifacts. Entries include path metadata, artifact `kind`, source `command`/`subcommand` when safe, `storageScope` (`persistent-session`, `process-temp`, or `explicit-path`), and `retentionState` (`live`, `ephemeral`, `missing`, or `evicted`). The default recent window is 100 entries and can be configured with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES`. The manifest must not store command args, output contents, headers, DOM snapshots, or downloaded file contents.
|
|
454
455
|
- `artifactRetentionSummary` with a concise count of live, evicted, ephemeral, and missing artifacts from the current manifest; results append this summary to model-facing text only when retention state affects recovery, such as spill files, ephemeral files, or evictions. Routine explicit saved files keep the summary in details to avoid noisy browsing transcripts.
|
|
455
|
-
- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated);
|
|
456
|
+
- `artifactCleanup` after a successful `close` when `artifactManifest` exists and `entries` is non-empty. Fields: `owner: "host-file-tools"`, `summary` (same retention summary string as `artifactRetentionSummary` for that manifest), `note` explaining that browser close does not delete explicit screenshots/downloads/PDFs/traces/HAR/recordings, and `explicitArtifactPaths`: up to ten **distinct existing** paths taken from manifest rows with `storageScope: "explicit-path"` in encounter order (de-duplicated after checking the filesystem); deleted/stale explicit paths are skipped. When the recent window has no existing explicit rows—for example only spill/ephemeral inventory or explicit paths already deleted—the array is empty but `summary` / `note` still surface so agents know close is not file deletion. The native browser tool intentionally does not expose a delete operation for arbitrary user-chosen artifact paths; agents should inspect `artifactVerification` / manifest metadata, then remove files with normal host file tools when cleanup is required.
|
|
456
457
|
- compact **snapshot** metadata on successful presentation when `details.data.compacted` is true (oversized trees): `previewMode` (`"structured"` vs outline `"outline"`), `structuredPreviewUsed`, `previewRefIds`, `previewSections` (per-section `linesShown` / `omittedLines` / root `role` / `title`), `additionalSectionsOmitted`, counts such as `refCount`, `snapshotLineCount`, and `roleCounts`, optional `highValueControlRefIds` aligned with the visible `Omitted high-value controls` lines, and optional `spillError` when the wrapper could not write the raw spill file; the model text still ends with `Full raw snapshot path:` or an explicit unavailable reason plus `details.fullOutputPath` when a path exists
|
|
457
458
|
- `sessionRecoveryHint` when startup-scoped flags need `sessionMode: "fresh"` while an implicit session is already active: includes `reason`, `recommendedSessionMode` (`"fresh"`), redacted `exampleArgs`, and `exampleParams` where `sessionMode` is `"fresh"` and `args` is the same redacted argv as `exampleArgs` (from `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`, merged through `redactRecoveryHint` in `extensions/agent-browser/index.ts`)
|
|
458
459
|
- `inspection: true` plus `stdout` for successful plain-text inspection commands like `--help` and `--version`
|
|
@@ -30,6 +30,7 @@ import {
|
|
|
30
30
|
buildAgentBrowserNextActions,
|
|
31
31
|
buildAgentBrowserResultCategoryDetails,
|
|
32
32
|
buildToolPresentation,
|
|
33
|
+
compareRefIds,
|
|
33
34
|
getAgentBrowserErrorText,
|
|
34
35
|
parseAgentBrowserEnvelope,
|
|
35
36
|
type AgentBrowserBatchResult,
|
|
@@ -84,6 +85,7 @@ const PACKAGE_NAME = "pi-agent-browser-native";
|
|
|
84
85
|
const AGENT_BROWSER_SEMANTIC_ACTIONS = ["check", "click", "fill", "select", "uncheck"] as const;
|
|
85
86
|
const AGENT_BROWSER_SEMANTIC_LOCATORS = ["alt", "label", "placeholder", "role", "testid", "text", "title"] as const;
|
|
86
87
|
const AGENT_BROWSER_JOB_STEP_ACTIONS = ["open", "click", "fill", "wait", "assertText", "assertUrl", "waitForDownload", "screenshot"] as const;
|
|
88
|
+
const AGENT_BROWSER_QA_LOAD_STATES = ["domcontentloaded", "load", "networkidle"] as const;
|
|
87
89
|
const SOURCE_LOOKUP_WORKSPACE_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".jsx"]);
|
|
88
90
|
const SOURCE_LOOKUP_IGNORED_DIRECTORIES = new Set([".git", "node_modules", "dist", "build", "coverage", ".next", "out", "tmp", "temp"]);
|
|
89
91
|
const SOURCE_LOOKUP_DEFAULT_MAX_WORKSPACE_FILES = 2_000;
|
|
@@ -92,6 +94,7 @@ const SOURCE_LOOKUP_MAX_WORKSPACE_FILES = 5_000;
|
|
|
92
94
|
type AgentBrowserSemanticActionName = (typeof AGENT_BROWSER_SEMANTIC_ACTIONS)[number];
|
|
93
95
|
type AgentBrowserSemanticLocator = (typeof AGENT_BROWSER_SEMANTIC_LOCATORS)[number];
|
|
94
96
|
type AgentBrowserJobStepAction = (typeof AGENT_BROWSER_JOB_STEP_ACTIONS)[number];
|
|
97
|
+
type AgentBrowserQaLoadState = (typeof AGENT_BROWSER_QA_LOAD_STATES)[number];
|
|
95
98
|
type AgentBrowserSourceLookupStatus = "candidates-found" | "no-candidates" | "unsupported";
|
|
96
99
|
type AgentBrowserNetworkSourceLookupStatus = "failed-requests-found" | "no-failed-requests" | "no-candidates";
|
|
97
100
|
|
|
@@ -127,6 +130,7 @@ interface CompiledAgentBrowserQaPreset extends CompiledAgentBrowserJob {
|
|
|
127
130
|
checkConsole: boolean;
|
|
128
131
|
checkErrors: boolean;
|
|
129
132
|
checkNetwork: boolean;
|
|
133
|
+
loadState: AgentBrowserQaLoadState;
|
|
130
134
|
expectedText: string[];
|
|
131
135
|
expectedSelector?: string;
|
|
132
136
|
screenshotPath?: string;
|
|
@@ -238,6 +242,7 @@ const AGENT_BROWSER_PARAMS = Type.Object({
|
|
|
238
242
|
checkConsole: Type.Optional(Type.Boolean({ description: "Whether to fail on console error messages. Defaults to true." })),
|
|
239
243
|
checkErrors: Type.Optional(Type.Boolean({ description: "Whether to fail on page errors. Defaults to true." })),
|
|
240
244
|
checkNetwork: Type.Optional(Type.Boolean({ description: "Whether to inspect network requests and fail on actionable request failures; benign icon misses warn. Defaults to true." })),
|
|
245
|
+
loadState: Type.Optional(StringEnum(AGENT_BROWSER_QA_LOAD_STATES, { description: "Page readiness state for the QA preset before assertions and diagnostics. Defaults to domcontentloaded; use networkidle only for pages without long-lived background requests." })),
|
|
241
246
|
}),
|
|
242
247
|
),
|
|
243
248
|
sourceLookup: Type.Optional(
|
|
@@ -450,16 +455,21 @@ function compileAgentBrowserQaPreset(input: unknown): { compiled?: CompiledAgent
|
|
|
450
455
|
return { error: `qa.${field} must be a boolean when provided.` };
|
|
451
456
|
}
|
|
452
457
|
}
|
|
458
|
+
const rawLoadState = input.loadState;
|
|
459
|
+
if (rawLoadState !== undefined && (typeof rawLoadState !== "string" || !AGENT_BROWSER_QA_LOAD_STATES.includes(rawLoadState as AgentBrowserQaLoadState))) {
|
|
460
|
+
return { error: `qa.loadState must be one of: ${AGENT_BROWSER_QA_LOAD_STATES.join(", ")}.` };
|
|
461
|
+
}
|
|
453
462
|
const checkConsole = input.checkConsole !== false;
|
|
454
463
|
const checkErrors = input.checkErrors !== false;
|
|
455
464
|
const checkNetwork = input.checkNetwork !== false;
|
|
465
|
+
const loadState = (rawLoadState as AgentBrowserQaLoadState | undefined) ?? "domcontentloaded";
|
|
456
466
|
const steps: CompiledAgentBrowserJobStep[] = [];
|
|
457
467
|
if (checkNetwork) steps.push({ action: "wait", args: ["network", "requests", "--clear"] });
|
|
458
468
|
if (checkConsole) steps.push({ action: "wait", args: ["console", "--clear"] });
|
|
459
469
|
if (checkErrors) steps.push({ action: "wait", args: ["errors", "--clear"] });
|
|
460
470
|
steps.push(
|
|
461
471
|
{ action: "open", args: ["open", url] },
|
|
462
|
-
{ action: "wait", args: ["wait", "--load",
|
|
472
|
+
{ action: "wait", args: ["wait", "--load", loadState] },
|
|
463
473
|
);
|
|
464
474
|
for (const text of expectedText) {
|
|
465
475
|
steps.push({ action: "assertText", args: ["wait", "--text", text] });
|
|
@@ -474,7 +484,7 @@ function compileAgentBrowserQaPreset(input: unknown): { compiled?: CompiledAgent
|
|
|
474
484
|
return {
|
|
475
485
|
compiled: {
|
|
476
486
|
args: ["batch"],
|
|
477
|
-
checks: { checkConsole, checkErrors, checkNetwork, expectedSelector, expectedText, screenshotPath, url },
|
|
487
|
+
checks: { checkConsole, checkErrors, checkNetwork, expectedSelector, expectedText, loadState, screenshotPath, url },
|
|
478
488
|
stdin: JSON.stringify(steps.map((step) => step.args)),
|
|
479
489
|
steps,
|
|
480
490
|
},
|
|
@@ -969,6 +979,61 @@ function buildSemanticActionCandidateActions(compiled: CompiledAgentBrowserSeman
|
|
|
969
979
|
return [];
|
|
970
980
|
}
|
|
971
981
|
|
|
982
|
+
function normalizeSemanticActionAccessibleName(name: string): string {
|
|
983
|
+
return name.replace(/\s+/g, " ").trim().toLowerCase();
|
|
984
|
+
}
|
|
985
|
+
|
|
986
|
+
function semanticActionNameMatches(candidateName: string, targetName: string): boolean {
|
|
987
|
+
const normalizedCandidate = normalizeSemanticActionAccessibleName(candidateName);
|
|
988
|
+
const normalizedTarget = normalizeSemanticActionAccessibleName(targetName);
|
|
989
|
+
return normalizedCandidate === normalizedTarget || normalizedCandidate.startsWith(`${normalizedTarget} `);
|
|
990
|
+
}
|
|
991
|
+
|
|
992
|
+
function getCompiledSemanticActionRoleTarget(compiled: CompiledAgentBrowserSemanticAction): { role: string; targetName: string } | undefined {
|
|
993
|
+
if (compiled.locator !== "role" || !["check", "click", "uncheck"].includes(compiled.action)) return undefined;
|
|
994
|
+
const findIndex = compiled.args.indexOf("find");
|
|
995
|
+
if (findIndex < 0 || compiled.args[findIndex + 1] !== "role") return undefined;
|
|
996
|
+
const role = compiled.args[findIndex + 2];
|
|
997
|
+
const nameFlagIndex = compiled.args.indexOf("--name");
|
|
998
|
+
const targetName = nameFlagIndex >= 0 ? compiled.args[nameFlagIndex + 1] : undefined;
|
|
999
|
+
if (!role || !targetName) return undefined;
|
|
1000
|
+
return { role, targetName };
|
|
1001
|
+
}
|
|
1002
|
+
|
|
1003
|
+
function findSemanticActionRefInSnapshot(compiled: CompiledAgentBrowserSemanticAction, snapshotData: unknown): string | undefined {
|
|
1004
|
+
const target = getCompiledSemanticActionRoleTarget(compiled);
|
|
1005
|
+
const refs = getSnapshotRefRecord(snapshotData);
|
|
1006
|
+
if (!target || !refs) return undefined;
|
|
1007
|
+
const candidates = Object.entries(refs).flatMap(([ref, entry]) => {
|
|
1008
|
+
if (!/^e\d+$/.test(ref) || !isRecord(entry)) return [];
|
|
1009
|
+
const role = typeof entry.role === "string" ? entry.role : undefined;
|
|
1010
|
+
const name = typeof entry.name === "string" ? entry.name : undefined;
|
|
1011
|
+
if (!role || !name || role.toLowerCase() !== target.role.toLowerCase() || !semanticActionNameMatches(name, target.targetName)) return [];
|
|
1012
|
+
return [{ exact: normalizeSemanticActionAccessibleName(name) === normalizeSemanticActionAccessibleName(target.targetName), name, ref }];
|
|
1013
|
+
});
|
|
1014
|
+
candidates.sort((left, right) => Number(right.exact) - Number(left.exact) || left.name.length - right.name.length || compareRefIds(left.ref, right.ref));
|
|
1015
|
+
return candidates[0]?.ref;
|
|
1016
|
+
}
|
|
1017
|
+
|
|
1018
|
+
interface SemanticActionVisibleRefResolution {
|
|
1019
|
+
args: string[];
|
|
1020
|
+
snapshot: SessionRefSnapshot;
|
|
1021
|
+
}
|
|
1022
|
+
|
|
1023
|
+
async function resolveSemanticActionVisibleRefArgs(options: {
|
|
1024
|
+
compiled: CompiledAgentBrowserSemanticAction | undefined;
|
|
1025
|
+
cwd: string;
|
|
1026
|
+
sessionName?: string;
|
|
1027
|
+
signal?: AbortSignal;
|
|
1028
|
+
}): Promise<SemanticActionVisibleRefResolution | undefined> {
|
|
1029
|
+
if (!options.compiled || !options.sessionName || !getCompiledSemanticActionRoleTarget(options.compiled)) return undefined;
|
|
1030
|
+
const snapshotData = await runSessionCommandData({ args: ["snapshot", "-i"], cwd: options.cwd, sessionName: options.sessionName, signal: options.signal });
|
|
1031
|
+
const ref = findSemanticActionRefInSnapshot(options.compiled, snapshotData);
|
|
1032
|
+
const snapshot = extractRefSnapshotFromData(snapshotData);
|
|
1033
|
+
if (!ref || !snapshot) return undefined;
|
|
1034
|
+
return { args: [...getCompiledSemanticActionSessionPrefix(options.compiled), options.compiled.action, `@${ref}`], snapshot };
|
|
1035
|
+
}
|
|
1036
|
+
|
|
972
1037
|
function compileAgentBrowserSemanticAction(input: unknown): { compiled?: CompiledAgentBrowserSemanticAction; error?: string } {
|
|
973
1038
|
if (!isRecord(input)) {
|
|
974
1039
|
return { error: "semanticAction must be an object." };
|
|
@@ -2627,7 +2692,6 @@ function getSnapshotRefRecord(data: unknown): Record<string, unknown> | undefine
|
|
|
2627
2692
|
}
|
|
2628
2693
|
|
|
2629
2694
|
const OVERLAY_CLOSE_NAME_PATTERN = /(?:\b(?:close|dismiss|no thanks|not now|maybe later|hide|skip|continue without|x)\b|^\s*×\s*$)/i;
|
|
2630
|
-
const OVERLAY_CONTEXT_NAME_PATTERN = /\b(?:banner|modal|dialog|popup|pop-up|overlay|donat(?:e|ion)|subscribe|sign in|login|cookie|privacy|consent)\b/i;
|
|
2631
2695
|
const OVERLAY_CONTEXT_ROLES = new Set(["alertdialog", "dialog"]);
|
|
2632
2696
|
const OVERLAY_ACTION_ROLES = new Set(["button", "link", "menuitem"]);
|
|
2633
2697
|
const OVERLAY_BLOCKER_CANDIDATE_LIMIT = 3;
|
|
@@ -2638,8 +2702,7 @@ function getOverlayBlockerCandidates(snapshotData: unknown): OverlayBlockerCandi
|
|
|
2638
2702
|
const hasOverlayContext = Object.values(refs).some((entry) => {
|
|
2639
2703
|
if (!isRecord(entry)) return false;
|
|
2640
2704
|
const role = typeof entry.role === "string" ? entry.role : "";
|
|
2641
|
-
|
|
2642
|
-
return OVERLAY_CONTEXT_ROLES.has(role.toLowerCase()) || OVERLAY_CONTEXT_NAME_PATTERN.test(name);
|
|
2705
|
+
return OVERLAY_CONTEXT_ROLES.has(role.toLowerCase());
|
|
2643
2706
|
});
|
|
2644
2707
|
if (!hasOverlayContext) return [];
|
|
2645
2708
|
const candidates: OverlayBlockerCandidate[] = [];
|
|
@@ -2810,16 +2873,27 @@ function formatEvalStdinHintText(hint: EvalStdinHint | undefined): string | unde
|
|
|
2810
2873
|
return hint ? `Eval stdin hint: ${hint.reason} ${hint.suggestion}` : undefined;
|
|
2811
2874
|
}
|
|
2812
2875
|
|
|
2813
|
-
function getArtifactCleanupGuidance(options: { command?: string; manifest?: SessionArtifactManifest; succeeded: boolean }): ArtifactCleanupGuidance | undefined {
|
|
2876
|
+
async function getArtifactCleanupGuidance(options: { command?: string; cwd: string; manifest?: SessionArtifactManifest; succeeded: boolean }): Promise<ArtifactCleanupGuidance | undefined> {
|
|
2814
2877
|
if (!options.succeeded || options.command !== "close" || !options.manifest || options.manifest.entries.length === 0) return undefined;
|
|
2815
|
-
const
|
|
2816
|
-
|
|
2817
|
-
|
|
2818
|
-
|
|
2819
|
-
.
|
|
2878
|
+
const explicitEntries = options.manifest.entries.filter((entry) => entry.storageScope === "explicit-path");
|
|
2879
|
+
const explicitArtifactPaths: string[] = [];
|
|
2880
|
+
const seenPaths = new Set<string>();
|
|
2881
|
+
for (const entry of explicitEntries) {
|
|
2882
|
+
if (explicitArtifactPaths.length >= 10) break;
|
|
2883
|
+
const displayPath = entry.path;
|
|
2884
|
+
if (seenPaths.has(displayPath)) continue;
|
|
2885
|
+
const absolutePath = entry.absolutePath ?? (isAbsolute(entry.path) ? entry.path : resolve(options.cwd, entry.path));
|
|
2886
|
+
try {
|
|
2887
|
+
await stat(absolutePath);
|
|
2888
|
+
} catch {
|
|
2889
|
+
continue;
|
|
2890
|
+
}
|
|
2891
|
+
seenPaths.add(displayPath);
|
|
2892
|
+
explicitArtifactPaths.push(displayPath);
|
|
2893
|
+
}
|
|
2820
2894
|
return {
|
|
2821
2895
|
explicitArtifactPaths,
|
|
2822
|
-
note: "Closing the browser session does not delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings; clean
|
|
2896
|
+
note: "Closing the browser session does not delete explicit screenshots, downloads, PDFs, traces, HAR files, or recordings; clean existing paths with host file tools when no longer needed.",
|
|
2823
2897
|
owner: "host-file-tools",
|
|
2824
2898
|
summary: formatSessionArtifactRetentionSummary(options.manifest),
|
|
2825
2899
|
};
|
|
@@ -3457,12 +3531,29 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
|
|
|
3457
3531
|
const runTool = async (): Promise<AgentBrowserToolResult> => {
|
|
3458
3532
|
const sessionMode = params.sessionMode ?? DEFAULT_SESSION_MODE;
|
|
3459
3533
|
const freshSessionName = createFreshSessionName(managedSessionBaseName, ephemeralSessionSeed, freshSessionOrdinal + 1);
|
|
3460
|
-
|
|
3534
|
+
let executionPlan = buildExecutionPlan(preparedArgs.args, {
|
|
3461
3535
|
freshSessionName,
|
|
3462
3536
|
managedSessionActive,
|
|
3463
3537
|
managedSessionName,
|
|
3464
3538
|
sessionMode,
|
|
3465
3539
|
});
|
|
3540
|
+
let semanticActionVisibleRefResolution: SemanticActionVisibleRefResolution | undefined;
|
|
3541
|
+
if (!executionPlan.validationError && executionPlan.managedSessionName !== freshSessionName) {
|
|
3542
|
+
semanticActionVisibleRefResolution = await resolveSemanticActionVisibleRefArgs({
|
|
3543
|
+
compiled: compiledSemanticAction,
|
|
3544
|
+
cwd: ctx.cwd,
|
|
3545
|
+
sessionName: executionPlan.sessionName,
|
|
3546
|
+
signal,
|
|
3547
|
+
});
|
|
3548
|
+
if (semanticActionVisibleRefResolution) {
|
|
3549
|
+
executionPlan = buildExecutionPlan(semanticActionVisibleRefResolution.args, {
|
|
3550
|
+
freshSessionName,
|
|
3551
|
+
managedSessionActive,
|
|
3552
|
+
managedSessionName,
|
|
3553
|
+
sessionMode,
|
|
3554
|
+
});
|
|
3555
|
+
}
|
|
3556
|
+
}
|
|
3466
3557
|
const redactedEffectiveArgs = redactInvocationArgs(executionPlan.effectiveArgs);
|
|
3467
3558
|
const redactedRecoveryHint = redactRecoveryHint(executionPlan.recoveryHint);
|
|
3468
3559
|
const compatibilityWorkaround: CompatibilityWorkaround | undefined = executionPlan.compatibilityWorkaround;
|
|
@@ -3490,7 +3581,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
|
|
|
3490
3581
|
};
|
|
3491
3582
|
}
|
|
3492
3583
|
|
|
3493
|
-
const commandTokens = extractCommandTokens(preparedArgs.args);
|
|
3584
|
+
const commandTokens = semanticActionVisibleRefResolution ? extractCommandTokens(semanticActionVisibleRefResolution.args) : extractCommandTokens(preparedArgs.args);
|
|
3494
3585
|
const exactSensitiveValues = getExactSensitiveStdinValues({
|
|
3495
3586
|
command: executionPlan.commandInfo.command,
|
|
3496
3587
|
commandTokens,
|
|
@@ -3560,10 +3651,13 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
|
|
|
3560
3651
|
const priorSessionTabTargetState = executionPlan.sessionName ? sessionTabTargets.get(executionPlan.sessionName) : undefined;
|
|
3561
3652
|
const priorSessionTabTarget = priorSessionTabTargetState?.target;
|
|
3562
3653
|
const priorRefSnapshotState = executionPlan.sessionName ? sessionRefSnapshots.get(executionPlan.sessionName) : undefined;
|
|
3654
|
+
const resolvedSemanticActionRefSnapshot = semanticActionVisibleRefResolution?.snapshot
|
|
3655
|
+
? { ...semanticActionVisibleRefResolution.snapshot, target: semanticActionVisibleRefResolution.snapshot.target ?? priorSessionTabTarget }
|
|
3656
|
+
: undefined;
|
|
3563
3657
|
const staleRefPreflight = buildStaleRefPreflight({
|
|
3564
3658
|
commandTokens,
|
|
3565
3659
|
currentTarget: priorSessionTabTarget,
|
|
3566
|
-
refSnapshot: priorRefSnapshotState,
|
|
3660
|
+
refSnapshot: resolvedSemanticActionRefSnapshot ?? priorRefSnapshotState,
|
|
3567
3661
|
stdin: toolStdin,
|
|
3568
3662
|
});
|
|
3569
3663
|
if (staleRefPreflight) {
|
|
@@ -3937,7 +4031,7 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
|
|
|
3937
4031
|
? extractRefSnapshotFromData(presentationEnvelope?.data)
|
|
3938
4032
|
: executionPlan.commandInfo.command === "batch"
|
|
3939
4033
|
? extractRefSnapshotFromBatchResults(presentationEnvelope?.data)
|
|
3940
|
-
: overlayBlockerDiagnostic?.snapshot
|
|
4034
|
+
: resolvedSemanticActionRefSnapshot ?? overlayBlockerDiagnostic?.snapshot
|
|
3941
4035
|
: undefined;
|
|
3942
4036
|
if (refSnapshot && shouldApplySessionTabTargetUpdate({ current: sessionRefSnapshots.get(executionPlan.sessionName), updateOrder: tabTargetUpdateOrder })) {
|
|
3943
4037
|
currentRefSnapshot = { ...refSnapshot, target: refSnapshot.target ?? currentSessionTabTarget };
|
|
@@ -4082,8 +4176,9 @@ export default function agentBrowserExtension(pi: ExtensionAPI) {
|
|
|
4082
4176
|
stdin: toolStdin,
|
|
4083
4177
|
});
|
|
4084
4178
|
const resultArtifactManifest = presentation.artifactManifest ?? artifactManifest;
|
|
4085
|
-
const artifactCleanup = getArtifactCleanupGuidance({
|
|
4179
|
+
const artifactCleanup = await getArtifactCleanupGuidance({
|
|
4086
4180
|
command: executionPlan.commandInfo.command,
|
|
4181
|
+
cwd: ctx.cwd,
|
|
4087
4182
|
manifest: resultArtifactManifest,
|
|
4088
4183
|
succeeded,
|
|
4089
4184
|
});
|
|
@@ -17,7 +17,7 @@ export const QUICK_START_GUIDELINES = [
|
|
|
17
17
|
"Quick start mental model: use exactly one of args (exact agent-browser CLI args after the binary), semanticAction (a thin find-locator shorthand compiled to find argv), job (a constrained short-workflow schema compiled to batch), qa (a lightweight QA preset built on job/batch), or the experimental sourceLookup / networkSourceLookup helpers (each compiled to batch); stdin is only for batch, eval --stdin, auth save --password-stdin, and wrapper-generated batch stdin from job, qa, sourceLookup, or networkSourceLookup, and other command/stdin combinations are rejected before launch; sessionMode=fresh switches the extension-managed pi-scoped session to a fresh upstream launch when you need new --profile, --session-name, --cdp, --state, --auto-connect, --init-script, --enable, -p/--provider, or iOS --device state.",
|
|
18
18
|
"There is no first-class reusable named browser recipe runtime above top-level job, the qa preset, and raw batch stdin; keep recurring flows in documentation examples or those inputs (closed RQ-0068; see docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet).",
|
|
19
19
|
"Common first calls: { args: [\"open\", \"https://example.com\"] } then { args: [\"snapshot\", \"-i\"] }; after navigation, use { args: [\"click\", \"@e2\"] } then { args: [\"snapshot\", \"-i\"] }.",
|
|
20
|
-
"Locator-first clicks and fills without hand-building find argv: { semanticAction: { action: \"click\", locator: \"text\", value: \"Close\" } } or { semanticAction: { action: \"fill\", locator: \"label\", value: \"Email\", text: \"user@example.com\" } }; add semanticAction.session when targeting a named upstream browser session; details.compiledSemanticAction shows the
|
|
20
|
+
"Locator-first clicks and fills without hand-building find argv: { semanticAction: { action: \"click\", locator: \"text\", value: \"Close\" } } or { semanticAction: { action: \"fill\", locator: \"label\", value: \"Email\", text: \"user@example.com\" } }; add semanticAction.session when targeting a named upstream browser session; details.compiledSemanticAction shows the semantic target, while details.effectiveArgs may show a resolved current @ref for active-session role/name click/check/uncheck actions to avoid hidden duplicate matches; selector-not-found failures may append bounded try-*-candidate next actions (and an Agent-browser candidate fallbacks prose block) for specific placeholder/text/label shapes, and stale-ref failures can return retry-semantic-action-after-stale-ref when retry safety is provable.",
|
|
21
21
|
"Common advanced calls: { args: [\"batch\"], stdin: \"[[\\\"open\\\",\\\"https://example.com\\\"],[\\\"snapshot\\\",\\\"-i\\\"]]\" }, { job: { steps: [{ action: \"open\", url: \"https://example.com\" }, { action: \"assertText\", text: \"Example Domain\" }, { action: \"screenshot\", path: \".dogfood/example.png\" }] } }, { qa: { url: \"https://example.com\", expectedText: \"Example Domain\", screenshotPath: \".dogfood/qa-example.png\" } }, { args: [\"eval\", \"--stdin\"], stdin: \"document.title\" }, { args: [\"auth\", \"save\", \"name\", \"--password-stdin\"], stdin: \"<password from user-approved secret source>\" }, { args: [\"--profile\", \"Default\", \"open\", \"https://example.com/account\"], sessionMode: \"fresh\" }, and { args: [\"open\", \"--enable\", \"react-devtools\", \"https://example.com\"], sessionMode: \"fresh\" }.",
|
|
22
22
|
"High-value command reference: download <selector> <path> saves a file triggered by a click; get title/url/text/html/value/attr/count reads page state; screenshot [path] captures an image; pdf <path> saves a PDF; tab list and tab <tab-id-or-label> inspect or recover the active tab; react tree/inspect/renders/suspense introspect React after --enable react-devtools; vitals [url] measures Core Web Vitals; pushstate <url> performs SPA navigation.",
|
|
23
23
|
"For artifact-producing commands, read the visible artifact block and details.artifactVerification before using files: check requested path, absolute path, existence, size bytes, artifact kind, optional mediaType, status, optional limitation, and verified/missing/pending/unverified counts. details.artifacts contains per-file metadata. Browser close does not delete explicit saved files; if close reports details.artifactCleanup, use host file tools to remove paths listed in explicitArtifactPaths (when non-empty) after inspection. For annotated screenshots inside batch, put --annotate in top-level args (for example { args: [\"--annotate\", \"batch\"], stdin: \"[[\\\"screenshot\\\",\\\"/tmp/page.png\\\"]]\" }) rather than inside the screenshot step.",
|
|
@@ -49,7 +49,7 @@ export const SHARED_BROWSER_PLAYBOOK_GUIDELINES = [
|
|
|
49
49
|
"For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.",
|
|
50
50
|
"When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.",
|
|
51
51
|
"When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.",
|
|
52
|
-
"When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction.",
|
|
52
|
+
"When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.",
|
|
53
53
|
"When commands save or spill files (screenshots, downloads, PDFs, traces, recordings, HAR, large snapshot spills), treat paths as provisional until details.artifactVerification shows every row verified: branch on missingCount, pendingCount, unverifiedCount, per-entry state, and optional limitation before downstream file use.",
|
|
54
54
|
"Do not call --help or other exploratory inspection commands unless the user explicitly asks for them or debugging the browser integration is necessary.",
|
|
55
55
|
] as const;
|
|
@@ -609,12 +609,27 @@ function formatNetworkRequestsText(data: Record<string, unknown>): string | unde
|
|
|
609
609
|
const shown = networkFailureSummary.totalCount > 0
|
|
610
610
|
? [`Network failure summary: ${networkFailureSummary.actionableCount} actionable, ${networkFailureSummary.benignCount} benign low-impact (${networkFailureSummary.totalCount} total).`]
|
|
611
611
|
: [];
|
|
612
|
-
|
|
612
|
+
const indexedRequests = requests.map((item, index) => ({ index, item }));
|
|
613
|
+
const failedRequests: typeof indexedRequests = [];
|
|
614
|
+
const normalRequests: typeof indexedRequests = [];
|
|
615
|
+
for (const indexed of indexedRequests) {
|
|
616
|
+
if (isRecord(indexed.item) && classifyNetworkRequestFailure(indexed.item)) failedRequests.push(indexed);
|
|
617
|
+
else normalRequests.push(indexed);
|
|
618
|
+
}
|
|
619
|
+
failedRequests.sort((left, right) => {
|
|
620
|
+
const leftClassification = isRecord(left.item) ? classifyNetworkRequestFailure(left.item) : undefined;
|
|
621
|
+
const rightClassification = isRecord(right.item) ? classifyNetworkRequestFailure(right.item) : undefined;
|
|
622
|
+
const leftRank = leftClassification?.impact === "actionable" ? 0 : 1;
|
|
623
|
+
const rightRank = rightClassification?.impact === "actionable" ? 0 : 1;
|
|
624
|
+
return leftRank - rightRank || left.index - right.index;
|
|
625
|
+
});
|
|
626
|
+
const prioritizedRequests = [...failedRequests, ...normalRequests];
|
|
627
|
+
shown.push(...prioritizedRequests.slice(0, DIAGNOSTIC_REQUEST_PREVIEW_LIMIT).flatMap(({ item, index }) => {
|
|
613
628
|
if (!isRecord(item)) return [`${index + 1}. ${stringifyModelFacing(item)}`];
|
|
614
629
|
return formatNetworkRequestLine(item, index);
|
|
615
630
|
}));
|
|
616
631
|
if (requests.length > DIAGNOSTIC_REQUEST_PREVIEW_LIMIT) {
|
|
617
|
-
shown.push(`... (${requests.length - DIAGNOSTIC_REQUEST_PREVIEW_LIMIT} additional requests omitted from preview)`);
|
|
632
|
+
shown.push(`... (${requests.length - DIAGNOSTIC_REQUEST_PREVIEW_LIMIT} additional requests omitted from preview; failed requests are shown first when present)`);
|
|
618
633
|
}
|
|
619
634
|
return shown.join("\n");
|
|
620
635
|
}
|
package/package.json
CHANGED