pi-agent-browser-native 0.2.30 → 0.2.31
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +13 -0
- package/README.md +11 -8
- package/docs/ARCHITECTURE.md +3 -3
- package/docs/COMMAND_REFERENCE.md +12 -8
- package/docs/RELEASE.md +11 -11
- package/docs/REQUIREMENTS.md +4 -3
- package/docs/SUPPORT_MATRIX.md +13 -5
- package/docs/TOOL_CONTRACT.md +30 -20
- package/extensions/agent-browser/index.ts +145 -33
- package/extensions/agent-browser/lib/playbook.ts +10 -10
- package/extensions/agent-browser/lib/results/presentation.ts +154 -2
- package/extensions/agent-browser/lib/results/shared.ts +7 -1
- package/package.json +1 -1
package/docs/TOOL_CONTRACT.md
CHANGED
|
@@ -47,14 +47,14 @@ Agent-facing efficiency claims are measured with `npm run benchmark:agent-browse
|
|
|
47
47
|
- For first-navigation setup, use open without a URL plus network route --resource-type <csv>, cookies set --curl <file>, or --init-script/--enable before navigate/opening the target page.
|
|
48
48
|
- For stateful browser context work, prefer purpose-specific page actions before dumping browser data: use auth save --password-stdin with the tool stdin field for credentials, state save/load for portable test state, cookies get/set/clear and storage local|session only when the task needs those values, and expect cookie/storage/auth/state summaries to redact credential-like fields.
|
|
49
49
|
- For batch chains that touch cookies, storage, auth, or other secret-bearing commands, use details.batchSteps for per-step artifacts, categories, spill paths, and full structured errors; top-level details.data on batch is only a compact redacted step matrix (success, argv-redacted command, redacted result or scrubbed error text) built from the same presentation rules as standalone calls.
|
|
50
|
-
- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har, diff snapshot/screenshot/url, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, and chat. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
|
|
50
|
+
- For non-core families, pass current upstream commands through the native tool directly: network route/requests/har, diff snapshot/screenshot/url, trace/profiler/record, console/errors/highlight/inspect/clipboard, stream enable/disable/status, dashboard start/stop, and chat. For compact network requests output, prefer details.nextActions for request detail, actionable failed-request networkSourceLookup, filtering, or HAR capture follow-ups instead of guessing request-id syntax. Artifact-producing commands report details.artifacts and verification state; long-running starts such as stream, dashboard, trace/profiler, and record should be paired with the matching stop/disable command when the task is done.
|
|
51
51
|
- For provider or specialized app workflows, load version-matched upstream guidance with skills get agentcore|electron|slack|dogfood|vercel-sandbox through the native tool. Provider launches such as -p ios, --provider browserbase/kernel/browseruse/browserless/agentcore, and iOS --device are upstream-owned setup paths; use sessionMode fresh when switching providers and expect external credentials or local Appium/Xcode setup to be required.
|
|
52
52
|
- For dialogs and frames, use dialog status/accept/dismiss and frame <selector|main> through native args; when --confirm-actions produces a pending confirmation, use details.nextActions or exact confirm <id> / deny <id> calls instead of inventing ids.
|
|
53
53
|
- If a session lands on the wrong page or tab, an interaction changes origin unexpectedly, or an open call returns blocked, blank, or otherwise unexpected results, use tab list / tab <tab-id-or-label> / snapshot -i to recover state before retrying different URLs or fallback strategies. Only use wait with an explicit argument like milliseconds, --load <state>, --url <matcher>, --fn <js>, or --text <matcher>.
|
|
54
54
|
- For feed, timeline, or inbox reading tasks, focus on the main timeline/list region and read the first item there rather than unrelated composer or sidebar content.
|
|
55
55
|
- For read-only browsing tasks, prefer extracting the answer from the current snapshot, structured ref labels, or eval --stdin on the current page before navigating away. Only click into media viewers, detail routes, or new pages when the current view does not contain the needed information.
|
|
56
56
|
- For downloads, prefer download <selector> <path> when an element click should save a file. Do not rely on click alone when you need the downloaded file on disk.
|
|
57
|
-
- On dashboards with nested scroll containers, verify scroll with a screenshot or fresh snapshot -i; if the viewport did not move, prefer scrollintoview <@ref> or target the actual scrollable region. For comboboxes, a click/semanticAction may only focus the field
|
|
57
|
+
- On dashboards with nested scroll containers, verify scroll with a screenshot or fresh snapshot -i; if the viewport did not move, prefer scrollintoview <@ref> or target the actual scrollable region. For native selects, use select <selector> <value...> (or semanticAction/job select) instead of clicking option refs; for custom comboboxes, a click/semanticAction may only focus the field, so re-snapshot and fall back to type, press Enter/arrow keys, or visible option refs.
|
|
58
58
|
- When using eval --stdin, scope checks and actions to the target element or route whenever possible instead of relying on broad page-wide text heuristics.
|
|
59
59
|
- When using eval --stdin for extraction, return the value you want instead of relying on console.log as the primary result channel. Prefer plain expressions like ({ title: document.title }) or explicitly invoked functions like (() => ({ title: document.title }))(); if a function-shaped snippet returns {}, details.evalStdinHint may warn that the function was serialized instead of called. If get text on a CSS selector surfaces details.selectorTextVisibility or selectorTextVisibilityAll, prefer a visible @ref, a more specific selector, or the inspect-visible-text-candidates nextAction over hidden tab content.
|
|
60
60
|
- When details.pageChangeSummary is present, use changeType and summary as a compact signal for navigation, DOM mutation, confirmations, or artifacts; when nextActionIds is set, match those ids to entries in details.nextActions (or per-step nextActions inside batch) for concrete follow-up payloads instead of inferring from prose alone. If a no-navigation click surfaces details.overlayBlockers, inspect the fresh snapshot evidence before using a close/dismiss candidate nextAction; ordinary page chrome without dialog/alertdialog evidence should not trigger this diagnostic.
|
|
@@ -72,6 +72,7 @@ Illustrative shapes (each real call uses exactly one of `args`, `semanticAction`
|
|
|
72
72
|
|
|
73
73
|
```json
|
|
74
74
|
{ "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Export" }, "sessionMode": "auto" }
|
|
75
|
+
{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
|
|
75
76
|
```
|
|
76
77
|
|
|
77
78
|
### `args`
|
|
@@ -96,14 +97,15 @@ Examples:
|
|
|
96
97
|
- type: object
|
|
97
98
|
- optional; mutually exclusive with `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup` (omit all of them when using this field)
|
|
98
99
|
- top-level tool input only: `batch` stdin remains upstream argv arrays; express find steps inside batch as string arrays such as `["find","role","button","click","--name","Export"]`, not nested `semanticAction` objects
|
|
99
|
-
- thin intent schema compiled by this wrapper into existing upstream `find`
|
|
100
|
-
- supported actions: `click`, `fill`, `check`, `uncheck`
|
|
101
|
-
- supported locators
|
|
102
|
-
- `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim
|
|
100
|
+
- thin intent schema compiled by this wrapper into existing upstream commands; locator actions compile to `find`, while native dropdown selection compiles to `select <selector> <value...>`; behavior and locator/selector semantics stay upstream-owned
|
|
101
|
+
- supported actions: `click`, `fill`, `check`, `select`, `uncheck`
|
|
102
|
+
- supported locators for `click` / `fill` / `check` / `uncheck`: `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`
|
|
103
|
+
- for locator actions, `value` is the locator argument (for example ARIA role token `"button"`, label text, or visible substring), must be a non-empty string after trim
|
|
103
104
|
- `fill` requires non-empty `text` (compiled as the trailing value argument to `find`)
|
|
105
|
+
- `select` requires non-empty `selector` plus either `value` (single option value) or `values` (non-empty array of option values). `select` does not accept `locator`, `role`, `name`, or `text`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown targeting must first be resolved to a stable selector or current `@ref`.
|
|
104
106
|
- optional `name` is only valid with `locator: "role"` and compiles to `--name <name>` after the action (and after `text` for `fill` when present)
|
|
105
107
|
- optional `role` is accepted only when `locator` is `role` and must equal `value` if set (redundant with `value`; prefer `value` alone)
|
|
106
|
-
- optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before `find` so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
|
|
108
|
+
- optional `session` is an upstream session name; when set, compilation prepends `--session <session>` before the compiled `find` or `select` command so the shorthand targets that named browser context instead of the managed default; this is independent of top-level `sessionMode`, which only injects or rotates the extension-managed implicit session when the planned argv does not already start with `--session` (see `buildExecutionPlan` in `extensions/agent-browser/lib/runtime.ts`). On successful unified results, `details.sessionName` matches that name and `usedImplicitSession` is `false` because the call named upstream directly rather than consuming the extension-managed implicit session slot.
|
|
107
109
|
|
|
108
110
|
Compilation (then `--json` and session handling apply like any other call):
|
|
109
111
|
|
|
@@ -112,15 +114,16 @@ Compilation (then `--json` and session handling apply like any other call):
|
|
|
112
114
|
| `click`, `check`, or `uncheck` + non-`role` locator | `["find", <locator>, <value>, <action>]` |
|
|
113
115
|
| `click` / `check` / `uncheck` + `role` + optional `name` | `["find","role",<value>,<action>]` plus `["--name",<name>]` when `name` is set |
|
|
114
116
|
| `fill` | `["find",<locator>,<value>,"fill",<text>]` plus optional `["--name",<name>]` after `text` when `locator` is `role` and `name` is set |
|
|
115
|
-
|
|
|
117
|
+
| `select` + `selector` + `value` / `values` | `["select",<selector>,<value...>]` |
|
|
118
|
+
| any supported action + `session` | prepends `["--session",<session>]` before the compiled argv |
|
|
116
119
|
|
|
117
|
-
When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
|
|
120
|
+
When `semanticAction` compiles successfully, `details.compiledSemanticAction` echoes `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with `args` redacted the same way as other invocation details. Expect it on the initial wrapper validation return (when that path still builds the early `details` object) and on the unified result after `agent-browser` runs. It is omitted when the call used `args` only, when compilation never produced argv, and on some in-`execute` error returns that attach a slimmer `details` shape before the unified merge (for example certain session-plan, stdin-contract, tab-pinning, or missing-binary guard paths); compare `extensions/agent-browser/index.ts` where `compiledSemanticAction` is assigned. For active sessions, role/name `click`, `check`, and `uncheck` semantic actions may be resolved through one fresh `snapshot -i` to a current visible `@ref` before execution; this avoids hidden duplicate matches stealing an upstream `find` action. In that case `details.compiledSemanticAction` still records the original semantic target while `details.effectiveArgs` shows the executed ref action.
|
|
118
121
|
|
|
119
122
|
If a raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, the wrapper may run one fresh session-scoped `snapshot -i` and add visible `Current snapshot ref fallback`, `details.visibleRefFallback`, and `try-current-visible-ref` / `try-current-visible-ref-N` next actions when that snapshot contains exact role/name matches for the failed target. This direct-ref fallback is bounded to current snapshot refs and exact normalized role/name matches: role locators require `--name`, text-click falls back only to exact-name `button`/`link` refs, label-fill to exact-name `textbox`, and placeholder-fill to exact-name `searchbox`/`textbox`. It never fuzzy-matches names such as prefixes; when several exact refs match, each action carries safety copy telling agents to inspect the snapshot and choose only if unambiguous.
|
|
120
123
|
|
|
121
124
|
If a compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, visible content can also include an `Agent-browser candidate fallbacks` block when the wrapper has bounded role/name retries for that locator and action, and `details.nextActions` includes the normal `refresh-interactive-refs` snapshot step plus those entries. When `session` was provided, candidate retry args preserve the same `--session <session>` prefix. Today `buildSemanticActionCandidateActions` in `extensions/agent-browser/index.ts` only appends candidates for: `fill` + `placeholder` → `try-searchbox-name-candidate` and `try-textbox-name-candidate` (same accessible name as `value`); `click` + `text` → `try-button-name-candidate` and `try-link-name-candidate`; `fill` + `label` → `try-labeled-textbox-candidate`. Other locator/action pairs omit this block. `fill` candidates keep the same trailing `text` token as the original compile before `--name <value>`; `click` candidates omit text. Each entry carries `safety` noting the match may be ambiguous. Candidate fallbacks are heuristics, not proof that an element exists; inspect the page when several controls could share the same name.
|
|
122
125
|
|
|
123
|
-
If a compiled `semanticAction` fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` (see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; direct stale `@e…` commands still return
|
|
126
|
+
If a compiled `semanticAction` `find` action fails with `failureCategory: "stale-ref"`, `details.nextActions` includes `retry-semantic-action-after-stale-ref` with the same redacted compiled argv as `details.compiledSemanticAction` in `params.args` (any leading `--session` pair from `semanticAction.session`, then the `find` tokens). The wrapper appends that entry **after** any `refresh-interactive-refs` snapshot step from `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` (see `extensions/agent-browser/index.ts` where `nextActions` is merged). That retry is only offered because the semantic target is stable and the stale-ref error proves the previous action did not execute; `select` shorthands with stale `@e…` selectors and direct stale `@e…` commands still return refresh guidance instead of an unsafe blind retry.
|
|
124
127
|
|
|
125
128
|
For direct page-scoped `@e…` refs, successful `snapshot` results record `details.refSnapshot` with the latest ref ids and page target for the session. Before mutation-prone ref commands such as `click`, `fill`, `check`, `select`, `download`, drag/upload/keyboard-style actions, or equivalent batch steps run, the wrapper rejects refs from an older page target or refs absent from the latest same-page snapshot with `failureCategory: "stale-ref"`. This is a best-effort wrapper guard against upstream ref-number recycling after navigation; it does not prove the DOM stayed unchanged after the snapshot. Refresh with the session-aware `refresh-interactive-refs` next action before retrying.
|
|
126
129
|
|
|
@@ -130,6 +133,8 @@ Examples:
|
|
|
130
133
|
{ "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Export" } }
|
|
131
134
|
{ "semanticAction": { "action": "click", "locator": "text", "value": "Close" } }
|
|
132
135
|
{ "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
|
|
136
|
+
{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
|
|
137
|
+
{ "semanticAction": { "action": "select", "selector": "#multi", "values": ["dark", "compact"] } }
|
|
133
138
|
{ "semanticAction": { "action": "check", "locator": "label", "value": "Remember me" } }
|
|
134
139
|
{ "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
|
|
135
140
|
{ "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
|
|
@@ -142,10 +147,11 @@ Examples:
|
|
|
142
147
|
- top-level tool input only; do not nest `job` inside `batch` stdin
|
|
143
148
|
- constrained orchestration only: every step compiles to existing upstream `batch` argv and the compiled plan is echoed as `details.compiledJob`
|
|
144
149
|
- there is no separate reusable named “browser recipe” extension surface above `job`, `qa`, and raw `batch` yet; the closed `RQ-0068` decision, evidence bar, and revisit criteria are in [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) and [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md)
|
|
145
|
-
- supported steps (each row becomes one upstream `batch` step; `click` / `fill` pass `selector` through as the same argv token shape standalone
|
|
150
|
+
- supported steps (each row becomes one upstream `batch` step; `click` / `fill` / `select` pass `selector` through as the same argv token shape standalone upstream commands would use, including `@refs`, not the `semanticAction` locator schema):
|
|
146
151
|
- `open` with `url`
|
|
147
152
|
- `click` with `selector`
|
|
148
153
|
- `fill` with `selector` and `text`
|
|
154
|
+
- `select` with `selector` plus either `value` or `values` (one or more option values; compiled as `select <selector> <value...>`)
|
|
149
155
|
- `wait` with positive integer `milliseconds`
|
|
150
156
|
- `assertText` with `text` (compiled as passive `wait --text <text>`)
|
|
151
157
|
- `assertUrl` with `url` pattern (compiled as `wait --url <pattern>`)
|
|
@@ -175,9 +181,11 @@ Compiled shape:
|
|
|
175
181
|
}
|
|
176
182
|
```
|
|
177
183
|
|
|
184
|
+
On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
|
|
185
|
+
|
|
178
186
|
Use raw `args` plus `stdin` for upstream `batch` when a flow needs commands, flags, stdin forms, or failure policies outside this constrained schema.
|
|
179
187
|
|
|
180
|
-
Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill` selectors after an `open`, `click`, or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you. Multiple same-snapshot `fill @e…` rows may run before the first click/submit-style step.
|
|
188
|
+
Because `job` still executes as upstream `batch` with generated stdin, the same wrapper page-scoped `@e…` preflight applies: if you pass `@refs` in `click`/`fill`/`select` selectors after an `open`, `click`, `select`, or another step that can navigate or mutate the page, split the work across tool calls or switch to raw `batch` and insert your own `snapshot -i` rows between steps—the constrained `job` vocabulary does not emit snapshot steps for you. Multiple same-snapshot `fill @e…` rows may run before the first click/submit-style step.
|
|
181
189
|
|
|
182
190
|
### `qa`
|
|
183
191
|
|
|
@@ -228,13 +236,13 @@ Use raw `args` for direct upstream React inspection when you already know the ex
|
|
|
228
236
|
- type: object with at least one of `requestId`, `filter`, or `url`, plus optional `maxWorkspaceFiles`
|
|
229
237
|
- optional; mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `sourceLookup`
|
|
230
238
|
- experimental failed-request source-hint helper; it reports failed network requests and candidate source hints with evidence instead of assigning blame
|
|
231
|
-
- compiles to existing upstream `batch` commands only: `network request <requestId>` when provided plus `network requests` with `--filter <filter-or-url>` when a filter or URL is provided (if both are set, `filter` wins; when only `url` is set, it becomes the `--filter` argument)
|
|
239
|
+
- compiles to existing upstream `batch` commands only: `network request <requestId>` when provided plus `network requests` with `--filter <filter-or-url>` when a filter or URL is provided (if both are set, `filter` wins; when only `url` is set, it becomes the `--filter` argument); optional `session` prepends `--session <name>` before that generated `batch`
|
|
232
240
|
- detects failed requests from `status >= 400`, `failed: true`, or an `error` field
|
|
233
241
|
- candidate sources come from source-like initiator/stack metadata in upstream network results and bounded local workspace search for URL/path literals under the Pi session cwd
|
|
234
242
|
- optional `maxWorkspaceFiles` defaults to 2000 and cannot exceed 5000; workspace-search candidates are capped at ten
|
|
235
243
|
- reports `details.compiledNetworkSourceLookup` with the generated batch plan and `details.networkSourceLookup` with `{ status, failedRequests, candidates, limitations, summary }`
|
|
236
244
|
- `details.networkSourceLookup.status` is one of `failed-requests-found`, `no-failed-requests`, or `no-candidates`
|
|
237
|
-
- `details.networkSourceLookup.failedRequests[]` lists correlated failed requests (optional `requestId`, `url`, HTTP `status`, `method`, `error`) after the same failure heuristics as analysis
|
|
245
|
+
- `details.networkSourceLookup.failedRequests[]` lists correlated failed requests (optional `requestId`, `url`, HTTP `status`, `method`, `error`) after the same failure heuristics as analysis; those request URLs are diagnostic evidence only and do not replace the session’s active page target or invalidate the latest app-page `refSnapshot`
|
|
238
246
|
- each `candidates[]` entry uses `source` `initiator` (parsed from upstream initiator/stack/source/trace fields on matching requests) or `workspace-search` (string match of URL/path needles in local source files), plus `confidence`, `evidence`, optional `file`/`line`, and optional `requestUrl`; URLs and query parameters in these surfaces are redacted for model-facing output
|
|
239
247
|
|
|
240
248
|
Example:
|
|
@@ -371,10 +379,12 @@ Ref preflight details (implementation in `extensions/agent-browser/index.ts`):
|
|
|
371
379
|
|
|
372
380
|
**Presentation redaction (implementation map):** Successful non-`batch` tool calls and each successful `batchSteps[]` row run upstream `data` through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts`: `cookies` and `storage` walk objects/arrays and replace case-insensitive `value` keys with `"[REDACTED]"` (diagnostic formatters still describe rows without expanding secrets); every other command’s payload is recursively scrubbed with `redactStructuredPresentationValue`, which redacts known sensitive key names and applies string-level sensitivity heuristics so network, diff, trace/profiler, stream, dashboard, chat, and other structured results do not echo bearer tokens, proxy credentials, or similar fields verbatim into `details.data`. Echoed `command` arrays in `details` and in batch roll-ups use `redactInvocationArgs` from `extensions/agent-browser/lib/runtime.ts` to mask trailing values for sensitive global flags (including `--body`, `--headers`, `--password`, and `--proxy`), preserve the special positional rules for `cookies set`, `storage local|session set`, and `set credentials`, and scrub other argv tokens for URLs and inline secrets. Failed batch steps additionally run `redactExactValues` on structured step errors so literals taken from that step’s argv (cookie value, storage set value, `--password` / `--password=` tokens) cannot reappear inside formatted error blobs.
|
|
373
381
|
|
|
374
|
-
`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N`; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
|
|
382
|
+
`nextActions` is an optional machine-readable list of exact native `agent_browser` follow-ups. Each entry includes `tool: "agent_browser"`, an `id`, a short `reason`, optional `safety`, and either `params` (`args`, optional `stdin`, optional `sessionMode`, optional `networkSourceLookup`) or an `artifactPath` for saved-file workflows. Agents should prefer these payloads over prose when present. Current recommendations include: `open` success → `snapshot -i`; mutating/navigation commands (see `buildAgentBrowserNextActions` in source for the exact command set) → `snapshot -i`; stale refs and selector failures → `snapshot -i` via `refresh-interactive-refs` (prefixed with `--session <name>` when the failed call ran in a named or managed session); selector misses with exact current snapshot role/name matches → direct ref retries via `try-current-visible-ref` or bounded `try-current-visible-ref-N`; unknown getter shortcuts such as `title` / `url` → exact read-only retries like `get title` / `get url` with ids `use-get-title` / `use-get-url`; compact `network requests` results with safe request IDs → bounded read-only request detail, `networkSourceLookup`, path filter, or HAR-capture follow-ups; semantic `selector-not-found` failures that compiled from `semanticAction` may append `try-searchbox-name-candidate`, `try-textbox-name-candidate`, `try-button-name-candidate`, `try-link-name-candidate`, or `try-labeled-textbox-candidate` after presentation `nextActions` only for the bounded fill/click pairs enumerated under `semanticAction`; semantic `stale-ref` failures that compiled from `semanticAction` `find` argv may also include `retry-semantic-action-after-stale-ref` after that snapshot step; qualifying same-URL top-level clicks (see `overlayBlockers` below) with fresh snapshot evidence of likely overlay/banner/dialog close controls may append `inspect-overlay-state` and bounded `try-overlay-blocker-candidate-*` entries; successful top-level `scroll` calls whose pre/post viewport and sampled scroll-container positions do not change may append `inspect-after-noop-scroll` and `verify-noop-scroll-visually`; explicit combobox-targeted actions that focus a combobox without visible options may append `inspect-focused-combobox`, `try-open-combobox-with-arrow`, and `try-open-combobox-with-enter`; `get text <selector>` calls with hidden/multiple CSS matches may append `inspect-visible-text-candidates` with a read-only `eval --stdin` probe (each prefixed with `--session <name>` when `details.sessionName` is set, same `sessionPrefixArgs` rule as other session-scoped follow-ups); confirmations → exact `confirm <id>` and `deny <id>` choices; tab drift → `tab list` then `snapshot -i`; download verification failures or missing successful download artifacts → `wait --download [path]`; saved artifacts → the artifact path to inspect/consume after checking `artifactVerification`/metadata; missing non-download artifacts → `verify-artifact-path` so agents do not trust an absent file. When nothing applies, the field is omitted.
|
|
375
383
|
|
|
376
384
|
**Unknown-command getter hints (failure presentation):** `buildToolPresentation` in `extensions/agent-browser/lib/results/presentation.ts` only runs this path when upstream error text (after model-facing redaction) matches `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) **and** the failed invocation’s primary command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`. Visible text then includes a grouped-`get` hint line plus per-token guidance (`get text <selector>`, `get html …`, `get attr …`, `get count …`, `get value …`, `get title`, `get url`). Machine `nextActions` with ids `use-get-title` / `use-get-url` are emitted only for `title` / `url`, with `params.args` optionally prefixed by `--session <name>` when the failed call targeted a named session. If the error string already contains `Agent-browser hint:` from selector recovery (stale-ref or unsupported selector dialect appendages), the getter block is skipped so two stacked `Agent-browser hint:` headers are not emitted.
|
|
377
385
|
|
|
386
|
+
For `network requests`, `details.nextActions` is bounded to one selected safe request ID, preferring actionable failed rows, then API/fetch-like rows, then benign failed rows, then the first request with a safe ID. Detail/filter/HAR actions use `params.args` and preserve a known `--session <name>` prefix when the current presentation has `details.sessionName`; source-candidate actions use `params.networkSourceLookup` with the selected `requestId` plus `session` when known and are only emitted for actionable failed rows that the failed-request analyzer can correlate. URLs and query strings are not copied into action params; path filters are skipped when they look sensitive or too large.
|
|
387
|
+
|
|
378
388
|
For `batch`, each `batchSteps[]` entry can carry its own `nextActions` for that step’s success or failure. Top-level `details.nextActions` on a failed batch duplicates `batchFailure.failedStep.nextActions` so callers can read one aggregate object. On a fully successful batch, top-level `nextActions` may still list artifact follow-ups derived from the combined step artifacts.
|
|
379
389
|
|
|
380
390
|
`pageChangeSummary` is an optional compact summary for mutation-prone and artifact-producing commands. It includes `changeType` (`"navigation"`, `"mutation"`, `"artifact"`, or `"confirmation"`), `command`, a readable `summary`, optional `title`/`url`, optional `artifactCount` or `savedFilePath`, and `nextActionIds` that link the observed change to `nextActions` without repeating full payloads. The wrapper maintains an explicit allowlist of mutation-prone commands in `extensions/agent-browser/lib/results/presentation.ts` (`PAGE_CHANGE_SUMMARY_COMMANDS`): those commands still emit a `mutation`-typed summary when upstream JSON lacks navigation metadata, as long as no stronger signal (artifact, saved path, navigation fields, or pending confirmation) applies. Commands outside that set omit `pageChangeSummary` unless the parsed payload shows navigation, a confirmation prompt, saved files, or artifacts—including read-only inspection commands, which normally have no summary unless one of those signals appears. For `batch`, the top-level summary favors artifact rollups when any step produced artifacts; otherwise it may synthesize a `mutation` summary from steps that carried their own `pageChangeSummary`. Treat mutation summaries as "upstream attempted the action" evidence, not proof the application handled it; agents should verify URL/text/state for important mutations before continuing.
|
|
@@ -395,7 +405,7 @@ Example shape (fields vary by scenario):
|
|
|
395
405
|
]
|
|
396
406
|
```
|
|
397
407
|
|
|
398
|
-
When `semanticAction` produced compiled argv
|
|
408
|
+
When `semanticAction` produced compiled `find` argv and the unified result is `failureCategory: "stale-ref"` with `details.compiledSemanticAction` still present, `nextActions` chains snapshot refresh then the compiled `find` retry; `select` shorthands with stale `@refs` stop at refresh guidance. `reason` / `safety` strings match `buildAgentBrowserNextActions` in `extensions/agent-browser/lib/results/shared.ts` and the append in `extensions/agent-browser/index.ts`:
|
|
399
409
|
|
|
400
410
|
```json
|
|
401
411
|
"nextActions": [
|
|
@@ -429,14 +439,14 @@ When `semanticAction` produced compiled argv but the unified result is `failureC
|
|
|
429
439
|
|
|
430
440
|
Implementation and precedence:
|
|
431
441
|
|
|
432
|
-
- Types, classifiers, and follow-up assembly live in `extensions/agent-browser/lib/results/shared.ts`: `classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails` (the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier), and `buildAgentBrowserNextActions`. Failed upstream `network requests` rows also flow through `classifyNetworkRequestFailure` / `summarizeNetworkFailures` there for QA analysis (`analyzeQaPresetResults` in `extensions/agent-browser/index.ts`) and for actionable-vs-benign lines in `network requests` presentation (`extensions/agent-browser/lib/results/presentation.ts`).
|
|
442
|
+
- Types, classifiers, and generic follow-up assembly live in `extensions/agent-browser/lib/results/shared.ts`: `classifyAgentBrowserSuccessCategory`, `classifyAgentBrowserFailureCategory`, `buildAgentBrowserResultCategoryDetails` (the last prefers an explicit `failureCategory` when the caller already knows the bucket, otherwise it runs the classifier), and `buildAgentBrowserNextActions`. Failed upstream `network requests` rows also flow through `classifyNetworkRequestFailure` / `summarizeNetworkFailures` there for QA analysis (`analyzeQaPresetResults` in `extensions/agent-browser/index.ts`) and for actionable-vs-benign lines plus request-specific nextActions in `network requests` presentation (`extensions/agent-browser/lib/results/presentation.ts`).
|
|
433
443
|
- Artifact verification: `ArtifactVerificationSummary` / `ArtifactVerificationEntry` types live in `shared.ts`. `buildArtifactVerificationSummary`, `getArtifactVerificationEntry`, and `getManifestVerificationEntry` in `presentation.ts` merge each resolved file artifact with manifest rows whose `storageScope` is not `explicit-path` (those rows duplicate file artifacts) and whose `path` is in the current result’s spill path set. Successful presentation merges then run `classifyPresentationSuccessCategory` in `presentation.ts`, which forces `successCategory: "artifact-unverified"` when `artifactVerification.missingCount` or `artifactVerification.unverifiedCount` is greater than zero before delegating to `classifyAgentBrowserSuccessCategory`.
|
|
434
444
|
- Inner success categories (`classifyAgentBrowserSuccessCategory` in `shared.ts`, after verification counts are clear): if `inspection` is true → `"inspection"`; else if any non-pending artifact lacks confirmed on-disk presence (`exists !== true`) → `"artifact-unverified"`; else if there is a `savedFile` or any `artifacts` → `"artifact-saved"`; else → `"completed"`.
|
|
435
445
|
- Failure: the classifier walks a single ordered chain (first match wins): `confirmation-required` → `timeout` → `missing-binary` → `parse-failure` → `aborted` → `tab-drift` → `stale-ref` (including “unknown ref” text and a narrow `@eN` plus “element not found” heuristic) → `selector-unsupported` → `selector-not-found` → `download-not-verified` (download / wait-download style failures) → `validation-error` when a wrapper `validationError` is present → default `upstream-error`.
|
|
436
446
|
- The main tool implementation merges these fields into Pi-facing `details` from `extensions/agent-browser/index.ts` and from `extensions/agent-browser/lib/results/presentation.ts` for presentation-time failures.
|
|
437
447
|
|
|
438
448
|
Additional structured fields can appear when relevant:
|
|
439
|
-
- `compiledSemanticAction` when the call used `semanticAction` and the result includes the unified `details` merge: `{ action, locator, args }` with the same redaction rules as `args` / `effectiveArgs`; omitted for plain `args`/`job` calls and omitted on some early error returns that omit this field (see the `semanticAction` section above)
|
|
449
|
+
- `compiledSemanticAction` when the call used `semanticAction` and the result includes the unified `details` merge: `{ action, locator, args }` for `find` actions or `{ action: "select", selector, values, args }` for `select`, with the same redaction rules as `args` / `effectiveArgs`; omitted for plain `args`/`job` calls and omitted on some early error returns that omit this field (see the `semanticAction` section above)
|
|
440
450
|
- `compiledJob` when the call used `job` or the job-backed `qa` preset: `{ args: ["batch"], stdin, steps: [{ action, args }] }`, with step args redacted the same way as other invocation details
|
|
441
451
|
- `compiledQaPreset` when the call used `qa`: the compiled job fields plus the QA `checks` object
|
|
442
452
|
- `qaPreset` when the call used `qa`: `{ passed, failedChecks, warnings, summary }`. Network rows inside the `network requests` batch step use `summarizeNetworkFailures` / `classifyNetworkRequestFailure` in `shared.ts`: actionable failures appear in `failedChecks` (and fail the tool when the upstream batch still succeeded); benign icon-classified failures appear only in `warnings` and in `summary` as `QA preset passed with warnings: …` when nothing else failed.
|
|
@@ -449,7 +459,7 @@ Additional structured fields can appear when relevant:
|
|
|
449
459
|
- `comboboxFocus` after a successful explicit combobox-targeted `click` / `fill` / `find … click|fill` (for example `semanticAction` with role `combobox`, including when that semantic action resolves through a current visible `@ref` before execution) when a read-only probe sees the active element is combobox-like, `aria-expanded` is explicitly present (`false` or `true`), and no visible `listbox` / `option` / menu option elements are open. Shape: `{ reason: "focused-combobox-without-visible-options", message, activeElement, visibleListboxCount, visibleOptionCount, recommendations }`; `activeElement` includes bounded role/tag/expanded/hasPopup/name metadata with normal text redaction. Visible text appends `Combobox diagnostic: focused combobox did not expose visible options`, and `details.nextActions` gains `inspect-focused-combobox` (`snapshot -i`), `try-open-combobox-with-arrow` (`press ArrowDown`), and `try-open-combobox-with-enter` (`press Enter`), session-prefixed when applicable. The diagnostic is deliberately gated to explicit combobox-targeted calls to avoid extra probes or false positives on ordinary clicks/textboxes.
|
|
450
460
|
- `recordingDependencyWarning` after a successful `record start` or `record restart` when the wrapper cannot find an executable `ffmpeg` on the Pi process `PATH`. Shape: `{ reason: "ffmpeg-missing-for-recording", dependency: "ffmpeg", command, message, recommendations }`. Visible text appends `Recording dependency warning: ffmpeg not found on PATH`. This is a non-blocking preflight warning: upstream may start recording, but `record stop` needs `ffmpeg` to encode the WebM.
|
|
451
461
|
- `selectorTextVisibility` after a **successful** upstream `get text <selector>` (standalone or inside a successful `batch`) when the wrapper’s follow-up probe finds a hazard: more than one DOM match (upstream reads the first `querySelectorAll` hit, which may be the wrong tab/panel), or the first match is hidden while at least one other match is visible (requires multiple DOM nodes so a visible peer exists; a lone hidden match is not flagged). The probe is a read-only `eval --stdin` script (`buildVisibleTextProbeScript` in `extensions/agent-browser/index.ts`) that counts matches, applies a small visibility heuristic (`display`/`visibility`/`opacity` plus non-zero client rects), and may include a redacted `firstVisibleTextPreview`. It is **not** run for page-scoped `@e…` selectors or when the selector string is withheld because `selectorMayExposeSensitiveLiteral` would risk echoing secrets in probe output. `details.selectorTextVisibility` mirrors the primary diagnostic (first sorted entry); when several selectors in one `batch` qualify, `selectorTextVisibilityAll` lists every diagnostic sorted so hidden-first cases precede generic multi-match ambiguity. Appended `details.nextActions` use ids `inspect-visible-text-candidates` and `inspect-visible-text-candidates-2`, … with the probe replayed via `eval --stdin` for each hazardous selector.
|
|
452
|
-
- `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is
|
|
462
|
+
- `evalStdinHint` after a successful `eval --stdin` when caller stdin (trimmed) looks function-shaped to the wrapper’s lightweight detector (`looksLikeFunctionEvalStdin` in `extensions/agent-browser/index.ts`: leading `function` / `async function`, parenthesized arrow `(…) =>`, or a concise `name =>` / `async name =>` form) **and** upstream JSON `data` is an object whose `result` field is a plain empty object (`{}`). Arrays such as `[]` do not qualify. It includes `reason` and `suggestion`; visible output appends `Eval stdin hint` with the same guidance. This is a heuristic for the common mistake of returning a function object instead of invoking it or passing a plain expression, not a JavaScript parser or proof that the page returned no useful data.
|
|
453
463
|
- `timeoutPartialProgress` after `runAgentBrowserProcess` reports `timedOut` (wrapper child-process watchdog) when best-effort recovery finds useful context. `summary` is a short sentence counting how many declared artifact paths exist on disk versus how many were scanned, and whether page context came from live session reads or only from a planned URL (when nothing in the plan declares an artifact path, the fraction may read `0/0` while `currentPage` can still carry session or planned URL context). `steps` lists planned argv from the compiled `job` or `qa` batch plan (`compiledJob` in `extensions/agent-browser/index.ts`, which is only populated for those top-level modes) or, when that object is absent, from the same JSON-array `batch` stdin the tool sends upstream—whether caller-authored or wrapper-generated for `sourceLookup` / `networkSourceLookup` (1-based indices; only JSON-array stdin whose elements are string[] argv arrays is parsed); timeouts on other argv shapes may still emit `currentPage` / summary evidence without `steps`. `currentPage` comes from session-scoped `get url` / `get title` when the session answers, otherwise a fallback URL may be inferred from the last `open` / `navigate` / `pushstate` step in the plan. `artifacts` covers declared output paths on `screenshot`, `pdf`, `download`, and `wait --download` steps (absolute path, existence, optional `sizeBytes`, `stepIndex`). Visible text repeats the same block under `Timeout partial progress`, applying URL and path-segment redaction; the prose `Planned steps` list shows at most six steps, then an omitted-count line when the plan is longer. This is recovery evidence only; missing entries do not prove the upstream step never ran or that no other side effects occurred.
|
|
454
464
|
- `managedSessionOutcome` after a managed-session plan reaches process execution (`buildManagedSessionOutcome` / `formatManagedSessionOutcomeText` in `extensions/agent-browser/index.ts`). Populated when `buildExecutionPlan` injects an extension-managed implicit or fresh `--session` (omitted when the caller already set explicit upstream `--session` or for stateless inspection paths that skip injection). Fields: `status` (`created`, `replaced`, `unchanged`, `closed`, `preserved`, or `abandoned`), `sessionMode`, `attemptedSessionName`, `previousSessionName`, `currentSessionName`, optional `replacedSessionName`, `activeBefore`, `activeAfter`, `succeeded`, and `summary`. Model-visible echo: only when `sessionMode` is `"fresh"` **and** `succeeded` is false, the wrapper appends a line of the form `Managed session outcome: ${summary}` after the primary presentation (including missing-binary failures on a fresh plan, where it follows the missing-binary message and no other diagnostic tail runs). When other trailing diagnostic prose is also emitted in the same result, that line is concatenated **after** semantic-action candidate lines, overlay/selector-visibility tails, and `Timeout partial progress` (see `rawAppendedDiagnosticText` in `extensions/agent-browser/index.ts`). For `"auto"` failures the same struct may appear on `details` without that extra line. When post-upstream analysis (for example **`qa`** preset failure) flips the overall tool result after a successful batch, the implementation only realigns `managedSessionOutcome.succeeded` to the final outcome; `status`/`summary` may still describe the managed-session transition (for example `replaced` while `failureCategory` is `qa-failure`), so read `failureCategory` / `qaPreset` / `batchFailure` alongside this object.
|
|
455
465
|
- `imagePath` / `imagePaths` for Pi inline image attachments from the **`screenshot`** command (including batched screenshot steps). **`diff screenshot`** still records the diff output as an `image`-kind entry in `details.artifacts`, but it does **not** populate `imagePath` / `imagePaths` or attach an inline image: only plain `screenshot` is treated as a trusted live-capture path for automatic inlining (`isTrustedScreenshotOutput` in `extensions/agent-browser/lib/results/presentation.ts`).
|
|
@@ -488,7 +498,7 @@ Worth doing in v1:
|
|
|
488
498
|
- navigation actions like `click`, `back`, `forward`, and `reload` → lightweight post-action title/url summary when available
|
|
489
499
|
- tab lists → compact summary/table
|
|
490
500
|
- stream status → enabled/connected/port summary plus WebSocket URL and frame format when a port is known; if the caller explicitly passed `--json`, visible text is valid JSON instead of a prose summary
|
|
491
|
-
- diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; network request lists include an actionable-vs-benign failed-request summary and mark low-impact browser icon failures separately; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
|
|
501
|
+
- diagnostic/status families (`session`, `session list`, `profiles`, `doctor`, `auth list`/`show`, `cookies`, `storage`, `dialog`, `frame`, `state`, `network requests`, `console`, `errors`, and dashboard start/stop/status outputs) → compact readable summaries with counts and stable fields; network request lists include an actionable-vs-benign failed-request summary and mark low-impact browser icon failures separately; request-detail URLs from `network request` remain diagnostic-only rather than session page targets; large log/request/error outputs use previews plus `fullOutputPath` spill files; sensitive nested auth/header/token fields are not expanded in the model-facing text
|
|
492
502
|
- trace/profiler owner conflicts → when the wrapper has observed one owner active for a session, block conflicting starts/stops with "wrapper believes ..." wording because upstream or external CLI use can desynchronize wrapper-local state
|
|
493
503
|
|
|
494
504
|
## Missing binary behavior
|