pi-agent-browser-native 0.2.30 → 0.2.31
- package/CHANGELOG.md +13 -0
- package/README.md +11 -8
- package/docs/ARCHITECTURE.md +3 -3
- package/docs/COMMAND_REFERENCE.md +12 -8
- package/docs/RELEASE.md +11 -11
- package/docs/REQUIREMENTS.md +4 -3
- package/docs/SUPPORT_MATRIX.md +13 -5
- package/docs/TOOL_CONTRACT.md +30 -20
- package/extensions/agent-browser/index.ts +145 -33
- package/extensions/agent-browser/lib/playbook.ts +10 -10
- package/extensions/agent-browser/lib/results/presentation.ts +154 -2
- package/extensions/agent-browser/lib/results/shared.ts +7 -1
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,18 @@
 # Changelog
 
+## 0.2.31 - 2026-05-18
+
+### Added
+- First-class native dropdown selection for `agent_browser`: `semanticAction.action = "select"` and constrained `job` `select` steps now compile to upstream `select <selector> <value...>`, with tests against fake and real upstream fixtures.
+- Bounded machine `details.nextActions` for compact `network requests` output, including exact request-detail, source-lookup, filter, and HAR-capture follow-ups with session preservation and sensitive path/query suppression.
+
+### Changed
+- Release smoke guidance now uses bounded extension-focused prompts with `--no-skills` for Sauce Demo validation, keeping skill-enabled dogfood/report routing as a separate test mode.
+- Network diagnostics preserve app page/ref context so request-detail and `networkSourceLookup` URLs do not replace the active browser target or stale current-page refs.
+
+### Fixed
+- Narrowed the `eval --stdin` empty-result hint so valid empty array results no longer warn like uninvoked function snippets that serialize to `{}`.
+
 ## 0.2.30 - 2026-05-18
 
 ### Added
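As a reading aid for the 0.2.31 entry, here is a minimal TypeScript sketch of how a `semanticAction` with `action: "select"` could map to `select <selector> <value...>` argv. The `SelectAction` interface and `compileSelect` function are hypothetical names for illustration, not the package's code, and preferring `values` over `value` when both are present is an assumption.

```typescript
// Hypothetical input shape; field names follow the JSON examples in this diff.
interface SelectAction {
  action: "select";
  selector: string;
  value?: string;
  values?: string[];
  session?: string;
}

// Sketch: build `select <selector> <value...>` argv, with an optional
// leading `--session <name>` pair as the README text describes.
function compileSelect(a: SelectAction): string[] {
  // Assumption: `values` wins when both `value` and `values` are present.
  const values = a.values ?? (a.value !== undefined ? [a.value] : []);
  if (values.length === 0) {
    throw new Error("select needs `value` or `values`");
  }
  const prefix = a.session ? ["--session", a.session] : [];
  return [...prefix, "select", a.selector, ...values];
}
```

For example, `compileSelect({ action: "select", selector: "#flavor", value: "chocolate" })` yields the three tokens `select`, `#flavor`, `chocolate`.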
package/README.md
CHANGED
@@ -56,7 +56,7 @@ The result is optimized for agent work:
 
 | Pain | Native wrapper capability | Proof surface |
 |---|---|---|
-| Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows
+| Agents build fragile shell commands | Exposes `agent_browser` with exact `args`, an optional `semanticAction` shorthand for common `find` flows and native `select`, constrained `job` / `qa` presets and experimental `sourceLookup` / `networkSourceLookup` that compile short workflows to `batch`, plus controlled `stdin` and `sessionMode` | `extensions/agent-browser/index.ts`, [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md) |
 | Page snapshots are too large | Shows compact, main-content-first summaries, surfaces an `Omitted high-value controls` section (plus `details.data.highValueControlRefIds`) when dense pages hide inputs and tabs from the trimmed ref lists, and stores full raw output in spill files when needed | `extensions/agent-browser/lib/results/snapshot.ts`, `test/agent-browser.presentation.test.ts` |
 | Screenshots/downloads get lost in text | Normalizes artifact paths and reports existence, size, cwd, session, and repair status | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#download-screenshot-and-pdf-files) |
 | Profile restores and tab drift confuse agents | Tracks managed sessions, re-selects target tabs after observed drift, and pins later commands only for sessions with drift/restored-session risk | generated tab-recovery notes below; `test/agent-browser.resume-state.test.ts` |
@@ -66,7 +66,7 @@ The result is optimized for agent work:
 | Agents need stable success/failure buckets | Exposes bounded `resultCategory`, `successCategory`, and `failureCategory` on tool `details` for branching without parsing prose | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/shared.ts`, `test/agent-browser.results.test.ts` |
 | Models re-snapshot after every click without new URL/title context | Adds optional `details.pageChangeSummary` (and per-batch-step summaries) with `changeType`, compact text, optional `title`/`url`, artifact hints, and `nextActionIds` aligned to `nextActions`; no-navigation clicks can also surface evidence-backed `details.overlayBlockers` candidates | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), `extensions/agent-browser/lib/results/presentation.ts`, `test/agent-browser.presentation.test.ts` |
 | Dashboard scroll commands can look successful while nothing moves | Samples viewport and prominent scroll-container positions around top-level `scroll` calls; unchanged positions produce `details.scrollNoop`, visible recovery guidance, and exact `nextActions` for snapshot/screenshot verification | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
-
+| Dropdown/combobox clicks can focus or hit native option box-model errors | Adds first-class `select <selector> <value...>` paths through raw `args`, `semanticAction`, and `job`; for custom combobox clicks, detects focused controls with explicit `aria-expanded` state but no visible options and returns `details.comboboxFocus` plus exact recovery `nextActions` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#core-page-and-element-commands), `test/agent-browser.extension-validation.test.ts` |
 | Recording workflows fail late when `ffmpeg` is missing | After successful `record start` / `record restart`, warns when `ffmpeg` is not on `PATH` so agents can install or fix PATH before `record stop` | [`docs/TOOL_CONTRACT.md`](docs/TOOL_CONTRACT.md#details), [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#diagnostics-performance-and-recording), `test/agent-browser.extension-validation.test.ts` |
 | Direct binary help may be blocked in agent sessions | Publishes a repo-readable command reference and verifies it against the target upstream version | `npm run verify` |
 | Agents need bundled `skills` text without touching the live session | Treats `skills list`, `skills get …`, and `skills path …` as stateless JSON reads: no implicit managed `--session` under default `sessionMode: "auto"` (same session-ownership goal as plain-text `--help` / `--version`), while provider workflows stay thin passthroughs that require upstream setup and credentials | [`docs/COMMAND_REFERENCE.md`](docs/COMMAND_REFERENCE.md#built-in-skills), `extensions/agent-browser/lib/runtime.ts` |
@@ -198,11 +198,12 @@ Download a file from a known link or control:
 
 ### Locator shorthand (`semanticAction`)
 
-For supported upstream `find` flows you can omit hand-built `args` and pass a top-level `semanticAction` object instead. The wrapper compiles
+For supported upstream `find` flows and native dropdown selection you can omit hand-built `args` and pass a top-level `semanticAction` object instead. The wrapper compiles locator actions to the same `find` argv upstream already understands, or compiles `action: "select"` to upstream `select <selector> <value...>`; compiled argv is echoed as `details.compiledSemanticAction` when the unified result includes that field. Full field rules live in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction).
 
 ```json
 { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" } }
 { "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
+{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
 { "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
 ```
 
@@ -211,9 +212,9 @@ Typical pitfalls:
 - Supply **exactly one** of `args`, `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` per call (not more, not none).
 - `semanticAction` and `job` are **not** valid inside `batch` stdin; batch steps stay upstream argv string arrays (spell a `find` step as tokens there if you need it in a batch).
 - Commands or locators outside the supported shorthand still require explicit `args`. Common page getters are grouped under `get`: use `get title`, `get url`, or `get text <selector>` rather than shortcut commands such as `title` or `url`; unknown getter shortcuts can return read-only `details.nextActions` like `use-get-title`.
-- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before `find` and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
+- Use `semanticAction.session` to target a named upstream browser session; the wrapper prepends `--session <name>` before the compiled `find` or `select` argv and keeps that prefix on retry/candidate actions. In active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; `details.effectiveArgs` shows the exact executed argv.
 - Do not reuse `@e…` refs across navigation. The wrapper records the latest snapshot refs per session and fails mutation-prone stale/recycled refs before upstream can silently hit a different current-page element; use the session-aware `refresh-interactive-refs` next action.
-- If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
+- If upstream classifies the failure as `stale-ref` and `details.compiledSemanticAction` is present for a compiled `find` action, `details.nextActions` may list `retry-semantic-action-after-stale-ref` after `refresh-interactive-refs`, carrying the same compiled `find` argv so you can retry the locator-stable target once it is safe to do so. `select` calls that used stale `@refs` only get refresh guidance; use a fresh snapshot or stable selector before retrying (contract in [`docs/TOOL_CONTRACT.md#semanticaction`](docs/TOOL_CONTRACT.md#semanticaction)).
 - If the failure is `selector-not-found`, the wrapper may take one fresh snapshot and add `Current snapshot ref fallback` plus `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed `find` / `semanticAction` target. It still adds `Agent-browser candidate fallbacks` for bounded semanticAction role/name retries (`fill` + `placeholder`, `click` + `text`, or `fill` + `label`); prefer these payloads or a fresh snapshot over guessing new selectors (same contract link).
 - A successful upstream `click` is not proof that the web app handled the event or changed state. When the task depends on a mutation, follow `inspect-after-mutation` / `pageChangeSummary` evidence with a wait, URL/text check, or fresh snapshot before trusting the result; if the target still did not change, retry with a current visible ref or stable selector and report the workflow issue instead of silently continuing. Preserve explicit user stop boundaries: if the user says to stop before order/post/purchase/submit, gather evidence on that page and do not click the final action.
 - If a **top-level** `click` succeeds (unified command `click`, not a `batch` step), upstream reports `data.clicked`, and the tab URL is unchanged under the same normalization as ref preflight (fragment-insensitive), the wrapper may take one extra `snapshot -i` and add `Possible overlay blockers` with `details.overlayBlockers` (`candidates`, `summary`, optional `snapshot` refresh for refs) plus session-aware `inspect-overlay-state` / bounded `try-overlay-blocker-candidate-*` next actions when that snapshot shows strong modal context (`dialog` / `alertdialog`) and close/dismiss-like controls. Page-wide words like privacy, sign in, or banner alone do not trigger this diagnostic. The unchanged-URL check uses `details.navigationSummary`, which is populated with one read-only `eval` summary when the click JSON omits **both** string `data.url` and `data.title`; if upstream already includes either, overlay diagnostics are skipped here. Also skipped when tab correction or about-blank recovery already ran on that result.
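The first pitfall in this hunk (supply exactly one of the six input modes per call) can be illustrated with a short TypeScript sketch. The names `TOOL_MODES`, `ToolMode`, and `pickToolMode` are hypothetical, and the error wording is invented for illustration; the package's real validation may differ.

```typescript
// The six mutually exclusive input modes named in the README text.
const TOOL_MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
] as const;

type ToolMode = (typeof TOOL_MODES)[number];

// Sketch: return the single mode present on a call, or throw when the
// call supplies none or more than one.
function pickToolMode(params: Record<string, unknown>): ToolMode {
  const present = TOOL_MODES.filter((mode) => params[mode] !== undefined);
  const [only] = present;
  if (present.length !== 1 || only === undefined) {
    throw new Error(
      `expected exactly one of ${TOOL_MODES.join(", ")}; got ${present.length}`,
    );
  }
  return only;
}
```

A call such as `pickToolMode({ args: ["snapshot", "-i"] })` returns `"args"`, while an empty object or a call carrying both `args` and `job` throws before anything is launched.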
@@ -221,7 +222,7 @@ Typical pitfalls:
 
 ### Constrained browser jobs
 
-For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover planned steps, current page title/URL, and declared artifact paths that already exist on disk (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
+For short repeatable workflows, pass a top-level `job` instead of hand-writing `batch` stdin. The wrapper only supports constrained steps (`open`, `click`, `fill`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`), compiles them to existing upstream `batch` commands, and echoes the compiled commands as `details.compiledJob` for auditability. The same compile path backs top-level `qa`, so long `qa` runs surface the same timeout evidence shape. If a long `job`, `qa`, or `batch` hits the wrapper watchdog, `details.timeoutPartialProgress` may recover planned steps, current page title/URL, and declared artifact paths that already exist on disk (see [`docs/TOOL_CONTRACT.md#details`](docs/TOOL_CONTRACT.md#details)). There is no separate catalog of reusable named browser recipes above `job`, `qa`, and raw `batch`; see [`docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet`](docs/ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and when to revisit it.
 
 ```json
 {
@@ -235,6 +236,8 @@ For short repeatable workflows, pass a top-level `job` instead of hand-writing `
 }
 ```
 
+On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
+
 Use raw `args`/`stdin` when you need full upstream `batch` power, custom flags, or commands outside the constrained job schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
 
 ### Lightweight QA preset
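To make the new `select` job step concrete, the following TypeScript sketch turns one such step into a `batch` argv row and serializes the rows as stdin. It assumes that batch stdin is a JSON array of argv string arrays, which the README wording suggests but this diff does not show; `BatchRow`, `SelectStep`, `selectStepToRow`, and `toBatchStdin` are hypothetical names, and the URL is a placeholder.

```typescript
// One upstream argv row inside a batch.
type BatchRow = string[];

// Hypothetical step shape, copied from the JSON example in the README hunk.
interface SelectStep {
  action: "select";
  selector: string;
  value: string;
}

// Sketch: a `select` job step becomes `select <selector> <value>`.
function selectStepToRow(step: SelectStep): BatchRow {
  return ["select", step.selector, step.value];
}

// Assumption: batch stdin is a JSON array of argv rows.
function toBatchStdin(rows: BatchRow[]): string {
  return JSON.stringify(rows);
}

// Usage: open a page, select a dropdown value, then read dependent text.
const exampleRows: BatchRow[] = [
  ["open", "https://example.test/form"], // placeholder URL
  selectStepToRow({ action: "select", selector: "#flavor", value: "chocolate" }),
  ["get", "text", "#summary"],
];
const exampleStdin = toBatchStdin(exampleRows);
```

Keeping the step-to-row mapping separate from serialization mirrors the documented split between the compiled plan (`details.compiledJob`) and the generated stdin.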
@@ -263,7 +266,7 @@ For local app debugging, `sourceLookup` can gather candidate component/file loca
 
 This is an experiment, not a guarantee. React hints require a session opened with `--enable react-devtools`, and many builds do not expose useful sourcemap/source metadata; `status: "no-candidates"` is common when nothing matched, and `status: "unsupported"` only when no candidates were found **and** a compiled `react` batch step failed (if DOM or workspace search still produced candidates, you get `candidates-found` instead).
 
-`networkSourceLookup` is the matching failed-request experiment. It runs `network request <id>` when `requestId` is present and/or `network requests --filter …` when `filter` or `url` is present (`url` supplies the filter pattern when `filter` is omitted). It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (`maxWorkspaceFiles` defaults to 2000, cap 5000), surfaces everything under `details.networkSourceLookup`, and avoids automatic blame or edits.
+`networkSourceLookup` is the matching failed-request experiment. It runs `network request <id>` when `requestId` is present and/or `network requests --filter …` when `filter` or `url` is present (`url` supplies the filter pattern when `filter` is omitted); add `session` when the generated batch should target an explicit upstream session. It merges failed-request rows from the batch JSON with initiator-style hints and a bounded workspace literal scan (`maxWorkspaceFiles` defaults to 2000, cap 5000), surfaces everything under `details.networkSourceLookup`, and avoids automatic blame or edits. Compact `network requests` results with safe request IDs also add `details.nextActions` for request details, bounded `networkSourceLookup` on actionable failures, path filtering, or HAR capture so agents can branch without guessing request-id syntax. Network diagnostics are read-only for wrapper page state: request URLs in `network request` or generated `networkSourceLookup` batches do not replace the session’s active page target or invalidate page-scoped refs from the app page.
 
 ```json
 { "networkSourceLookup": { "requestId": "req-1", "url": "/api/fail" } }
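The `networkSourceLookup` rules in this hunk (which rows run, `url` standing in for `filter`, and the `maxWorkspaceFiles` default and cap) can be sketched in TypeScript as follows. `NetworkSourceLookupInput` and `planNetworkSourceLookup` are hypothetical names, the lower bound of 1 is an assumption, and `session` handling is left out because the diff does not show where the flag lands in the generated batch.

```typescript
// Hypothetical input shape based on the fields named in the README text.
interface NetworkSourceLookupInput {
  requestId?: string;
  filter?: string;
  url?: string;
  maxWorkspaceFiles?: number;
}

interface NetworkSourceLookupPlan {
  rows: string[][];
  maxWorkspaceFiles: number;
}

// Sketch: choose the batch rows and clamp the workspace scan bound.
function planNetworkSourceLookup(
  input: NetworkSourceLookupInput,
): NetworkSourceLookupPlan {
  const rows: string[][] = [];
  if (input.requestId) {
    rows.push(["network", "request", input.requestId]);
  }
  // `url` supplies the filter pattern only when `filter` is omitted.
  const pattern = input.filter ?? input.url;
  if (pattern) {
    rows.push(["network", "requests", "--filter", pattern]);
  }
  // Default 2000, cap 5000 (from the README); floor of 1 is an assumption.
  const requested = Math.floor(input.maxWorkspaceFiles ?? 2000);
  const maxWorkspaceFiles = Math.min(Math.max(requested, 1), 5000);
  return { rows, maxWorkspaceFiles };
}
```

With the JSON example from the README (`requestId: "req-1"`, `url: "/api/fail"`), the sketch plans both a `network request req-1` row and a `network requests --filter /api/fail` row and keeps the default scan bound of 2000.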
@@ -442,7 +445,7 @@ pi --no-extensions -e .
 
 This bypasses Pi settings and configured extensions. After editing extension code, restart that Pi process to test the new checkout.
 
-For a concrete expanded native-tool smoke matrix (version/help/skills through dashboard/chat families), see [Local development validation](docs/RELEASE.md#local-development-validation) in `docs/RELEASE.md`. When changes affect dense dashboards, diagnostics, artifacts, recording, scroll, or combobox behavior, use the public [Grafana stress checklist](docs/RELEASE.md#public-grafana-stress-checklist) for repeatable release dogfood without bundling private skills or recipes.
+For a concrete expanded native-tool smoke matrix (version/help/skills through dashboard/chat families), see [Local development validation](docs/RELEASE.md#local-development-validation) in `docs/RELEASE.md`. For bounded release smokes that should validate this extension rather than skill routing, use the [Sauce Demo smoke prompt](docs/RELEASE.md#public-sauce-demo-checkout-smoke-prompt), which adds `--no-skills`. When changes affect dense dashboards, diagnostics, artifacts, recording, scroll, or combobox behavior, use the public [Grafana stress checklist](docs/RELEASE.md#public-grafana-stress-checklist) for repeatable release dogfood without bundling private skills or recipes.
 
 Configured-source lifecycle validation:
 
package/docs/ARCHITECTURE.md
CHANGED
@@ -33,12 +33,12 @@ The extension should:
 - invoke it directly, not through a shell
 - inject `--json`
 - support optional stdin only for `eval --stdin`, `batch`, `auth save --password-stdin`, and wrapper-generated `batch` stdin from top-level `job`, `qa`, `sourceLookup`, or `networkSourceLookup`, rejecting other command/stdin combinations before launch
-- accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call, compile
+- accept an optional native `semanticAction` object as a mutually exclusive alternative to `args` on a single tool call, compile locator actions into upstream `find` argv and native dropdown selection into upstream `select <selector> <value...>` argv (with optional `semanticAction.session` expanding to a leading `--session <name>` before the compiled command when targeting a named upstream browser instead of the managed default), and echo the compiled shape in `details.compiledSemanticAction` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
 - accept an optional native `job` object (mutually exclusive with `args`, `semanticAction`, `qa`, `sourceLookup`, and `networkSourceLookup` on the same call) with a small fixed step vocabulary that compiles only to existing upstream `batch` argv rows, generates the JSON batch stdin string internally, and echoes `details.compiledJob` for observability (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job))
 - accept an optional native `qa` object (mutually exclusive with `args`, `semanticAction`, `job`, `sourceLookup`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path as `job`, runs a fixed diagnostic smoke sequence, and echoes `details.compiledQaPreset` plus structured `details.qaPreset` pass/fail evidence (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#qa))
 - accept an optional native `sourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `networkSourceLookup` on the same call) that compiles to the same `batch` path, gathers evidence-backed local source *candidates* for a selector/fiber/component name, and echoes `details.compiledSourceLookup` plus structured `details.sourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup)); unlike `qa`, it never applies a second pass/fail layer that marks the tool failed when upstream already reported batch success—failed upstream steps still fail the invocation normally, and `details.sourceLookup` may still be present for partial evidence
 - accept an optional native `networkSourceLookup` object (mutually exclusive with `args`, `semanticAction`, `job`, `qa`, and `sourceLookup` on the same call) that compiles to the same `batch` path, correlates failed network requests with initiator metadata and bounded workspace URL literals, and echoes `details.compiledNetworkSourceLookup` plus structured `details.networkSourceLookup` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup)); like `sourceLookup`, it never flips a successful upstream batch to failed solely because no source candidates were found
-- when
+- when a compiled `find` semantic action fails as `stale-ref`, optionally append a `retry-semantic-action-after-stale-ref` entry to `details.nextActions` after the usual `refresh-interactive-refs` snapshot step so agents can re-issue the same compiled `find` argv only when the failure implies the interaction did not run; `select` shorthands with stale `@refs` get refresh guidance only (contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction))
 - when the same compiled path fails as `selector-not-found` for the bounded locator/action pairs documented there, optionally append `try-*-candidate` entries to `details.nextActions` and mirror them in visible text as `Agent-browser candidate fallbacks` so agents can retry role/name `find` variants without hand-rebuilding argv (`select` misses are intentionally excluded)
 
 ### Agent-first UX
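The stale-ref rule changed in this hunk distinguishes compiled `find` actions (refresh, then an optional retry) from `select` shorthands (refresh only). A minimal TypeScript sketch of that branching follows. The action ids come from the text above, while `commandToken` and `staleRefNextActionIds` are hypothetical helpers. The sketch always offers the retry for `find`, whereas the documented behavior is optional and depends on the failure implying the interaction did not run.

```typescript
// Skip an optional leading `--session <name>` pair to find the command token.
function commandToken(argv: readonly string[]): string | undefined {
  return argv[0] === "--session" ? argv[2] : argv[0];
}

// Sketch: which next-action ids to suggest after a `stale-ref` failure.
function staleRefNextActionIds(compiledArgv?: readonly string[]): string[] {
  // Refreshing refs is always the first recovery step.
  const ids = ["refresh-interactive-refs"];
  // Only compiled `find` actions carry a locator that survives a refresh.
  if (compiledArgv !== undefined && commandToken(compiledArgv) === "find") {
    ids.push("retry-semantic-action-after-stale-ref");
  }
  return ids;
}
```

The asymmetry reflects the reasoning in the text: a `find` locator can be re-resolved against the refreshed page, while a `select` that named a stale `@ref` has nothing stable to retry with.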
@@ -162,7 +162,7 @@ This keeps the product centered on native tool usage instead of auxiliary skill
 
 ### `pi-agent-browser-native` owns
 
-- tool registration and schema (including the optional `semanticAction`
+- tool registration and schema (including the optional `semanticAction` compilation path to upstream `find` or `select`)
 - subprocess execution and JSON parsing through a filtered child environment (`buildAgentBrowserProcessEnv` in `extensions/agent-browser/lib/process.ts`): copies an allowlisted inherited-name set plus every parent `AGENT_BROWSER_*` variable and provider-related prefixes (`AGENTCORE_*`, `AI_GATEWAY_*`, `BROWSERBASE_*`, `BROWSERLESS_*`, `BROWSER_USE_*`, `KERNEL_*`, `XDG_*`) instead of cloning the full parent process environment
 - clear missing-binary errors
 - compact result summaries, including presentation-time redaction: stateful browser-context commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) use field-aware value redaction and compact formatters, while other structured upstream JSON (for example `network`, `diff`, `trace` / `profiler` / `record`, `console` / `errors` / `highlight` / `inspect` / `clipboard`, `stream`, `dashboard`, and `chat`) is passed through `redactPresentationData` in `extensions/agent-browser/lib/results/presentation.ts` so model-facing `details.data` and batch roll-ups stay compact and do not echo bearer tokens, proxy passwords, or similar fields verbatim; `redactInvocationArgs` in `extensions/agent-browser/lib/runtime.ts` masks trailing values for sensitive global flags such as `--body`, `--headers`, `--password`, and `--proxy`, preserves positional rules for `cookies set` and `storage local|session set`, and nested `batch` steps use the same argv and error-body scrubbing before echoing commands or errors
package/docs/COMMAND_REFERENCE.md
CHANGED
@@ -34,6 +34,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 
 ```json
 { "semanticAction": { "action": "click", "locator": "text", "value": "Submit" }, "sessionMode": "auto" }
+{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
 ```
 
 ```json
@@ -53,7 +54,7 @@ Tool parameters (use exactly one of `args`, `semanticAction`, `job`, `qa`, `sour
 ```
 
 - `args`: exact `agent-browser` CLI tokens after the binary name. Omit when using `semanticAction`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` instead (mutually exclusive).
-- `semanticAction`: optional shorthand for common `find` flows
+- `semanticAction`: optional shorthand for common `find` flows and native dropdown `select`; compiles to upstream argv and is rejected together with `args`, `job`, `qa`, `sourceLookup`, or `networkSourceLookup` on the same call.
 - `job`: optional constrained short-workflow schema; compiles to existing upstream `batch` args/stdin and reports the compiled plan in `details.compiledJob`.
 - `qa`: optional lightweight QA preset; compiles to the same batch path and reports `details.compiledQaPreset` plus `details.qaPreset` pass/fail evidence.
 - `sourceLookup`: optional experimental helper for local UI-to-source *candidates*; compiles to the same `batch` path, reports `details.compiledSourceLookup` and `details.sourceLookup`, and never reclassifies a fully successful upstream batch as failed the way `qa` can (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sourcelookup) and the longer notes below).
@@ -127,17 +128,18 @@ Examples:
|
|
|
127
128
|
{ "args": ["find", "label", "Email", "fill", "user@example.com"] }
|
|
128
129
|
{ "semanticAction": { "action": "click", "locator": "role", "value": "button", "name": "Close" } }
|
|
129
130
|
{ "semanticAction": { "action": "fill", "locator": "label", "value": "Email", "text": "user@example.com" } }
|
|
131
|
+
{ "semanticAction": { "action": "select", "selector": "#flavor", "value": "chocolate" } }
|
|
130
132
|
{ "semanticAction": { "action": "click", "locator": "text", "value": "Close", "session": "named-browser" } }
|
|
131
133
|
{ "semanticAction": { "action": "uncheck", "locator": "label", "value": "Remember me" } }
|
|
132
134
|
{ "args": ["scrollintoview", "@e12"] }
|
|
133
135
|
{ "args": ["snapshot", "-i"] }
|
|
134
136
|
```
|
|
135
137
|
|
|
136
|
-
The optional native `semanticAction` object is only a thin schema for common locator-based actions; it compiles to existing upstream `find` commands and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` with `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed target. Semantic misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
|
|
138
|
+
The optional native `semanticAction` object is only a thin schema for common locator-based actions and native dropdown selection; it compiles locator actions to existing upstream `find` commands, compiles `action: "select"` to upstream `select <selector> <value...>`, and reports the compiled argv in `details.compiledSemanticAction` (see [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) for the full field rules). It is a top-level alternative to `args`, `job`, `qa`, `sourceLookup`, and `networkSourceLookup`, not a nested shape inside `batch` stdin arrays. Add `session` inside `semanticAction` when the shorthand should target a named upstream browser session; the compiled argv prepends `--session <name>` before `find` or `select`, and fallback candidate actions preserve that prefix. For active sessions, role/name click/check/uncheck shorthands may resolve through the current `snapshot -i` refs before execution so hidden duplicate matches do not steal the action; inspect `details.effectiveArgs` when you need the exact executed argv. `select` shorthand intentionally requires a stable selector or current `@ref` plus `value`/`values`; upstream `find` does not expose a verified `select` action, so role/name/label dropdown resolution stays a snapshot/selector decision instead of hidden wrapper magic. If a raw `find` or semantic action misses with `selector-not-found`, the wrapper may take one fresh snapshot and append `Current snapshot ref fallback` with `try-current-visible-ref*` next actions when that snapshot has exact visible role/name matches for the failed target. Semantic misses may also include `Agent-browser candidate fallbacks`; `details.nextActions` first recommends a fresh `snapshot -i` and may include bounded role/name retries—for example `searchbox`/`textbox` for a missed `placeholder` fill, `button`/`link` for a missed `text` click, or a `textbox` retry for a missed `label` fill—each as a `try-*-candidate` entry carrying redacted `find role …` argv.
|
|
137
139
|
|
|
138
140
|
Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
|
|
139
141
|
|
|
140
|
-
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4` or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
|
|
142
|
+
Treat `@e…` refs as page-scoped. After a successful `snapshot`, the wrapper records the latest refs and page target for that session; mutation-prone ref commands such as `click @e4`, `select @e5 chocolate`, or batch steps with old refs fail with `failureCategory: "stale-ref"` when the page target changed or the ref is absent from the latest same-page snapshot. Inside `batch` stdin JSON, the wrapper also walks steps in order before spawn: steps whose first token can navigate or mutate set a latch; a later step whose first token is `snapshot` clears that latch for following rows; guarded steps that still mention `@e…` after an uncleared latch fail with the same `stale-ref` bucket without launching upstream. Same-snapshot form fills are allowed before a click or submit step, so a login-style `fill`, `fill`, `click` batch can run from one snapshot; split dynamic or autosubmit forms with a fresh snapshot if a fill itself rerenders the targets. Follow the `refresh-interactive-refs` next action (it includes `--session <name>` when needed) and prefer stable `find` or `semanticAction` locators when navigation or rerendering is likely. Contract detail: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`refSnapshot`).
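The batch preflight walk described above can be pictured with a small TypeScript sketch. It is a simplified model, not the wrapper's implementation: the `LATCH_SETTING` token set and the `findStaleRefStep` helper are hypothetical, and the real lists of navigating, mutating, and guarded commands are not shown in this document.

```typescript
type BatchStep = string[];

// Hypothetical set of first tokens that can navigate or mutate the page.
// `fill` is deliberately absent so same-snapshot form fills stay allowed.
const LATCH_SETTING = new Set(["open", "click", "press"]);

const REF_PATTERN = /^@e\d+$/;

// Returns the index of the first step that would be rejected as stale-ref,
// or -1 when the whole batch passes this simplified preflight.
function findStaleRefStep(steps: BatchStep[]): number {
  let latched = false;
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i] ?? [];
    const command = step[0] ?? "";
    if (command === "snapshot") {
      latched = false; // a fresh snapshot clears the latch for later rows
      continue;
    }
    if (latched && step.slice(1).some((token) => REF_PATTERN.test(token))) {
      return i; // an @e ref is used after an uncleared navigation/mutation
    }
    if (LATCH_SETTING.has(command)) {
      latched = true;
    }
  }
  return -1;
}

const loginBatch: BatchStep[] = [
  ["fill", "@e1", "standard_user"],
  ["fill", "@e2", "secret_sauce"],
  ["click", "@e3"],
];
console.log(findStaleRefStep(loginBatch));
```

Under these assumptions the login-style batch passes, while a second ref-based click after the first one would be flagged unless a `snapshot` step sits between them.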
|
|
141
143
|
|
|
142
144
|
A successful `click` result means upstream reported a target, not that the app definitely handled the event. When the workflow depends on a mutation, use `details.pageChangeSummary`, a wait, URL/text extraction, or a fresh `snapshot -i` before trusting the state; if nothing changed, retry with a current visible ref or stable selector and report the workflow issue. Preserve explicit user stop boundaries: if the user says to stop before a final order, post, purchase, or submit action, gather evidence from that page and do not click the final action. The wrapper avoids site-specific fallback clicks and keeps the verification burden explicit.
|
|
143
145
|
|
|
@@ -172,7 +174,7 @@ On tabbed or hidden-DOM pages, `get text <selector>` reads the upstream-selected
|
|
|
172
174
|
|
|
173
175
|
Use `batch --bail` when later steps should stop after the first failed command.
|
|
174
176
|
|
|
175
|
-
For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`;
|
|
177
|
+
For short constrained flows, use top-level `job` instead of hand-writing `batch` stdin. Supported job steps are `open`, `click`, `fill`, `select`, `wait`, `assertText`, `assertUrl`, `waitForDownload`, and `screenshot`; `select` requires `selector` plus `value` or `values`, and compiles to upstream `select <selector> <value...>`. The wrapper compiles steps to upstream `batch` and records `details.compiledJob.steps[]`. There is still no separate first-class catalog of reusable named browser recipes above `job`, the `qa` preset, and raw `batch`; see [`ARCHITECTURE.md`](ARCHITECTURE.md#no-reusable-recipe-layer-yet) for the closed `RQ-0068` decision and revisit bar.
|
|
176
178
|
|
|
177
179
|
```json
|
|
178
180
|
{
|
|
@@ -186,11 +188,13 @@ For short constrained flows, use top-level `job` instead of hand-writing `batch`
|
|
|
186
188
|
}
|
|
187
189
|
```
|
|
188
190
|
|
|
191
|
+
On app pages that expose a native dropdown, add a `select` step such as `{ "action": "select", "selector": "#flavor", "value": "chocolate" }` before the assertion that depends on it.
|
|
192
|
+
|
|
189
193
|
Use raw `args: ["batch"]` with `stdin` when you need arbitrary upstream commands, flags, or batch failure policies outside the constrained schema. Do not pass `stdin` with `job`, `qa`, `sourceLookup`, or `networkSourceLookup`; those modes generate the batch stdin themselves.
|
|
190
194
|
|
|
191
195
|
For quick smoke/QA checks, use top-level `qa`. It clears enabled network/console/page-error buffers before opening the target URL, waits for page readiness, checks expected text/selector, inspects fresh network requests, console messages, and page errors, and can capture an evidence screenshot. The readiness wait defaults to `loadState: "domcontentloaded"`; set `loadState` to `"load"` or `"networkidle"` only when that stricter state is useful and the site is not expected to keep background requests alive. QA network diagnostics classify failed requests by likely impact and list failed rows first in the network preview: actionable document/script/API-style failures fail the preset, while common low-impact browser icon misses such as `favicon.ico` are surfaced as warnings (`qaPreset.warnings`) so they do not fail an otherwise healthy page.
|
|
192
196
|
|
|
193
|
-
The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/shared.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
|
|
197
|
+
The same classification drives plain `network requests` presentation: when any row counts as failed (HTTP status ≥ 400, `failed: true`, or a string `error`), model-facing text starts with a line like `Network failure summary: 0 actionable, 1 benign low-impact (1 total).`, and each preview line can end with an impact tag such as `[benign: low-impact browser icon asset]` or `[actionable: document, script, API, or non-benign request failure]`. When safe request IDs are present, `details.nextActions` adds bounded read-only follow-ups such as `network request <id>`, `networkSourceLookup` for actionable failed rows, `network requests --filter <path>`, and `network har start`; prefer those payloads over rebuilding request-id commands from prose. Rules live in `classifyNetworkRequestFailure` / `summarizeNetworkFailures` in `extensions/agent-browser/lib/results/shared.ts`; QA aggregation is `analyzeQaPresetResults` in `extensions/agent-browser/index.ts`.
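To make the failed-row rule and the summary line concrete, here is a minimal TypeScript sketch. It assumes a simplified row shape and treats only `favicon.ico` misses as benign; the helper names (`isFailedRow`, `isBenignIconMiss`, `summarizeFailedRows`) are invented for this example and are not the functions in `shared.ts`.

```typescript
// Simplified row shape for illustration; upstream rows carry more fields.
interface NetworkRow {
  url: string;
  status?: number;
  failed?: boolean;
  error?: unknown;
}

// A row counts as failed on HTTP status >= 400, failed: true, or a string error.
function isFailedRow(row: NetworkRow): boolean {
  return (
    (row.status !== undefined && row.status >= 400) ||
    row.failed === true ||
    typeof row.error === "string"
  );
}

// Assumption: this sketch only recognizes favicon.ico as a low-impact miss.
function isBenignIconMiss(row: NetworkRow): boolean {
  const path = row.url.split(/[?#]/)[0] ?? "";
  return path.endsWith("/favicon.ico");
}

// Returns the summary line, or undefined when no row failed.
function summarizeFailedRows(rows: NetworkRow[]): string | undefined {
  const failed = rows.filter(isFailedRow);
  if (failed.length === 0) {
    return undefined;
  }
  const benign = failed.filter(isBenignIconMiss).length;
  const actionable = failed.length - benign;
  return `Network failure summary: ${actionable} actionable, ${benign} benign low-impact (${failed.length} total).`;
}

console.log(
  summarizeFailedRows([
    { url: "https://example.com/", status: 200 },
    { url: "https://example.com/favicon.ico", status: 404 },
  ]),
);
```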
|
|
194
198
|
|
|
195
199
|
```json
|
|
196
200
|
{ "qa": { "url": "https://example.com", "expectedText": "Example Domain", "screenshotPath": ".dogfood/qa-example.png" } }
|
|
@@ -206,7 +210,7 @@ For local app debugging, top-level `sourceLookup` can gather candidate component
|
|
|
206
210
|
{ "sourceLookup": { "selector": "#save", "reactFiberId": "2", "componentName": "SaveButton" } }
|
|
207
211
|
```
|
|
208
212
|
|
|
209
|
-
Top-level `networkSourceLookup` does the same for failed browser requests. When `requestId` is set it adds `network request <requestId>`; when `filter` or `url` is set it also adds `network requests --filter …`, using `url` as the filter pattern when `filter` is omitted. With `requestId` only, the compiled batch is just that request step; failed-request detection still walks the returned batch JSON and treats HTTP status ≥ 400, `failed: true`, or an `error` field as failure. When `filter` or `url` is present, the same heuristics apply but requests are correlated only if their URL matches that substring (either direction). Workspace URL literal search under the Pi session cwd reuses the `sourceLookup` scan rules (`maxWorkspaceFiles` defaults to 2000, hard cap 5000, at most ten `workspace-search` rows, up to eight URL/path needles from the query plus failed request URLs). It reports `details.networkSourceLookup.status` as `failed-requests-found`, `no-failed-requests`, or `no-candidates` and never assigns definitive blame.
|
|
213
|
+
Top-level `networkSourceLookup` does the same for failed browser requests. When `requestId` is set it adds `network request <requestId>`; when `filter` or `url` is set it also adds `network requests --filter …`, using `url` as the filter pattern when `filter` is omitted. Add `session` when the generated batch should target an explicit upstream session. With `requestId` only, the compiled batch is just that request step; failed-request detection still walks the returned batch JSON and treats HTTP status ≥ 400, `failed: true`, or an `error` field as failure. When `filter` or `url` is present, the same heuristics apply but requests are correlated only if their URL matches that substring (either direction). Workspace URL literal search under the Pi session cwd reuses the `sourceLookup` scan rules (`maxWorkspaceFiles` defaults to 2000, hard cap 5000, at most ten `workspace-search` rows, up to eight URL/path needles from the query plus failed request URLs). It reports `details.networkSourceLookup.status` as `failed-requests-found`, `no-failed-requests`, or `no-candidates` and never assigns definitive blame. Request-detail URLs are diagnostic evidence, not active-tab evidence: standalone `network request …` and generated `networkSourceLookup` batches preserve the previous app page target and latest same-page `refSnapshot`.
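The step-selection rules in the paragraph above can be sketched in TypeScript as follows. The `LookupInput` type and `compileLookupSteps` helper are hypothetical, `session` handling is left out because its exact placement in the generated batch is not described here, and returning an empty list when no field is set is an assumption rather than documented behavior.

```typescript
// Hypothetical input shape covering only the fields discussed above.
interface LookupInput {
  requestId?: string;
  filter?: string;
  url?: string;
}

// Builds the batch steps as upstream argv token lists.
function compileLookupSteps(input: LookupInput): string[][] {
  const steps: string[][] = [];
  if (input.requestId !== undefined) {
    steps.push(["network", "request", input.requestId]);
  }
  // `url` doubles as the filter pattern when `filter` is omitted.
  const pattern = input.filter ?? input.url;
  if (pattern !== undefined) {
    steps.push(["network", "requests", "--filter", pattern]);
  }
  return steps;
}

console.log(compileLookupSteps({ requestId: "req-1", url: "/api/fail" }));
```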
|
|
210
214
|
|
|
211
215
|
```json
|
|
212
216
|
{ "networkSourceLookup": { "requestId": "req-1", "url": "/api/fail" } }
|
|
@@ -390,7 +394,7 @@ Session note: `skills list`, `skills get …`, and `skills path …` are **state
|
|
|
390
394
|
|
|
391
395
|
On dashboards and other apps with nested scroll containers, `scroll <dir> [px]` may report a successful wheel action while the viewport appears unchanged because the page-level scroller was not the one containing the content. For top-level `scroll` calls without startup-scoped launch flags, the wrapper samples viewport and prominent scroll-container positions before and after the command; when nothing changes it appends `Scroll diagnostic: no observed scroll movement`, exposes `details.scrollNoop`, and adds exact `details.nextActions` for a fresh `snapshot -i` and screenshot. Use those before repeating page scrolls; when you need a specific panel, prefer `scrollintoview <@ref>` or a scoped interaction with the actual scrollable region.
|
|
392
396
|
|
|
393
|
-
Comboboxes vary by app. A `click` or `semanticAction` role/name click may focus a searchable combobox without opening its option list. For explicit combobox-targeted actions such as `semanticAction` role `combobox`, the wrapper checks whether a combobox-like element is focused, has explicit `aria-expanded` state, and has no visible listbox/options open; this still applies when the semantic action first resolves to a current visible `@ref` before execution. When that happens it appends `Combobox diagnostic: focused combobox did not expose visible options`, exposes `details.comboboxFocus`, and adds exact `details.nextActions` for a fresh `snapshot -i`, `press ArrowDown`, and `press Enter`. Use those instead of assuming click alone expanded the control;
|
|
397
|
+
Comboboxes vary by app. For native `<select>` controls, prefer raw `select <selector> <value...>`, `semanticAction: { action: "select", selector, value|values }`, or a `job` `select` step instead of clicking option refs; native option refs can be non-boxed in CDP and fail before a real selection. A `click` or `semanticAction` role/name click may focus a searchable custom combobox without opening its option list. For explicit combobox-targeted actions such as `semanticAction` role `combobox`, the wrapper checks whether a combobox-like element is focused, has explicit `aria-expanded` state, and has no visible listbox/options open; this still applies when the semantic action first resolves to a current visible `@ref` before execution. When that happens it appends `Combobox diagnostic: focused combobox did not expose visible options`, exposes `details.comboboxFocus`, and adds exact `details.nextActions` for a fresh `snapshot -i`, `press ArrowDown`, and `press Enter`. Use those instead of assuming click alone expanded the control; reserve visible option refs for custom comboboxes after a fresh snapshot shows the intended option.
|
|
394
398
|
|
|
395
399
|
### Navigation
|
|
396
400
|
|
|
@@ -543,7 +547,7 @@ Long-running or lifecycle commands should be explicitly paired with cleanup call
|
|
|
543
547
|
| `doctor [--fix]` | Diagnose install issues and optionally auto-clean stale files. Use `doctor --offline --quick` for a fast local-only check and `doctor --json` for structured output. |
|
|
544
548
|
| `profiles` | List available Chrome profiles. |
|
|
545
549
|
|
|
546
|
-
When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
|
|
550
|
+
When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows a failed-request summary split into actionable versus benign low-impact rows, then status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. Safe request IDs also produce `details.nextActions` for exact request details, actionable failed-request source lookup candidates, filtered request lists, or starting HAR capture before a repro. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview; its request URL stays diagnostic-only and does not overwrite `details.sessionTabTarget` for later ref guards. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text or `details.data`; command echoes also redact `--body`, `--headers`, `--password`, proxy credentials, auth-bearing URLs, cookie/storage values, and bearer/basic credential text in positional arguments. Use upstream HAR or full raw details only when complete data is required.
|
|
547
551
|
|
|
548
552
|
## Important global flags, config, and environment
|
|
549
553
|
|
package/docs/RELEASE.md
CHANGED
|
@@ -36,7 +36,7 @@ npm run verify -- release
|
|
|
36
36
|
|
|
37
37
|
`prepublishOnly` intentionally does **not** run `npm run verify -- lifecycle`, `npm run verify -- real-upstream`, or `npm run verify -- benchmark`; those are separate `npm run verify` modes in [`scripts/project.mjs`](../scripts/project.mjs). Treat the bullets below as the full pre-publish contract even though only the `release` slice is automated at publish time.
|
|
38
38
|
|
|
39
|
-
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites.
|
|
39
|
+
Every release also requires interactive `tmux`-driven Pi dogfood with the native `agent_browser` tool against real sites. For extension-focused release smokes, use `pi --no-extensions --no-skills -e .` from the checkout before publish so auto-loaded dogfood/QA skills cannot replace the bounded smoke workflow; run separate skill-enabled dogfood only when validating skill routing or report-generation behavior. Drive prompts with `tmux send-keys`, exercise at least one simple static site and one real documentation/product site, include the higher-level `qa` or `job`/`batch` surfaces when they changed, close every opened browser session, remove screenshots/temp artifacts, and record the outcome in the release notes or support-matrix evidence. Automated localhost and fake-upstream gates do not replace this human-readable live-site transcript evidence. For dense-dashboard stress coverage, use the [public Grafana stress checklist](#public-grafana-stress-checklist) below; it is a maintainer workflow, not a bundled product skill or recipe runtime.
|
|
40
40
|
|
|
41
41
|
The configured-source lifecycle regression harness is required before release because it launches an interactive `pi` process under `tmux` and validates `/reload` plus restart/`/resume` behavior:
|
|
42
42
|
|
|
@@ -73,18 +73,18 @@ Record release evidence as a short note with: date, package/checkout source, tar
|
|
|
73
73
|
|
|
74
74
|
Use this validation prompt after changing click enrichment, tab pinning, ref preflight, form-fill batching, artifact handling, recording, or prompt guidance. It is intentionally more stateful than `example.com` and uses a natural user-style request so the transcript shows what the agent chooses on its own. Do **not** mention `agent_browser`, snapshots, refs, `batch`, `eval`, or upstream command names in the prompt; those are evaluator expectations, not user instructions.
|
|
75
75
|
|
|
76
|
-
Run it in an isolated checkout session. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep
|
|
76
|
+
Run it in an isolated checkout session with skills disabled so the run validates the extension browser workflow instead of external dogfood/QA skill routing. It is fine to restrict active tools at launch so the checkout extension is the only browser surface, but keep those launch details out of the user prompt:
|
|
77
77
|
|
|
78
78
|
```bash
|
|
79
|
-
pi --no-extensions -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
79
|
+
pi --no-extensions --no-skills -e . --model openai-codex/gpt-5.5:minimal --tools agent_browser --session-dir "$SESSION_DIR"
|
|
80
80
|
```
|
|
81
81
|
|
|
82
|
-
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward.
|
|
82
|
+
Repeat with `--model openai-codex/gpt-5.5:medium` when validating instruction-following robustness. Use unique temp paths for each run and delete them afterward. Run separate skill-enabled dogfood sessions only when the thing under test is skill integration, not this bounded release smoke.
|
|
83
83
|
|
|
84
84
|
Copy/paste prompt, replacing the two artifact placeholders with exact absolute paths:
|
|
85
85
|
|
|
86
86
|
```text
|
|
87
|
-
Please
|
|
87
|
+
Please run a bounded release smoke check on the public Sauce Demo store. This is not an exploratory bug hunt or dogfood report.
|
|
88
88
|
|
|
89
89
|
Site: https://www.saucedemo.com/
|
|
90
90
|
Demo credentials: standard_user / secret_sauce
|
|
@@ -99,13 +99,13 @@ Scenario:
|
|
|
99
99
|
- Start checkout with a fake name and postal code.
|
|
100
100
|
- Stop on the checkout overview page; do not place the order.
|
|
101
101
|
|
|
102
|
-
Please gather enough evidence to support the
|
|
102
|
+
Please gather enough evidence to support the smoke result:
|
|
103
103
|
- Save a screenshot here: <ABSOLUTE_SCREENSHOT_PATH>.png
|
|
104
104
|
- Save a short screen recording here if recording is available: <ABSOLUTE_RECORDING_PATH>.webm
|
|
105
105
|
- Include the final page title/URL, the selected sort order, cart contents, item total/tax/total, and any browser-side network, console, or page-error issues you see.
|
|
106
106
|
- Clean up by closing the browser when finished.
|
|
107
107
|
|
|
108
|
-
Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed.
|
|
108
|
+
Return a concise PASS/FAIL report with evidence and any tool or workflow issues you noticed. Do not create a dogfood-output report directory.
|
|
109
109
|
```
|
|
110
110
|
|
|
111
111
|
Evaluator expectations after the queued Sauce Demo fixes: the agent should independently choose efficient, safe browser operations; native add-to-cart clicks should mutate cart state without JavaScript fallback; same-snapshot form fills may be batched safely when the agent chooses that route; the selected sort order should be verified; checkout must stop before Finish and must not place the order; screenshot and recording must use the requested paths or be explicitly reported unavailable; `network requests` may show public-demo telemetry 401s; `console` may report offline-cache logs; `errors` should show no page errors; and the browser session plus temp artifacts should be cleaned up after evidence is recorded. A run that clicks Finish despite the stop instruction or silently substitutes artifact paths is a workflow failure even if the store flow itself works.
|
|
@@ -167,7 +167,7 @@ Before publishing, validate both local-checkout modes without mixing their assum
|
|
|
167
167
|
4. Run a smoke prompt that exercises `agent_browser`.
|
|
168
168
|
5. Restart the `pi` process after extension edits; Pi settings and `/reload` are not the validation target in this isolated mode.
|
|
169
169
|
|
|
170
|
-
For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, top-level `semanticAction` (locator shorthand compiled to upstream `find`, optionally with `semanticAction.session` when you need the same named upstream session as a prior explicit `--session` call), `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
|
|
170
|
+
For expanded-surface validation, the smoke prompt should cover native tool invocation rather than shelling out to `agent-browser`: `--version`, `--help`, `skills list`, `skills get core --full`, `open` with `sessionMode: "fresh"`, `snapshot -i`, `click`, top-level `semanticAction` (locator shorthand compiled to upstream `find` and native dropdown selection compiled to upstream `select`, optionally with `semanticAction.session` when you need the same named upstream session as a prior explicit `--session` call), `eval --stdin`, `batch` via stdin, top-level `job`, `qa`, or experimental `sourceLookup` / `networkSourceLookup` (compiled batch smoke), `screenshot <path>`, explicit `--session … open` plus `--session … close`, `network requests`, `console` / `errors`, `diff snapshot`, `stream status` plus `stream disable`, `dashboard start` plus `dashboard stop`, and `chat <message>` (credential failure is acceptable evidence of wrapper pass-through when `AI_GATEWAY_API_KEY` is intentionally unset). Clean up any opened browser session with `close`, remove temporary files, and kill the tmux session before ending validation.
|
|
171
171
|
|
|
172
172
|
This checklist assumes a real `agent-browser` on `PATH`. It complements, but does not overlap, `npm run verify -- lifecycle`: that harness swaps in a fake upstream binary and focuses on `/reload`, full restart, `/resume`, managed-session continuity, and spill-path persistence (`scripts/verify-lifecycle.mjs`), not the full command matrix above.
|
|
173
173
|
|
|
@@ -188,7 +188,7 @@ Manual validation remains useful for release confidence and installed-package ch
|
|
|
188
188
|
1. Configure exactly one active source for this extension in Pi settings: this checkout path before publishing, or the installed package after publishing.
|
|
189
189
|
2. Launch plain `pi` so extension discovery is active.
|
|
190
190
|
3. Validate managed-session continuity with `/reload` and a full restart + `/resume`.
|
|
191
|
-
4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, including the [`semanticAction`](TOOL_CONTRACT.md#semanticaction) rules when that shorthand or upstream `find` behavior changes) and regenerated prompt fragments from `extensions/agent-browser/lib/playbook.ts` via `npm run docs -- playbook check` or `npm run docs`. When the upstream `agent-browser` version or help surface changed, run `npm run verify -- command-reference`.
|
|
191
|
+
4. Re-check local extension-side docs (`README.md`, `docs/COMMAND_REFERENCE.md`, `docs/TOOL_CONTRACT.md`, including the [`semanticAction`](TOOL_CONTRACT.md#semanticaction) rules when that shorthand or upstream `find` / `select` behavior changes) and regenerated prompt fragments from `extensions/agent-browser/lib/playbook.ts` via `npm run docs -- playbook check` or `npm run docs`. When the upstream `agent-browser` version or help surface changed, run `npm run verify -- command-reference`.
|
|
192
192
|
|
|
193
193
|
### Real upstream contract validation
|
|
194
194
|
|
|
@@ -269,8 +269,8 @@ Before publishing:
|
|
|
269
269
|
- run `npm run verify -- command-reference` if the installed upstream `agent-browser` version or help surface changed
|
|
270
270
|
- run `npm run doctor` and confirm any duplicate-source remediation matches the active package/checkout setup
|
|
271
271
|
- run `npm run verify -- real-upstream` for upstream runtime, result-presentation, or managed-session changes
|
|
272
|
-
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing and configured-source lifecycle validation
|
|
273
|
-
- complete interactive `tmux` live-site
|
|
272
|
+
- confirm both local-checkout modes still work for pre-release validation: isolated `pi --no-extensions -e .` smoke testing for general checkout loading (add `--no-skills` for extension-focused bounded smokes) and configured-source lifecycle validation
|
|
273
|
+
- complete interactive `tmux` live-site extension smoke with `pi --no-extensions --no-skills -e .` and the native `agent_browser` tool (at least one simple static site and one real documentation/product site; include `qa` or `job`/`batch` when those surfaces changed; use the [public Grafana stress checklist](#public-grafana-stress-checklist) when dashboard/diagnostic/artifact behavior changed; close sessions and remove screenshots/temp artifacts; record evidence). Run separate skill-enabled dogfood only when validating skill routing/report-generation behavior—see [Pre-release checks](#pre-release-checks); automated gates are not a substitute
|
|
274
274
|
- rerun `npm run verify -- release`
|
|
275
275
|
- run `npm run verify -- lifecycle` for configured-source `/reload` plus restart/`/resume` regression coverage (required before publish; see [Pre-release checks](#pre-release-checks))
|
|
276
276
|
- confirm [`SUPPORT_MATRIX.md`](SUPPORT_MATRIX.md) still maps every current baseline inventory section to docs, runtime handling, tests, and validation status
|
package/docs/REQUIREMENTS.md
CHANGED
|
@@ -63,7 +63,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
63
63
|
|
|
64
64
|
### Native `agent_browser` inputs
|
|
65
65
|
|
|
66
|
-
- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name)
|
|
66
|
+
- Each tool invocation must supply **exactly one** of: `args` (full upstream argv after the binary name), top-level `semanticAction` (a small intent object compiled into existing upstream `find` argv for locator actions or upstream `select <selector> <value...>` argv for native dropdown selection), `job`, `qa`, `sourceLookup`, or `networkSourceLookup`. Supplying multiple modes or none is rejected before launch (`extensions/agent-browser/index.ts`, `test/agent-browser.extension-validation.test.ts`).
|
|
67
67
|
- `semanticAction` is not a nested shape inside `batch` stdin; batch steps remain upstream argv string arrays, including `find` steps expressed as token lists.
|
|
68
68
|
- Supported actions, locators, exclusivity rules, when `details.compiledSemanticAction` appears, and bounded `try-*-candidate` follow-ups on `selector-not-found` (specific action/locator pairs only; see contract) are specified in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction), with workflow examples in [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md).
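The exactly-one-mode requirement in the bullets above could be checked along these lines. This is a minimal TypeScript sketch under stated assumptions: `resolveMode` is a hypothetical helper, a mode counts as supplied when its key is not `undefined`, and the error text is invented; the real validation lives in `extensions/agent-browser/index.ts`.

```typescript
// The six top-level invocation modes named in the requirement.
const MODES = [
  "args",
  "semanticAction",
  "job",
  "qa",
  "sourceLookup",
  "networkSourceLookup",
] as const;

type Mode = (typeof MODES)[number];

// Returns the single supplied mode, or throws before anything is launched.
function resolveMode(input: Record<string, unknown>): Mode {
  const present = MODES.filter((mode) => input[mode] !== undefined);
  const [only, ...others] = present;
  if (only === undefined || others.length > 0) {
    throw new Error(
      `expected exactly one of ${MODES.join(", ")}; got ${present.length}`,
    );
  }
  return only;
}

console.log(resolveMode({ qa: { url: "https://example.com" } }));
```

Rejecting both the empty case and the multi-mode case in one place keeps the failure ahead of any upstream launch, which is the property the requirement asks for.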
|
|
69
69
|
|
|
@@ -89,6 +89,7 @@ Define the product requirements and constraints for `pi-agent-browser-native`.
|
|
|
89
89
|
- Prefer full `pi` restart over `/reload` when validating extension changes beyond a quick reload smoke check.
|
|
90
90
|
- Use `/resume` when needed after restart.
|
|
91
91
|
- Keep testing broader than a single smoke site like `example.com`.
|
|
92
|
+
- Bounded release smokes that validate this extension should disable auto-loaded skills with `--no-skills`; run skill-enabled dogfood separately only when validating external skill routing or report-generation behavior.
|
|
92
93
|
- Maintain a concrete release/package verification workflow in `docs/RELEASE.md` and matching repository scripts.
|
|
93
94
|
|
|
94
95
|
## Representative use cases
|
|
@@ -108,7 +109,7 @@ The design should comfortably support workflows such as:
|
|
|
108
109
|
- Package-manifest behavior matters more than repo-local development wiring.
|
|
109
110
|
- The extension should use official `pi` hooks and package resources where possible.
|
|
110
111
|
- The wrapper should stay thin, with upstream `agent-browser` remaining the source of truth for command semantics.
|
|
111
|
-
- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/shared.ts`, and presentation-time summaries, redaction, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
|
|
112
|
+
- Successful and failed tool outcomes should surface bounded machine-readable fields on Pi-facing `details` (`resultCategory`, `successCategory`, `failureCategory`, optional structured `nextActions`, optional `pageChangeSummary` with per-step summaries on `batch`, optional `artifactVerification` with the same shape on successful `batchSteps[]` rows) so agents can branch without parsing prose; stateful commands (`auth`, `cookies`, `storage`, `dialog`, `frame`, `state`) plus other structured diagnostics (for example `network`, `diff`, `trace`, `stream`, `dashboard`, `chat`) and `batch` should redact secret-bearing payloads in model-facing `details.data`, including the compact per-step `batch` roll-up on the parent result (full per-step payloads live on `batchSteps[]`). The contract lives in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details), enums and classifier precedence live in `extensions/agent-browser/lib/results/shared.ts`, and presentation-time summaries, redaction, network request follow-ups, and artifact verification rollups are assembled in `extensions/agent-browser/lib/results/presentation.ts` (`buildPageChangeSummary`, `PAGE_CHANGE_SUMMARY_COMMANDS`, `redactPresentationData`, `buildArtifactVerificationSummary`, `buildBatchPresentation`).
|
|
112
113
|
- User-facing docs belong in `README.md` and the canonical published files under `docs/`.
|
|
113
114
|
- Agent workflow and deeper testing procedures can stay in `AGENTS.md`, but published docs must not depend on that file being present.
|
|
114
115
|
- When upstream `agent-browser` changes, refresh the local command reference, prompt guidance, and other extension-side docs so agents still have a repo-readable equivalent of the blocked direct-binary help path.
|
|
@@ -121,7 +122,7 @@ The design should comfortably support workflows such as:
|
|
|
121
122
|
- On local Unix launches, extension-generated session names should not fail just because the upstream default socket path is too long; the wrapper should choose a shorter socket directory when needed.
|
|
122
123
|
- Provider selection flags (`-p`, `--provider`) and provider device flags (`--device`) are launch-scoped like profile, CDP, and persisted state: if an extension-managed implicit session is already active, the planner must fail fast with the same recovery guidance as other startup-scoped flags instead of silently forwarding argv upstream would ignore; contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode) and session model in [`ARCHITECTURE.md`](ARCHITECTURE.md).
|
|
123
124
|
- Read-only upstream `skills list`, `skills get …`, and `skills path …` must stay free of implicit managed `--session` under default `sessionMode: "auto"` (still with `--json`), matching plain-text `--help` / `--version` inspection semantics so bundled skill text does not pin or rotate the active browser session; new `skills` subcommands pick up that behavior only after allowlisting in `extensions/agent-browser/lib/runtime.ts` with regression coverage.
|
|
124
|
-
- Optional `semanticAction.session` on native `agent_browser` must compile to a leading `--session <name>` pair before upstream `find` argv so the
|
|
125
|
+
- Optional `semanticAction.session` on native `agent_browser` must compile to a leading `--session <name>` pair before upstream `find` or `select` argv so the shorthand can target a named upstream browser without hand-built `args`, while `buildExecutionPlan` still skips double-injecting the extension-managed implicit session whenever planned argv already starts with `--session`; stale-ref retries for compiled `find` actions and bounded `try-*` candidate `nextActions` must preserve that same prefix. Contract in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) / [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode); implementation in `extensions/agent-browser/index.ts` and `extensions/agent-browser/lib/runtime.ts`.
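
A minimal sketch of the session-prefix behavior described in the bullet above, assuming two hypothetical helpers (`withSessionPrefix` and `planArgv`). The real logic lives in `buildExecutionPlan` and the semantic-action compiler and may be structured differently.

```typescript
// Hypothetical sketch; helper names are illustrative, not the extension's API.

// Compiler side: an optional session becomes a leading `--session <name>` pair.
function withSessionPrefix(argv: string[], session?: string): string[] {
  return session ? ["--session", session, ...argv] : argv;
}

// Planner side: inject the extension-managed implicit session only when the
// planned argv does not already start with `--session`.
function planArgv(argv: string[], managedSession?: string): string[] {
  if (argv[0] === "--session" || managedSession === undefined) {
    return argv;
  }
  return ["--session", managedSession, ...argv];
}
```

With these helpers, `planArgv(withSessionPrefix(["select", "#size", "large"], "checkout"), "managed")` keeps the explicit `checkout` session and does not add a second `--session` pair.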
|
|
125
126
|
|
|
126
127
|
## Open design questions
|
|
127
128
|
|
package/docs/SUPPORT_MATRIX.md
CHANGED
|
@@ -48,10 +48,10 @@ Re-run the gates below before each release; this table records what the closure
|
|
|
48
48
|
| Baseline section | Baseline items | Documentation | Runtime handling | Test coverage | Validation status |
|
|
49
49
|
| --- | --- | --- | --- | --- | --- |
|
|
50
50
|
| Built-in skills | `skills list`, `skills get core`, `skills get core --full`, `skills get <name>`, `skills get electron`, `skills get slack`, `skills get dogfood`, `skills get vercel-sandbox`, `skills get agentcore`, `skills path [name]` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#built-in-skills), generated baseline block, README proof section, release docs. | `isStatelessInspectionCommand` keeps read-only `skills list` / `skills get` / `skills path` JSON inspection stateless while preserving thin upstream passthrough. | `test/agent-browser.runtime.test.ts`; `test/agent-browser.extension-validation.test.ts` skills/provider matrix; real-upstream inspection/skills group. | Supported. Real upstream covers `skills list`, `skills get core --full`, `skills path core`; fake matrix covers specialized skills. |
|
|
51
|
-
| Core page, element, navigation, and extraction commands | `open <url>`, `click <sel>`, `dblclick <sel>`, `type <sel> <text>`, `fill <sel> <text>`, `press <key>`, `keyboard type <text>`, `keyboard inserttext <text>`, `keydown Shift`, `keyup Shift`, `hover <sel>`, `focus <sel>`, `check <sel>`, `uncheck <sel>`, `select <sel> <val...>`, `drag <src> <dst>`, `upload <sel> <files...>`, `download <sel> <path>`, `scroll <dir> [px]`, `scrollintoview <sel>`, `wait <sel|ms>`, `screenshot [path]`, `screenshot --full`, `screenshot --annotate`, `pdf <path>`, `snapshot`, `eval <js>`, `connect <port|url>`, `close [--all]`, `back`, `forward`, `reload`, `pushstate <url>`, `get <what> [selector]`, `is <what> <selector>`, `find <locator> <value> <action>`, `mouse <action> [args]`, `set <setting> [value]` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#core-page-and-element-commands), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md), README quick start. | Thin upstream passthrough with wrapper-owned `--json`, managed-session planning, stale-ref guidance, artifact verification, page-change summaries, no-op scroll diagnostics, focused-combobox diagnostics, and redaction. | Real-upstream core matrix covers representative interactions/navigation/extraction/artifacts
|
|
51
|
+
| Core page, element, navigation, and extraction commands | `open <url>`, `click <sel>`, `dblclick <sel>`, `type <sel> <text>`, `fill <sel> <text>`, `press <key>`, `keyboard type <text>`, `keyboard inserttext <text>`, `keydown Shift`, `keyup Shift`, `hover <sel>`, `focus <sel>`, `check <sel>`, `uncheck <sel>`, `select <sel> <val...>`, `drag <src> <dst>`, `upload <sel> <files...>`, `download <sel> <path>`, `scroll <dir> [px]`, `scrollintoview <sel>`, `wait <sel|ms>`, `screenshot [path]`, `screenshot --full`, `screenshot --annotate`, `pdf <path>`, `snapshot`, `eval <js>`, `connect <port|url>`, `close [--all]`, `back`, `forward`, `reload`, `pushstate <url>`, `get <what> [selector]`, `is <what> <selector>`, `find <locator> <value> <action>`, `mouse <action> [args]`, `set <setting> [value]` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#core-page-and-element-commands), [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md), README quick start. | Thin upstream passthrough with wrapper-owned `--json`, managed-session planning, stale-ref guidance, artifact verification, page-change summaries, no-op scroll diagnostics, focused-combobox diagnostics, first-class `semanticAction` / `job` compile paths for upstream `select`, and redaction. | Real-upstream core matrix covers representative interactions/navigation/extraction/artifacts plus raw/semantic/job `select`; fake core matrix covers additional passthrough and ordering plus no-op scroll, combobox-focus diagnostics, and select compiler validation. | Supported. Some upstream semantics remain upstream-owned; wrapper contract and artifact metadata are tested. |
|
|
52
52
|
| Sessions, state, tabs, frames, dialogs, and windows | `session`, `session list`, `state save <path>`, `state load <path>`, `tab list`, `tab new --label <name> [url]`, `tab <t<N>|label>`, `frame <selector|main>`, `dialog accept [text]`, `dialog dismiss`, `dialog status`, `window new` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#session-state-frames-dialogs-windows-and-inspection-commands) (session/state/tabs/frames/dialogs/windows), stateful workflow notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Stateful presentation summaries/redaction; state save artifact handling; explicit/implicit session restore; tab target pinning; frame/dialog/window passthrough. | `test/agent-browser.extension-validation.test.ts` stateful matrix; runtime session/resume tests; presentation stateful redaction tests; lifecycle harness for reload/resume. | Supported. External profile/auth state remains operator-owned and documented. |
|
|
53
53
|
| Network, storage, artifacts, diagnostics, and performance | `network <action>`, `network route <url> [--abort|--body <json>] [--resource-type <csv>]`, `network request <requestId>`, `cookies [get|set|clear]`, `cookies set --curl <file>`, `storage <local|session>`, `diff snapshot`, `diff screenshot --baseline`, `diff url <u1> <u2>`, `trace start|stop [path]`, `profiler start|stop [path]`, `record start <path> [url]`, `record restart <path> [url]`, `record stop`, `console [--clear]`, `errors [--clear]`, `highlight <sel>`, `inspect`, `clipboard <op> [text]`, `stream enable [--port <n>]`, `stream disable`, `stream status`, `react tree`, `react inspect <id>`, `react renders start`, `react renders stop [--json]`, `react suspense [--only-dynamic] [--json]`, `vitals [url] [--json]`, `removeinitscript <id>` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#page-state-finding-mouse-settings-network-and-storage) and diagnostic sections; [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details). | Thin passthrough plus command-specific compact diagnostic summaries, artifact metadata for HAR/diff/trace/profile/record, early missing-ffmpeg recording warnings, sensitive-data redaction, timeout bounds, and cleanup-pair guidance. | Fake non-core matrix covers network/diff/trace/profiler/record/console/errors/highlight/inspect/clipboard/stream/dashboard/chat JSON shapes and redaction; real-upstream covers safe network requests/HAR, diff, trace/profiler, console/errors/highlight, stream, vitals, and React missing-renderer. | Supported. Browser-opening or environment-sensitive operations (`inspect`, OS clipboard, full React app inspection) are delegated thinly and documented as needing suitable local/browser state. |
|
|
54
|
-
| Batch, auth, confirmations, setup, dashboard, and AI commands | `batch [--bail]`, `auth save <name>`, `auth save <name> --password-stdin`, `auth login <name>`, `auth list`, `auth show <name>`, `auth delete <name>`, `confirm <id>`, `deny <id>`, `chat <message>`, `dashboard start --port <n>`, `dashboard stop`, `install`, `install --with-deps`, `upgrade`, `doctor [--fix]`, `doctor --offline --quick`, `doctor --json`, `profiles` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-and-setup), README security notes, release docs. | Batch stdin is native-tool-only; top-level `job`, `qa`, and experimental `sourceLookup` / `networkSourceLookup` compile to `batch` with generated stdin (caller `stdin` rejected for those modes); auth/confirmation details are redacted; dashboard/chat/setup/doctor are passed through thinly with timeout/cleanup guidance; package doctor remains separate and read-only. | Unit/fake tests cover batch, auth password stdin, confirmations, dashboard/chat summaries, and doctor diagnostics; extension-validation covers `job`, `qa`, `sourceLookup`, and `networkSourceLookup` compilation plus `details.sourceLookup` / `details.networkSourceLookup` evidence
|
|
54
|
+
| Batch, auth, confirmations, setup, dashboard, and AI commands | `batch [--bail]`, `auth save <name>`, `auth save <name> --password-stdin`, `auth login <name>`, `auth list`, `auth show <name>`, `auth delete <name>`, `confirm <id>`, `deny <id>`, `chat <message>`, `dashboard start --port <n>`, `dashboard stop`, `install`, `install --with-deps`, `upgrade`, `doctor [--fix]`, `doctor --offline --quick`, `doctor --json`, `profiles` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#batch-auth-confirmations-sessions-chat-dashboard-and-setup), README security notes, release docs. | Batch stdin is native-tool-only; top-level `job`, `qa`, and experimental `sourceLookup` / `networkSourceLookup` compile to `batch` with generated stdin (caller `stdin` rejected for those modes); job `select` compiles to upstream `select <selector> <value...>`; auth/confirmation details are redacted; dashboard/chat/setup/doctor are passed through thinly with timeout/cleanup guidance; package doctor remains separate and read-only. | Unit/fake tests cover batch, auth password stdin, confirmations, dashboard/chat summaries, and doctor diagnostics; extension-validation covers `job`, `qa`, `sourceLookup`, and `networkSourceLookup` compilation plus `details.sourceLookup` / `details.networkSourceLookup` evidence, including job `select`; [`scripts/agent-browser-efficiency-benchmark.mjs`](../scripts/agent-browser-efficiency-benchmark.mjs) includes `source-lookup-visible-element` and `network-source-lookup-failed-request` scenarios; quick isolated Pi smoke covered dashboard start/stop and chat credential-failure pass-through. | Supported. `install`, `upgrade`, `doctor --fix`, and interactive auth/chat/setup flows are upstream-owned and should be run only when the operator intends those side effects. |
|
|
55
55
|
| Global flags, config, providers, policy, and environment | `--profile <name|path>`, `AGENT_BROWSER_PROFILE`, `--session <name>`, `AGENT_BROWSER_SESSION`, `--session-name <name>`, `AGENT_BROWSER_SESSION_NAME`, `--state <path>`, `AGENT_BROWSER_STATE`, `--auto-connect`, `AGENT_BROWSER_AUTO_CONNECT`, `--headers <json>`, `--init-script <path>`, `AGENT_BROWSER_INIT_SCRIPTS`, `--enable <feature>`, `AGENT_BROWSER_ENABLE`, `--executable-path <path>`, `AGENT_BROWSER_EXECUTABLE_PATH`, `--extension <path>`, `AGENT_BROWSER_EXTENSIONS`, `--args <args>`, `AGENT_BROWSER_ARGS`, `--user-agent <ua>`, `AGENT_BROWSER_USER_AGENT`, `--proxy <server>`, `AGENT_BROWSER_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `--proxy-bypass <hosts>`, `AGENT_BROWSER_PROXY_BYPASS`, `NO_PROXY`, `--ignore-https-errors`, `AGENT_BROWSER_IGNORE_HTTPS_ERRORS`, `--allow-file-access`, `AGENT_BROWSER_ALLOW_FILE_ACCESS`, `--headed`, `AGENT_BROWSER_HEADED`, `--cdp <port>`, `--color-scheme <scheme>`, `AGENT_BROWSER_COLOR_SCHEME`, `--download-path <path>`, `AGENT_BROWSER_DOWNLOAD_PATH`, `--engine <name>`, `AGENT_BROWSER_ENGINE`, `--no-auto-dialog`, `AGENT_BROWSER_NO_AUTO_DIALOG`, `--json`, `AGENT_BROWSER_JSON`, `--annotate`, `AGENT_BROWSER_ANNOTATE`, `--screenshot-dir <path>`, `AGENT_BROWSER_SCREENSHOT_DIR`, `--screenshot-quality <n>`, `AGENT_BROWSER_SCREENSHOT_QUALITY`, `--screenshot-format <fmt>`, `AGENT_BROWSER_SCREENSHOT_FORMAT`, `--content-boundaries`, `AGENT_BROWSER_CONTENT_BOUNDARIES`, `--max-output <chars>`, `AGENT_BROWSER_MAX_OUTPUT`, `--allowed-domains <list>`, `AGENT_BROWSER_ALLOWED_DOMAINS`, `--action-policy <path>`, `AGENT_BROWSER_ACTION_POLICY`, `--confirm-actions <list>`, `AGENT_BROWSER_CONFIRM_ACTIONS`, `--confirm-interactive`, `AGENT_BROWSER_CONFIRM_INTERACTIVE`, `-p, --provider <name>`, `AGENT_BROWSER_PROVIDER`, `browserbase`, `kernel`, `browseruse`, `browserless`, `agentcore`, `--device <name>`, `AGENT_BROWSER_IOS_DEVICE`, `agent-browser -p ios device list`, `agent-browser -p ios swipe up`, `agent-browser -p ios tap @e1`, `--model <name>`, `AI_GATEWAY_MODEL`, `-v, --verbose`, `-q, --quiet`, `--debug`, `AGENT_BROWSER_DEBUG`, `AGENT_BROWSER_CONFIG`, `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGENT_BROWSER_STREAM_PORT`, `AGENT_BROWSER_IDLE_TIMEOUT_MS`, `AGENT_BROWSER_ENCRYPTION_KEY`, `AGENT_BROWSER_STATE_EXPIRE_DAYS`, `AGENT_BROWSER_IOS_UDID`, `AI_GATEWAY_URL`, `AI_GATEWAY_API_KEY` | [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md#important-global-flags-config-and-environment), README provider/setup notes, [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#sessionmode), architecture/runtime docs. | Runtime handles value flags, launch-scoped flags, redacted invocation echoes, `sessionMode: "fresh"` recovery hints, explicit sessions, and provider/device launch-scoping. Process env forwards a curated allowlist/prefix set for upstream/provider credentials without cloning the whole parent env. | Runtime tests cover launch-scoped flags, provider/device planning, redaction, stateless inspections, and explicit/fresh sessions. Process tests cover provider env prefixes. Fake provider/specialized-skill matrix covers provider argv/env passthrough. Package doctor checks version/source drift. | Supported. Provider clouds, iOS/Appium, Browserbase/Kernel/BrowserUse/Browserless/AgentCore, proxies, profiles, and credentials require external setup; the wrapper documents and forwards them thinly rather than emulating provider behavior. |
|
|
56
56
|
|
|
57
57
|
## Follow-up decision after closure
|
|
@@ -62,11 +62,19 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
|
|
|
62
62
|
|
|
63
63
|
`RQ-0067` shipped as the failed-request correlation experiment in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup): it compiles to upstream `batch` steps (`network request …` and/or `network requests --filter …`), merges `details.networkSourceLookup` after scanning batch JSON for failed requests and optional workspace URL literals, redacts query strings and credentials in model-visible surfaces, and never reclassifies an upstream-successful batch to failed solely because no candidates were found.
|
|
64
64
|
|
|
65
|
+
`RQ-0093` keeps network diagnostics read-only for wrapper page/ref state: standalone `network request …` results and generated `networkSourceLookup` batch rows may contain API/request URLs, but those URLs are not promoted to `details.sessionTabTarget` and do not stale the latest app-page `details.refSnapshot`. The prior session target is preserved until a real page/navigation/snapshot result updates it. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#networksourcelookup) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); fake coverage: `agentBrowserExtension keeps network request diagnostics from replacing the active page target` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
66
|
+
|
|
67
|
+
`RQ-0095` adds bounded machine follow-ups for compact `network requests` output: `extensions/agent-browser/lib/results/presentation.ts` selects at most one safe request ID (actionable failed row first, then API/fetch-like row, benign failed row, or first safe ID) and appends `details.nextActions` for exact `network request <id>`, optional `networkSourceLookup` on actionable failed rows, path filtering with `network requests --filter <path>`, and `network har start` before a repro. Request-detail/filter/HAR argv preserve the current `--session` prefix when known, source lookup nextActions carry `networkSourceLookup.session` when known, and URL queries plus sensitive-looking IDs/paths are omitted from action params. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) network diagnostics note and README source-lookup section; fake coverage: `buildToolPresentation formats redacted network payload, response, and error previews` and `buildToolPresentation returns bounded network request next actions for benign and successful API rows` in [`test/agent-browser.presentation.test.ts`](../test/agent-browser.presentation.test.ts).
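
The selection order in `RQ-0095` (actionable failed row, then API/fetch-like row, then benign failed row, then first safe ID) can be illustrated with a small priority scan. The `RequestRow` shape and its boolean fields are assumptions made for this sketch; the actual classification in `presentation.ts` is not shown in this document.

```typescript
// Hypothetical row shape; the real presentation code derives these facts differently.
interface RequestRow {
  id: string;
  failed: boolean;     // request failed
  actionable: boolean; // failure looks worth investigating
  apiLike: boolean;    // API/fetch-like request
  safeId: boolean;     // ID is safe to echo into follow-up argv
}

// Picks at most one request ID, trying each priority tier in order.
function pickRequestId(rows: RequestRow[]): string | undefined {
  const safeRows = rows.filter((row) => row.safeId);
  const tiers: Array<(row: RequestRow) => boolean> = [
    (row) => row.failed && row.actionable, // actionable failed row first
    (row) => row.apiLike,                  // then API/fetch-like row
    (row) => row.failed,                   // then benign failed row
    () => true,                            // otherwise the first safe ID
  ];
  for (const matchesTier of tiers) {
    const hit = safeRows.find(matchesTier);
    if (hit) {
      return hit.id;
    }
  }
  return undefined;
}
```

Rows whose IDs are not considered safe are never selected in this sketch, which mirrors the requirement that sensitive-looking IDs stay out of action params.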
|
|
68
|
+
|
|
69
|
+
`RQ-0092` adds first-class native select support to the wrapper shorthand surfaces without adding a recipe layer: `semanticAction.action = "select"` requires `selector` plus `value` or `values` and compiles to upstream `select <selector> <value...>`; constrained `job` supports the same `select` step inside generated `batch` stdin. Role/name/label dropdown selection is deliberately not hidden behind `find … select` because upstream `find` has no verified select action; agents should use a stable selector or a current `@ref` for native selects and reserve visible option refs for custom comboboxes after a fresh snapshot. Stale-ref retries remain limited to compiled `find` semantic actions, so `select @e…` failures return refresh guidance rather than blind retry. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction) and [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#job); fake coverage: semanticAction/job select compile and stale-ref assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts); real-upstream coverage: raw, semanticAction, and job select against the localhost native `<select>` fixture in [`test/agent-browser.real-upstream-contract.test.ts`](../test/agent-browser.real-upstream-contract.test.ts).
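
As a rough illustration of the `RQ-0092` compile path, the sketch below turns a `select` semantic action into upstream `select <selector> <value...>` argv. The `SelectAction` interface and `compileSelect` name are hypothetical. Rejecting a call that sets both `value` and `values` is an assumption of this sketch; the text above only states that `selector` plus `value` or `values` is required.

```typescript
// Hypothetical input shape for a `select` semantic action.
interface SelectAction {
  action: "select";
  selector: string; // stable selector or a current @ref
  value?: string;
  values?: string[];
  session?: string;
}

// Compiles to `[--session <name>] select <selector> <value...>`.
function compileSelect(input: SelectAction): string[] {
  if (input.value !== undefined && input.values !== undefined) {
    // Assumption: treat `value` and `values` as mutually exclusive.
    throw new Error("Provide either value or values, not both.");
  }
  const values = input.values ?? (input.value !== undefined ? [input.value] : []);
  if (input.selector.length === 0 || values.length === 0) {
    throw new Error("select requires selector plus value or values.");
  }
  const sessionPrefix = input.session ? ["--session", input.session] : [];
  return [...sessionPrefix, "select", input.selector, ...values];
}
```

For example, `compileSelect({ action: "select", selector: "#size", values: ["m", "l"] })` produces `["select", "#size", "m", "l"]` under these assumptions.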
|
|
70
|
+
|
|
71
|
+
`RQ-0091` keeps advanced release smoke tests focused on extension behavior instead of external skill routing: the Sauce Demo smoke in [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt) now launches with `--no-skills`, restricts tools to `agent_browser`, and uses bounded release-smoke wording rather than dogfood/exploratory QA language. Runtime guidance remains the concise stop-boundary and exact-artifact-path contract from `extensions/agent-browser/lib/playbook.ts`; no site-specific automation or recipe layer was added. Evidence from the failed high/low local-shop runs showed skill/report drift (`dogfood-output` substitution) and reasoning complexity, not a wrapper command defect, so skill-enabled dogfood remains a separate validation mode. Human workflow: [`RELEASE.md`](RELEASE.md#public-sauce-demo-checkout-smoke-prompt), [`AGENTS.md`](../AGENTS.md#preferred-testing-workflow), and [`REQUIREMENTS.md`](REQUIREMENTS.md#testing-guidance).
|
|
72
|
+
|
|
65
73
|
`RQ-0068` closed with a no-adopt decision for reusable browser recipes. Current benchmark and repo-local dogfood evidence do not show repeated named job shapes that justify executable recipe state; examples stay in docs and prompt guidance, while the `qa` preset remains the only stable repeated smoke-test shortcut. Revisit recipes only with concrete repeated workflow evidence and a defined owner/versioning/test plan.
|
|
66
74
|
|
|
67
|
-
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label`. Other locator/action pairs omit this block;
|
|
75
|
+
`RQ-0070` adds bounded locator fallbacks when a compiled top-level `semanticAction` fails with `failureCategory: "selector-not-found"`: `extensions/agent-browser/index.ts` appends `try-*-candidate` entries to `details.nextActions` (and an `Agent-browser candidate fallbacks` block in visible text) only for `fill`+`placeholder`, `click`+`text`, or `fill`+`label`. Other locator/action pairs omit this block; `semanticAction` `select` now uses explicit `selector` plus `value`/`values` and compiles to upstream `select`, not to unverified `find … select`. Active-session role/name click/check/uncheck shorthands also get a pre-execution visible-ref resolution pass via one fresh `snapshot -i`, so hidden duplicate upstream `find` matches do not steal the action; the original target remains in `details.compiledSemanticAction` and the executed ref appears in `details.effectiveArgs`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `agentBrowserExtension returns semantic locator candidates when semanticAction misses` and `agentBrowserExtension resolves semantic role clicks through current visible snapshot refs when available` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
68
76
|
|
|
69
|
-
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
77
|
+
`RQ-0071` makes that shorthand session-aware: optional `semanticAction.session` compiles to `--session <name>` before `find` or `select`, so `buildExecutionPlan` treats the call like any argv that already names an upstream session (no extra implicit `--session`); `details.sessionName` reflects the name on success; stale-ref retries for compiled `find` actions copy compiled argv with that prefix, and `try-*` candidates preserve the same `--session` prefix via `getCompiledSemanticActionSessionPrefix`. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#semanticaction); fake coverage: `semanticAction` session compile/assertions in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
70
78
|
|
|
71
79
|
`RQ-0088` adds current-snapshot ref fallback for selector misses: when raw `find` or compiled `semanticAction` fails with `failureCategory: "selector-not-found"`, `extensions/agent-browser/index.ts` may take one fresh session-scoped `snapshot -i`, look for exact normalized role/name matches for the failed target, emit `details.visibleRefFallback` plus visible `Current snapshot ref fallback`, and append bounded direct-ref next actions (`try-current-visible-ref` / `try-current-visible-ref-N`). The matcher is intentionally narrow: role locators require `--name`; text-click maps only to exact-name `button`/`link` refs; label/placeholder fill maps only to exact-name textbox/searchbox-style refs; prefixes/fuzzy matches are ignored, and duplicate exact matches carry ambiguity safety copy. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`visibleRefFallback`, nextActions); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) selector strategy and README pitfalls; fake coverage: `agentBrowserExtension suggests current snapshot refs when raw find role locators miss` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
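
The narrow matcher in `RQ-0088` can be pictured as an exact comparison on role and normalized name. The `SnapshotRef` shape, the `findExactVisibleRefs` name, and the normalization (trim, collapse whitespace, lowercase) are assumptions for illustration; the document does not specify how names are normalized.

```typescript
// Hypothetical shape of one entry from a fresh `snapshot -i`.
interface SnapshotRef {
  ref: string; // for example "@e3"
  role: string;
  name: string;
}

interface VisibleRefMatch {
  refs: string[];
  ambiguous: boolean; // true when several exact matches exist
}

// Assumed normalization: trim, collapse whitespace runs, lowercase.
const normalizeName = (value: string): string =>
  value.trim().replace(/\s+/g, " ").toLowerCase();

// Exact role plus exact normalized name; prefixes and fuzzy matches are ignored.
function findExactVisibleRefs(snapshot: SnapshotRef[], role: string, name: string): VisibleRefMatch {
  const wanted = normalizeName(name);
  const refs = snapshot
    .filter((entry) => entry.role === role && normalizeName(entry.name) === wanted)
    .map((entry) => entry.ref);
  return { refs, ambiguous: refs.length > 1 };
}
```

A caller would attach ambiguity safety copy whenever `ambiguous` is true, in line with the duplicate-match behavior described above.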
|
|
72
80
|
|
|
@@ -88,7 +96,7 @@ Native `job`, `qa`, experimental `sourceLookup`, and experimental `networkSource
|
|
|
88
96
|
|
|
89
97
|
`RQ-0077` reports managed-session outcomes after managed-session process execution: `extensions/agent-browser/index.ts` builds `details.managedSessionOutcome` (`buildManagedSessionOutcome`), recording `status` values such as `preserved` (previous managed session remains current) or `abandoned` (no managed session became current), plus previous/current/attempted session names, optional `replacedSessionName`, and active-before/after booleans. Visible `Managed session outcome: …` text (`formatManagedSessionOutcomeText`) is appended only when `sessionMode` is `"fresh"` and the outcome’s `succeeded` is false—covering launch failures, missing-binary on a fresh plan, and post-batch failures such as **`qa`** reclassification where `succeeded` is realigned after the fact. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) session-mode notes and README session section; fake coverage: `agentBrowserExtension reports managed-session outcomes after failed fresh launches` and the managed-session slice of `agentBrowserExtension compiles lightweight QA presets and fails diagnostics` in [`test/agent-browser.extension-validation.test.ts`](../test/agent-browser.extension-validation.test.ts).
|
|
90
98
|
|
|
91
|
-
`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries
|
|
99
|
+
`RQ-0078` improves getter/eval discoverability: `extensions/agent-browser/lib/results/presentation.ts` matches upstream failure text containing `unknown command`, `unknown subcommand`, or `unrecognized command` (case-insensitive) when the failed command token is one of `attr`, `count`, `html`, `text`, `title`, `url`, or `value`, then adds grouped-`get` prose; only `title` / `url` also emit read-only `nextActions` (`use-get-title` / `use-get-url`, with `--session` when the failed call named a session). The getter block is skipped when selector recovery already injected an `Agent-browser hint:` line into the same error string. `extensions/agent-browser/index.ts` adds `details.evalStdinHint` plus visible `Eval stdin hint` when `looksLikeFunctionEvalStdin` matches trimmed stdin and upstream JSON carries a plain empty-object `data.result`; empty arrays such as `[]` are valid eval results and are not warned. Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`nextActions`, `evalStdinHint`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) extraction note and README quick start; fake coverage: `buildToolPresentation suggests grouped getter commands for common unknown getter shortcuts` and `agentBrowserExtension warns when eval stdin returns an empty object from a function-shaped snippet`.
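
The getter-shortcut detection in `RQ-0078` could look roughly like the sketch below. The token list, the three failure phrases, the `use-get-title` / `use-get-url` IDs, and the `Agent-browser hint:` skip condition come from the text above; the `GetterHint` and `NextAction` shapes, the `getterHint` name, and the prose wording are assumptions.

```typescript
// Hypothetical result shapes for this sketch.
interface NextAction {
  id: string;
  args: string[];
}
interface GetterHint {
  prose: string;
  nextActions: NextAction[];
}

const GETTER_TOKENS = new Set(["attr", "count", "html", "text", "title", "url", "value"]);
const UNKNOWN_COMMAND = /unknown command|unknown subcommand|unrecognized command/i;

function getterHint(commandToken: string, errorText: string, session?: string): GetterHint | undefined {
  if (!GETTER_TOKENS.has(commandToken) || !UNKNOWN_COMMAND.test(errorText)) {
    return undefined;
  }
  // Skip when selector recovery already injected its own hint line.
  if (errorText.includes("Agent-browser hint:")) {
    return undefined;
  }
  const sessionPrefix = session ? ["--session", session] : [];
  // Only `title` and `url` get read-only follow-up actions.
  const nextActions: NextAction[] =
    commandToken === "title" || commandToken === "url"
      ? [{ id: `use-get-${commandToken}`, args: [...sessionPrefix, "get", commandToken] }]
      : [];
  return { prose: `Use the grouped form: get ${commandToken}`, nextActions };
}
```

In this sketch, a failed `title` call in session `s1` yields one `use-get-title` action with argv `--session s1 get title`, while a failed `text` call yields prose only.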
|
|
92
100
|
|
|
93
101
|
`RQ-0079` clarifies artifact lifecycle and cleanup ownership: `extensions/agent-browser/index.ts` adds `details.artifactCleanup` and visible `Artifact lifecycle` copy on successful `close` when `artifactManifest.entries` is non-empty (`getArtifactCleanupGuidance`), stating that close does not delete explicit artifacts; `explicitArtifactPaths` carries up to ten distinct existing `explicit-path` manifest paths after a filesystem existence check, skipping stale paths already removed by host tools (possibly empty when the recent window has no existing explicit rows). Contract: [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md#details) (`artifactCleanup`); human workflow: [`COMMAND_REFERENCE.md`](COMMAND_REFERENCE.md) artifact retention section and README artifact notes; fake coverage: `agentBrowserExtension reports artifact lifecycle guidance on close`.
|
|
94
102
|
|