pi-agent-browser-native 0.2.12 → 0.2.14

@@ -12,6 +12,15 @@ Provide a local, repo-readable command reference for the native `agent_browser`
12
12
 
13
13
  This project intentionally blocks normal `agent-browser` bash usage in most agent sessions, so the agent still needs an accessible local equivalent of the upstream command surface. This document is the durable reference the agent can read inside the repository without calling the binary directly.
14
14
 
15
+ ## Upstream baseline
16
+
17
+ <!-- agent-browser-capability-baseline:start upstream-baseline -->
18
+ <!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
19
+ This reference is baselined to the locally installed `agent-browser 0.26.0` command/help surface. Upstream `agent-browser` remains the source of truth for command semantics; this file is the local fallback for Pi agent sessions where direct binary help is blocked or discouraged.
20
+
21
+ The lightweight drift check is `npm run verify -- command-reference`. Run it whenever the installed upstream `agent-browser` version changes or this reference is edited.
22
+ <!-- agent-browser-capability-baseline:end upstream-baseline -->
23
+
15
24
  ## Core mental model
16
25
 
17
26
  Tool parameters:
@@ -24,241 +33,464 @@ Tool parameters:
24
33
  }
25
34
  ```
26
35
 
27
- - `args`: exact `agent-browser` CLI tokens after the binary name
28
- - `stdin`: only for commands like `batch` and `eval --stdin`
36
+ - `args`: exact `agent-browser` CLI tokens after the binary name.
37
+ - `stdin`: only for `batch` and `eval --stdin`; other command/stdin combinations are rejected before `agent-browser` is launched.
29
38
  - `sessionMode`:
30
- - `"auto"` reuse the extension-managed session when possible
31
- - `"fresh"` rotate that managed session to a fresh upstream launch so startup-scoped flags like `--profile`, `--session-name`, or `--cdp` apply
39
+ - `"auto"` reuses the extension-managed session when possible.
40
+ - `"fresh"` rotates that managed session to a fresh upstream launch so launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, or `--auto-connect` apply.
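The `stdin` rule in the parameter list above can be pictured as a small pre-launch check. This is a minimal sketch, not the wrapper's actual implementation: `ToolCall` and `stdinAllowed` are hypothetical names, and the check assumes the command appears in `args` as a literal `batch` token, or as `eval` together with `--stdin`.

```typescript
// Hypothetical shape of one agent_browser tool call (names are illustrative).
interface ToolCall {
  args: string[];
  stdin?: string;
  sessionMode?: "auto" | "fresh";
}

// Simplified check: stdin is accepted only for `batch` or `eval --stdin`.
// Assumption: the command appears as a literal token somewhere in args.
function stdinAllowed(call: ToolCall): boolean {
  if (call.stdin === undefined) return true; // nothing to validate
  const isBatch = call.args.includes("batch");
  const isEvalStdin = call.args.includes("eval") && call.args.includes("--stdin");
  return isBatch || isEvalStdin;
}
```

A real check would likely need to skip global flags and their values before locating the command token; the sketch ignores that detail.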
32
41
 
33
42
  ## Recommended workflow
34
43
 
44
+ Keep routine browser work simple: open a page, inspect it with `snapshot -i`, interact with current `@ref` values from that snapshot, then inspect again. Re-run `snapshot -i` after navigation, scrolling, rerendering, or other major DOM changes because refs can become stale.
45
+
35
46
  ### Normal browse flow
36
47
 
37
48
  ```json
38
49
  { "args": ["open", "https://example.com"] }
39
- { "args": ["snapshot", "-i"] }
50
+ { "args": ["snapshot", "-i", "--urls"] }
40
51
  { "args": ["click", "@e2"] }
41
52
  { "args": ["snapshot", "-i"] }
42
53
  ```
43
54
 
55
+ ### Selector strategy
56
+
57
+ Prefer targets in this order:
58
+
59
+ 1. Use a current `@ref` from the latest `snapshot -i` for visible interactive controls.
60
+ 2. After `scroll`, `scrollintoview`, navigation, or any rerender, take a fresh `snapshot -i` before reusing refs.
61
+ 3. When a target is easiest to describe by accessible name or visible text, use `find` locators such as `role`, `text`, `label`, `placeholder`, `alt`, `title`, or `testid` instead of guessing selector syntax.
62
+ 4. Use CSS selectors for scoped extraction or stable app-specific hooks when you know they match the current page.
63
+
64
+ Examples:
65
+
66
+ ```json
67
+ { "args": ["find", "role", "button", "click", "--name", "Close"] }
68
+ { "args": ["find", "text", "Close", "click"] }
69
+ { "args": ["find", "label", "Email", "fill", "user@example.com"] }
70
+ { "args": ["scrollintoview", "@e12"] }
71
+ { "args": ["snapshot", "-i"] }
72
+ ```
73
+
74
+ Do not assume Playwright selector dialects such as `text=Close` or `button:has-text('Close')` are supported wrapper syntax. If you need those forms, verify current upstream `agent-browser` behavior first; otherwise use refs, `find`, or known CSS selectors.
75
+
44
76
  ### Extract page data
45
77
 
46
78
  ```json
47
79
  { "args": ["get", "title"] }
48
80
  { "args": ["get", "url"] }
81
+ { "args": ["get", "text", "main"] }
49
82
  { "args": ["eval", "--stdin"], "stdin": "document.title" }
50
83
  ```
51
84
 
85
+ Prefer `get` and scoped `eval --stdin` for read-only extraction. Return the intended JavaScript value instead of relying on `console.log`.
86
+
52
87
  ### Run a multi-step flow in one browser invocation
53
88
 
54
89
  ```json
55
90
  { "args": ["batch"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"]]" }
56
91
  ```
57
92
 
58
- ### Switch from an already-active implicit session to a fresh profiled launch
93
+ Use `batch --bail` when later steps should stop after the first failed command.
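The `stdin` payload for `batch` is a JSON array of token arrays, which is easy to mis-escape by hand. As a minimal sketch, assuming the caller builds tool calls in TypeScript, `JSON.stringify` can produce the string; `BatchStep` and `batchCall` are hypothetical helper names, not part of the wrapper.

```typescript
// Each step is one agent-browser command expressed as CLI tokens.
type BatchStep = string[];

// Build an agent_browser call for `batch`; JSON.stringify handles the quoting
// that the hand-written examples escape manually.
function batchCall(steps: BatchStep[], bail = false): { args: string[]; stdin: string } {
  const args = bail ? ["batch", "--bail"] : ["batch"];
  return { args, stdin: JSON.stringify(steps) };
}

const call = batchCall(
  [
    ["open", "https://example.com"],
    ["snapshot", "-i"],
  ],
  true,
);
```

Here `call.args` is `["batch", "--bail"]` and `call.stdin` holds the same two-step array shown in the example above, serialized without manual backslashes.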
59
94
 
60
- ```json
61
- {
62
- "args": ["--profile", "Default", "open", "https://mail.google.com"],
63
- "sessionMode": "fresh"
64
- }
65
- ```
66
-
67
- ## High-value commands
68
-
69
- ### Open and navigation
70
-
71
- - `open <url>`
72
- - `goto <url>`
73
- - `navigate <url>`
74
- - `back`
75
- - `forward`
76
- - `reload`
77
-
78
- Examples:
95
+ ### Wait for page readiness or downloads
79
96
 
80
97
  ```json
81
- { "args": ["open", "https://react.dev"] }
82
- { "args": ["reload"] }
98
+ { "args": ["wait", "--load", "networkidle"] }
99
+ { "args": ["wait", "--url", "**/dashboard"] }
100
+ { "args": ["wait", "--download", "/tmp/report.pdf"] }
83
101
  ```
84
102
 
85
- ### Snapshot and page inspection
103
+ Do not use a bare `wait --load`; `--load` needs a state value such as `load`, `domcontentloaded`, or `networkidle`.
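A caller-side guard for this rule might look like the following. It is a sketch only: `hasInvalidWaitLoad` is a hypothetical helper, and the list of states is limited to the three named in this reference.

```typescript
// Load states named in this reference.
const LOAD_STATES = ["load", "domcontentloaded", "networkidle"];

// True when `--load` is present but not followed by a known state value.
function hasInvalidWaitLoad(args: string[]): boolean {
  const index = args.indexOf("--load");
  if (index === -1) return false; // no --load flag at all
  const value = args[index + 1];
  return value === undefined || !LOAD_STATES.includes(value);
}
```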
86
104
 
87
- - `snapshot`
88
- - `snapshot -i` interactive elements only
89
- - `snapshot -c` compact tree
90
- - `snapshot -d <n>` limit depth
91
- - `snapshot -s <selector>` scope to one subtree
92
-
93
- Examples:
105
+ Use `wait --download [path]` after an earlier action has already started a browser download, such as a dashboard export button that responds asynchronously:
94
106
 
95
107
  ```json
96
- { "args": ["snapshot", "-i"] }
97
- { "args": ["snapshot", "-i", "-s", "main"] }
108
+ { "args": ["click", "@export"] }
109
+ { "args": ["wait", "--download", "/tmp/report.csv"] }
98
110
  ```
99
111
 
100
- ### Element interaction
101
-
102
- - `click <selector-or-@ref>`
103
- - `dblclick <selector-or-@ref>`
104
- - `hover <selector-or-@ref>`
105
- - `focus <selector-or-@ref>`
106
- - `type <selector-or-@ref> <text>`
107
- - `fill <selector-or-@ref> <text>`
108
- - `press <key>`
109
- - `check <selector-or-@ref>`
110
- - `uncheck <selector-or-@ref>`
111
- - `select <selector-or-@ref> <value...>`
112
- - `drag <src> <dst>`
113
- - `upload <selector-or-@ref> <files...>`
114
-
115
- Examples:
112
+ For one-call flows, put the click and wait in `batch`; the wait step keeps the saved-file metadata in `details.batchSteps[n].savedFilePath` and `details.batchSteps[n].savedFile`:
116
113
 
117
114
  ```json
118
- { "args": ["click", "@e12"] }
119
- { "args": ["fill", "#email", "user@example.com"] }
120
- { "args": ["press", "Enter"] }
115
+ { "args": ["batch"], "stdin": "[[\"click\",\"@export\"],[\"wait\",\"--download\",\"/tmp/report.csv\"]]" }
121
116
  ```
122
117
 
123
- ### Downloads and saved files
124
-
125
- Use the purpose-built command when a click should save a file.
118
+ A successful wait-based download renders a readable summary such as `Download completed: /tmp/report.csv` and exposes top-level `details.savedFilePath` plus `details.savedFile` for non-batch calls. With the current upstream `agent-browser 0.26.0`, `wait --download <path>` may report the requested path before this environment can verify that the file was persisted there. Treat `details.savedFilePath` as upstream-reported metadata unless `details.artifacts[].exists` is true. Upstream tracking: [vercel-labs/agent-browser#1300](https://github.com/vercel-labs/agent-browser/issues/1300).
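One way to apply that caution is to trust the reported path only when a matching artifact is marked as existing. The sketch below assumes a partial `details` shape: `savedFilePath` and `artifacts[].exists` come from the text above, while the `path` field on each artifact and the helper name `verifiedDownloadPath` are assumptions made for illustration.

```typescript
// Assumed, partial shape of the tool result `details`.
// `path` on an artifact is a guess; check the real result before relying on it.
interface Artifact {
  path?: string;
  exists?: boolean;
}

interface DownloadDetails {
  savedFilePath?: string;
  artifacts?: Artifact[];
}

// Return the saved path only when an artifact confirms the file is on disk.
function verifiedDownloadPath(details: DownloadDetails): string | undefined {
  const reported = details.savedFilePath;
  if (reported === undefined) return undefined;
  const verified = (details.artifacts ?? []).some(
    (artifact) => artifact.exists === true && artifact.path === reported,
  );
  return verified ? reported : undefined;
}
```

Returning `undefined` for unverified paths pushes the caller to re-check the file or retry instead of reading a file that may not be there.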
126
119
 
127
- - `download <selector-or-@ref> <path>`
128
- - `pdf <path>`
129
- - `screenshot [path]`
130
-
131
- Examples:
120
+ ### Download, screenshot, and PDF files
132
121
 
133
122
  ```json
134
123
  { "args": ["download", "@e5", "/tmp/report.pdf"] }
135
- { "args": ["pdf", "/tmp/page.pdf"] }
136
124
  { "args": ["screenshot", "/tmp/page.png"] }
125
+ { "args": ["pdf", "/tmp/page.pdf"] }
137
126
  ```
138
127
 
139
- Rules:
128
+ Prefer `download <selector> <path>` when the target element itself is the downloadable link/control. Use `click` plus `wait --download [path]` when a previous action starts the download indirectly.
140
129
 
141
- - Prefer `download <selector> <path>` over `click` when the goal is a downloaded file on disk.
142
- - Prefer explicit output paths when the calling task needs to read, move, or attach the saved file later.
143
- - Use `--download-path <dir>` on the first launch when many downloads should land in one directory.
130
+ Wrapper result rendering is metadata-first for saved files:
131
+ - screenshots return a saved-path summary, structured `details.artifacts` metadata, and an inline image attachment when safe
132
+ - downloads, PDFs, `wait --download` files, traces, CPU profiles, completed WebM recordings from `record stop`, and path-bearing HAR captures return concise saved-path summaries plus structured `details.artifacts` metadata without inlining large files
133
+ - `record start <path>` reports that recording started and that output will be written on `record stop`; the target file may not exist until recording stops
134
+ - `batch` keeps each step's artifacts in `details.batchSteps[].artifacts` and aggregates them in top-level `details.artifacts` in step order
144
135
 
145
- ### Read page state
136
+ #### Artifact retention and dogfood-heavy QA runs
146
137
 
147
- `get <subcommand>` supports:
138
+ The wrapper keeps a bounded, metadata-only `details.artifactManifest` of recent artifacts so long sessions do not grow unbounded. The default recent window is 100 entries and can be raised for screenshot/video-heavy QA sessions with `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES=<count>`.
148
139
 
149
- - `title`
150
- - `url`
151
- - `text <selector>`
152
- - `html <selector>`
153
- - `value <selector>`
154
- - `attr <selector> <name>`
155
- - `count <selector>`
156
- - `box <selector>`
157
- - `styles <selector>`
158
- - `cdp-url`
140
+ This manifest cap controls what appears in `details.artifactManifest` and in summaries such as `Session artifacts: 42 live, 0 evicted (42/100 recent)`. It does not delete explicit files that upstream saved to paths you chose, such as screenshots, PDFs, downloads, traces, HAR files, or WebM recordings.
159
141
 
160
- Examples:
142
+ Oversized snapshots and oversized generic outputs are different: when a persisted pi session is available, their wrapper-managed spill files are stored under the private session artifact directory and are governed by the byte budget `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB). Raise that byte budget as well for long QA sessions that need many full raw snapshots or large text spills to survive reload/resume.
143
+
144
+ ### Switch from an already-active implicit session to a fresh profiled launch
161
145
 
162
146
  ```json
163
- { "args": ["get", "title"] }
164
- { "args": ["get", "text", "main"] }
165
- { "args": ["get", "attr", "a.primary", "href"] }
147
+ {
148
+ "args": ["--profile", "Default", "open", "https://mail.google.com"],
149
+ "sessionMode": "fresh"
150
+ }
166
151
  ```
167
152
 
168
- ### JavaScript evaluation
169
-
170
- - `eval <js>`
171
- - `eval --stdin` with JavaScript in `stdin`
172
-
173
- Example:
153
+ ### Recover tabs when focus lands somewhere unexpected
174
154
 
175
155
  ```json
176
- { "args": ["eval", "--stdin"], "stdin": "Array.from(document.querySelectorAll('a')).map((a) => a.href)" }
156
+ { "args": ["tab", "list"] }
157
+ { "args": ["tab", "t2"] }
158
+ { "args": ["snapshot", "-i"] }
177
159
  ```
178
160
 
179
- Rules:
161
+ Use `tab list` and `tab <tab-id-or-label>` when a profile restore, pop-up, or click opens or focuses the wrong tab.
180
162
 
181
- - Return the intended value instead of relying on `console.log`.
182
- - Scope DOM queries to the relevant route, component, or element.
183
- - Prefer `snapshot -i` refs first when the task is interaction-heavy.
163
+ ### Recover from guarded-action confirmations
184
164
 
185
- ### Wait
186
-
187
- - `wait <ms>`
188
- - `wait <selector>`
189
- - use explicit variants like `--load <state>`, `--url <matcher>`, `--fn <js>`, or `--text <matcher>` when needed
190
-
191
- Important:
165
+ When a call uses `--confirm-actions` and upstream requires confirmation, the native tool result prints the pending confirmation id and both recovery calls. Use the same `agent_browser` tool; do not switch to bash.
192
166
 
193
- - bare `wait --load` is incomplete; `--load` needs a state value
194
-
195
- ### Tabs
196
-
197
- - `tab list`
198
- - `tab <tab-id-or-label>`
199
- - `tab new`
200
- - `tab close`
167
+ ```json
168
+ { "args": ["--confirm-actions", "click", "click", "@danger"] }
169
+ ```
201
170
 
202
- Examples:
171
+ If the result says `Pending confirmation id: c_8f3a1234`, choose one follow-up:
203
172
 
204
173
  ```json
205
- { "args": ["tab", "list"] }
206
- { "args": ["tab", "t3"] }
174
+ { "args": ["confirm", "c_8f3a1234"] }
175
+ { "args": ["deny", "c_8f3a1234"] }
207
176
  ```
208
177
 
209
- Use this when:
178
+ Confirmation context may be redacted when it contains credentials, tokens, cookies, or auth-bearing URLs. Use the id exactly as printed.
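The recovery step can be sketched as extracting the id from the result text and building one follow-up call. This is illustrative only: `pendingConfirmationId` and `confirmationCall` are hypothetical helpers, and the pattern assumes the id is the run of non-whitespace characters after the `Pending confirmation id:` label.

```typescript
// Extract the id from a result line such as "Pending confirmation id: c_8f3a1234".
// Assumption: the id is the non-whitespace run that follows the label.
function pendingConfirmationId(resultText: string): string | undefined {
  const match = /Pending confirmation id:\s*(\S+)/.exec(resultText);
  return match ? match[1] : undefined;
}

// Build the follow-up agent_browser call, passing the id exactly as printed.
function confirmationCall(id: string, approve: boolean): { args: string[] } {
  return { args: [approve ? "confirm" : "deny", id] };
}
```

If the real output places punctuation directly after the id, the pattern would need tightening; the sketch does not attempt to guess the id format.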
179
+
180
+ ## Full supported surface
181
+
182
+ The tables below intentionally list more than the recommended workflow. Rare commands are included so agents can discover that the installed upstream supports them without direct `agent-browser --help` access.
183
+
184
+ ### Built-in skills
185
+
186
+ Native-tool note: upstream skills are written for the standalone `agent-browser` CLI and may show bash/heredoc examples. In pi, convert those examples to `agent_browser` calls: pass CLI tokens in `args`, and pass heredoc/stdin bodies through the tool `stdin` field for `batch` or `eval --stdin`.
187
+
188
+ | Command | Purpose |
189
+ | --- | --- |
190
+ | `skills list` | List available CLI-bundled skills. |
191
+ | `skills get core` | Print the core usage guide. |
192
+ | `skills get core --full` | Print the full version-matched core command reference and templates. |
193
+ | `skills get <name>` | Load a specialized skill such as `electron` or `slack`. |
194
+ | `skills path [name]` | Print a skill directory path. |
195
+
196
+ ### Core page and element commands
197
+
198
+ | Command | Purpose |
199
+ | --- | --- |
200
+ | `open <url>` | Navigate to a URL. |
201
+ | `click <sel>` | Click an element or `@ref`. |
202
+ | `dblclick <sel>` | Double-click an element. |
203
+ | `type <sel> <text>` | Type into an element. |
204
+ | `fill <sel> <text>` | Clear and fill an element. |
205
+ | `press <key>` | Press a key such as `Enter`, `Tab`, or `Control+a`. |
206
+ | `keyboard type <text>` | Type text with real keystrokes and no selector. |
207
+ | `keyboard inserttext <text>` | Insert text without key events. |
208
+ | `hover <sel>` | Hover an element. |
209
+ | `focus <sel>` | Focus an element. |
210
+ | `check <sel>` | Check a checkbox. |
211
+ | `uncheck <sel>` | Uncheck a checkbox. |
212
+ | `select <sel> <val...>` | Select one or more dropdown options. |
213
+ | `drag <src> <dst>` | Drag and drop. |
214
+ | `upload <sel> <files...>` | Upload one or more files. |
215
+ | `download <sel> <path>` | Download a file by clicking an element. |
216
+ | `scroll <dir> [px]` | Scroll `up`, `down`, `left`, or `right`. |
217
+ | `scrollintoview <sel>` | Scroll an element into view. |
218
+ | `wait <sel|ms>` | Wait for an element or a duration. |
219
+ | `screenshot [path]` | Take a screenshot. |
220
+ | `pdf <path>` | Save the page as a PDF. |
221
+ | `snapshot` | Print an accessibility tree with refs for AI interaction. |
222
+ | `eval <js>` | Run JavaScript. Use `eval --stdin` through this wrapper for larger snippets. |
223
+ | `connect <port|url>` | Connect to a browser through CDP. |
224
+ | `close [--all]` | Close the current browser or all sessions. |
225
+
226
+ ### Navigation
227
+
228
+ | Command | Purpose |
229
+ | --- | --- |
230
+ | `back` | Go back. |
231
+ | `forward` | Go forward. |
232
+ | `reload` | Reload the current page. |
210
233
 
211
- - a restored profile tab steals focus
212
- - an interaction opens a new tab
213
- - the browser lands on the wrong page unexpectedly
234
+ ### Session and inspection commands
214
235
 
215
- ### Batch
236
+ | Command | Purpose |
237
+ | --- | --- |
238
+ | `session` | Show current session name. |
239
+ | `session list` | List active sessions. |
240
+ | `close` | Close the current browser session. |
241
+ | `close --all` | Close every session. |
216
242
 
217
- - `batch`
218
- - `batch --bail`
243
+ <!-- agent-browser-playbook:start inspection -->
244
+ <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
245
+ Native inspection calls use the `agent_browser` tool shape, not shell-like direct-binary commands:
219
246
 
220
- Example:
247
+ - { "args": ["--help"] }
248
+ - { "args": ["--version"] }
221
249
 
222
- ```json
223
- { "args": ["batch", "--bail"], "stdin": "[[\"open\",\"https://example.com\"],[\"snapshot\",\"-i\"],[\"click\",\"@e2\"]]" }
224
- ```
250
+ These calls return plain text and stay stateless: the extension does not inject its implicit session and does not let inspection consume the managed-session slot needed for later profile, session, CDP, state, or auto-connect launches.
251
+ <!-- agent-browser-playbook:end inspection -->
225
252
 
226
- ### Session and inspection commands
253
+ ### Page state, finding, mouse, settings, network, and storage
254
+
255
+ | Family | Surface |
256
+ | --- | --- |
257
+ | `get <what> [selector]` | `text`, `html`, `value`, `attr <name>`, `title`, `url`, `count`, `box`, `styles`, `cdp-url`. |
258
+ | `is <what> <selector>` | Check `visible`, `enabled`, or `checked`. |
259
+ | `find <locator> <value> <action> [text]` | Locator types include `role`, `text`, `label`, `placeholder`, `alt`, `title`, `testid`, `first`, `last`, and `nth`. |
260
+ | `mouse <action> [args]` | `move <x> <y>`, `down [btn]`, `up [btn]`, `wheel <dy> [dx]`. |
261
+ | `set <setting> [value]` | `viewport <w> <h>`, `device <name>`, `geo <lat> <lng>`, `offline [on|off]`, `headers <json>`, `credentials <user> <pass>`, `media [dark|light] [reduced-motion]`. |
262
+ | `network <action>` | `route <url> [--abort|--body <json>]`, `unroute [url]`, `requests [--clear] [--filter <pattern>]`, `request <requestId>`, `har <start|stop> [path]`. |
263
+ | `cookies [get|set|clear]` | Manage cookies. `set` supports `--url`, `--domain`, `--path`, `--httpOnly`, `--secure`, `--sameSite`, and `--expires`. |
264
+ | `storage <local|session>` | Manage web storage. |
227
265
 
228
- - `session`
229
- - `session list`
230
- - `close`
231
- - `close --all`
232
- - `--help`
233
- - `--version`
266
+ ### Tabs
234
267
 
235
- The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.
268
+ Stable tab ids look like `t1`, `t2`, and `t3`. Optional user labels such as `docs` or `app` are interchangeable with ids wherever a tab reference is accepted.
269
+
270
+ | Command | Purpose |
271
+ | --- | --- |
272
+ | `tab` | List open tabs by default. |
273
+ | `tab list` | List open tabs with ids and labels. |
274
+ | `tab new [url]` | Open a new tab. |
275
+ | `tab new --label <name> [url]` | Open a new tab with a user label. |
276
+ | `tab <t<N>|label>` | Switch to a tab by id or label. |
277
+ | `tab close [t<N>|label]` | Close the current tab or a referenced tab. |
278
+
279
+ ### Snapshot
280
+
281
+ | Option | Purpose |
282
+ | --- | --- |
283
+ | `snapshot` | Full accessibility tree with refs. |
284
+ | `snapshot -i` / `snapshot --interactive` | Include only interactive elements. |
285
+ | `snapshot -i --urls` | Include only interactive elements and link hrefs. |
286
+ | `snapshot -u` / `snapshot --urls` | Include href URLs for link elements. |
287
+ | `snapshot -c` / `snapshot --compact` | Remove empty structural elements. |
288
+ | `snapshot -d <n>` / `snapshot --depth <n>` | Limit tree depth. |
289
+ | `snapshot -s <sel>` / `snapshot --selector <sel>` | Scope to a CSS selector. |
236
290
 
237
- ## Important global flags
291
+ ### Wait
238
292
 
239
- - `--profile <name|path>` reuse Chrome profile state
240
- - `--session <name>` explicit upstream session name
241
- - `--session-name <name>` upstream saved auth/session state name
242
- - `--cdp <port-or-url>` connect to an existing browser
243
- - `--headed` show the browser window
244
- - `--download-path <dir>` default download directory
245
- - `--user-agent <ua>` custom user agent
246
- - `--json` injected by the wrapper automatically for normal tool execution
293
+ | Mode | Purpose |
294
+ | --- | --- |
295
+ | `wait <selector>` | Wait for an element to appear. |
296
+ | `wait <ms>` | Wait for a fixed number of milliseconds. |
297
+ | `wait --url <pattern>` | Wait for the URL to match a pattern. |
298
+ | `wait --load <state>` | Wait for load state: `load`, `domcontentloaded`, or `networkidle`. |
299
+ | `wait --fn <expression>` | Wait for a JavaScript expression to become truthy. |
300
+ | `wait --text <text>` | Wait for text to appear on the page. |
301
+ | `wait --download [path]` | Wait for a download started by a previous action and optionally save it to `path`; successful wrapper results include upstream-reported `savedFilePath`/`savedFile`, while `details.artifacts[].exists` is the wrapper's on-disk verification signal. |
302
+ | `wait --download [path] --timeout <ms>` | Set download-start timeout in milliseconds. |
303
+ | `wait <selector> --state hidden` | Wait for an element to become hidden. |
304
+ | `wait <selector> --state detached` | Wait for an element to detach. |
305
+
306
+ ### Diff, debug, and streaming
307
+
308
+ | Command | Purpose |
309
+ | --- | --- |
310
+ | `diff snapshot` | Compare current versus last snapshot. |
311
+ | `diff screenshot --baseline` | Compare current screenshot versus a baseline image. |
312
+ | `diff url <u1> <u2>` | Compare two pages. |
313
+ | `trace start|stop [path]` | Record a Chrome DevTools trace. |
314
+ | `profiler start|stop [path]` | Record a Chrome DevTools profile. |
315
+ | `record start <path> [url]` | Start WebM video recording; output is written on `record stop`. |
316
+ | `record stop` | Stop and save video. |
317
+ | `console [--clear]` | View or clear console logs. |
318
+ | `errors [--clear]` | View or clear page errors. |
319
+ | `highlight <sel>` | Highlight an element. |
320
+ | `inspect` | Open Chrome DevTools for the active page. |
321
+ | `clipboard <op> [text]` | Read/write clipboard: `read`, `write`, `copy`, `paste`. |
322
+ | `stream enable [--port <n>]` | Start runtime WebSocket streaming for this session. |
323
+ | `stream disable` | Stop runtime WebSocket streaming. |
324
+ | `stream status` | Show streaming status and active port. |
325
+
326
+ When these diagnostic commands are invoked through the native `agent_browser` tool, structured console and page-error outputs render as compact summaries with counts and key fields. Large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context.
327
+
328
+ ### Batch, auth, confirmations, sessions, chat, dashboard, and setup
329
+
330
+ | Command | Purpose |
331
+ | --- | --- |
332
+ | `batch [--bail] ["cmd" ...]` | Execute multiple commands sequentially from args or stdin. |
333
+ | `auth save <name> [opts]` | Save an auth profile with options such as `--url`, `--username`, `--password`, or `--password-stdin`. |
334
+ | `auth login <name>` | Login using saved credentials. |
335
+ | `auth list` | List saved auth profiles. |
336
+ | `auth show <name>` | Show auth profile metadata. |
337
+ | `auth delete <name>` | Delete an auth profile. |
338
+ | `confirm <id>` | Approve a pending action. |
339
+ | `deny <id>` | Deny a pending action. |
340
+ | `session` | Show current session name. |
341
+ | `session list` | List active sessions. |
342
+ | `chat <message>` | Send a natural-language instruction. |
343
+ | `chat` | Start interactive chat when stdin is a TTY. |
344
+ | `dashboard [start]` | Start the dashboard server on the default port `4848`. |
345
+ | `dashboard start --port <n>` | Start the dashboard on a specific port. |
346
+ | `dashboard stop` | Stop the dashboard server. |
347
+ | `install` | Install browser binaries. |
348
+ | `install --with-deps` | Install browser binaries plus Linux system dependencies. |
349
+ | `upgrade` | Upgrade `agent-browser` to the latest version. |
350
+ | `doctor [--fix]` | Diagnose install issues and optionally auto-clean stale files. |
351
+ | `profiles` | List available Chrome profiles. |
352
+
353
+ When these commands are invoked through the native `agent_browser` tool, structured diagnostic/status outputs are rendered as compact summaries. List-like outputs such as sessions, Chrome profiles, auth profiles, network requests, console messages, and page errors include counts and key fields; large outputs are previewed with a `Full output path:` spill file instead of dumping the entire payload into context. For `network requests`, the wrapper shows status, method, URL, resource/mime type, request id, and, when the installed upstream output includes body-like fields, bounded redacted payload, response, and failure/error snippets. `network request <requestId>` can expose upstream full-detail body fields such as response bodies using the same bounded model-facing preview. Header, cookie, auth, token, and other secret-like fields are not expanded in model-facing text; use upstream HAR or full raw details only when complete data is required.
354
+
355
+ ## Important global flags, config, and environment
356
+
357
+ ### Authentication and session flags
358
+
359
+ - `--profile <name|path>`: reuse Chrome profile login state or use a persistent custom profile. Environment: `AGENT_BROWSER_PROFILE`.
360
+ - `--session <name>`: use an isolated session. Environment: `AGENT_BROWSER_SESSION`.
361
+ - `--session-name <name>`: auto-save/restore cookies and local storage by name. Environment: `AGENT_BROWSER_SESSION_NAME`.
362
+ - `--state <path>`: load saved auth state from JSON. Environment: `AGENT_BROWSER_STATE`.
363
+ - `--auto-connect`: connect to a running Chrome to reuse auth state. Environment: `AGENT_BROWSER_AUTO_CONNECT`.
364
+ - `--headers <json>`: apply HTTP headers scoped to the opened URL's origin.
365
+
366
+ ### Browser launch and runtime flags
367
+
368
+ - `--executable-path <path>`: custom browser executable. Environment: `AGENT_BROWSER_EXECUTABLE_PATH`.
369
+ - `--extension <path>`: load browser extensions; repeatable. Environment: `AGENT_BROWSER_EXTENSIONS`.
370
+ - `--args <args>`: browser launch args, comma or newline separated. Environment: `AGENT_BROWSER_ARGS`.
371
+ - `--user-agent <ua>`: custom user agent. Environment: `AGENT_BROWSER_USER_AGENT`.
372
+ - `--proxy <server>`: proxy server URL. Environment: `AGENT_BROWSER_PROXY`, `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`.
373
+ - `--proxy-bypass <hosts>`: proxy bypass hosts. Environment: `AGENT_BROWSER_PROXY_BYPASS`, `NO_PROXY`.
374
+ - `--ignore-https-errors`: ignore HTTPS certificate errors. Environment: `AGENT_BROWSER_IGNORE_HTTPS_ERRORS`.
375
+ - `--allow-file-access`: allow `file://` URLs to access local files. Environment: `AGENT_BROWSER_ALLOW_FILE_ACCESS`.
376
+ - `--headed`: show the browser window. Environment: `AGENT_BROWSER_HEADED`.
377
+ - `--cdp <port>`: connect through Chrome DevTools Protocol.
378
+ - `--color-scheme <scheme>`: `dark`, `light`, or `no-preference`. Environment: `AGENT_BROWSER_COLOR_SCHEME`.
379
+ - `--download-path <path>`: default browser download directory. Environment: `AGENT_BROWSER_DOWNLOAD_PATH`.
380
+ - `--engine <name>`: browser engine, `chrome` by default or `lightpanda`. Environment: `AGENT_BROWSER_ENGINE`.
381
+ - `--no-auto-dialog`: disable automatic dismissal of alert/beforeunload dialogs. Environment: `AGENT_BROWSER_NO_AUTO_DIALOG`.
382
+
383
+ ### Output, provider, policy, and AI flags
384
+
385
+ - `--json`: JSON output. The wrapper injects this automatically for normal tool execution.
386
+ - `--annotate`: annotated screenshot with numbered labels and legend. Environment: `AGENT_BROWSER_ANNOTATE`.
387
+ - `--screenshot-dir <path>`: default screenshot output directory. Environment: `AGENT_BROWSER_SCREENSHOT_DIR`.
388
+ - `--screenshot-quality <n>`: JPEG quality `0-100`. Environment: `AGENT_BROWSER_SCREENSHOT_QUALITY`.
389
+ - `--screenshot-format <fmt>`: `png` or `jpeg`. Environment: `AGENT_BROWSER_SCREENSHOT_FORMAT`.
390
+ - `--content-boundaries`: wrap page output in boundary markers. Environment: `AGENT_BROWSER_CONTENT_BOUNDARIES`.
391
+ - `--max-output <chars>`: truncate page output to N characters. Environment: `AGENT_BROWSER_MAX_OUTPUT`.
392
+ - `--allowed-domains <list>`: restrict navigation domains. Environment: `AGENT_BROWSER_ALLOWED_DOMAINS`.
393
+ - `--action-policy <path>`: action policy JSON file. Environment: `AGENT_BROWSER_ACTION_POLICY`.
394
+ - `--confirm-actions <list>`: action categories requiring confirmation. Environment: `AGENT_BROWSER_CONFIRM_ACTIONS`.
395
+ - `--confirm-interactive`: interactive confirmations; auto-denies when stdin is not a TTY. Environment: `AGENT_BROWSER_CONFIRM_INTERACTIVE`.
396
+ - `-p, --provider <name>`: provider such as `ios`, `browserbase`, `kernel`, `browseruse`, `browserless`, or `agentcore`. Environment: `AGENT_BROWSER_PROVIDER`.
397
+ - `--device <name>`: iOS device name. Environment: `AGENT_BROWSER_IOS_DEVICE`.
398
+ - `--model <name>`: AI model for `chat`. Environment: `AI_GATEWAY_MODEL`.
399
+ - `-v, --verbose`: show tool commands and raw output.
400
+ - `-q, --quiet`: show only AI text responses.
401
+ - `--debug`: debug output. Environment: `AGENT_BROWSER_DEBUG`.
402
+ - `--version`, `-V`: show version.
403
+
404
+ ### Config precedence
405
+
406
+ `agent-browser` resolves configuration from these sources, from lowest to highest priority:
407
+
408
+ 1. `~/.agent-browser/config.json` for user defaults.
409
+ 2. `./agent-browser.json` for project overrides.
410
+ 3. Environment variables, including `AGENT_BROWSER_CONFIG`.
411
+ 4. CLI flags.
412
+
413
+ Use `--config <path>` to load a specific config file. Boolean flags accept optional `true` or `false` values, such as `--headed false`, to override config. Browser extensions from user and project configs are merged rather than replaced.
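The precedence order above amounts to a layered merge in which later layers win. The sketch below is a simplified model, not upstream's implementation: `BrowserConfig`, its keys, and `resolveConfig` are hypothetical, and because the text only says that user and project extensions are merged, the environment and CLI layers are modeled without an `extensions` key.

```typescript
// Hypothetical, simplified config shape; real config keys may differ.
interface BrowserConfig {
  headed?: boolean;
  profile?: string;
  extensions?: string[];
}

// Environment and CLI layers are modeled without `extensions`, because the
// reference does not say how those layers combine extension lists.
type Overrides = Omit<BrowserConfig, "extensions">;

// Merge layers from lowest to highest priority: later spreads win.
function resolveConfig(
  userConfig: BrowserConfig,
  projectConfig: BrowserConfig,
  env: Overrides,
  cliFlags: Overrides,
): BrowserConfig {
  // Extensions from user and project configs are merged rather than replaced.
  const extensions = [...(userConfig.extensions ?? []), ...(projectConfig.extensions ?? [])];
  return { ...userConfig, ...projectConfig, ...env, ...cliFlags, extensions };
}
```

With this model, a user default of `headed: true` is overridden by a CLI value of `headed: false`, which mirrors the `--headed false` override described above.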
414
+
415
+ Other useful environment variables include `AGENT_BROWSER_DEFAULT_TIMEOUT`, `AGENT_BROWSER_STREAM_PORT`, `AGENT_BROWSER_IDLE_TIMEOUT_MS`, `AGENT_BROWSER_ENCRYPTION_KEY`, `AGENT_BROWSER_STATE_EXPIRE_DAYS`, `AGENT_BROWSER_IOS_UDID`, `AI_GATEWAY_URL`, and `AI_GATEWAY_API_KEY`.
247
416
 
248
417
  ## Wrapper-specific behavior worth knowing
249
418
 
250
419
  - The extension may keep following one implicit managed session across later tool calls.
251
- - If startup-scoped flags like `--profile`, `--session-name`, or `--cdp` would be ignored because that implicit session is already active, retry with `sessionMode: "fresh"`.
252
- - After profiled opens, the wrapper best-effort restores the intended target tab when restored tabs steal focus.
253
- - After the wrapper knows the intended tab for a session, later commands best-effort keep that tab active so reconnect drift does not silently move the browser to a restored/background tab.
254
- - Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result.
420
+ - If launch-scoped flags like `--profile`, `--session-name`, `--cdp`, `--state`, or `--auto-connect` would be ignored because that implicit session is already active, retry with `sessionMode: "fresh"`.
421
+ <!-- agent-browser-playbook:start wrapper-tab-recovery -->
422
+ <!-- Generated from extensions/agent-browser/lib/playbook.ts. Run `npm run docs -- playbook write` to update. -->
423
+ - After launch-scoped open/goto/navigate calls that can restore existing tabs (for example --profile, --session-name, or --state), agent_browser best-effort re-selects the tab whose URL matches the returned page when restored tabs steal focus during launch.
424
+ - After a target tab is known for a session, later active-tab commands best-effort pin that tab inside the same upstream invocation when reconnect drift would otherwise move the command to a restored/background tab.
425
+ - After a successful command on a known target tab, agent_browser also best-effort restores that intended tab if a restored/background tab steals focus after the command completes.
426
+ - If a known session target unexpectedly reports about:blank, agent_browser preserves the prior intended target, best-effort re-selects it when it still exists, and reports exact recovery guidance when it cannot be re-selected.
427
+ <!-- agent-browser-playbook:end wrapper-tab-recovery -->
428
+ - Oversized snapshots and oversized generic outputs may be compacted in tool content, with the full raw output written to a spill file path shown directly in the tool result. Recent artifact metadata is bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MANIFEST_MAX_ENTRIES` (default 100); persisted spill files are separately bounded by `PI_AGENT_BROWSER_SESSION_ARTIFACT_MAX_BYTES` (default 32 MiB).
429
+ - The wrapper keeps `--help` and `--version` stateless so they do not consume the implicit managed-session slot.
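The `sessionMode: "fresh"` retry described in the list above could be prepared on the caller side roughly as follows. This is a sketch under assumptions: the `ToolParams` type and both helper functions are hypothetical, only the flag names and the `"fresh"` value come from this reference, and the wrapper itself decides whether a flag would really be ignored.

```typescript
// Launch-scoped flags named in this reference.
const LAUNCH_SCOPED_FLAGS = ["--profile", "--session-name", "--cdp", "--state", "--auto-connect"];

// Hypothetical parameter shape based on the tool parameters described earlier.
interface ToolParams {
  args: string[];
  stdin?: string;
  sessionMode?: string; // only "fresh" is confirmed by this reference
}

// True when any CLI token is exactly one of the launch-scoped flags.
function hasLaunchScopedFlag(args: string[]): boolean {
  return args.some((token) => LAUNCH_SCOPED_FLAGS.indexOf(token) !== -1);
}

// Returns a copy that requests a fresh session when launch-scoped flags are present.
function withFreshSession(params: ToolParams): ToolParams {
  return hasLaunchScopedFlag(params.args) ? { ...params, sessionMode: "fresh" } : params;
}

// The profile path below is a placeholder value.
const retry = withFreshSession({
  args: ["--profile", "./profiles/work", "open", "https://example.com"],
});
console.log(JSON.stringify(retry));
```

Commands without launch-scoped flags are returned unchanged, so they keep following the implicit managed session.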
430
+
431
+ ## Generated capability baseline
432
+
433
+ <!-- agent-browser-capability-baseline:start capability-token-baseline -->
434
+ <!-- Generated from scripts/agent-browser-capability-baseline.mjs. Run `npm run docs -- command-reference write` to update. Do not edit manually. -->
435
+ <details>
436
+ <summary>Generated verifier capability baseline for agent-browser 0.26.0</summary>
437
+
438
+ This generated block is review data for maintainers. The human-authored reference sections above remain the readable command guide.
439
+
440
+ #### Upstream help commands sampled
441
+ - root help: `agent-browser --help`
442
+ - tab help: `agent-browser tab --help`
443
+ - snapshot help: `agent-browser snapshot --help`
444
+ - wait help: `agent-browser wait --help`
445
+
446
+ #### Upstream help tokens expected
447
+ - root help: `skills`
448
+ - root help: `keyboard`
449
+ - root help: `scroll`
450
+ - root help: `scrollintoview`
451
+ - root help: `connect`
452
+ - root help: `is`
453
+ - root help: `find`
454
+ - root help: `mouse`
455
+ - root help: `set`
456
+ - root help: `network`
457
+ - root help: `cookies [get|set|clear]`
458
+ - root help: `storage`
459
+ - root help: `diff snapshot`
460
+ - root help: `trace start|stop [path]`
461
+ - root help: `profiler start|stop [path]`
462
+ - root help: `record start <path> [url]`
463
+ - root help: `console [--clear]`
464
+ - root help: `errors [--clear]`
465
+ - root help: `highlight <sel>`
466
+ - root help: `inspect`
467
+ - root help: `clipboard <op> [text]`
468
+ - root help: `stream enable [--port <n>]`
469
+ - root help: `auth save <name>`
470
+ - root help: `confirm <id>`
471
+ - root help: `deny <id>`
472
+ - root help: `chat <message>`
473
+ - root help: `dashboard start --port <n>`
474
+ - root help: `install --with-deps`
475
+ - root help: `upgrade`
476
+ - root help: `doctor [--fix]`
477
+ - root help: `profiles`
478
+ - snapshot help: `-u, --urls`
479
+ - wait help: `--download [path]`
480
+ - tab help: `new --label <name> [url]`
481
+
482
+ </details>
483
+ <!-- agent-browser-capability-baseline:end capability-token-baseline -->
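A drift check over the expected help tokens listed above can be pictured as a plain substring test against captured help output. The real verifier lives in the project's scripts and may work differently; `missingTokens` and the sample help text are hypothetical and exist only for illustration.

```typescript
// Returns the expected tokens that do not appear anywhere in the help text.
function missingTokens(helpText: string, expected: string[]): string[] {
  return expected.filter((token) => helpText.indexOf(token) === -1);
}

// Made-up excerpt standing in for captured `agent-browser wait --help` output.
const sampleWaitHelp = [
  "Usage: agent-browser wait [options]",
  "  --download [path]   wait for a download",
].join("\n");

console.log(missingTokens(sampleWaitHelp, ["--download [path]"]));
console.log(missingTokens(sampleWaitHelp, ["-u, --urls"]));
```

An empty result means every expected token was found; a non-empty result names the tokens to re-check against upstream before updating the baseline.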
255
484
 
256
485
  ## Maintenance rule
257
486
 
258
487
  Whenever the upstream `agent-browser` binary version changes in this project:
259
488
 
260
- 1. re-check the upstream command/help surface
261
- 2. update this local command reference if anything changed
262
- 3. update tool prompt guidance if the recommended agent workflow changed
263
- 4. update README and release docs if the user-visible behavior changed
264
- 5. validate the extension still exposes local documentation that is at least as usable as the blocked direct-binary path for normal agent work
489
+ 1. run `agent-browser --version`, `agent-browser --help`, `agent-browser tab --help`, `agent-browser snapshot --help`, and `agent-browser wait --help`
490
+ 2. update the canonical metadata in `scripts/agent-browser-capability-baseline.mjs`
491
+ 3. update the human-authored command reference sections if command semantics or recommended workflows changed
492
+ 4. run `npm run docs -- command-reference write` to regenerate capability baseline blocks; do not manually edit generated blocks
493
+ 5. run `npm run verify -- command-reference`
494
+ 6. update tool prompt guidance if the recommended agent workflow changed
495
+ 7. update README and release docs if user-visible behavior changed
496
+ 8. validate the extension still exposes local documentation that is at least as usable as the blocked direct-binary path for normal agent work
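The command-driven steps in this checklist (1, 4, and 5) can be expressed as an ordered list and handed to whatever runner the maintainer prefers. This is only a sketch: `MAINTENANCE_COMMANDS` and `runAll` are hypothetical names, the runner is injected so nothing is executed implicitly, and the manual steps (2, 3, 6, 7, 8) still need human review between the automated ones.

```typescript
// Commands copied from steps 1, 4, and 5 of the maintenance rule, in order.
const MAINTENANCE_COMMANDS: string[][] = [
  ["agent-browser", "--version"],
  ["agent-browser", "--help"],
  ["agent-browser", "tab", "--help"],
  ["agent-browser", "snapshot", "--help"],
  ["agent-browser", "wait", "--help"],
  ["npm", "run", "docs", "--", "command-reference", "write"],
  ["npm", "run", "verify", "--", "command-reference"],
];

// Runs each command through the supplied runner and returns how many were run.
// The runner is expected to throw on failure, which stops the remaining steps.
function runAll(commands: string[][], run: (argv: string[]) => void): number {
  let count = 0;
  for (const argv of commands) {
    run(argv);
    count += 1;
  }
  return count;
}

// Dry run: print the commands instead of executing them.
const executed = runAll(MAINTENANCE_COMMANDS, (argv) => console.log(argv.join(" ")));
console.log(`${executed} commands listed`);
```

Swapping the dry-run callback for a real process runner would execute the same sequence; in practice step 4 should only run after the metadata and human-authored sections from steps 2 and 3 have been updated.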