agent-browser 0.20.14 → 0.21.1
- package/README.md +33 -1
- package/bin/agent-browser-darwin-arm64 +0 -0
- package/bin/agent-browser-darwin-x64 +0 -0
- package/bin/agent-browser-linux-arm64 +0 -0
- package/bin/agent-browser-linux-musl-arm64 +0 -0
- package/bin/agent-browser-linux-musl-x64 +0 -0
- package/bin/agent-browser-linux-x64 +0 -0
- package/bin/agent-browser-win32-x64.exe +0 -0
- package/package.json +1 -1
- package/skills/agent-browser/SKILL.md +51 -1
- package/skills/agent-browser/references/commands.md +27 -1
- package/skills/agent-browser/references/snapshot-refs.md +25 -0
package/README.md
CHANGED
@@ -58,6 +58,16 @@ On Linux, install system dependencies:
 agent-browser install --with-deps
 ```
 
+### Updating
+
+Upgrade to the latest version:
+
+```bash
+agent-browser upgrade
+```
+
+Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
+
 ### Requirements
 
 - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). No Playwright or Node.js required for the daemon.
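The new README text says `upgrade` detects the installation method and picks the matching update command, but the diff does not show how. The following is a minimal sketch of one way such a mapping could work, not the tool's actual logic. The function name `upgrade_command_for` and the path patterns are assumptions made for illustration; the three printed commands are the standard upgrade invocations for each package manager.

```shell
# Hypothetical helper: map the resolved path of the installed binary to an
# upgrade command. The path patterns below are assumptions for illustration.
# Pass the real (symlink-resolved) path of the binary as $1.
upgrade_command_for() {
  case "$1" in
    */.cargo/bin/*)
      echo "cargo install agent-browser --force" ;;
    # Checked before Homebrew: a Homebrew-provided npm keeps global packages
    # under a homebrew/lib/node_modules directory.
    */node_modules/*)
      echo "npm install -g agent-browser@latest" ;;
    */Cellar/*|*/homebrew/*|*/linuxbrew/*)
      echo "brew upgrade agent-browser" ;;
    *)
      echo "unknown install method" >&2
      return 1 ;;
  esac
}
```

For example, `upgrade_command_for "$HOME/.cargo/bin/agent-browser"` prints the Cargo command. In practice, running `agent-browser upgrade` makes this kind of manual check unnecessary.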
@@ -187,6 +197,25 @@ agent-browser wait "#spinner" --state hidden
 
 **Load states:** `load`, `domcontentloaded`, `networkidle`
 
+### Batch Execution
+
+Execute multiple commands in a single invocation by piping a JSON array of
+string arrays to `batch`. This avoids per-command process startup overhead
+when running multi-step workflows.
+
+```bash
+# Pipe commands as JSON
+echo '[
+["open", "https://example.com"],
+["snapshot", "-i"],
+["click", "@e1"],
+["screenshot", "result.png"]
+]' | agent-browser batch --json
+
+# Stop on first error
+agent-browser batch --bail < commands.json
+```
+
 ### Clipboard
 
 ```bash
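The `--bail` example in the hunk above reads from `commands.json`, which the README does not show. A minimal sketch of creating that file and running it follows; the file contents are an assumption that mirrors the piped example, and the guard around the CLI call is only there so the snippet is harmless on a machine without agent-browser installed.

```shell
# Write the batch payload: a JSON array whose elements are arrays of strings,
# one inner array per command (same shape as the piped README example).
cat > commands.json <<'EOF'
[
  ["open", "https://example.com"],
  ["snapshot", "-i"],
  ["screenshot", "result.png"]
]
EOF

# Run the batch only when the CLI is installed.
# --bail stops at the first failing command.
if command -v agent-browser >/dev/null 2>&1; then
  agent-browser batch --bail < commands.json
fi
```

The `click @e1` step is left out here on purpose: a ref such as `@e1` comes from snapshot output, so a fixed file is safest when no step depends on the output of an earlier one.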
@@ -241,6 +270,8 @@ agent-browser network route <url> --body <json> # Mock response
 agent-browser network unroute [url] # Remove routes
 agent-browser network requests # View tracked requests
 agent-browser network requests --filter api # Filter requests
+agent-browser network har start # Start HAR recording
+agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
 ```
 
 ### Tabs & Windows
@@ -318,6 +349,7 @@ agent-browser reload # Reload page
 ```bash
 agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
 agent-browser install --with-deps # Also install system deps (Linux)
+agent-browser upgrade # Upgrade agent-browser to the latest version
 ```
 
 ## Authentication
@@ -539,7 +571,6 @@ This is useful for multimodal AI models that can reason about visual layout, unl
 | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
 | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
 | `--json` | JSON output (for agents) |
-| `--full, -f` | Full page screenshot |
 | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
 | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
 | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
@@ -877,6 +908,7 @@ Auto-connect discovers Chrome by:
 
 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
 2. Falling back to probing common debugging ports (9222, 9229)
+3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
 
 This is useful when:
 
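The discovery order listed in the hunk above can be sketched in shell. This is an illustration under stated assumptions, not agent-browser's implementation: the function name `discover_cdp` is hypothetical, the user data directory is passed in explicitly because its default location differs by OS, and step 3 (the direct WebSocket fallback) is not modeled. It relies on the layout of Chrome's `DevToolsActivePort` file, which holds the port on the first line and the browser WebSocket path on the second.

```shell
# Hypothetical helper: print a CDP endpoint for a running Chrome, or return 1.
# $1 is the Chrome user data directory.
discover_cdp() {
  profile_dir=$1
  port_file="$profile_dir/DevToolsActivePort"

  # Step 1: read the port (line 1) and browser WebSocket path (line 2).
  if [ -r "$port_file" ]; then
    port=$(sed -n '1p' "$port_file")
    ws_path=$(sed -n '2p' "$port_file")
    echo "ws://127.0.0.1:${port}${ws_path}"
    return 0
  fi

  # Step 2: probe the common debugging ports over HTTP.
  for port in 9222 9229; do
    if curl -fs --max-time 1 "http://127.0.0.1:${port}/json/version" >/dev/null 2>&1; then
      echo "http://127.0.0.1:${port}"
      return 0
    fi
  done

  return 1
}
```

The printed port can then be handed to the CLI with the `--cdp` option shown elsewhere in the docs, for example `agent-browser --cdp 9222 snapshot`.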
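package/bin/agent-browser-darwin-arm64
CHANGED
Binary file
package/bin/agent-browser-darwin-x64
CHANGED
Binary file
package/bin/agent-browser-linux-arm64
CHANGED
Binary file
package/bin/agent-browser-linux-musl-arm64
CHANGED
Binary file
package/bin/agent-browser-linux-musl-x64
CHANGED
Binary file
package/bin/agent-browser-linux-x64
CHANGED
Binary file
package/bin/agent-browser-win32-x64.exe
CHANGED
Binary file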
package/package.json
CHANGED
package/skills/agent-browser/SKILL.md
CHANGED
@@ -6,7 +6,7 @@ allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*)
 
 # Browser Automation with agent-browser
 
-The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.
+The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update to the latest version.
 
 ## Core Workflow
 
@@ -147,6 +147,12 @@ agent-browser download @e1 ./file.pdf # Click element to trigger downlo
 agent-browser wait --download ./output.zip # Wait for any download to complete
 agent-browser --download-path ./downloads open <url> # Set default download directory
 
+# Network
+agent-browser network requests # Inspect tracked requests
+agent-browser network route "**/api/*" --abort # Block matching requests
+agent-browser network har start # Start HAR recording
+agent-browser network har stop ./capture.har # Stop and save HAR file
+
 # Viewport & Device Emulation
 agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
 agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
@@ -175,6 +181,24 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
 agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
 ```
 
+## Batch Execution
+
+Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
+
+```bash
+echo '[
+["open", "https://example.com"],
+["snapshot", "-i"],
+["click", "@e1"],
+["screenshot", "result.png"]
+]' | agent-browser batch --json
+
+# Stop on first error
+agent-browser batch --bail < commands.json
+```
+
+Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&&` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
+
 ## Common Patterns
 
 ### Form Submission
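For a known sequence of steps, writing the nested JSON by hand is the error-prone part of using `batch`. The sketch below converts plain text with one command per line into the JSON array of string arrays that `batch` reads. The helper name `to_batch_json` is hypothetical and not part of agent-browser. It splits arguments on whitespace and does no escaping, so it assumes arguments contain no spaces, double quotes, or backslashes.

```shell
# Hypothetical helper: read "one command per line" on stdin and print a JSON
# array of string arrays. Arguments are split on whitespace and are not
# escaped, so inputs with spaces, quotes, or backslashes are out of scope.
to_batch_json() {
  awk '
    BEGIN { printf "[" }
    NF == 0 { next }                          # skip blank lines
    {
      printf "%s[", (count++ ? "," : "")      # comma between commands
      for (i = 1; i <= NF; i++)
        printf "%s\"%s\"", (i > 1 ? "," : ""), $i
      printf "]"
    }
    END { print "]" }
  '
}
```

A possible use, given a hypothetical `steps.txt` with lines such as `open https://example.com` and `screenshot result.png`, is `to_batch_json < steps.txt | agent-browser batch --bail`.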
@@ -245,6 +269,30 @@ agent-browser state clear myapp
 agent-browser state clean --older-than 7
 ```
 
+### Working with Iframes
+
+Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
+
+```bash
+agent-browser open https://example.com/checkout
+agent-browser snapshot -i
+# @e1 [heading] "Checkout"
+# @e2 [Iframe] "payment-frame"
+# @e3 [input] "Card number"
+# @e4 [input] "Expiry"
+# @e5 [button] "Pay"
+
+# Interact directly — no frame switch needed
+agent-browser fill @e3 "4111111111111111"
+agent-browser fill @e4 "12/28"
+agent-browser click @e5
+
+# To scope a snapshot to one iframe:
+agent-browser frame @e2
+agent-browser snapshot -i # Only iframe content
+agent-browser frame main # Return to main frame
+```
+
 ### Data Extraction
 
 ```bash
@@ -281,6 +329,8 @@ agent-browser --auto-connect snapshot
 agent-browser --cdp 9222 snapshot
 ```
 
+Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
+
 ### Color Scheme (Dark Mode)
 
 ```bash
package/skills/agent-browser/references/commands.md
CHANGED
@@ -177,10 +177,36 @@ agent-browser window new # New window
 ## Frames
 
 ```bash
-agent-browser frame "#iframe" # Switch to iframe
+agent-browser frame "#iframe" # Switch to iframe by CSS selector
+agent-browser frame @e3 # Switch to iframe by element ref
 agent-browser frame main # Back to main frame
 ```
 
+### Iframe support
+
+Iframes are detected automatically during snapshots. When the main-frame snapshot runs, `Iframe` nodes are resolved and their content is inlined beneath the iframe element in the output (one level of nesting; iframes within iframes are not expanded).
+
+```bash
+agent-browser snapshot -i
+# @e3 [Iframe] "payment-frame"
+# @e4 [input] "Card number"
+# @e5 [button] "Pay"
+
+# Interact directly — refs inside iframes already work
+agent-browser fill @e4 "4111111111111111"
+agent-browser click @e5
+
+# Or switch frame context for scoped snapshots
+agent-browser frame @e3 # Switch using element ref
+agent-browser snapshot -i # Snapshot scoped to that iframe
+agent-browser frame main # Return to main frame
+```
+
+The `frame` command accepts:
+- **Element refs** — `frame @e3` resolves the ref to an iframe element
+- **CSS selectors** — `frame "#payment-iframe"` finds the iframe by selector
+- **Frame name/URL** — matches against the browser's frame tree
+
 ## Dialogs
 
 ```bash
package/skills/agent-browser/references/snapshot-refs.md
CHANGED
@@ -162,6 +162,31 @@ agent-browser snapshot @e9
 @e10 [radio] selected # Selected radio
 ```
 
+## Iframes
+
+Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each `Iframe` node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like `click`, `fill`, and `type` work without manually switching frames.
+
+```bash
+agent-browser snapshot -i
+# @e1 [heading] "Checkout"
+# @e2 [Iframe] "payment-frame"
+# @e3 [input] "Card number"
+# @e4 [input] "Expiry"
+# @e5 [button] "Pay"
+# @e6 [button] "Cancel"
+
+# Interact with iframe elements directly using their refs
+agent-browser fill @e3 "4111111111111111"
+agent-browser fill @e4 "12/28"
+agent-browser click @e5
+```
+
+**Key details:**
+- Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
+- Cross-origin iframes that block accessibility tree access are silently skipped
+- Empty iframes or iframes with no interactive content are omitted from the output
+- To scope a snapshot to a single iframe, use `frame @ref` then `snapshot -i`
+
 ## Troubleshooting
 
 ### "Ref not found" Error