agent-browser-stealth 0.24.0-fork.2 → 0.27.0-fork.2
- package/README.md +54 -1309
- package/bin/.install-method +1 -0
- package/bin/agent-browser-darwin-arm64 +0 -0
- package/bin/agent-browser-darwin-x64 +0 -0
- package/bin/agent-browser-linux-arm64 +0 -0
- package/bin/agent-browser-linux-x64 +0 -0
- package/bin/agent-browser-win32-x64.exe +0 -0
- package/package.json +5 -4
- package/{skills → skill-data}/agentcore/SKILL.md +1 -1
- package/skill-data/core/SKILL.md +476 -0
- package/{skills/agent-browser → skill-data/core}/references/commands.md +101 -7
- package/skill-data/core/references/trust-boundaries.md +89 -0
- package/{skills → skill-data}/dogfood/SKILL.md +1 -1
- package/{skills → skill-data}/electron/SKILL.md +1 -1
- package/{skills → skill-data}/slack/SKILL.md +1 -1
- package/skills/agent-browser/SKILL.md +32 -746
- package/{skills/agent-browser → skill-data/core}/references/authentication.md +0 -0
- package/{skills/agent-browser → skill-data/core}/references/profiling.md +0 -0
- package/{skills/agent-browser → skill-data/core}/references/proxy-support.md +0 -0
- package/{skills/agent-browser → skill-data/core}/references/session-management.md +0 -0
- package/{skills/agent-browser → skill-data/core}/references/snapshot-refs.md +0 -0
- package/{skills/agent-browser → skill-data/core}/references/video-recording.md +0 -0
- package/{skills/agent-browser → skill-data/core}/templates/authenticated-session.sh +0 -0
- package/{skills/agent-browser → skill-data/core}/templates/capture-workflow.sh +0 -0
- package/{skills/agent-browser → skill-data/core}/templates/form-automation.sh +0 -0
- package/{skills → skill-data}/dogfood/references/issue-taxonomy.md +0 -0
- package/{skills → skill-data}/dogfood/templates/dogfood-report-template.md +0 -0
- package/{skills → skill-data}/slack/references/slack-tasks.md +0 -0
- package/{skills → skill-data}/slack/templates/slack-report-template.md +0 -0
- package/{skills → skill-data}/vercel-sandbox/SKILL.md +0 -0
package/README.md
CHANGED
@@ -1,1355 +1,100 @@
-# agent-browser
+# agent-browser-stealth
 
-
+Stealth fork of [agent-browser](https://github.com/vercel-labs/agent-browser) — connects to your real Chrome, shares your login sessions, and is undetectable by anti-bot systems.
 
-
+For basic usage, commands, and API reference, see the [upstream documentation](https://github.com/vercel-labs/agent-browser).
 
-
+## Why this fork?
 
-
+**agent-browser** launches a fresh browser with an empty profile. You need to log in again, and websites can detect it's automated.
 
-
-npm install -g agent-browser
-agent-browser install # Download Chrome from Chrome for Testing (first time only)
-```
-
-### Project Installation (local dependency)
-
-For projects that want to pin the version in `package.json`:
-
-```bash
-npm install agent-browser
-agent-browser install
-```
-
-Then use via `package.json` scripts or by invoking `agent-browser` directly.
-
-### Homebrew (macOS)
-
-```bash
-brew install agent-browser
-agent-browser install # Download Chrome from Chrome for Testing (first time only)
-```
-
-### Cargo (Rust)
-
-```bash
-cargo install agent-browser
-agent-browser install # Download Chrome from Chrome for Testing (first time only)
-```
-
-### From Source
-
-```bash
-git clone https://github.com/vercel-labs/agent-browser
-cd agent-browser
-pnpm install
-pnpm build
-pnpm build:native # Requires Rust (https://rustup.rs)
-pnpm link --global # Makes agent-browser available globally
-agent-browser install
-```
-
-### Linux Dependencies
-
-On Linux, install system dependencies:
-
-```bash
-agent-browser install --with-deps
-```
-
-### Updating
-
-Upgrade to the latest version:
-
-```bash
-agent-browser upgrade
-```
-
-Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
-
-### Requirements
-
-- **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
-- **Rust** - Only needed when building from source (see From Source above).
-
-## Quick Start
-
-```bash
-agent-browser open example.com
-agent-browser snapshot # Get accessibility tree with refs
-agent-browser click @e2 # Click by ref from snapshot
-agent-browser fill @e3 "test@example.com" # Fill by ref
-agent-browser get text @e1 # Get text by ref
-agent-browser screenshot page.png
-agent-browser close
-```
-
-### Traditional Selectors (also supported)
-
-```bash
-agent-browser click "#submit"
-agent-browser fill "#email" "test@example.com"
-agent-browser find role button click --name "Submit"
-```
-
-## Commands
-
-### Core Commands
-
-```bash
-agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
-agent-browser click <sel> # Click element (--new-tab to open in new tab)
-agent-browser dblclick <sel> # Double-click element
-agent-browser focus <sel> # Focus element
-agent-browser type <sel> <text> # Type into element
-agent-browser fill <sel> <text> # Clear and fill
-agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
-agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
-agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
-agent-browser keydown <key> # Hold key down
-agent-browser keyup <key> # Release key
-agent-browser hover <sel> # Hover element
-agent-browser select <sel> <val> # Select dropdown option
-agent-browser check <sel> # Check checkbox
-agent-browser uncheck <sel> # Uncheck checkbox
-agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>)
-agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
-agent-browser drag <src> <tgt> # Drag and drop
-agent-browser upload <sel> <files> # Upload files
-agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
-agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
-agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
-agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
-agent-browser pdf <path> # Save as PDF
-agent-browser snapshot # Accessibility tree with refs (best for AI)
-agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
-agent-browser connect <port> # Connect to browser via CDP
-agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
-agent-browser stream status # Show runtime streaming state and bound port
-agent-browser stream disable # Stop runtime WebSocket streaming
-agent-browser close # Close browser (aliases: quit, exit)
-agent-browser close --all # Close all active sessions
-```
-
-### Get Info
-
-```bash
-agent-browser get text <sel> # Get text content
-agent-browser get html <sel> # Get innerHTML
-agent-browser get value <sel> # Get input value
-agent-browser get attr <sel> <attr> # Get attribute
-agent-browser get title # Get page title
-agent-browser get url # Get current URL
-agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging)
-agent-browser get count <sel> # Count matching elements
-agent-browser get box <sel> # Get bounding box
-agent-browser get styles <sel> # Get computed styles
-```
-
-### Check State
-
-```bash
-agent-browser is visible <sel> # Check if visible
-agent-browser is enabled <sel> # Check if enabled
-agent-browser is checked <sel> # Check if checked
-```
-
-### Find Elements (Semantic Locators)
-
-```bash
-agent-browser find role <role> <action> [value] # By ARIA role
-agent-browser find text <text> <action> # By text content
-agent-browser find label <label> <action> [value] # By label
-agent-browser find placeholder <ph> <action> [value] # By placeholder
-agent-browser find alt <text> <action> # By alt text
-agent-browser find title <text> <action> # By title attr
-agent-browser find testid <id> <action> [value] # By data-testid
-agent-browser find first <sel> <action> [value] # First match
-agent-browser find last <sel> <action> [value] # Last match
-agent-browser find nth <n> <sel> <action> [value] # Nth match
-```
-
-**Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
-
-**Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
-
-**Examples:**
-
-```bash
-agent-browser find role button click --name "Submit"
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "test@test.com"
-agent-browser find first ".item" click
-agent-browser find nth 2 "a" text
-```
-
-### Wait
-
-```bash
-agent-browser wait <selector> # Wait for element to be visible
-agent-browser wait <ms> # Wait for time (milliseconds)
-agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
-agent-browser wait --url "**/dash" # Wait for URL pattern
-agent-browser wait --load networkidle # Wait for load state
-agent-browser wait --fn "window.ready === true" # Wait for JS condition
-
-# Wait for text/element to disappear
-agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
-agent-browser wait "#spinner" --state hidden
-```
-
-**Load states:** `load`, `domcontentloaded`, `networkidle`
-
-### Batch Execution
-
-Execute multiple commands in a single invocation by piping a JSON array of
-string arrays to `batch`. This avoids per-command process startup overhead
-when running multi-step workflows.
-
-```bash
-# Pipe commands as JSON
-echo '[
-["open", "https://example.com"],
-["snapshot", "-i"],
-["click", "@e1"],
-["screenshot", "result.png"]
-]' | agent-browser batch --json
-
-# Stop on first error
-agent-browser batch --bail < commands.json
-```
-
-### Clipboard
-
-```bash
-agent-browser clipboard read # Read text from clipboard
-agent-browser clipboard write "Hello, World!" # Write text to clipboard
-agent-browser clipboard copy # Copy current selection (Ctrl+C)
-agent-browser clipboard paste # Paste from clipboard (Ctrl+V)
-```
-
-### Mouse Control
-
-```bash
-agent-browser mouse move <x> <y> # Move mouse
-agent-browser mouse down [button] # Press button (left/right/middle)
-agent-browser mouse up [button] # Release button
-agent-browser mouse wheel <dy> [dx] # Scroll wheel
-```
-
-### Browser Settings
-
-```bash
-agent-browser set viewport <w> <h> [scale] # Set viewport size (scale for retina, e.g. 2)
-agent-browser set device <name> # Emulate device ("iPhone 14")
-agent-browser set geo <lat> <lng> # Set geolocation
-agent-browser set offline [on|off] # Toggle offline mode
-agent-browser set headers <json> # Extra HTTP headers
-agent-browser set credentials <u> <p> # HTTP basic auth
-agent-browser set media [dark|light] # Emulate color scheme
-```
-
-### Cookies & Storage
-
-```bash
-agent-browser cookies # Get all cookies
-agent-browser cookies set <name> <val> # Set cookie
-agent-browser cookies clear # Clear cookies
-
-agent-browser storage local # Get all localStorage
-agent-browser storage local <key> # Get specific key
-agent-browser storage local set <k> <v> # Set value
-agent-browser storage local clear # Clear all
-
-agent-browser storage session # Same for sessionStorage
-```
-
-### Network
-
-```bash
-agent-browser network route <url> # Intercept requests
-agent-browser network route <url> --abort # Block requests
-agent-browser network route <url> --body <json> # Mock response
-agent-browser network unroute [url] # Remove routes
-agent-browser network requests # View tracked requests
-agent-browser network requests --filter api # Filter requests
-agent-browser network requests --type xhr,fetch # Filter by resource type
-agent-browser network requests --method POST # Filter by HTTP method
-agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
-agent-browser network request <requestId> # View full request/response detail
-agent-browser network har start # Start HAR recording
-agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
-```
-
-### Tabs & Windows
-
-```bash
-agent-browser tab # List tabs
-agent-browser tab new [url] # New tab (optionally with URL)
-agent-browser tab <n> # Switch to tab n
-agent-browser tab close [n] # Close tab
-agent-browser window new # New window
-```
-
-### Frames
-
-```bash
-agent-browser frame <sel> # Switch to iframe
-agent-browser frame main # Back to main frame
-```
-
-### Dialogs
-
-```bash
-agent-browser dialog accept [text] # Accept (with optional prompt text)
-agent-browser dialog dismiss # Dismiss
-agent-browser dialog status # Check if a dialog is currently open
-```
-
-By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.
-
-When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
-
-### Diff
-
-```bash
-agent-browser diff snapshot # Compare current vs last snapshot
-agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
-agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
-agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
-agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
-agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
-agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
-agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
-agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
-agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
-```
-
-### Debug
-
-```bash
-agent-browser trace start [path] # Start recording trace
-agent-browser trace stop [path] # Stop and save trace
-agent-browser profiler start # Start Chrome DevTools profiling
-agent-browser profiler stop [path] # Stop and save profile (.json)
-agent-browser console # View console messages (log, error, warn, info)
-agent-browser console --json # JSON output with raw CDP args for programmatic access
-agent-browser console --clear # Clear console
-agent-browser errors # View page errors (uncaught JavaScript exceptions)
-agent-browser errors --clear # Clear errors
-agent-browser highlight <sel> # Highlight element
-agent-browser inspect # Open Chrome DevTools for the active page
-agent-browser state save <path> # Save auth state
-agent-browser state load <path> # Load auth state
-agent-browser state list # List saved state files
-agent-browser state show <file> # Show state summary
-agent-browser state rename <old> <new> # Rename state file
-agent-browser state clear [name] # Clear states for session
-agent-browser state clear --all # Clear all saved states
-agent-browser state clean --older-than <days> # Delete old states
-```
-
-### Navigation
-
-```bash
-agent-browser back # Go back
-agent-browser forward # Go forward
-agent-browser reload # Reload page
-```
-
-### Setup
-
-```bash
-agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
-agent-browser install --with-deps # Also install system deps (Linux)
-agent-browser upgrade # Upgrade agent-browser to the latest version
-```
-
-## Authentication
-
-agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.
-
-### Quick summary
-
-| Approach | Best for | Flag / Env |
-|----------|----------|------------|
-| **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
-| **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
-| **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
-| **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
-| **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |
-
-### Import auth from your browser
-
-If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:
-
-```bash
-# 1. Launch Chrome with remote debugging enabled
-# macOS:
-"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
-# Or use --auto-connect to discover an already-running Chrome
-
-# 2. Connect and save the authenticated state
-agent-browser --auto-connect state save ./my-auth.json
-
-# 3. Use the saved auth in future sessions
-agent-browser --state ./my-auth.json open https://app.example.com/dashboard
-
-# 4. Or use --session-name for automatic persistence
-agent-browser --session-name myapp state load ./my-auth.json
-# From now on, --session-name myapp auto-saves/restores this state
-```
-
-> **Security notes:**
-> - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
-> - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).
-
-For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.
-
-## Sessions
-
-Run multiple isolated browser instances:
-
-```bash
-# Different sessions
-agent-browser --session agent1 open site-a.com
-agent-browser --session agent2 open site-b.com
-
-# Or via environment variable
-AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
-
-# List active sessions
-agent-browser session list
-# Output:
-# Active sessions:
-# -> default
-# agent1
-
-# Show current session
-agent-browser session
-```
-
-Each session has its own:
-
-- Browser instance
-- Cookies and storage
-- Navigation history
-- Authentication state
-
-## Persistent Profiles
-
-By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
-
-```bash
-# Use a persistent profile directory
-agent-browser --profile ~/.myapp-profile open myapp.com
-
-# Login once, then reuse the authenticated session
-agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
-
-# Or via environment variable
-AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
-```
-
-The profile directory stores:
-
-- Cookies and localStorage
-- IndexedDB data
-- Service workers
-- Browser cache
-- Login sessions
-
-**Tip**: Use different profile paths for different projects to keep their browser state isolated.
-
-## Session Persistence
-
-Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
-
-```bash
-# Auto-save/load state for "twitter" session
-agent-browser --session-name twitter open twitter.com
-
-# Login once, then state persists automatically
-# State files stored in ~/.agent-browser/sessions/
-
-# Or via environment variable
-export AGENT_BROWSER_SESSION_NAME=twitter
-agent-browser open twitter.com
-```
-
-### State Encryption
-
-Encrypt saved session data at rest with AES-256-GCM:
-
-```bash
-# Generate key: openssl rand -hex 32
-export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
-
-# State files are now encrypted automatically
-agent-browser --session-name secure open example.com
-```
-
-| Variable | Description |
-| --------------------------------- | -------------------------------------------------- |
-| `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
-| `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
-| `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
-
-## Security
-
-agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
-
-- **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
-- **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
-- **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
-- **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
-- **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
-- **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
-
-| Variable | Description |
-| ----------------------------------- | ---------------------------------------- |
-| `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers |
-| `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output |
-| `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns |
-| `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file |
-| `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation |
-| `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts |
-
-See [Security documentation](https://agent-browser.dev/security) for details.
-
-## Snapshot Options
-
-The `snapshot` command supports filtering to reduce output size:
+**agent-browser-stealth** connects to your existing Chrome. Your cookies, sessions, and browser fingerprint are all real — because it IS your real browser.
 
-
-
-
-
-
-
-
-```
-
-| Option | Description |
-| ---------------------- | ----------------------------------------------------------------------- |
-| `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
-| `-c, --compact` | Remove empty structural elements |
-| `-d, --depth <n>` | Limit tree depth |
-| `-s, --selector <sel>` | Scope to CSS selector |
-
-## Annotated Screenshots
-
-The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
-
-Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.
-
-```bash
-agent-browser screenshot --annotate
-# -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
-# [1] @e1 button "Submit"
-# [2] @e2 link "Home"
-# [3] @e3 textbox "Email"
-```
+| | agent-browser | agent-browser-stealth |
+|---|---|---|
+| Browser | Launches new Chrome | Connects to your Chrome |
+| Login state | Empty, need to re-login | Your existing sessions |
+| Fingerprint | Automation markers present | Your real fingerprint |
+| User collaboration | Separate window | Same window, take over anytime |
+| CAPTCHA | Agent stuck | You solve it, agent continues |
 
-
+## Install
 
 ```bash
-agent-browser
-agent-browser click @e2 # Click the "Home" link labeled [2]
+npm install -g agent-browser-stealth
 ```
 
-
-
-## Options
-
-| Option | Description |
-|--------|-------------|
-| `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
-| `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
-| `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
-| `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
-| `--headers <json>` | Set HTTP headers scoped to the URL's origin |
-| `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
-| `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
-| `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
-| `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
-| `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
-| `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
-| `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
-| `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
-| `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
-| `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
-| `--json` | JSON output (for agents) |
-| `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
-| `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
-| `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
-| `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
-| `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
-| `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
-| `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
-| `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
-| `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
-| `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
-| `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
-| `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
-| `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
-| `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
598
|
-
| `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
|
|
599
|
-
| `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
|
|
600
|
-
| `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
|
|
601
|
-
| `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
|
|
602
|
-
| `--debug` | Debug output |
|
|
603
|
-
|
|
604
|
-
## Observability Dashboard
|
|
27
|
+
### Install the AI agent skills
|
|
605
28
|
|
|
606
|
-
|
|
29
|
+
The repo ships SKILL.md files for Claude Code, Cursor, etc. Pull them into the current project with [skills.sh](https://skills.sh):
|
|
607
30
|
|
|
608
31
|
```bash
|
|
609
|
-
|
|
610
|
-
agent-browser dashboard install
|
|
611
|
-
|
|
612
|
-
# Start the dashboard server (runs in background on port 4848)
|
|
613
|
-
agent-browser dashboard start
|
|
614
|
-
agent-browser dashboard start --port 8080 # Custom port
|
|
615
|
-
|
|
616
|
-
# All sessions are automatically visible in the dashboard
|
|
617
|
-
agent-browser open example.com
|
|
618
|
-
|
|
619
|
-
# Stop the dashboard
|
|
620
|
-
agent-browser dashboard stop
|
|
32
|
+
npx skills add leeguooooo/agent-browser-stealth
|
|
621
33
|
```
|
|
622
34
|
|
|
623
|
-
|
|
624
|
-
|
|
625
|
-
The dashboard displays:
|
|
626
|
-
- **Live viewport** -- real-time JPEG frames from the browser
|
|
627
|
-
- **Activity feed** -- chronological command/result stream with timing and expandable details
|
|
628
|
-
- **Console output** -- browser console messages (log, warn, error)
|
|
629
|
-
- **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
|
|
35
|
+
This drops `skills/agent-browser` (and the specialized `skill-data/{core,electron,slack,dogfood,agentcore,vercel-sandbox}`) into your project so your AI agent gets the right usage patterns and pre-approved bash permissions for `agent-browser`, `agent-browser-stealth`, and `abs`.
|
|
630
36
|
|
|
631
|
-
##
|
|
37
|
+
## Setup (one time)
|
|
632
38
|
|
|
633
|
-
|
|
39
|
+
Enable remote debugging (the Chrome DevTools Protocol) in your Chrome:
|
|
634
40
|
|
|
635
|
-
|
|
41
|
+
1. Open `chrome://inspect/#remote-debugging` in Chrome
|
|
42
|
+
2. Toggle the switch on
|
|
636
43
|
|
|
637
|
-
|
|
638
|
-
2. `./agent-browser.json` -- project-level overrides (in working directory)
|
|
639
|
-
3. `AGENT_BROWSER_*` environment variables override config file values
|
|
640
|
-
4. CLI flags override everything
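The precedence order above can be sketched as a small shell helper. This is only an illustration of the ordering, not agent-browser's own code; `resolve_setting` is a hypothetical name, and the caller is assumed to pass candidate values from highest to lowest priority (CLI flag, environment variable, project config, then any lower-priority config value).

```bash
# Hypothetical helper: print the first non-empty candidate.
# Pass candidates from highest to lowest priority:
#   CLI flag value, env var value, project config value, lower-priority config value.
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1  # nothing was set at any level
}
```

For example, `resolve_setting "" "$AGENT_BROWSER_PROXY" "http://localhost:8080"` would print the environment value when it is set and fall back to the project value otherwise.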
|
|
44
|
+
That's it. This setting persists across Chrome restarts.
|
|
641
45
|
|
|
642
|
-
|
|
643
|
-
|
|
644
|
-
```json
|
|
645
|
-
{
|
|
646
|
-
"headed": true,
|
|
647
|
-
"proxy": "http://localhost:8080",
|
|
648
|
-
"profile": "./browser-data",
|
|
649
|
-
"userAgent": "my-agent/1.0",
|
|
650
|
-
"ignoreHttpsErrors": true
|
|
651
|
-
}
|
|
652
|
-
```
|
|
653
|
-
|
|
654
|
-
Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
|
|
46
|
+
## Usage
|
|
655
47
|
|
|
656
48
|
```bash
|
|
657
|
-
|
|
658
|
-
AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
|
|
659
|
-
```
|
|
660
|
-
|
|
661
|
-
All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
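As a rough illustration of the flag-to-key naming rule, the conversion could be written as follows. `flag_to_key` is a hypothetical helper for this sketch; it only mirrors the kebab-case to camelCase mapping described here and does not check whether a key is actually supported.

```bash
# Hypothetical helper: convert a long CLI flag name to its camelCase config key.
flag_to_key() {
  # Strip the leading "--", then capitalize the first letter of every
  # dash-separated segment after the first one.
  printf '%s\n' "${1#--}" | awk -F'-' '{
    out = $1
    for (i = 2; i <= NF; i++) out = out toupper(substr($i, 1, 1)) substr($i, 2)
    print out
  }'
}
```

With this sketch, `flag_to_key --proxy-bypass` prints `proxyBypass`.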
|
|
662
|
-
|
|
663
|
-
Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
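The boolean rule can be summarized in a short sketch. `effective_bool` is a hypothetical helper, and the `yes`/`no` marker for "was the flag passed" is an assumption made for this illustration, not how the CLI represents it internally.

```bash
# Hypothetical helper: resolve the effective value of a boolean setting.
#   $1 = value from the config file ("true" or "false")
#   $2 = "yes" if the flag appeared on the command line, anything else otherwise
#   $3 = optional explicit value given after the flag ("true" or "false")
effective_bool() {
  local config_value="$1" flag_given="$2" flag_value="${3:-}"
  if [ "$flag_given" != "yes" ]; then
    printf '%s\n' "$config_value"   # flag absent: the config value stands
  elif [ -z "$flag_value" ]; then
    printf 'true\n'                 # bare flag is equivalent to "true"
  else
    printf '%s\n' "$flag_value"     # explicit value overrides the config
  fi
}
```

Here `effective_bool true yes false` corresponds to `--headed false` overriding `"headed": true`.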
|
|
664
|
-
|
|
665
|
-
Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
|
|
666
|
-
|
|
667
|
-
> **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
|
|
668
|
-
|
|
669
|
-
## Default Timeout
|
|
670
|
-
|
|
671
|
-
The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN.
|
|
672
|
-
|
|
673
|
-
Override the default timeout via environment variable:
|
|
674
|
-
|
|
675
|
-
```bash
|
|
676
|
-
# Set a longer timeout for slow pages (in milliseconds)
|
|
677
|
-
export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
|
|
678
|
-
```
|
|
679
|
-
|
|
680
|
-
> **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.
|
|
681
|
-
|
|
682
|
-
| Variable | Description |
|
|
683
|
-
| ------------------------------- | ---------------------------------------- |
|
|
684
|
-
| `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |
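A pre-flight check for the timeout relationship described in this section might look like the sketch below. `check_default_timeout` is a hypothetical helper; the 25000 ms default and the 30000 ms IPC read timeout are the figures quoted above, and the `ok`/`warn`/`invalid` labels are invented for this illustration.

```bash
# Hypothetical pre-flight check for AGENT_BROWSER_DEFAULT_TIMEOUT (milliseconds).
check_default_timeout() {
  local timeout_ms="${1:-25000}"     # documented default
  local ipc_read_timeout_ms=30000    # CLI IPC read timeout quoted above
  case $timeout_ms in
    ''|*[!0-9]*) echo "invalid"; return 2 ;;   # not a non-negative integer
  esac
  if [ "$timeout_ms" -gt "$ipc_read_timeout_ms" ]; then
    echo "warn"   # the CLI may stop waiting (EAGAIN) before the daemon responds
    return 1
  fi
  echo "ok"
}
```

A script could call `check_default_timeout "$AGENT_BROWSER_DEFAULT_TIMEOUT"` before starting a long run and log a warning when the result is `warn`.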
|
|
685
|
-
|
|
686
|
-
## Selectors
|
|
687
|
-
|
|
688
|
-
### Refs (Recommended for AI)
|
|
689
|
-
|
|
690
|
-
Refs provide deterministic element selection from snapshots:
|
|
691
|
-
|
|
692
|
-
```bash
|
|
693
|
-
# 1. Get snapshot with refs
|
|
694
|
-
agent-browser snapshot
|
|
695
|
-
# Output:
|
|
696
|
-
# - heading "Example Domain" [ref=e1] [level=1]
|
|
697
|
-
# - button "Submit" [ref=e2]
|
|
698
|
-
# - textbox "Email" [ref=e3]
|
|
699
|
-
# - link "Learn more" [ref=e4]
|
|
700
|
-
|
|
701
|
-
# 2. Use refs to interact
|
|
702
|
-
agent-browser click @e2 # Click the button
|
|
703
|
-
agent-browser fill @e3 "test@example.com" # Fill the textbox
|
|
704
|
-
agent-browser get text @e1 # Get heading text
|
|
705
|
-
agent-browser hover @e4 # Hover the link
|
|
706
|
-
```
|
|
707
|
-
|
|
708
|
-
**Why use refs?**
|
|
709
|
-
|
|
710
|
-
- **Deterministic**: Ref points to exact element from snapshot
|
|
711
|
-
- **Fast**: No DOM re-query needed
|
|
712
|
-
- **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
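For scripts that consume the plain-text snapshot, the refs can be pulled out with standard tools. This is a minimal sketch that assumes refs always appear in the `[ref=eN]` form shown in the sample output above; `extract_refs` is a hypothetical helper name.

```bash
# Hypothetical helper: read snapshot text on stdin and print one @ref per line.
extract_refs() {
  grep -oE '\[ref=e[0-9]+\]' | sed -E 's/^\[ref=(e[0-9]+)\]$/@\1/'
}
```

Piping `agent-browser snapshot` into `extract_refs` would then yield values such as `@e1` and `@e2` that can be passed to `click` or `fill`.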
|
|
713
|
-
|
|
714
|
-
### CSS Selectors
|
|
715
|
-
|
|
716
|
-
```bash
|
|
717
|
-
agent-browser click "#id"
|
|
718
|
-
agent-browser click ".class"
|
|
719
|
-
agent-browser click "div > button"
|
|
720
|
-
```
|
|
721
|
-
|
|
722
|
-
### Text & XPath
|
|
723
|
-
|
|
724
|
-
```bash
|
|
725
|
-
agent-browser click "text=Submit"
|
|
726
|
-
agent-browser click "xpath=//button"
|
|
727
|
-
```
|
|
728
|
-
|
|
729
|
-
### Semantic Locators
|
|
730
|
-
|
|
731
|
-
```bash
|
|
732
|
-
agent-browser find role button click --name "Submit"
|
|
733
|
-
agent-browser find label "Email" fill "test@test.com"
|
|
734
|
-
```
|
|
735
|
-
|
|
736
|
-
## Agent Mode
|
|
737
|
-
|
|
738
|
-
Use `--json` for machine-readable output:
|
|
739
|
-
|
|
740
|
-
```bash
|
|
741
|
-
agent-browser snapshot --json
|
|
742
|
-
# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
|
|
743
|
-
|
|
744
|
-
agent-browser get text @e1 --json
|
|
745
|
-
agent-browser is visible @e2 --json
|
|
746
|
-
```
|
|
747
|
-
|
|
748
|
-
### Optimal AI Workflow
|
|
749
|
-
|
|
750
|
-
```bash
|
|
751
|
-
# 1. Navigate and get snapshot
|
|
752
|
-
agent-browser open example.com
|
|
753
|
-
agent-browser snapshot -i --json # AI parses tree and refs
|
|
754
|
-
|
|
755
|
-
# 2. AI identifies target refs from snapshot
|
|
756
|
-
# 3. Execute actions using refs
|
|
757
|
-
agent-browser click @e2
|
|
758
|
-
agent-browser fill @e3 "input text"
|
|
759
|
-
|
|
760
|
-
# 4. Get new snapshot if page changed
|
|
761
|
-
agent-browser snapshot -i --json
|
|
762
|
-
```
|
|
763
|
-
|
|
764
|
-
### Command Chaining
|
|
765
|
-
|
|
766
|
-
Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
|
|
767
|
-
|
|
768
|
-
```bash
|
|
769
|
-
# Open, wait for load, and snapshot in one call
|
|
770
|
-
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
|
|
771
|
-
|
|
772
|
-
# Chain multiple interactions
|
|
773
|
-
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
|
|
774
|
-
|
|
775
|
-
# Navigate and screenshot
|
|
776
|
-
agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
|
|
777
|
-
```
|
|
778
|
-
|
|
779
|
-
Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
|
|
780
|
-
|
|
781
|
-
## Headed Mode
|
|
782
|
-
|
|
783
|
-
Show the browser window for debugging:
|
|
784
|
-
|
|
785
|
-
```bash
|
|
786
|
-
agent-browser open example.com --headed
|
|
787
|
-
```
|
|
788
|
-
|
|
789
|
-
This opens a visible browser window instead of running headless.
|
|
790
|
-
|
|
791
|
-
> **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).
|
|
792
|
-
|
|
793
|
-
## Authenticated Sessions
|
|
794
|
-
|
|
795
|
-
Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
|
|
796
|
-
|
|
797
|
-
```bash
|
|
798
|
-
# Headers are scoped to api.example.com only
|
|
799
|
-
agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
|
|
800
|
-
|
|
801
|
-
# Requests to api.example.com include the auth header
|
|
802
|
-
agent-browser snapshot -i --json
|
|
803
|
-
agent-browser click @e2
|
|
804
|
-
|
|
805
|
-
# Navigate to another domain - headers are NOT sent (safe!)
|
|
806
|
-
agent-browser open other-site.com
|
|
807
|
-
```
|
|
808
|
-
|
|
809
|
-
This is useful for:
|
|
810
|
-
|
|
811
|
-
- **Skipping login flows** - Authenticate via headers instead of UI
|
|
812
|
-
- **Switching users** - Start new sessions with different auth tokens
|
|
813
|
-
- **API testing** - Access protected endpoints directly
|
|
814
|
-
- **Security** - Headers are scoped to the origin, not leaked to other domains
|
|
815
|
-
|
|
816
|
-
To set headers for multiple origins, use `--headers` with each `open` command:
|
|
817
|
-
|
|
818
|
-
```bash
|
|
819
|
-
agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
|
|
820
|
-
agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
|
|
821
|
-
```
|
|
822
|
-
|
|
823
|
-
For global headers (all domains), use `set headers`:
|
|
824
|
-
|
|
825
|
-
```bash
|
|
826
|
-
agent-browser set headers '{"X-Custom-Header": "value"}'
|
|
827
|
-
```
|
|
828
|
-
|
|
829
|
-
## Custom Browser Executable
|
|
830
|
-
|
|
831
|
-
Use a custom browser executable instead of the bundled Chromium. This is useful for:
|
|
832
|
-
|
|
833
|
-
- **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
|
|
834
|
-
- **System browsers**: Use an existing Chrome/Chromium installation
|
|
835
|
-
- **Custom builds**: Use modified browser builds
|
|
836
|
-
|
|
837
|
-
### CLI Usage
|
|
838
|
-
|
|
839
|
-
```bash
|
|
840
|
-
# Via flag
|
|
841
|
-
agent-browser --executable-path /path/to/chromium open example.com
|
|
842
|
-
|
|
843
|
-
# Via environment variable
|
|
844
|
-
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
|
|
845
|
-
```
|
|
846
|
-
|
|
847
|
-
### Serverless (Vercel)
|
|
848
|
-
|
|
849
|
-
Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:
|
|
850
|
-
|
|
851
|
-
```typescript
|
|
852
|
-
import { Sandbox } from "@vercel/sandbox";
|
|
853
|
-
|
|
854
|
-
const sandbox = await Sandbox.create({ runtime: "node24" });
|
|
855
|
-
await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
|
|
856
|
-
const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
|
|
857
|
-
await sandbox.stop();
|
|
858
|
-
```
|
|
859
|
-
|
|
860
|
-
See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.
|
|
861
|
-
|
|
862
|
-
### Serverless (AWS Lambda)
|
|
863
|
-
|
|
864
|
-
```typescript
|
|
865
|
-
import chromium from '@sparticuz/chromium';
|
|
866
|
-
import { execSync } from 'child_process';
|
|
867
|
-
|
|
868
|
-
export async function handler() {
|
|
869
|
-
const executablePath = await chromium.executablePath();
|
|
870
|
-
const result = execSync(
|
|
871
|
-
`AGENT_BROWSER_EXECUTABLE_PATH=${executablePath} agent-browser open https://example.com && agent-browser snapshot -i --json`,
|
|
872
|
-
{ encoding: 'utf-8' }
|
|
873
|
-
);
|
|
874
|
-
return JSON.parse(result);
|
|
875
|
-
}
|
|
876
|
-
```
|
|
877
|
-
|
|
878
|
-
## Local Files
|
|
879
|
-
|
|
880
|
-
Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
|
|
881
|
-
|
|
882
|
-
```bash
|
|
883
|
-
# Enable file access (required for JavaScript to access local files)
|
|
884
|
-
agent-browser --allow-file-access open file:///path/to/document.pdf
|
|
885
|
-
agent-browser --allow-file-access open file:///path/to/page.html
|
|
886
|
-
|
|
887
|
-
# Take screenshot of a local PDF
|
|
888
|
-
agent-browser --allow-file-access open file:///Users/me/report.pdf
|
|
889
|
-
agent-browser screenshot report.png
|
|
890
|
-
```
|
|
891
|
-
|
|
892
|
-
The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
|
|
893
|
-
|
|
894
|
-
- Load and render local files
|
|
895
|
-
- Access other local files via JavaScript (XHR, fetch)
|
|
896
|
-
- Load local resources (images, scripts, stylesheets)
|
|
897
|
-
|
|
898
|
-
**Note:** This flag only works with Chromium. For security, it's disabled by default.
|
|
899
|
-
|
|
900
|
-
## CDP Mode
|
|
901
|
-
|
|
902
|
-
Connect to an existing browser via Chrome DevTools Protocol:
|
|
903
|
-
|
|
904
|
-
```bash
|
|
905
|
-
# Start Chrome with: google-chrome --remote-debugging-port=9222
|
|
906
|
-
|
|
907
|
-
# Connect once, then run commands without --cdp
|
|
908
|
-
agent-browser connect 9222
|
|
909
|
-
agent-browser snapshot
|
|
910
|
-
agent-browser tab
|
|
911
|
-
agent-browser close
|
|
912
|
-
|
|
913
|
-
# Or pass --cdp on each command
|
|
914
|
-
agent-browser --cdp 9222 snapshot
|
|
915
|
-
|
|
916
|
-
# Connect to remote browser via WebSocket URL
|
|
917
|
-
agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
|
|
918
|
-
```
|
|
919
|
-
|
|
920
|
-
The `--cdp` flag accepts either:
|
|
921
|
-
|
|
922
|
-
- A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
|
|
923
|
-
- A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
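The two accepted forms can be told apart with a simple pattern match. The sketch below is an illustration of that distinction only; `cdp_endpoint` is a hypothetical helper and the error message is invented.

```bash
# Hypothetical helper: show how a --cdp argument would be interpreted.
cdp_endpoint() {
  case $1 in
    ws://*|wss://*)
      printf '%s\n' "$1" ;;                    # full WebSocket URL, used as-is
    ''|*[!0-9]*)
      echo "unsupported --cdp value: $1" >&2   # neither a port nor a ws(s) URL
      return 1 ;;
    *)
      printf 'http://localhost:%s\n' "$1" ;;   # bare port, local connection
  esac
}
```

For instance, `cdp_endpoint 9222` prints `http://localhost:9222`, while a `wss://` URL is passed through unchanged.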
|
|
924
|
-
|
|
925
|
-
This enables control of:
|
|
926
|
-
|
|
927
|
-
- Electron apps
|
|
928
|
-
- Chrome/Chromium instances with remote debugging
|
|
929
|
-
- WebView2 applications
|
|
930
|
-
- Any browser exposing a CDP endpoint
|
|
931
|
-
|
|
932
|
-
### Auto-Connect
|
|
933
|
-
|
|
934
|
-
Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
|
|
935
|
-
|
|
936
|
-
```bash
|
|
937
|
-
# Auto-discover running Chrome with remote debugging
|
|
938
|
-
agent-browser --auto-connect open example.com
|
|
939
|
-
agent-browser --auto-connect snapshot
|
|
940
|
-
|
|
941
|
-
# Or via environment variable
|
|
942
|
-
AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
|
|
943
|
-
```
|
|
944
|
-
|
|
945
|
-
Auto-connect discovers Chrome by:
|
|
946
|
-
|
|
947
|
-
1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
|
|
948
|
-
2. Falling back to probing common debugging ports (9222, 9229)
|
|
949
|
-
3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
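The first discovery step can be approximated in shell. This sketch assumes the `DevToolsActivePort` file holds the port on its first line and the browser target path on its second, which is an assumption about Chrome's file layout and is not stated in this README. The file location is passed in by the caller because the default user data directory differs per platform; `ws_url_from_active_port` and the `127.0.0.1` host are likewise hypothetical choices.

```bash
# Hypothetical helper: turn a DevToolsActivePort file into a WebSocket URL.
# Assumed layout: line 1 = port, line 2 = browser target path.
ws_url_from_active_port() {
  local file="$1" port="" path=""
  [ -r "$file" ] || return 1
  {
    IFS= read -r port || true
    IFS= read -r path || true
  } < "$file"
  case $port in
    ''|*[!0-9]*) return 1 ;;   # first line is not a usable port number
  esac
  printf 'ws://127.0.0.1:%s%s\n' "$port" "$path"
}
```

If the file is missing or malformed the helper returns non-zero, which is where the port-probing fallback in step 2 would take over.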
|
|
950
|
-
|
|
951
|
-
This is useful when:
|
|
952
|
-
|
|
953
|
-
- Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
|
|
954
|
-
- You want a zero-configuration connection to your existing browser
|
|
955
|
-
- You don't want to track which port Chrome is using
|
|
956
|
-
|
|
957
|
-
## Streaming (Browser Preview)
|
|
958
|
-
|
|
959
|
-
Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
|
|
960
|
-
|
|
961
|
-
### Streaming
|
|
962
|
-
|
|
963
|
-
Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:
|
|
964
|
-
|
|
965
|
-
```bash
|
|
966
|
-
agent-browser stream status
|
|
967
|
-
```
|
|
968
|
-
|
|
969
|
-
To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:
|
|
970
|
-
|
|
971
|
-
```bash
|
|
972
|
-
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
|
|
973
|
-
```
|
|
974
|
-
|
|
975
|
-
You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:
|
|
976
|
-
|
|
977
|
-
```bash
|
|
978
|
-
agent-browser stream enable --port 9223 # Re-enable on a specific port
|
|
979
|
-
agent-browser stream disable # Stop streaming for the session
|
|
980
|
-
```
|
|
981
|
-
|
|
982
|
-
The WebSocket server streams the browser viewport and accepts input events.
|
|
983
|
-
|
|
984
|
-
### WebSocket Protocol
|
|
985
|
-
|
|
986
|
-
Connect to `ws://localhost:9223` to receive frames and send input:
|
|
987
|
-
|
|
988
|
-
**Receive frames:**
|
|
989
|
-
|
|
990
|
-
```json
|
|
991
|
-
{
|
|
992
|
-
"type": "frame",
|
|
993
|
-
"data": "<base64-encoded-jpeg>",
|
|
994
|
-
"metadata": {
|
|
995
|
-
"deviceWidth": 1280,
|
|
996
|
-
"deviceHeight": 720,
|
|
997
|
-
"pageScaleFactor": 1,
|
|
998
|
-
"offsetTop": 0,
|
|
999
|
-
"scrollOffsetX": 0,
|
|
1000
|
-
"scrollOffsetY": 0
|
|
1001
|
-
}
|
|
1002
|
-
}
|
|
1003
|
-
```
|
|
1004
|
-
|
|
1005
|
-
**Send mouse events:**
|
|
1006
|
-
|
|
1007
|
-
```json
|
|
1008
|
-
{
|
|
1009
|
-
"type": "input_mouse",
|
|
1010
|
-
"eventType": "mousePressed",
|
|
1011
|
-
"x": 100,
|
|
1012
|
-
"y": 200,
|
|
1013
|
-
"button": "left",
|
|
1014
|
-
"clickCount": 1
|
|
1015
|
-
}
|
|
1016
|
-
```
|
|
1017
|
-
|
|
1018
|
-
**Send keyboard events:**
|
|
1019
|
-
|
|
1020
|
-
```json
|
|
1021
|
-
{
|
|
1022
|
-
"type": "input_keyboard",
|
|
1023
|
-
"eventType": "keyDown",
|
|
1024
|
-
"key": "Enter",
|
|
1025
|
-
"code": "Enter"
|
|
1026
|
-
}
|
|
1027
|
-
```
|
|
1028
|
-
|
|
1029
|
-
**Send touch events:**
|
|
1030
|
-
|
|
1031
|
-
```json
|
|
1032
|
-
{
|
|
1033
|
-
"type": "input_touch",
|
|
1034
|
-
"eventType": "touchStart",
|
|
1035
|
-
"touchPoints": [{ "x": 100, "y": 200 }]
|
|
1036
|
-
}
|
|
1037
|
-
```
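To send input from a script, the JSON payloads above have to be assembled first. The sketch below builds the mouse message only; `mouse_press_json` is a hypothetical helper, and delivering the payload to the WebSocket is left to whatever client you already use.

```bash
# Hypothetical helper: build an input_mouse "mousePressed" message.
#   $1 = x, $2 = y (viewport coordinates)
# The button and clickCount are fixed to a single left click, as in the example above.
mouse_press_json() {
  printf '{"type":"input_mouse","eventType":"mousePressed","x":%d,"y":%d,"button":"left","clickCount":1}\n' "$1" "$2"
}
```

Calling `mouse_press_json 100 200` reproduces the mouse event shown above on a single line.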
|
|
1038
|
-
|
|
1039
|
-
## Architecture
|
|
1040
|
-
|
|
1041
|
-
agent-browser uses a client-daemon architecture:
|
|
1042
|
-
|
|
1043
|
-
1. **Rust CLI** - Parses commands, communicates with daemon
|
|
1044
|
-
2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required
|
|
1045
|
-
|
|
1046
|
-
The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.
|
|
1047
|
-
|
|
1048
|
-
**Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome (via CDP) and Safari (via WebDriver for iOS).
|
|
1049
|
-
|
|
1050
|
-
## Platforms
|
|
1051
|
-
|
|
1052
|
-
| Platform | Binary |
|
|
1053
|
-
| ----------- | ----------- |
|
|
1054
|
-
| macOS ARM64 | Native Rust |
|
|
1055
|
-
| macOS x64 | Native Rust |
|
|
1056
|
-
| Linux ARM64 | Native Rust |
|
|
1057
|
-
| Linux x64 | Native Rust |
|
|
1058
|
-
| Windows x64 | Native Rust |
|
|
1059
|
-
|
|
1060
|
-
## Usage with AI Agents
|
|
1061
|
-
|
|
1062
|
-
### Just ask the agent
|
|
1063
|
-
|
|
1064
|
-
The simplest approach -- just tell your agent to use it:
|
|
1065
|
-
|
|
1066
|
-
```
|
|
1067
|
-
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
|
|
1068
|
-
```
|
|
1069
|
-
|
|
1070
|
-
The `--help` output is comprehensive and most agents can figure it out from there.
|
|
1071
|
-
|
|
1072
|
-
### AI Coding Assistants (recommended)
|
|
1073
|
-
|
|
1074
|
-
Add the skill to your AI coding assistant for richer context:
|
|
1075
|
-
|
|
1076
|
-
```bash
|
|
1077
|
-
npx skills add vercel-labs/agent-browser
|
|
1078
|
-
```
|
|
1079
|
-
|
|
1080
|
-
This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
|
|
1081
|
-
|
|
1082
|
-
### Claude Code
|
|
1083
|
-
|
|
1084
|
-
Install as a Claude Code skill:
|
|
1085
|
-
|
|
1086
|
-
```bash
|
|
1087
|
-
npx skills add vercel-labs/agent-browser
|
|
1088
|
-
```
|
|
1089
|
-
|
|
1090
|
-
This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
|
|
1091
|
-
|
|
1092
|
-
### AGENTS.md / CLAUDE.md
|
|
1093
|
-
|
|
1094
|
-
For more consistent results, add to your project or global instructions file:
|
|
1095
|
-
|
|
1096
|
-
```markdown
|
|
1097
|
-
## Browser Automation
|
|
1098
|
-
|
|
1099
|
-
Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
|
|
1100
|
-
|
|
1101
|
-
Core workflow:
|
|
1102
|
-
|
|
1103
|
-
1. `agent-browser open <url>` - Navigate to page
|
|
1104
|
-
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
|
|
1105
|
-
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
|
|
1106
|
-
4. Re-snapshot after page changes
|
|
1107
|
-
```
|
|
1108
|
-
|
|
1109
|
-
## Integrations
|
|
1110
|
-
|
|
1111
|
-
### iOS Simulator
|
|
1112
|
-
|
|
1113
|
-
Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
|
|
1114
|
-
|
|
1115
|
-
**Setup:**
|
|
1116
|
-
|
|
1117
|
-
```bash
|
|
1118
|
-
# Install Appium and XCUITest driver
|
|
1119
|
-
npm install -g appium
|
|
1120
|
-
appium driver install xcuitest
|
|
1121
|
-
```
|
|
1122
|
-
|
|
1123
|
-
**Usage:**
|
|
1124
|
-
|
|
1125
|
-
```bash
|
|
1126
|
-
# List available iOS simulators
|
|
1127
|
-
agent-browser device list
|
|
1128
|
-
|
|
1129
|
-
# Launch Safari on a specific device
|
|
1130
|
-
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
|
|
1131
|
-
|
|
1132
|
-
# Same commands as desktop
|
|
1133
|
-
agent-browser -p ios snapshot -i
|
|
1134
|
-
agent-browser -p ios tap @e1
|
|
1135
|
-
agent-browser -p ios fill @e2 "text"
|
|
1136
|
-
agent-browser -p ios screenshot mobile.png
|
|
1137
|
-
|
|
1138
|
-
# Mobile-specific commands
|
|
1139
|
-
agent-browser -p ios swipe up
|
|
1140
|
-
agent-browser -p ios swipe down 500
|
|
1141
|
-
|
|
1142
|
-
# Close session
|
|
1143
|
-
agent-browser -p ios close
|
|
1144
|
-
```
|
|
1145
|
-
|
|
1146
|
-
Or use environment variables:
|
|
1147
|
-
|
|
1148
|
-
```bash
|
|
1149
|
-
export AGENT_BROWSER_PROVIDER=ios
|
|
1150
|
-
export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
|
|
49
|
+
# Connect to your Chrome and navigate
|
|
1151
50
|
agent-browser open https://example.com
|
|
1152
|
-
```
|
|
1153
|
-
|
|
1154
|
-
| Variable | Description |
|
|
1155
|
-
| -------------------------- | ----------------------------------------------- |
|
|
1156
|
-
| `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
|
|
1157
|
-
| `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
|
|
1158
|
-
| `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
|
|
1159
51
|
|
|
1160
|
-
|
|
1161
|
-
|
|
1162
|
-
|
|
1163
|
-
|
|
1164
|
-
#### Real Device Support
|
|
1165
|
-
|
|
1166
|
-
Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
|
|
1167
|
-
|
|
1168
|
-
**1. Get your device UDID:**
|
|
1169
|
-
|
|
1170
|
-
```bash
|
|
1171
|
-
xcrun xctrace list devices
|
|
1172
|
-
# or
|
|
1173
|
-
system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
|
|
52
|
+
# Everything works through your logged-in browser
|
|
53
|
+
agent-browser click "Post"
|
|
54
|
+
agent-browser fill "Title" "Hello World"
|
|
55
|
+
agent-browser screenshot ./page.png
|
|
1174
56
|
```
|
|
1175
57
|
|
|
1176
|
-
|
|
58
|
+
The agent operates in your own Chrome: you'll see tabs opening, pages loading, and clicks happening in real time. You can take over at any point (e.g., to solve a CAPTCHA) and then let the agent continue.
|
|
1177
59
|
|
|
1178
|
-
|
|
1179
|
-
# Open the WebDriverAgent Xcode project
|
|
1180
|
-
cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
|
|
1181
|
-
open WebDriverAgent.xcodeproj
|
|
1182
|
-
```
|
|
60
|
+
### Standalone mode
|
|
1183
61
|
|
|
1184
|
-
|
|
1185
|
-
|
|
1186
|
-
- Select the `WebDriverAgentRunner` target
|
|
1187
|
-
- Go to Signing & Capabilities
|
|
1188
|
-
- Select your Team (requires Apple Developer account, free tier works)
|
|
1189
|
-
- Let Xcode manage signing automatically
|
|
1190
|
-
|
|
1191
|
-
**3. Use with agent-browser:**
|
|
62
|
+
If you need a separate browser (CI, testing, etc.):
|
|
1192
63
|
|
|
1193
64
|
```bash
|
|
1194
|
-
|
|
1195
|
-
agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
|
|
1196
|
-
|
|
1197
|
-
# Or use the device name if unique
|
|
1198
|
-
agent-browser -p ios --device "John's iPhone" open https://example.com
|
|
65
|
+
agent-browser --launch open https://example.com
|
|
1199
66
|
```
|
|
1200
67
|
|
|
1201
|
-
|
|
1202
|
-
|
|
1203
|
-
- First run installs WebDriverAgent to the device (may require Trust prompt)
|
|
1204
|
-
- Device must be unlocked and connected via USB
|
|
1205
|
-
- Slightly slower initial connection than simulator
|
|
1206
|
-
- Tests against real Safari performance and behavior
|
|
68
|
+
In CI environments, standalone mode is used automatically.
|
|
1207
69
|
|
|
1208
|
-
|
|
70
|
+
## Anti-detection
|
|
1209
71
|
|
|
1210
|
-
|
|
72
|
+
When connected to your real Chrome, we inject **zero** JavaScript patches. Your browser's fingerprint is completely genuine.
|
|
1211
73
|
|
|
1212
|
-
|
|
1213
|
-
|
|
1214
|
-
```bash
|
|
1215
|
-
export BROWSERLESS_API_KEY="your-api-token"
|
|
1216
|
-
agent-browser -p browserless open https://example.com
|
|
1217
|
-
```
|
|
74
|
+
The only change we make is a CDP call to `Emulation.setAutomationOverride`, which sets `navigator.webdriver = false` at the native Chrome level, where lie-detection systems such as CreepJS cannot detect it.
|
|
1218
75
|
|
|
1219
|
-
|
|
76
|
+
**Test results (connected to real Chrome):**
|
|
1220
77
|
|
|
1221
|
-
|
|
1222
|
-
|
|
1223
|
-
|
|
1224
|
-
|
|
1225
|
-
|
|
1226
|
-
|
|
1227
|
-
Optional configuration via environment variables:
|
|
1228
|
-
|
|
1229
|
-
| Variable | Description | Default |
|
|
1230
|
-
| -------------------------- | ------------------------------------------------ | --------------------------------------- |
|
|
1231
|
-
| `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
|
|
1232
|
-
| `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (chromium or chrome) | chromium |
|
|
1233
|
-
| `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` |
|
|
1234
|
-
| `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` |
|
|
1235
|
-
|
|
1236
|
-
When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.
|
|
1237
|
-
|
|
1238
|
-
Get your API token from the [Browserless Dashboard](https://browserless.io).
|
|
1239
|
-
|
|
1240
|
-
### Browserbase
|
|
1241
|
-
|
|
1242
|
-
[Browserbase](https://browserbase.com) provides remote browser infrastructure to make deployment of agentic browsing agents easy. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
|
|
1243
|
-
|
|
1244
|
-
To enable Browserbase, use the `-p` flag:
|
|
1245
|
-
|
|
1246
|
-
```bash
|
|
1247
|
-
export BROWSERBASE_API_KEY="your-api-key"
|
|
1248
|
-
agent-browser -p browserbase open https://example.com
|
|
1249
|
-
```
|
|
1250
|
-
|
|
1251
|
-
Or use environment variables for CI/scripts:
|
|
1252
|
-
|
|
1253
|
-
```bash
|
|
1254
|
-
export AGENT_BROWSER_PROVIDER=browserbase
|
|
1255
|
-
export BROWSERBASE_API_KEY="your-api-key"
|
|
1256
|
-
agent-browser open https://example.com
|
|
1257
|
-
```
|
|
1258
|
-
|
|
1259
|
-
When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
|
|
1260
|
-
|
|
1261
|
-
Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).
|
|
1262
|
-
-### Browser Use
-
-[Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
-
-To enable Browser Use, use the `-p` flag:
-
-```bash
-export BROWSER_USE_API_KEY="your-api-key"
-agent-browser -p browseruse open https://example.com
-```
-
-Or use environment variables for CI/scripts:
-
-```bash
-export AGENT_BROWSER_PROVIDER=browseruse
-export BROWSER_USE_API_KEY="your-api-key"
-agent-browser open https://example.com
-```
-
-When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
-
-Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing after.
-
-### Kernel
-
-[Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
-
-To enable Kernel, use the `-p` flag:
-
-```bash
-export KERNEL_API_KEY="your-api-key"
-agent-browser -p kernel open https://example.com
-```
-
-Or use environment variables for CI/scripts:
-
-```bash
-export AGENT_BROWSER_PROVIDER=kernel
-export KERNEL_API_KEY="your-api-key"
-agent-browser open https://example.com
-```
-
-Optional configuration via environment variables:
-
-| Variable | Description | Default |
-| ------------------------ | -------------------------------------------------------------------------------- | ------- |
-| `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
-| `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
-| `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
-| `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
-
-When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
-
-**Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
-
-Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
-
-### AgentCore
-
-[AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.
-
-To enable AgentCore, use the `-p` flag:
-
-```bash
-agent-browser -p agentcore open https://example.com
-```
-
-Or use environment variables for CI/scripts:
-
-```bash
-export AGENT_BROWSER_PROVIDER=agentcore
-agent-browser open https://example.com
-```
+| Test site | Result |
+|---|---|
+| [CreepJS](https://abrahamjuliot.github.io/creepjs/) | 0% stealth, 0% headless |
+| [bot.sannysoft.com](https://bot.sannysoft.com) | All green |
+| [Cloudflare Turnstile](https://nowsecure.nl) | Passed |
 
-
+When using `--launch` mode (standalone browser), a full suite of 32 stealth patches is applied for headless Chrome.
 
-
+## Differences from upstream
 
-
-| -------------------------- | -------------------------------------------------------------------- | ---------------- |
-| `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` |
-| `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
-| `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) |
-| `AGENTCORE_SESSION_TIMEOUT`| Session timeout in seconds | `3600` |
-| `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |
+Based on [agent-browser v0.27.0](https://github.com/vercel-labs/agent-browser). Changes:
 
-**
+- **Auto-connect is default** — `agent-browser open <url>` connects to your Chrome instead of launching a new one
+- **CDP-native stealth** — `Emulation.setAutomationOverride` instead of JS patches
+- **Dual stealth mode** — zero patches for real Chrome, full patches for `--launch` mode
+- **`--launch` / `--new` flag** — explicitly start a standalone browser
+- **CI auto-detection** — standalone mode when `CI` env var is set
 
-
+All upstream features (commands, snapshots, screenshots, recordings, tabs, sessions, etc.) work the same. See the [upstream repo](https://github.com/vercel-labs/agent-browser) for full documentation.
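
The mode-selection rules in the added "Differences from upstream" lines (auto-connect by default, `--launch` / `--new` to force a standalone browser, standalone when `CI` is set) can be sketched in shell as follows. This is only an illustration: the `resolve_mode` function and the mode names `connect` and `standalone` are hypothetical and not part of agent-browser, and the precedence between the flag and the `CI` variable is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT part of agent-browser: it mirrors the rules
# stated in the README lines above and nothing more.
resolve_mode() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --launch|--new)
        echo "standalone"   # explicit flag requests a standalone browser
        return 0
        ;;
    esac
  done
  if [ -n "${CI:-}" ]; then
    echo "standalone"       # CI env var is set (assumed to apply without a flag)
  else
    echo "connect"          # default: auto-connect to the user's Chrome
  fi
}
```

With `CI` unset, `resolve_mode open https://example.com` prints `connect`, while `resolve_mode --launch open https://example.com` prints `standalone`.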
 
 ## License
 
-Apache-2.0
+Apache-2.0 (same as upstream)
|