agent-browser-stealth 0.24.0-fork.1 → 0.27.0-fork.1

Files changed (30)
  1. package/README.md +54 -1309
  2. package/bin/.install-method +1 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser-darwin-x64 +0 -0
  5. package/bin/agent-browser-linux-arm64 +0 -0
  6. package/bin/agent-browser-linux-x64 +0 -0
  7. package/bin/agent-browser-win32-x64.exe +0 -0
  8. package/package.json +5 -7
  9. package/{skills → skill-data}/agentcore/SKILL.md +1 -1
  10. package/skill-data/core/SKILL.md +476 -0
  11. package/{skills/agent-browser → skill-data/core}/references/commands.md +101 -7
  12. package/skill-data/core/references/trust-boundaries.md +89 -0
  13. package/{skills → skill-data}/dogfood/SKILL.md +1 -1
  14. package/{skills → skill-data}/electron/SKILL.md +1 -1
  15. package/{skills → skill-data}/slack/SKILL.md +1 -1
  16. package/skills/agent-browser/SKILL.md +32 -746
  17. package/{skills/agent-browser → skill-data/core}/references/authentication.md +0 -0
  18. package/{skills/agent-browser → skill-data/core}/references/profiling.md +0 -0
  19. package/{skills/agent-browser → skill-data/core}/references/proxy-support.md +0 -0
  20. package/{skills/agent-browser → skill-data/core}/references/session-management.md +0 -0
  21. package/{skills/agent-browser → skill-data/core}/references/snapshot-refs.md +0 -0
  22. package/{skills/agent-browser → skill-data/core}/references/video-recording.md +0 -0
  23. package/{skills/agent-browser → skill-data/core}/templates/authenticated-session.sh +0 -0
  24. package/{skills/agent-browser → skill-data/core}/templates/capture-workflow.sh +0 -0
  25. package/{skills/agent-browser → skill-data/core}/templates/form-automation.sh +0 -0
  26. package/{skills → skill-data}/dogfood/references/issue-taxonomy.md +0 -0
  27. package/{skills → skill-data}/dogfood/templates/dogfood-report-template.md +0 -0
  28. package/{skills → skill-data}/slack/references/slack-tasks.md +0 -0
  29. package/{skills → skill-data}/slack/templates/slack-report-template.md +0 -0
  30. package/{skills → skill-data}/vercel-sandbox/SKILL.md +0 -0
package/README.md CHANGED
@@ -1,1355 +1,100 @@
1
- # agent-browser
1
+ # agent-browser-stealth
2
2
 
3
- Browser automation CLI for AI agents. Fast native Rust CLI.
3
+ Stealth fork of [agent-browser](https://github.com/vercel-labs/agent-browser) that connects to your real Chrome, shares your login sessions, and is undetectable by anti-bot systems.
4
4
 
5
- ## Installation
5
+ For basic usage, commands, and API reference, see the [upstream documentation](https://github.com/vercel-labs/agent-browser).
6
6
 
7
- ### Global Installation (recommended)
7
+ ## Why this fork?
8
8
 
9
- Installs the native Rust binary:
9
+ **agent-browser** launches a fresh browser with an empty profile. You need to log in again, and websites can detect it's automated.
10
10
 
11
- ```bash
12
- npm install -g agent-browser
13
- agent-browser install # Download Chrome from Chrome for Testing (first time only)
14
- ```
15
-
16
- ### Project Installation (local dependency)
17
-
18
- For projects that want to pin the version in `package.json`:
19
-
20
- ```bash
21
- npm install agent-browser
22
- agent-browser install
23
- ```
24
-
25
- Then use via `package.json` scripts or by invoking `agent-browser` directly.
26
-
27
- ### Homebrew (macOS)
28
-
29
- ```bash
30
- brew install agent-browser
31
- agent-browser install # Download Chrome from Chrome for Testing (first time only)
32
- ```
33
-
34
- ### Cargo (Rust)
35
-
36
- ```bash
37
- cargo install agent-browser
38
- agent-browser install # Download Chrome from Chrome for Testing (first time only)
39
- ```
40
-
41
- ### From Source
42
-
43
- ```bash
44
- git clone https://github.com/vercel-labs/agent-browser
45
- cd agent-browser
46
- pnpm install
47
- pnpm build
48
- pnpm build:native # Requires Rust (https://rustup.rs)
49
- pnpm link --global # Makes agent-browser available globally
50
- agent-browser install
51
- ```
52
-
53
- ### Linux Dependencies
54
-
55
- On Linux, install system dependencies:
56
-
57
- ```bash
58
- agent-browser install --with-deps
59
- ```
60
-
61
- ### Updating
62
-
63
- Upgrade to the latest version:
64
-
65
- ```bash
66
- agent-browser upgrade
67
- ```
68
-
69
- Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
70
-
71
- ### Requirements
72
-
73
- - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
74
- - **Rust** - Only needed when building from source (see From Source above).
75
-
76
- ## Quick Start
77
-
78
- ```bash
79
- agent-browser open example.com
80
- agent-browser snapshot # Get accessibility tree with refs
81
- agent-browser click @e2 # Click by ref from snapshot
82
- agent-browser fill @e3 "test@example.com" # Fill by ref
83
- agent-browser get text @e1 # Get text by ref
84
- agent-browser screenshot page.png
85
- agent-browser close
86
- ```
87
-
88
- ### Traditional Selectors (also supported)
89
-
90
- ```bash
91
- agent-browser click "#submit"
92
- agent-browser fill "#email" "test@example.com"
93
- agent-browser find role button click --name "Submit"
94
- ```
95
-
96
- ## Commands
97
-
98
- ### Core Commands
99
-
100
- ```bash
101
- agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
102
- agent-browser click <sel> # Click element (--new-tab to open in new tab)
103
- agent-browser dblclick <sel> # Double-click element
104
- agent-browser focus <sel> # Focus element
105
- agent-browser type <sel> <text> # Type into element
106
- agent-browser fill <sel> <text> # Clear and fill
107
- agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
108
- agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
109
- agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
110
- agent-browser keydown <key> # Hold key down
111
- agent-browser keyup <key> # Release key
112
- agent-browser hover <sel> # Hover element
113
- agent-browser select <sel> <val> # Select dropdown option
114
- agent-browser check <sel> # Check checkbox
115
- agent-browser uncheck <sel> # Uncheck checkbox
116
- agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>)
117
- agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
118
- agent-browser drag <src> <tgt> # Drag and drop
119
- agent-browser upload <sel> <files> # Upload files
120
- agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
121
- agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
122
- agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
123
- agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
124
- agent-browser pdf <path> # Save as PDF
125
- agent-browser snapshot # Accessibility tree with refs (best for AI)
126
- agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
127
- agent-browser connect <port> # Connect to browser via CDP
128
- agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
129
- agent-browser stream status # Show runtime streaming state and bound port
130
- agent-browser stream disable # Stop runtime WebSocket streaming
131
- agent-browser close # Close browser (aliases: quit, exit)
132
- agent-browser close --all # Close all active sessions
133
- ```
134
-
135
- ### Get Info
136
-
137
- ```bash
138
- agent-browser get text <sel> # Get text content
139
- agent-browser get html <sel> # Get innerHTML
140
- agent-browser get value <sel> # Get input value
141
- agent-browser get attr <sel> <attr> # Get attribute
142
- agent-browser get title # Get page title
143
- agent-browser get url # Get current URL
144
- agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging)
145
- agent-browser get count <sel> # Count matching elements
146
- agent-browser get box <sel> # Get bounding box
147
- agent-browser get styles <sel> # Get computed styles
148
- ```
149
-
150
- ### Check State
151
-
152
- ```bash
153
- agent-browser is visible <sel> # Check if visible
154
- agent-browser is enabled <sel> # Check if enabled
155
- agent-browser is checked <sel> # Check if checked
156
- ```
157
-
158
- ### Find Elements (Semantic Locators)
159
-
160
- ```bash
161
- agent-browser find role <role> <action> [value] # By ARIA role
162
- agent-browser find text <text> <action> # By text content
163
- agent-browser find label <label> <action> [value] # By label
164
- agent-browser find placeholder <ph> <action> [value] # By placeholder
165
- agent-browser find alt <text> <action> # By alt text
166
- agent-browser find title <text> <action> # By title attr
167
- agent-browser find testid <id> <action> [value] # By data-testid
168
- agent-browser find first <sel> <action> [value] # First match
169
- agent-browser find last <sel> <action> [value] # Last match
170
- agent-browser find nth <n> <sel> <action> [value] # Nth match
171
- ```
172
-
173
- **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
174
-
175
- **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
176
-
177
- **Examples:**
178
-
179
- ```bash
180
- agent-browser find role button click --name "Submit"
181
- agent-browser find text "Sign In" click
182
- agent-browser find label "Email" fill "test@test.com"
183
- agent-browser find first ".item" click
184
- agent-browser find nth 2 "a" text
185
- ```
186
-
187
- ### Wait
188
-
189
- ```bash
190
- agent-browser wait <selector> # Wait for element to be visible
191
- agent-browser wait <ms> # Wait for time (milliseconds)
192
- agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
193
- agent-browser wait --url "**/dash" # Wait for URL pattern
194
- agent-browser wait --load networkidle # Wait for load state
195
- agent-browser wait --fn "window.ready === true" # Wait for JS condition
196
-
197
- # Wait for text/element to disappear
198
- agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
199
- agent-browser wait "#spinner" --state hidden
200
- ```
201
-
202
- **Load states:** `load`, `domcontentloaded`, `networkidle`
203
-
204
- ### Batch Execution
205
-
206
- Execute multiple commands in a single invocation by piping a JSON array of
207
- string arrays to `batch`. This avoids per-command process startup overhead
208
- when running multi-step workflows.
209
-
210
- ```bash
211
- # Pipe commands as JSON
212
- echo '[
213
- ["open", "https://example.com"],
214
- ["snapshot", "-i"],
215
- ["click", "@e1"],
216
- ["screenshot", "result.png"]
217
- ]' | agent-browser batch --json
218
-
219
- # Stop on first error
220
- agent-browser batch --bail < commands.json
221
- ```
222
-
223
- ### Clipboard
224
-
225
- ```bash
226
- agent-browser clipboard read # Read text from clipboard
227
- agent-browser clipboard write "Hello, World!" # Write text to clipboard
228
- agent-browser clipboard copy # Copy current selection (Ctrl+C)
229
- agent-browser clipboard paste # Paste from clipboard (Ctrl+V)
230
- ```
231
-
232
- ### Mouse Control
233
-
234
- ```bash
235
- agent-browser mouse move <x> <y> # Move mouse
236
- agent-browser mouse down [button] # Press button (left/right/middle)
237
- agent-browser mouse up [button] # Release button
238
- agent-browser mouse wheel <dy> [dx] # Scroll wheel
239
- ```
240
-
241
- ### Browser Settings
242
-
243
- ```bash
244
- agent-browser set viewport <w> <h> [scale] # Set viewport size (scale for retina, e.g. 2)
245
- agent-browser set device <name> # Emulate device ("iPhone 14")
246
- agent-browser set geo <lat> <lng> # Set geolocation
247
- agent-browser set offline [on|off] # Toggle offline mode
248
- agent-browser set headers <json> # Extra HTTP headers
249
- agent-browser set credentials <u> <p> # HTTP basic auth
250
- agent-browser set media [dark|light] # Emulate color scheme
251
- ```
252
-
253
- ### Cookies & Storage
254
-
255
- ```bash
256
- agent-browser cookies # Get all cookies
257
- agent-browser cookies set <name> <val> # Set cookie
258
- agent-browser cookies clear # Clear cookies
259
-
260
- agent-browser storage local # Get all localStorage
261
- agent-browser storage local <key> # Get specific key
262
- agent-browser storage local set <k> <v> # Set value
263
- agent-browser storage local clear # Clear all
264
-
265
- agent-browser storage session # Same for sessionStorage
266
- ```
267
-
268
- ### Network
269
-
270
- ```bash
271
- agent-browser network route <url> # Intercept requests
272
- agent-browser network route <url> --abort # Block requests
273
- agent-browser network route <url> --body <json> # Mock response
274
- agent-browser network unroute [url] # Remove routes
275
- agent-browser network requests # View tracked requests
276
- agent-browser network requests --filter api # Filter requests
277
- agent-browser network requests --type xhr,fetch # Filter by resource type
278
- agent-browser network requests --method POST # Filter by HTTP method
279
- agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
280
- agent-browser network request <requestId> # View full request/response detail
281
- agent-browser network har start # Start HAR recording
282
- agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
283
- ```
284
-
285
- ### Tabs & Windows
286
-
287
- ```bash
288
- agent-browser tab # List tabs
289
- agent-browser tab new [url] # New tab (optionally with URL)
290
- agent-browser tab <n> # Switch to tab n
291
- agent-browser tab close [n] # Close tab
292
- agent-browser window new # New window
293
- ```
294
-
295
- ### Frames
296
-
297
- ```bash
298
- agent-browser frame <sel> # Switch to iframe
299
- agent-browser frame main # Back to main frame
300
- ```
301
-
302
- ### Dialogs
303
-
304
- ```bash
305
- agent-browser dialog accept [text] # Accept (with optional prompt text)
306
- agent-browser dialog dismiss # Dismiss
307
- agent-browser dialog status # Check if a dialog is currently open
308
- ```
309
-
310
- By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.
311
-
312
- When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
313
-
314
- ### Diff
315
-
316
- ```bash
317
- agent-browser diff snapshot # Compare current vs last snapshot
318
- agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
319
- agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
320
- agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
321
- agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
322
- agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
323
- agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
324
- agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
325
- agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
326
- agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
327
- ```
328
-
329
- ### Debug
330
-
331
- ```bash
332
- agent-browser trace start [path] # Start recording trace
333
- agent-browser trace stop [path] # Stop and save trace
334
- agent-browser profiler start # Start Chrome DevTools profiling
335
- agent-browser profiler stop [path] # Stop and save profile (.json)
336
- agent-browser console # View console messages (log, error, warn, info)
337
- agent-browser console --json # JSON output with raw CDP args for programmatic access
338
- agent-browser console --clear # Clear console
339
- agent-browser errors # View page errors (uncaught JavaScript exceptions)
340
- agent-browser errors --clear # Clear errors
341
- agent-browser highlight <sel> # Highlight element
342
- agent-browser inspect # Open Chrome DevTools for the active page
343
- agent-browser state save <path> # Save auth state
344
- agent-browser state load <path> # Load auth state
345
- agent-browser state list # List saved state files
346
- agent-browser state show <file> # Show state summary
347
- agent-browser state rename <old> <new> # Rename state file
348
- agent-browser state clear [name] # Clear states for session
349
- agent-browser state clear --all # Clear all saved states
350
- agent-browser state clean --older-than <days> # Delete old states
351
- ```
352
-
353
- ### Navigation
354
-
355
- ```bash
356
- agent-browser back # Go back
357
- agent-browser forward # Go forward
358
- agent-browser reload # Reload page
359
- ```
360
-
361
- ### Setup
362
-
363
- ```bash
364
- agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
365
- agent-browser install --with-deps # Also install system deps (Linux)
366
- agent-browser upgrade # Upgrade agent-browser to the latest version
367
- ```
368
-
369
- ## Authentication
370
-
371
- agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.
372
-
373
- ### Quick summary
374
-
375
- | Approach | Best for | Flag / Env |
376
- |----------|----------|------------|
377
- | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
378
- | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
379
- | **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
380
- | **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
381
- | **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |
382
-
383
- ### Import auth from your browser
384
-
385
- If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:
386
-
387
- ```bash
388
- # 1. Launch Chrome with remote debugging enabled
389
- # macOS:
390
- "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
391
- # Or use --auto-connect to discover an already-running Chrome
392
-
393
- # 2. Connect and save the authenticated state
394
- agent-browser --auto-connect state save ./my-auth.json
395
-
396
- # 3. Use the saved auth in future sessions
397
- agent-browser --state ./my-auth.json open https://app.example.com/dashboard
398
-
399
- # 4. Or use --session-name for automatic persistence
400
- agent-browser --session-name myapp state load ./my-auth.json
401
- # From now on, --session-name myapp auto-saves/restores this state
402
- ```
403
-
404
- > **Security notes:**
405
- > - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
406
- > - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).
407
-
408
- For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.
409
-
410
- ## Sessions
411
-
412
- Run multiple isolated browser instances:
413
-
414
- ```bash
415
- # Different sessions
416
- agent-browser --session agent1 open site-a.com
417
- agent-browser --session agent2 open site-b.com
418
-
419
- # Or via environment variable
420
- AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
421
-
422
- # List active sessions
423
- agent-browser session list
424
- # Output:
425
- # Active sessions:
426
- # -> default
427
- # agent1
428
-
429
- # Show current session
430
- agent-browser session
431
- ```
432
-
433
- Each session has its own:
434
-
435
- - Browser instance
436
- - Cookies and storage
437
- - Navigation history
438
- - Authentication state
439
-
440
- ## Persistent Profiles
441
-
442
- By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
443
-
444
- ```bash
445
- # Use a persistent profile directory
446
- agent-browser --profile ~/.myapp-profile open myapp.com
447
-
448
- # Login once, then reuse the authenticated session
449
- agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
450
-
451
- # Or via environment variable
452
- AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
453
- ```
454
-
455
- The profile directory stores:
456
-
457
- - Cookies and localStorage
458
- - IndexedDB data
459
- - Service workers
460
- - Browser cache
461
- - Login sessions
462
-
463
- **Tip**: Use different profile paths for different projects to keep their browser state isolated.
464
-
465
- ## Session Persistence
466
-
467
- Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
468
-
469
- ```bash
470
- # Auto-save/load state for "twitter" session
471
- agent-browser --session-name twitter open twitter.com
472
-
473
- # Login once, then state persists automatically
474
- # State files stored in ~/.agent-browser/sessions/
475
-
476
- # Or via environment variable
477
- export AGENT_BROWSER_SESSION_NAME=twitter
478
- agent-browser open twitter.com
479
- ```
480
-
481
- ### State Encryption
482
-
483
- Encrypt saved session data at rest with AES-256-GCM:
484
-
485
- ```bash
486
- # Generate key: openssl rand -hex 32
487
- export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
488
-
489
- # State files are now encrypted automatically
490
- agent-browser --session-name secure open example.com
491
- ```
492
-
493
- | Variable | Description |
494
- | --------------------------------- | -------------------------------------------------- |
495
- | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
496
- | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
497
- | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
498
-
499
- ## Security
500
-
501
- agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
502
-
503
- - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
504
- - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
505
- - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
506
- - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
507
- - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
508
- - **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
509
-
510
- | Variable | Description |
511
- | ----------------------------------- | ---------------------------------------- |
512
- | `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers |
513
- | `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output |
514
- | `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns |
515
- | `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file |
516
- | `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation |
517
- | `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts |
518
-
519
- See [Security documentation](https://agent-browser.dev/security) for details.
520
-
521
- ## Snapshot Options
522
-
523
- The `snapshot` command supports filtering to reduce output size:
11
+ **agent-browser-stealth** connects to your existing Chrome. Your cookies, sessions, and browser fingerprint are all real — because it IS your real browser.
524
12
 
525
- ```bash
526
- agent-browser snapshot # Full accessibility tree
527
- agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
528
- agent-browser snapshot -c # Compact (remove empty structural elements)
529
- agent-browser snapshot -d 3 # Limit depth to 3 levels
530
- agent-browser snapshot -s "#main" # Scope to CSS selector
531
- agent-browser snapshot -i -c -d 5 # Combine options
532
- ```
533
-
534
- | Option | Description |
535
- | ---------------------- | ----------------------------------------------------------------------- |
536
- | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
537
- | `-c, --compact` | Remove empty structural elements |
538
- | `-d, --depth <n>` | Limit tree depth |
539
- | `-s, --selector <sel>` | Scope to CSS selector |
540
-
541
- ## Annotated Screenshots
542
-
543
- The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
544
-
545
- Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.
546
-
547
- ```bash
548
- agent-browser screenshot --annotate
549
- # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
550
- # [1] @e1 button "Submit"
551
- # [2] @e2 link "Home"
552
- # [3] @e3 textbox "Email"
553
- ```
13
+ | | agent-browser | agent-browser-stealth |
14
+ |---|---|---|
15
+ | Browser | Launches new Chrome | Connects to your Chrome |
16
+ | Login state | Empty, need to re-login | Your existing sessions |
17
+ | Fingerprint | Automation markers present | Your real fingerprint |
18
+ | User collaboration | Separate window | Same window, take over anytime |
19
+ | CAPTCHA | Agent stuck | You solve it, agent continues |
554
20
 
555
- After an annotated screenshot, refs are cached so you can immediately interact with elements:
21
+ ## Install
556
22
 
557
23
  ```bash
558
- agent-browser screenshot --annotate ./page.png
559
- agent-browser click @e2 # Click the "Home" link labeled [2]
24
+ npm install -g agent-browser-stealth
560
25
  ```
561
26
 
562
- This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
563
-
564
- ## Options
565
-
566
- | Option | Description |
567
- |--------|-------------|
568
- | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
569
- | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
570
- | `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
571
- | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
572
- | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
573
- | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
574
- | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
575
- | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
576
- | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
577
- | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
578
- | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
579
- | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
580
- | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
581
- | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
582
- | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
583
- | `--json` | JSON output (for agents) |
584
- | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
585
- | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
586
- | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
587
- | `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
588
- | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
589
- | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
590
- | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
591
- | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
592
- | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
593
- | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
594
- | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
595
- | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
596
- | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
597
- | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
598
- | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
599
- | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
600
- | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
601
- | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
602
- | `--debug` | Debug output |
603
-
604
- ## Observability Dashboard
27
+ ### Install the AI agent skills
605
28
 
606
- Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed.
29
+ The repo ships SKILL.md files for Claude Code, Cursor, etc. Pull them into the current project with [skills.sh](https://skills.sh):
607
30
 
608
31
  ```bash
609
- # Install the dashboard (one time)
610
- agent-browser dashboard install
611
-
612
- # Start the dashboard server (runs in background on port 4848)
613
- agent-browser dashboard start
614
- agent-browser dashboard start --port 8080 # Custom port
615
-
616
- # All sessions are automatically visible in the dashboard
617
- agent-browser open example.com
618
-
619
- # Stop the dashboard
620
- agent-browser dashboard stop
32
+ npx skills add leeguooooo/agent-browser-stealth
621
33
  ```
622
34
 
623
- The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running. All sessions automatically stream to the dashboard.
624
-
625
- The dashboard displays:
626
- - **Live viewport** -- real-time JPEG frames from the browser
627
- - **Activity feed** -- chronological command/result stream with timing and expandable details
628
- - **Console output** -- browser console messages (log, warn, error)
629
- - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
35
+ This drops `skills/agent-browser` (and the specialized `skill-data/{core,electron,slack,dogfood,agentcore,vercel-sandbox}`) into your project so your AI agent gets the right usage patterns and pre-approved bash permissions for `agent-browser`, `agent-browser-stealth`, and `abs`.
630
36
 
631
- ## Configuration
37
+ ## Setup (one time)
632
38
 
633
- Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
39
+ Enable Chrome DevTools Protocol in your Chrome:
634
40
 
635
- **Locations (lowest to highest priority):**
41
+ 1. Open `chrome://inspect/#remote-debugging` in Chrome
42
+ 2. Toggle the switch on
636
43
 
637
- 1. `~/.agent-browser/config.json` -- user-level defaults
638
- 2. `./agent-browser.json` -- project-level overrides (in working directory)
639
- 3. `AGENT_BROWSER_*` environment variables override config file values
640
- 4. CLI flags override everything
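The precedence list above can be read as a layered merge in which later layers win. The following is a minimal TypeScript sketch under that reading, not agent-browser's actual implementation: `resolveConfig` and the `Config` type are hypothetical names, and each layer is assumed to have been parsed into a plain object keyed by option name beforehand.

```typescript
// Hypothetical names for illustration; not agent-browser's real code.
type Config = Record<string, unknown>;

// Merge the four layers from lowest to highest priority.
// Later spreads overwrite earlier ones, so CLI flags override everything.
function resolveConfig(
  userFile: Config,    // ~/.agent-browser/config.json
  projectFile: Config, // ./agent-browser.json
  env: Config,         // values derived from AGENT_BROWSER_* variables
  cliFlags: Config     // flags passed on the command line
): Config {
  return { ...userFile, ...projectFile, ...env, ...cliFlags };
}

// The project file overrides the user-level proxy; the CLI flag overrides headed.
const effective = resolveConfig(
  { headed: true, proxy: "http://localhost:8080" },
  { proxy: "http://localhost:9090" },
  {},
  { headed: false }
);
console.log(effective);
```

A flat spread replaces values wholesale, so any option that should be combined across layers rather than overwritten would need separate handling.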
44
+ That's it. This setting persists across Chrome restarts.
641
45
 
642
- **Example `agent-browser.json`:**
643
-
644
- ```json
645
- {
646
- "headed": true,
647
- "proxy": "http://localhost:8080",
648
- "profile": "./browser-data",
649
- "userAgent": "my-agent/1.0",
650
- "ignoreHttpsErrors": true
651
- }
652
- ```
653
-
654
- Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
46
+ ## Usage
655
47
 
656
48
  ```bash
657
- agent-browser --config ./ci-config.json open example.com
658
- AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
659
- ```
660
-
661
- All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
662
-
663
- Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
664
-
665
- Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
666
-
667
- > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
668
-
669
- ## Default Timeout
670
-
671
- The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN.
672
-
673
- Override the default timeout via environment variable:
674
-
675
- ```bash
676
- # Set a longer timeout for slow pages (in milliseconds)
677
- export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
678
- ```
679
-
680
- > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.
681
-
682
- | Variable | Description |
683
- | ------------------------------- | ---------------------------------------- |
684
- | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |
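The relationship between the 25-second default and the 30-second IPC read timeout can be checked up front. This is a small sketch, assuming invalid or missing values fall back to the default (the README does not say how the real CLI treats them); `resolveDefaultTimeout` is a hypothetical helper.

```typescript
const DEFAULT_TIMEOUT_MS = 25000;  // default operation timeout stated above
const IPC_READ_TIMEOUT_MS = 30000; // CLI read timeout stated above

// Hypothetical helper: interpret AGENT_BROWSER_DEFAULT_TIMEOUT and flag values
// that exceed the IPC read timeout (the case the note above warns about).
function resolveDefaultTimeout(
  raw: string | undefined
): { timeoutMs: number; exceedsIpcTimeout: boolean } {
  const parsed = raw === undefined ? NaN : Number(raw);
  // Assumption: fall back to the default for missing or non-positive input.
  const timeoutMs = isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
  return { timeoutMs, exceedsIpcTimeout: timeoutMs > IPC_READ_TIMEOUT_MS };
}
```

Calling it with `"45000"` reports `exceedsIpcTimeout: true`, which is the situation where EAGAIN retries become likely.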
685
-
686
- ## Selectors
687
-
688
- ### Refs (Recommended for AI)
689
-
690
- Refs provide deterministic element selection from snapshots:
691
-
692
- ```bash
693
- # 1. Get snapshot with refs
694
- agent-browser snapshot
695
- # Output:
696
- # - heading "Example Domain" [ref=e1] [level=1]
697
- # - button "Submit" [ref=e2]
698
- # - textbox "Email" [ref=e3]
699
- # - link "Learn more" [ref=e4]
700
-
701
- # 2. Use refs to interact
702
- agent-browser click @e2 # Click the button
703
- agent-browser fill @e3 "test@example.com" # Fill the textbox
704
- agent-browser get text @e1 # Get heading text
705
- agent-browser hover @e4 # Hover the link
706
- ```
707
-
708
- **Why use refs?**
709
-
710
- - **Deterministic**: A ref points to the exact element captured in the snapshot
711
- - **Fast**: No DOM re-query needed
712
- - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
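To make the snapshot-then-ref workflow concrete, here is a rough sketch that extracts refs from the plain-text snapshot format shown in the example output above. It assumes every element line looks like `- role "name" [ref=eN]`; the real format may contain lines this pattern does not cover, and `parseSnapshotRefs` is a hypothetical name.

```typescript
interface SnapshotRef {
  ref: string;  // e.g. "@e2", ready to pass to click/fill
  role: string; // e.g. "button"
  name: string; // accessible name, empty if the line has none
}

// Hypothetical parser for the text snapshot lines shown above.
function parseSnapshotRefs(snapshot: string): SnapshotRef[] {
  // - <role> "<name>" ... [ref=<id>]
  const pattern = /^\s*-\s+(\S+)(?:\s+"([^"]*)")?.*?\[ref=([A-Za-z0-9]+)\]/;
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      refs.push({ role: match[1], name: match[2] || "", ref: "@" + match[3] });
    }
  }
  return refs;
}

const sample = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- button "Submit" [ref=e2]',
  '- textbox "Email" [ref=e3]',
  '- link "Learn more" [ref=e4]',
].join("\n");

const found = parseSnapshotRefs(sample);
```

For programmatic use, the `--json` output is likely a sturdier source of refs than scraping the text form.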
713
-
714
- ### CSS Selectors
715
-
716
- ```bash
717
- agent-browser click "#id"
718
- agent-browser click ".class"
719
- agent-browser click "div > button"
720
- ```
721
-
722
- ### Text & XPath
723
-
724
- ```bash
725
- agent-browser click "text=Submit"
726
- agent-browser click "xpath=//button"
727
- ```
728
-
729
- ### Semantic Locators
730
-
731
- ```bash
732
- agent-browser find role button click --name "Submit"
733
- agent-browser find label "Email" fill "test@test.com"
734
- ```
735
-
736
- ## Agent Mode
737
-
738
- Use `--json` for machine-readable output:
739
-
740
- ```bash
741
- agent-browser snapshot --json
742
- # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
743
-
744
- agent-browser get text @e1 --json
745
- agent-browser is visible @e2 --json
746
- ```
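A sketch of how a caller might consume the `--json` snapshot shape shown above. Only the fields visible in that example (`success`, `data.snapshot`, `data.refs` with `role` and `name`) are assumed; `findRef` is a hypothetical helper, and the response may carry other fields that are not modeled here.

```typescript
interface RefInfo {
  role: string;
  name?: string;
}

interface SnapshotResponse {
  success: boolean;
  data?: { snapshot: string; refs: Record<string, RefInfo> };
}

// Hypothetical helper: return "@<id>" for the first ref matching role and name.
function findRef(rawJson: string, role: string, name: string): string | undefined {
  const parsed = JSON.parse(rawJson) as SnapshotResponse;
  if (!parsed.success || !parsed.data) {
    throw new Error("snapshot command did not succeed");
  }
  const refs = parsed.data.refs;
  for (const id of Object.keys(refs)) {
    if (refs[id].role === role && refs[id].name === name) {
      return "@" + id;
    }
  }
  return undefined;
}

const exampleOutput =
  '{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"}}}}';
const headingRef = findRef(exampleOutput, "heading", "Title");
```

The returned string can be passed straight to commands such as `agent-browser get text`.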
747
-
748
- ### Optimal AI Workflow
749
-
750
- ```bash
751
- # 1. Navigate and get snapshot
752
- agent-browser open example.com
753
- agent-browser snapshot -i --json # AI parses tree and refs
754
-
755
- # 2. AI identifies target refs from snapshot
756
- # 3. Execute actions using refs
757
- agent-browser click @e2
758
- agent-browser fill @e3 "input text"
759
-
760
- # 4. Get new snapshot if page changed
761
- agent-browser snapshot -i --json
762
- ```
763
-
764
- ### Command Chaining
765
-
766
- Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
767
-
768
- ```bash
769
- # Open, wait for load, and snapshot in one call
770
- agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
771
-
772
- # Chain multiple interactions
773
- agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
774
-
775
- # Navigate and screenshot
776
- agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
777
- ```
778
-
779
- Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
780
-
781
- ## Headed Mode
782
-
783
- Show the browser window for debugging:
784
-
785
- ```bash
786
- agent-browser open example.com --headed
787
- ```
788
-
789
- This opens a visible browser window instead of running headless.
790
-
791
- > **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).
792
-
793
- ## Authenticated Sessions
794
-
795
- Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
796
-
797
- ```bash
798
- # Headers are scoped to api.example.com only
799
- agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
800
-
801
- # Requests to api.example.com include the auth header
802
- agent-browser snapshot -i --json
803
- agent-browser click @e2
804
-
805
- # Navigate to another domain - headers are NOT sent (safe!)
806
- agent-browser open other-site.com
807
- ```
808
-
809
- This is useful for:
810
-
811
- - **Skipping login flows** - Authenticate via headers instead of UI
812
- - **Switching users** - Start new sessions with different auth tokens
813
- - **API testing** - Access protected endpoints directly
814
- - **Security** - Headers are scoped to the origin, not leaked to other domains
815
-
816
- To set headers for multiple origins, use `--headers` with each `open` command:
817
-
818
- ```bash
819
- agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
820
- agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
821
- ```
822
-
823
- For global headers (all domains), use `set headers`:
824
-
825
- ```bash
826
- agent-browser set headers '{"X-Custom-Header": "value"}'
827
- ```
828
-
829
- ## Custom Browser Executable
830
-
831
- Use a custom browser executable instead of the bundled Chromium. This is useful for:
832
-
833
- - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
834
- - **System browsers**: Use an existing Chrome/Chromium installation
835
- - **Custom builds**: Use modified browser builds
836
-
837
- ### CLI Usage
838
-
839
- ```bash
840
- # Via flag
841
- agent-browser --executable-path /path/to/chromium open example.com
842
-
843
- # Via environment variable
844
- AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
845
- ```
846
-
847
- ### Serverless (Vercel)
848
-
849
- Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:
850
-
851
- ```typescript
852
- import { Sandbox } from "@vercel/sandbox";
853
-
854
- const sandbox = await Sandbox.create({ runtime: "node24" });
855
- await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
856
- const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
857
- await sandbox.stop();
858
- ```
859
-
860
- See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.
861
-
862
- ### Serverless (AWS Lambda)
863
-
864
- ```typescript
865
- import chromium from '@sparticuz/chromium';
866
- import { execSync } from 'child_process';
867
-
868
- export async function handler() {
869
- const executablePath = await chromium.executablePath();
870
- const result = execSync(
871
- `AGENT_BROWSER_EXECUTABLE_PATH=${executablePath} agent-browser open https://example.com && agent-browser snapshot -i --json`,
872
- { encoding: 'utf-8' }
873
- );
874
- return JSON.parse(result);
875
- }
876
- ```
877
-
878
- ## Local Files
879
-
880
- Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
881
-
882
- ```bash
883
- # Enable file access (required for JavaScript to access local files)
884
- agent-browser --allow-file-access open file:///path/to/document.pdf
885
- agent-browser --allow-file-access open file:///path/to/page.html
886
-
887
- # Take screenshot of a local PDF
888
- agent-browser --allow-file-access open file:///Users/me/report.pdf
889
- agent-browser screenshot report.png
890
- ```
891
-
892
- The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
893
-
894
- - Load and render local files
895
- - Access other local files via JavaScript (XHR, fetch)
896
- - Load local resources (images, scripts, stylesheets)
897
-
898
- **Note:** This flag only works with Chromium. For security, it's disabled by default.
899
-
900
- ## CDP Mode
901
-
902
- Connect to an existing browser via Chrome DevTools Protocol:
903
-
904
- ```bash
905
- # Start Chrome with: google-chrome --remote-debugging-port=9222
906
-
907
- # Connect once, then run commands without --cdp
908
- agent-browser connect 9222
909
- agent-browser snapshot
910
- agent-browser tab
911
- agent-browser close
912
-
913
- # Or pass --cdp on each command
914
- agent-browser --cdp 9222 snapshot
915
-
916
- # Connect to remote browser via WebSocket URL
917
- agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
918
- ```
919
-
920
- The `--cdp` flag accepts either:
921
-
922
- - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
923
- - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
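The two accepted forms can be told apart with a small classifier. This is an illustrative sketch only: `parseCdpTarget` is a hypothetical name, and rejecting every other input (for example an `http://` URL) is an assumption, since the README documents only ports and WebSocket URLs.

```typescript
type CdpTarget =
  | { kind: "port"; httpUrl: string }
  | { kind: "websocket"; wsUrl: string };

// Hypothetical classifier for the value passed to --cdp.
function parseCdpTarget(value: string): CdpTarget {
  if (/^\d+$/.test(value)) {
    const port = Number(value);
    if (port < 1 || port > 65535) {
      throw new Error("Port out of range: " + value);
    }
    // Local connections go through http://localhost:{port}, as described above.
    return { kind: "port", httpUrl: "http://localhost:" + port };
  }
  if (/^wss?:\/\//i.test(value)) {
    // Remote browser services are addressed by a full ws:// or wss:// URL.
    return { kind: "websocket", wsUrl: value };
  }
  throw new Error("Unsupported --cdp value: " + value);
}
```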
924
-
925
- This enables control of:
926
-
927
- - Electron apps
928
- - Chrome/Chromium instances with remote debugging
929
- - WebView2 applications
930
- - Any browser exposing a CDP endpoint
931
-
932
- ### Auto-Connect
933
-
934
- Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
935
-
936
- ```bash
937
- # Auto-discover running Chrome with remote debugging
938
- agent-browser --auto-connect open example.com
939
- agent-browser --auto-connect snapshot
940
-
941
- # Or via environment variable
942
- AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
943
- ```
944
-
945
- Auto-connect discovers Chrome by:
946
-
947
- 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
948
- 2. Falling back to probing common debugging ports (9222, 9229)
949
- 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
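Step 1 of the list above depends on Chrome's `DevToolsActivePort` file. The sketch below assumes the commonly observed layout of that file (debugging port on the first line, browser WebSocket path on the second) and a loopback host; it is not taken from agent-browser's source, and `parseDevToolsActivePort` is a hypothetical name.

```typescript
// Hypothetical parser for the contents of Chrome's DevToolsActivePort file.
// Assumed layout: line 1 = port, line 2 = WebSocket path such as /devtools/browser/<id>.
function parseDevToolsActivePort(contents: string): { port: number; wsUrl: string } {
  const lines = contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  const port = Number(lines[0]);
  if (!isFinite(port) || Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error("DevToolsActivePort does not start with a valid port");
  }
  const path = lines[1];
  if (!path || path.charAt(0) !== "/") {
    throw new Error("DevToolsActivePort is missing the WebSocket path");
  }
  // Assumption: the browser listens on the loopback interface.
  return { port, wsUrl: "ws://127.0.0.1:" + port + path };
}
```

Reading the file itself is left out because its location varies by platform and by user data directory.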
950
-
951
- This is useful when:
952
-
953
- - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
954
- - You want a zero-configuration connection to your existing browser
955
- - You don't want to track which port Chrome is using
956
-
957
- ## Streaming (Browser Preview)
958
-
959
- Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
960
-
961
- ### Streaming
962
-
963
- Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:
964
-
965
- ```bash
966
- agent-browser stream status
967
- ```
968
-
969
- To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:
970
-
971
- ```bash
972
- AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
973
- ```
974
-
975
- You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:
976
-
977
- ```bash
978
- agent-browser stream enable --port 9223 # Re-enable on a specific port
979
- agent-browser stream disable # Stop streaming for the session
980
- ```
981
-
982
- The WebSocket server streams the browser viewport and accepts input events.
983
-
984
- ### WebSocket Protocol
985
-
986
- Connect to `ws://localhost:9223` to receive frames and send input:
987
-
988
- **Receive frames:**
989
-
990
- ```json
991
- {
992
- "type": "frame",
993
- "data": "<base64-encoded-jpeg>",
994
- "metadata": {
995
- "deviceWidth": 1280,
996
- "deviceHeight": 720,
997
- "pageScaleFactor": 1,
998
- "offsetTop": 0,
999
- "scrollOffsetX": 0,
1000
- "scrollOffsetY": 0
1001
- }
1002
- }
1003
- ```
1004
-
1005
- **Send mouse events:**
1006
-
1007
- ```json
1008
- {
1009
- "type": "input_mouse",
1010
- "eventType": "mousePressed",
1011
- "x": 100,
1012
- "y": 200,
1013
- "button": "left",
1014
- "clickCount": 1
1015
- }
1016
- ```
1017
-
1018
- **Send keyboard events:**
1019
-
1020
- ```json
1021
- {
1022
- "type": "input_keyboard",
1023
- "eventType": "keyDown",
1024
- "key": "Enter",
1025
- "code": "Enter"
1026
- }
1027
- ```
1028
-
1029
- **Send touch events:**
1030
-
1031
- ```json
1032
- {
1033
- "type": "input_touch",
1034
- "eventType": "touchStart",
1035
- "touchPoints": [{ "x": 100, "y": 200 }]
1036
- }
1037
- ```
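The message shapes above can be typed so that a preview client handles them safely. This is a hedged sketch: the interfaces mirror only the fields shown in the JSON examples, `parseFrame` and `mousePressedAt` are hypothetical helpers, and the assumption that `x`/`y` are expressed in the same coordinate space as `deviceWidth`/`deviceHeight` is not confirmed by the README.

```typescript
interface FrameMetadata {
  deviceWidth: number;
  deviceHeight: number;
  pageScaleFactor: number;
  offsetTop: number;
  scrollOffsetX: number;
  scrollOffsetY: number;
}

interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: FrameMetadata;
}

interface MouseInput {
  type: "input_mouse";
  eventType: "mousePressed";
  x: number;
  y: number;
  button: "left";
  clickCount: number;
}

// Return the message if it is a frame, otherwise undefined.
function parseFrame(raw: string): FrameMessage | undefined {
  const msg = JSON.parse(raw);
  if (msg && msg.type === "frame" && typeof msg.data === "string" && msg.metadata) {
    return msg as FrameMessage;
  }
  return undefined;
}

// Map a click on a scaled preview (previewWidth x previewHeight) back to
// device coordinates. Assumption: input coordinates use the device viewport space.
function mousePressedAt(
  previewX: number,
  previewY: number,
  previewWidth: number,
  previewHeight: number,
  meta: FrameMetadata
): MouseInput {
  return {
    type: "input_mouse",
    eventType: "mousePressed",
    x: Math.round((previewX * meta.deviceWidth) / previewWidth),
    y: Math.round((previewY * meta.deviceHeight) / previewHeight),
    button: "left",
    clickCount: 1,
  };
}
```

A client would serialize the returned object with `JSON.stringify` and send it over the same WebSocket that delivers the frames.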
1038
-
1039
- ## Architecture
1040
-
1041
- agent-browser uses a client-daemon architecture:
1042
-
1043
- 1. **Rust CLI** - Parses commands, communicates with daemon
1044
- 2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required
1045
-
1046
- The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.
1047
-
1048
- **Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome (via CDP) and Safari (via WebDriver for iOS).
1049
-
1050
- ## Platforms
1051
-
1052
- | Platform | Binary |
1053
- | ----------- | ----------- |
1054
- | macOS ARM64 | Native Rust |
1055
- | macOS x64 | Native Rust |
1056
- | Linux ARM64 | Native Rust |
1057
- | Linux x64 | Native Rust |
1058
- | Windows x64 | Native Rust |
1059
-
1060
- ## Usage with AI Agents
1061
-
1062
- ### Just ask the agent
1063
-
1064
- The simplest approach -- just tell your agent to use it:
1065
-
1066
- ```
1067
- Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
1068
- ```
1069
-
1070
- The `--help` output is comprehensive and most agents can figure it out from there.
1071
-
1072
- ### AI Coding Assistants (recommended)
1073
-
1074
- Add the skill to your AI coding assistant for richer context:
1075
-
1076
- ```bash
1077
- npx skills add vercel-labs/agent-browser
1078
- ```
1079
-
1080
- This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
1081
-
1082
- ### Claude Code
1083
-
1084
- Install as a Claude Code skill:
1085
-
1086
- ```bash
1087
- npx skills add vercel-labs/agent-browser
1088
- ```
1089
-
1090
- This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
1091
-
1092
- ### AGENTS.md / CLAUDE.md
1093
-
1094
- For more consistent results, add to your project or global instructions file:
1095
-
1096
- ```markdown
1097
- ## Browser Automation
1098
-
1099
- Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1100
-
1101
- Core workflow:
1102
-
1103
- 1. `agent-browser open <url>` - Navigate to page
1104
- 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1105
- 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1106
- 4. Re-snapshot after page changes
1107
- ```
1108
-
1109
- ## Integrations
1110
-
1111
- ### iOS Simulator
1112
-
1113
- Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1114
-
1115
- **Setup:**
1116
-
1117
- ```bash
1118
- # Install Appium and XCUITest driver
1119
- npm install -g appium
1120
- appium driver install xcuitest
1121
- ```
1122
-
1123
- **Usage:**
1124
-
1125
- ```bash
1126
- # List available iOS simulators
1127
- agent-browser device list
1128
-
1129
- # Launch Safari on a specific device
1130
- agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1131
-
1132
- # Same commands as desktop
1133
- agent-browser -p ios snapshot -i
1134
- agent-browser -p ios tap @e1
1135
- agent-browser -p ios fill @e2 "text"
1136
- agent-browser -p ios screenshot mobile.png
1137
-
1138
- # Mobile-specific commands
1139
- agent-browser -p ios swipe up
1140
- agent-browser -p ios swipe down 500
1141
-
1142
- # Close session
1143
- agent-browser -p ios close
1144
- ```
1145
-
1146
- Or use environment variables:
1147
-
1148
- ```bash
1149
- export AGENT_BROWSER_PROVIDER=ios
1150
- export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
49
+ # Connect to your Chrome and navigate
1151
50
  agent-browser open https://example.com
1152
- ```
1153
-
1154
- | Variable | Description |
1155
- | -------------------------- | ----------------------------------------------- |
1156
- | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
1157
- | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1158
- | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
1159
51
 
1160
- **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1161
-
1162
- **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1163
-
1164
- #### Real Device Support
1165
-
1166
- Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1167
-
1168
- **1. Get your device UDID:**
1169
-
1170
- ```bash
1171
- xcrun xctrace list devices
1172
- # or
1173
- system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
52
+ # Everything works through your logged-in browser
53
+ agent-browser click "Post"
54
+ agent-browser fill "Title" "Hello World"
55
+ agent-browser screenshot ./page.png
1174
56
  ```
1175
57
 
1176
- **2. Sign WebDriverAgent (one-time):**
58
+ The agent operates in your Chrome — you'll see tabs opening, pages loading, clicks happening in real time. You can take over at any point (e.g. solve a CAPTCHA), then let the agent continue.
1177
59
 
1178
- ```bash
1179
- # Open the WebDriverAgent Xcode project
1180
- cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1181
- open WebDriverAgent.xcodeproj
1182
- ```
60
+ ### Standalone mode
1183
61
 
1184
- In Xcode:
1185
-
1186
- - Select the `WebDriverAgentRunner` target
1187
- - Go to Signing & Capabilities
1188
- - Select your Team (requires Apple Developer account, free tier works)
1189
- - Let Xcode manage signing automatically
1190
-
1191
- **3. Use with agent-browser:**
62
+ If you need a separate browser (CI, testing, etc.):
1192
63
 
1193
64
  ```bash
1194
- # Connect device via USB, then:
1195
- agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1196
-
1197
- # Or use the device name if unique
1198
- agent-browser -p ios --device "John's iPhone" open https://example.com
65
+ agent-browser --launch open https://example.com
1199
66
  ```
1200
67
 
1201
- **Real device notes:**
1202
-
1203
- - First run installs WebDriverAgent to the device (may require Trust prompt)
1204
- - Device must be unlocked and connected via USB
1205
- - Slightly slower initial connection than simulator
1206
- - Tests against real Safari performance and behavior
68
+ In CI environments, standalone mode is used automatically.
1207
69
 
1208
- ### Browserless
70
+ ## Anti-detection
1209
71
 
1210
- [Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available.
72
+ When connected to your real Chrome, we inject **zero** JavaScript patches. Your browser's fingerprint is completely genuine.
1211
73
 
1212
- To enable Browserless, use the `-p` flag:
1213
-
1214
- ```bash
1215
- export BROWSERLESS_API_KEY="your-api-token"
1216
- agent-browser -p browserless open https://example.com
1217
- ```
74
+ The only thing we do is call `Emulation.setAutomationOverride` via CDP to set `navigator.webdriver = false` at the native Chrome level — undetectable by lie-detection systems like CreepJS.
1218
75
 
1219
- Or use environment variables for CI/scripts:
76
+ **Test results (connected to real Chrome):**
1220
77
 
1221
- ```bash
1222
- export AGENT_BROWSER_PROVIDER=browserless
1223
- export BROWSERLESS_API_KEY="your-api-token"
1224
- agent-browser open https://example.com
1225
- ```
1226
-
1227
- Optional configuration via environment variables:
1228
-
1229
- | Variable | Description | Default |
1230
- | -------------------------- | ------------------------------------------------ | --------------------------------------- |
1231
- | `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
1232
- | `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (chromium or chrome) | chromium |
1233
- | `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` |
1234
- | `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` |
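The defaults in the table above amount to a simple environment lookup with fallbacks. A minimal sketch, assuming empty values are treated as unset and that only the literal string `true` enables stealth; `readBrowserlessSettings` is a hypothetical helper, not part of agent-browser.

```typescript
interface BrowserlessSettings {
  apiUrl: string;
  browserType: string;
  ttlMs: number;
  stealth: boolean;
}

// Hypothetical helper: apply the documented defaults to an environment map.
function readBrowserlessSettings(
  env: Record<string, string | undefined>
): BrowserlessSettings {
  return {
    apiUrl: env.BROWSERLESS_API_URL || "https://production-sfo.browserless.io",
    browserType: env.BROWSERLESS_BROWSER_TYPE || "chromium",
    ttlMs: Number(env.BROWSERLESS_TTL || "300000"),
    // Assumption: any value other than "true" disables stealth.
    stealth: (env.BROWSERLESS_STEALTH || "true") === "true",
  };
}
```

In a Node.js script the function would typically be called with `process.env`.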
1235
-
1236
- When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.
1237
-
1238
- Get your API token from the [Browserless Dashboard](https://browserless.io).
1239
-
1240
- ### Browserbase
1241
-
1242
- [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
1243
-
1244
- To enable Browserbase, use the `-p` flag:
1245
-
1246
- ```bash
1247
- export BROWSERBASE_API_KEY="your-api-key"
1248
- agent-browser -p browserbase open https://example.com
1249
- ```
1250
-
1251
- Or use environment variables for CI/scripts:
1252
-
1253
- ```bash
1254
- export AGENT_BROWSER_PROVIDER=browserbase
1255
- export BROWSERBASE_API_KEY="your-api-key"
1256
- agent-browser open https://example.com
1257
- ```
1258
-
1259
- When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
1260
-
1261
- Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).
1262
-
1263
- ### Browser Use
1264
-
1265
- [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
1266
-
1267
- To enable Browser Use, use the `-p` flag:
1268
-
1269
- ```bash
1270
- export BROWSER_USE_API_KEY="your-api-key"
1271
- agent-browser -p browseruse open https://example.com
1272
- ```
1273
-
1274
- Or use environment variables for CI/scripts:
1275
-
1276
- ```bash
1277
- export AGENT_BROWSER_PROVIDER=browseruse
1278
- export BROWSER_USE_API_KEY="your-api-key"
1279
- agent-browser open https://example.com
1280
- ```
1281
-
1282
- When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
1283
-
1284
- Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
1285
-
1286
- ### Kernel
1287
-
1288
- [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
1289
-
1290
- To enable Kernel, use the `-p` flag:
1291
-
1292
- ```bash
1293
- export KERNEL_API_KEY="your-api-key"
1294
- agent-browser -p kernel open https://example.com
1295
- ```
1296
-
1297
- Or use environment variables for CI/scripts:
1298
-
1299
- ```bash
1300
- export AGENT_BROWSER_PROVIDER=kernel
1301
- export KERNEL_API_KEY="your-api-key"
1302
- agent-browser open https://example.com
1303
- ```
1304
-
1305
- Optional configuration via environment variables:
1306
-
1307
- | Variable | Description | Default |
1308
- | ------------------------ | -------------------------------------------------------------------------------- | ------- |
1309
- | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
1310
- | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
1311
- | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1312
- | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
1313
-
1314
- When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
1315
-
1316
- **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
1317
-
1318
- Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
1319
-
1320
- ### AgentCore
1321
-
1322
- [AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.
1323
-
1324
- To enable AgentCore, use the `-p` flag:
1325
-
1326
- ```bash
1327
- agent-browser -p agentcore open https://example.com
1328
- ```
1329
-
1330
- Or use environment variables for CI/scripts:
1331
-
1332
- ```bash
1333
- export AGENT_BROWSER_PROVIDER=agentcore
1334
- agent-browser open https://example.com
1335
- ```
78
+ | Test site | Result |
79
+ |---|---|
80
+ | [CreepJS](https://abrahamjuliot.github.io/creepjs/) | 0% stealth, 0% headless |
81
+ | [bot.sannysoft.com](https://bot.sannysoft.com) | All green |
82
+ | [Cloudflare Turnstile](https://nowsecure.nl) | Passed |
1336
83
 
1337
- Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles.
84
+ When using `--launch` mode (standalone browser), a full suite of 32 stealth patches is applied for headless Chrome.
1338
85
 
1339
- Optional configuration via environment variables:
86
+ ## Differences from upstream
1340
87
 
1341
- | Variable | Description | Default |
1342
- | -------------------------- | -------------------------------------------------------------------- | ---------------- |
1343
- | `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` |
1344
- | `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
1345
- | `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) |
1346
- | `AGENTCORE_SESSION_TIMEOUT`| Session timeout in seconds | `3600` |
1347
- | `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |
88
+ Based on [agent-browser v0.27.0](https://github.com/vercel-labs/agent-browser). Changes:
1348
89
 
1349
- **Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically.
90
+ - **Auto-connect is default**: `agent-browser open <url>` connects to your Chrome instead of launching a new one
91
+ - **CDP-native stealth** — `Emulation.setAutomationOverride` instead of JS patches
92
+ - **Dual stealth mode** — zero patches for real Chrome, full patches for `--launch` mode
93
+ - **`--launch` / `--new` flag** — explicitly start a standalone browser
94
+ - **CI auto-detection** — standalone mode when `CI` env var is set
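Taken together, the bullets above describe a three-way decision. The sketch below restates it in TypeScript; it is an interpretation of this README rather than the fork's actual code, `chooseBrowserMode` is a hypothetical name, and treating any non-empty `CI` value as "set" is an assumption.

```typescript
type BrowserMode = "auto-connect" | "standalone";

// Hypothetical decision function based on the list above.
function chooseBrowserMode(
  args: string[],
  env: Record<string, string | undefined>
): BrowserMode {
  // An explicit --launch / --new flag always starts a standalone browser.
  if (args.indexOf("--launch") !== -1 || args.indexOf("--new") !== -1) {
    return "standalone";
  }
  // CI auto-detection: standalone mode when the CI env var is set.
  if (env.CI !== undefined && env.CI !== "") {
    return "standalone";
  }
  // Default: connect to the user's running Chrome.
  return "auto-connect";
}
```

Per the stealth notes above, the chosen mode would also determine whether the full patch suite or only the CDP-level override applies.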
1350
95
 
1351
- When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically.
96
+ All upstream features (commands, snapshots, screenshots, recordings, tabs, sessions, etc.) work the same. See the [upstream repo](https://github.com/vercel-labs/agent-browser) for full documentation.
1352
97
 
1353
98
  ## License
1354
99
 
1355
- Apache-2.0
100
+ Apache-2.0 (same as upstream)