agent-browser-stealth 0.17.0-fork.1 → 0.24.0-fork.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (121)
  1. package/README.md +1254 -236
  2. package/bin/agent-browser-darwin-arm64 +0 -0
  3. package/bin/agent-browser-darwin-x64 +0 -0
  4. package/bin/agent-browser-linux-arm64 +0 -0
  5. package/bin/agent-browser-linux-x64 +0 -0
  6. package/bin/agent-browser-win32-x64.exe +0 -0
  7. package/bin/agent-browser.js +13 -2
  8. package/extensions/tab-group-cdp/content-script.js +425 -0
  9. package/extensions/tab-group-cdp/icons/icon.svg +7 -0
  10. package/extensions/tab-group-cdp/manifest.json +34 -0
  11. package/extensions/tab-group-cdp/page-bridge.js +133 -0
  12. package/extensions/tab-group-cdp/service-worker.js +2249 -0
  13. package/extensions/tab-group-cdp/sidepanel.css +258 -0
  14. package/extensions/tab-group-cdp/sidepanel.html +28 -0
  15. package/extensions/tab-group-cdp/sidepanel.js +1225 -0
  16. package/package.json +20 -66
  17. package/scripts/build-all-platforms.sh +6 -0
  18. package/scripts/check-version-sync.js +14 -2
  19. package/scripts/postinstall.js +149 -165
  20. package/scripts/windows-debug/provision.sh +220 -0
  21. package/scripts/windows-debug/run.sh +92 -0
  22. package/scripts/windows-debug/start.sh +43 -0
  23. package/scripts/windows-debug/stop.sh +28 -0
  24. package/scripts/windows-debug/sync.sh +27 -0
  25. package/skills/agent-browser/SKILL.md +256 -156
  26. package/skills/agent-browser/references/authentication.md +101 -0
  27. package/skills/agent-browser/references/commands.md +34 -2
  28. package/skills/agent-browser/references/snapshot-refs.md +25 -0
  29. package/skills/agentcore/SKILL.md +115 -0
  30. package/skills/dogfood/SKILL.md +4 -2
  31. package/skills/electron/SKILL.md +26 -2
  32. package/skills/slack/SKILL.md +0 -9
  33. package/skills/slack/references/slack-tasks.md +2 -8
  34. package/skills/vercel-sandbox/SKILL.md +280 -0
  35. package/bin/agent-browser-local +0 -0
  36. package/bin/agent-browser-stealth +0 -0
  37. package/bin/agent-browser-stealth.d +0 -1
  38. package/dist/action-policy.d.ts +0 -14
  39. package/dist/action-policy.d.ts.map +0 -1
  40. package/dist/action-policy.js +0 -253
  41. package/dist/action-policy.js.map +0 -1
  42. package/dist/actions.d.ts +0 -21
  43. package/dist/actions.d.ts.map +0 -1
  44. package/dist/actions.js +0 -2139
  45. package/dist/actions.js.map +0 -1
  46. package/dist/auth-cli.d.ts +0 -2
  47. package/dist/auth-cli.d.ts.map +0 -1
  48. package/dist/auth-cli.js +0 -97
  49. package/dist/auth-cli.js.map +0 -1
  50. package/dist/auth-vault.d.ts +0 -36
  51. package/dist/auth-vault.d.ts.map +0 -1
  52. package/dist/auth-vault.js +0 -125
  53. package/dist/auth-vault.js.map +0 -1
  54. package/dist/browser.d.ts +0 -663
  55. package/dist/browser.d.ts.map +0 -1
  56. package/dist/browser.js +0 -3101
  57. package/dist/browser.js.map +0 -1
  58. package/dist/confirmation.d.ts +0 -8
  59. package/dist/confirmation.d.ts.map +0 -1
  60. package/dist/confirmation.js +0 -30
  61. package/dist/confirmation.js.map +0 -1
  62. package/dist/daemon.d.ts +0 -78
  63. package/dist/daemon.d.ts.map +0 -1
  64. package/dist/daemon.js +0 -789
  65. package/dist/daemon.js.map +0 -1
  66. package/dist/diff.d.ts +0 -18
  67. package/dist/diff.d.ts.map +0 -1
  68. package/dist/diff.js +0 -271
  69. package/dist/diff.js.map +0 -1
  70. package/dist/domain-filter.d.ts +0 -28
  71. package/dist/domain-filter.d.ts.map +0 -1
  72. package/dist/domain-filter.js +0 -149
  73. package/dist/domain-filter.js.map +0 -1
  74. package/dist/encryption.d.ts +0 -73
  75. package/dist/encryption.d.ts.map +0 -1
  76. package/dist/encryption.js +0 -171
  77. package/dist/encryption.js.map +0 -1
  78. package/dist/ios-actions.d.ts +0 -11
  79. package/dist/ios-actions.d.ts.map +0 -1
  80. package/dist/ios-actions.js +0 -228
  81. package/dist/ios-actions.js.map +0 -1
  82. package/dist/ios-manager.d.ts +0 -266
  83. package/dist/ios-manager.d.ts.map +0 -1
  84. package/dist/ios-manager.js +0 -1073
  85. package/dist/ios-manager.js.map +0 -1
  86. package/dist/protocol.d.ts +0 -26
  87. package/dist/protocol.d.ts.map +0 -1
  88. package/dist/protocol.js +0 -990
  89. package/dist/protocol.js.map +0 -1
  90. package/dist/snapshot.d.ts +0 -67
  91. package/dist/snapshot.d.ts.map +0 -1
  92. package/dist/snapshot.js +0 -514
  93. package/dist/snapshot.js.map +0 -1
  94. package/dist/state-utils.d.ts +0 -77
  95. package/dist/state-utils.d.ts.map +0 -1
  96. package/dist/state-utils.js +0 -178
  97. package/dist/state-utils.js.map +0 -1
  98. package/dist/stealth.d.ts +0 -41
  99. package/dist/stealth.d.ts.map +0 -1
  100. package/dist/stealth.js +0 -1743
  101. package/dist/stealth.js.map +0 -1
  102. package/dist/stream-server.d.ts +0 -117
  103. package/dist/stream-server.d.ts.map +0 -1
  104. package/dist/stream-server.js +0 -309
  105. package/dist/stream-server.js.map +0 -1
  106. package/dist/types.d.ts +0 -973
  107. package/dist/types.d.ts.map +0 -1
  108. package/dist/types.js +0 -2
  109. package/dist/types.js.map +0 -1
  110. package/scripts/check-creepjs-headless.js +0 -137
  111. package/scripts/check-daemon-pid-recovery.js +0 -148
  112. package/scripts/check-sannysoft-webdriver.js +0 -112
  113. package/scripts/check-stealth-regression.js +0 -199
  114. package/scripts/check-turnstile-testkey.ts +0 -125
  115. package/scripts/clawhub-sync.sh +0 -27
  116. package/scripts/sync-upstream.sh +0 -142
  117. package/scripts/verify-bundled-binaries.js +0 -71
  118. package/scripts/verify-native-version.js +0 -48
  119. package/scripts/verify-packed-host-binary.js +0 -88
  120. package/scripts/verify-registry-host-binary.js +0 -120
  121. package/skills/agent-browser-stealth/SKILL.md +0 -127
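The `bin/` entries above ship one prebuilt executable per platform, named `agent-browser-<os>-<arch>`, with an `.exe` suffix on the single Windows build. The following is a minimal sketch, assuming that naming maps onto `uname -s` / `uname -m` output as shown, of how a shell wrapper could pick the matching file. The function name `binary_name` and the OS/arch mapping are hypothetical illustrations. This is not the logic in the package's `bin/agent-browser.js`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map `uname -s` / `uname -m` output to the prebuilt
# binary names listed under package/bin/ in this release.
binary_name() {
  local os arch
  # Normalize the kernel name to the platform token used in the file names.
  case "$1" in
    Darwin) os=darwin ;;
    Linux) os=linux ;;
    MINGW*|MSYS*|CYGWIN*) os=win32 ;;
    *) echo "unsupported OS: $1" >&2; return 1 ;;
  esac
  # Normalize the machine name to the architecture token.
  case "$2" in
    x86_64|amd64) arch=x64 ;;
    arm64|aarch64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  if [ "$os" = win32 ]; then
    # Only a win32-x64 build is listed, and it carries an .exe suffix.
    if [ "$arch" != x64 ]; then
      echo "no bundled binary for $os-$arch" >&2
      return 1
    fi
    echo "agent-browser-$os-$arch.exe"
  else
    echo "agent-browser-$os-$arch"
  fi
}
```

Usage: `binary_name "$(uname -s)" "$(uname -m)"` prints, for example, `agent-browser-linux-x64` on an x86_64 Linux host, and fails with a message on stderr for combinations that have no listed binary.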
package/README.md CHANGED
@@ -1,336 +1,1354 @@
- # agent-browser-stealth
+ # agent-browser
 
- Stealth-first fork of `agent-browser` for production browser automation under anti-bot pressure.
+ Browser automation CLI for AI agents. Fast native Rust CLI.
 
- This README focuses on stealth architecture and principles. For full command coverage inherited from upstream, use:
+ ## Installation
 
- - upstream docs: <https://github.com/vercel-labs/agent-browser>
- - local help: `agent-browser --help` (short alias: `abs --help`)
+ ### Global Installation (recommended)
 
- ## What This Fork Optimizes
+ Installs the native Rust binary:
 
- - Stealth is always on (legacy `launch.stealth` is accepted but ignored).
- - Fingerprint surfaces are patched at multiple layers (launch args, CDP overrides, init scripts).
- - Behavioral signals are humanized (typing cadence, cursor path, pacing, retry backoff).
- - Region signals are auto-aligned (locale/timezone/Accept-Language) to reduce mismatch risk.
- - Verification/captcha handling is policy-driven (`--risk-mode off|warn|block`).
+ ```bash
+ npm install -g agent-browser
+ agent-browser install # Download Chrome from Chrome for Testing (first time only)
+ ```
 
- ## FAQ: `agent-browser` vs `agent-browser-stealth`
+ ### Project Installation (local dependency)
 
- People often ask this: "What's the anti-detection approach compared to `agent-browser-stealth` on npm?"
+ For projects that want to pin the version in `package.json`:
 
- - `agent-browser-stealth` on npm is the package name for this fork.
- - The CLI keeps upstream-compatible command names (`agent-browser` is still the main executable, with `agent-browser-stealth` and `abs` as aliases).
- - The practical difference vs upstream `agent-browser` is not one single "stealth switch"; it is a defense-in-depth stack designed for anti-bot pressure.
+ ```bash
+ npm install agent-browser
+ agent-browser install
+ ```
+
+ Then use via `package.json` scripts or by invoking `agent-browser` directly.
 
- The core idea is layered hardening across the full automation lifecycle:
+ ### Homebrew (macOS)
 
- 1. Connection-aware policy: choose the best available stealth capability by mode (local launch/CDP/cloud provider).
- 2. Fingerprint hardening: patch launch args, CDP metadata, and init-script surfaces before page code runs.
- 3. Behavioral humanization: non-uniform typing/mouse/wait patterns instead of perfectly mechanical actions.
- 4. Region coherence: auto-align locale/timezone/language signals to target geography.
- 5. Risk-aware control loop: detect verification/captcha signals and handle them with explicit `risk-mode` policy.
+ ```bash
+ brew install agent-browser
+ agent-browser install # Download Chrome from Chrome for Testing (first time only)
+ ```
 
- Goal: reduce detection probability and improve stability in production automation. Non-goal: "guaranteed bypass" on every target.
+ ### Cargo (Rust)
 
- ## Quick Start
+ ```bash
+ cargo install agent-browser
+ agent-browser install # Download Chrome from Chrome for Testing (first time only)
+ ```
 
- ### Install
+ ### From Source
 
  ```bash
- npm install -g agent-browser-stealth
+ git clone https://github.com/vercel-labs/agent-browser
+ cd agent-browser
+ pnpm install
+ pnpm build
+ pnpm build:native # Requires Rust (https://rustup.rs)
+ pnpm link --global # Makes agent-browser available globally
  agent-browser install
- # same CLI, short alias
- abs install
  ```
 
- ### Minimal Usage
+ ### Linux Dependencies
+
+ On Linux, install system dependencies:
 
  ```bash
- agent-browser open https://example.com
- agent-browser snapshot -i
+ agent-browser install --with-deps
+ ```
+
+ ### Updating
+
+ Upgrade to the latest version:
+
+ ```bash
+ agent-browser upgrade
+ ```
+
+ Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
+
+ ### Requirements
+
+ - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
+ - **Rust** - Only needed when building from source (see From Source above).
+
+ ## Quick Start
+
+ ```bash
+ agent-browser open example.com
+ agent-browser snapshot # Get accessibility tree with refs
+ agent-browser click @e2 # Click by ref from snapshot
+ agent-browser fill @e3 "test@example.com" # Fill by ref
+ agent-browser get text @e1 # Get text by ref
+ agent-browser screenshot page.png
+ agent-browser close
+ ```
+
+ ### Traditional Selectors (also supported)
+
+ ```bash
+ agent-browser click "#submit"
+ agent-browser fill "#email" "test@example.com"
+ agent-browser find role button click --name "Submit"
+ ```
+
+ ## Commands
+
+ ### Core Commands
+
+ ```bash
+ agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
+ agent-browser click <sel> # Click element (--new-tab to open in new tab)
+ agent-browser dblclick <sel> # Double-click element
+ agent-browser focus <sel> # Focus element
+ agent-browser type <sel> <text> # Type into element
+ agent-browser fill <sel> <text> # Clear and fill
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
+ agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
+ agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
+ agent-browser keydown <key> # Hold key down
+ agent-browser keyup <key> # Release key
+ agent-browser hover <sel> # Hover element
+ agent-browser select <sel> <val> # Select dropdown option
+ agent-browser check <sel> # Check checkbox
+ agent-browser uncheck <sel> # Uncheck checkbox
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>)
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
+ agent-browser drag <src> <tgt> # Drag and drop
+ agent-browser upload <sel> <files> # Upload files
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
+ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
+ agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
+ agent-browser pdf <path> # Save as PDF
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
+ agent-browser connect <port> # Connect to browser via CDP
+ agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
+ agent-browser stream status # Show runtime streaming state and bound port
+ agent-browser stream disable # Stop runtime WebSocket streaming
+ agent-browser close # Close browser (aliases: quit, exit)
+ agent-browser close --all # Close all active sessions
+ ```
+
+ ### Get Info
+
+ ```bash
+ agent-browser get text <sel> # Get text content
+ agent-browser get html <sel> # Get innerHTML
+ agent-browser get value <sel> # Get input value
+ agent-browser get attr <sel> <attr> # Get attribute
+ agent-browser get title # Get page title
+ agent-browser get url # Get current URL
+ agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging)
+ agent-browser get count <sel> # Count matching elements
+ agent-browser get box <sel> # Get bounding box
+ agent-browser get styles <sel> # Get computed styles
+ ```
+
+ ### Check State
+
+ ```bash
+ agent-browser is visible <sel> # Check if visible
+ agent-browser is enabled <sel> # Check if enabled
+ agent-browser is checked <sel> # Check if checked
+ ```
+
+ ### Find Elements (Semantic Locators)
+
+ ```bash
+ agent-browser find role <role> <action> [value] # By ARIA role
+ agent-browser find text <text> <action> # By text content
+ agent-browser find label <label> <action> [value] # By label
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
+ agent-browser find alt <text> <action> # By alt text
+ agent-browser find title <text> <action> # By title attr
+ agent-browser find testid <id> <action> [value] # By data-testid
+ agent-browser find first <sel> <action> [value] # First match
+ agent-browser find last <sel> <action> [value] # Last match
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
+ ```
+
+ **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
+
+ **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
+
+ **Examples:**
+
+ ```bash
+ agent-browser find role button click --name "Submit"
+ agent-browser find text "Sign In" click
+ agent-browser find label "Email" fill "test@test.com"
+ agent-browser find first ".item" click
+ agent-browser find nth 2 "a" text
+ ```
+
+ ### Wait
+
+ ```bash
+ agent-browser wait <selector> # Wait for element to be visible
+ agent-browser wait <ms> # Wait for time (milliseconds)
+ agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
+ agent-browser wait --url "**/dash" # Wait for URL pattern
+ agent-browser wait --load networkidle # Wait for load state
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
+
+ # Wait for text/element to disappear
+ agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
+ agent-browser wait "#spinner" --state hidden
+ ```
+
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
+
+ ### Batch Execution
+
+ Execute multiple commands in a single invocation by piping a JSON array of
+ string arrays to `batch`. This avoids per-command process startup overhead
+ when running multi-step workflows.
+
+ ```bash
+ # Pipe commands as JSON
+ echo '[
+ ["open", "https://example.com"],
+ ["snapshot", "-i"],
+ ["click", "@e1"],
+ ["screenshot", "result.png"]
+ ]' | agent-browser batch --json
+
+ # Stop on first error
+ agent-browser batch --bail < commands.json
+ ```
+
+ ### Clipboard
+
+ ```bash
+ agent-browser clipboard read # Read text from clipboard
+ agent-browser clipboard write "Hello, World!" # Write text to clipboard
+ agent-browser clipboard copy # Copy current selection (Ctrl+C)
+ agent-browser clipboard paste # Paste from clipboard (Ctrl+V)
+ ```
+
+ ### Mouse Control
+
+ ```bash
+ agent-browser mouse move <x> <y> # Move mouse
+ agent-browser mouse down [button] # Press button (left/right/middle)
+ agent-browser mouse up [button] # Release button
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
+ ```
+
+ ### Browser Settings
+
+ ```bash
+ agent-browser set viewport <w> <h> [scale] # Set viewport size (scale for retina, e.g. 2)
+ agent-browser set device <name> # Emulate device ("iPhone 14")
+ agent-browser set geo <lat> <lng> # Set geolocation
+ agent-browser set offline [on|off] # Toggle offline mode
+ agent-browser set headers <json> # Extra HTTP headers
+ agent-browser set credentials <u> <p> # HTTP basic auth
+ agent-browser set media [dark|light] # Emulate color scheme
+ ```
+
+ ### Cookies & Storage
+
+ ```bash
+ agent-browser cookies # Get all cookies
+ agent-browser cookies set <name> <val> # Set cookie
+ agent-browser cookies clear # Clear cookies
+
+ agent-browser storage local # Get all localStorage
+ agent-browser storage local <key> # Get specific key
+ agent-browser storage local set <k> <v> # Set value
+ agent-browser storage local clear # Clear all
+
+ agent-browser storage session # Same for sessionStorage
+ ```
+
+ ### Network
+
+ ```bash
+ agent-browser network route <url> # Intercept requests
+ agent-browser network route <url> --abort # Block requests
+ agent-browser network route <url> --body <json> # Mock response
+ agent-browser network unroute [url] # Remove routes
+ agent-browser network requests # View tracked requests
+ agent-browser network requests --filter api # Filter requests
+ agent-browser network requests --type xhr,fetch # Filter by resource type
+ agent-browser network requests --method POST # Filter by HTTP method
+ agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
+ agent-browser network request <requestId> # View full request/response detail
+ agent-browser network har start # Start HAR recording
+ agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
+ ```
+
+ ### Tabs & Windows
+
+ ```bash
+ agent-browser tab # List tabs
+ agent-browser tab new [url] # New tab (optionally with URL)
+ agent-browser tab <n> # Switch to tab n
+ agent-browser tab close [n] # Close tab
+ agent-browser window new # New window
+ ```
+
+ ### Frames
+
+ ```bash
+ agent-browser frame <sel> # Switch to iframe
+ agent-browser frame main # Back to main frame
+ ```
+
+ ### Dialogs
+
+ ```bash
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
+ agent-browser dialog dismiss # Dismiss
+ agent-browser dialog status # Check if a dialog is currently open
+ ```
+
+ By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.
+
+ When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
+
+ ### Diff
+
+ ```bash
+ agent-browser diff snapshot # Compare current vs last snapshot
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
+ agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
+ agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
+ agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
+ agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
+ agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
+ agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
+ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
+ ```
+
+ ### Debug
+
+ ```bash
+ agent-browser trace start [path] # Start recording trace
+ agent-browser trace stop [path] # Stop and save trace
+ agent-browser profiler start # Start Chrome DevTools profiling
+ agent-browser profiler stop [path] # Stop and save profile (.json)
+ agent-browser console # View console messages (log, error, warn, info)
+ agent-browser console --json # JSON output with raw CDP args for programmatic access
+ agent-browser console --clear # Clear console
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
+ agent-browser errors --clear # Clear errors
+ agent-browser highlight <sel> # Highlight element
+ agent-browser inspect # Open Chrome DevTools for the active page
+ agent-browser state save <path> # Save auth state
+ agent-browser state load <path> # Load auth state
+ agent-browser state list # List saved state files
+ agent-browser state show <file> # Show state summary
+ agent-browser state rename <old> <new> # Rename state file
+ agent-browser state clear [name] # Clear states for session
+ agent-browser state clear --all # Clear all saved states
+ agent-browser state clean --older-than <days> # Delete old states
+ ```
+
+ ### Navigation
+
+ ```bash
+ agent-browser back # Go back
+ agent-browser forward # Go forward
+ agent-browser reload # Reload page
+ ```
+
+ ### Setup
+
+ ```bash
+ agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
+ agent-browser install --with-deps # Also install system deps (Linux)
+ agent-browser upgrade # Upgrade agent-browser to the latest version
+ ```
+
+ ## Authentication
+
+ agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.
+
+ ### Quick summary
+
+ | Approach | Best for | Flag / Env |
+ |----------|----------|------------|
+ | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
+ | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
+ | **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
+ | **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
+ | **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |
+
+ ### Import auth from your browser
+
+ If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:
+
+ ```bash
+ # 1. Launch Chrome with remote debugging enabled
+ # macOS:
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
+ # Or use --auto-connect to discover an already-running Chrome
+
+ # 2. Connect and save the authenticated state
+ agent-browser --auto-connect state save ./my-auth.json
+
+ # 3. Use the saved auth in future sessions
+ agent-browser --state ./my-auth.json open https://app.example.com/dashboard
+
+ # 4. Or use --session-name for automatic persistence
+ agent-browser --session-name myapp state load ./my-auth.json
+ # From now on, --session-name myapp auto-saves/restores this state
+ ```
+
+ > **Security notes:**
+ > - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
+ > - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).
+
+ For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.
+
+ ## Sessions
+
+ Run multiple isolated browser instances:
+
+ ```bash
+ # Different sessions
+ agent-browser --session agent1 open site-a.com
+ agent-browser --session agent2 open site-b.com
+
+ # Or via environment variable
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
+
+ # List active sessions
+ agent-browser session list
+ # Output:
+ # Active sessions:
+ # -> default
+ # agent1
+
+ # Show current session
+ agent-browser session
+ ```
+
+ Each session has its own:
+
+ - Browser instance
+ - Cookies and storage
+ - Navigation history
+ - Authentication state
+
+ ## Persistent Profiles
+
+ By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
+
+ ```bash
+ # Use a persistent profile directory
+ agent-browser --profile ~/.myapp-profile open myapp.com
+
+ # Login once, then reuse the authenticated session
+ agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
+
+ # Or via environment variable
+ AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
+ ```
454
+
455
+ The profile directory stores:
456
+
457
+ - Cookies and localStorage
458
+ - IndexedDB data
459
+ - Service workers
460
+ - Browser cache
461
+ - Login sessions
462
+
463
+ **Tip**: Use different profile paths for different projects to keep their browser state isolated.
464
+
465
+ ## Session Persistence
466
+
467
+ Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
468
+
469
+ ```bash
470
+ # Auto-save/load state for "twitter" session
471
+ agent-browser --session-name twitter open twitter.com
472
+
473
+ # Login once, then state persists automatically
474
+ # State files stored in ~/.agent-browser/sessions/
475
+
476
+ # Or via environment variable
477
+ export AGENT_BROWSER_SESSION_NAME=twitter
478
+ agent-browser open twitter.com
479
+ ```
480
+
481
+ ### State Encryption
482
+
483
+ Encrypt saved session data at rest with AES-256-GCM:
484
+
485
+ ```bash
486
+ # Generate key: openssl rand -hex 32
487
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
488
+
489
+ # State files are now encrypted automatically
490
+ agent-browser --session-name secure open example.com
491
+ ```
492
+
493
+ | Variable | Description |
494
+ | --------------------------------- | -------------------------------------------------- |
495
+ | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
496
+ | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
497
+ | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
498
+
499
+ ## Security
500
+
501
+ agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
502
+
503
+ - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
504
+ - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
505
+ - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
506
+ - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
507
+ - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
508
+ - **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
509
+
510
+ | Variable | Description |
511
+ | ----------------------------------- | ---------------------------------------- |
512
+ | `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers |
513
+ | `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output |
514
+ | `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns |
515
+ | `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file |
516
+ | `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation |
517
+ | `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts |
518
+
519
+ See [Security documentation](https://agent-browser.dev/security) for details.
520
+
521
+ ## Snapshot Options
522
+
523
+ The `snapshot` command supports filtering to reduce output size:
524
+
525
+ ```bash
526
+ agent-browser snapshot # Full accessibility tree
527
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
528
+ agent-browser snapshot -c # Compact (remove empty structural elements)
529
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
530
+ agent-browser snapshot -s "#main" # Scope to CSS selector
531
+ agent-browser snapshot -i -c -d 5 # Combine options
532
+ ```
533
+
534
+ | Option | Description |
535
+ | ---------------------- | ----------------------------------------------------------------------- |
536
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
537
+ | `-c, --compact` | Remove empty structural elements |
538
+ | `-d, --depth <n>` | Limit tree depth |
539
+ | `-s, --selector <sel>` | Scope to CSS selector |
540
+
541
+ ## Annotated Screenshots
542
+
543
+ The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
544
+
545
+ Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.
546
+
547
+ ```bash
548
+ agent-browser screenshot --annotate
549
+ # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
550
+ # [1] @e1 button "Submit"
551
+ # [2] @e2 link "Home"
552
+ # [3] @e3 textbox "Email"
553
+ ```
554
+
555
+ After an annotated screenshot, refs are cached so you can immediately interact with elements:
556
+
557
+ ```bash
558
+ agent-browser screenshot --annotate ./page.png
559
+ agent-browser click @e2 # Click the "Home" link labeled [2]
560
+ ```
561
+
562
+ This helps multimodal AI models with what the text accessibility tree cannot capture: visual layout, unlabeled icon buttons, canvas elements, and visual state.
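An agent that consumes annotated screenshots needs the label-to-ref legend as data. The sketch below parses legend lines in the format of the sample output above (`[N] @eN role "name"`). That format is inferred from the sample only and may change between versions. `parseLegend` is a hypothetical helper.

```typescript
interface LegendEntry {
  label: number; // the [N] drawn on the screenshot
  ref: string; // the matching ref, e.g. "@e2"
  role: string;
  name: string;
}

// Matches lines such as: [2] @e2 link "Home"
const LEGEND_LINE = /^\s*\[(\d+)\]\s+(@e\d+)\s+(\S+)\s+"(.*)"\s*$/;

function parseLegend(output: string): LegendEntry[] {
  const entries: LegendEntry[] = [];
  for (const line of output.split("\n")) {
    const m = LEGEND_LINE.exec(line);
    if (!m) continue; // skip "Screenshot saved to ..." and any other lines
    entries.push({ label: Number(m[1]), ref: m[2], role: m[3], name: m[4] });
  }
  return entries;
}

const sample = [
  "Screenshot saved to /tmp/screenshot.png",
  '[1] @e1 button "Submit"',
  '[2] @e2 link "Home"',
].join("\n");
const homeRef = parseLegend(sample).find((e) => e.name === "Home")?.ref; // "@e2"
```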
563
+
564
+ ## Options
565
+
566
+ | Option | Description |
567
+ |--------|-------------|
568
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
569
+ | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
570
+ | `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
571
+ | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
572
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
573
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
574
+ | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
575
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
576
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
577
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
578
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
579
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
580
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
581
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
582
+ | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
583
+ | `--json` | JSON output (for agents) |
584
+ | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
585
+ | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
586
+ | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
587
+ | `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
588
+ | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
589
+ | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
590
+ | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
591
+ | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
592
+ | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
593
+ | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
594
+ | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
595
+ | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
596
+ | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
597
+ | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
598
+ | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
599
+ | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
600
+ | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
601
+ | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
602
+ | `--debug` | Debug output |
603
+
604
+ ## Observability Dashboard
605
+
606
+ Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed.
607
+
608
+ ```bash
609
+ # Install the dashboard (one time)
610
+ agent-browser dashboard install
611
+
612
+ # Start the dashboard server (runs in the background; default port 4848)
613
+ agent-browser dashboard start
614
+ agent-browser dashboard start --port 8080 # Custom port
615
+
616
+ # All sessions are automatically visible in the dashboard
617
+ agent-browser open example.com
618
+
619
+ # Stop the dashboard
620
+ agent-browser dashboard stop
621
+ ```
622
+
623
+ The dashboard runs as a standalone background process (port 4848 by default, or the port passed to `--port`), independent of browser sessions. It stays available even when no sessions are running, and every session streams to it automatically.
624
+
625
+ The dashboard displays:
626
+ - **Live viewport** -- real-time JPEG frames from the browser
627
+ - **Activity feed** -- chronological command/result stream with timing and expandable details
628
+ - **Console output** -- browser console messages (log, warn, error)
629
+ - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
630
+
631
+ ## Configuration
632
+
633
+ Create a config file (`agent-browser.json` in a project, or `~/.agent-browser/config.json` for user-wide settings) to set persistent defaults instead of repeating flags on every command.
634
+
635
+ **Locations (lowest to highest priority):**
636
+
637
+ 1. `~/.agent-browser/config.json` -- user-level defaults
638
+ 2. `./agent-browser.json` -- project-level overrides (in working directory)
639
+ 3. `AGENT_BROWSER_*` environment variables override config file values
640
+ 4. CLI flags override everything
641
+
642
+ **Example `agent-browser.json`:**
643
+
644
+ ```json
645
+ {
646
+ "headed": true,
647
+ "proxy": "http://localhost:8080",
648
+ "profile": "./browser-data",
649
+ "userAgent": "my-agent/1.0",
650
+ "ignoreHttpsErrors": true
651
+ }
652
+ ```
653
+
654
+ Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
655
+
656
+ ```bash
657
+ agent-browser --config ./ci-config.json open example.com
658
+ AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
659
+ ```
660
+
661
+ All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
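The flag-to-key rule is mechanical. The following is a minimal sketch of it. `flagToConfigKey` and `toConfig` are hypothetical names, and the sketch covers only long CLI flags, not environment variable names.

```typescript
// Convert a long CLI flag to its config-file key,
// e.g. "--executable-path" -> "executablePath". Hypothetical helper.
function flagToConfigKey(flag: string): string {
  return flag
    .replace(/^--/, "")
    .replace(/-([a-z0-9])/g, (_match, ch: string) => ch.toUpperCase());
}

// Build a config object from flag/value pairs (illustrative only)
function toConfig(entries: Array<[string, unknown]>): Record<string, unknown> {
  const config: Record<string, unknown> = {};
  for (const [flag, value] of entries) config[flagToConfigKey(flag)] = value;
  return config;
}

console.log(JSON.stringify(toConfig([["--proxy-bypass", "localhost"], ["--headed", true]])));
// {"proxyBypass":"localhost","headed":true}
```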
662
+
663
+ Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
664
+
665
+ Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
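The precedence order and the special case for extensions can be modelled as a layered merge. This sketch makes several assumptions. The function names are invented, and settings are treated as flat key/value pairs. The config key for extensions is assumed to be `extensions`. Only the two config files concatenate that key, while environment variables and CLI flags simply override, as the paragraph above describes.

```typescript
type Settings = Record<string, unknown>;

// User-level + project-level config files: project wins for scalar keys,
// while "extensions" arrays are concatenated (key name assumed).
function mergeConfigFiles(user: Settings, project: Settings): Settings {
  const merged: Settings = { ...user, ...project };
  const userExt = Array.isArray(user.extensions) ? user.extensions : [];
  const projectExt = Array.isArray(project.extensions) ? project.extensions : [];
  if (userExt.length > 0 || projectExt.length > 0) {
    merged.extensions = [...userExt, ...projectExt];
  }
  return merged;
}

// Environment variables override config files; CLI flags override everything.
function resolveSettings(user: Settings, project: Settings, env: Settings, cli: Settings): Settings {
  return { ...mergeConfigFiles(user, project), ...env, ...cli };
}

const resolved = resolveSettings(
  { headed: true, extensions: ["./ext-a"] },
  { proxy: "http://localhost:8080", extensions: ["./ext-b"] },
  { proxy: "http://proxy.internal:3128" },
  { headed: false }, // e.g. `--headed false`
);
// headed: false, proxy: "http://proxy.internal:3128", extensions: ["./ext-a", "./ext-b"]
```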
666
+
667
+ > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
668
+
669
+ ## Default Timeout
670
+
671
+ The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN.
672
+
673
+ Override the default timeout via environment variable:
674
+
675
+ ```bash
676
+ # Set a longer timeout for slow pages (in milliseconds)
677
+ export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
678
+ ```
679
+
680
+ > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.
681
+
682
+ | Variable | Description |
683
+ | ------------------------------- | ---------------------------------------- |
684
+ | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |
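A client that wraps the CLI may want to apply the same guard itself. The sketch below reads `AGENT_BROWSER_DEFAULT_TIMEOUT` and falls back to the documented 25000 ms. It flags values at or above the 30000 ms IPC read timeout, which is slightly stricter than the note above, because an equal value leaves no margin. `resolveDefaultTimeout` and its warning text are hypothetical and not part of agent-browser. In a real client, pass `process.env`.

```typescript
const DEFAULT_TIMEOUT_MS = 25000; // documented default
const IPC_READ_TIMEOUT_MS = 30000; // CLI read timeout described above

interface TimeoutSetting {
  timeoutMs: number;
  warning?: string;
}

// Hypothetical helper: resolve the operation timeout from an env-like map.
function resolveDefaultTimeout(env: Record<string, string | undefined>): TimeoutSetting {
  const raw = env.AGENT_BROWSER_DEFAULT_TIMEOUT;
  const parsed = raw === undefined ? NaN : Number(raw);
  if (!Number.isFinite(parsed) || parsed <= 0) {
    return { timeoutMs: DEFAULT_TIMEOUT_MS }; // unset or invalid: use the default
  }
  if (parsed >= IPC_READ_TIMEOUT_MS) {
    return {
      timeoutMs: parsed,
      warning: `timeout ${parsed} ms is not below the ${IPC_READ_TIMEOUT_MS} ms IPC read timeout; expect EAGAIN retries`,
    };
  }
  return { timeoutMs: parsed };
}

console.log(resolveDefaultTimeout({ AGENT_BROWSER_DEFAULT_TIMEOUT: "45000" }).warning !== undefined); // true
```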
685
+
686
+ ## Selectors
687
+
688
+ ### Refs (Recommended for AI)
689
+
690
+ Refs provide deterministic element selection from snapshots:
691
+
692
+ ```bash
693
+ # 1. Get snapshot with refs
694
+ agent-browser snapshot
695
+ # Output:
696
+ # - heading "Example Domain" [ref=e1] [level=1]
697
+ # - button "Submit" [ref=e2]
698
+ # - textbox "Email" [ref=e3]
699
+ # - link "Learn more" [ref=e4]
700
+
701
+ # 2. Use refs to interact
702
+ agent-browser click @e2 # Click the button
703
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
704
+ agent-browser get text @e1 # Get heading text
705
+ agent-browser hover @e4 # Hover the link
706
+ ```
707
+
708
+ **Why use refs?**
709
+
710
+ - **Deterministic**: Each ref points to the exact element captured in the snapshot
711
+ - **Fast**: No DOM re-query needed
712
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
713
+
714
+ ### CSS Selectors
715
+
716
+ ```bash
717
+ agent-browser click "#id"
718
+ agent-browser click ".class"
719
+ agent-browser click "div > button"
720
+ ```
721
+
722
+ ### Text & XPath
723
+
724
+ ```bash
725
+ agent-browser click "text=Submit"
726
+ agent-browser click "xpath=//button"
727
+ ```
728
+
729
+ ### Semantic Locators
730
+
731
+ ```bash
732
+ agent-browser find role button click --name "Submit"
733
+ agent-browser find label "Email" fill "test@test.com"
734
+ ```
735
+
736
+ ## Agent Mode
737
+
738
+ Use `--json` for machine-readable output:
739
+
740
+ ```bash
741
+ agent-browser snapshot --json
742
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
743
+
744
+ agent-browser get text @e1 --json
745
+ agent-browser is visible @e2 --json
746
+ ```
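The `--json` envelope shown above can be consumed with a few types. This sketch assumes exactly the shape in the sample: `success`, `data.snapshot`, and `data.refs`, keyed by ref id with `role` and `name`. Real output may carry more fields. `findRef` is a hypothetical helper.

```typescript
interface RefInfo {
  role: string;
  name: string;
}

interface SnapshotResult {
  success: boolean;
  data: { snapshot: string; refs: Record<string, RefInfo> };
}

// Return the "@eN" ref for the first element with the given role and name.
function findRef(raw: string, role: string, name: string): string | undefined {
  const result = JSON.parse(raw) as SnapshotResult;
  if (!result.success) throw new Error("snapshot command reported failure");
  for (const [id, info] of Object.entries(result.data.refs)) {
    if (info.role === role && info.name === name) return `@${id}`;
  }
  return undefined;
}

const raw = JSON.stringify({
  success: true,
  data: {
    snapshot: "...",
    refs: { e1: { role: "heading", name: "Title" }, e2: { role: "button", name: "Submit" } },
  },
});
console.log(findRef(raw, "button", "Submit")); // @e2
```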
747
+
748
+ ### Optimal AI Workflow
749
+
750
+ ```bash
751
+ # 1. Navigate and get snapshot
752
+ agent-browser open example.com
753
+ agent-browser snapshot -i --json # AI parses tree and refs
754
+
755
+ # 2. AI identifies target refs from snapshot
756
+ # 3. Execute actions using refs
52
757
  agent-browser click @e2
758
+ agent-browser fill @e3 "input text"
759
+
760
+ # 4. Get new snapshot if page changed
761
+ agent-browser snapshot -i --json
762
+ ```
763
+
764
+ ### Command Chaining
765
+
766
+ Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
767
+
768
+ ```bash
769
+ # Open, wait for load, and snapshot in one call
770
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
771
+
772
+ # Chain multiple interactions
773
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
774
+
775
+ # Navigate and screenshot
776
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
777
+ ```
778
+
779
+ Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
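When an agent drives agent-browser from code rather than a shell, the same trade-off applies. Run steps back to back and stop at the first failure (like `&&`), but keep the output of any step you need to parse. The following is a minimal sketch, assuming `agent-browser` is on `PATH`. `runSteps` is a hypothetical helper that takes an injectable runner, so it can be exercised without a browser.

```typescript
import { execFileSync } from "node:child_process";

type Runner = (args: string[]) => string;

// Default runner: invoke the CLI directly (no shell, so no quoting issues).
const runCli: Runner = (args) => execFileSync("agent-browser", args, { encoding: "utf-8" });

// Run steps in order and collect each step's stdout. execFileSync throws on a
// non-zero exit, and the thrown error stops the loop, like `&&`.
function runSteps(steps: string[][], run: Runner = runCli): string[] {
  const outputs: string[] = [];
  for (const args of steps) {
    outputs.push(run(args));
  }
  return outputs;
}

// Usage (requires agent-browser to be installed):
// const [, , snapshot] = runSteps([
//   ["open", "example.com"],
//   ["wait", "--load", "networkidle"],
//   ["snapshot", "-i", "--json"],
// ]);
```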
780
+
781
+ ## Headed Mode
782
+
783
+ Show the browser window for debugging:
784
+
785
+ ```bash
786
+ agent-browser open example.com --headed
787
+ ```
788
+
789
+ This opens a visible browser window instead of running headless.
790
+
791
+ > **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).
792
+
793
+ ## Authenticated Sessions
794
+
795
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
796
+
797
+ ```bash
798
+ # Headers are scoped to api.example.com only
799
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
800
+
801
+ # Requests to api.example.com include the auth header
802
+ agent-browser snapshot -i --json
803
+ agent-browser click @e2
804
+
805
+ # Navigate to another domain - headers are NOT sent (safe!)
806
+ agent-browser open other-site.com
807
+ ```
808
+
809
+ This is useful for:
810
+
811
+ - **Skipping login flows** - Authenticate via headers instead of UI
812
+ - **Switching users** - Start new sessions with different auth tokens
813
+ - **API testing** - Access protected endpoints directly
814
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
815
+
816
+ To set headers for multiple origins, use `--headers` with each `open` command:
817
+
818
+ ```bash
819
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
820
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
821
+ ```
822
+
823
+ For global headers (all domains), use `set headers`:
824
+
825
+ ```bash
826
+ agent-browser set headers '{"X-Custom-Header": "value"}'
827
+ ```
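Hand-writing the `--headers` JSON inside shell quotes is error-prone once tokens contain quotes or spaces. From code, you can serialize each header object with `JSON.stringify` and pass the arguments without a shell. This sketch only assembles argument lists for the documented `open <url> --headers <json>` form. `openCommandsPerOrigin` is a hypothetical name, and the tokens are placeholders; in practice, read them from a secret store or environment variable.

```typescript
type HeaderMap = Record<string, string>;

// One `open <origin> --headers <json>` invocation per origin, mirroring the
// multi-origin example above. Each header set stays scoped to its origin.
function openCommandsPerOrigin(byOrigin: Record<string, HeaderMap>): string[][] {
  return Object.entries(byOrigin).map(([origin, headers]) => [
    "open",
    origin,
    "--headers",
    JSON.stringify(headers), // valid JSON even if a value contains quotes
  ]);
}

const commands = openCommandsPerOrigin({
  "api.example.com": { Authorization: "Bearer token1" },
  "api.acme.com": { Authorization: "Bearer token2" },
});
// Run each with execFileSync("agent-browser", argv) from node:child_process.
console.log(commands[0].join(" "));
// open api.example.com --headers {"Authorization":"Bearer token1"}
```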
828
+
829
+ ## Custom Browser Executable
830
+
831
+ Use a custom browser executable instead of the default Chrome for Testing build. This is useful for:
832
+
833
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
834
+ - **System browsers**: Use an existing Chrome/Chromium installation
835
+ - **Custom builds**: Use modified browser builds
836
+
837
+ ### CLI Usage
838
+
839
+ ```bash
840
+ # Via flag
841
+ agent-browser --executable-path /path/to/chromium open example.com
842
+
843
+ # Via environment variable
844
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
845
+ ```
846
+
847
+ ### Serverless (Vercel)
848
+
849
+ Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:
850
+
851
+ ```typescript
852
+ import { Sandbox } from "@vercel/sandbox";
853
+
854
+ const sandbox = await Sandbox.create({ runtime: "node24" });
855
+ await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
856
+ const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
857
+ await sandbox.stop();
858
+ ```
859
+
860
+ See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.
861
+
862
+ ### Serverless (AWS Lambda)
863
+
864
+ ```typescript
865
+ import chromium from '@sparticuz/chromium';
+ import { execFileSync } from 'child_process';
+
+ export async function handler() {
+   const executablePath = await chromium.executablePath();
+   // Pass the path via `env` so it applies to every agent-browser call
+   const env = { ...process.env, AGENT_BROWSER_EXECUTABLE_PATH: executablePath };
+   execFileSync('agent-browser', ['open', 'https://example.com'], { env, encoding: 'utf-8' });
+   // Run snapshot on its own so stdout contains only the JSON document
+   const result = execFileSync('agent-browser', ['snapshot', '-i', '--json'], { env, encoding: 'utf-8' });
+   return JSON.parse(result);
+ }
53
876
  ```
54
877
 
55
- ### Parallel AI Runs (Isolated Runtime Channel)
878
+ ## Local Files
56
879
 
57
- Use `--parallel <name>` to run multiple AI flows concurrently without fighting over the same runtime channel.
880
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
58
881
 
59
882
  ```bash
60
- agent-browser --parallel worker-a open https://example.com
61
- agent-browser --parallel worker-b open https://example.org
883
+ # Enable file access (required for JavaScript to access local files)
884
+ agent-browser --allow-file-access open file:///path/to/document.pdf
885
+ agent-browser --allow-file-access open file:///path/to/page.html
886
+
887
+ # Take screenshot of a local PDF
888
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
889
+ agent-browser screenshot report.png
62
890
  ```
63
891
 
64
- `--parallel` is designed for stateless throughput tasks (navigation, extraction, checks). For authenticated flows, keep using one stable `--session-name`.
892
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
65
893
 
66
- Default session isolation policy:
67
- - Running a default-session command reaps all non-default daemon sessions (`parallel-*` and legacy named channels).
68
- - This avoids stale daemon reuse and keeps stealth behavior consistent on the primary channel.
894
+ - Load and render local files
895
+ - Access other local files via JavaScript (XHR, fetch)
896
+ - Load local resources (images, scripts, stylesheets)
69
897
 
70
- | Option | Purpose | Typical Usage |
71
- | --- | --- | --- |
72
- | `--parallel <name>` | Isolate runtime channel for concurrent AI tasks | Stateless/no-login parallel jobs |
73
- | `--session-name <name>` | Persist cookies/localStorage across restarts | Login/auth continuity |
74
- | `--engine <name>` | Choose local browser engine (`chrome`, `lightpanda`) | Native-only engine experiments |
898
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
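Local paths that contain spaces or non-ASCII characters must be percent-encoded in a `file://` URL. Node's `url.pathToFileURL` handles this. The sketch below builds the argument list for the `open` call shown above. `localFileOpenArgs` is a hypothetical helper, and the example paths assume a POSIX system.

```typescript
import { pathToFileURL } from "node:url";

// Build argv for opening a local file with file access enabled.
function localFileOpenArgs(localPath: string): string[] {
  // pathToFileURL resolves relative paths against the current directory
  // and percent-encodes characters such as spaces.
  return ["--allow-file-access", "open", pathToFileURL(localPath).href];
}

console.log(localFileOpenArgs("/Users/me/Q3 report.pdf")[2]);
// file:///Users/me/Q3%20report.pdf
```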
75
899
 
76
- ### Daemon Lifecycle
900
+ ## CDP Mode
77
901
 
78
- - Daemons auto-shutdown after 10 minutes of inactivity by default.
79
- - Use `--resident` to keep a daemon alive until an explicit `close`.
902
+ Connect to an existing browser via Chrome DevTools Protocol:
80
903
 
81
904
  ```bash
82
- agent-browser --resident open https://example.com
83
- # ... long-lived background workflow ...
905
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
906
+
907
+ # Connect once, then run commands without --cdp
908
+ agent-browser connect 9222
909
+ agent-browser snapshot
910
+ agent-browser tab
84
911
  agent-browser close
912
+
913
+ # Or pass --cdp on each command
914
+ agent-browser --cdp 9222 snapshot
915
+
916
+ # Connect to remote browser via WebSocket URL
917
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
85
918
  ```
86
919
 
87
- ### Browser Engine Selection
920
+ The `--cdp` flag accepts either:
921
+
922
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
923
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
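The two accepted forms can be told apart mechanically. This hypothetical sketch maps a bare port to the `http://localhost:{port}` endpoint described above and passes `ws://`/`wss://` URLs through. It rejects anything else, although agent-browser's own parser may accept more forms. The `CdpTarget` type and `parseCdpArg` name are invented.

```typescript
type CdpTarget =
  | { kind: "port"; httpEndpoint: string }
  | { kind: "websocket"; url: string };

function parseCdpArg(value: string): CdpTarget {
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    const port = Number(trimmed);
    if (port < 1 || port > 65535) throw new Error(`invalid CDP port: ${trimmed}`);
    return { kind: "port", httpEndpoint: `http://localhost:${port}` };
  }
  const url = new URL(trimmed); // throws on malformed input
  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error(`expected a port or a ws:// or wss:// URL, got ${url.protocol}`);
  }
  return { kind: "websocket", url: url.href };
}

console.log(parseCdpArg("9222")); // { kind: 'port', httpEndpoint: 'http://localhost:9222' }
```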
924
+
925
+ This enables control of:
926
+
927
+ - Electron apps
928
+ - Chrome/Chromium instances with remote debugging
929
+ - WebView2 applications
930
+ - Any browser exposing a CDP endpoint
88
931
 
89
- `chrome` remains the default engine. If you want to try [Lightpanda](https://lightpanda.io/docs/open-source/installation), use `--engine lightpanda`; this automatically routes through the native daemon.
932
+ ### Auto-Connect
933
+
934
+ Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
90
935
 
91
936
  ```bash
92
- agent-browser --engine lightpanda open https://example.com
937
+ # Auto-discover running Chrome with remote debugging
938
+ agent-browser --auto-connect open example.com
939
+ agent-browser --auto-connect snapshot
940
+
941
+ # Or via environment variable
942
+ AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
943
+ ```
944
+
945
+ Auto-connect discovers Chrome by:
946
+
947
+ 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
948
+ 2. Falling back to probing common debugging ports (9222, 9229)
949
+ 3. Falling back to a direct WebSocket connection if HTTP-based discovery (`/json/version`, `/json/list`) fails
950
+
951
+ This is useful when:
952
+
953
+ - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
954
+ - You want a zero-configuration connection to your existing browser
955
+ - You don't want to track which port Chrome is using
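Step 1 of the discovery order relies on a file Chrome writes into its user data directory when remote debugging is active. The first line holds the port and the second the browser's DevTools path. The following sketch turns that file's contents into a browser WebSocket URL. `wsUrlFromDevToolsActivePort` is a hypothetical name. The per-OS default user data directory is not resolved here, and using `127.0.0.1` as the host is an assumption.

```typescript
// Parse DevToolsActivePort contents, e.g. "9222\n/devtools/browser/<id>\n".
function wsUrlFromDevToolsActivePort(contents: string, host = "127.0.0.1"): string {
  const [portLine, pathLine] = contents.split(/\r?\n/).map((l) => l.trim());
  const port = Number(portLine);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`unexpected port line: ${JSON.stringify(portLine)}`);
  }
  if (!pathLine || !pathLine.startsWith("/devtools/browser/")) {
    throw new Error("missing /devtools/browser/ path line");
  }
  return `ws://${host}:${port}${pathLine}`;
}

// In practice, read the file first, e.g.
// fs.readFileSync(path.join(userDataDir, "DevToolsActivePort"), "utf-8")
console.log(wsUrlFromDevToolsActivePort("9222\n/devtools/browser/abc-123\n"));
// ws://127.0.0.1:9222/devtools/browser/abc-123
```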
956
+
957
+ ## Streaming (Browser Preview)
958
+
959
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
960
+
961
+ ### Streaming
962
+
963
+ Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:
964
+
965
+ ```bash
966
+ agent-browser stream status
967
+ ```
968
+
969
+ To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:
970
+
971
+ ```bash
972
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
973
+ ```
974
+
975
+ You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:
976
+
977
+ ```bash
978
+ agent-browser stream enable --port 9223 # Re-enable on a specific port
979
+ agent-browser stream disable # Stop streaming for the session
980
+ ```
981
+
982
+ The WebSocket server streams the browser viewport and accepts input events.
983
+
984
+ ### WebSocket Protocol
985
+
986
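+ Connect to the stream port (for example `ws://localhost:9223` when `AGENT_BROWSER_STREAM_PORT=9223` is set; otherwise use the port reported by `stream status`) to receive frames and send input: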
987
+
988
+ **Receive frames:**
989
+
990
+ ```json
991
+ {
992
+ "type": "frame",
993
+ "data": "<base64-encoded-jpeg>",
994
+ "metadata": {
995
+ "deviceWidth": 1280,
996
+ "deviceHeight": 720,
997
+ "pageScaleFactor": 1,
998
+ "offsetTop": 0,
999
+ "scrollOffsetX": 0,
1000
+ "scrollOffsetY": 0
1001
+ }
1002
+ }
1003
+ ```
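A client receiving these messages mostly needs to check `type` and decode `data`. The following sketch targets Node and assumes messages arrive as JSON text in exactly the shape above. `parseFrameMessage` is hypothetical, and non-frame messages return `null`. The decoded JPEG bytes can then be written to disk or drawn on a canvas.

```typescript
interface FrameMetadata {
  deviceWidth: number;
  deviceHeight: number;
  pageScaleFactor: number;
  offsetTop: number;
  scrollOffsetX: number;
  scrollOffsetY: number;
}

interface DecodedFrame {
  jpeg: Buffer; // raw JPEG bytes decoded from base64
  metadata: FrameMetadata;
}

// Returns the decoded frame, or null for any other message type.
function parseFrameMessage(text: string): DecodedFrame | null {
  const msg = JSON.parse(text) as { type?: string; data?: string; metadata?: FrameMetadata };
  if (msg.type !== "frame" || typeof msg.data !== "string" || !msg.metadata) return null;
  return { jpeg: Buffer.from(msg.data, "base64"), metadata: msg.metadata };
}
```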
1004
+
1005
+ **Send mouse events:**
1006
+
1007
+ ```json
1008
+ {
1009
+ "type": "input_mouse",
1010
+ "eventType": "mousePressed",
1011
+ "x": 100,
1012
+ "y": 200,
1013
+ "button": "left",
1014
+ "clickCount": 1
1015
+ }
1016
+ ```
1017
+
1018
+ **Send keyboard events:**
1019
+
1020
+ ```json
1021
+ {
1022
+ "type": "input_keyboard",
1023
+ "eventType": "keyDown",
1024
+ "key": "Enter",
1025
+ "code": "Enter"
1026
+ }
1027
+ ```
1028
+
1029
+ **Send touch events:**
1030
+
1031
+ ```json
1032
+ {
1033
+ "type": "input_touch",
1034
+ "eventType": "touchStart",
1035
+ "touchPoints": [{ "x": 100, "y": 200 }]
1036
+ }
1037
+ ```
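The following sketch builds input messages for a preview client. It makes two assumptions. First, a click is sent as `mousePressed` followed by `mouseReleased`; the release event name comes from the Chrome DevTools Protocol's `Input.dispatchMouseEvent` and is not shown above. Second, coordinates are in the frame's device pixels, so clicks on a scaled preview are rescaled with `metadata.deviceWidth`/`deviceHeight`. All function names are hypothetical.

```typescript
interface MouseMessage {
  type: "input_mouse";
  eventType: "mousePressed" | "mouseReleased";
  x: number;
  y: number;
  button: "left";
  clickCount: number;
}

// Map a point on a preview image (shownW x shownH) to device pixels.
function toDeviceCoords(
  px: number, py: number,
  shownW: number, shownH: number,
  deviceW: number, deviceH: number,
): { x: number; y: number } {
  return { x: Math.round((px * deviceW) / shownW), y: Math.round((py * deviceH) / shownH) };
}

// A left click as press + release (release event name assumed from CDP).
function clickMessages(x: number, y: number): MouseMessage[] {
  const base = { type: "input_mouse" as const, x, y, button: "left" as const, clickCount: 1 };
  return [
    { ...base, eventType: "mousePressed" },
    { ...base, eventType: "mouseReleased" },
  ];
}

// A 640x360 preview of a 1280x720 frame: the preview center maps to (640, 360).
const point = toDeviceCoords(320, 180, 640, 360, 1280, 720);
const messages = clickMessages(point.x, point.y);
// On an open WebSocket: messages.forEach((m) => ws.send(JSON.stringify(m)));
```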
1038
+
1039
+ ## Architecture
1040
+
1041
+ agent-browser uses a client-daemon architecture:
1042
+
1043
+ 1. **Rust CLI** - Parses commands, communicates with daemon
1044
+ 2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required
1045
+
1046
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.
1047
+
1048
+ **Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome and Lightpanda (via CDP), and Safari (via WebDriver for iOS).
1049
+
1050
+ ## Platforms
1051
+
1052
+ | Platform | Binary |
1053
+ | ----------- | ----------- |
1054
+ | macOS ARM64 | Native Rust |
1055
+ | macOS x64 | Native Rust |
1056
+ | Linux ARM64 | Native Rust |
1057
+ | Linux x64 | Native Rust |
1058
+ | Windows x64 | Native Rust |
1059
+
1060
+ ## Usage with AI Agents
1061
+
1062
+ ### Just ask the agent
1063
+
1064
+ The simplest approach -- just tell your agent to use it:
93
1065
 
94
- export AGENT_BROWSER_ENGINE=lightpanda
95
- agent-browser open https://example.com
1066
+ ```
1067
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
1068
+ ```
1069
+
1070
+ The `--help` output is comprehensive and most agents can figure it out from there.
1071
+
1072
+ ### AI Coding Assistants (recommended)
1073
+
1074
+ Add the skill to your AI coding assistant for richer context:
1075
+
1076
+ ```bash
1077
+ npx skills add vercel-labs/agent-browser
96
1078
  ```
97
1079
 
98
- Lightpanda is headless-only and does not support `--extension`, `--state`, `--profile`, or `--allow-file-access`.
1080
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
99
1081
 
100
- ### Headed Mode
1082
+ ### Claude Code
101
1083
 
102
- Use `--headed` when you want a visible browser window:
1084
+ Install as a Claude Code skill:
103
1085
 
104
1086
  ```bash
105
- agent-browser --headed open https://example.com
1087
+ npx skills add vercel-labs/agent-browser
1088
+ ```
1089
+
1090
+ This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
1091
+
1092
+ ### AGENTS.md / CLAUDE.md
1093
+
1094
+ For more consistent results, add to your project or global instructions file:
1095
+
1096
+ ```markdown
1097
+ ## Browser Automation
1098
+
1099
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1100
+
1101
+ Core workflow:
106
1102
 
107
- AGENT_BROWSER_HEADED=1 agent-browser open https://example.com
108
- AGENT_BROWSER_HEADED=true agent-browser open https://example.com
1103
+ 1. `agent-browser open <url>` - Navigate to page
1104
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1105
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1106
+ 4. Re-snapshot after page changes
109
1107
  ```
110
1108
 
111
- In this fork, local launches default to headed mode unless headless is explicitly requested. Extension launches also stay headed by default so the stealth/runtime policy remains stable.
1109
+ ## Integrations
1110
+
1111
+ ### iOS Simulator
1112
+
1113
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1114
+
1115
+ **Setup:**
1116
+
1117
+ ```bash
1118
+ # Install Appium and XCUITest driver
1119
+ npm install -g appium
1120
+ appium driver install xcuitest
1121
+ ```
1122
+
1123
+ **Usage:**
1124
+
1125
+ ```bash
1126
+ # List available iOS simulators
1127
+ agent-browser device list
1128
+
1129
+ # Launch Safari on a specific device
1130
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1131
+
1132
+ # Same commands as desktop
1133
+ agent-browser -p ios snapshot -i
1134
+ agent-browser -p ios tap @e1
1135
+ agent-browser -p ios fill @e2 "text"
1136
+ agent-browser -p ios screenshot mobile.png
1137
+
1138
+ # Mobile-specific commands
1139
+ agent-browser -p ios swipe up
1140
+ agent-browser -p ios swipe down 500
1141
+
1142
+ # Close session
1143
+ agent-browser -p ios close
1144
+ ```
112
1145
 
113
- ### Default: Auto Group Agent Tabs (CDP + Plugin)
1146
+ Or use environment variables:
114
1147
 
115
1148
  ```bash
1149
+ export AGENT_BROWSER_PROVIDER=ios
1150
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
116
1151
  agent-browser open https://example.com
117
- # In CDP mode, tabs are grouped when the tab-group extension is installed
118
-
119
- # Override group title
120
- agent-browser --tab-group "My Agent Group" open https://example.com
121
- ```
122
-
123
- - CDP (`--cdp` / `--auto-connect`) keeps working unchanged.
124
- - If the extension is installed and handshake succeeds, agent tabs are grouped by session:
125
- - session=`default`: `Agent Browser Stealth`
126
- - other sessions: `Agent Browser Stealth • <session>`
127
- - If the extension is missing/unavailable, commands continue normally with silent no-op (no warning/error unless `AGENT_BROWSER_DEBUG=1`).
128
- - Env overrides:
129
- - `AGENT_BROWSER_TAB_GROUP` for base title
130
- - `AGENT_BROWSER_TAB_GROUP_PLUGIN_ID` for expected extension ID
131
-
132
- Install once in Chrome: load unpacked extension from `extensions/tab-group-cdp/` (extension name: `agent-browser-stealth`).
133
-
134
- ### Extension Capabilities (`agent-browser-stealth`)
135
-
136
- - Session window isolation: tabs are kept in their session window when possible.
137
- - Configurable isolation controls: side panel can toggle `strictWindowIsolation` and cross-window activation guard.
138
- - Session-aware grouping: deterministic group color, default session expanded, non-default sessions collapsed.
139
- - Download archive routing: downloads from managed tabs are routed to `agent-browser-stealth/<session>/...`.
140
- - Domain allowlist fallback: when allowlist is configured for a session, extension can force-block out-of-policy tabs to `about:blank`.
141
- - Risk hints (debug only): suspicious host/TLD hints are returned via handshake and printed only when `AGENT_BROWSER_DEBUG=1`.
142
- - Side panel browser controls: open/back/forward/reload, click/fill/press by CSS selector, run shortcut commands, and switch/close tabs.
143
- - Side panel developer signals: capture page console errors/warnings, fetch/xhr network events, command history, and live DOM snapshots.
144
- - Workflow automation: record actions into workflows, run workflows, map workflows to slash shortcuts, and schedule runs (daily/weekly/monthly/yearly).
145
- - Side panel operations console: view session/tab/group mapping, focus a session, keep only one session, clean empty groups, edit session allowlist, and toggle auto-clean.
146
-
147
- ## Stealth Architecture
148
-
149
- ```mermaid
150
- flowchart TD
151
- A["Command Input"] --> B["Stealth Policy Resolver"]
152
- B --> C["Connection Mode Detection"]
153
- C --> D["Launch Layer: Chromium Args"]
154
- C --> E["CDP Layer: UA + Metadata Override"]
155
- C --> F["Context Layer: Init Script Patches"]
156
- D --> G["Behavior Layer: Humanized Interaction"]
157
- E --> G
158
- F --> G
159
- G --> H["Risk Layer: Verification Detection and Handling"]
160
- H --> I["Response with warnings and riskSignals"]
161
- ```
162
-
163
- ### Policy by Connection Mode
164
-
165
- | Mode | Stealth Capabilities | Notes |
- | --------------------------------------- | ------------------------------------------------------------- | -------------------------------------------- |
- | Local Chromium launch | Chromium launch args + CDP UA override + context init scripts | Most complete stack |
- | Existing browser via CDP | CDP UA override + context init scripts | No local Chromium arg injection |
- | Cloud provider (browserbase/browseruse) | Context init scripts | Remote browser runtime controls launch layer |
- | Kernel provider | Context init scripts + provider-managed stealth | Provider-side stealth may also apply |
+ ```
+
+ | Variable | Description |
+ | -------------------------- | ----------------------------------------------- |
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
+
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
+
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
+
+ #### Real Device Support
+
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
 
- ## Principle 1: Always-On Stealth with Explicit Boundaries
+ **1. Get your device UDID:**
 
- - Stealth defaults to enabled and does not depend on a runtime toggle.
- - Project policy forbids:
- - `--profile` / `AGENT_BROWSER_PROFILE`
- - `--channel` / `AGENT_BROWSER_CHANNEL`
- - Default CLI policy uses a dedicated automation browser on CDP `localhost:9333`. If `:9333` is unavailable, agent-browser auto-starts Chrome with the persistent profile `~/.agent-browser/chrome-bot-profile`.
+ ```bash
+ xcrun xctrace list devices
+ # or
+ system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
+ ```
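
With several devices or simulators attached, copying a UDID out of that listing by hand is error-prone. The helper below is a minimal sketch, not part of agent-browser: `udid_for` is a hypothetical name, and it assumes each device line of `xcrun xctrace list devices` has the form `Name (version) (UDID)`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): print the UDID for a named
# device from `xcrun xctrace list devices` output read on stdin.
# Assumes device lines look like "Name (version) (UDID)".
udid_for() {
  local name="$1"
  # Keep the first line that starts with "<name> (", then print the text
  # inside the last pair of parentheses on that line (the UDID).
  awk -v prefix="$name (" 'index($0, prefix) == 1 { print; exit }' |
    sed -n 's/.*(\([^()]*\))[[:space:]]*$/\1/p'
}
```

The result can then be passed to `--device`, for example `agent-browser -p ios --device "$(xcrun xctrace list devices 2>&1 | udid_for "John's iPhone")" open https://example.com`. The `2>&1` is there in case your Xcode version writes the listing to stderr.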
 
- ## Principle 2: Multi-Layer Fingerprint Hardening
+ **2. Sign WebDriverAgent (one-time):**
 
- ### 2.1 Launch Layer (Local Chromium)
+ ```bash
+ # Open the WebDriverAgent Xcode project
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
+ open WebDriverAgent.xcodeproj
+ ```
 
- Injected Chromium args:
-
- - `--disable-blink-features=AutomationControlled`
- - `--use-gl=angle`
- - `--use-angle=default`
+ In Xcode:
 
- If no custom UA is set, the runtime UA is normalized to remove `HeadlessChrome` tokens.
+ - Select the `WebDriverAgentRunner` target
+ - Go to Signing & Capabilities
+ - Select your Team (requires an Apple Developer account; the free tier works)
+ - Let Xcode manage signing automatically
+
+ **3. Use with agent-browser:**
+
+ ```bash
+ # Connect device via USB, then:
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
 
- ### 2.2 CDP Layer (Browser/Page Targets)
+ # Or use the device name if unique
+ agent-browser -p ios --device "John's iPhone" open https://example.com
+ ```
+
+ **Real device notes:**
+
+ - First run installs WebDriverAgent to the device (you may need to accept a Trust prompt on the device)
+ - Device must be unlocked and connected via USB
+ - Initial connection is slightly slower than with the simulator
+ - Tests against real Safari performance and behavior
+
+ ### Browserless
+
+ [Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available.
+
+ To enable Browserless, use the `-p` flag:
+
+ ```bash
+ export BROWSERLESS_API_KEY="your-api-token"
+ agent-browser -p browserless open https://example.com
+ ```
+
+ Or use environment variables for CI/scripts:
+
+ ```bash
+ export AGENT_BROWSER_PROVIDER=browserless
+ export BROWSERLESS_API_KEY="your-api-token"
+ agent-browser open https://example.com
+ ```
 
- - Uses `Emulation.setUserAgentOverride` to align:
- - `userAgent`
- - `acceptLanguage`
- - `userAgentMetadata` brands and versions
- - Applies overrides for existing/new targets, including worker-relevant contexts.
- - Forces opaque white background (`Emulation.setDefaultBackgroundColorOverride`) to avoid headless transparency fingerprints.
-
- ### 2.3 Context Init-Script Layer (Patch Inventory)
-
- The init script patch set is injected before page scripts and currently includes:
-
- 1. `navigator.webdriver` removal (including prototype-level cleanup).
- 2. CSS webdriver heuristic neutralization (`CSS.supports('border-end-end-radius: initial')` probe).
- 3. `window.chrome.runtime` bootstrap for missing runtime surfaces.
- 4. Locale/language normalization (`navigator.language`, `navigator.languages`).
- 5. Realistic `navigator.plugins` and `navigator.mimeTypes`.
- 6. `navigator.permissions.query` normalization for notifications.
- 7. WebGL vendor/renderer masking when SwiftShader indicators are present.
- 8. `cdc_` property cleanup on document/documentElement.
- 9. Window/screen dimension normalization (`outerWidth/outerHeight/screenX/screenY`).
- 10. Screen availability patching (`availWidth/availHeight`).
- 11. Hardware concurrency stabilization.
- 12. Notification permission consistency.
- 13. Active text color heuristic patching.
- 14. `navigator.connection` normalization.
- 15. Worker network signal normalization (`downlinkMax`).
- 16. `prefers-color-scheme` light-mode heuristic neutralization.
- 17. `navigator.share` exposure.
- 18. `navigator.contacts` exposure.
- 19. `contentIndex` exposure.
- 20. `navigator.pdfViewerEnabled` normalization.
- 21. Media devices surface normalization.
- 22. `navigator.userAgent` cleanup (strip `HeadlessChrome`).
- 23. `navigator.userAgentData` brand cleanup.
- 24. `performance.memory` stabilization.
- 25. Default background color patching at script level.
+ Optional configuration via environment variables:
 
- ## Principle 3: Behavioral Humanization
+ | Variable | Description | Default |
+ | -------------------------- | ------------------------------------------------ | --------------------------------------- |
+ | `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
+ | `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (`chromium` or `chrome`) | `chromium` |
+ | `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` |
+ | `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` |
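
To check which Browserless settings a CI job will run with, you can resolve these variables against the defaults in the table. This is a hedged sketch: `browserless_effective_config` and `minutes_to_ms` are hypothetical helpers that only echo the documented defaults; they do not query agent-browser or Browserless, and they treat an empty variable the same as an unset one.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: print the Browserless settings a job would run with,
# falling back to the documented defaults. They only echo values; they do not
# query agent-browser or Browserless. Empty variables count as unset here.
browserless_effective_config() {
  printf 'BROWSERLESS_API_URL=%s\n' "${BROWSERLESS_API_URL:-https://production-sfo.browserless.io}"
  printf 'BROWSERLESS_BROWSER_TYPE=%s\n' "${BROWSERLESS_BROWSER_TYPE:-chromium}"
  printf 'BROWSERLESS_TTL=%s\n' "${BROWSERLESS_TTL:-300000}"
  printf 'BROWSERLESS_STEALTH=%s\n' "${BROWSERLESS_STEALTH:-true}"
}

# BROWSERLESS_TTL is expressed in milliseconds, so convert from minutes explicitly.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}
```

For example, `export BROWSERLESS_TTL="$(minutes_to_ms 10)"` requests a 10-minute TTL (600000 ms). The default `300000` corresponds to 5 minutes.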
 
- - Navigation pacing jitter before `goto` (short randomized delay).
- - Typing jitter for `type --delay` and `keyboard type --delay`:
- - per-character randomized delay around the requested base delay (about ±40%).
- - Click path humanization:
- - cursor moves on a Bezier-like curve before click.
- - Wait supports random ranges (`wait min-max`) for non-uniform timing.
+ When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.
 
- ## Principle 4: Region Signal Alignment
+ Get your API token from the [Browserless Dashboard](https://browserless.io).
 
- Before navigation, the runtime derives region hints from target URL TLD and aligns:
+ ### Browserbase
 
- - locale
- - timezone
- - `Accept-Language`
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure for deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
 
- Examples of built-in mappings include `tw`, `jp`, `kr`, `sg`, `de`, `fr`, `uk`, `in`, `au`.
+ To enable Browserbase, use the `-p` flag:
 
- Manual overrides are supported:
+ ```bash
+ export BROWSERBASE_API_KEY="your-api-key"
+ agent-browser -p browserbase open https://example.com
+ ```
 
- - `AGENT_BROWSER_LOCALE`
- - `AGENT_BROWSER_TIMEZONE` (or `TZ`)
+ Or use environment variables for CI/scripts:
 
- ## Principle 5: Verification-Aware Risk Control
+ ```bash
+ export AGENT_BROWSER_PROVIDER=browserbase
+ export BROWSERBASE_API_KEY="your-api-key"
+ agent-browser open https://example.com
+ ```
 
- When a navigation lands on verification/captcha pages, structured risk signals are generated from URL/title/page-text evidence.
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
 
- `riskSignals` include:
+ Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).
 
- - `code`
- - `source` (`url` or `title`)
- - `evidence`
- - `confidence`
+ ### Browser Use
 
- ### Risk Mode
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
 
- - `warn` (default): wait for auto-clear, then retry with randomized backoff and return warnings + `riskSignals`.
- - `block`: fail fast once verification/captcha interstitial is detected.
- - `off`: skip detection/retry path.
+ To enable Browser Use, use the `-p` flag:
 
  ```bash
- agent-browser --risk-mode warn open https://example.com
- agent-browser --risk-mode block open https://example.com
- AGENT_BROWSER_RISK_MODE=off agent-browser open https://example.com
+ export BROWSER_USE_API_KEY="your-api-key"
+ agent-browser -p browseruse open https://example.com
  ```
 
- ```mermaid
- flowchart TD
- A["Navigate"] --> B["Collect URL/Title/Text Signals"]
- B --> C{"risk-mode"}
- C -->|off| D["Return Success"]
- C -->|block| E["Return Error with First Signal"]
- C -->|warn| F["Wait for auto-clear, then retry up to 2 times"]
- F --> G{"Signals Cleared"}
- G -->|yes| H["Return Success + recovery warning + riskSignals"]
- G -->|no| I["Return Success + warning + riskSignals"]
+ Or use environment variables for CI/scripts:
+
+ ```bash
+ export AGENT_BROWSER_PROVIDER=browseruse
+ export BROWSER_USE_API_KEY="your-api-key"
+ agent-browser open https://example.com
  ```
 
- ## Operational Recommendations
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
 
- - Prefer `--headed` for high-friction targets.
- - Reuse session state with one stable `--session-name` for continuity (when omitted, it defaults to `default`).
- - Use `--parallel <name>` only for stateless parallel workloads where higher throughput matters.
- - Default-session commands will reap all non-default daemon sessions, so keep parallel workers short-lived.
- - Use `--resident` only for deliberate long-running workflows, and close when done.
- - Keep locale/timezone consistent with target market.
- - For challenge-heavy pages, prefer `--wait-until domcontentloaded` on `open`/`navigate` to avoid `load` stalls.
- - Use `--risk-mode block` in strict pipelines that require explicit operator intervention on verification pages.
- - For `cookies set`, use either `--url <url>`, or `--domain <domain> --path <path>` together.
- - If `--url`, `--domain`, and `--path` are all omitted, the cookie is scoped from the current page URL.
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started; after that, pricing is pay-as-you-go.
 
- ## Validation Scripts
+ ### Kernel
 
- Run public detector checks after stealth changes:
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
+
+ To enable Kernel, use the `-p` flag:
 
  ```bash
- node scripts/check-sannysoft-webdriver.js --binary ./cli/target/release/agent-browser
- node scripts/check-creepjs-headless.js --binary ./cli/target/release/agent-browser
- node scripts/check-stealth-regression.js --binary ./cli/target/release/agent-browser
- pnpm run check:turnstile-testkey
+ export KERNEL_API_KEY="your-api-key"
+ agent-browser -p kernel open https://example.com
  ```
 
- ## Doctor Diagnostics
+ Or use environment variables for CI/scripts:
+
+ ```bash
+ export AGENT_BROWSER_PROVIDER=kernel
+ export KERNEL_API_KEY="your-api-key"
+ agent-browser open https://example.com
+ ```
 
- Use `doctor` to quickly diagnose local CDP, sourceURL sanitization, and tab-group plugin readiness:
+ Optional configuration via environment variables:
+
+ | Variable | Description | Default |
+ | ------------------------ | -------------------------------------------------------------------------------- | ------- |
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
+
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
+
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
+
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
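
Each API-key provider above reads its key from a different variable, so a CI script may want to fail fast when the key for the selected provider is missing. The sketch below is hypothetical (`required_key_var` and `check_provider_key` are not agent-browser features). The provider-to-variable mapping is taken from the Browserless, Browserbase, Browser Use, and Kernel sections; AgentCore is deliberately excluded because it authenticates with AWS credentials rather than an API key.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of agent-browser). The mapping below
# mirrors the provider sections of this README.
required_key_var() {
  case "$1" in
    browserless) echo BROWSERLESS_API_KEY ;;
    browserbase) echo BROWSERBASE_API_KEY ;;
    browseruse)  echo BROWSER_USE_API_KEY ;;
    kernel)      echo KERNEL_API_KEY ;;
    *)           echo "" ;;  # ios, agentcore, local browser: no API-key variable
  esac
}

# Checks the provider given as $1, or AGENT_BROWSER_PROVIDER if $1 is empty.
check_provider_key() {
  local provider="${1:-${AGENT_BROWSER_PROVIDER:-}}"
  local var
  var="$(required_key_var "$provider")"
  if [ -z "$var" ]; then
    return 0  # nothing to check for this provider
  fi
  # ${!var} is bash indirect expansion: the value of the variable named by $var.
  if [ -z "${!var:-}" ]; then
    echo "error: $var must be set when using provider '$provider'" >&2
    return 1
  fi
}
```

In a pipeline this could guard the real command, for example `check_provider_key browserbase && agent-browser -p browserbase open https://example.com`.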
+
+ ### AgentCore
+
+ [AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.
+
+ To enable AgentCore, use the `-p` flag:
 
  ```bash
- agent-browser doctor
- agent-browser --json doctor
+ agent-browser -p agentcore open https://example.com
  ```
 
- `doctor` checks:
+ Or use environment variables for CI/scripts:
+
+ ```bash
+ export AGENT_BROWSER_PROVIDER=agentcore
+ agent-browser open https://example.com
+ ```
+
+ Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles.
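
When AgentCore authentication fails, it helps to know which credential source is in play. The sketch below is hypothetical (`agentcore_credential_source` is not an agent-browser command) and assumes environment variables take precedence over the AWS CLI fallback, following the order of the sentence above; agent-browser's actual resolution order may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which credential source AgentCore is likely to use.
# ASSUMPTION: env vars are checked before the AWS CLI fallback; the real order
# inside agent-browser may differ.
agentcore_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env"
  else
    # Fallback: the AWS CLI, resolved through AWS_PROFILE (documented default: "default").
    echo "aws-cli (profile: ${AWS_PROFILE:-default})"
  fi
}
```

If it reports the CLI fallback, running `aws configure export-credentials --profile "${AWS_PROFILE:-default}"` directly is a quick way to confirm that your SSO session or profile can actually produce credentials.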
+
+ Optional configuration via environment variables:
 
- - CDP probe status (preferred `:9333` plus common ports)
- - DevToolsActivePort discovery from local Chrome profiles
- - CDP Runtime.evaluate sourceURL sanitization probe
- - Plugin handshake page context check (internal page vs normal `http(s)` page)
- - Tab-group extension handshake (when currently attached in CDP mode)
+ | Variable | Description | Default |
+ | -------------------------- | -------------------------------------------------------------------- | ---------------- |
+ | `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` |
+ | `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
+ | `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) |
+ | `AGENTCORE_SESSION_TIMEOUT` | Session timeout in seconds | `3600` |
+ | `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |
 
- ## Upstream Compatibility
+ **Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically.
 
- This fork intentionally keeps command workflows close to upstream while concentrating custom behavior in stealth, policy, and anti-detection handling.
+ When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically.
 
  ## License