agent-browser-stealth 0.14.0

Files changed (77)
  1. package/LICENSE +201 -0
  2. package/README.md +1219 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser-local +0 -0
  5. package/bin/agent-browser.js +109 -0
  6. package/dist/actions.d.ts +17 -0
  7. package/dist/actions.d.ts.map +1 -0
  8. package/dist/actions.js +1917 -0
  9. package/dist/actions.js.map +1 -0
  10. package/dist/browser.d.ts +598 -0
  11. package/dist/browser.d.ts.map +1 -0
  12. package/dist/browser.js +2287 -0
  13. package/dist/browser.js.map +1 -0
  14. package/dist/daemon.d.ts +66 -0
  15. package/dist/daemon.d.ts.map +1 -0
  16. package/dist/daemon.js +603 -0
  17. package/dist/daemon.js.map +1 -0
  18. package/dist/diff.d.ts +18 -0
  19. package/dist/diff.d.ts.map +1 -0
  20. package/dist/diff.js +271 -0
  21. package/dist/diff.js.map +1 -0
  22. package/dist/encryption.d.ts +50 -0
  23. package/dist/encryption.d.ts.map +1 -0
  24. package/dist/encryption.js +85 -0
  25. package/dist/encryption.js.map +1 -0
  26. package/dist/ios-actions.d.ts +11 -0
  27. package/dist/ios-actions.d.ts.map +1 -0
  28. package/dist/ios-actions.js +228 -0
  29. package/dist/ios-actions.js.map +1 -0
  30. package/dist/ios-manager.d.ts +266 -0
  31. package/dist/ios-manager.d.ts.map +1 -0
  32. package/dist/ios-manager.js +1073 -0
  33. package/dist/ios-manager.js.map +1 -0
  34. package/dist/protocol.d.ts +26 -0
  35. package/dist/protocol.d.ts.map +1 -0
  36. package/dist/protocol.js +935 -0
  37. package/dist/protocol.js.map +1 -0
  38. package/dist/snapshot.d.ts +67 -0
  39. package/dist/snapshot.d.ts.map +1 -0
  40. package/dist/snapshot.js +514 -0
  41. package/dist/snapshot.js.map +1 -0
  42. package/dist/state-utils.d.ts +77 -0
  43. package/dist/state-utils.d.ts.map +1 -0
  44. package/dist/state-utils.js +178 -0
  45. package/dist/state-utils.js.map +1 -0
  46. package/dist/stealth.d.ts +22 -0
  47. package/dist/stealth.d.ts.map +1 -0
  48. package/dist/stealth.js +614 -0
  49. package/dist/stealth.js.map +1 -0
  50. package/dist/stream-server.d.ts +117 -0
  51. package/dist/stream-server.d.ts.map +1 -0
  52. package/dist/stream-server.js +309 -0
  53. package/dist/stream-server.js.map +1 -0
  54. package/dist/types.d.ts +855 -0
  55. package/dist/types.d.ts.map +1 -0
  56. package/dist/types.js +2 -0
  57. package/dist/types.js.map +1 -0
  58. package/package.json +85 -0
  59. package/scripts/build-all-platforms.sh +68 -0
  60. package/scripts/check-creepjs-headless.js +137 -0
  61. package/scripts/check-sannysoft-webdriver.js +112 -0
  62. package/scripts/check-version-sync.js +39 -0
  63. package/scripts/copy-native.js +36 -0
  64. package/scripts/postinstall.js +275 -0
  65. package/scripts/sync-upstream.sh +142 -0
  66. package/scripts/sync-version.js +69 -0
  67. package/skills/agent-browser/SKILL.md +464 -0
  68. package/skills/agent-browser/references/authentication.md +202 -0
  69. package/skills/agent-browser/references/commands.md +263 -0
  70. package/skills/agent-browser/references/profiling.md +120 -0
  71. package/skills/agent-browser/references/proxy-support.md +194 -0
  72. package/skills/agent-browser/references/session-management.md +193 -0
  73. package/skills/agent-browser/references/snapshot-refs.md +194 -0
  74. package/skills/agent-browser/references/video-recording.md +173 -0
  75. package/skills/agent-browser/templates/authenticated-session.sh +100 -0
  76. package/skills/agent-browser/templates/capture-workflow.sh +69 -0
  77. package/skills/agent-browser/templates/form-automation.sh +62 -0
package/README.md ADDED
@@ -0,0 +1,1219 @@
1
+ # agent-browser
2
+
3
+ Stealth browser automation CLI for AI agents with anti-bot evasions. Fast Rust CLI with Node.js fallback.
4
+
5
+ ## Installation
6
+
7
+ ### Global Installation (recommended)
8
+
9
+ Installs the native Rust binary for maximum performance:
10
+
11
+ ```bash
12
+ npm install -g agent-browser-stealth
13
+ agent-browser install # Download Chromium
14
+ ```
15
+
16
+ This is the fastest option -- commands run through the native Rust CLI directly with sub-millisecond parsing overhead.
17
+
18
+ ### Quick Start (no install)
19
+
20
+ Run directly with `npx` if you want to try it without installing globally:
21
+
22
+ ```bash
23
+ npx agent-browser-stealth install # Download Chromium (first time only)
24
+ npx agent-browser-stealth open example.com
25
+ ```
26
+
27
+ > **Note:** `npx` routes through Node.js before reaching the Rust CLI, so it is noticeably slower than a global install. For regular use, install globally.
28
+
29
+ ### Project Installation (local dependency)
30
+
31
+ For projects that want to pin the version in `package.json`:
32
+
33
+ ```bash
34
+ npm install agent-browser-stealth
35
+ npx agent-browser-stealth install
36
+ ```
37
+
38
+ Then use via `npx` or `package.json` scripts:
39
+
40
+ ```bash
41
+ npx agent-browser-stealth open example.com
42
+ ```
43
+
44
+ ### Homebrew (macOS)
45
+
46
+ ```bash
47
+ brew install agent-browser
48
+ agent-browser install # Download Chromium
49
+ ```
50
+
51
+ ### From Source
52
+
53
+ ```bash
54
+ git clone https://github.com/leeguooooo/agent-browser
55
+ cd agent-browser
56
+ pnpm install
57
+ pnpm build
58
+ pnpm build:native # Requires Rust (https://rustup.rs)
59
+ pnpm link --global # Makes agent-browser available globally
60
+ agent-browser install
61
+ ```
62
+
63
+ ### Fork Maintenance (Independent Release + Upstream Sync)
64
+
65
+ If you maintain a fork and publish your own CLI, use this workflow:
66
+
67
+ 1. Keep an upstream-tracking branch (`upstream-main`) for clean sync history.
68
+ 2. Keep your release branch (`main`) for production-ready code only.
69
+ 3. Merge upstream into short-lived sync branches, then open PRs into `main`.
70
+
71
+ One-time setup:
72
+
73
+ ```bash
74
+ git remote add upstream https://github.com/vercel-labs/agent-browser.git
75
+ git fetch upstream
76
+ ```
77
+
78
+ Regular sync:
79
+
80
+ ```bash
81
+ pnpm run sync:upstream:push
82
+ ```
83
+
84
+ This command:
85
+ - Fetches `upstream/main`
86
+ - Fast-forwards local `upstream-main`
87
+ - Creates `sync/YYYY-MM-DD` from local `main`
88
+ - Merges `upstream-main` into the sync branch
89
+ - Pushes the sync branch to `origin` (with `sync:upstream:push`)
90
+
91
+ If merge conflicts occur, resolve them on the sync branch and open a PR as usual.
92
+
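The sync steps listed above can be sketched as a plan of git commands. This is only an illustration: `syncBranchName` and `syncPlan` are hypothetical helpers, the commands are an assumption about what such a script might run, and the contents of the real `sync:upstream:push` script are not shown here.

```typescript
// Hypothetical reconstruction of the documented steps; the actual
// sync script may use different commands or flags.
function syncBranchName(date: Date): string {
  // toISOString() is always UTC, so the first 10 characters are YYYY-MM-DD.
  return `sync/${date.toISOString().slice(0, 10)}`;
}

function syncPlan(date: Date, push: boolean): string[] {
  const branch = syncBranchName(date);
  const steps = [
    "git fetch upstream",                 // fetch upstream/main
    "git checkout upstream-main",
    "git merge --ff-only upstream/main",  // fast-forward the tracking branch
    `git checkout -b ${branch} main`,     // create the sync branch from main
    "git merge upstream-main",            // merge upstream into the sync branch
  ];
  if (push) {
    steps.push(`git push -u origin ${branch}`);
  }
  return steps;
}

console.log(syncPlan(new Date(), true).join("\n"));
```

Using the UTC date for the branch name is a choice made for this sketch; a real script may use local time instead.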
93
+ Independent release checklist for forks:
94
+ - Use your own npm package name and CLI binary name (avoid conflicts with upstream package ownership).
95
+ - Update `repository`, `bugs`, and `homepage` in `package.json` to your fork.
96
+ - Configure npm Trusted Publishing (OIDC) for your package and repository workflow.
97
+ - Keep release tags and changelog in your own namespace/versioning policy.
98
+
99
+ ### Linux Dependencies
100
+
101
+ On Linux, install system dependencies:
102
+
103
+ ```bash
104
+ agent-browser install --with-deps
105
+ # or manually: npx playwright install-deps chromium
106
+ ```
107
+
108
+ ## Quick Start
109
+
110
+ ```bash
111
+ agent-browser open example.com
112
+ agent-browser snapshot # Get accessibility tree with refs
113
+ agent-browser click @e2 # Click by ref from snapshot
114
+ agent-browser fill @e3 "test@example.com" # Fill by ref
115
+ agent-browser get text @e1 # Get text by ref
116
+ agent-browser screenshot page.png
117
+ agent-browser close
118
+ ```
119
+
120
+ ### Traditional Selectors (also supported)
121
+
122
+ ```bash
123
+ agent-browser click "#submit"
124
+ agent-browser fill "#email" "test@example.com"
125
+ agent-browser find role button click --name "Submit"
126
+ ```
127
+
128
+ ## Commands
129
+
130
+ ### Core Commands
131
+
132
+ ```bash
133
+ agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
134
+ agent-browser click <sel> # Click element (--new-tab to open in new tab)
135
+ agent-browser dblclick <sel> # Double-click element
136
+ agent-browser focus <sel> # Focus element
137
+ agent-browser type <sel> <text> # Type into element
138
+ agent-browser fill <sel> <text> # Clear and fill
139
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
140
+ agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
141
+ agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
142
+ agent-browser keydown <key> # Hold key down
143
+ agent-browser keyup <key> # Release key
144
+ agent-browser hover <sel> # Hover element
145
+ agent-browser select <sel> <val> # Select dropdown option
146
+ agent-browser check <sel> # Check checkbox
147
+ agent-browser uncheck <sel> # Uncheck checkbox
148
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
149
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
150
+ agent-browser drag <src> <tgt> # Drag and drop
151
+ agent-browser upload <sel> <files> # Upload files
152
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
153
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
154
+ agent-browser pdf <path> # Save as PDF
155
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
156
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
157
+ agent-browser connect <port> # Connect to browser via CDP
158
+ agent-browser close # Close browser (aliases: quit, exit)
159
+ ```
160
+
161
+ ### Get Info
162
+
163
+ ```bash
164
+ agent-browser get text <sel> # Get text content
165
+ agent-browser get html <sel> # Get innerHTML
166
+ agent-browser get value <sel> # Get input value
167
+ agent-browser get attr <sel> <attr> # Get attribute
168
+ agent-browser get title # Get page title
169
+ agent-browser get url # Get current URL
170
+ agent-browser get count <sel> # Count matching elements
171
+ agent-browser get box <sel> # Get bounding box
172
+ agent-browser get styles <sel> # Get computed styles
173
+ ```
174
+
175
+ ### Check State
176
+
177
+ ```bash
178
+ agent-browser is visible <sel> # Check if visible
179
+ agent-browser is enabled <sel> # Check if enabled
180
+ agent-browser is checked <sel> # Check if checked
181
+ ```
182
+
183
+ ### Find Elements (Semantic Locators)
184
+
185
+ ```bash
186
+ agent-browser find role <role> <action> [value] # By ARIA role
187
+ agent-browser find text <text> <action> # By text content
188
+ agent-browser find label <label> <action> [value] # By label
189
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
190
+ agent-browser find alt <text> <action> # By alt text
191
+ agent-browser find title <text> <action> # By title attr
192
+ agent-browser find testid <id> <action> [value] # By data-testid
193
+ agent-browser find first <sel> <action> [value] # First match
194
+ agent-browser find last <sel> <action> [value] # Last match
195
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
196
+ ```
197
+
198
+ **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
199
+
200
+ **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
201
+
202
+ **Examples:**
203
+ ```bash
204
+ agent-browser find role button click --name "Submit"
205
+ agent-browser find text "Sign In" click
206
+ agent-browser find label "Email" fill "test@test.com"
207
+ agent-browser find first ".item" click
208
+ agent-browser find nth 2 "a" text
209
+ ```
210
+
211
+ ### Wait
212
+
213
+ ```bash
214
+ agent-browser wait <selector> # Wait for element to be visible
215
+ agent-browser wait <ms> # Wait for time (milliseconds)
216
+ agent-browser wait 2000-5000 # Random wait between 2-5 seconds
217
+ agent-browser wait --text "Welcome" # Wait for text to appear
218
+ agent-browser wait --url "**/dash" # Wait for URL pattern
219
+ agent-browser wait --load networkidle # Wait for load state
220
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
221
+ ```
222
+
223
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
224
+
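The positional `wait` argument can be a selector, a fixed delay, or a `min-max` range. The sketch below shows one way such an argument could be classified. `classifyWaitArg` and `pickDelay` are hypothetical names, and the CLI's actual parsing rules may differ.

```typescript
// Illustrative only: not the CLI's real parser.
type WaitTarget =
  | { kind: "fixed"; ms: number }
  | { kind: "range"; minMs: number; maxMs: number }
  | { kind: "selector"; selector: string };

function classifyWaitArg(arg: string): WaitTarget {
  // A bare integer is treated as a fixed delay in milliseconds.
  if (/^\d+$/.test(arg)) {
    return { kind: "fixed", ms: Number(arg) };
  }
  // "min-max" is treated as a random delay range in milliseconds.
  const range = /^(\d+)-(\d+)$/.exec(arg);
  if (range) {
    const a = Number(range[1]);
    const b = Number(range[2]);
    return { kind: "range", minMs: Math.min(a, b), maxMs: Math.max(a, b) };
  }
  // Anything else is assumed to be a selector or ref.
  return { kind: "selector", selector: arg };
}

// Pick a uniformly random integer delay from an inclusive range.
function pickDelay(minMs: number, maxMs: number, random: () => number = Math.random): number {
  return minMs + Math.floor(random() * (maxMs - minMs + 1));
}

console.log(classifyWaitArg("2000-5000"));
```

Passing the random source as a parameter keeps the delay choice testable.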
225
+ ### Mouse Control
226
+
227
+ ```bash
228
+ agent-browser mouse move <x> <y> # Move mouse
229
+ agent-browser mouse down [button] # Press button (left/right/middle)
230
+ agent-browser mouse up [button] # Release button
231
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
232
+ ```
233
+
234
+ ### Browser Settings
235
+
236
+ ```bash
237
+ agent-browser set viewport <w> <h> # Set viewport size
238
+ agent-browser set device <name> # Emulate device ("iPhone 14")
239
+ agent-browser set geo <lat> <lng> # Set geolocation
240
+ agent-browser set offline [on|off] # Toggle offline mode
241
+ agent-browser set headers <json> # Extra HTTP headers
242
+ agent-browser set credentials <u> <p> # HTTP basic auth
243
+ agent-browser set media [dark|light] # Emulate color scheme
244
+ ```
245
+
246
+ ### Cookies & Storage
247
+
248
+ ```bash
249
+ agent-browser cookies # Get all cookies
250
+ agent-browser cookies set <name> <val> # Set cookie
251
+ agent-browser cookies clear # Clear cookies
252
+
253
+ agent-browser storage local # Get all localStorage
254
+ agent-browser storage local <key> # Get specific key
255
+ agent-browser storage local set <k> <v> # Set value
256
+ agent-browser storage local clear # Clear all
257
+
258
+ agent-browser storage session # Same for sessionStorage
259
+ ```
260
+
261
+ ### Network
262
+
263
+ ```bash
264
+ agent-browser network route <url> # Intercept requests
265
+ agent-browser network route <url> --abort # Block requests
266
+ agent-browser network route <url> --body <json> # Mock response
267
+ agent-browser network unroute [url] # Remove routes
268
+ agent-browser network requests # View tracked requests
269
+ agent-browser network requests --filter api # Filter requests
270
+ ```
271
+
272
+ ### Tabs & Windows
273
+
274
+ ```bash
275
+ agent-browser tab # List tabs
276
+ agent-browser tab new [url] # New tab (optionally with URL)
277
+ agent-browser tab <n> # Switch to tab n
278
+ agent-browser tab close [n] # Close tab
279
+ agent-browser window new # New window
280
+ ```
281
+
282
+ ### Frames
283
+
284
+ ```bash
285
+ agent-browser frame <sel> # Switch to iframe
286
+ agent-browser frame main # Back to main frame
287
+ ```
288
+
289
+ ### Dialogs
290
+
291
+ ```bash
292
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
293
+ agent-browser dialog dismiss # Dismiss
294
+ ```
295
+
296
+ ### Diff
297
+
298
+ ```bash
299
+ agent-browser diff snapshot # Compare current vs last snapshot
300
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
301
+ agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
302
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
303
+ agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
304
+ agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
305
+ agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
306
+ agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
307
+ agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
308
+ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
309
+ ```
310
+
311
+ ### Debug
312
+
313
+ ```bash
314
+ agent-browser trace start [path] # Start recording trace
315
+ agent-browser trace stop [path] # Stop and save trace
316
+ agent-browser profiler start # Start Chrome DevTools profiling
317
+ agent-browser profiler stop [path] # Stop and save profile (.json)
318
+ agent-browser console # View console messages (log, error, warn, info)
319
+ agent-browser console --clear # Clear console
320
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
321
+ agent-browser errors --clear # Clear errors
322
+ agent-browser highlight <sel> # Highlight element
323
+ agent-browser state save <path> # Save auth state
324
+ agent-browser state load <path> # Load auth state
325
+ agent-browser state list # List saved state files
326
+ agent-browser state show <file> # Show state summary
327
+ agent-browser state rename <old> <new> # Rename state file
328
+ agent-browser state clear [name] # Clear states for session
329
+ agent-browser state clear --all # Clear all saved states
330
+ agent-browser state clean --older-than <days> # Delete old states
331
+ ```
332
+
333
+ ### Navigation
334
+
335
+ ```bash
336
+ agent-browser back # Go back
337
+ agent-browser forward # Go forward
338
+ agent-browser reload # Reload page
339
+ ```
340
+
341
+ ### Setup
342
+
343
+ ```bash
344
+ agent-browser install # Download Chromium browser
345
+ agent-browser install --with-deps # Also install system deps (Linux)
346
+ ```
347
+
348
+ ## Sessions
349
+
350
+ Run multiple isolated browser instances:
351
+
352
+ ```bash
353
+ # Different sessions
354
+ agent-browser --session agent1 open site-a.com
355
+ agent-browser --session agent2 open site-b.com
356
+
357
+ # Or via environment variable
358
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
359
+
360
+ # List active sessions
361
+ agent-browser session list
362
+ # Output:
363
+ # Active sessions:
364
+ # -> default
365
+ # agent1
366
+
367
+ # Show current session
368
+ agent-browser session
369
+ ```
370
+
371
+ Each session has its own:
372
+ - Browser instance
373
+ - Cookies and storage
374
+ - Navigation history
375
+ - Authentication state
376
+
377
+ ## Persistent Profiles
378
+
379
+ By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
380
+
381
+ ```bash
382
+ # Use a persistent profile directory
383
+ agent-browser --profile ~/.myapp-profile open myapp.com
384
+
385
+ # Login once, then reuse the authenticated session
386
+ agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
387
+
388
+ # Or via environment variable
389
+ AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
390
+ ```
391
+
392
+ The profile directory stores:
393
+ - Cookies and localStorage
394
+ - IndexedDB data
395
+ - Service workers
396
+ - Browser cache
397
+ - Login sessions
398
+
399
+ **Tip**: Use different profile paths for different projects to keep their browser state isolated.
400
+
401
+ ## Session Persistence
402
+
403
+ Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
404
+
405
+ ```bash
406
+ # Auto-save/load state for "twitter" session
407
+ agent-browser --session-name twitter open twitter.com
408
+
409
+ # Login once, then state persists automatically
410
+ # State files stored in ~/.agent-browser/sessions/
411
+
412
+ # Or via environment variable
413
+ export AGENT_BROWSER_SESSION_NAME=twitter
414
+ agent-browser open twitter.com
415
+ ```
416
+
417
+ ### State Encryption
418
+
419
+ Encrypt saved session data at rest with AES-256-GCM:
420
+
421
+ ```bash
422
+ # Generate key: openssl rand -hex 32
423
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
424
+
425
+ # State files are now encrypted automatically
426
+ agent-browser --session-name secure open example.com
427
+ ```
428
+
429
+ | Variable | Description |
430
+ |----------|-------------|
431
+ | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
432
+ | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
433
+ | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
434
+
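The key length follows from the cipher: AES-256 needs a 256-bit key, which is 32 bytes or 64 hex characters, and that is what `openssl rand -hex 32` prints. A minimal sketch of validating a key before use is shown below. `parseEncryptionKey` is a hypothetical helper, not part of the package, and it performs no encryption itself.

```typescript
// Sketch only: checks the shape of the key and decodes it to raw bytes.
function parseEncryptionKey(hex: string): Uint8Array {
  // 32 bytes * 2 hex characters per byte = 64 characters.
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("expected a 64-character hex string (32 bytes)");
  }
  const key = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    key[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return key;
}

console.log(parseEncryptionKey("ab".repeat(32)).length * 8, "bits");
```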
435
+ ## Snapshot Options
436
+
437
+ The `snapshot` command supports filtering to reduce output size:
438
+
439
+ ```bash
440
+ agent-browser snapshot # Full accessibility tree
441
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
442
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
443
+ agent-browser snapshot -c # Compact (remove empty structural elements)
444
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
445
+ agent-browser snapshot -s "#main" # Scope to CSS selector
446
+ agent-browser snapshot -i -c -d 5 # Combine options
447
+ ```
448
+
449
+ | Option | Description |
450
+ |--------|-------------|
451
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
452
+ | `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
453
+ | `-c, --compact` | Remove empty structural elements |
454
+ | `-d, --depth <n>` | Limit tree depth |
455
+ | `-s, --selector <sel>` | Scope to CSS selector |
456
+
457
+ The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
458
+
459
+ ## Annotated Screenshots
460
+
461
+ The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
462
+
463
+ ```bash
464
+ agent-browser screenshot --annotate
465
+ # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
466
+ # [1] @e1 button "Submit"
467
+ # [2] @e2 link "Home"
468
+ # [3] @e3 textbox "Email"
469
+ ```
470
+
471
+ After an annotated screenshot, refs are cached so you can immediately interact with elements:
472
+
473
+ ```bash
474
+ agent-browser screenshot --annotate ./page.png
475
+ agent-browser click @e2 # Click the "Home" link labeled [2]
476
+ ```
477
+
478
+ This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
479
+
480
+ ## Options
481
+
482
+ | Option | Description |
483
+ |--------|-------------|
484
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
485
+ | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
486
+ | `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
487
+ | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
488
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
489
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
490
+ | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
491
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
492
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
493
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
494
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
495
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
496
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
497
+ | `--stealth` | Stealth mode (default: on): local launch uses Chromium args + init scripts; CDP/provider uses init scripts |
498
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
499
+ | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
500
+ | `--json` | JSON output (for agents) |
501
+ | `--full, -f` | Full page screenshot |
502
+ | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
503
+ | `--headed` | Show browser window (not headless) |
504
+ | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
505
+ | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
506
+ | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
507
+ | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
508
+ | `--debug` | Debug output |
509
+
510
+ ## Configuration
511
+
512
+ Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
513
+
514
+ **Configuration sources (lowest to highest priority):**
515
+
516
+ 1. `~/.agent-browser/config.json` -- user-level defaults
517
+ 2. `./agent-browser.json` -- project-level overrides (in working directory)
518
+ 3. `AGENT_BROWSER_*` environment variables override config file values
519
+ 4. CLI flags override everything
520
+
521
+ **Example `agent-browser.json`:**
522
+
523
+ ```json
524
+ {
525
+ "headed": true,
526
+ "proxy": "http://localhost:8080",
527
+ "profile": "./browser-data",
528
+ "userAgent": "my-agent/1.0",
529
+ "ignoreHttpsErrors": true
530
+ }
531
+ ```
532
+
533
+ Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
534
+
535
+ ```bash
536
+ agent-browser --config ./ci-config.json open example.com
537
+ AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
538
+ ```
539
+
540
+ All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
541
+
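The flag-to-key mapping described above is a plain kebab-case to camelCase conversion. A minimal sketch, where `flagToConfigKey` is a hypothetical helper rather than a function exported by the package:

```typescript
// Convert a CLI flag such as "--proxy-bypass" into its config key.
function flagToConfigKey(flag: string): string {
  return flag
    .replace(/^--/, "") // drop the leading dashes
    .replace(/-([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

console.log(flagToConfigKey("--executable-path"));
```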
542
+ Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
543
+
544
+ Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
545
+
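The precedence and merge rules above can be pictured as a layered merge. The sketch below is an illustration under stated assumptions, not the package's implementation: `resolveSettings` is a hypothetical name, and letting an `extensions` value from the environment or CLI replace the concatenated list is an assumption, since the text only describes how the two config files combine.

```typescript
type Settings = Record<string, unknown>;
interface FileConfig extends Settings {
  extensions?: string[];
}

function resolveSettings(
  user: FileConfig,
  project: FileConfig,
  env: Settings,
  cli: Settings,
): Settings {
  // Later spreads win: user < project < environment < CLI flags.
  const merged: Settings = { ...user, ...project, ...env, ...cli };

  // Extensions from the two config files are concatenated, not replaced.
  const fileExtensions = [...(user.extensions ?? []), ...(project.extensions ?? [])];
  const overridden = "extensions" in env || "extensions" in cli; // assumption, see above
  if (fileExtensions.length > 0 && !overridden) {
    merged["extensions"] = fileExtensions;
  }
  return merged;
}

console.log(
  resolveSettings(
    { headed: false, extensions: ["./ext-a"] },
    { headed: true, extensions: ["./ext-b"] },
    { proxy: "http://localhost:8080" },
    { headed: false },
  ),
);
```

In this example the CLI value for `headed` wins over both config files, which mirrors `--headed false` overriding `"headed": true`.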
546
+ > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
547
+
548
+ ## Default Timeout
549
+
550
+ The default Playwright timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that Playwright returns a proper error instead of the CLI timing out with EAGAIN.
551
+
552
+ Override the default timeout via environment variable:
553
+
554
+ ```bash
555
+ # Set a longer timeout for slow pages (in milliseconds)
556
+ export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
557
+ ```
558
+
559
+ > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before Playwright responds. The CLI retries transient errors automatically, but response times will increase.
560
+
561
+ | Variable | Description |
562
+ |----------|-------------|
563
+ | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default Playwright timeout in ms (default: 25000) |
564
+
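The relationship between the two timeouts can be made explicit with a small check. This is a sketch: `resolveTimeout` is a hypothetical helper, and falling back to the default on a missing or invalid value is an assumption about behavior, not documented.

```typescript
const IPC_READ_TIMEOUT_MS = 30_000; // CLI read timeout stated above
const DEFAULT_TIMEOUT_MS = 25_000;  // documented default

function resolveTimeout(raw: string | undefined): { timeoutMs: number; mayHitEagain: boolean } {
  const parsed = raw === undefined ? NaN : Number(raw);
  // Assumption: missing or invalid values fall back to the default.
  const timeoutMs = Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
  // Above 30000 ms the CLI's read can time out before Playwright reports its own error.
  return { timeoutMs, mayHitEagain: timeoutMs > IPC_READ_TIMEOUT_MS };
}

console.log(resolveTimeout("45000"));
```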
565
+ ## Selectors
566
+
567
+ ### Refs (Recommended for AI)
568
+
569
+ Refs provide deterministic element selection from snapshots:
570
+
571
+ ```bash
572
+ # 1. Get snapshot with refs
573
+ agent-browser snapshot
574
+ # Output:
575
+ # - heading "Example Domain" [ref=e1] [level=1]
576
+ # - button "Submit" [ref=e2]
577
+ # - textbox "Email" [ref=e3]
578
+ # - link "Learn more" [ref=e4]
579
+
580
+ # 2. Use refs to interact
581
+ agent-browser click @e2 # Click the button
582
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
583
+ agent-browser get text @e1 # Get heading text
584
+ agent-browser hover @e4 # Hover the link
585
+ ```
586
+
587
+ **Why use refs?**
588
+ - **Deterministic**: A ref points to the exact element recorded in the snapshot
589
+ - **Fast**: No DOM re-query needed
590
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
591
+
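An agent that works from the text snapshot needs to map a role and name back to a ref. The sketch below parses lines in the format shown in the example output above. `extractRefs` and `findRef` are hypothetical helpers, and the line format is assumed to match that example exactly.

```typescript
interface SnapshotRef {
  ref: string;
  role: string;
  name: string;
}

// Parse lines such as: - button "Submit" [ref=e2]
function extractRefs(snapshot: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const m = /^\s*-\s+(\S+)\s+"([^"]*)".*\[ref=(e\d+)\]/.exec(line);
    if (m) {
      refs.push({ role: m[1], name: m[2], ref: "@" + m[3] });
    }
  }
  return refs;
}

// Look up the ref to pass to commands like `click` or `fill`.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  return refs.find((r) => r.role === role && r.name === name)?.ref;
}

const sampleSnapshot = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- button "Submit" [ref=e2]',
].join("\n");
console.log(findRef(extractRefs(sampleSnapshot), "button", "Submit"));
```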
592
+ ### CSS Selectors
593
+
594
+ ```bash
595
+ agent-browser click "#id"
596
+ agent-browser click ".class"
597
+ agent-browser click "div > button"
598
+ ```
599
+
600
+ ### Text & XPath
601
+
602
+ ```bash
603
+ agent-browser click "text=Submit"
604
+ agent-browser click "xpath=//button"
605
+ ```
606
+
607
+ ### Semantic Locators
608
+
609
+ ```bash
610
+ agent-browser find role button click --name "Submit"
611
+ agent-browser find label "Email" fill "test@test.com"
612
+ ```
613
+
614
+ ## Agent Mode
615
+
616
+ Use `--json` for machine-readable output:
617
+
618
+ ```bash
619
+ agent-browser snapshot --json
620
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
621
+
622
+ agent-browser get text @e1 --json
623
+ agent-browser is visible @e2 --json
624
+ ```
625
+
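A consumer of the `--json` output might pull refs out of the response like this. The sketch assumes only the response shape shown in the example above; `refsByRole` is a hypothetical helper, and the shape of a failed response is not documented here, so the error branch is a guess.

```typescript
interface RefInfo {
  role: string;
  name?: string;
}
interface SnapshotResponse {
  success: boolean;
  data?: { snapshot: string; refs: Record<string, RefInfo> };
}

// Return every ref (as "@eN") whose role matches.
function refsByRole(jsonText: string, role: string): string[] {
  const response = JSON.parse(jsonText) as SnapshotResponse;
  if (!response.success || !response.data) {
    throw new Error("snapshot command did not succeed");
  }
  const refs = response.data.refs;
  return Object.keys(refs)
    .filter((id) => refs[id]?.role === role)
    .map((id) => "@" + id);
}

const sampleResponse =
  '{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},"e2":{"role":"button","name":"Submit"}}}}';
console.log(refsByRole(sampleResponse, "button"));
```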
626
+ ### Optimal AI Workflow
627
+
628
+ ```bash
629
+ # 1. Navigate and get snapshot
630
+ agent-browser open example.com
631
+ agent-browser snapshot -i --json # AI parses tree and refs
632
+
633
+ # 2. AI identifies target refs from snapshot
634
+ # 3. Execute actions using refs
635
+ agent-browser click @e2
636
+ agent-browser fill @e3 "input text"
637
+
638
+ # 4. Get new snapshot if page changed
639
+ agent-browser snapshot -i --json
640
+ ```
641
+
642
+ ### Command Chaining
643
+
644
+ Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient than issuing each command as a separate call:
645
+
646
+ ```bash
647
+ # Open, wait for load, and snapshot in one call
648
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
649
+
650
+ # Chain multiple interactions
651
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
652
+
653
+ # Navigate and screenshot
654
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
655
+ ```
656
+
657
+ Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
658
+
659
+ ## Headed Mode
660
+
661
+ Show the browser window for debugging:
662
+
663
+ ```bash
664
+ agent-browser open example.com --headed
665
+ ```
666
+
667
+ This opens a visible browser window instead of running headless.
668
+
669
+ ## Authenticated Sessions
670
+
671
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
672
+
673
+ ```bash
674
+ # Headers are scoped to api.example.com only
675
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
676
+
677
+ # Requests to api.example.com include the auth header
678
+ agent-browser snapshot -i --json
679
+ agent-browser click @e2
680
+
681
+ # Navigate to another domain - headers are NOT sent (safe!)
682
+ agent-browser open other-site.com
683
+ ```
684
+
685
+ This is useful for:
686
+ - **Skipping login flows** - Authenticate via headers instead of UI
687
+ - **Switching users** - Start new sessions with different auth tokens
688
+ - **API testing** - Access protected endpoints directly
689
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
690
+
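Origin scoping can be pictured as a lookup keyed by the request's origin. The sketch below illustrates the idea only and is not how the package implements it: `ScopedHeaders` and `originOf` are hypothetical names, and treating a bare host such as `api.example.com` as `https://` is an assumption.

```typescript
type HeaderMap = Record<string, string>;

// Normalize "api.example.com" or a full URL to an origin like "https://api.example.com".
function originOf(target: string): string {
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(target);
  return new URL(hasScheme ? target : `https://${target}`).origin;
}

class ScopedHeaders {
  private readonly byOrigin = new Map<string, HeaderMap>();

  // Register headers for the origin of the URL passed to `open`.
  set(target: string, headers: HeaderMap): void {
    this.byOrigin.set(originOf(target), headers);
  }

  // Headers to attach to a request; empty for any origin never registered.
  headersFor(requestUrl: string): HeaderMap {
    return this.byOrigin.get(originOf(requestUrl)) ?? {};
  }
}

const scoped = new ScopedHeaders();
scoped.set("api.example.com", { Authorization: "Bearer <token>" });
console.log(scoped.headersFor("https://api.example.com/v1/users"));
console.log(scoped.headersFor("https://other-site.com/"));
```

Because the lookup key is the full origin (scheme, host, and port), a token registered for one host is never returned for another.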
691
+ To set headers for multiple origins, use `--headers` with each `open` command:
692
+
693
+ ```bash
694
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
695
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
696
+ ```
697
+
698
+ For global headers (all domains), use `set headers`:
699
+
700
+ ```bash
701
+ agent-browser set headers '{"X-Custom-Header": "value"}'
702
+ ```
703
+
704
+ ## Custom Browser Executable
705
+
706
+ Use a custom browser executable instead of the bundled Chromium. This is useful for:
707
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
708
+ - **System browsers**: Use an existing Chrome/Chromium installation
709
+ - **Custom builds**: Use modified browser builds
710
+
711
+ ### CLI Usage
712
+
713
+ ```bash
714
+ # Via flag
715
+ agent-browser --executable-path /path/to/chromium open example.com
716
+
717
+ # Via environment variable
718
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
719
+ ```
720
+
721
+ ### Serverless Example (Vercel/AWS Lambda)
722
+
723
+ ```typescript
724
+ import chromium from '@sparticuz/chromium';
725
+ import { BrowserManager } from 'agent-browser-stealth';
726
+
727
+ export async function handler() {
728
+ const browser = new BrowserManager();
729
+ await browser.launch({
730
+ executablePath: await chromium.executablePath(),
731
+ headless: true,
732
+ });
733
+ // ... use browser
734
+ }
735
+ ```
736
+
737
+ ## Local Files
738
+
739
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
740
+
741
+ ```bash
742
+ # Enable file access (required for JavaScript to access local files)
743
+ agent-browser --allow-file-access open file:///path/to/document.pdf
744
+ agent-browser --allow-file-access open file:///path/to/page.html
745
+
746
+ # Take screenshot of a local PDF
747
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
748
+ agent-browser screenshot report.png
749
+ ```
750
+
751
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
752
+ - Load and render local files
753
+ - Access other local files via JavaScript (XHR, fetch)
754
+ - Load local resources (images, scripts, stylesheets)
755
+
756
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
757
+
758
+ ## Stealth Mode
759
+
760
+ Stealth mode is **enabled by default**. It patches common detection vectors so the browser looks like a regular user session, which reduces the chance that websites block automation.
761
+
762
+ ```bash
763
+ # Stealth is on by default -- just use normally
764
+ agent-browser open example.com
765
+
766
+ # Disable stealth if needed
767
+ agent-browser --stealth false open example.com
768
+
769
+ # Or disable via environment variable
770
+ export AGENT_BROWSER_STEALTH=false
771
+
772
+ # Or disable in config file
773
+ # agent-browser.json: {"stealth": false}
774
+ ```
775
+
776
+ Stealth mode applies the following countermeasures:
777
+ - Removes `navigator.webdriver` automation indicator
778
+ - Disables Chromium's `AutomationControlled` blink feature
779
+ - Adds realistic `navigator.plugins` (Chrome PDF Plugin, etc.)
780
+ - Patches `window.chrome.runtime` to match real Chrome
781
+ - Masks WebGL vendor/renderer when SwiftShader is detected
782
+ - Fixes `navigator.permissions.query` for notifications
783
+ - Reports realistic `navigator.hardwareConcurrency`
784
+ - Provides default media devices for `enumerateDevices()`
785
+ - Cleans up CDP-injected properties on the document
786
+
787
+ Stealth capability matrix:
788
+
789
+ <table>
790
+ <thead>
791
+ <tr><th>Connection type</th><th>Stealth capabilities</th></tr>
792
+ </thead>
793
+ <tbody>
794
+ <tr><td>Local launch</td><td>Chromium launch args (<code>--disable-blink-features=AutomationControlled</code>) + context init scripts</td></tr>
795
+ <tr><td>CDP / auto-connect</td><td>Context init scripts</td></tr>
796
+ <tr><td>Cloud providers</td><td>Context init scripts (Kernel may also apply provider-managed stealth)</td></tr>
797
+ </tbody>
798
+ </table>
799
+
800
+ Use <code>--debug</code> to print the active stealth connection type and capabilities at launch time.
801
+
802
+ ### Humanized Interactions
803
+
804
+ In addition to stealth patches, agent-browser automatically humanizes interactions to avoid behavioral detection:
805
+
806
+ - **Randomized typing** -- When using `type --delay`, each keystroke delay varies by ±40% so timing appears natural rather than mechanical
807
+ - **Random wait ranges** -- `wait 2000-5000` pauses for a random duration between 2 and 5 seconds
808
+ - **Bezier curve mouse movement** -- Before every `click`, the mouse moves to the target element along a randomized cubic Bezier curve with natural-looking control points
809
+
810
+ These behaviors are always active and require no additional flags.
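As a rough sketch of the three behaviors listed above, the helpers below show one way to jitter a delay by ±40%, pick a duration from a `min-max` range, and sample a cubic Bezier path. The function names, the 20% control-point spread, and the step count are assumptions for illustration, not the package's internals.

```typescript
// Hypothetical helpers; `rand` is injectable so the behavior can be tested deterministically.
interface Point { x: number; y: number }

/** Returns a delay within ±40% of `baseMs`. */
function jitteredDelay(baseMs: number, rand: () => number = Math.random): number {
  const factor = 0.6 + rand() * 0.8; // uniform in [0.6, 1.4)
  return Math.round(baseMs * factor);
}

/** Parses "2000-5000" (or a plain "2000") and picks a duration in that range. */
function randomWait(spec: string, rand: () => number = Math.random): number {
  const match = /^(\d+)(?:-(\d+))?$/.exec(spec.trim());
  if (!match) throw new Error(`Invalid wait spec: ${spec}`);
  const min = Number(match[1]);
  const max = match[2] !== undefined ? Number(match[2]) : min;
  if (max < min) throw new Error(`Invalid wait range: ${spec}`);
  return Math.round(min + rand() * (max - min));
}

/** Evaluates a cubic Bezier curve at t in [0, 1]. */
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u, b = 3 * u * u * t, c = 3 * u * t * t, d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

/** Samples a path from `from` to `to` with randomized control points. */
function mousePath(from: Point, to: Point, steps = 20, rand: () => number = Math.random): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const spread = Math.max(Math.sqrt(dx * dx + dy * dy) * 0.2, 1);
  const offset = (): number => (rand() - 0.5) * 2 * spread;
  // Control points sit about 1/3 and 2/3 of the way along, nudged off the straight line.
  const c1 = { x: from.x + dx / 3 + offset(), y: from.y + dy / 3 + offset() };
  const c2 = { x: from.x + (2 * dx) / 3 + offset(), y: from.y + (2 * dy) / 3 + offset() };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) path.push(cubicBezier(from, c1, c2, to, i / steps));
  return path;
}
```

The path always starts at `from` and ends at `to`; only the intermediate points vary between runs.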
811
+
812
+ ## CDP Mode
813
+
814
+ Connect to an existing browser via Chrome DevTools Protocol:
815
+
816
+ ```bash
817
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
818
+
819
+ # Connect once, then run commands without --cdp
820
+ agent-browser connect 9222
821
+ agent-browser snapshot
822
+ agent-browser tab
823
+ agent-browser close
824
+
825
+ # Or pass --cdp on each command
826
+ agent-browser --cdp 9222 snapshot
827
+
828
+ # Connect to remote browser via WebSocket URL
829
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
830
+ ```
831
+
832
+ The `--cdp` flag accepts either:
833
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
834
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
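The two accepted forms can be told apart with a simple check. The sketch below is an assumption about how such a value might be classified, not the CLI's actual parser; `resolveCdpTarget` and `CdpTarget` are hypothetical names.

```typescript
// Hypothetical classifier for a --cdp value: a port number or a ws:// / wss:// URL.
type CdpTarget =
  | { kind: "port"; endpoint: string }
  | { kind: "websocket"; endpoint: string };

function resolveCdpTarget(arg: string): CdpTarget {
  const value = arg.trim();
  if (/^wss?:\/\//i.test(value)) {
    // Remote browser service: use the WebSocket URL as given.
    return { kind: "websocket", endpoint: value };
  }
  if (/^\d+$/.test(value)) {
    const port = Number(value);
    if (port < 1 || port > 65535) throw new Error(`Port out of range: ${value}`);
    // Local connection, as described above.
    return { kind: "port", endpoint: `http://localhost:${port}` };
  }
  throw new Error(`Expected a port number or ws(s):// URL, got: ${arg}`);
}
```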
835
+
836
+ This enables control of:
837
+ - Electron apps
838
+ - Chrome/Chromium instances with remote debugging
839
+ - WebView2 applications
840
+ - Any browser exposing a CDP endpoint
841
+
842
+ ### Auto-Connect
843
+
844
+ Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
845
+
846
+ ```bash
847
+ # Auto-discover running Chrome with remote debugging
848
+ agent-browser --auto-connect open example.com
849
+ agent-browser --auto-connect snapshot
850
+
851
+ # Or via environment variable
852
+ AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
853
+ ```
854
+
855
+ Auto-connect discovers Chrome by:
856
+ 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
857
+ 2. Falling back to probing common debugging ports (9222, 9229)
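The discovery order above might look like the following sketch. It relies on the `DevToolsActivePort` file holding the port on its first line and the browser WebSocket path on its second; the function names and the `127.0.0.1` host default are assumptions, and the file is passed in as a string so the example stays free of filesystem access.

```typescript
// Hypothetical helpers illustrating the two discovery steps; not the package's code.
interface DevToolsEndpoint { port: number; wsUrl: string }

/** Parses DevToolsActivePort contents: line 1 is the port, line 2 the browser path. */
function parseDevToolsActivePort(contents: string, host = "127.0.0.1"): DevToolsEndpoint | null {
  const [portLine = "", pathLine = ""] = contents.split(/\r?\n/).map((line) => line.trim());
  if (!/^\d+$/.test(portLine)) return null;
  const port = Number(portLine);
  if (port < 1 || port > 65535) return null;
  if (pathLine.charAt(0) !== "/") return null;
  return { port, wsUrl: `ws://${host}:${port}${pathLine}` };
}

/** Common debugging ports probed when the file is missing or unreadable. */
const FALLBACK_PORTS: number[] = [9222, 9229];

/** Ports to try, in order: the one from the file (if any), then the fallbacks. */
function candidatePorts(fileContents: string | null): number[] {
  const parsed = fileContents === null ? null : parseDevToolsActivePort(fileContents);
  const ports = parsed ? [parsed.port, ...FALLBACK_PORTS] : [...FALLBACK_PORTS];
  return ports.filter((port, index) => ports.indexOf(port) === index); // drop duplicates
}
```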
858
+
859
+ This is useful when:
860
+ - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
861
+ - You want a zero-configuration connection to your existing browser
862
+ - You don't want to track which port Chrome is using
863
+
864
+ ## Streaming (Browser Preview)
865
+
866
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
867
+
868
+ ### Enable Streaming
869
+
870
+ Set the `AGENT_BROWSER_STREAM_PORT` environment variable:
871
+
872
+ ```bash
873
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
874
+ ```
875
+
876
+ This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
877
+
878
+ ### WebSocket Protocol
879
+
880
+ Connect to `ws://localhost:9223` to receive frames and send input:
881
+
882
+ **Receive frames:**
883
+ ```json
884
+ {
885
+ "type": "frame",
886
+ "data": "<base64-encoded-jpeg>",
887
+ "metadata": {
888
+ "deviceWidth": 1280,
889
+ "deviceHeight": 720,
890
+ "pageScaleFactor": 1,
891
+ "offsetTop": 0,
892
+ "scrollOffsetX": 0,
893
+ "scrollOffsetY": 0
894
+ }
895
+ }
896
+ ```
897
+
898
+ **Send mouse events:**
899
+ ```json
900
+ {
901
+ "type": "input_mouse",
902
+ "eventType": "mousePressed",
903
+ "x": 100,
904
+ "y": 200,
905
+ "button": "left",
906
+ "clickCount": 1
907
+ }
908
+ ```
909
+
910
+ **Send keyboard events:**
911
+ ```json
912
+ {
913
+ "type": "input_keyboard",
914
+ "eventType": "keyDown",
915
+ "key": "Enter",
916
+ "code": "Enter"
917
+ }
918
+ ```
919
+
920
+ **Send touch events:**
921
+ ```json
922
+ {
923
+ "type": "input_touch",
924
+ "eventType": "touchStart",
925
+ "touchPoints": [{ "x": 100, "y": 200 }]
926
+ }
927
+ ```
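For a typed client, the message shapes above can be written down as TypeScript interfaces. This is a sketch derived only from the JSON examples shown; the interface names, `parseFrame`, and `encodeInput` are hypothetical, and fields not shown in the examples are left out rather than guessed.

```typescript
// Types inferred from the JSON examples above; names are illustrative.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

interface MouseInput {
  type: "input_mouse";
  eventType: string;
  x: number;
  y: number;
  button?: string;
  clickCount?: number;
}

interface KeyboardInput {
  type: "input_keyboard";
  eventType: string;
  key: string;
  code: string;
}

interface TouchInput {
  type: "input_touch";
  eventType: string;
  touchPoints: Array<{ x: number; y: number }>;
}

type InputMessage = MouseInput | KeyboardInput | TouchInput;

/** Parses a raw WebSocket payload; returns null for anything that is not a frame. */
function parseFrame(raw: string): FrameMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const message = value as Record<string, unknown>;
  if (message["type"] !== "frame" || typeof message["data"] !== "string") return null;
  if (typeof message["metadata"] !== "object" || message["metadata"] === null) return null;
  return value as FrameMessage;
}

/** Serializes an input event for sending over the socket. */
function encodeInput(message: InputMessage): string {
  return JSON.stringify(message);
}
```

A client would pass each incoming text message to `parseFrame` and send the result of `encodeInput` back over the same socket.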
928
+
929
+ ### Programmatic API
930
+
931
+ For advanced use, control streaming directly via the protocol:
932
+
933
+ ```typescript
934
+ import { BrowserManager } from 'agent-browser-stealth';
935
+
936
+ const browser = new BrowserManager();
937
+ await browser.launch({ headless: true });
938
+ await browser.navigate('https://example.com');
939
+
940
+ // Start screencast
941
+ await browser.startScreencast((frame) => {
942
+ // frame.data is base64-encoded image
943
+ // frame.metadata contains viewport info
944
+ console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
945
+ }, {
946
+ format: 'jpeg',
947
+ quality: 80,
948
+ maxWidth: 1280,
949
+ maxHeight: 720,
950
+ });
951
+
952
+ // Inject mouse events
953
+ await browser.injectMouseEvent({
954
+ type: 'mousePressed',
955
+ x: 100,
956
+ y: 200,
957
+ button: 'left',
958
+ });
959
+
960
+ // Inject keyboard events
961
+ await browser.injectKeyboardEvent({
962
+ type: 'keyDown',
963
+ key: 'Enter',
964
+ code: 'Enter',
965
+ });
966
+
967
+ // Stop when done
968
+ await browser.stopScreencast();
969
+ ```
970
+
971
+ ## Architecture
972
+
973
+ agent-browser uses a client-daemon architecture:
974
+
975
+ 1. **Rust CLI** (fast native binary) - Parses commands, communicates with daemon
976
+ 2. **Node.js Daemon** - Manages Playwright browser instance
977
+ 3. **Fallback** - If native binary unavailable, uses Node.js directly
978
+
979
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations.
980
+
981
+ **Browser Engine:** Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
982
+
983
+ ## Platforms
984
+
985
+ | Platform | Binary | Fallback |
986
+ |----------|--------|----------|
987
+ | macOS ARM64 | Native Rust | Node.js |
988
+ | macOS x64 | Native Rust | Node.js |
989
+ | Linux ARM64 | Native Rust | Node.js |
990
+ | Linux x64 | Native Rust | Node.js |
991
+ | Windows x64 | Native Rust | Node.js |
992
+
993
+ ## Usage with AI Agents
994
+
995
+ ### Just ask the agent
996
+
997
+ The simplest approach -- just tell your agent to use it:
998
+
999
+ ```
1000
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
1001
+ ```
1002
+
1003
+ The `--help` output is comprehensive, and most agents can work out the rest from there.
1004
+
1005
+ ### AI Coding Assistants (recommended)
1006
+
1007
+ Add the skill to your AI coding assistant for richer context:
1008
+
1009
+ ```bash
1010
+ npx skills add leeguooooo/agent-browser
1011
+ ```
1012
+
1013
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
1014
+
1015
+ ### Claude Code
1016
+
1017
+ Install as a Claude Code skill:
1018
+
1019
+ ```bash
1020
+ npx skills add leeguooooo/agent-browser
1021
+ ```
1022
+
1023
+ This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
1024
+
1025
+ ### AGENTS.md / CLAUDE.md
1026
+
1027
+ For more consistent results, add to your project or global instructions file:
1028
+
1029
+ ```markdown
1030
+ ## Browser Automation
1031
+
1032
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1033
+
1034
+ Core workflow:
1035
+ 1. `agent-browser open <url>` - Navigate to page
1036
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1037
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1038
+ 4. Re-snapshot after page changes
1039
+ ```
1040
+
1041
+ ## Integrations
1042
+
1043
+ ### iOS Simulator
1044
+
1045
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1046
+
1047
+ **Setup:**
1048
+
1049
+ ```bash
1050
+ # Install Appium and XCUITest driver
1051
+ npm install -g appium
1052
+ appium driver install xcuitest
1053
+ ```
1054
+
1055
+ **Usage:**
1056
+
1057
+ ```bash
1058
+ # List available iOS simulators
1059
+ agent-browser device list
1060
+
1061
+ # Launch Safari on a specific device
1062
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1063
+
1064
+ # Same commands as desktop
1065
+ agent-browser -p ios snapshot -i
1066
+ agent-browser -p ios tap @e1
1067
+ agent-browser -p ios fill @e2 "text"
1068
+ agent-browser -p ios screenshot mobile.png
1069
+
1070
+ # Mobile-specific commands
1071
+ agent-browser -p ios swipe up
1072
+ agent-browser -p ios swipe down 500
1073
+
1074
+ # Close session
1075
+ agent-browser -p ios close
1076
+ ```
1077
+
1078
+ Or use environment variables:
1079
+
1080
+ ```bash
1081
+ export AGENT_BROWSER_PROVIDER=ios
1082
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
1083
+ agent-browser open https://example.com
1084
+ ```
1085
+
1086
+ | Variable | Description |
1087
+ |----------|-------------|
1088
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
1089
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1090
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
1091
+
1092
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1093
+
1094
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1095
+
1096
+ #### Real Device Support
1097
+
1098
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1099
+
1100
+ **1. Get your device UDID:**
1101
+ ```bash
1102
+ xcrun xctrace list devices
1103
+ # or
1104
+ system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
1105
+ ```
1106
+
1107
+ **2. Sign WebDriverAgent (one-time):**
1108
+ ```bash
1109
+ # Open the WebDriverAgent Xcode project
1110
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1111
+ open WebDriverAgent.xcodeproj
1112
+ ```
1113
+
1114
+ In Xcode:
1115
+ - Select the `WebDriverAgentRunner` target
1116
+ - Go to Signing & Capabilities
1117
+ - Select your Team (requires an Apple Developer account; the free tier works)
1118
+ - Let Xcode manage signing automatically
1119
+
1120
+ **3. Use with agent-browser:**
1121
+ ```bash
1122
+ # Connect device via USB, then:
1123
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1124
+
1125
+ # Or use the device name if unique
1126
+ agent-browser -p ios --device "John's iPhone" open https://example.com
1127
+ ```
1128
+
1129
+ **Real device notes:**
1130
+ - First run installs WebDriverAgent to the device (may require Trust prompt)
1131
+ - Device must be unlocked and connected via USB
1132
+ - Slightly slower initial connection than simulator
1133
+ - Tests against real Safari performance and behavior
1134
+
1135
+ ### Browserbase
1136
+
1137
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
1138
+
1139
+ To enable Browserbase, use the `-p` flag:
1140
+
1141
+ ```bash
1142
+ export BROWSERBASE_API_KEY="your-api-key"
1143
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1144
+ agent-browser -p browserbase open https://example.com
1145
+ ```
1146
+
1147
+ Or use environment variables for CI/scripts:
1148
+
1149
+ ```bash
1150
+ export AGENT_BROWSER_PROVIDER=browserbase
1151
+ export BROWSERBASE_API_KEY="your-api-key"
1152
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1153
+ agent-browser open https://example.com
1154
+ ```
1155
+
1156
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
1157
+
1158
+ Get your API key and project ID from the [Browserbase Dashboard](https://browserbase.com/overview).
1159
+
1160
+ ### Browser Use
1161
+
1162
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
1163
+
1164
+ To enable Browser Use, use the `-p` flag:
1165
+
1166
+ ```bash
1167
+ export BROWSER_USE_API_KEY="your-api-key"
1168
+ agent-browser -p browseruse open https://example.com
1169
+ ```
1170
+
1171
+ Or use environment variables for CI/scripts:
1172
+
1173
+ ```bash
1174
+ export AGENT_BROWSER_PROVIDER=browseruse
1175
+ export BROWSER_USE_API_KEY="your-api-key"
1176
+ agent-browser open https://example.com
1177
+ ```
1178
+
1179
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
1180
+
1181
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
1182
+
1183
+ ### Kernel
1184
+
1185
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
1186
+
1187
+ To enable Kernel, use the `-p` flag:
1188
+
1189
+ ```bash
1190
+ export KERNEL_API_KEY="your-api-key"
1191
+ agent-browser -p kernel open https://example.com
1192
+ ```
1193
+
1194
+ Or use environment variables for CI/scripts:
1195
+
1196
+ ```bash
1197
+ export AGENT_BROWSER_PROVIDER=kernel
1198
+ export KERNEL_API_KEY="your-api-key"
1199
+ agent-browser open https://example.com
1200
+ ```
1201
+
1202
+ Optional configuration via environment variables:
1203
+
1204
+ | Variable | Description | Default |
1205
+ |----------|-------------|---------|
1206
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
1207
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
1208
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1209
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
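To make the defaults in the table concrete, here is a minimal sketch of reading these variables from an environment map. `readKernelConfig` and `KernelConfig` are hypothetical names, and the handling of malformed values (falling back to the default) is an assumption, not documented behavior.

```typescript
// Hypothetical reader for the Kernel variables above; defaults follow the table.
interface KernelConfig {
  headless: boolean;
  stealth: boolean;
  timeoutSeconds: number;
  profileName: string | null;
}

function parseBoolean(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value.toLowerCase() === "true";
}

function readKernelConfig(env: Record<string, string | undefined>): KernelConfig {
  const rawTimeout = env["KERNEL_TIMEOUT_SECONDS"];
  const timeout = rawTimeout ? Number(rawTimeout) : NaN;
  const rawProfile = env["KERNEL_PROFILE_NAME"];
  return {
    headless: parseBoolean(env["KERNEL_HEADLESS"], false),
    stealth: parseBoolean(env["KERNEL_STEALTH"], true),
    // Assumption: a missing or non-numeric timeout falls back to the default of 300.
    timeoutSeconds: Number.isFinite(timeout) && timeout > 0 ? timeout : 300,
    profileName: rawProfile ? rawProfile : null,
  };
}
```

In a Node.js script the function could be called as `readKernelConfig(process.env)`.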
1210
+
1211
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
1212
+
1213
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
1214
+
1215
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
1216
+
1217
+ ## License
1218
+
1219
+ Apache-2.0