agent-browser-stealth 0.14.0-fork.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/LICENSE +201 -0
  2. package/README.md +1214 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser-darwin-x64 +0 -0
  5. package/bin/agent-browser-linux-arm64 +0 -0
  6. package/bin/agent-browser-linux-x64 +0 -0
  7. package/bin/agent-browser-win32-x64.exe +0 -0
  8. package/bin/agent-browser.js +109 -0
  9. package/dist/actions.d.ts +17 -0
  10. package/dist/actions.d.ts.map +1 -0
  11. package/dist/actions.js +1977 -0
  12. package/dist/actions.js.map +1 -0
  13. package/dist/browser.d.ts +611 -0
  14. package/dist/browser.d.ts.map +1 -0
  15. package/dist/browser.js +2425 -0
  16. package/dist/browser.js.map +1 -0
  17. package/dist/daemon.d.ts +66 -0
  18. package/dist/daemon.d.ts.map +1 -0
  19. package/dist/daemon.js +632 -0
  20. package/dist/daemon.js.map +1 -0
  21. package/dist/diff.d.ts +18 -0
  22. package/dist/diff.d.ts.map +1 -0
  23. package/dist/diff.js +271 -0
  24. package/dist/diff.js.map +1 -0
  25. package/dist/encryption.d.ts +50 -0
  26. package/dist/encryption.d.ts.map +1 -0
  27. package/dist/encryption.js +85 -0
  28. package/dist/encryption.js.map +1 -0
  29. package/dist/ios-actions.d.ts +11 -0
  30. package/dist/ios-actions.d.ts.map +1 -0
  31. package/dist/ios-actions.js +228 -0
  32. package/dist/ios-actions.js.map +1 -0
  33. package/dist/ios-manager.d.ts +266 -0
  34. package/dist/ios-manager.d.ts.map +1 -0
  35. package/dist/ios-manager.js +1073 -0
  36. package/dist/ios-manager.js.map +1 -0
  37. package/dist/protocol.d.ts +26 -0
  38. package/dist/protocol.d.ts.map +1 -0
  39. package/dist/protocol.js +932 -0
  40. package/dist/protocol.js.map +1 -0
  41. package/dist/snapshot.d.ts +67 -0
  42. package/dist/snapshot.d.ts.map +1 -0
  43. package/dist/snapshot.js +514 -0
  44. package/dist/snapshot.js.map +1 -0
  45. package/dist/state-utils.d.ts +77 -0
  46. package/dist/state-utils.d.ts.map +1 -0
  47. package/dist/state-utils.js +178 -0
  48. package/dist/state-utils.js.map +1 -0
  49. package/dist/stealth.d.ts +29 -0
  50. package/dist/stealth.d.ts.map +1 -0
  51. package/dist/stealth.js +1103 -0
  52. package/dist/stealth.js.map +1 -0
  53. package/dist/stream-server.d.ts +117 -0
  54. package/dist/stream-server.d.ts.map +1 -0
  55. package/dist/stream-server.js +309 -0
  56. package/dist/stream-server.js.map +1 -0
  57. package/dist/types.d.ts +854 -0
  58. package/dist/types.d.ts.map +1 -0
  59. package/dist/types.js +2 -0
  60. package/dist/types.js.map +1 -0
  61. package/package.json +84 -0
  62. package/scripts/build-all-platforms.sh +68 -0
  63. package/scripts/check-creepjs-headless.js +137 -0
  64. package/scripts/check-sannysoft-webdriver.js +112 -0
  65. package/scripts/check-version-sync.js +39 -0
  66. package/scripts/copy-native.js +36 -0
  67. package/scripts/postinstall.js +275 -0
  68. package/scripts/sync-upstream.sh +142 -0
  69. package/scripts/sync-version.js +87 -0
  70. package/skills/agent-browser/SKILL.md +470 -0
  71. package/skills/agent-browser/references/authentication.md +202 -0
  72. package/skills/agent-browser/references/commands.md +263 -0
  73. package/skills/agent-browser/references/profiling.md +120 -0
  74. package/skills/agent-browser/references/proxy-support.md +194 -0
  75. package/skills/agent-browser/references/session-management.md +193 -0
  76. package/skills/agent-browser/references/snapshot-refs.md +194 -0
  77. package/skills/agent-browser/references/video-recording.md +173 -0
  78. package/skills/agent-browser/templates/authenticated-session.sh +100 -0
  79. package/skills/agent-browser/templates/capture-workflow.sh +69 -0
  80. package/skills/agent-browser/templates/form-automation.sh +62 -0
package/README.md ADDED
@@ -0,0 +1,1214 @@
1
+ # agent-browser
2
+
3
+ Stealth-first browser automation CLI engineered for anti-bot evasion. Fast Rust CLI with Node.js fallback.
4
+
5
+ Designed for production automation on detection-heavy sites:
6
+ - Always-on stealth (no opt-in flag)
7
+ - Browser and protocol-level anti-fingerprint patches
8
+ - Humanized interaction behavior by default
9
+ - Verified against CreepJS using the built-in check script
10
+
11
+ ## Installation
12
+
13
+ ### Global Installation (recommended)
14
+
15
+ Installs the native Rust binary for maximum performance:
16
+
17
+ ```bash
18
+ npm install -g agent-browser-stealth
19
+ agent-browser install # Download Chromium
20
+ ```
21
+
22
+ This is the fastest option -- commands run through the native Rust CLI directly with sub-millisecond parsing overhead.
23
+
24
+ ### Quick Start (no install)
25
+
26
+ Run directly with `npx` if you want to try it without installing globally:
27
+
28
+ ```bash
29
+ npx agent-browser-stealth install # Download Chromium (first time only)
30
+ npx agent-browser-stealth open example.com
31
+ ```
32
+
33
+ > **Note:** `npx` routes through Node.js before reaching the Rust CLI, so it is noticeably slower than a global install. For regular use, install globally.
34
+
35
+ ### Project Installation (local dependency)
36
+
37
+ For projects that want to pin the version in `package.json`:
38
+
39
+ ```bash
40
+ npm install agent-browser-stealth
41
+ npx agent-browser-stealth install
42
+ ```
43
+
44
+ Then use via `npx` or `package.json` scripts:
45
+
46
+ ```bash
47
+ npx agent-browser-stealth open example.com
48
+ ```
49
+
50
+ ### Homebrew (macOS)
51
+
52
+ ```bash
53
+ brew install agent-browser
54
+ agent-browser install # Download Chromium
55
+ ```
56
+
57
+ ### From Source
58
+
59
+ ```bash
60
+ git clone https://github.com/leeguooooo/agent-browser
61
+ cd agent-browser
62
+ pnpm install
63
+ pnpm build
64
+ pnpm build:native # Requires Rust (https://rustup.rs)
65
+ pnpm link --global # Makes agent-browser available globally
66
+ agent-browser install
67
+ ```
68
+
69
+ ### Fork Maintenance (Independent Release + Upstream Sync)
70
+
71
+ If you maintain a fork and publish your own CLI, use this workflow:
72
+
73
+ 1. Keep an upstream-tracking branch (`upstream-main`) for clean sync history.
74
+ 2. Keep your release branch (`main`) for production-ready code only.
75
+ 3. Merge upstream into short-lived sync branches, then open PRs into `main`.
76
+
77
+ One-time setup:
78
+
79
+ ```bash
80
+ git remote add upstream https://github.com/vercel-labs/agent-browser.git
81
+ git fetch upstream
82
+ ```
83
+
84
+ Regular sync:
85
+
86
+ ```bash
87
+ pnpm run sync:upstream:push
88
+ ```
89
+
90
+ This command:
91
+ - Fetches `upstream/main`
92
+ - Fast-forwards local `upstream-main`
93
+ - Creates `sync/YYYY-MM-DD` from local `main`
94
+ - Merges `upstream-main` into the sync branch
95
+ - Pushes the sync branch to `origin` (with `sync:upstream:push`)
96
+
97
+ If merge conflicts occur, resolve them on the sync branch and open a PR as usual.
98
+
99
+ Independent release checklist for forks:
100
+ - Use your own npm package name and CLI binary name (avoid conflicts with upstream package ownership).
101
+ - Update `repository`, `bugs`, and `homepage` in `package.json` to your fork.
102
+ - Configure npm Trusted Publishing (OIDC) for your package and repository workflow.
103
+ - Keep release tags and changelog in your own namespace/versioning policy.
104
+ - Use dual-version format: `<upstream>-fork.<fork>` (example: `0.14.0-fork.1`).
105
+ - `agent-browser --version` should show all three: full version, upstream version, and fork version.
106
+
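The dual-version format in the checklist above can be split with plain shell parameter expansion. This is a minimal sketch, not the logic the CLI uses for `--version`; the function name `split_fork_version` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "<upstream>-fork.<fork>" into its two parts.
split_fork_version() {
  local full="$1"
  case "$full" in
    *-fork.*) ;;  # looks like a fork version, continue
    *) echo "not a fork version: $full" >&2; return 1 ;;
  esac
  # %% strips the longest suffix matching "-fork.*";
  # ## strips the longest prefix matching "*-fork.".
  echo "upstream=${full%%-fork.*} fork=${full##*-fork.}"
}

split_fork_version "0.14.0-fork.1"
```

A plain upstream version such as `0.14.0` is rejected with a non-zero exit status, which makes the helper usable as a guard in release scripts.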
107
+ ### Linux Dependencies
108
+
109
+ On Linux, install system dependencies:
110
+
111
+ ```bash
112
+ agent-browser install --with-deps
113
+ # or manually: npx playwright install-deps chromium
114
+ ```
115
+
116
+ ## Quick Start
117
+
118
+ ```bash
119
+ agent-browser open example.com
120
+ agent-browser snapshot # Get accessibility tree with refs
121
+ agent-browser click @e2 # Click by ref from snapshot
122
+ agent-browser fill @e3 "test@example.com" # Fill by ref
123
+ agent-browser get text @e1 # Get text by ref
124
+ agent-browser screenshot page.png
125
+ agent-browser --version # Includes upstream/fork metadata on fork builds
126
+ agent-browser close
127
+ ```
128
+
129
+ ### Traditional Selectors (also supported)
130
+
131
+ ```bash
132
+ agent-browser click "#submit"
133
+ agent-browser fill "#email" "test@example.com"
134
+ agent-browser find role button click --name "Submit"
135
+ ```
136
+
137
+ ## Commands
138
+
139
+ ### Core Commands
140
+
141
+ ```bash
142
+ agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
143
+ agent-browser click <sel> # Click element (--new-tab to open in new tab)
144
+ agent-browser dblclick <sel> # Double-click element
145
+ agent-browser focus <sel> # Focus element
146
+ agent-browser type <sel> <text> [--delay <ms>] # Type into element
147
+ agent-browser fill <sel> <text> # Clear and fill
148
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
149
+ agent-browser keyboard type <text> [--delay <ms>] # Type with real keystrokes (no selector, current focus)
150
+ agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
151
+ agent-browser keydown <key> # Hold key down
152
+ agent-browser keyup <key> # Release key
153
+ agent-browser hover <sel> # Hover element
154
+ agent-browser select <sel> <val> # Select dropdown option
155
+ agent-browser check <sel> # Check checkbox
156
+ agent-browser uncheck <sel> # Uncheck checkbox
157
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
158
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
159
+ agent-browser drag <src> <tgt> # Drag and drop
160
+ agent-browser upload <sel> <files> # Upload files
161
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
162
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
163
+ agent-browser pdf <path> # Save as PDF
164
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
165
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
166
+ agent-browser connect <port> # Connect to browser via CDP
167
+ agent-browser close # Close browser (aliases: quit, exit)
168
+ ```
169
+
170
+ ### Get Info
171
+
172
+ ```bash
173
+ agent-browser get text <sel> # Get text content
174
+ agent-browser get html <sel> # Get innerHTML
175
+ agent-browser get value <sel> # Get input value
176
+ agent-browser get attr <sel> <attr> # Get attribute
177
+ agent-browser get title # Get page title
178
+ agent-browser get url # Get current URL
179
+ agent-browser get count <sel> # Count matching elements
180
+ agent-browser get box <sel> # Get bounding box
181
+ agent-browser get styles <sel> # Get computed styles
182
+ ```
183
+
184
+ ### Check State
185
+
186
+ ```bash
187
+ agent-browser is visible <sel> # Check if visible
188
+ agent-browser is enabled <sel> # Check if enabled
189
+ agent-browser is checked <sel> # Check if checked
190
+ ```
191
+
192
+ ### Find Elements (Semantic Locators)
193
+
194
+ ```bash
195
+ agent-browser find role <role> <action> [value] # By ARIA role
196
+ agent-browser find text <text> <action> # By text content
197
+ agent-browser find label <label> <action> [value] # By label
198
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
199
+ agent-browser find alt <text> <action> # By alt text
200
+ agent-browser find title <text> <action> # By title attr
201
+ agent-browser find testid <id> <action> [value] # By data-testid
202
+ agent-browser find first <sel> <action> [value] # First match
203
+ agent-browser find last <sel> <action> [value] # Last match
204
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
205
+ ```
206
+
207
+ **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
208
+
209
+ **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
210
+
211
+ **Examples:**
212
+ ```bash
213
+ agent-browser find role button click --name "Submit"
214
+ agent-browser find text "Sign In" click
215
+ agent-browser find label "Email" fill "test@test.com"
216
+ agent-browser find first ".item" click
217
+ agent-browser find nth 2 "a" text
218
+ ```
219
+
220
+ ### Wait
221
+
222
+ ```bash
223
+ agent-browser wait <selector> # Wait for element to be visible
224
+ agent-browser wait <ms> # Wait for time (milliseconds)
225
+ agent-browser wait 2000-5000 # Random wait between 2-5 seconds
226
+ agent-browser wait --text "Welcome" # Wait for text to appear
227
+ agent-browser wait --url "**/dash" # Wait for URL pattern
228
+ agent-browser wait --load networkidle # Wait for load state
229
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
230
+ ```
231
+
232
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
233
+
234
+ ### Mouse Control
235
+
236
+ ```bash
237
+ agent-browser mouse move <x> <y> # Move mouse
238
+ agent-browser mouse down [button] # Press button (left/right/middle)
239
+ agent-browser mouse up [button] # Release button
240
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
241
+ ```
242
+
243
+ ### Browser Settings
244
+
245
+ ```bash
246
+ agent-browser set viewport <w> <h> # Set viewport size
247
+ agent-browser set device <name> # Emulate device ("iPhone 14")
248
+ agent-browser set geo <lat> <lng> # Set geolocation
249
+ agent-browser set offline [on|off] # Toggle offline mode
250
+ agent-browser set headers <json> # Extra HTTP headers
251
+ agent-browser set credentials <u> <p> # HTTP basic auth
252
+ agent-browser set media [dark|light] # Emulate color scheme
253
+ ```
254
+
255
+ ### Cookies & Storage
256
+
257
+ ```bash
258
+ agent-browser cookies # Get all cookies
259
+ agent-browser cookies set <name> <val> # Set cookie
260
+ agent-browser cookies clear # Clear cookies
261
+
262
+ agent-browser storage local # Get all localStorage
263
+ agent-browser storage local <key> # Get specific key
264
+ agent-browser storage local set <k> <v> # Set value
265
+ agent-browser storage local clear # Clear all
266
+
267
+ agent-browser storage session # Same for sessionStorage
268
+ ```
269
+
270
+ ### Network
271
+
272
+ ```bash
273
+ agent-browser network route <url> # Intercept requests
274
+ agent-browser network route <url> --abort # Block requests
275
+ agent-browser network route <url> --body <json> # Mock response
276
+ agent-browser network unroute [url] # Remove routes
277
+ agent-browser network requests # View tracked requests
278
+ agent-browser network requests --filter api # Filter requests
279
+ ```
280
+
281
+ ### Tabs & Windows
282
+
283
+ ```bash
284
+ agent-browser tab # List tabs
285
+ agent-browser tab new [url] # New tab (optionally with URL)
286
+ agent-browser tab <n> # Switch to tab n
287
+ agent-browser tab close [n] # Close tab
288
+ agent-browser window new # New window
289
+ ```
290
+
291
+ ### Frames
292
+
293
+ ```bash
294
+ agent-browser frame <sel> # Switch to iframe
295
+ agent-browser frame main # Back to main frame
296
+ ```
297
+
298
+ ### Dialogs
299
+
300
+ ```bash
301
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
302
+ agent-browser dialog dismiss # Dismiss
303
+ ```
304
+
305
+ ### Diff
306
+
307
+ ```bash
308
+ agent-browser diff snapshot # Compare current vs last snapshot
309
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
310
+ agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
311
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
312
+ agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
313
+ agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
314
+ agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
315
+ agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
316
+ agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
317
+ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
318
+ ```
319
+
320
+ ### Debug
321
+
322
+ ```bash
323
+ agent-browser trace start [path] # Start recording trace
324
+ agent-browser trace stop [path] # Stop and save trace
325
+ agent-browser profiler start # Start Chrome DevTools profiling
326
+ agent-browser profiler stop [path] # Stop and save profile (.json)
327
+ agent-browser console # View console messages (log, error, warn, info)
328
+ agent-browser console --clear # Clear console
329
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
330
+ agent-browser errors --clear # Clear errors
331
+ agent-browser highlight <sel> # Highlight element
332
+ agent-browser state save <path> # Save auth state
333
+ agent-browser state load <path> # Load auth state
334
+ agent-browser state list # List saved state files
335
+ agent-browser state show <file> # Show state summary
336
+ agent-browser state rename <old> <new> # Rename state file
337
+ agent-browser state clear [name] # Clear states for session
338
+ agent-browser state clear --all # Clear all saved states
339
+ agent-browser state clean --older-than <days> # Delete old states
340
+ ```
341
+
342
+ ### Navigation
343
+
344
+ ```bash
345
+ agent-browser back # Go back
346
+ agent-browser forward # Go forward
347
+ agent-browser reload # Reload page
348
+ ```
349
+
350
+ ### Setup
351
+
352
+ ```bash
353
+ agent-browser install # Download Chromium browser
354
+ agent-browser install --with-deps # Also install system deps (Linux)
355
+ ```
356
+
357
+ ## Sessions
358
+
359
+ Run multiple isolated browser instances:
360
+
361
+ ```bash
362
+ # Different sessions
363
+ agent-browser --session agent1 open site-a.com
364
+ agent-browser --session agent2 open site-b.com
365
+
366
+ # Or via environment variable
367
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
368
+
369
+ # List active sessions
370
+ agent-browser session list
371
+ # Output:
372
+ # Active sessions:
373
+ # -> default
374
+ # agent1
375
+
376
+ # Show current session
377
+ agent-browser session
378
+ ```
379
+
380
+ Each session has its own:
381
+ - Browser instance
382
+ - Cookies and storage
383
+ - Navigation history
384
+ - Authentication state
385
+
386
+ ## Session Persistence
387
+
388
+ Use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
389
+
390
+ ```bash
391
+ # Auto-save/load state for "twitter" session
392
+ agent-browser --session-name twitter open twitter.com
393
+
394
+ # Login once, then state persists automatically
395
+ # State files stored in ~/.agent-browser/sessions/
396
+
397
+ # Or via environment variable
398
+ export AGENT_BROWSER_SESSION_NAME=twitter
399
+ agent-browser open twitter.com
400
+ ```
401
+
402
+ ### State Encryption
403
+
404
+ Encrypt saved session data at rest with AES-256-GCM:
405
+
406
+ ```bash
407
+ # Generate key: openssl rand -hex 32
408
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
409
+
410
+ # State files are now encrypted automatically
411
+ agent-browser --session-name secure open example.com
412
+ ```
413
+
414
+ | Variable | Description |
415
+ |----------|-------------|
416
+ | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
417
+ | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
418
+ | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
419
+
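Since the key is described as a 64-character hex string, a script can check its shape before exporting it. The sketch below only validates the format; `is_valid_key` is a hypothetical helper, and how the CLI itself reacts to a malformed key is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the argument is exactly 64 hex characters,
# which is what `openssl rand -hex 32` produces (32 bytes, hex-encoded).
is_valid_key() {
  [[ "$1" =~ ^[0-9a-fA-F]{64}$ ]]
}

# Sample value for illustration only; never reuse a published key.
sample_key="0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"

if is_valid_key "$sample_key"; then
  echo "key format ok"
else
  echo "key must be 64 hex characters" >&2
fi
```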
420
+ ## Snapshot Options
421
+
422
+ The `snapshot` command supports filtering to reduce output size:
423
+
424
+ ```bash
425
+ agent-browser snapshot # Full accessibility tree
426
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
427
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
428
+ agent-browser snapshot -c # Compact (remove empty structural elements)
429
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
430
+ agent-browser snapshot -s "#main" # Scope to CSS selector
431
+ agent-browser snapshot -i -c -d 5 # Combine options
432
+ ```
433
+
434
+ | Option | Description |
435
+ |--------|-------------|
436
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
437
+ | `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
438
+ | `-c, --compact` | Remove empty structural elements |
439
+ | `-d, --depth <n>` | Limit tree depth |
440
+ | `-s, --selector <sel>` | Scope to CSS selector |
441
+
442
+ The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
443
+
444
+ ## Annotated Screenshots
445
+
446
+ The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
447
+
448
+ ```bash
449
+ agent-browser screenshot --annotate
450
+ # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
451
+ # [1] @e1 button "Submit"
452
+ # [2] @e2 link "Home"
453
+ # [3] @e3 textbox "Email"
454
+ ```
455
+
456
+ After an annotated screenshot, refs are cached so you can immediately interact with elements:
457
+
458
+ ```bash
459
+ agent-browser screenshot --annotate ./page.png
460
+ agent-browser click @e2 # Click the "Home" link labeled [2]
461
+ ```
462
+
463
+ This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
464
+
465
+ ## Options
466
+
467
+ | Option | Description |
468
+ |--------|-------------|
469
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
470
+ | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
471
+ | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
472
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
473
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
474
+ | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
475
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
476
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
477
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
478
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
479
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
480
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
481
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
482
+ | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
483
+ | `--json` | JSON output (for agents) |
484
+ | `--full, -f` | Full page screenshot |
485
+ | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
486
+ | `--headed` | Show browser window (not headless) |
487
+ | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
488
+ | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
489
+ | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
490
+ | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
491
+ | `--debug` | Debug output |
492
+
493
+ Project policy:
494
+ - `--profile` / `AGENT_BROWSER_PROFILE` are forbidden
495
+ - `--channel` / `AGENT_BROWSER_CHANNEL` are forbidden
496
+ - Default mode must connect to an existing browser at `localhost:9333` (no automatic local-launch fallback)
497
+
498
+ ## Configuration
499
+
500
+ Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
501
+
502
+ **Sources (lowest to highest priority):**
503
+
504
+ 1. `~/.agent-browser/config.json` -- user-level defaults
505
+ 2. `./agent-browser.json` -- project-level overrides (in working directory)
506
+ 3. `AGENT_BROWSER_*` environment variables override config file values
507
+ 4. CLI flags override everything
508
+
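The precedence order above amounts to "take the first source that sets a value, checking from highest priority down". The sketch below illustrates that rule for a single scalar setting; `resolve_setting` and the sample values are hypothetical, and it does not read any real config file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty argument.
# Pass candidate values ordered from highest to lowest priority.
resolve_setting() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1  # no source provided a value
}

# Illustrative values for the proxy setting from each source.
cli_flag=""                          # no --proxy flag given
env_var="http://env-proxy:8080"      # AGENT_BROWSER_PROXY
project_cfg="http://localhost:8080"  # ./agent-browser.json
user_cfg=""                          # ~/.agent-browser/config.json

resolve_setting "$cli_flag" "$env_var" "$project_cfg" "$user_cfg"
```

Here the environment variable wins because no CLI flag is present. This first-match rule only models settings that are replaced wholesale, not ones that are combined across sources.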
509
+ **Example `agent-browser.json`:**
510
+
511
+ ```json
512
+ {
513
+ "headed": true,
514
+ "proxy": "http://localhost:8080",
515
+ "userAgent": "my-agent/1.0",
516
+ "ignoreHttpsErrors": true
517
+ }
518
+ ```
519
+
520
+ Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
521
+
522
+ ```bash
523
+ agent-browser --config ./ci-config.json open example.com
524
+ AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
525
+ ```
526
+
527
+ All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
528
+
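The flag-to-key mapping described above is a kebab-case to camelCase conversion. A minimal sketch of that conversion follows; `flag_to_key` is a hypothetical helper written for illustration and is not part of the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a long flag such as "--executable-path"
# into the camelCase key used in the config file ("executablePath").
flag_to_key() {
  local flag="${1#--}" out="" part first=1
  local IFS='-'              # split the flag name on hyphens
  for part in $flag; do
    if [ "$first" -eq 1 ]; then
      out="$part"            # first segment stays lowercase
      first=0
    else
      # Uppercase the first character of each later segment.
      out="$out$(printf '%s' "${part:0:1}" | tr '[:lower:]' '[:upper:]')${part:1}"
    fi
  done
  printf '%s\n' "$out"
}

flag_to_key --executable-path
flag_to_key --proxy-bypass
```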
529
+ Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
530
+
531
+ Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
532
+
533
+ > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
534
+
535
+ ## Default Timeout
536
+
537
+ The default Playwright timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that Playwright returns a proper error instead of the CLI timing out with EAGAIN.
538
+
539
+ Override the default timeout via environment variable:
540
+
541
+ ```bash
542
+ # Set a longer timeout for slow pages (in milliseconds)
543
+ export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
544
+ ```
545
+
546
+ > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before Playwright responds. The CLI retries transient errors automatically, but response times will increase.
547
+
548
+ | Variable | Description |
549
+ |----------|-------------|
550
+ | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default Playwright timeout in ms (default: 25000) |
551
+
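A wrapper script can flag timeout values that exceed the 30-second IPC read timeout mentioned in the note. The sketch below assumes only the two numbers stated above (25000 ms default, 30000 ms read timeout); `check_timeout` is a hypothetical helper, not a CLI command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a timeout value in milliseconds.
# Prints "ok", "warn" (above the IPC read timeout), or "invalid".
check_timeout() {
  local timeout_ms="${1:-25000}"  # fall back to the documented default
  local ipc_limit_ms=30000        # CLI IPC read timeout stated in the README
  case "$timeout_ms" in
    *[!0-9]*) echo "invalid"; return 2 ;;  # reject non-numeric input
  esac
  if [ "$timeout_ms" -gt "$ipc_limit_ms" ]; then
    echo "warn"
    return 1
  fi
  echo "ok"
}

check_timeout "${AGENT_BROWSER_DEFAULT_TIMEOUT:-}" || true
```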
552
+ ## Selectors
553
+
554
+ ### Refs (Recommended for AI)
555
+
556
+ Refs provide deterministic element selection from snapshots:
557
+
558
+ ```bash
559
+ # 1. Get snapshot with refs
560
+ agent-browser snapshot
561
+ # Output:
562
+ # - heading "Example Domain" [ref=e1] [level=1]
563
+ # - button "Submit" [ref=e2]
564
+ # - textbox "Email" [ref=e3]
565
+ # - link "Learn more" [ref=e4]
566
+
567
+ # 2. Use refs to interact
568
+ agent-browser click @e2 # Click the button
569
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
570
+ agent-browser get text @e1 # Get heading text
571
+ agent-browser hover @e4 # Hover the link
572
+ ```
573
+
574
+ **Why use refs?**
575
+ - **Deterministic**: A ref points to the exact element captured in the snapshot
576
+ - **Fast**: No DOM re-query needed
577
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
578
+
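When scripting without `--json`, a ref can be pulled out of the text snapshot by role and name. This sketch assumes the line shape shown in the sample output above (`- role "name" [ref=eN]`); `ref_for` is a hypothetical helper, and names containing regex metacharacters would need escaping first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read snapshot text on stdin and print the first
# ref (as "@eN") for the given role and accessible name.
ref_for() {
  local role="$1" name="$2"
  sed -n "s/.*- $role \"$name\" \[ref=\(e[0-9][0-9]*\)\].*/@\1/p" | head -n 1
}

# Sample snapshot lines copied from the output format shown above.
snapshot='- heading "Example Domain" [ref=e1] [level=1]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]'

printf '%s\n' "$snapshot" | ref_for button Submit
```

In a real session the snapshot text would come from `agent-browser snapshot`, and the printed ref could be passed to `agent-browser click`.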
579
+ ### CSS Selectors
580
+
581
+ ```bash
582
+ agent-browser click "#id"
583
+ agent-browser click ".class"
584
+ agent-browser click "div > button"
585
+ ```
586
+
587
+ ### Text & XPath
588
+
589
+ ```bash
590
+ agent-browser click "text=Submit"
591
+ agent-browser click "xpath=//button"
592
+ ```
593
+
594
+ ### Semantic Locators
595
+
596
+ ```bash
597
+ agent-browser find role button click --name "Submit"
598
+ agent-browser find label "Email" fill "test@test.com"
599
+ ```
600
+
601
+ ## Agent Mode
602
+
603
+ Use `--json` for machine-readable output:
604
+
605
+ ```bash
606
+ agent-browser snapshot --json
607
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
608
+
609
+ agent-browser get text @e1 --json
610
+ agent-browser is visible @e2 --json
611
+ ```
612
+
613
+ ### Optimal AI Workflow
614
+
615
+ ```bash
616
+ # 1. Navigate and get snapshot
617
+ agent-browser open example.com
618
+ agent-browser snapshot -i --json # AI parses tree and refs
619
+
620
+ # 2. AI identifies target refs from snapshot
621
+ # 3. Execute actions using refs
622
+ agent-browser click @e2
623
+ agent-browser fill @e3 "input text"
624
+
625
+ # 4. Get new snapshot if page changed
626
+ agent-browser snapshot -i --json
627
+ ```
628
+
629
+ ### Command Chaining
630
+
631
+ Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
632
+
633
+ ```bash
634
+ # Open, wait for load, and snapshot in one call
635
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
636
+
637
+ # Chain multiple interactions
638
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
639
+
640
+ # Navigate and screenshot
641
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
642
+ ```
643
+
644
+ Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
645
+
646
+ ## Headed Mode
647
+
648
+ Show the browser window for debugging:
649
+
650
+ ```bash
651
+ agent-browser open example.com --headed
652
+ ```
653
+
654
+ This opens a visible browser window instead of running headless.
655
+
656
+ ## Authenticated Sessions
657
+
658
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
659
+
660
+ ```bash
661
+ # Headers are scoped to api.example.com only
662
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
663
+
664
+ # Requests to api.example.com include the auth header
665
+ agent-browser snapshot -i --json
666
+ agent-browser click @e2
667
+
668
+ # Navigate to another domain - headers are NOT sent (safe!)
669
+ agent-browser open other-site.com
670
+ ```
671
+
672
+ This is useful for:
673
+ - **Skipping login flows** - Authenticate via headers instead of UI
674
+ - **Switching users** - Start new sessions with different auth tokens
675
+ - **API testing** - Access protected endpoints directly
676
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
677
+
678
+ To set headers for multiple origins, use `--headers` with each `open` command:
679
+
680
+ ```bash
681
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
682
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
683
+ ```
684
+
685
+ For global headers (all domains), use `set headers`:
686
+
687
+ ```bash
688
+ agent-browser set headers '{"X-Custom-Header": "value"}'
689
+ ```
690
+
691
+ ## Custom Browser Executable
692
+
693
+ Use a custom browser executable instead of the bundled Chromium. This is useful for:
694
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
695
+ - **System browsers**: Use an existing Chrome/Chromium installation
696
+ - **Custom builds**: Use modified browser builds
697
+
698
+ ### CLI Usage
699
+
700
+ ```bash
701
+ # Via flag
702
+ agent-browser --executable-path /path/to/chromium open example.com
703
+
704
+ # Via environment variable
705
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
706
+ ```
707
+
708
+ ### Serverless Example (Vercel/AWS Lambda)
709
+
710
+ ```typescript
711
+ import chromium from '@sparticuz/chromium';
712
+ import { BrowserManager } from 'agent-browser-stealth';
713
+
714
+ export async function handler() {
715
+   const browser = new BrowserManager();
716
+   await browser.launch({
717
+     executablePath: await chromium.executablePath(),
718
+     headless: true,
719
+   });
720
+   // ... use browser
721
+ }
722
+ ```
723
+
724
+ ## Local Files
725
+
726
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
727
+
728
+ ```bash
729
+ # Enable file access (required for JavaScript to access local files)
730
+ agent-browser --allow-file-access open file:///path/to/document.pdf
731
+ agent-browser --allow-file-access open file:///path/to/page.html
732
+
733
+ # Take screenshot of a local PDF
734
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
735
+ agent-browser screenshot report.png
736
+ ```
737
+
738
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
739
+ - Load and render local files
740
+ - Access other local files via JavaScript (XHR, fetch)
741
+ - Load local resources (images, scripts, stylesheets)
742
+
743
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
744
+
745
+ ## Stealth Mode
746
+
747
+ `agent-browser-stealth` is built around stealth as a primary design goal, not an add-on.
748
+ Stealth is **always on** with no flag needed. Every browser session automatically applies anti-detection countermeasures:
749
+
750
+ - **Uses Chrome channel for Chromium launches** -- local Chromium sessions are launched through Playwright's `chrome` channel for a genuine Chrome fingerprint
751
+
752
+ - Removes `navigator.webdriver` automation indicator
753
+ - Disables Chromium's `AutomationControlled` blink feature
754
+ - Replaces "HeadlessChrome" in User-Agent and userAgentData (including CDP-level override)
755
+ - Uses ANGLE rendering instead of SwiftShader to avoid GPU fingerprinting
756
+ - Adds realistic `navigator.plugins` and `navigator.mimeTypes` (passes `instanceof` checks)
757
+ - Patches `window.chrome.runtime` to match real Chrome
758
+ - Masks WebGL vendor/renderer
759
+ - Fixes `navigator.permissions.query` for notifications
760
+ - Reports realistic `navigator.hardwareConcurrency` and `performance.memory`
761
+ - Provides default media devices for `enumerateDevices()`
762
+ - Patches screen/window dimensions to avoid viewport-equals-screen fingerprint
763
+ - Sets opaque background color (headless default is transparent)
764
+ - Cleans up CDP-injected properties on the document
765
+
766
+ ### Stealth Verification
767
+
768
+ On February 24, 2026, local validation against CreepJS using `scripts/check-creepjs-headless.js` reported:
769
+
770
+ | Metric | Result |
771
+ | --- | --- |
772
+ | like headless | 0% |
773
+ | headless | 0% |
774
+ | stealth | 0% |
775
+
776
+ Reproduce:
777
+
778
+ ```bash
779
+ node scripts/check-creepjs-headless.js --binary ./cli/target/release/agent-browser
780
+ ```
781
+
782
+ ### Humanized Interactions
783
+
784
+ All interactions are automatically humanized to avoid behavioral detection:
785
+
786
+ - **Randomized typing** -- When using `type --delay`, each keystroke delay varies by ±40% so timing appears natural rather than mechanical
787
+ - **Random wait ranges** -- `wait 2000-5000` pauses for a random duration between 2 and 5 seconds
788
+ - **Bezier curve mouse movement** -- Before every `click`, the mouse moves to the target element along a randomized cubic Bezier curve with natural-looking control points
789
+ - **Navigation pacing** -- Each page navigation includes a short random delay (300-1000ms) to avoid burst patterns
790
+
791
+ These behaviors are always active and require no additional flags.
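The timing jitter and curved mouse path described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `jitterDelay`, `randomInRange`, `cubicBezier`, and `mousePath` are hypothetical names, and the way control points are chosen (a sideways offset of up to 25% of the travel distance) is an assumption.

```typescript
// Hypothetical helpers illustrating the humanization ideas; not the package's API.
export interface Point {
  x: number;
  y: number;
}

/** Scales baseMs by a random factor in [1 - spread, 1 + spread) (±40% by default). */
export function jitterDelay(baseMs: number, spread = 0.4, rand: () => number = Math.random): number {
  return baseMs * (1 - spread + rand() * 2 * spread);
}

/** Picks a random whole number of milliseconds in [minMs, maxMs], as in `wait 2000-5000`. */
export function randomInRange(minMs: number, maxMs: number, rand: () => number = Math.random): number {
  return Math.floor(minMs + rand() * (maxMs - minMs + 1));
}

/** Evaluates a cubic Bezier curve with control points p0..p3 at parameter t in [0, 1]. */
export function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

/** Samples steps + 1 points along a randomized curve that starts at `from` and ends at `to`. */
export function mousePath(from: Point, to: Point, steps = 20, rand: () => number = Math.random): Point[] {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  // Assumption: push each control point sideways by up to 25% of the travel distance.
  const k1 = (rand() - 0.5) * 0.5;
  const k2 = (rand() - 0.5) * 0.5;
  const c1 = { x: from.x + dx / 3 - dy * k1, y: from.y + dy / 3 + dx * k1 };
  const c2 = { x: from.x + (2 * dx) / 3 - dy * k2, y: from.y + (2 * dy) / 3 + dx * k2 };
  const path: Point[] = [];
  for (let i = 0; i <= steps; i++) {
    path.push(cubicBezier(from, c1, c2, to, i / steps));
  }
  return path;
}
```

Passing `rand` as a parameter keeps the helpers deterministic under test while defaulting to `Math.random` in normal use.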
792
+
793
+ ### Auto Region Detection
794
+
795
+ When navigating to a site, the URL's TLD is used to automatically match locale, timezone, and Accept-Language headers to the target region. For example, opening `shopee.tw` automatically sets locale to `zh-TW` and timezone to `Asia/Taipei`, eliminating region-signal mismatches that server-side risk systems commonly flag.
796
+
797
+ Supported TLDs include: `.tw`, `.cn`, `.hk`, `.jp`, `.kr`, `.th`, `.vn`, `.sg`, `.my`, `.id`, `.ph`, `.br`, `.mx`, `.de`, `.fr`, `.uk`, `.ru`, `.in`, `.au`, and more.
798
+
799
+ Override with environment variables: `AGENT_BROWSER_LOCALE`, `AGENT_BROWSER_TIMEZONE`.
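As a rough sketch of the TLD lookup and the environment-variable override, assuming a simple table keyed by the last hostname label: only the `.tw` entry below comes from this README; the `.jp` and `.de` rows, the `regionForUrl` name, and the precedence of overrides over detection are illustrative assumptions.

```typescript
// Hypothetical sketch of TLD-based region matching; not the package's actual table.
export interface RegionProfile {
  locale?: string;
  timezone?: string;
}

const REGION_BY_TLD: Record<string, RegionProfile> = {
  tw: { locale: "zh-TW", timezone: "Asia/Taipei" }, // from the README example
  jp: { locale: "ja-JP", timezone: "Asia/Tokyo" }, // assumed for illustration
  de: { locale: "de-DE", timezone: "Europe/Berlin" }, // assumed for illustration
};

/**
 * Resolves locale and timezone for a URL. Values from AGENT_BROWSER_LOCALE and
 * AGENT_BROWSER_TIMEZONE, when present in `env`, take precedence over detection.
 */
export function regionForUrl(
  url: string,
  env: Record<string, string | undefined> = {},
): RegionProfile {
  const host = new URL(url).hostname;
  const tld = host.slice(host.lastIndexOf(".") + 1).toLowerCase();
  const detected: RegionProfile | undefined = REGION_BY_TLD[tld];
  return {
    locale: env.AGENT_BROWSER_LOCALE ?? detected?.locale,
    timezone: env.AGENT_BROWSER_TIMEZONE ?? detected?.timezone,
  };
}
```

For an unknown TLD with no overrides, both fields come back `undefined`, which a caller could treat as "leave the browser defaults alone".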
800
+
801
+ ### Captcha / Verification Detection
802
+
803
+ If a navigation lands on a known captcha or verification page (detected by URL patterns like `/verify/captcha` or titles like "Checking your browser"), the browser automatically retries up to 2 times with randomized backoff (3-7 seconds). If all retries are exhausted, a warning suggests `--headed` mode or `--session-name` persistence.
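The retry behavior can be sketched like this. The two patterns are the examples quoted above; `looksLikeChallenge`, `navigateWithRetry`, and the option names are hypothetical, and the real detection list is presumably longer.

```typescript
// Hypothetical sketch of challenge detection with randomized backoff; not the package's code.
export interface PageInfo {
  url: string;
  title: string;
}

const URL_PATTERNS = [/\/verify\/captcha/i];
const TITLE_PATTERNS = [/checking your browser/i];

export function looksLikeChallenge(page: PageInfo): boolean {
  return (
    URL_PATTERNS.some((re) => re.test(page.url)) ||
    TITLE_PATTERNS.some((re) => re.test(page.title))
  );
}

export interface RetryOptions {
  maxRetries?: number;
  minBackoffMs?: number;
  maxBackoffMs?: number;
  sleep?: (ms: number) => Promise<void>;
  rand?: () => number;
}

/** Navigates once, then retries up to maxRetries times while the page looks like a challenge. */
export async function navigateWithRetry(
  navigate: () => Promise<PageInfo>,
  opts: RetryOptions = {},
): Promise<{ page: PageInfo; blocked: boolean; attempts: number }> {
  const maxRetries = opts.maxRetries ?? 2;
  const minBackoffMs = opts.minBackoffMs ?? 3000;
  const maxBackoffMs = opts.maxBackoffMs ?? 7000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const rand = opts.rand ?? Math.random;

  let page = await navigate();
  let attempts = 1;
  while (looksLikeChallenge(page) && attempts <= maxRetries) {
    // Wait a random 3-7 seconds (by default) before trying again.
    await sleep(minBackoffMs + rand() * (maxBackoffMs - minBackoffMs));
    page = await navigate();
    attempts++;
  }
  return { page, blocked: looksLikeChallenge(page), attempts };
}
```

When `blocked` is still `true` after the final attempt, a caller would surface the warning about `--headed` mode or `--session-name` persistence.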
804
+
805
+ ## CDP Mode
806
+
807
+ Connect to an existing browser via the Chrome DevTools Protocol (CDP).
808
+
809
+ In this fork, commands run without `--cdp` expect a resident browser that is already listening for CDP connections at `localhost:9333`. If no CDP endpoint is reachable, the command fails fast instead of launching a new managed browser.
810
+
811
+ ```bash
812
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
813
+
814
+ # Connect once, then run commands without --cdp
815
+ agent-browser connect 9222
816
+ agent-browser snapshot
817
+ agent-browser tab
818
+ agent-browser close
819
+
820
+ # Or pass --cdp on each command
821
+ agent-browser --cdp 9222 snapshot
822
+
823
+ # Connect to remote browser via WebSocket URL
824
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
825
+ ```
826
+
827
+ The `--cdp` flag accepts either:
828
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
829
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
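A minimal sketch of how the two accepted forms could be told apart is shown below. The `parseCdpTarget` name and the returned shape are hypothetical; the CLI's own parsing may differ.

```typescript
// Hypothetical sketch of classifying a --cdp value; not the CLI's actual parser.
export type CdpTarget =
  | { kind: "port"; port: number; httpUrl: string }
  | { kind: "websocket"; url: string };

export function parseCdpTarget(value: string): CdpTarget {
  const trimmed = value.trim();
  // Full WebSocket URLs are passed through unchanged.
  if (/^wss?:\/\//i.test(trimmed)) {
    return { kind: "websocket", url: trimmed };
  }
  // Bare port numbers map to a local HTTP endpoint.
  if (/^\d+$/.test(trimmed)) {
    const port = Number(trimmed);
    if (port >= 1 && port <= 65535) {
      return { kind: "port", port, httpUrl: `http://localhost:${port}` };
    }
  }
  throw new Error(`Expected a port number or a ws:// or wss:// URL, got: ${value}`);
}
```

Rejecting anything else up front gives a clearer error than a failed connection attempt later.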
830
+
831
+ This enables control of:
832
+ - Electron apps
833
+ - Chrome/Chromium instances with remote debugging
834
+ - WebView2 applications
835
+ - Any browser exposing a CDP endpoint
836
+
837
+ ### Auto-Connect
838
+
839
+ Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
840
+
841
+ ```bash
842
+ # Auto-discover running Chrome with remote debugging
843
+ agent-browser --auto-connect open example.com
844
+ agent-browser --auto-connect snapshot
845
+
846
+ # Or via environment variable
847
+ AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
848
+ ```
849
+
850
+ Auto-connect discovers Chrome by:
851
+ 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
852
+ 2. Falling back to probing common debugging ports (9222, 9229, 9333)
853
+
854
+ This is useful when:
855
+ - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
856
+ - You want a zero-configuration connection to your existing browser
857
+ - You don't want to track which port Chrome is using
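The discovery order can be sketched as below. Chrome writes `DevToolsActivePort` with the port on the first line and the browser WebSocket path on the second; the function names, the `127.0.0.1` host default, and the choice to try the file's port before the fallback ports are assumptions made for illustration.

```typescript
// Hypothetical sketch of auto-connect discovery; not the package's implementation.
export interface ActivePort {
  port: number;
  wsUrl?: string;
}

/** Parses the contents of a DevToolsActivePort file: port on line 1, WebSocket path on line 2. */
export function parseDevToolsActivePort(contents: string, host = "127.0.0.1"): ActivePort | undefined {
  const [portLine = "", pathLine = ""] = contents.split(/\r?\n/).map((line) => line.trim());
  if (!/^\d+$/.test(portLine)) return undefined;
  const port = Number(portLine);
  if (port < 1 || port > 65535) return undefined;
  return pathLine.startsWith("/")
    ? { port, wsUrl: `ws://${host}:${port}${pathLine}` }
    : { port };
}

// Common debugging ports listed in the README.
export const FALLBACK_PORTS: readonly number[] = [9222, 9229, 9333];

/** Returns ports to probe: the file's port first (if readable), then the fallbacks. */
export function candidatePorts(fileContents?: string): number[] {
  const fromFile = fileContents ? parseDevToolsActivePort(fileContents)?.port : undefined;
  if (fromFile === undefined) return [...FALLBACK_PORTS];
  return [fromFile, ...FALLBACK_PORTS.filter((p) => p !== fromFile)];
}
```

Reading the file itself is left out here because the default user data directory differs by platform and Chrome channel.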
858
+
859
+ ## Streaming (Browser Preview)
860
+
861
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
862
+
863
+ ### Enable Streaming
864
+
865
+ Set the `AGENT_BROWSER_STREAM_PORT` environment variable:
866
+
867
+ ```bash
868
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
869
+ ```
870
+
871
+ This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
872
+
873
+ ### WebSocket Protocol
874
+
875
+ Connect to `ws://localhost:9223` to receive frames and send input:
876
+
877
+ **Receive frames:**
878
+ ```json
879
+ {
880
+   "type": "frame",
881
+   "data": "<base64-encoded-jpeg>",
882
+   "metadata": {
883
+     "deviceWidth": 1280,
884
+     "deviceHeight": 720,
885
+     "pageScaleFactor": 1,
886
+     "offsetTop": 0,
887
+     "scrollOffsetX": 0,
888
+     "scrollOffsetY": 0
889
+   }
890
+ }
891
+ ```
892
+
893
+ **Send mouse events:**
894
+ ```json
895
+ {
896
+   "type": "input_mouse",
897
+   "eventType": "mousePressed",
898
+   "x": 100,
899
+   "y": 200,
900
+   "button": "left",
901
+   "clickCount": 1
902
+ }
903
+ ```
904
+
905
+ **Send keyboard events:**
906
+ ```json
907
+ {
908
+   "type": "input_keyboard",
909
+   "eventType": "keyDown",
910
+   "key": "Enter",
911
+   "code": "Enter"
912
+ }
913
+ ```
914
+
915
+ **Send touch events:**
916
+ ```json
917
+ {
918
+   "type": "input_touch",
919
+   "eventType": "touchStart",
920
+   "touchPoints": [{ "x": 100, "y": 200 }]
921
+ }
922
+ ```
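On the client side, the message shapes above could be typed and handled roughly as follows. The field names are taken from the JSON examples; the helper names (`parseFrame`, `mousePressed`, `keyDown`) are hypothetical, and the validation is deliberately shallow.

```typescript
// Hypothetical client-side helpers for the streaming protocol; names are illustrative.
export interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

/** Returns the parsed frame, or undefined if the payload is not a frame message. */
export function parseFrame(raw: string): FrameMessage | undefined {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof msg !== "object" || msg === null) return undefined;
  const m = msg as Partial<FrameMessage>;
  if (m.type !== "frame" || typeof m.data !== "string") return undefined;
  if (typeof m.metadata !== "object" || m.metadata === null) return undefined;
  return m as FrameMessage;
}

/** Serializes a mouse-press input event in the shape shown above. */
export function mousePressed(x: number, y: number, button = "left", clickCount = 1): string {
  return JSON.stringify({ type: "input_mouse", eventType: "mousePressed", x, y, button, clickCount });
}

/** Serializes a key-down input event; `code` defaults to the same value as `key`. */
export function keyDown(key: string, code: string = key): string {
  return JSON.stringify({ type: "input_keyboard", eventType: "keyDown", key, code });
}
```

With a WebSocket client connected to `ws://localhost:9223`, each incoming text message would go through `parseFrame`, and the strings returned by `mousePressed` or `keyDown` would be sent back as-is.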
923
+
924
+ ### Programmatic API
925
+
926
+ For advanced use, control streaming directly from code through `BrowserManager`:
927
+
928
+ ```typescript
929
+ import { BrowserManager } from 'agent-browser-stealth';
930
+
931
+ const browser = new BrowserManager();
932
+ await browser.launch({ headless: true });
933
+ await browser.navigate('https://example.com');
934
+
935
+ // Start screencast
936
+ await browser.startScreencast((frame) => {
937
+   // frame.data is base64-encoded image
938
+   // frame.metadata contains viewport info
939
+   console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
940
+ }, {
941
+   format: 'jpeg',
942
+   quality: 80,
943
+   maxWidth: 1280,
944
+   maxHeight: 720,
945
+ });
946
+
947
+ // Inject mouse events
948
+ await browser.injectMouseEvent({
949
+   type: 'mousePressed',
950
+   x: 100,
951
+   y: 200,
952
+   button: 'left',
953
+ });
954
+
955
+ // Inject keyboard events
956
+ await browser.injectKeyboardEvent({
957
+   type: 'keyDown',
958
+   key: 'Enter',
959
+   code: 'Enter',
960
+ });
961
+
962
+ // Stop when done
963
+ await browser.stopScreencast();
964
+ ```
965
+
966
+ ## Architecture
967
+
968
+ agent-browser uses a client-daemon architecture:
969
+
970
+ 1. **Rust CLI** (fast native binary) - Parses commands, communicates with daemon
971
+ 2. **Node.js Daemon** - Manages Playwright browser instance
972
+ 3. **Fallback** - If the native binary is unavailable, the CLI runs through Node.js directly
973
+
974
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations.
975
+
976
+ **Browser Engine:** Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
977
+
978
+ ## Platforms
979
+
980
+ | Platform | Binary | Fallback |
981
+ |----------|--------|----------|
982
+ | macOS ARM64 | Native Rust | Node.js |
983
+ | macOS x64 | Native Rust | Node.js |
984
+ | Linux ARM64 | Native Rust | Node.js |
985
+ | Linux x64 | Native Rust | Node.js |
986
+ | Windows x64 | Native Rust | Node.js |
987
+
988
+ ## Usage with AI Agents
989
+
990
+ ### Just ask the agent
991
+
992
+ The simplest approach -- just tell your agent to use it:
993
+
994
+ ```
995
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
996
+ ```
997
+
998
+ The `--help` output is comprehensive and most agents can figure it out from there.
999
+
1000
+ ### AI Coding Assistants (recommended)
1001
+
1002
+ Add the skill to your AI coding assistant for richer context:
1003
+
1004
+ ```bash
1005
+ npx skills add leeguooooo/agent-browser
1006
+ ```
1007
+
1008
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
1009
+
1010
+ ### Claude Code
1011
+
1012
+ Install as a Claude Code skill:
1013
+
1014
+ ```bash
1015
+ npx skills add leeguooooo/agent-browser
1016
+ ```
1017
+
1018
+ This adds the skill to `.claude/skills/agent-browser/SKILL.md` in your project. The skill teaches Claude Code the full agent-browser workflow, including the snapshot-ref interaction pattern, session management, and timeout handling.
1019
+
1020
+ ### AGENTS.md / CLAUDE.md
1021
+
1022
+ For more consistent results, add to your project or global instructions file:
1023
+
1024
+ ```markdown
1025
+ ## Browser Automation
1026
+
1027
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1028
+
1029
+ Core workflow:
1030
+ 1. `agent-browser open <url>` - Navigate to page
1031
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1032
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1033
+ 4. Re-snapshot after page changes
1034
+ ```
1035
+
1036
+ ## Integrations
1037
+
1038
+ ### iOS Simulator
1039
+
1040
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1041
+
1042
+ **Setup:**
1043
+
1044
+ ```bash
1045
+ # Install Appium and XCUITest driver
1046
+ npm install -g appium
1047
+ appium driver install xcuitest
1048
+ ```
1049
+
1050
+ **Usage:**
1051
+
1052
+ ```bash
1053
+ # List available iOS simulators
1054
+ agent-browser device list
1055
+
1056
+ # Launch Safari on a specific device
1057
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1058
+
1059
+ # Same commands as desktop
1060
+ agent-browser -p ios snapshot -i
1061
+ agent-browser -p ios tap @e1
1062
+ agent-browser -p ios fill @e2 "text"
1063
+ agent-browser -p ios screenshot mobile.png
1064
+
1065
+ # Mobile-specific commands
1066
+ agent-browser -p ios swipe up
1067
+ agent-browser -p ios swipe down 500
1068
+
1069
+ # Close session
1070
+ agent-browser -p ios close
1071
+ ```
1072
+
1073
+ Or use environment variables:
1074
+
1075
+ ```bash
1076
+ export AGENT_BROWSER_PROVIDER=ios
1077
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
1078
+ agent-browser open https://example.com
1079
+ ```
1080
+
1081
+ | Variable | Description |
1082
+ |----------|-------------|
1083
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
1084
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1085
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
1086
+
1087
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1088
+
1089
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1090
+
1091
+ #### Real Device Support
1092
+
1093
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1094
+
1095
+ **1. Get your device UDID:**
1096
+ ```bash
1097
+ xcrun xctrace list devices
1098
+ # or
1099
+ system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
1100
+ ```
1101
+
1102
+ **2. Sign WebDriverAgent (one-time):**
1103
+ ```bash
1104
+ # Open the WebDriverAgent Xcode project
1105
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1106
+ open WebDriverAgent.xcodeproj
1107
+ ```
1108
+
1109
+ In Xcode:
1110
+ - Select the `WebDriverAgentRunner` target
1111
+ - Go to Signing & Capabilities
1112
+ - Select your Team (requires Apple Developer account, free tier works)
1113
+ - Let Xcode manage signing automatically
1114
+
1115
+ **3. Use with agent-browser:**
1116
+ ```bash
1117
+ # Connect device via USB, then:
1118
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1119
+
1120
+ # Or use the device name if unique
1121
+ agent-browser -p ios --device "John's iPhone" open https://example.com
1122
+ ```
1123
+
1124
+ **Real device notes:**
1125
+ - First run installs WebDriverAgent to the device (may require accepting the Trust prompt on the device)
1126
+ - Device must be unlocked and connected via USB
1127
+ - Slightly slower initial connection than simulator
1128
+ - Tests against real Safari performance and behavior
1129
+
1130
+ ### Browserbase
1131
+
1132
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
1133
+
1134
+ To enable Browserbase, use the `-p` flag:
1135
+
1136
+ ```bash
1137
+ export BROWSERBASE_API_KEY="your-api-key"
1138
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1139
+ agent-browser -p browserbase open https://example.com
1140
+ ```
1141
+
1142
+ Or use environment variables for CI/scripts:
1143
+
1144
+ ```bash
1145
+ export AGENT_BROWSER_PROVIDER=browserbase
1146
+ export BROWSERBASE_API_KEY="your-api-key"
1147
+ export BROWSERBASE_PROJECT_ID="your-project-id"
1148
+ agent-browser open https://example.com
1149
+ ```
1150
+
1151
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
1152
+
1153
+ Get your API key and project ID from the [Browserbase Dashboard](https://browserbase.com/overview).
1154
+
1155
+ ### Browser Use
1156
+
1157
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
1158
+
1159
+ To enable Browser Use, use the `-p` flag:
1160
+
1161
+ ```bash
1162
+ export BROWSER_USE_API_KEY="your-api-key"
1163
+ agent-browser -p browseruse open https://example.com
1164
+ ```
1165
+
1166
+ Or use environment variables for CI/scripts:
1167
+
1168
+ ```bash
1169
+ export AGENT_BROWSER_PROVIDER=browseruse
1170
+ export BROWSER_USE_API_KEY="your-api-key"
1171
+ agent-browser open https://example.com
1172
+ ```
1173
+
1174
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
1175
+
1176
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
1177
+
1178
+ ### Kernel
1179
+
1180
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
1181
+
1182
+ To enable Kernel, use the `-p` flag:
1183
+
1184
+ ```bash
1185
+ export KERNEL_API_KEY="your-api-key"
1186
+ agent-browser -p kernel open https://example.com
1187
+ ```
1188
+
1189
+ Or use environment variables for CI/scripts:
1190
+
1191
+ ```bash
1192
+ export AGENT_BROWSER_PROVIDER=kernel
1193
+ export KERNEL_API_KEY="your-api-key"
1194
+ agent-browser open https://example.com
1195
+ ```
1196
+
1197
+ Optional configuration via environment variables:
1198
+
1199
+ | Variable | Description | Default |
1200
+ |----------|-------------|---------|
1201
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
1202
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
1203
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1204
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
1205
+
1206
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
1207
+
1208
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
1209
+
1210
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
1211
+
1212
+ ## License
1213
+
1214
+ Apache-2.0