agent-browser-priv 0.27.3-priv.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (36)
  1. package/LICENSE +201 -0
  2. package/README.md +1564 -0
  3. package/bin/agent-browser.js +125 -0
  4. package/package.json +52 -0
  5. package/scripts/build-all-platforms.sh +76 -0
  6. package/scripts/check-version-sync.js +51 -0
  7. package/scripts/copy-native.js +36 -0
  8. package/scripts/postinstall.js +327 -0
  9. package/scripts/sync-version.js +81 -0
  10. package/scripts/windows-debug/provision.sh +220 -0
  11. package/scripts/windows-debug/run.sh +92 -0
  12. package/scripts/windows-debug/start.sh +43 -0
  13. package/scripts/windows-debug/stop.sh +28 -0
  14. package/scripts/windows-debug/sync.sh +27 -0
  15. package/skill-data/agentcore/SKILL.md +115 -0
  16. package/skill-data/core/SKILL.md +488 -0
  17. package/skill-data/core/references/authentication.md +303 -0
  18. package/skill-data/core/references/commands.md +403 -0
  19. package/skill-data/core/references/profiling.md +120 -0
  20. package/skill-data/core/references/proxy-support.md +194 -0
  21. package/skill-data/core/references/session-management.md +193 -0
  22. package/skill-data/core/references/snapshot-refs.md +219 -0
  23. package/skill-data/core/references/trust-boundaries.md +89 -0
  24. package/skill-data/core/references/video-recording.md +175 -0
  25. package/skill-data/core/templates/authenticated-session.sh +105 -0
  26. package/skill-data/core/templates/capture-workflow.sh +69 -0
  27. package/skill-data/core/templates/form-automation.sh +62 -0
  28. package/skill-data/dogfood/SKILL.md +220 -0
  29. package/skill-data/dogfood/references/issue-taxonomy.md +109 -0
  30. package/skill-data/dogfood/templates/dogfood-report-template.md +53 -0
  31. package/skill-data/electron/SKILL.md +236 -0
  32. package/skill-data/slack/SKILL.md +285 -0
  33. package/skill-data/slack/references/slack-tasks.md +348 -0
  34. package/skill-data/slack/templates/slack-report-template.md +163 -0
  35. package/skill-data/vercel-sandbox/SKILL.md +280 -0
  36. package/skills/agent-browser/SKILL.md +55 -0
package/README.md ADDED
@@ -0,0 +1,1564 @@
1
+ # agent-browser-priv
2
+
3
+ Browser automation CLI for AI agents, forked for opt-in local privacy/runtime
4
+ backends. Fast native Rust CLI.
5
+
6
+ [![skills.sh](https://skills.sh/b/vercel-labs/agent-browser)](https://skills.sh/vercel-labs/agent-browser)
7
+
8
+ ## Installation
9
+
10
+ ### Global Installation (recommended)
11
+
12
+ Installs the native Rust binary:
13
+
14
+ ```bash
15
+ npm install -g agent-browser-priv
16
+ agent-browser-priv install # Download Chrome from Chrome for Testing (first time only)
17
+ ```
18
+
19
+ ### Project Installation (local dependency)
20
+
21
+ For projects that want to pin the version in `package.json`:
22
+
23
+ ```bash
24
+ npm install agent-browser-priv
25
+ agent-browser-priv install
26
+ ```
27
+
28
+ Then use via `package.json` scripts or by invoking `agent-browser` directly.
29
+
30
+ ### Homebrew (macOS)
31
+
32
+ ```bash
33
+ brew install liuwen/agent-browser-priv/agent-browser-priv
34
+ agent-browser-priv install # Download Chrome from Chrome for Testing (first time only)
35
+ ```
36
+
37
+ ### Cargo (Rust)
38
+
39
+ ```bash
40
+ cargo install --git https://github.com/liuwen/agent-browser-priv
41
+ agent-browser-priv install # Download Chrome from Chrome for Testing (first time only)
42
+ ```
43
+
44
+ ### From Source
45
+
46
+ Requires Node.js 24+, pnpm 11+, and Rust.
47
+
48
+ ```bash
49
+ git clone https://github.com/liuwen/agent-browser-priv
50
+ cd agent-browser-priv
51
+ pnpm install
52
+ pnpm build
53
+ pnpm build:native # Requires Rust (https://rustup.rs)
54
+ pnpm link --global # Makes agent-browser-priv available globally
55
+ agent-browser-priv install
56
+ ```
57
+
58
+ ### Linux Dependencies
59
+
60
+ On Linux, install system dependencies:
61
+
62
+ ```bash
63
+ agent-browser-priv install --with-deps
64
+ ```
65
+
66
+ ### Updating
67
+
68
+ Upgrade to the latest version:
69
+
70
+ ```bash
71
+ agent-browser-priv upgrade
72
+ ```
73
+
74
+ The command detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
75
+
76
+ ### Requirements
77
+
78
+ - **Chrome** - Run `agent-browser-priv install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
79
+ - **Node.js 24+ and pnpm 11+** - Only needed when building from source.
80
+ - **Rust** - Only needed when building from source (see From Source above).
81
+
82
+ ## Quick Start
83
+
84
+ ```bash
85
+ agent-browser open example.com
86
+ agent-browser snapshot # Get accessibility tree with refs
87
+ agent-browser click @e2 # Click by ref from snapshot
88
+ agent-browser fill @e3 "test@example.com" # Fill by ref
89
+ agent-browser get text @e1 # Get text by ref
90
+ agent-browser screenshot page.png
91
+ agent-browser close
92
+ ```
93
+
94
+ Clicks fail early when another element covers the target's click point,
95
+ for example a consent banner or modal. Dismiss or interact with the reported
96
+ covering element, then take a fresh snapshot before retrying the original ref.
97
+
98
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
99
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
100
+
101
+ ### Traditional Selectors (also supported)
102
+
103
+ ```bash
104
+ agent-browser click "#submit"
105
+ agent-browser fill "#email" "test@example.com"
106
+ agent-browser find role button click --name "Submit"
107
+ ```
108
+
109
+ ## Commands
110
+
111
+ ### Core Commands
112
+
113
+ ```bash
114
+ agent-browser open # Launch browser (no navigation); stays on about:blank
115
+ agent-browser open <url> # Launch + navigate to URL (aliases: goto, navigate)
116
+ agent-browser click <sel> # Click element (--new-tab to open in new tab)
117
+ agent-browser dblclick <sel> # Double-click element
118
+ agent-browser focus <sel> # Focus element
119
+ agent-browser type <sel> <text> # Type into element
120
+ agent-browser fill <sel> <text> # Clear and fill
121
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
122
+ agent-browser keyboard type <text> # Type with real keystrokes (no selector, current focus)
123
+ agent-browser keyboard inserttext <text> # Insert text without key events (no selector)
124
+ agent-browser keydown <key> # Hold key down
125
+ agent-browser keyup <key> # Release key
126
+ agent-browser hover <sel> # Hover element
127
+ agent-browser select <sel> <val> # Select dropdown option
128
+ agent-browser check <sel> # Check checkbox
129
+ agent-browser uncheck <sel> # Uncheck checkbox
130
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right, --selector <sel>)
131
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
132
+ agent-browser drag <src> <tgt> # Drag and drop
133
+ agent-browser upload <sel> <files> # Upload files
134
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
135
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
136
+ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
137
+ agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
138
+ agent-browser pdf <path> # Save as PDF
139
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
140
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
141
+ agent-browser connect <port> # Connect to browser via CDP
142
+ agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
143
+ agent-browser stream status # Show runtime streaming state and bound port
144
+ agent-browser stream disable # Stop runtime WebSocket streaming
145
+ agent-browser close # Close browser (aliases: quit, exit)
146
+ agent-browser close --all # Close all active sessions
147
+ agent-browser chat "<instruction>" # AI chat: natural language browser control (single-shot)
148
+ agent-browser chat # AI chat: interactive REPL mode
149
+ ```
150
+
151
+ ### Get Info
152
+
153
+ ```bash
154
+ agent-browser get text <sel> # Get text content
155
+ agent-browser get html <sel> # Get innerHTML
156
+ agent-browser get value <sel> # Get input value
157
+ agent-browser get attr <sel> <attr> # Get attribute
158
+ agent-browser get title # Get page title
159
+ agent-browser get url # Get current URL
160
+ agent-browser get cdp-url # Get CDP WebSocket URL (for DevTools, debugging)
161
+ agent-browser get count <sel> # Count matching elements
162
+ agent-browser get box <sel> # Get bounding box
163
+ agent-browser get styles <sel> # Get computed styles
164
+ ```
165
+
166
+ ### Check State
167
+
168
+ ```bash
169
+ agent-browser is visible <sel> # Check if visible
170
+ agent-browser is enabled <sel> # Check if enabled
171
+ agent-browser is checked <sel> # Check if checked
172
+ ```
173
+
174
+ ### Find Elements (Semantic Locators)
175
+
176
+ ```bash
177
+ agent-browser find role <role> <action> [value] # By ARIA role
178
+ agent-browser find text <text> <action> # By text content
179
+ agent-browser find label <label> <action> [value] # By label
180
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
181
+ agent-browser find alt <text> <action> # By alt text
182
+ agent-browser find title <text> <action> # By title attr
183
+ agent-browser find testid <id> <action> [value] # By data-testid
184
+ agent-browser find first <sel> <action> [value] # First match
185
+ agent-browser find last <sel> <action> [value] # Last match
186
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
187
+ ```
188
+
189
+ **Actions:** `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
190
+
191
+ **Options:** `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
192
+
193
+ **Examples:**
194
+
195
+ ```bash
196
+ agent-browser find role button click --name "Submit"
197
+ agent-browser find text "Sign In" click
198
+ agent-browser find label "Email" fill "test@test.com"
199
+ agent-browser find first ".item" click
200
+ agent-browser find nth 2 "a" text
201
+ ```
202
+
203
+ ### Wait
204
+
205
+ ```bash
206
+ agent-browser wait <selector> # Wait for element to be visible
207
+ agent-browser wait <ms> # Wait for time (milliseconds)
208
+ agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
209
+ agent-browser wait --url "**/dash" # Wait for URL pattern
210
+ agent-browser wait --load networkidle # Wait for load state
211
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
212
+
213
+ # Wait for text/element to disappear
214
+ agent-browser wait --fn "!document.body.innerText.includes('Loading...')"
215
+ agent-browser wait "#spinner" --state hidden
216
+ ```
217
+
218
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
219
+
220
+ ### Batch Execution
221
+
222
+ Execute multiple commands in a single invocation. Commands can be passed as
223
+ quoted arguments or piped as JSON via stdin. This avoids per-command process
224
+ startup overhead when running multi-step workflows.
225
+
226
+ ```bash
227
+ # Argument mode: each quoted argument is a full command
228
+ agent-browser batch "open https://example.com" "snapshot -i" "screenshot"
229
+
230
+ # With --bail to stop on first error
231
+ agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"
232
+
233
+ # Stdin mode: pipe commands as JSON
234
+ echo '[
235
+ ["open", "https://example.com"],
236
+ ["snapshot", "-i"],
237
+ ["click", "@e1"],
238
+ ["screenshot", "result.png"]
239
+ ]' | agent-browser batch --json
240
+ ```
241
+
242
+ ### Clipboard
243
+
244
+ ```bash
245
+ agent-browser clipboard read # Read text from clipboard
246
+ agent-browser clipboard write "Hello, World!" # Write text to clipboard
247
+ agent-browser clipboard copy # Copy current selection (Ctrl+C)
248
+ agent-browser clipboard paste # Paste from clipboard (Ctrl+V)
249
+ ```
250
+
251
+ ### Mouse Control
252
+
253
+ ```bash
254
+ agent-browser mouse move <x> <y> # Move mouse
255
+ agent-browser mouse down [button] # Press button (left/right/middle)
256
+ agent-browser mouse up [button] # Release button
257
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
258
+ ```
259
+
260
+ ### Browser Settings
261
+
262
+ ```bash
263
+ agent-browser set viewport <w> <h> [scale] # Set viewport size (scale for retina, e.g. 2)
264
+ agent-browser set device <name> # Emulate device ("iPhone 14")
265
+ agent-browser set geo <lat> <lng> # Set geolocation
266
+ agent-browser set offline [on|off] # Toggle offline mode
267
+ agent-browser set headers <json> # Extra HTTP headers
268
+ agent-browser set credentials <u> <p> # HTTP basic auth
269
+ agent-browser set media [dark|light] # Emulate color scheme
270
+ ```
271
+
272
+ ### Cookies & Storage
273
+
274
+ ```bash
275
+ agent-browser cookies # Get all cookies
276
+ agent-browser cookies set <name> <val> # Set cookie
277
+ agent-browser cookies set --curl <file> # Import cookies from a Copy-as-cURL dump,
278
+ # JSON array, or bare Cookie header (auto-detected)
279
+ agent-browser cookies clear # Clear cookies
280
+
281
+ agent-browser storage local # Get all localStorage
282
+ agent-browser storage local <key> # Get specific key
283
+ agent-browser storage local set <k> <v> # Set value
284
+ agent-browser storage local clear # Clear all
285
+
286
+ agent-browser storage session # Same for sessionStorage
287
+ ```
288
+
289
+ ### Network
290
+
291
+ ```bash
292
+ agent-browser network route <url> # Intercept requests
293
+ agent-browser network route <url> --abort # Block requests
294
+ agent-browser network route <url> --body <json> # Mock response
295
+ agent-browser network route '*' --abort --resource-type script # Block scripts only
296
+ agent-browser network unroute [url] # Remove routes
297
+ agent-browser network requests # View tracked requests
298
+ agent-browser network requests --filter api # Filter requests
299
+ agent-browser network requests --type xhr,fetch # Filter by resource type
300
+ agent-browser network requests --method POST # Filter by HTTP method
301
+ agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
302
+ agent-browser network request <requestId> # View full request/response detail
303
+ agent-browser network har start # Start HAR recording
304
+ agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
305
+ ```
306
+
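The `--status` comment above lists three filter forms: an exact code (`200`), a class (`2xx`), and a range (`400-499`). As a rough illustration of how those forms could be interpreted, and not the CLI's actual parser, here is a hypothetical `status_matches` helper. It assumes three-digit status codes and inclusive range bounds:

```bash
# Hypothetical helper (not part of agent-browser): status_matches <code> <filter>
# Succeeds when a three-digit HTTP status code satisfies the filter.
status_matches() {
  code="$1"
  filter="$2"
  case "$filter" in
    [1-5]xx)
      # Class form (e.g. 2xx): compare the leading digit only.
      [ "${code%??}" = "${filter%xx}" ] ;;
    *-*)
      # Range form (e.g. 400-499): assumed inclusive on both ends.
      [ "$code" -ge "${filter%-*}" ] && [ "$code" -le "${filter#*-}" ] ;;
    *)
      # Exact form (e.g. 200).
      [ "$code" = "$filter" ] ;;
  esac
}

for pair in "204 2xx" "404 400-499" "500 400-499"; do
  # Split "code filter" into positional parameters.
  set -- $pair
  if status_matches "$1" "$2"; then
    echo "$1 matches $2"
  else
    echo "$1 does not match $2"
  fi
done
```

A helper like this is only useful for post-filtering output you already captured; when talking to the CLI directly, passing `--status` is simpler.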
307
+ ### Tabs & Windows
308
+
309
+ ```bash
310
+ agent-browser tab # List tabs (shows `tabId` and optional label)
311
+ agent-browser tab new [url] # New tab (optionally with URL)
312
+ agent-browser tab new --label docs [url] # New tab with a user-assigned label
313
+ agent-browser tab <t<N>|label> # Switch to a tab by id or label
314
+ agent-browser tab close [t<N>|label] # Close a tab (defaults to active)
315
+ agent-browser window new # New window
316
+ ```
317
+
318
+ Tab ids are stable strings of the form `t1`, `t2`, `t3`. They're never reused
319
+ within a session, so scripts and agents can keep referring to the same tab
320
+ even after other tabs are opened or closed. Positional integers like `tab 2`
321
+ are **not** accepted; the `t` prefix disambiguates handles from indices and
322
+ mirrors the `@e1` convention used for element refs.
323
+
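A minimal sketch of the id format described above, for scripts that need to tell a tab id from anything else before calling `tab`. `is_tab_id` is a hypothetical helper, not part of the CLI, and it assumes ids are `t` followed by a positive decimal number with no leading zero (the text shows `t1`, `t2`, `t3` but does not spell out the grammar):

```bash
# Hypothetical helper (not part of agent-browser).
# Succeeds if "$1" looks like a tab id such as t1, t2, t12.
is_tab_id() {
  case "$1" in
    t[1-9]*)
      # Strip the leading "t"; what remains must be digits only.
      rest="${1#t}"
      case "$rest" in
        *[!0-9]*) return 1 ;;
        *) return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}

for handle in t1 t12 2 docs; do
  if is_tab_id "$handle"; then
    echo "$handle: tab id"
  else
    echo "$handle: not a tab id"
  fi
done
```

Bare integers such as `2` are rejected here for the same reason the CLI rejects them: without the `t` prefix a handle is indistinguishable from a positional index.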
324
+ You can also assign a memorable label (`docs`, `app`, `admin`) and use it
325
+ interchangeably with the id. Labels are never auto-generated and never
326
+ rewritten on navigation — they're yours to name and keep:
327
+
328
+ ```bash
329
+ agent-browser tab new --label docs https://docs.example.com
330
+ agent-browser tab docs # switch to the docs tab
331
+ agent-browser snapshot # populate refs for docs
332
+ agent-browser click @e3 # click uses docs's refs
333
+ agent-browser tab close docs # close by label
334
+ ```
335
+
336
+ ### Frames
337
+
338
+ ```bash
339
+ agent-browser frame <sel> # Switch to iframe
340
+ agent-browser frame main # Back to main frame
341
+ ```
342
+
343
+ ### Dialogs
344
+
345
+ ```bash
346
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
347
+ agent-browser dialog dismiss # Dismiss
348
+ agent-browser dialog status # Check if a dialog is currently open
349
+ ```
350
+
351
+ By default, `alert` and `beforeunload` dialogs are automatically accepted so they never block the agent. `confirm` and `prompt` dialogs still require explicit handling. Use `--no-auto-dialog` (or `AGENT_BROWSER_NO_AUTO_DIALOG=1`) to disable automatic handling.
352
+
353
+ When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
354
+
355
+ ### Diff
356
+
357
+ ```bash
358
+ agent-browser diff snapshot # Compare current vs last snapshot
359
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
360
+ agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
361
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
362
+ agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
363
+ agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
364
+ agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
365
+ agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
366
+ agent-browser diff url https://v1.com https://v2.com --wait-until networkidle # Custom wait strategy
367
+ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
368
+ ```
369
+
370
+ ### Debug
371
+
372
+ ```bash
373
+ agent-browser trace start # Start recording trace
374
+ agent-browser trace stop [path] # Stop and save trace
375
+ agent-browser profiler start # Start Chrome DevTools profiling
376
+ agent-browser profiler stop [path] # Stop and save profile (.json)
377
+ agent-browser console # View console messages (log, error, warn, info)
378
+ agent-browser console --json # JSON output with raw CDP args for programmatic access
379
+ agent-browser console --clear # Clear console
380
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
381
+ agent-browser errors --clear # Clear errors
382
+ agent-browser highlight <sel> # Highlight element
383
+ agent-browser inspect # Open Chrome DevTools for the active page
384
+ agent-browser state save <path> # Save auth state
385
+ agent-browser state load <path> # Load auth state
386
+ agent-browser state list # List saved state files
387
+ agent-browser state show <file> # Show state summary
388
+ agent-browser state rename <old> <new> # Rename state file
389
+ agent-browser state clear [name] # Clear states for session
390
+ agent-browser state clear --all # Clear all saved states
391
+ agent-browser state clean --older-than <days> # Delete old states
392
+ ```
393
+
394
+ ### Navigation
395
+
396
+ ```bash
397
+ agent-browser back # Go back
398
+ agent-browser forward # Go forward
399
+ agent-browser reload # Reload page
400
+ agent-browser pushstate <url> # SPA client-side nav; auto-detects window.next.router.push,
401
+ # falls back to history.pushState + popstate
402
+ ```
403
+
404
+ ### Pre-navigation setup
405
+
406
+ Some flows (SSR debug, auth cookies for protected origins, init scripts)
407
+ need state set up *before* the first navigation. Use `open` with no URL
408
+ to launch the browser, then stage cookies / routes / init scripts, then
409
+ navigate. `batch` sends it all in one CLI call:
410
+
411
+ ```bash
412
+ agent-browser batch \
413
+ '["open"]' \
414
+ '["network","route","*","--abort","--resource-type","script"]' \
415
+ '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
416
+ '["navigate","http://localhost:3000/target"]'
417
+ ```
418
+
419
+ Without `batch` the same sequence is four commands that all reuse the
420
+ same daemon (fast, but not one turn).
421
+
422
+ ### React / Web Vitals
423
+
424
+ Agent-browser ships with first-class React introspection and universal Web
425
+ Vitals metrics. The React commands need the React DevTools hook installed at
426
+ launch; Web Vitals and pushstate are framework-agnostic.
427
+
428
+ ```bash
429
+ agent-browser open --enable react-devtools <url> # Launch with React hook installed
430
+ agent-browser react tree # Full component tree
431
+ agent-browser react inspect <fiberId> # props, hooks, state, source
432
+ agent-browser react renders start # Begin fiber render recording
433
+ agent-browser react renders stop [--json] # Stop and print profile (--json for raw data)
434
+ agent-browser react suspense [--only-dynamic] [--json] # Suspense boundaries + classifier
435
+ # --only-dynamic hides the "static" list
436
+ agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + hydration summary
437
+ ```
438
+
439
+ Each `react ...` subcommand requires `--enable react-devtools` to have been
440
+ passed at launch (the React DevTools `installHook.js` is embedded in the
441
+ binary). Without it the commands error with `React DevTools hook not installed
442
+ - relaunch with --enable react-devtools`.
443
+
444
+ Works on any React app — Next.js, Remix, Vite+React, CRA, TanStack Start,
445
+ React Native Web, etc. `vitals` and `pushstate` are framework-agnostic.
446
+ `vitals` prints a summary by default; pass `--json` for the full structured
447
+ payload.
448
+
449
+ ### Init scripts
450
+
451
+ ```bash
452
+ agent-browser open --init-script <path> # Register page init script before first navigation
453
+ # (repeatable; also AGENT_BROWSER_INIT_SCRIPTS env)
454
+ agent-browser addinitscript <js> # Register at runtime (returns identifier)
455
+ agent-browser removeinitscript <identifier> # Remove a previously registered init script
456
+ ```
457
+
458
+ ### Setup
459
+
460
+ ```bash
461
+ agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
462
+ agent-browser install --with-deps # Also install system deps (Linux)
463
+ agent-browser upgrade # Upgrade agent-browser to the latest version
464
+ agent-browser doctor # Diagnose the install and auto-clean stale daemon files
465
+ agent-browser doctor --fix # Also run destructive repairs (reinstall Chrome, purge old state, ...)
466
+ agent-browser doctor --offline --quick # Skip network probes and the live launch test
467
+ ```
468
+
469
+ `doctor` checks your environment, Chrome install, daemon state, config files,
470
+ encryption key, providers, network reachability, and runs a live headless
471
+ browser launch test. Stale socket/pid sidecar files are auto-cleaned. Output
472
+ is also available as `--json` for agents.
473
+
474
+ ### Skills
475
+
476
+ ```bash
477
+ agent-browser skills # List available skills
478
+ agent-browser skills list # Same as above
479
+ agent-browser skills get <name> # Output a skill's full content
480
+ agent-browser skills get <name> --full # Include references and templates
481
+ agent-browser skills get --all # Output every skill
482
+ agent-browser skills path [name] # Print skill directory path
483
+ ```
484
+
485
+ The `skills` command serves bundled skill content that always matches the installed CLI version. AI agents use this to get current instructions rather than relying on cached copies. Set `AGENT_BROWSER_SKILLS_DIR` to override the skills directory path.
486
+
487
+ ## Authentication
488
+
489
+ agent-browser provides multiple ways to persist login sessions so you don't re-authenticate every run.
490
+
491
+ ### Quick summary
492
+
493
+ | Approach | Best for | Flag / Env |
494
+ |----------|----------|------------|
495
+ | **Chrome profile reuse** | Reuse your existing Chrome login state (cookies, sessions) with zero setup | `--profile <name>` / `AGENT_BROWSER_PROFILE` |
496
+ | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
497
+ | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
498
+ | **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
499
+ | **State file** | Load a previously saved state JSON on launch | `--state <path>` / `AGENT_BROWSER_STATE` |
500
+ | **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` |
501
+
502
+ ### Import auth from your browser
503
+
504
+ If you are already logged in to a site in Chrome, you can grab that auth state and reuse it:
505
+
506
+ ```bash
507
+ # 1. Launch Chrome with remote debugging enabled
508
+ # macOS:
509
+ "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
510
+ # Or use --auto-connect to discover an already-running Chrome
511
+
512
+ # 2. Connect and save the authenticated state
513
+ agent-browser --auto-connect state save ./my-auth.json
514
+
515
+ # 3. Use the saved auth in future sessions
516
+ agent-browser --state ./my-auth.json open https://app.example.com/dashboard
517
+
518
+ # 4. Or use --session-name for automatic persistence
519
+ agent-browser --session-name myapp state load ./my-auth.json
520
+ # From now on, --session-name myapp auto-saves/restores this state
521
+ ```
522
+
523
+ > **Security notes:**
524
+ > - `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done.
525
+ > - State files contain session tokens in plaintext. Add them to `.gitignore` and delete when no longer needed. For encryption at rest, set `AGENT_BROWSER_ENCRYPTION_KEY` (see [State Encryption](#state-encryption)).
526
+
527
+ For full details on login flows, OAuth, 2FA, cookie-based auth, and the auth vault, see the [Authentication](docs/src/app/sessions/page.mdx) docs.
528
+
529
+ ## Sessions
530
+
531
+ Run multiple isolated browser instances:
532
+
533
+ ```bash
534
+ # Different sessions
535
+ agent-browser --session agent1 open site-a.com
536
+ agent-browser --session agent2 open site-b.com
537
+
538
+ # Or via environment variable
539
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
540
+
541
+ # List active sessions
542
+ agent-browser session list
543
+ # Output:
544
+ # Active sessions:
545
+ # -> default
546
+ # agent1
547
+
548
+ # Show current session
549
+ agent-browser session
550
+ ```
551
+
552
+ Each session has its own:
553
+
554
+ - Browser instance
555
+ - Cookies and storage
556
+ - Navigation history
557
+ - Authentication state
558
+
559
+ ## Chrome Profile Reuse
560
+
561
+ The fastest way to use your existing login state is to pass a Chrome profile name to `--profile`:
562
+
563
+ ```bash
564
+ # List available Chrome profiles
565
+ agent-browser profiles
566
+
567
+ # Reuse your default Chrome profile's login state
568
+ agent-browser --profile Default open https://gmail.com
569
+
570
+ # Use a named profile (by display name or directory name)
571
+ agent-browser --profile "Work" open https://app.example.com
572
+
573
+ # Or via environment variable
574
+ AGENT_BROWSER_PROFILE=Default agent-browser open https://gmail.com
575
+ ```
576
+
577
+ This copies your Chrome profile to a temp directory (read-only snapshot, no changes to your original profile), so the browser launches with your existing cookies and sessions.
578
+
579
+ > **Note:** On Windows, close Chrome before using `--profile <name>`, as some profile files may be locked while Chrome is running.
580
+
581
+ ## Persistent Profiles
582
+
583
+ For a persistent custom profile directory that stores state across browser restarts, pass a path to `--profile`:
584
+
585
+ ```bash
586
+ # Use a persistent profile directory
587
+ agent-browser --profile ~/.myapp-profile open myapp.com
588
+
589
+ # Login once, then reuse the authenticated session
590
+ agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
591
+
592
+ # Or via environment variable
593
+ AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
594
+ ```
595
+
596
+ The profile directory stores:
597
+
598
+ - Cookies and localStorage
599
+ - IndexedDB data
600
+ - Service workers
601
+ - Browser cache
602
+ - Login sessions
603
+
604
+ **Tip**: Use different profile paths for different projects to keep their browser state isolated.
605
+
606
+ ## Session Persistence
607
+
608
+ Alternatively, use `--session-name` to automatically save and restore cookies and localStorage across browser restarts:
609
+
610
+ ```bash
611
+ # Auto-save/load state for "twitter" session
612
+ agent-browser --session-name twitter open twitter.com
613
+
614
+ # Login once, then state persists automatically
615
+ # State files stored in ~/.agent-browser/sessions/
616
+
617
+ # Or via environment variable
618
+ export AGENT_BROWSER_SESSION_NAME=twitter
619
+ agent-browser open twitter.com
620
+ ```
621
+
622
+ ### State Encryption
623
+
624
+ Encrypt saved session data at rest with AES-256-GCM:
625
+
626
+ ```bash
627
+ # Generate key: openssl rand -hex 32
628
+ export AGENT_BROWSER_ENCRYPTION_KEY=<64-char-hex-key>
629
+
630
+ # State files are now encrypted automatically
631
+ agent-browser --session-name secure open example.com
632
+ ```
633
+
634
+ | Variable | Description |
635
+ | --------------------------------- | -------------------------------------------------- |
636
+ | `AGENT_BROWSER_SESSION_NAME` | Auto-save/load state persistence name |
637
+ | `AGENT_BROWSER_ENCRYPTION_KEY` | 64-char hex key for AES-256-GCM encryption |
638
+ | `AGENT_BROWSER_STATE_EXPIRE_DAYS` | Auto-delete states older than N days (default: 30) |
639
+
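The table above asks for a 64-character hex key. The sketch below checks a key's shape before exporting it. `is_valid_key` is a hypothetical helper that only validates the format (it accepts upper- and lowercase hex, which is an assumption) and says nothing about whether agent-browser will accept the key. The `/dev/urandom` fallback is likewise an assumption for machines without `openssl`:

```bash
# Hypothetical helper (not part of agent-browser).
# Succeeds if "$1" is exactly 64 hexadecimal characters.
is_valid_key() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Generate 32 random bytes as hex: openssl if present, /dev/urandom otherwise.
if command -v openssl >/dev/null 2>&1; then
  key="$(openssl rand -hex 32)"
else
  key="$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
fi

if is_valid_key "$key"; then
  export AGENT_BROWSER_ENCRYPTION_KEY="$key"
  echo "key exported (${#key} hex characters)"
else
  echo "generated key has an unexpected format" >&2
fi
```

Checking the length up front catches the common mistake of pasting a truncated key or one generated with the wrong byte count.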
640
+ ## Security
641
+
642
+ agent-browser includes security features for safe AI agent deployments. All features are opt-in, and existing workflows are unaffected until you explicitly enable a feature:
643
+
644
+ - **Authentication Vault**: Store credentials locally (always encrypted) and reference them by name, so the LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly; the timeout follows the default action timeout). If `AGENT_BROWSER_ENCRYPTION_KEY` is not set, a key is auto-generated at `~/.agent-browser/.encryption-key`. Example: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin`, then `agent-browser auth login github`
645
+ - **Content Boundary Markers**: Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
646
+ - **Domain Allowlist**: Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
647
+ - **Action Policy**: Gate destructive actions with a static policy file: `--action-policy ./policy.json`
648
+ - **Action Confirmation**: Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
649
+ - **Output Length Limits**: Prevent context flooding: `--max-output 50000`
650
+
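The Domain Allowlist bullet above says a wildcard such as `*.example.com` also matches the bare domain. The sketch below illustrates that rule in plain shell. `domain_allowed` is a hypothetical helper that mirrors the description, not the tool's real matcher, and it assumes patterns are either exact hostnames or a single leading `*.` wildcard:

```bash
# Hypothetical helper (not part of agent-browser).
# domain_allowed <host> <comma-separated patterns>
# The body runs in a subshell so the IFS and globbing changes stay local.
domain_allowed() (
  host="$1"
  set -f        # keep "*" in patterns from expanding to file names
  IFS=','
  for pattern in $2; do
    case "$pattern" in
      '*.'*)
        base="${pattern#\*.}"
        # A wildcard matches the bare domain as well as any subdomain.
        [ "$host" = "$base" ] && return 0
        case "$host" in
          *".$base") return 0 ;;
        esac ;;
      *)
        [ "$host" = "$pattern" ] && return 0 ;;
    esac
  done
  return 1
)

allowed="example.com,*.example.com"
for host in example.com cdn.example.com notexample.com evil.test; do
  if domain_allowed "$host" "$allowed"; then
    echo "$host: allowed"
  else
    echo "$host: blocked"
  fi
done
```

The subdomain check requires a literal dot before the base domain, so `notexample.com` is not treated as a subdomain of `example.com`.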
651
+ | Variable | Description |
652
+ | ----------------------------------- | ---------------------------------------- |
653
+ | `AGENT_BROWSER_CONTENT_BOUNDARIES` | Wrap page output in boundary markers |
654
+ | `AGENT_BROWSER_MAX_OUTPUT` | Max characters for page output |
655
+ | `AGENT_BROWSER_ALLOWED_DOMAINS` | Comma-separated allowed domain patterns |
656
+ | `AGENT_BROWSER_ACTION_POLICY` | Path to action policy JSON file |
657
+ | `AGENT_BROWSER_CONFIRM_ACTIONS` | Action categories requiring confirmation |
658
+ | `AGENT_BROWSER_CONFIRM_INTERACTIVE` | Enable interactive confirmation prompts |
659
+
660
+ See [Security documentation](https://agent-browser.dev/security) for details.
661
+
662
+ ## Snapshot Options
663
+
664
+ The `snapshot` command supports filtering to reduce output size:
665
+
666
+ ```bash
667
+ agent-browser snapshot # Full accessibility tree
668
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
669
+ agent-browser snapshot -i --urls # Interactive elements with link URLs
670
+ agent-browser snapshot -c # Compact (remove empty structural elements)
671
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
672
+ agent-browser snapshot -s "#main" # Scope to CSS selector
673
+ agent-browser snapshot -i -c -d 5 # Combine options
674
+ ```
675
+
676
+ | Option | Description |
677
+ | ---------------------- | ----------------------------------------------------------------------- |
678
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
679
+ | `-u, --urls` | Include href URLs for link elements |
680
+ | `-c, --compact` | Remove empty structural elements |
681
+ | `-d, --depth <n>` | Limit tree depth |
682
+ | `-s, --selector <sel>` | Scope to CSS selector |
683
+
684
+ ## Annotated Screenshots
685
+
686
+ The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
687
+
688
+ Annotated screenshots are supported on the CDP-backed browser path (Chrome/Lightpanda). The Safari/WebDriver backend does not yet support `--annotate`.
689
+
690
+ ```bash
691
+ agent-browser screenshot --annotate
692
+ # -> Screenshot saved to /tmp/screenshot-2026-02-17T12-00-00-abc123.png
693
+ # [1] @e1 button "Submit"
694
+ # [2] @e2 link "Home"
695
+ # [3] @e3 textbox "Email"
696
+ ```
697
+
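If a script needs the label-to-ref mapping, the legend can be parsed line by line. The sketch below assumes the legend lines look exactly like the example output above (`[N] @eN role "name"`); `parseAnnotateLegend` is a hypothetical helper, and the real output may carry extra fields.

```typescript
interface AnnotatedLabel {
  label: number; // the [N] drawn on the screenshot
  ref: string; // the matching ref, e.g. "@e2"
  role: string;
  name: string;
}

// Hypothetical parser for the legend printed after `screenshot --annotate`.
// Lines that do not look like `[N] @eN role "name"` are skipped.
function parseAnnotateLegend(output: string): AnnotatedLabel[] {
  const labels: AnnotatedLabel[] = [];
  for (const line of output.split("\n")) {
    const m = /^\s*\[(\d+)\]\s+(@e\d+)\s+(\S+)\s+"(.*)"\s*$/.exec(line);
    if (m) {
      labels.push({ label: Number(m[1]), ref: m[2], role: m[3], name: m[4] });
    }
  }
  return labels;
}

const legend = parseAnnotateLegend(
  '[1] @e1 button "Submit"\n[2] @e2 link "Home"\n[3] @e3 textbox "Email"'
);
console.log(legend[1].ref); // "@e2"
```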
698
+ After an annotated screenshot, refs are cached so you can immediately interact with elements:
699
+
700
+ ```bash
701
+ agent-browser screenshot --annotate ./page.png
702
+ agent-browser click @e2 # Click the "Home" link labeled [2]
703
+ ```
704
+
705
+ This is useful for multimodal AI models that can reason about visual layout, unlabeled icon buttons, canvas elements, or visual state that the text accessibility tree cannot capture.
706
+
707
+ ## Options
708
+
709
+ | Option | Description |
710
+ |--------|-------------|
711
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
712
+ | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
713
+ | `--profile <name\|path>` | Chrome profile name or persistent directory path (or `AGENT_BROWSER_PROFILE` env) |
714
+ | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
715
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
716
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
717
+ | `--extension <path>` | Load browser extension (repeatable; or `AGENT_BROWSER_EXTENSIONS` env) |
718
+ | `--init-script <path>` | Register a page init script before the first navigation (repeatable; or `AGENT_BROWSER_INIT_SCRIPTS` env) |
719
+ | `--enable <feature>` | Built-in init scripts: `react-devtools` (repeatable or comma-list; or `AGENT_BROWSER_ENABLE` env) |
720
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
721
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
722
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
723
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
724
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
725
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
726
+ | `--hide-scrollbars <bool>` | Hide native scrollbars in headless Chromium screenshots, enabled by default (or `AGENT_BROWSER_HIDE_SCROLLBARS` env) |
727
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
728
+ | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
729
+ | `--json` | JSON output (for agents) |
730
+ | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
731
+ | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
732
+ | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
733
+ | `--screenshot-format <fmt>` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) |
734
+ | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) |
735
+ | `--cdp <port\|url>` | Connect via Chrome DevTools Protocol (port or WebSocket URL) |
736
+ | `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) |
737
+ | `--color-scheme <scheme>` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) |
738
+ | `--download-path <path>` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) |
739
+ | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
740
+ | `--max-output <chars>` | Truncate page output to N characters (or `AGENT_BROWSER_MAX_OUTPUT` env) |
741
+ | `--allowed-domains <list>` | Comma-separated allowed domain patterns (or `AGENT_BROWSER_ALLOWED_DOMAINS` env) |
742
+ | `--action-policy <path>` | Path to action policy JSON file (or `AGENT_BROWSER_ACTION_POLICY` env) |
743
+ | `--confirm-actions <list>` | Action categories requiring confirmation (or `AGENT_BROWSER_CONFIRM_ACTIONS` env) |
744
+ | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
745
+ | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
746
+ | `--backend <name>` | Local Chrome backend: `chrome` (default), `patchright` (or `AGENT_BROWSER_BACKEND` env) |
747
+ | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
748
+ | `--model <name>` | AI model for chat command (or `AI_GATEWAY_MODEL` env) |
749
+ | `-v`, `--verbose` | Show tool commands and their raw output (chat) |
750
+ | `-q`, `--quiet` | Show only AI text responses, hide tool calls (chat) |
751
+ | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
752
+ | `--debug` | Debug output |
753
+
754
+ ## Patchright backend
755
+
756
+ `agent-browser-priv` keeps the normal Chrome CDP backend by default. For local
757
+ development environments that need Patchright-managed Chromium artifacts or a
758
+ Patchright-launched persistent browser, install the backend once:
759
+
760
+ ```bash
761
+ agent-browser-priv install patchright
762
+ ```
763
+
764
+ Then opt in per session:
765
+
766
+ ```bash
767
+ agent-browser-priv --backend patchright --headed open https://example.com
768
+ agent-browser-priv --backend patchright --profile ~/.agent-browser-priv/profiles/dev open https://example.com
769
+ ```
770
+
771
+ Patchright is used only to launch the local Chrome-compatible browser and expose
772
+ CDP on localhost. The agent-browser command surface remains unchanged. This
773
+ backend does not solve CAPTCHA, Turnstile, or other human verification pages;
774
+ preserve those pages for human handoff.
775
+
776
+ ## Observability Dashboard
777
+
778
+ Monitor agent-browser sessions in real time with a local web dashboard showing a live viewport and command activity feed.
779
+
780
+ ```bash
781
+ # Start the dashboard server (runs in background on port 4848)
782
+ agent-browser dashboard start
783
+ agent-browser dashboard start --port 8080 # Custom port
784
+
785
+ # All sessions are automatically visible in the dashboard
786
+ agent-browser open example.com
787
+
788
+ # Stop the dashboard
789
+ agent-browser dashboard stop
790
+ ```
791
+
792
+ The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running, and it works from `http://localhost:4848` or a proxied/forwarded URL that reaches the dashboard server, such as `https://dashboard.agent-browser.localhost` or a Coder workspace URL. The browser stays on the dashboard origin; session-specific tabs, status, and stream traffic are proxied internally, so session ports do not need to be exposed.
793
+
794
+ The dashboard displays:
795
+ - **Live viewport**: real-time JPEG frames from the browser
796
+ - **Activity feed**: chronological command/result stream with timing and expandable details
797
+ - **Console output**: browser console messages (log, warn, error)
798
+ - **Session creation**: create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
799
+ - **AI Chat**: chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
800
+
801
+ ### AI Chat
802
+
803
+ The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway. The same functionality is available directly from the CLI via the `chat` command. Set these environment variables to enable AI chat:
804
+
805
+ ```bash
806
+ export AI_GATEWAY_API_KEY=gw_your_key_here
807
+ export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6 # optional, this is the default
808
+ export AI_GATEWAY_URL=https://ai-gateway.vercel.sh # optional, this is the default
809
+ ```
810
+
811
+ **CLI usage:**
812
+
813
+ ```bash
814
+ agent-browser chat "open google.com and search for cats" # Single-shot
815
+ agent-browser chat # Interactive REPL
816
+ agent-browser -q chat "summarize this page" # Quiet mode (text only)
817
+ agent-browser -v chat "fill in the login form" # Verbose (show command output)
818
+ agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model
819
+ ```
820
+
821
+ The `chat` command translates natural language instructions into agent-browser commands, executes them, and streams the AI response. In interactive mode, type `quit` to exit. Use `--json` for structured output suitable for agent consumption.
822
+
823
+ **Dashboard usage:**
824
+
825
+ The Chat tab is always visible in the dashboard. When `AI_GATEWAY_API_KEY` is set, the Rust server proxies requests to the gateway and streams responses back using the Vercel AI SDK's UI Message Stream protocol. Without the key, sending a message shows an error inline.
826
+
827
+ ## Configuration
828
+
829
+ Create an `agent-browser.json` file to set persistent defaults instead of repeating flags on every command.
830
+
831
+ **Configuration sources (lowest to highest priority):**
832
+
833
+ 1. `~/.agent-browser/config.json`: user-level defaults
834
+ 2. `./agent-browser.json`: project-level overrides (in working directory)
835
+ 3. `AGENT_BROWSER_*` environment variables override config file values
836
+ 4. CLI flags override everything
837
+
838
+ **Example `agent-browser.json`:**
839
+
840
+ ```json
841
+ {
842
+ "headed": true,
843
+ "proxy": "http://localhost:8080",
844
+ "profile": "./browser-data",
845
+ "userAgent": "my-agent/1.0",
846
+ "hideScrollbars": false,
847
+ "ignoreHttpsErrors": true
848
+ }
849
+ ```
850
+
851
+ Use `--config <path>` or `AGENT_BROWSER_CONFIG` to load a specific config file instead of the defaults:
852
+
853
+ ```bash
854
+ agent-browser --config ./ci-config.json open example.com
855
+ AGENT_BROWSER_CONFIG=./ci-config.json agent-browser open example.com
856
+ ```
857
+
858
+ All options from the table above can be set in the config file using camelCase keys (e.g., `--executable-path` becomes `"executablePath"`, `--proxy-bypass` becomes `"proxyBypass"`). Unknown keys are ignored for forward compatibility.
859
+
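The flag-to-key mapping is a plain kebab-case to camelCase conversion. A minimal sketch, where `flagToConfigKey` is a hypothetical helper and not part of the CLI:

```typescript
// Hypothetical helper: convert a long CLI flag to its config-file key.
function flagToConfigKey(flag: string): string {
  return flag
    .replace(/^--/, "") // drop the leading dashes
    .replace(/-([a-z0-9])/g, (_match: string, ch: string) => ch.toUpperCase());
}

console.log(flagToConfigKey("--executable-path")); // "executablePath"
console.log(flagToConfigKey("--ignore-https-errors")); // "ignoreHttpsErrors"
```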
860
+ A [JSON Schema](agent-browser.schema.json) is available for IDE autocomplete and validation. Add a `$schema` key to your config file to enable it:
861
+
862
+ ```json
863
+ {
864
+ "$schema": "https://agent-browser.dev/schema.json",
865
+ "headed": true
866
+ }
867
+ ```
868
+
869
+ Boolean flags accept an optional `true`/`false` value to override config settings. For example, `--headed false` disables `"headed": true` from config. A bare `--headed` is equivalent to `--headed true`.
870
+
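The optional-value rule for boolean flags can be illustrated with a small parser. This is a sketch of the behavior described above, not the CLI's own argument parser; `parseBoolFlag` is a hypothetical name.

```typescript
// Hypothetical helper: read a boolean flag at argv[index].
// Returns the flag value and how many argv tokens were consumed.
function parseBoolFlag(
  argv: string[],
  index: number
): { value: boolean; consumed: number } {
  const next = argv[index + 1];
  if (next === "true" || next === "false") {
    // Explicit value: `--headed false` overrides `"headed": true` from config.
    return { value: next === "true", consumed: 2 };
  }
  // A bare flag is equivalent to `--headed true`.
  return { value: true, consumed: 1 };
}

console.log(parseBoolFlag(["--headed", "false", "open"], 0).value); // false
console.log(parseBoolFlag(["--headed", "open"], 0).value); // true
```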
871
+ Auto-discovered config files that are missing are silently ignored. If `--config <path>` points to a missing or invalid file, agent-browser exits with an error. Extensions from user and project configs are merged (concatenated), not replaced.
872
+
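The precedence rules in this section can be summarized in code. The sketch below models only two scalar options plus extensions; `resolveSettings` and the interface names are hypothetical, and how extensions passed via environment variables or flags combine with config files is not modeled because it is not described here.

```typescript
interface Overrides {
  headed?: boolean;
  proxy?: string;
}

interface ConfigFile extends Overrides {
  extensions?: string[];
}

// Hypothetical resolver: later sources win
// (user config < project config < environment variables < CLI flags).
// Callers should omit unset keys rather than pass `undefined`.
function resolveSettings(
  user: ConfigFile,
  project: ConfigFile,
  env: Overrides,
  cli: Overrides
): ConfigFile {
  const merged: ConfigFile = { ...user, ...project, ...env, ...cli };
  // Extensions from user and project configs are concatenated, not replaced.
  merged.extensions = (user.extensions || []).concat(project.extensions || []);
  return merged;
}

const resolved = resolveSettings(
  { headed: true, proxy: "http://localhost:8080", extensions: ["./ext-a"] },
  { proxy: "http://localhost:9090", extensions: ["./ext-b"] },
  {},
  { headed: false }
);
console.log(resolved);
```

In this example the CLI flag turns `headed` off, the project-level proxy wins over the user-level one, and both extension paths are kept.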
873
+ > **Tip:** If your project-level `agent-browser.json` contains environment-specific values (paths, proxies), consider adding it to `.gitignore`.
874
+
875
+ ## Default Timeout
876
+
877
+ The default timeout for standard operations (clicks, waits, fills, etc.) is 25 seconds. This is intentionally below the CLI's 30-second IPC read timeout so that the daemon returns a proper error instead of the CLI timing out with EAGAIN.
878
+
879
+ Override the default timeout via environment variable:
880
+
881
+ ```bash
882
+ # Set a longer timeout for slow pages (in milliseconds)
883
+ export AGENT_BROWSER_DEFAULT_TIMEOUT=45000
884
+ ```
885
+
886
+ > **Note:** Setting this above 30000 (30s) may cause EAGAIN errors on slow operations because the CLI's read timeout will expire before the daemon responds. The CLI retries transient errors automatically, but response times will increase.
887
+
888
+ | Variable | Description |
889
+ | ------------------------------- | ---------------------------------------- |
890
+ | `AGENT_BROWSER_DEFAULT_TIMEOUT` | Default operation timeout in ms (default: 25000) |
891
+
892
+ ## Selectors
893
+
894
+ ### Refs (Recommended for AI)
895
+
896
+ Refs provide deterministic element selection from snapshots:
897
+
898
+ ```bash
899
+ # 1. Get snapshot with refs
900
+ agent-browser snapshot
901
+ # Output:
902
+ # - heading "Example Domain" [ref=e1] [level=1]
903
+ # - button "Submit" [ref=e2]
904
+ # - textbox "Email" [ref=e3]
905
+ # - link "Learn more" [ref=e4]
906
+
907
+ # 2. Use refs to interact
908
+ agent-browser click @e2 # Click the button
909
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
910
+ agent-browser get text @e1 # Get heading text
911
+ agent-browser hover @e4 # Hover the link
912
+ ```
913
+
914
+ When a ref click is blocked by an overlay, the error includes the covering
915
+ element, such as `covered by <div#consent-banner>`. Click the banner or dialog
916
+ control first, then run `snapshot` again before reusing refs.
917
+
918
+ **Why use refs?**
919
+
920
+ - **Deterministic**: A ref points to the exact element captured in the snapshot
921
+ - **Fast**: No DOM re-query needed
922
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
923
+
924
+ ### CSS Selectors
925
+
926
+ ```bash
927
+ agent-browser click "#id"
928
+ agent-browser click ".class"
929
+ agent-browser click "div > button"
930
+ ```
931
+
932
+ ### Text & XPath
933
+
934
+ ```bash
935
+ agent-browser click "text=Submit"
936
+ agent-browser click "xpath=//button"
937
+ ```
938
+
939
+ ### Semantic Locators
940
+
941
+ ```bash
942
+ agent-browser find role button click --name "Submit"
943
+ agent-browser find label "Email" fill "test@test.com"
944
+ ```
945
+
946
+ ## Agent Mode
947
+
948
+ Use `--json` for machine-readable output:
949
+
950
+ ```bash
951
+ agent-browser snapshot --json
952
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
953
+
954
+ agent-browser get text @e1 --json
955
+ agent-browser is visible @e2 --json
956
+ ```
957
+
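An agent harness typically parses this JSON and looks up a ref by role and name. The sketch below assumes the response shape shown in the `snapshot --json` example above (`success`, `data.snapshot`, `data.refs`); `findRef` is a hypothetical helper and real responses may include more fields.

```typescript
interface RefInfo {
  role: string;
  name?: string;
}

interface SnapshotResponse {
  success: boolean;
  data?: { snapshot: string; refs: { [id: string]: RefInfo } };
}

// Hypothetical helper: return "@eN" for the first ref with the given role and
// name, or null when the command failed or nothing matches.
function findRef(rawJson: string, role: string, name: string): string | null {
  const res = JSON.parse(rawJson) as SnapshotResponse;
  if (!res.success || !res.data) return null;
  const refs = res.data.refs;
  for (const id of Object.keys(refs)) {
    if (refs[id].role === role && refs[id].name === name) return "@" + id;
  }
  return null;
}

const sampleSnapshot =
  '{"success":true,"data":{"snapshot":"...","refs":{' +
  '"e1":{"role":"heading","name":"Title"},' +
  '"e2":{"role":"button","name":"Submit"}}}}';
console.log(findRef(sampleSnapshot, "button", "Submit")); // "@e2"
```

The returned string can be passed straight to commands such as `agent-browser click`.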
958
+ ### Optimal AI Workflow
959
+
960
+ ```bash
961
+ # 1. Navigate and get snapshot
962
+ agent-browser open example.com
963
+ agent-browser snapshot -i --json # AI parses tree and refs
964
+
965
+ # 2. AI identifies target refs from snapshot
966
+ # 3. Execute actions using refs
967
+ agent-browser click @e2
968
+ agent-browser fill @e3 "input text"
969
+
970
+ # 4. Get new snapshot if page changed
971
+ agent-browser snapshot -i --json
972
+ ```
973
+
974
+ ### Command Chaining
975
+
976
+ Commands can be chained with `&&` in a single shell invocation. The browser persists via a background daemon, so chaining is safe and more efficient:
977
+
978
+ ```bash
979
+ # Open, wait for load, and snapshot in one call
980
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
981
+
982
+ # Chain multiple interactions
983
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "pass" && agent-browser click @e3
984
+
985
+ # Navigate and screenshot
986
+ agent-browser open example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
987
+ ```
988
+
989
+ Use `&&` when you don't need intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs before interacting).
990
+
991
+ ## Headed Mode
992
+
993
+ Show the browser window for debugging:
994
+
995
+ ```bash
996
+ agent-browser open example.com --headed
997
+ ```
998
+
999
+ This opens a visible browser window instead of running headless.
1000
+
1001
+ > **Note:** Browser extensions work in both headed and headless mode (Chrome's `--headless=new`).
1002
+
1003
+ ## Authenticated Sessions
1004
+
1005
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
1006
+
1007
+ ```bash
1008
+ # Headers are scoped to api.example.com only
1009
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
1010
+
1011
+ # Requests to api.example.com include the auth header
1012
+ agent-browser snapshot -i --json
1013
+ agent-browser click @e2
1014
+
1015
+ # Navigate to another domain - headers are NOT sent (safe!)
1016
+ agent-browser open other-site.com
1017
+ ```
1018
+
1019
+ This is useful for:
1020
+
1021
+ - **Skipping login flows** - Authenticate via headers instead of UI
1022
+ - **Switching users** - Start new sessions with different auth tokens
1023
+ - **API testing** - Access protected endpoints directly
1024
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
1025
+
1026
+ To set headers for multiple origins, use `--headers` with each `open` command:
1027
+
1028
+ ```bash
1029
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
1030
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
1031
+ ```
1032
+
1033
+ For global headers (all domains), use `set headers`:
1034
+
1035
+ ```bash
1036
+ agent-browser set headers '{"X-Custom-Header": "value"}'
1037
+ ```
1038
+
1039
+ ## Custom Browser Executable
1040
+
1041
+ Use a custom browser executable instead of the bundled Chromium. This is useful for:
1042
+
1043
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
1044
+ - **System browsers**: Use an existing Chrome/Chromium installation
1045
+ - **Custom builds**: Use modified browser builds
1046
+
1047
+ ### CLI Usage
1048
+
1049
+ ```bash
1050
+ # Via flag
1051
+ agent-browser --executable-path /path/to/chromium open example.com
1052
+
1053
+ # Via environment variable
1054
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
1055
+ ```
1056
+
1057
+ ### Serverless (Vercel)
1058
+
1059
+ Run agent-browser + Chrome in an ephemeral Vercel Sandbox microVM. No external server needed:
1060
+
1061
+ ```typescript
1062
+ import { Sandbox } from "@vercel/sandbox";
1063
+
1064
+ const sandbox = await Sandbox.create({ runtime: "node24" });
1065
+ await sandbox.runCommand("agent-browser", ["open", "https://example.com"]);
1066
+ const result = await sandbox.runCommand("agent-browser", ["screenshot", "--json"]);
1067
+ await sandbox.stop();
1068
+ ```
1069
+
1070
+ See the [environments example](examples/environments/) for a working demo with a UI and deploy-to-Vercel button.
1071
+
1072
+ ### Serverless (AWS Lambda)
1073
+
1074
+ ```typescript
1075
+ import chromium from '@sparticuz/chromium';
1076
+ import { execSync } from 'child_process';
1077
+
1078
+ export async function handler() {
1079
+ const executablePath = await chromium.executablePath();
1080
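+ const env = { ...process.env, AGENT_BROWSER_EXECUTABLE_PATH: executablePath };
1081
+ execSync('agent-browser open https://example.com', { env, stdio: 'ignore' });
1082
+ // Parse only the snapshot output; chaining with `open` would mix non-JSON text into the result
1083
+ const result = execSync('agent-browser snapshot -i --json', { env, encoding: 'utf-8' });
1084
+ return JSON.parse(result);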
1085
+ }
1086
+ ```
1087
+
1088
+ ## Local Files
1089
+
1090
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
1091
+
1092
+ ```bash
1093
+ # Enable file access (required for JavaScript to access local files)
1094
+ agent-browser --allow-file-access open file:///path/to/document.pdf
1095
+ agent-browser --allow-file-access open file:///path/to/page.html
1096
+
1097
+ # Take screenshot of a local PDF
1098
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
1099
+ agent-browser screenshot report.png
1100
+ ```
1101
+
1102
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
1103
+
1104
+ - Load and render local files
1105
+ - Access other local files via JavaScript (XHR, fetch)
1106
+ - Load local resources (images, scripts, stylesheets)
1107
+
1108
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
1109
+
1110
+ ## CDP Mode
1111
+
1112
+ Connect to an existing browser via Chrome DevTools Protocol:
1113
+
1114
+ ```bash
1115
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
1116
+
1117
+ # Connect once, then run commands without --cdp
1118
+ agent-browser connect 9222
1119
+ agent-browser snapshot
1120
+ agent-browser tab
1121
+ agent-browser close
1122
+
1123
+ # Or pass --cdp on each command
1124
+ agent-browser --cdp 9222 snapshot
1125
+
1126
+ # Connect to remote browser via WebSocket URL
1127
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
1128
+ ```
1129
+
1130
+ The `--cdp` flag accepts either:
1131
+
1132
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
1133
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
1134
+
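The two accepted forms can be told apart with a simple check. This is an illustrative sketch only; `resolveCdpTarget` is a hypothetical helper, and rejecting every other input is an assumption rather than documented CLI behavior.

```typescript
type CdpTarget =
  | { kind: "port"; url: string }
  | { kind: "websocket"; url: string };

// Hypothetical helper: classify a --cdp value as a local port or WebSocket URL.
function resolveCdpTarget(value: string): CdpTarget {
  if (/^\d+$/.test(value)) {
    // Bare port: local connection via http://localhost:{port}
    return { kind: "port", url: "http://localhost:" + value };
  }
  if (/^wss?:\/\//.test(value)) {
    // Full WebSocket URL: used as-is for remote browser services
    return { kind: "websocket", url: value };
  }
  throw new Error("expected a port number or a ws:// or wss:// URL, got: " + value);
}

console.log(resolveCdpTarget("9222").url); // "http://localhost:9222"
```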
1135
+ This enables control of:
1136
+
1137
+ - Electron apps
1138
+ - Chrome/Chromium instances with remote debugging
1139
+ - WebView2 applications
1140
+ - Any browser exposing a CDP endpoint
1141
+
1142
+ ### Auto-Connect
1143
+
1144
+ Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port:
1145
+
1146
+ ```bash
1147
+ # Auto-discover running Chrome with remote debugging
1148
+ agent-browser --auto-connect open example.com
1149
+ agent-browser --auto-connect snapshot
1150
+
1151
+ # Or via environment variable
1152
+ AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot
1153
+ ```
1154
+
1155
+ Auto-connect discovers Chrome by:
1156
+
1157
+ 1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
1158
+ 2. Falling back to probing common debugging ports (9222, 9229)
1159
+ 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
1160
+
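Step 1 relies on the `DevToolsActivePort` file. As a rough sketch, and assuming the usual Chrome layout where the first line holds the port and the second line holds the browser WebSocket path, the file could be parsed like this. `parseDevToolsActivePort` is a hypothetical helper, not agent-browser's discovery code.

```typescript
interface ActivePort {
  port: number;
  wsUrl: string | null; // null when the file has no WebSocket path line
}

// Hypothetical parser for the contents of Chrome's DevToolsActivePort file.
// Assumed layout: line 1 = port, line 2 = path such as /devtools/browser/<id>.
function parseDevToolsActivePort(contents: string): ActivePort | null {
  const lines = contents
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines.length === 0) return null;
  const port = Number(lines[0]);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) return null;
  const wsUrl = lines.length > 1 ? "ws://127.0.0.1:" + port + lines[1] : null;
  return { port: port, wsUrl: wsUrl };
}

console.log(parseDevToolsActivePort("9222\n/devtools/browser/abc-123\n"));
```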
1161
+ This is useful when:
1162
+
1163
+ - Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port)
1164
+ - You want a zero-configuration connection to your existing browser
1165
+ - You don't want to track which port Chrome is using
1166
+
1167
+ ## Streaming (Browser Preview)
1168
+
1169
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
1170
+
1171
+ ### Streaming
1172
+
1173
+ Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `stream status` to see the bound port and connection state:
1174
+
1175
+ ```bash
1176
+ agent-browser stream status
1177
+ ```
1178
+
1179
+ To bind to a specific port, set `AGENT_BROWSER_STREAM_PORT`:
1180
+
1181
+ ```bash
1182
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
1183
+ ```
1184
+
1185
+ You can also manage streaming at runtime with `stream enable`, `stream disable`, and `stream status`:
1186
+
1187
+ ```bash
1188
+ agent-browser stream enable --port 9223 # Re-enable on a specific port
1189
+ agent-browser stream disable # Stop streaming for the session
1190
+ ```
1191
+
1192
+ The WebSocket server streams the browser viewport and accepts input events.
1193
+
1194
+ ### WebSocket Protocol
1195
+
1196
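+ Connect to the session's stream port (for example `ws://localhost:9223` when `AGENT_BROWSER_STREAM_PORT=9223`; run `agent-browser stream status` to find an OS-assigned port) to receive frames and send input: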
1197
+
1198
+ **Receive frames:**
1199
+
1200
+ ```json
1201
+ {
1202
+ "type": "frame",
1203
+ "data": "<base64-encoded-jpeg>",
1204
+ "metadata": {
1205
+ "deviceWidth": 1280,
1206
+ "deviceHeight": 720,
1207
+ "pageScaleFactor": 1,
1208
+ "offsetTop": 0,
1209
+ "scrollOffsetX": 0,
1210
+ "scrollOffsetY": 0
1211
+ }
1212
+ }
1213
+ ```
1214
+
1215
+ **Send mouse events:**
1216
+
1217
+ ```json
1218
+ {
1219
+ "type": "input_mouse",
1220
+ "eventType": "mousePressed",
1221
+ "x": 100,
1222
+ "y": 200,
1223
+ "button": "left",
1224
+ "clickCount": 1
1225
+ }
1226
+ ```
1227
+
1228
+ **Send keyboard events:**
1229
+
1230
+ ```json
1231
+ {
1232
+ "type": "input_keyboard",
1233
+ "eventType": "keyDown",
1234
+ "key": "Enter",
1235
+ "code": "Enter"
1236
+ }
1237
+ ```
1238
+
1239
+ **Send touch events:**
1240
+
1241
+ ```json
1242
+ {
1243
+ "type": "input_touch",
1244
+ "eventType": "touchStart",
1245
+ "touchPoints": [{ "x": 100, "y": 200 }]
1246
+ }
1247
+ ```
1248
+
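A preview client can type these messages and translate clicks on a scaled preview into an `input_mouse` message. The sketch below covers only message handling, without opening a socket. It assumes `x` and `y` are expressed in the `deviceWidth` by `deviceHeight` coordinate space from the frame metadata; `parseFrame` and `mousePressedMessage` are hypothetical helpers.

```typescript
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Hypothetical helper: return the frame when `raw` is a frame message, else null.
function parseFrame(raw: string): FrameMessage | null {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  if (msg && msg.type === "frame" && typeof msg.data === "string" && msg.metadata) {
    return msg as FrameMessage;
  }
  return null;
}

// Hypothetical helper: map a click on a scaled preview to viewport coordinates
// (assumed to match deviceWidth x deviceHeight) and build the message to send.
function mousePressedMessage(
  frame: FrameMessage,
  previewX: number,
  previewY: number,
  previewWidth: number,
  previewHeight: number
): string {
  const x = Math.round((previewX / previewWidth) * frame.metadata.deviceWidth);
  const y = Math.round((previewY / previewHeight) * frame.metadata.deviceHeight);
  return JSON.stringify({
    type: "input_mouse",
    eventType: "mousePressed",
    x: x,
    y: y,
    button: "left",
    clickCount: 1,
  });
}
```

A client would call `parseFrame` on each incoming WebSocket message, draw `data` as a JPEG, and send the string returned by `mousePressedMessage` back over the same socket.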
1249
+ ## Architecture
1250
+
1251
+ agent-browser uses a client-daemon architecture:
1252
+
1253
+ 1. **Rust CLI** - Parses commands, communicates with daemon
1254
+ 2. **Rust Daemon** - Pure Rust daemon using direct CDP, no Node.js required
1255
+
1256
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations. To auto-shutdown the daemon after a period of inactivity, set `AGENT_BROWSER_IDLE_TIMEOUT_MS` (value in milliseconds). When set, the daemon closes the browser and exits after receiving no commands for the specified duration.
1257
+
1258
+ **Browser Engine:** Uses Chrome (from Chrome for Testing) by default. The `--engine` flag selects between `chrome` and `lightpanda`. Supported browsers: Chromium/Chrome (via CDP) and Safari (via WebDriver for iOS).
1259
+
1260
+ ## Platforms
1261
+
1262
+ | Platform | Binary |
1263
+ | ----------- | ----------- |
1264
+ | macOS ARM64 | Native Rust |
1265
+ | Linux ARM64 | Native Rust |
1266
+ | Linux x64 | Native Rust |
1267
+ | Windows x64 | Native Rust |
1268
+
1269
+ ## Usage with AI Agents
1270
+
1271
+ ### Just ask the agent
1272
+
1273
+ The simplest approach is to tell your agent to use it:
1274
+
1275
+ ```
1276
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
1277
+ ```
1278
+
1279
+ The `--help` output is comprehensive, and most agents can figure it out from there.
1280
+
1281
+ ### AI Coding Assistants (recommended)
1282
+
1283
+ Add the skill to your AI coding assistant for richer context:
1284
+
1285
+ ```bash
1286
+ npx skills add vercel-labs/agent-browser
1287
+ ```
1288
+
1289
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically. Do not copy `SKILL.md` from `node_modules` as it will become stale.
1290
+
1291
+ ### Claude Code
1292
+
1293
+ Install as a Claude Code skill:
1294
+
1295
+ ```bash
1296
+ npx skills add vercel-labs/agent-browser
1297
+ ```
1298
+
1299
+ This adds a thin discovery stub at `.claude/skills/agent-browser/SKILL.md`. The stub is intentionally minimal: it points Claude Code at `agent-browser skills get core` to load the actual workflow content at runtime. This way the instructions always match the installed CLI version instead of going stale between releases.
1300
+
1301
+ ### AGENTS.md / CLAUDE.md
1302
+
1303
+ For more consistent results, add to your project or global instructions file:
1304
+
1305
+ ```markdown
1306
+ ## Browser Automation
1307
+
1308
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
1309
+
1310
+ Core workflow:
1311
+
1312
+ 1. `agent-browser open <url>` - Navigate to page
1313
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
1314
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
1315
+ 4. Re-snapshot after page changes
1316
+ ```
1317
+
1318
+ ## Integrations
1319
+
1320
+ ### iOS Simulator
1321
+
1322
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
1323
+
1324
+ **Setup:**
1325
+
1326
+ ```bash
1327
+ # Install Appium and XCUITest driver
1328
+ npm install -g appium
1329
+ appium driver install xcuitest
1330
+ ```
1331
+
1332
+ **Usage:**
1333
+
1334
+ ```bash
1335
+ # List available iOS simulators
1336
+ agent-browser device list
1337
+
1338
+ # Launch Safari on a specific device
1339
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
1340
+
1341
+ # Same commands as desktop
1342
+ agent-browser -p ios snapshot -i
1343
+ agent-browser -p ios tap @e1
1344
+ agent-browser -p ios fill @e2 "text"
1345
+ agent-browser -p ios screenshot mobile.png
1346
+
1347
+ # Mobile-specific commands
1348
+ agent-browser -p ios swipe up
1349
+ agent-browser -p ios swipe down 500
1350
+
1351
+ # Close session
1352
+ agent-browser -p ios close
1353
+ ```
1354
+
1355
+ Or use environment variables:
1356
+
1357
+ ```bash
1358
+ export AGENT_BROWSER_PROVIDER=ios
1359
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
1360
+ agent-browser open https://example.com
1361
+ ```
1362
+
1363
+ | Variable | Description |
1364
+ | -------------------------- | ----------------------------------------------- |
1365
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
1366
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
1367
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
1368
+
1369
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
1370
+
1371
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
1372
+
1373
+ #### Real Device Support
1374
+
1375
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
1376
+
1377
+ **1. Get your device UDID:**
1378
+
1379
+ ```bash
1380
+ xcrun xctrace list devices
1381
+ # or
1382
+ system_profiler SPUSBDataType | grep -E -A 5 "iPhone|iPad"
1383
+ ```
1384
+
1385
+ **2. Sign WebDriverAgent (one-time):**
1386
+
1387
+ ```bash
1388
+ # Open the WebDriverAgent Xcode project
1389
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
1390
+ open WebDriverAgent.xcodeproj
1391
+ ```
1392
+
1393
+ In Xcode:
1394
+
1395
+ - Select the `WebDriverAgentRunner` target
1396
+ - Go to Signing & Capabilities
1397
+ - Select your Team (requires Apple Developer account, free tier works)
1398
+ - Let Xcode manage signing automatically
1399
+
1400
+ **3. Use with agent-browser:**
1401
+
1402
+ ```bash
1403
+ # Connect device via USB, then:
1404
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
1405
+
1406
+ # Or use the device name if unique
1407
+ agent-browser -p ios --device "John's iPhone" open https://example.com
1408
+ ```
1409
+
1410
+ **Real device notes:**
1411
+
1412
+ - First run installs WebDriverAgent to the device (may require accepting the Trust prompt on the device)
1413
+ - Device must be unlocked and connected via USB
1414
+ - Slightly slower initial connection than the simulator
1415
+ - Tests against real Safari performance and behavior
1416
+
1417
+ ### Browserless
1418
+
1419
+ [Browserless](https://browserless.io) provides cloud browser infrastructure with a Sessions API. Use it when running agent-browser in environments where a local browser isn't available.
1420
+
1421
+ To enable Browserless, use the `-p` flag:
1422
+
1423
+ ```bash
1424
+ export BROWSERLESS_API_KEY="your-api-token"
1425
+ agent-browser -p browserless open https://example.com
1426
+ ```
1427
+
1428
+ Or use environment variables for CI/scripts:
1429
+
1430
+ ```bash
1431
+ export AGENT_BROWSER_PROVIDER=browserless
1432
+ export BROWSERLESS_API_KEY="your-api-token"
1433
+ agent-browser open https://example.com
1434
+ ```
1435
+
1436
+ Optional configuration via environment variables:
1437
+
1438
+ | Variable | Description | Default |
1439
+ | -------------------------- | ------------------------------------------------ | --------------------------------------- |
1440
+ | `BROWSERLESS_API_URL` | Base API URL (for custom regions or self-hosted) | `https://production-sfo.browserless.io` |
1441
+ | `BROWSERLESS_BROWSER_TYPE` | Type of browser to use (`chromium` or `chrome`) | `chromium` |
1442
+ | `BROWSERLESS_TTL` | Session TTL in milliseconds | `300000` |
1443
+ | `BROWSERLESS_STEALTH` | Enable stealth mode (`true`/`false`) | `true` |
1444
+
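`BROWSERLESS_TTL` is expressed in milliseconds, while the timeouts of other providers in this README are in seconds, so a small conversion helper can prevent unit mistakes. The sketch below is illustrative only: `minutes_to_ms` is a hypothetical shell function, not part of agent-browser.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): convert whole minutes
# to the millisecond value that BROWSERLESS_TTL expects.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Request a 10-minute session instead of the 5-minute (300000 ms) default.
BROWSERLESS_TTL="$(minutes_to_ms 10)"
export BROWSERLESS_TTL
echo "BROWSERLESS_TTL=${BROWSERLESS_TTL}"
```

With the variable exported, a later `agent-browser -p browserless open https://example.com` in the same shell would pick it up.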
1445
+ When enabled, agent-browser connects to a Browserless cloud session instead of launching a local browser. All commands work identically.
1446
+
1447
+ Get your API token from the [Browserless Dashboard](https://browserless.io).
1448
+
1449
+ ### Browserbase
1450
+
1451
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
1452
+
1453
+ To enable Browserbase, use the `-p` flag:
1454
+
1455
+ ```bash
1456
+ export BROWSERBASE_API_KEY="your-api-key"
1457
+ agent-browser -p browserbase open https://example.com
1458
+ ```
1459
+
1460
+ Or use environment variables for CI/scripts:
1461
+
1462
+ ```bash
1463
+ export AGENT_BROWSER_PROVIDER=browserbase
1464
+ export BROWSERBASE_API_KEY="your-api-key"
1465
+ agent-browser open https://example.com
1466
+ ```
1467
+
1468
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
1469
+
1470
+ Get your API key from the [Browserbase Dashboard](https://browserbase.com/overview).
1471
+
1472
+ ### Browser Use
1473
+
1474
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
1475
+
1476
+ To enable Browser Use, use the `-p` flag:
1477
+
1478
+ ```bash
1479
+ export BROWSER_USE_API_KEY="your-api-key"
1480
+ agent-browser -p browseruse open https://example.com
1481
+ ```
1482
+
1483
+ Or use environment variables for CI/scripts:
1484
+
1485
+ ```bash
1486
+ export AGENT_BROWSER_PROVIDER=browseruse
1487
+ export BROWSER_USE_API_KEY="your-api-key"
1488
+ agent-browser open https://example.com
1489
+ ```
1490
+
1491
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
1492
+
1493
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
1494
+
1495
+ ### Kernel
1496
+
1497
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
1498
+
1499
+ To enable Kernel, use the `-p` flag:
1500
+
1501
+ ```bash
1502
+ export KERNEL_API_KEY="your-api-key"
1503
+ agent-browser -p kernel open https://example.com
1504
+ ```
1505
+
1506
+ Or use environment variables for CI/scripts:
1507
+
1508
+ ```bash
1509
+ export AGENT_BROWSER_PROVIDER=kernel
1510
+ export KERNEL_API_KEY="your-api-key"
1511
+ agent-browser open https://example.com
1512
+ ```
1513
+
1514
+ Optional configuration via environment variables:
1515
+
1516
+ | Variable | Description | Default |
1517
+ | ------------------------ | -------------------------------------------------------------------------------- | ------- |
1518
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `true` |
1519
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `false` |
1520
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1521
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
1522
+
1523
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
1524
+
1525
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
1526
+
1527
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
1528
+
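Each of the API-key providers described so far (Browserless, Browserbase, Browser Use, Kernel) reads its key from a differently named variable. In CI it can help to fail early when the key for the selected provider is missing. The following is a minimal sketch under that assumption; `required_key_for` and `check_provider_env` are hypothetical helper names, not agent-browser commands.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of agent-browser).

# Map a provider name to the API key variable documented for it.
required_key_for() {
  case "$1" in
    browserless) echo "BROWSERLESS_API_KEY" ;;
    browserbase) echo "BROWSERBASE_API_KEY" ;;
    browseruse)  echo "BROWSER_USE_API_KEY" ;;
    kernel)      echo "KERNEL_API_KEY" ;;
    *)           echo "" ;;
  esac
}

# Return 0 if the provider's key is set (or no key is documented), 1 otherwise.
check_provider_env() {
  provider="$1"
  key="$(required_key_for "$provider")"
  # Providers without a documented key variable (e.g. ios, agentcore) pass.
  if [ -z "$key" ]; then
    return 0
  fi
  # Look up the variable whose name is stored in $key (fixed list, so eval is safe).
  eval "value=\${$key:-}"
  if [ -z "$value" ]; then
    echo "missing $key for provider '$provider'" >&2
    return 1
  fi
  return 0
}

# Example usage (commented out so the sketch has no side effects):
# check_provider_env "${AGENT_BROWSER_PROVIDER:-}" && agent-browser open https://example.com
```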
1529
+ ### AgentCore
1530
+
1531
+ [AWS Bedrock AgentCore](https://aws.amazon.com/bedrock/agentcore/) provides cloud browser sessions with SigV4 authentication.
1532
+
1533
+ To enable AgentCore, use the `-p` flag:
1534
+
1535
+ ```bash
1536
+ agent-browser -p agentcore open https://example.com
1537
+ ```
1538
+
1539
+ Or use environment variables for CI/scripts:
1540
+
1541
+ ```bash
1542
+ export AGENT_BROWSER_PROVIDER=agentcore
1543
+ agent-browser open https://example.com
1544
+ ```
1545
+
1546
+ Credentials are automatically resolved from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or the AWS CLI (`aws configure export-credentials`), which supports SSO, profiles, and IAM roles.
1547
+
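To see which of the two documented credential sources is available before starting a session, a check along these lines may be useful. It is a sketch only: `credential_source` is a hypothetical helper, and the order in which it tries the sources is an assumption rather than confirmed agent-browser behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-browser): report which documented
# credential source is usable. The lookup order is an assumption.
credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Static credentials supplied through environment variables.
    echo "environment"
  elif command -v aws >/dev/null 2>&1 \
    && aws configure export-credentials --format env >/dev/null 2>&1; then
    # The AWS CLI can resolve credentials (SSO, profiles, IAM roles).
    echo "aws-cli"
  else
    echo "none"
  fi
}
```

Calling `credential_source` prints `environment`, `aws-cli`, or `none`; a result of `none` suggests that `agent-browser -p agentcore` would not find credentials through either documented path.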
1548
+ Optional configuration via environment variables:
1549
+
1550
+ | Variable | Description | Default |
1551
+ | -------------------------- | -------------------------------------------------------------------- | ---------------- |
1552
+ | `AGENTCORE_REGION` | AWS region for the AgentCore endpoint | `us-east-1` |
1553
+ | `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
1554
+ | `AGENTCORE_PROFILE_ID` | Browser profile for persistent state (cookies, localStorage) | (none) |
1555
+ | `AGENTCORE_SESSION_TIMEOUT` | Session timeout in seconds | `3600` |
1556
+ | `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |
1557
+
1558
+ **Browser profiles:** When `AGENTCORE_PROFILE_ID` is set, browser state (cookies, localStorage) is persisted across sessions automatically.
1559
+
1560
+ When enabled, agent-browser connects to an AgentCore cloud browser session instead of launching a local browser. All commands work identically.
1561
+
1562
+ ## License
1563
+
1564
+ Apache-2.0