agent-browser 0.27.0 → 0.27.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -42,6 +42,8 @@ agent-browser install # Download Chrome from Chrome for Testing (first time onl
 
  ### From Source
 
+ Requires Node.js 24+, pnpm 11+, and Rust.
+
  ```bash
  git clone https://github.com/vercel-labs/agent-browser
  cd agent-browser
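Building from source now expects Node.js 24+ and pnpm 11+. This matches the `engines` field (`"node": ">=24.0.0"`, `"pnpm": ">=11.0.0"`) that this release adds to `package.json`. The sketch below checks the local toolchain before you clone. It assumes bash and plain numeric `major.minor.patch` version strings. `version_ge` and `check_toolchain` are hypothetical helper names and are not part of agent-browser.

```bash
#!/usr/bin/env bash
# Sketch: verify the build toolchain against the engines constraints
# ("node": ">=24.0.0", "pnpm": ">=11.0.0"). Helper names are hypothetical.

# Succeed if dotted version $1 >= $2, comparing up to three numeric
# fields; anything after the first "-" (e.g. "-rc.1") is ignored.
version_ge() {
  local IFS=.
  local -a have=(${1%%-*}) want=(${2%%-*})
  local i h w
  for i in 0 1 2; do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if (( 10#$h > 10#$w )); then return 0; fi
    if (( 10#$h < 10#$w )); then return 1; fi
  done
  return 0
}

check_toolchain() {
  local node_v pnpm_v
  node_v=$(node --version 2>/dev/null) || { echo "node not found" >&2; return 1; }
  pnpm_v=$(pnpm --version 2>/dev/null) || { echo "pnpm not found" >&2; return 1; }
  node_v=${node_v#v}  # node prints a leading "v", e.g. v24.1.0
  version_ge "$node_v" 24.0.0 || { echo "Node.js $node_v is older than 24.0.0" >&2; return 1; }
  version_ge "$pnpm_v" 11.0.0 || { echo "pnpm $pnpm_v is older than 11.0.0" >&2; return 1; }
  echo "toolchain OK: node $node_v, pnpm $pnpm_v"
}
```

Usage: `check_toolchain && git clone https://github.com/vercel-labs/agent-browser`. The comparison runs in pure bash rather than relying on `sort -V`, because that option is not part of POSIX `sort`.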
@@ -73,6 +75,7 @@ Detects your installation method (npm, Homebrew, or Cargo) and runs the appropri
  ### Requirements
 
  - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
+ - **Node.js 24+ and pnpm 11+** - Only needed when building from source.
  - **Rust** - Only needed when building from source (see From Source above).
 
  ## Quick Start
@@ -87,6 +90,9 @@ agent-browser screenshot page.png
  agent-browser close
  ```
 
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
+
  ### Traditional Selectors (also supported)
 
  ```bash
@@ -359,7 +365,7 @@ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope
  ### Debug
 
  ```bash
- agent-browser trace start [path] # Start recording trace
+ agent-browser trace start # Start recording trace
  agent-browser trace stop [path] # Stop and save trace
  agent-browser profiler start # Start Chrome DevTools profiling
  agent-browser profiler stop [path] # Stop and save profile (.json)
@@ -422,7 +428,7 @@ agent-browser react renders start # Begin fiber render recordin
  agent-browser react renders stop [--json] # Stop and print profile (--json for raw data)
  agent-browser react suspense [--only-dynamic] [--json] # Suspense boundaries + classifier
  # --only-dynamic hides the "static" list
- agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + React hydration phases
+ agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + hydration summary
  ```
 
  Each `react ...` subcommand requires `--enable react-devtools` to have been
@@ -432,6 +438,8 @@ binary). Without it the commands error with `React DevTools hook not installed
 
  Works on any React app — Next.js, Remix, Vite+React, CRA, TanStack Start,
  React Native Web, etc. `vitals` and `pushstate` are framework-agnostic.
+ `vitals` prints a summary by default; pass `--json` for the full structured
+ payload.
 
  ### Init scripts
 
@@ -626,14 +634,14 @@ agent-browser --session-name secure open example.com
 
  ## Security
 
- agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
+ agent-browser includes security features for safe AI agent deployments. All features are opt-in, and existing workflows are unaffected until you explicitly enable a feature:
 
- - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
- - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
- - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
- - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
- - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
- - **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
+ - **Authentication Vault**: Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
+ - **Content Boundary Markers**: Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
+ - **Domain Allowlist**: Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
+ - **Action Policy**: Gate destructive actions with a static policy file: `--action-policy ./policy.json`
+ - **Action Confirmation**: Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
+ - **Output Length Limits**: Prevent context flooding: `--max-output 50000`
 
  | Variable | Description |
  | ----------------------------------- | ---------------------------------------- |
@@ -710,6 +718,7 @@ This is useful for multimodal AI models that can reason about visual layout, unl
  | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
  | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
  | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
+ | `--hide-scrollbars <bool>` | Hide native scrollbars in headless Chromium screenshots, enabled by default (or `AGENT_BROWSER_HIDE_SCROLLBARS` env) |
  | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
  | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
  | `--json` | JSON output (for agents) |
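The new `--hide-scrollbars <bool>` option has equivalents elsewhere in this diff: the `"hideScrollbars": false` key in `agent-browser.json` and the `AGENT_BROWSER_HIDE_SCROLLBARS` environment variable. If you want a project-level default instead of a per-launch flag, the sketch below writes that key to a config file. `write_scrollbar_config` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: write a project-level agent-browser.json that keeps native
# scrollbars visible in headless Chromium screenshots.
# The file name and key come from this diff; the helper name is hypothetical.
write_scrollbar_config() {
  local config=${1:-./agent-browser.json}
  cat > "$config" <<'EOF'
{
  "hideScrollbars": false
}
EOF
}

# Per-launch alternatives also shown in this diff (flag placement assumed
# to match the other global options):
#   agent-browser --hide-scrollbars false open https://example.com
#   AGENT_BROWSER_HIDE_SCROLLBARS=false agent-browser open https://example.com
```

This overwrites the target file, so if `agent-browser.json` already holds other defaults, add the key to it by hand. According to the README's priority list, `AGENT_BROWSER_*` environment variables and CLI flags still override values from the file.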
@@ -755,11 +764,11 @@ agent-browser dashboard stop
  The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running, and it works from `http://localhost:4848` or a proxied/forwarded URL that reaches the dashboard server, such as `https://dashboard.agent-browser.localhost` or a Coder workspace URL. The browser stays on the dashboard origin; session-specific tabs, status, and stream traffic are proxied internally, so session ports do not need to be exposed.
 
  The dashboard displays:
- - **Live viewport** -- real-time JPEG frames from the browser
- - **Activity feed** -- chronological command/result stream with timing and expandable details
- - **Console output** -- browser console messages (log, warn, error)
- - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
- - **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
+ - **Live viewport**: real-time JPEG frames from the browser
+ - **Activity feed**: chronological command/result stream with timing and expandable details
+ - **Console output**: browser console messages (log, warn, error)
+ - **Session creation**: create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
+ - **AI Chat**: chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
 
  ### AI Chat
 
@@ -793,8 +802,8 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
 
  **Locations (lowest to highest priority):**
 
- 1. `~/.agent-browser/config.json` -- user-level defaults
- 2. `./agent-browser.json` -- project-level overrides (in working directory)
+ 1. `~/.agent-browser/config.json`: user-level defaults
+ 2. `./agent-browser.json`: project-level overrides (in working directory)
  3. `AGENT_BROWSER_*` environment variables override config file values
  4. CLI flags override everything
 
@@ -806,6 +815,7 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "userAgent": "my-agent/1.0",
+ "hideScrollbars": false,
  "ignoreHttpsErrors": true
  }
  ```
@@ -1229,7 +1239,7 @@ The daemon starts automatically on first command and persists between commands f
 
  ### Just ask the agent
 
- The simplest approach -- just tell your agent to use it:
+ The simplest approach is to tell your agent to use it:
 
  ```
  Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
@@ -1245,7 +1255,7 @@ Add the skill to your AI coding assistant for richer context:
  npx skills add vercel-labs/agent-browser
  ```
 
- This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically. Do not copy `SKILL.md` from `node_modules` as it will become stale.
 
  ### Claude Code
 
@@ -1474,8 +1484,8 @@ Optional configuration via environment variables:
 
  | Variable | Description | Default |
  | ------------------------ | -------------------------------------------------------------------------------- | ------- |
- | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
- | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `true` |
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `false` |
  | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
  | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
 
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
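The last README hunk flips two Kernel defaults. `KERNEL_HEADLESS` now defaults to `true` (it was `false`), and `KERNEL_STEALTH` now defaults to `false` (it was `true`). If a workflow relied on the 0.27.0 behavior, you can pin both variables explicitly before launching, as in the sketch below. The variable names and values come from that table; `pin_kernel_legacy_defaults` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: pin the Kernel provider settings that were the defaults in 0.27.0.
# Variable names come from the README table; the helper name is hypothetical.
pin_kernel_legacy_defaults() {
  export KERNEL_HEADLESS=false  # 0.27.1 default is true
  export KERNEL_STEALTH=true    # 0.27.1 default is false
}

pin_kernel_legacy_defaults
```

Source the snippet in the shell that launches agent-browser with the Kernel provider (`-p` or `AGENT_BROWSER_PROVIDER`). Exported variables only reach processes started after the export, so run it first.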
package/package.json CHANGED
@@ -1,8 +1,13 @@
  {
  "name": "agent-browser",
- "version": "0.27.0",
+ "version": "0.27.1",
  "description": "Browser automation CLI for AI agents",
  "type": "module",
+ "packageManager": "pnpm@11.1.3",
+ "engines": {
+ "node": ">=24.0.0",
+ "pnpm": ">=11.0.0"
+ },
  "files": [
  "bin",
  "scripts",
@@ -243,6 +243,9 @@ agent-browser screenshot --full full.png # full scroll height
  agent-browser screenshot --annotate map.png # numbered labels + legend keyed to snapshot refs
  ```
 
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
+
  `--annotate` is designed for multimodal models: each label `[N]` maps to ref `@eN`.
 
  ### Handle multiple pages via tabs
@@ -250,13 +253,11 @@ agent-browser screenshot --annotate map.png # numbered labels + legend keyed
  ```bash
  agent-browser tab # list open tabs (with stable tabId)
  agent-browser tab new https://docs... # open a new tab (and switch to it)
- agent-browser tab 2 # switch to tab 2
- agent-browser tab close 2 # close tab 2
+ agent-browser tab t2 # switch to tab t2
+ agent-browser tab close t2 # close tab t2
  ```
 
- Stable `tabId`s mean `tab 2` points at the same tab across commands even
- when other tabs open or close. After switching, refs from a prior snapshot
- on a different tab no longer apply — re-snapshot.
+ Stable `tabId`s mean `t2` points at the same tab across commands even when other tabs open or close. After switching, refs from a prior snapshot on a different tab no longer apply — re-snapshot.
 
  ### Run multiple browsers in parallel
 
@@ -287,8 +288,8 @@ agent-browser network har stop /tmp/trace.har
  ### Record a video of the workflow
 
  ```bash
- agent-browser record start demo.webm
  agent-browser open https://example.com
+ agent-browser record start demo.webm
  agent-browser snapshot -i
  agent-browser click @e3
  agent-browser record stop
@@ -444,7 +445,8 @@ agent-browser pushstate <url> # SPA navigation (auto-detects
  ```
 
  Without `--enable react-devtools`, the `react …` commands error. `vitals`
- and `pushstate` work on any site regardless of framework.
+ and `pushstate` work on any site regardless of framework. `vitals` prints a
+ summary by default; use `--json` for the full structured payload.
 
  ## Working safely
 
@@ -103,9 +103,13 @@ agent-browser screenshot --full # Full page
  agent-browser pdf output.pdf # Save as PDF
  ```
 
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
+
  ## Video Recording
 
  ```bash
+ agent-browser open https://example.com # Launch a browser session first
  agent-browser record start ./demo.webm # Start recording
  agent-browser click @e1 # Perform actions
  agent-browser record stop # Stop and save video
@@ -300,7 +304,6 @@ agent-browser state load auth.json # Restore saved state
  agent-browser --session <name> ... # Isolated browser session
  agent-browser --json ... # JSON output for parsing
  agent-browser --headed ... # Show browser window (not headless)
- agent-browser --full ... # Full page screenshot (-f)
  agent-browser --cdp <port> ... # Connect via Chrome DevTools Protocol
  agent-browser -p <provider> ... # Cloud browser provider (--provider)
  agent-browser --proxy <url> ... # Use proxy server
@@ -309,6 +312,7 @@ agent-browser --headers <json> ... # HTTP headers scoped to URL's origin
  agent-browser --executable-path <p> # Custom browser executable
  agent-browser --extension <path> ... # Load browser extension (repeatable)
  agent-browser --ignore-https-errors # Ignore SSL certificate errors
+ agent-browser --hide-scrollbars false # Keep native scrollbars visible in headless Chromium screenshots
  agent-browser --help # Show help (-h)
  agent-browser --version # Show version (-V)
  agent-browser <command> --help # Show detailed help for a command
@@ -327,7 +331,7 @@ agent-browser errors --clear # Clear errors
  agent-browser highlight @e1 # Highlight element
  agent-browser inspect # Open Chrome DevTools for this session
  agent-browser trace start # Start recording trace
- agent-browser trace stop trace.zip # Stop and save trace
+ agent-browser trace stop trace.json # Stop and save trace
  agent-browser profiler start # Start Chrome DevTools profiling
  agent-browser profiler stop trace.json # Stop and save profile
  ```
@@ -349,6 +353,9 @@ agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + hyd
  agent-browser pushstate <url> # SPA client-side nav (auto-detects Next router)
  ```
 
+ `vitals` prints a summary by default and uses the same fields as the structured
+ `--json` response.
+
  ## Init scripts
 
  ```bash
@@ -383,7 +390,9 @@ AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
  AGENT_BROWSER_EXTENSIONS="/ext1,/ext2" # Comma-separated extension paths
  AGENT_BROWSER_INIT_SCRIPTS="/a.js,/b.js" # Comma-separated init script paths
  AGENT_BROWSER_ENABLE="react-devtools" # Comma-separated built-in init script features
+ AGENT_BROWSER_HIDE_SCROLLBARS="false" # Keep native scrollbars visible in headless Chromium screenshots
  AGENT_BROWSER_PROVIDER="browserbase" # Cloud browser provider
  AGENT_BROWSER_STREAM_PORT="9223" # Override WebSocket streaming port (default: OS-assigned)
- AGENT_BROWSER_HOME="/path/to/agent-browser" # Custom install location
+ AGENT_BROWSER_CONFIG="./agent-browser.json" # Custom config file
+ AGENT_BROWSER_CDP="9222" # Connect daemon to CDP port or WebSocket URL
  ```
@@ -16,11 +16,11 @@ Capture browser automation as video for debugging, documentation, or verificatio
  ## Basic Recording
 
  ```bash
- # Start recording
+ # Launch the browser, then start recording
+ agent-browser open https://example.com
  agent-browser record start ./demo.webm
 
  # Perform actions
- agent-browser open https://example.com
  agent-browser snapshot -i
  agent-browser click @e1
  agent-browser fill @e2 "test input"
@@ -32,6 +32,9 @@ agent-browser record stop
  ## Recording Commands
 
  ```bash
+ # Launch a session first
+ agent-browser open
+
  # Start recording to file
  agent-browser record start ./output.webm
 
@@ -50,10 +53,9 @@ agent-browser record restart ./take2.webm
  #!/bin/bash
  # Record automation for debugging
 
- agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
-
  # Run your automation
  agent-browser open https://app.example.com
+ agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
  agent-browser snapshot -i
  agent-browser click @e1 || {
  echo "Click failed - check recording"
@@ -70,9 +72,8 @@ agent-browser record stop
  #!/bin/bash
  # Record workflow for documentation
 
- agent-browser record start ./docs/how-to-login.webm
-
  agent-browser open https://app.example.com/login
+ agent-browser record start ./docs/how-to-login.webm
  agent-browser wait 1000 # Pause for visibility
 
  agent-browser snapshot -i
@@ -99,6 +100,7 @@ TEST_NAME="${1:-e2e-test}"
  RECORDING_DIR="./test-recordings"
  mkdir -p "$RECORDING_DIR"
 
+ agent-browser open
  agent-browser record start "$RECORDING_DIR/$TEST_NAME-$(date +%s).webm"
 
  # Run test
@@ -141,6 +143,7 @@ cleanup() {
  }
  trap cleanup EXIT
 
+ agent-browser open
  agent-browser record start ./automation.webm
  # ... automation steps ...
  ```
@@ -149,9 +152,8 @@ agent-browser record start ./automation.webm
 
  ```bash
  # Record video AND capture key frames
- agent-browser record start ./flow.webm
-
  agent-browser open https://example.com
+ agent-browser record start ./flow.webm
  agent-browser screenshot ./screenshots/step1-homepage.png
 
  agent-browser click @e1
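Every recording example in this release now opens a browser session before `record start`; the video-recording hunks add comments such as "Launch a session first". The sketch below wraps that order in a function and still runs `record stop` when the snapshot step fails. It assumes bash. `record_flow` and the `AB` override variable are hypothetical, but the subcommands are the ones shown in the hunks above.

```bash
#!/usr/bin/env bash
# Sketch: open a session first, then record, mirroring the 0.27.1 docs.
# AB is a hypothetical override so the function can be dry-run with a stub;
# it defaults to the agent-browser CLI.
AB=${AB:-agent-browser}

record_flow() {
  local url=$1 out=$2 status=0
  "$AB" open "$url" || return 1          # launch a browser session first
  "$AB" record start "$out" || return 1  # then start recording in that session
  "$AB" snapshot -i || status=$?
  "$AB" record stop || status=$?         # runs even if the snapshot step failed
  return "$status"
}
```

Usage: `record_flow https://example.com ./demo.webm`. If you set `AB` to a stub function, you get a dry run that records the call order without launching a browser.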
@@ -1 +0,0 @@
- pnpm