agent-browser 0.27.0 → 0.27.2

package/README.md CHANGED
@@ -42,6 +42,8 @@ agent-browser install # Download Chrome from Chrome for Testing (first time onl
42
42
 
43
43
  ### From Source
44
44
 
45
+ Requires Node.js 24+, pnpm 11+, and Rust.
46
+
45
47
  ```bash
46
48
  git clone https://github.com/vercel-labs/agent-browser
47
49
  cd agent-browser
@@ -73,6 +75,7 @@ Detects your installation method (npm, Homebrew, or Cargo) and runs the appropri
73
75
  ### Requirements
74
76
 
75
77
  - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
78
+ - **Node.js 24+ and pnpm 11+** - Only needed when building from source.
76
79
  - **Rust** - Only needed when building from source (see From Source above).
77
80
 
78
81
  ## Quick Start
@@ -87,6 +90,13 @@ agent-browser screenshot page.png
87
90
  agent-browser close
88
91
  ```
89
92
 
93
+ Clicks fail early when another element covers the target's click point,
94
+ for example a consent banner or modal. Dismiss or interact with the reported
95
+ covering element, then take a fresh snapshot before retrying the original ref.
96
+
97
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
98
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
99
+
90
100
  ### Traditional Selectors (also supported)
91
101
 
92
102
  ```bash
@@ -359,7 +369,7 @@ agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope
359
369
  ### Debug
360
370
 
361
371
  ```bash
362
- agent-browser trace start [path] # Start recording trace
372
+ agent-browser trace start # Start recording trace
363
373
  agent-browser trace stop [path] # Stop and save trace
364
374
  agent-browser profiler start # Start Chrome DevTools profiling
365
375
  agent-browser profiler stop [path] # Stop and save profile (.json)
@@ -422,7 +432,7 @@ agent-browser react renders start # Begin fiber render recordin
422
432
  agent-browser react renders stop [--json] # Stop and print profile (--json for raw data)
423
433
  agent-browser react suspense [--only-dynamic] [--json] # Suspense boundaries + classifier
424
434
  # --only-dynamic hides the "static" list
425
- agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + React hydration phases
435
+ agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + hydration summary
426
436
  ```
427
437
 
428
438
  Each `react ...` subcommand requires `--enable react-devtools` to have been
@@ -432,6 +442,8 @@ binary). Without it the commands error with `React DevTools hook not installed
432
442
 
433
443
  Works on any React app — Next.js, Remix, Vite+React, CRA, TanStack Start,
434
444
  React Native Web, etc. `vitals` and `pushstate` are framework-agnostic.
445
+ `vitals` prints a summary by default; pass `--json` for the full structured
446
+ payload.
435
447
 
436
448
  ### Init scripts
437
449
 
@@ -626,14 +638,14 @@ agent-browser --session-name secure open example.com
626
638
 
627
639
  ## Security
628
640
 
629
- agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
641
+ agent-browser includes security features for safe AI agent deployments. All features are opt-in, and existing workflows are unaffected until you explicitly enable a feature:
630
642
 
631
- - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
632
- - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
633
- - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
634
- - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
635
- - **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
636
- - **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
643
+ - **Authentication Vault**: Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
644
+ - **Content Boundary Markers**: Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
645
+ - **Domain Allowlist**: Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
646
+ - **Action Policy**: Gate destructive actions with a static policy file: `--action-policy ./policy.json`
647
+ - **Action Confirmation**: Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
648
+ - **Output Length Limits**: Prevent context flooding: `--max-output 50000`
637
649
 
638
650
  | Variable | Description |
639
651
  | ----------------------------------- | ---------------------------------------- |
@@ -710,6 +722,7 @@ This is useful for multimodal AI models that can reason about visual layout, unl
710
722
  | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
711
723
  | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
712
724
  | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
725
+ | `--hide-scrollbars <bool>` | Hide native scrollbars in headless Chromium screenshots, enabled by default (or `AGENT_BROWSER_HIDE_SCROLLBARS` env) |
713
726
  | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
714
727
  | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
715
728
  | `--json` | JSON output (for agents) |
@@ -755,11 +768,11 @@ agent-browser dashboard stop
755
768
  The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running, and it works from `http://localhost:4848` or a proxied/forwarded URL that reaches the dashboard server, such as `https://dashboard.agent-browser.localhost` or a Coder workspace URL. The browser stays on the dashboard origin; session-specific tabs, status, and stream traffic are proxied internally, so session ports do not need to be exposed.
756
769
 
757
770
  The dashboard displays:
758
- - **Live viewport** -- real-time JPEG frames from the browser
759
- - **Activity feed** -- chronological command/result stream with timing and expandable details
760
- - **Console output** -- browser console messages (log, warn, error)
761
- - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
762
- - **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
771
+ - **Live viewport**: real-time JPEG frames from the browser
772
+ - **Activity feed**: chronological command/result stream with timing and expandable details
773
+ - **Console output**: browser console messages (log, warn, error)
774
+ - **Session creation**: create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
775
+ - **AI Chat**: chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
763
776
 
764
777
  ### AI Chat
765
778
 
@@ -793,8 +806,8 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
793
806
 
794
807
  **Locations (lowest to highest priority):**
795
808
 
796
- 1. `~/.agent-browser/config.json` -- user-level defaults
797
- 2. `./agent-browser.json` -- project-level overrides (in working directory)
809
+ 1. `~/.agent-browser/config.json`: user-level defaults
810
+ 2. `./agent-browser.json`: project-level overrides (in working directory)
798
811
  3. `AGENT_BROWSER_*` environment variables override config file values
799
812
  4. CLI flags override everything
800
813
 
@@ -806,6 +819,7 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
806
819
  "proxy": "http://localhost:8080",
807
820
  "profile": "./browser-data",
808
821
  "userAgent": "my-agent/1.0",
822
+ "hideScrollbars": false,
809
823
  "ignoreHttpsErrors": true
810
824
  }
811
825
  ```
@@ -873,6 +887,10 @@ agent-browser get text @e1 # Get heading text
873
887
  agent-browser hover @e4 # Hover the link
874
888
  ```
875
889
 
890
+ When a ref click is blocked by an overlay, the error includes the covering
891
+ element, such as `covered by <div#consent-banner>`. Click the banner or dialog
892
+ control first, then run `snapshot` again before reusing refs.
893
+
876
894
  **Why use refs?**
877
895
 
878
896
  - **Deterministic**: Ref points to exact element from snapshot
@@ -1229,7 +1247,7 @@ The daemon starts automatically on first command and persists between commands f
1229
1247
 
1230
1248
  ### Just ask the agent
1231
1249
 
1232
- The simplest approach -- just tell your agent to use it:
1250
+ The simplest approach is to tell your agent to use it:
1233
1251
 
1234
1252
  ```
1235
1253
  Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
@@ -1245,7 +1263,7 @@ Add the skill to your AI coding assistant for richer context:
1245
1263
  npx skills add vercel-labs/agent-browser
1246
1264
  ```
1247
1265
 
1248
- This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
1266
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically. Do not copy `SKILL.md` from `node_modules` as it will become stale.
1249
1267
 
1250
1268
  ### Claude Code
1251
1269
 
@@ -1474,8 +1492,8 @@ Optional configuration via environment variables:
1474
1492
 
1475
1493
  | Variable | Description | Default |
1476
1494
  | ------------------------ | -------------------------------------------------------------------------------- | ------- |
1477
- | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
1478
- | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
1495
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `true` |
1496
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `false` |
1479
1497
  | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
1480
1498
  | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
1481
1499
 
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
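The README changes above say that a blocked click reports the covering element, for example `covered by <div#consent-banner>`. As a rough sketch of how a wrapper script might pull that element out of an error message, the helper below extracts the text between `covered by <` and the next `>`. The function name `covering_element` and the sample message wording are assumptions made for illustration; only the `covered by <...>` fragment comes from the README.

```bash
# Hypothetical helper, not part of agent-browser: print the element descriptor
# found in a `covered by <...>` fragment, or nothing if the fragment is absent.
# The text around the fragment is an assumption about the error format.
covering_element() {
  local message=$1
  # -n with the p flag prints only when the substitution matched.
  printf '%s\n' "$message" | sed -n 's/.*covered by <\([^>]*\)>.*/\1/p'
}

# Example with an assumed message shape:
covering_element 'click failed: covered by <div#consent-banner>'
```

A script could use the result to decide which banner or dialog control to dismiss, then run `agent-browser snapshot -i` again before retrying the original ref, as the README advises.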
package/package.json CHANGED
@@ -1,8 +1,13 @@
1
1
  {
2
2
  "name": "agent-browser",
3
- "version": "0.27.0",
3
+ "version": "0.27.2",
4
4
  "description": "Browser automation CLI for AI agents",
5
5
  "type": "module",
6
+ "packageManager": "pnpm@11.1.3",
7
+ "engines": {
8
+ "node": ">=24.0.0",
9
+ "pnpm": ">=11.0.0"
10
+ },
6
11
  "files": [
7
12
  "bin",
8
13
  "scripts",
@@ -20,7 +25,7 @@
20
25
  "build:macos": "npm run version:sync && (cargo build --release --manifest-path cli/Cargo.toml --target aarch64-apple-darwin & cargo build --release --manifest-path cli/Cargo.toml --target x86_64-apple-darwin & wait) && cp cli/target/aarch64-apple-darwin/release/agent-browser bin/agent-browser-darwin-arm64 && cp cli/target/x86_64-apple-darwin/release/agent-browser bin/agent-browser-darwin-x64",
21
26
  "build:windows": "npm run version:sync && docker compose -f docker/docker-compose.yml run --rm build-windows",
22
27
  "build:all-platforms": "npm run version:sync && (npm run build:linux & npm run build:windows & wait) && npm run build:macos",
23
- "build:docker": "docker build -t agent-browser-builder -f docker/Dockerfile.build .",
28
+ "build:docker": "docker build --platform linux/amd64 -t agent-browser-builder -f docker/Dockerfile.build .",
24
29
  "release": "npm run version:sync && npm run build:all-platforms && npm publish",
25
30
  "postinstall": "node scripts/postinstall.js",
26
31
  "build:dashboard": "cd packages/dashboard && pnpm build"
package/scripts/build-all-platforms.sh CHANGED
@@ -1,5 +1,5 @@
1
1
  #!/bin/bash
2
- set -e
2
+ set -euo pipefail
3
3
 
4
4
  # Build agent-browser for all platforms using Docker
5
5
  # Usage: ./scripts/build-all-platforms.sh
@@ -22,21 +22,32 @@ mkdir -p "$OUTPUT_DIR"
22
22
 
23
23
  # Build the Docker image if needed
24
24
  echo -e "${YELLOW}Building Docker cross-compilation image...${NC}"
25
- docker build -t agent-browser-builder -f "$PROJECT_ROOT/docker/Dockerfile.build" "$PROJECT_ROOT"
25
+ docker build --platform linux/amd64 -t agent-browser-builder -f "$PROJECT_ROOT/docker/Dockerfile.build" "$PROJECT_ROOT"
26
26
 
27
27
  # Function to build for a target
28
28
  build_target() {
29
- local target=$1
30
- local output_name=$2
31
-
32
- echo -e "${YELLOW}Building for ${target}...${NC}"
33
-
29
+ local rust_target=$1
30
+ local build_target=$2
31
+ local output_name=$3
32
+
33
+ echo -e "${YELLOW}Building for ${build_target}...${NC}"
34
+
35
+ rm -f "$OUTPUT_DIR/$output_name"
36
+
34
37
  docker run --rm \
38
+ --platform linux/amd64 \
35
39
  -v "$PROJECT_ROOT/cli:/build" \
36
40
  -v "$OUTPUT_DIR:/output" \
37
41
  agent-browser-builder \
38
- -c "cargo zigbuild --release --target ${target} && cp /build/target/${target}/release/agent-browser* /output/${output_name} && chmod +x /output/${output_name} 2>/dev/null || true"
39
-
42
+ -c "set -euo pipefail
43
+ cargo zigbuild --release --target ${build_target}
44
+ source_path=/build/target/${rust_target}/release/agent-browser
45
+ if [ -f \"\$source_path.exe\" ]; then
46
+ source_path=\"\$source_path.exe\"
47
+ fi
48
+ cp \"\$source_path\" /output/${output_name}
49
+ chmod +x /output/${output_name} 2>/dev/null || true"
50
+
40
51
  if [ -f "$OUTPUT_DIR/$output_name" ]; then
41
52
  echo -e "${GREEN}✓ Built ${output_name}${NC}"
42
53
  else
@@ -47,25 +58,25 @@ build_target() {
47
58
 
48
59
  # Build for each platform
49
60
  # Linux x64
50
- build_target "x86_64-unknown-linux-gnu" "agent-browser-linux-x64"
61
+ build_target "x86_64-unknown-linux-gnu" "x86_64-unknown-linux-gnu.2.28" "agent-browser-linux-x64"
51
62
 
52
63
  # Linux ARM64
53
- build_target "aarch64-unknown-linux-gnu" "agent-browser-linux-arm64"
64
+ build_target "aarch64-unknown-linux-gnu" "aarch64-unknown-linux-gnu.2.28" "agent-browser-linux-arm64"
54
65
 
55
66
  # Windows x64
56
- build_target "x86_64-pc-windows-gnu" "agent-browser-win32-x64.exe"
67
+ build_target "x86_64-pc-windows-gnu" "x86_64-pc-windows-gnu" "agent-browser-win32-x64.exe"
57
68
 
58
69
  # macOS x64 (via zig for cross-compilation)
59
- build_target "x86_64-apple-darwin" "agent-browser-darwin-x64"
70
+ build_target "x86_64-apple-darwin" "x86_64-apple-darwin" "agent-browser-darwin-x64"
60
71
 
61
72
  # macOS ARM64 (via zig for cross-compilation)
62
- build_target "aarch64-apple-darwin" "agent-browser-darwin-arm64"
73
+ build_target "aarch64-apple-darwin" "aarch64-apple-darwin" "agent-browser-darwin-arm64"
63
74
 
64
75
  # Linux musl x64 (Alpine)
65
- build_target "x86_64-unknown-linux-musl" "agent-browser-linux-musl-x64"
76
+ build_target "x86_64-unknown-linux-musl" "x86_64-unknown-linux-musl" "agent-browser-linux-musl-x64"
66
77
 
67
78
  # Linux musl ARM64 (Alpine)
68
- build_target "aarch64-unknown-linux-musl" "agent-browser-linux-musl-arm64"
79
+ build_target "aarch64-unknown-linux-musl" "aarch64-unknown-linux-musl" "agent-browser-linux-musl-arm64"
69
80
 
70
81
  echo ""
71
82
  echo -e "${GREEN}Build complete!${NC}"
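The updated `build_target` passes a glibc-suffixed name such as `x86_64-unknown-linux-gnu.2.28` to `cargo zigbuild` but copies from the directory named after the plain Rust target, and it replaces the old `agent-browser*` glob with an explicit `.exe` check. A minimal standalone sketch of that lookup follows, assuming a hypothetical helper name `resolve_artifact` that is not part of the package:

```bash
# Hypothetical helper: given the artifact path without an extension, print the
# file that actually exists. Windows targets produce agent-browser.exe, other
# targets produce agent-browser.
resolve_artifact() {
  local source_path=$1
  if [ -f "$source_path.exe" ]; then
    source_path="$source_path.exe"
  fi
  # Fail loudly instead of copying nothing.
  if [ ! -f "$source_path" ]; then
    echo "missing build artifact: $source_path" >&2
    return 1
  fi
  printf '%s\n' "$source_path"
}
```

Inside the container this might be called as `resolve_artifact "/build/target/${rust_target}/release/agent-browser"`. Unlike the earlier `cp ... && chmod ... || true` one-liner, a missing artifact here returns a non-zero status rather than being silently ignored.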
@@ -243,6 +243,9 @@ agent-browser screenshot --full full.png # full scroll height
243
243
  agent-browser screenshot --annotate map.png # numbered labels + legend keyed to snapshot refs
244
244
  ```
245
245
 
246
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
247
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
248
+
246
249
  `--annotate` is designed for multimodal models: each label `[N]` maps to ref `@eN`.
247
250
 
248
251
  ### Handle multiple pages via tabs
@@ -250,13 +253,11 @@ agent-browser screenshot --annotate map.png # numbered labels + legend keyed
250
253
  ```bash
251
254
  agent-browser tab # list open tabs (with stable tabId)
252
255
  agent-browser tab new https://docs... # open a new tab (and switch to it)
253
- agent-browser tab 2 # switch to tab 2
254
- agent-browser tab close 2 # close tab 2
256
+ agent-browser tab t2 # switch to tab t2
257
+ agent-browser tab close t2 # close tab t2
255
258
  ```
256
259
 
257
- Stable `tabId`s mean `tab 2` points at the same tab across commands even
258
- when other tabs open or close. After switching, refs from a prior snapshot
259
- on a different tab no longer apply — re-snapshot.
260
+ Stable `tabId`s mean `t2` points at the same tab across commands even when other tabs open or close. After switching, refs from a prior snapshot on a different tab no longer apply — re-snapshot.
260
261
 
261
262
  ### Run multiple browsers in parallel
262
263
 
@@ -287,8 +288,8 @@ agent-browser network har stop /tmp/trace.har
287
288
  ### Record a video of the workflow
288
289
 
289
290
  ```bash
290
- agent-browser record start demo.webm
291
291
  agent-browser open https://example.com
292
+ agent-browser record start demo.webm
292
293
  agent-browser snapshot -i
293
294
  agent-browser click @e3
294
295
  agent-browser record stop
@@ -366,8 +367,9 @@ agent-browser snapshot -i
366
367
  ```
367
368
 
368
369
  **Click does nothing / overlay swallows the click**
369
- Some modals and cookie banners block other clicks. Snapshot, find the
370
- dismiss/close button, click it, then re-snapshot.
370
+ Some modals and cookie banners block other clicks. If `click` reports
371
+ `covered by <...>`, interact with that covering element first. Otherwise,
372
+ snapshot, find the dismiss/close button, click it, then re-snapshot.
371
373
 
372
374
  **Fill / type doesn't work**
373
375
  Some custom input components intercept key events. Try:
@@ -444,7 +446,8 @@ agent-browser pushstate <url> # SPA navigation (auto-detects
444
446
  ```
445
447
 
446
448
  Without `--enable react-devtools`, the `react …` commands error. `vitals`
447
- and `pushstate` work on any site regardless of framework.
449
+ and `pushstate` work on any site regardless of framework. `vitals` prints a
450
+ summary by default; use `--json` for the full structured payload.
448
451
 
449
452
  ## Working safely
450
453
 
@@ -71,6 +71,11 @@ agent-browser drag @e1 @e2 # Drag and drop
71
71
  agent-browser upload @e1 file.pdf # Upload files
72
72
  ```
73
73
 
74
+ Clicks fail before dispatch when another element covers the target's click
75
+ point. The error names the covering element, for example
76
+ `covered by <div#consent-banner>`. Dismiss or interact with that element, run a
77
+ fresh snapshot, then retry the original action.
78
+
74
79
  ## Get Information
75
80
 
76
81
  ```bash
@@ -103,9 +108,13 @@ agent-browser screenshot --full # Full page
103
108
  agent-browser pdf output.pdf # Save as PDF
104
109
  ```
105
110
 
111
+ Headless Chromium screenshots hide native scrollbars for consistent image output.
112
+ Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
113
+
106
114
  ## Video Recording
107
115
 
108
116
  ```bash
117
+ agent-browser open https://example.com # Launch a browser session first
109
118
  agent-browser record start ./demo.webm # Start recording
110
119
  agent-browser click @e1 # Perform actions
111
120
  agent-browser record stop # Stop and save video
@@ -300,7 +309,6 @@ agent-browser state load auth.json # Restore saved state
300
309
  agent-browser --session <name> ... # Isolated browser session
301
310
  agent-browser --json ... # JSON output for parsing
302
311
  agent-browser --headed ... # Show browser window (not headless)
303
- agent-browser --full ... # Full page screenshot (-f)
304
312
  agent-browser --cdp <port> ... # Connect via Chrome DevTools Protocol
305
313
  agent-browser -p <provider> ... # Cloud browser provider (--provider)
306
314
  agent-browser --proxy <url> ... # Use proxy server
@@ -309,6 +317,7 @@ agent-browser --headers <json> ... # HTTP headers scoped to URL's origin
309
317
  agent-browser --executable-path <p> # Custom browser executable
310
318
  agent-browser --extension <path> ... # Load browser extension (repeatable)
311
319
  agent-browser --ignore-https-errors # Ignore SSL certificate errors
320
+ agent-browser --hide-scrollbars false # Keep native scrollbars visible in headless Chromium screenshots
312
321
  agent-browser --help # Show help (-h)
313
322
  agent-browser --version # Show version (-V)
314
323
  agent-browser <command> --help # Show detailed help for a command
@@ -327,7 +336,7 @@ agent-browser errors --clear # Clear errors
327
336
  agent-browser highlight @e1 # Highlight element
328
337
  agent-browser inspect # Open Chrome DevTools for this session
329
338
  agent-browser trace start # Start recording trace
330
- agent-browser trace stop trace.zip # Stop and save trace
339
+ agent-browser trace stop trace.json # Stop and save trace
331
340
  agent-browser profiler start # Start Chrome DevTools profiling
332
341
  agent-browser profiler stop trace.json # Stop and save profile
333
342
  ```
@@ -349,6 +358,9 @@ agent-browser vitals [url] [--json] # LCP/CLS/TTFB/FCP/INP + hyd
349
358
  agent-browser pushstate <url> # SPA client-side nav (auto-detects Next router)
350
359
  ```
351
360
 
361
+ `vitals` prints a summary by default and uses the same fields as the structured
362
+ `--json` response.
363
+
352
364
  ## Init scripts
353
365
 
354
366
  ```bash
@@ -383,7 +395,9 @@ AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
383
395
  AGENT_BROWSER_EXTENSIONS="/ext1,/ext2" # Comma-separated extension paths
384
396
  AGENT_BROWSER_INIT_SCRIPTS="/a.js,/b.js" # Comma-separated init script paths
385
397
  AGENT_BROWSER_ENABLE="react-devtools" # Comma-separated built-in init script features
398
+ AGENT_BROWSER_HIDE_SCROLLBARS="false" # Keep native scrollbars visible in headless Chromium screenshots
386
399
  AGENT_BROWSER_PROVIDER="browserbase" # Cloud browser provider
387
400
  AGENT_BROWSER_STREAM_PORT="9223" # Override WebSocket streaming port (default: OS-assigned)
388
- AGENT_BROWSER_HOME="/path/to/agent-browser" # Custom install location
401
+ AGENT_BROWSER_CONFIG="./agent-browser.json" # Custom config file
402
+ AGENT_BROWSER_CDP="9222" # Connect daemon to CDP port or WebSocket URL
389
403
  ```
@@ -16,11 +16,11 @@ Capture browser automation as video for debugging, documentation, or verificatio
16
16
  ## Basic Recording
17
17
 
18
18
  ```bash
19
- # Start recording
19
+ # Launch the browser, then start recording
20
+ agent-browser open https://example.com
20
21
  agent-browser record start ./demo.webm
21
22
 
22
23
  # Perform actions
23
- agent-browser open https://example.com
24
24
  agent-browser snapshot -i
25
25
  agent-browser click @e1
26
26
  agent-browser fill @e2 "test input"
@@ -32,6 +32,9 @@ agent-browser record stop
32
32
  ## Recording Commands
33
33
 
34
34
  ```bash
35
+ # Launch a session first
36
+ agent-browser open
37
+
35
38
  # Start recording to file
36
39
  agent-browser record start ./output.webm
37
40
 
@@ -50,10 +53,9 @@ agent-browser record restart ./take2.webm
50
53
  #!/bin/bash
51
54
  # Record automation for debugging
52
55
 
53
- agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
54
-
55
56
  # Run your automation
56
57
  agent-browser open https://app.example.com
58
+ agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
57
59
  agent-browser snapshot -i
58
60
  agent-browser click @e1 || {
59
61
  echo "Click failed - check recording"
@@ -70,9 +72,8 @@ agent-browser record stop
70
72
  #!/bin/bash
71
73
  # Record workflow for documentation
72
74
 
73
- agent-browser record start ./docs/how-to-login.webm
74
-
75
75
  agent-browser open https://app.example.com/login
76
+ agent-browser record start ./docs/how-to-login.webm
76
77
  agent-browser wait 1000 # Pause for visibility
77
78
 
78
79
  agent-browser snapshot -i
@@ -99,6 +100,7 @@ TEST_NAME="${1:-e2e-test}"
99
100
  RECORDING_DIR="./test-recordings"
100
101
  mkdir -p "$RECORDING_DIR"
101
102
 
103
+ agent-browser open
102
104
  agent-browser record start "$RECORDING_DIR/$TEST_NAME-$(date +%s).webm"
103
105
 
104
106
  # Run test
@@ -141,6 +143,7 @@ cleanup() {
141
143
  }
142
144
  trap cleanup EXIT
143
145
 
146
+ agent-browser open
144
147
  agent-browser record start ./automation.webm
145
148
  # ... automation steps ...
146
149
  ```
@@ -149,9 +152,8 @@ agent-browser record start ./automation.webm
149
152
 
150
153
  ```bash
151
154
  # Record video AND capture key frames
152
- agent-browser record start ./flow.webm
153
-
154
155
  agent-browser open https://example.com
156
+ agent-browser record start ./flow.webm
155
157
  agent-browser screenshot ./screenshots/step1-homepage.png
156
158
 
157
159
  agent-browser click @e1
@@ -1 +0,0 @@
1
- pnpm