npm - agent-browser - Versions diffs - 0.27.0 → 0.27.1 - Mend

agent-browser 0.27.0 → 0.27.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/README.md +30 -20
package/bin/agent-browser-darwin-arm64 +0 -0
package/bin/agent-browser-darwin-x64 +0 -0
package/bin/agent-browser-linux-arm64 +0 -0
package/bin/agent-browser-linux-musl-arm64 +0 -0
package/bin/agent-browser-linux-musl-x64 +0 -0
package/bin/agent-browser-linux-x64 +0 -0
package/bin/agent-browser-win32-x64.exe +0 -0
package/package.json +6 -1
package/skill-data/core/SKILL.md +9 -7
package/skill-data/core/references/commands.md +12 -3
package/skill-data/core/references/video-recording.md +10 -8
package/bin/.install-method +0 -1

package/README.md CHANGED Viewed

@@ -42,6 +42,8 @@ agent-browser install  # Download Chrome from Chrome for Testing (first time onl
 ### From Source
+Requires Node.js 24+, pnpm 11+, and Rust.
 ```bash
 git clone https://github.com/vercel-labs/agent-browser
 cd agent-browser
@@ -73,6 +75,7 @@ Detects your installation method (npm, Homebrew, or Cargo) and runs the appropri
 ### Requirements
 - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. No Playwright or Node.js required for the daemon.
+- **Node.js 24+ and pnpm 11+** - Only needed when building from source.
 - **Rust** - Only needed when building from source (see From Source above).
 ## Quick Start
@@ -87,6 +90,9 @@ agent-browser screenshot page.png
 agent-browser close
 ```
+Headless Chromium screenshots hide native scrollbars for consistent image output.
+Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
 ### Traditional Selectors (also supported)
 ```bash
@@ -359,7 +365,7 @@ agent-browser diff url https://v1.com https://v2.com --selector "#main"  # Scope
 ### Debug
 ```bash
-agent-browser trace start [path]      # Start recording trace
+agent-browser trace start             # Start recording trace
 agent-browser trace stop [path]       # Stop and save trace
 agent-browser profiler start          # Start Chrome DevTools profiling
 agent-browser profiler stop [path]    # Stop and save profile (.json)
@@ -422,7 +428,7 @@ agent-browser react renders start                  # Begin fiber render recordin
 agent-browser react renders stop [--json]          # Stop and print profile (--json for raw data)
 agent-browser react suspense [--only-dynamic] [--json]  # Suspense boundaries + classifier
                                                          # --only-dynamic hides the "static" list
-agent-browser vitals [url] [--json]                # LCP/CLS/TTFB/FCP/INP + React hydration phases
+agent-browser vitals [url] [--json]                # LCP/CLS/TTFB/FCP/INP + hydration summary
 ```
 Each `react ...` subcommand requires `--enable react-devtools` to have been
@@ -432,6 +438,8 @@ binary). Without it the commands error with `React DevTools hook not installed
 Works on any React app — Next.js, Remix, Vite+React, CRA, TanStack Start,
 React Native Web, etc. `vitals` and `pushstate` are framework-agnostic.
+`vitals` prints a summary by default; pass `--json` for the full structured
+payload.
 ### Init scripts
@@ -626,14 +634,14 @@ agent-browser --session-name secure open example.com
 ## Security
-agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:
+agent-browser includes security features for safe AI agent deployments. All features are opt-in, and existing workflows are unaffected until you explicitly enable a feature:
-- **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
-- **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
-- **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
-- **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
-- **Action Confirmation** -- Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
-- **Output Length Limits** -- Prevent context flooding: `--max-output 50000`
+- **Authentication Vault**: Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
+- **Content Boundary Markers**: Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
+- **Domain Allowlist**: Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
+- **Action Policy**: Gate destructive actions with a static policy file: `--action-policy ./policy.json`
+- **Action Confirmation**: Require explicit approval for sensitive action categories: `--confirm-actions eval,download`
+- **Output Length Limits**: Prevent context flooding: `--max-output 50000`
 | Variable                            | Description                              |
 | ----------------------------------- | ---------------------------------------- |
@@ -710,6 +718,7 @@ This is useful for multimodal AI models that can reason about visual layout, unl
 | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
 | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
 | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
+| `--hide-scrollbars <bool>` | Hide native scrollbars in headless Chromium screenshots, enabled by default (or `AGENT_BROWSER_HIDE_SCROLLBARS` env) |
 | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
 | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
 | `--json` | JSON output (for agents) |
@@ -755,11 +764,11 @@ agent-browser dashboard stop
 The dashboard runs as a standalone background process on port 4848, independent of browser sessions. It stays available even when no sessions are running, and it works from `http://localhost:4848` or a proxied/forwarded URL that reaches the dashboard server, such as `https://dashboard.agent-browser.localhost` or a Coder workspace URL. The browser stays on the dashboard origin; session-specific tabs, status, and stream traffic are proxied internally, so session ports do not need to be exposed.
 The dashboard displays:
-- **Live viewport** -- real-time JPEG frames from the browser
-- **Activity feed** -- chronological command/result stream with timing and expandable details
-- **Console output** -- browser console messages (log, warn, error)
-- **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
-- **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
+- **Live viewport**: real-time JPEG frames from the browser
+- **Activity feed**: chronological command/result stream with timing and expandable details
+- **Console output**: browser console messages (log, warn, error)
+- **Session creation**: create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
+- **AI Chat**: chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
 ### AI Chat
@@ -793,8 +802,8 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
 **Locations (lowest to highest priority):**
-1. `~/.agent-browser/config.json` -- user-level defaults
-2. `./agent-browser.json` -- project-level overrides (in working directory)
+1. `~/.agent-browser/config.json`: user-level defaults
+2. `./agent-browser.json`: project-level overrides (in working directory)
 3. `AGENT_BROWSER_*` environment variables override config file values
 4. CLI flags override everything
@@ -806,6 +815,7 @@ Create an `agent-browser.json` file to set persistent defaults instead of repeat
   "proxy": "http://localhost:8080",
   "profile": "./browser-data",
   "userAgent": "my-agent/1.0",
+  "hideScrollbars": false,
   "ignoreHttpsErrors": true
 }
 ```
@@ -1229,7 +1239,7 @@ The daemon starts automatically on first command and persists between commands f
 ### Just ask the agent
-The simplest approach -- just tell your agent to use it:
+The simplest approach is to tell your agent to use it:
 ```
 Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
@@ -1245,7 +1255,7 @@ Add the skill to your AI coding assistant for richer context:
 npx skills add vercel-labs/agent-browser
 ```
-This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically -- do not copy `SKILL.md` from `node_modules` as it will become stale.
+This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf. The skill is fetched from the repository, so it stays up to date automatically. Do not copy `SKILL.md` from `node_modules` as it will become stale.
 ### Claude Code
@@ -1474,8 +1484,8 @@ Optional configuration via environment variables:
 | Variable                 | Description                                                                      | Default |
 | ------------------------ | -------------------------------------------------------------------------------- | ------- |
-| `KERNEL_HEADLESS`        | Run browser in headless mode (`true`/`false`)                                    | `false` |
-| `KERNEL_STEALTH`         | Enable stealth mode to avoid bot detection (`true`/`false`)                      | `true`  |
+| `KERNEL_HEADLESS`        | Run browser in headless mode (`true`/`false`)                                    | `true`  |
+| `KERNEL_STEALTH`         | Enable stealth mode to avoid bot detection (`true`/`false`)                      | `false` |
 | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds                                                       | `300`   |
 | `KERNEL_PROFILE_NAME`    | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none)  |

package/bin/agent-browser-darwin-arm64 CHANGED Viewed

Binary file

package/bin/agent-browser-darwin-x64 CHANGED Viewed

Binary file

package/bin/agent-browser-linux-arm64 CHANGED Viewed

Binary file

package/bin/agent-browser-linux-musl-arm64 CHANGED Viewed

Binary file

package/bin/agent-browser-linux-musl-x64 CHANGED Viewed

Binary file

package/bin/agent-browser-linux-x64 CHANGED Viewed

Binary file

package/bin/agent-browser-win32-x64.exe CHANGED Viewed

Binary file

package/package.json CHANGED Viewed

@@ -1,8 +1,13 @@
 {
   "name": "agent-browser",
-  "version": "0.27.0",
+  "version": "0.27.1",
   "description": "Browser automation CLI for AI agents",
   "type": "module",
+  "packageManager": "pnpm@11.1.3",
+  "engines": {
+    "node": ">=24.0.0",
+    "pnpm": ">=11.0.0"
+  },
   "files": [
     "bin",
     "scripts",

package/skill-data/core/SKILL.md CHANGED Viewed

@@ -243,6 +243,9 @@ agent-browser screenshot --full full.png        # full scroll height
 agent-browser screenshot --annotate map.png     # numbered labels + legend keyed to snapshot refs
 ```
+Headless Chromium screenshots hide native scrollbars for consistent image output.
+Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
 `--annotate` is designed for multimodal models: each label `[N]` maps to ref `@eN`.
 ### Handle multiple pages via tabs
@@ -250,13 +253,11 @@ agent-browser screenshot --annotate map.png     # numbered labels + legend keyed
 ```bash
 agent-browser tab                      # list open tabs (with stable tabId)
 agent-browser tab new https://docs...  # open a new tab (and switch to it)
-agent-browser tab 2                    # switch to tab 2
-agent-browser tab close 2              # close tab 2
+agent-browser tab t2                   # switch to tab t2
+agent-browser tab close t2             # close tab t2
 ```
-Stable `tabId`s mean `tab 2` points at the same tab across commands even
-when other tabs open or close. After switching, refs from a prior snapshot
-on a different tab no longer apply — re-snapshot.
+Stable `tabId`s mean `t2` points at the same tab across commands even when other tabs open or close. After switching, refs from a prior snapshot on a different tab no longer apply — re-snapshot.
 ### Run multiple browsers in parallel
@@ -287,8 +288,8 @@ agent-browser network har stop /tmp/trace.har
 ### Record a video of the workflow
 ```bash
-agent-browser record start demo.webm
 agent-browser open https://example.com
+agent-browser record start demo.webm
 agent-browser snapshot -i
 agent-browser click @e3
 agent-browser record stop
@@ -444,7 +445,8 @@ agent-browser pushstate <url>                    # SPA navigation (auto-detects
 ```
 Without `--enable react-devtools`, the `react …` commands error. `vitals`
-and `pushstate` work on any site regardless of framework.
+and `pushstate` work on any site regardless of framework. `vitals` prints a
+summary by default; use `--json` for the full structured payload.
 ## Working safely

package/skill-data/core/references/commands.md CHANGED Viewed

@@ -103,9 +103,13 @@ agent-browser screenshot --full   # Full page
 agent-browser pdf output.pdf      # Save as PDF
 ```
+Headless Chromium screenshots hide native scrollbars for consistent image output.
+Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
 ## Video Recording
 ```bash
+agent-browser open https://example.com     # Launch a browser session first
 agent-browser record start ./demo.webm    # Start recording
 agent-browser click @e1                   # Perform actions
 agent-browser record stop                 # Stop and save video
@@ -300,7 +304,6 @@ agent-browser state load auth.json    # Restore saved state
 agent-browser --session <name> ...    # Isolated browser session
 agent-browser --json ...              # JSON output for parsing
 agent-browser --headed ...            # Show browser window (not headless)
-agent-browser --full ...              # Full page screenshot (-f)
 agent-browser --cdp <port> ...        # Connect via Chrome DevTools Protocol
 agent-browser -p <provider> ...       # Cloud browser provider (--provider)
 agent-browser --proxy <url> ...       # Use proxy server
@@ -309,6 +312,7 @@ agent-browser --headers <json> ...    # HTTP headers scoped to URL's origin
 agent-browser --executable-path <p>   # Custom browser executable
 agent-browser --extension <path> ...  # Load browser extension (repeatable)
 agent-browser --ignore-https-errors   # Ignore SSL certificate errors
+agent-browser --hide-scrollbars false # Keep native scrollbars visible in headless Chromium screenshots
 agent-browser --help                  # Show help (-h)
 agent-browser --version               # Show version (-V)
 agent-browser <command> --help        # Show detailed help for a command
@@ -327,7 +331,7 @@ agent-browser errors --clear              # Clear errors
 agent-browser highlight @e1               # Highlight element
 agent-browser inspect                     # Open Chrome DevTools for this session
 agent-browser trace start                 # Start recording trace
-agent-browser trace stop trace.zip        # Stop and save trace
+agent-browser trace stop trace.json       # Stop and save trace
 agent-browser profiler start              # Start Chrome DevTools profiling
 agent-browser profiler stop trace.json    # Stop and save profile
 ```
@@ -349,6 +353,9 @@ agent-browser vitals [url] [--json]                 # LCP/CLS/TTFB/FCP/INP + hyd
 agent-browser pushstate <url>                       # SPA client-side nav (auto-detects Next router)
 ```
+`vitals` prints a summary by default and uses the same fields as the structured
+`--json` response.
 ## Init scripts
 ```bash
@@ -383,7 +390,9 @@ AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
 AGENT_BROWSER_EXTENSIONS="/ext1,/ext2"       # Comma-separated extension paths
 AGENT_BROWSER_INIT_SCRIPTS="/a.js,/b.js"     # Comma-separated init script paths
 AGENT_BROWSER_ENABLE="react-devtools"        # Comma-separated built-in init script features
+AGENT_BROWSER_HIDE_SCROLLBARS="false"        # Keep native scrollbars visible in headless Chromium screenshots
 AGENT_BROWSER_PROVIDER="browserbase"         # Cloud browser provider
 AGENT_BROWSER_STREAM_PORT="9223"             # Override WebSocket streaming port (default: OS-assigned)
-AGENT_BROWSER_HOME="/path/to/agent-browser"  # Custom install location
+AGENT_BROWSER_CONFIG="./agent-browser.json"  # Custom config file
+AGENT_BROWSER_CDP="9222"                     # Connect daemon to CDP port or WebSocket URL
 ```

package/skill-data/core/references/video-recording.md CHANGED Viewed

@@ -16,11 +16,11 @@ Capture browser automation as video for debugging, documentation, or verificatio
 ## Basic Recording
 ```bash
-# Start recording
+# Launch the browser, then start recording
+agent-browser open https://example.com
 agent-browser record start ./demo.webm
 # Perform actions
-agent-browser open https://example.com
 agent-browser snapshot -i
 agent-browser click @e1
 agent-browser fill @e2 "test input"
@@ -32,6 +32,9 @@ agent-browser record stop
 ## Recording Commands
 ```bash
+# Launch a session first
+agent-browser open
 # Start recording to file
 agent-browser record start ./output.webm
@@ -50,10 +53,9 @@ agent-browser record restart ./take2.webm
 #!/bin/bash
 # Record automation for debugging
-agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
 # Run your automation
 agent-browser open https://app.example.com
+agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
 agent-browser snapshot -i
 agent-browser click @e1 || {
     echo "Click failed - check recording"
@@ -70,9 +72,8 @@ agent-browser record stop
 #!/bin/bash
 # Record workflow for documentation
-agent-browser record start ./docs/how-to-login.webm
 agent-browser open https://app.example.com/login
+agent-browser record start ./docs/how-to-login.webm
 agent-browser wait 1000  # Pause for visibility
 agent-browser snapshot -i
@@ -99,6 +100,7 @@ TEST_NAME="${1:-e2e-test}"
 RECORDING_DIR="./test-recordings"
 mkdir -p "$RECORDING_DIR"
+agent-browser open
 agent-browser record start "$RECORDING_DIR/$TEST_NAME-$(date +%s).webm"
 # Run test
@@ -141,6 +143,7 @@ cleanup() {
 }
 trap cleanup EXIT
+agent-browser open
 agent-browser record start ./automation.webm
 # ... automation steps ...
 ```
@@ -149,9 +152,8 @@ agent-browser record start ./automation.webm
 ```bash
 # Record video AND capture key frames
-agent-browser record start ./flow.webm
 agent-browser open https://example.com
+agent-browser record start ./flow.webm
 agent-browser screenshot ./screenshots/step1-homepage.png
 agent-browser click @e1

package/bin/.install-method DELETED Viewed

	@@ -1 +0,0 @@
1	- pnpm