npm - @dyyz1993/agent-browser - Versions diffs - 0.11.0 → 0.11.2 - Mend

@dyyz1993/agent-browser 0.11.0 → 0.11.2

Files changed (48)

package/bin/agent-browser-linux-x64 +0 -0
package/dist/cli/help.js +1 -1
package/dist/openapi.js +1 -1
package/dist/stream-server-standalone.js +3 -3
package/dist/viewer-script.js +54 -54
package/package.json +1 -1
package/skills/agent-browser/SKILL.md +279 -229
package/skills/agent-browser/references/mobile-viewer.md +188 -0
package/skills/agent-browser/references/viewer-mode.md +148 -0
package/skills/agent-browser/templates/api-interception.sh +3 -1
package/skills/agent-browser/templates/data-extraction.sh +8 -4
package/skills/agent-browser/templates/form-automation.sh +18 -23
package/skills/agent-browser/templates/network-intercept-crawl.sh +1 -0
package/skills/agent-browser/templates/recorder-workflow.sh +51 -0
package/skills/agent-browser/templates/viewer-remote.sh +41 -0
package/bin/agent-browser-darwin-arm64 +0 -0
package/scripts/check_goods_container.js +0 -35
package/scripts/check_page_content.js +0 -36
package/scripts/click_applause_rate.js +0 -30
package/scripts/e2e-test-recorder.ts +0 -584
package/scripts/explore_jd_page.js +0 -31
package/scripts/extract_all_jd_data.js +0 -80
package/scripts/extract_jd_product_detail.js +0 -62
package/scripts/extract_jd_products_correct_links.js +0 -78
package/scripts/extract_jd_products_final.js +0 -80
package/scripts/extract_jd_reviews.js +0 -48
package/scripts/extract_jd_seafood_final.js +0 -78
package/scripts/extract_multiple_products.js +0 -77
package/scripts/extract_products_no_scroll.js +0 -68
package/scripts/extract_products_simple.js +0 -68
package/scripts/find_applause_rate.js +0 -26
package/scripts/find_jd_links.js +0 -28
package/scripts/find_main_content.js +0 -20
package/scripts/find_product_cards.js +0 -38
package/scripts/find_root_content.js +0 -26
package/scripts/find_unique_products.js +0 -55
package/scripts/get_jd_product_detail.js +0 -16
package/scripts/get_jd_products.js +0 -23
package/scripts/get_jd_seafood_products.js +0 -44
package/scripts/get_product_details_from_images.js +0 -54
package/scripts/scroll_and_get_products.js +0 -47
package/scripts/scroll_deep_and_find.js +0 -45
package/scripts/verify-baidu-enter.ts +0 -116
package/scripts/verify-form.sh +0 -67
package/scripts/verify-login.sh +0 -65
package/scripts/verify-recording.sh +0 -80
package/scripts/verify-upload.sh +0 -41
package/skills/agent-browser/references/profiling.md +0 -120
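
The per-file counts in the listing above can be totalled mechanically. The following is a minimal sketch, assuming the listing has been saved one entry per line in the `path +added -removed` form shown above. `tally_changes` is a hypothetical helper name and is not part of the package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum the "+N -M" columns of a file listing read from stdin.
tally_changes() {
  awk 'NF >= 3 {
    added   += substr($(NF-1), 2)   # second-to-last field is "+N", drop the sign
    removed += substr($NF, 2)       # last field is "-M", drop the sign
    files++
  }
  END { printf "%d files, +%d -%d\n", files, added, removed }'
}
```

Usage, assuming the listing is stored in a file named `files.txt` (an illustrative name): `tally_changes < files.txt`.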

package/skills/agent-browser/SKILL.md CHANGED

@@ -1,12 +1,12 @@
 ---
 name: agent-browser
-description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
+description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, viewer/streaming mode, mobile remote control, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", "view remote browser", "mobile browsing", or any task requiring programmatic web interaction.
 allowed-tools: Bash(agent-browser:*)
 ---
 # Browser Automation with agent-browser
-## Core Workflow
+## Quick Start
 Every browser automation follows this pattern:
@@ -27,144 +27,191 @@ agent-browser wait --load networkidle
 agent-browser snapshot -i  # Check result
 ```
-### Recording & Replaying Workflows
+## Essential Commands
-For test automation and workflow capture:
+### Navigation
 ```bash
-# Start recording
-agent-browser recorder start --session my-test
-# Perform workflow
-agent-browser open https://example.com/form
-agent-browser snapshot -i
-agent-browser fill @e1 "user@example.com"
-agent-browser fill @e2 "password123"
-agent-browser click @e3
-# Stop and save
-agent-browser recorder stop --output test-workflow.yaml
-# Replay later
-agent-browser recorder replay test-workflow.yaml
+agent-browser open <url>              # Navigate (aliases: goto, navigate)
+agent-browser back                   # Go back
+agent-browser forward                # Go forward
+agent-browser reload                 # Reload page
+agent-browser close                  # Close browser (alias: quit, exit)
 ```
-See [recorder.md](references/recorder.md) for detailed recording workflows.
-## Working with Iframes
-Use `--in-frame` to operate inside iframes. The path uses iframe name/id or index:
+### Element Interaction
 ```bash
-# Direct iframe by ID or name
-agent-browser snapshot --in-frame "#my-iframe"
-# Nested iframe using path (name/id or index)
-agent-browser snapshot --in-frame "#outer-frame/inner-frame"
-# Example: Click element inside nested cross-origin iframe
-agent-browser open https://example.com
-agent-browser snapshot --in-frame "#iframe-container"
-agent-browser click @e1 --in-frame "#iframe-container/login-frame"
-agent-browser fill #username "admin" --in-frame "#iframe-container/login-frame"
-agent-browser get value #username --in-frame "#iframe-container/login-frame"
+agent-browser click @e1               # Click element
+agent-browser dblclick @e1            # Double-click
+agent-browser fill @e2 "text"         # Clear and type text
+agent-browser type @e2 "text"         # Type without clearing
+agent-browser select @e1 "option"     # Select dropdown option
+agent-browser check @e1               # Check checkbox
+agent-browser uncheck @e1             # Uncheck checkbox
+agent-browser press Enter             # Press key (alias: key)
+agent-browser keydown / keyup         # Raw key down / up
+agent-browser hover @e1               # Hover over element
+agent-browser focus @e1               # Focus element
+agent-browser drag @e1 @e2            # Drag from e1 to e2
+agent-browser upload @e1 "/path"      # Upload file
+agent-browser download @e1 "/path"   # Download resource
 ```
-### Frame Path Syntax
-The frame path supports:
-- **ID/Name**: `#frame-id` or `#frame-name`
-- **Index**: `#0`, `#1` (by position)
-- **Nested**: `#parent/child/grandchild`
+### Scrolling
-Examples:
-- `#my-iframe` - Single iframe
-- `#0` - First iframe
-- `#outer-iframe/login-frame` - Nested iframes by name
-- `#0/1` - First iframe's second child
+```bash
+agent-browser scroll down 500         # Scroll pixels
+agent-browser scrollintoview @e1     # Scroll element into view
+```
-## Essential Commands
+### Snapshot & Inspection
 ```bash
-# Navigation
-agent-browser open <url>              # Navigate (aliases: goto, navigate)
-agent-browser close                   # Close browser
-# Snapshot
 agent-browser snapshot -i             # Interactive elements with refs (recommended)
-agent-browser snapshot -i -C          # Include cursor-interactive elements (divs with onclick, cursor:pointer)
+agent-browser snapshot -i -C          # Include cursor-interactive elements
 agent-browser snapshot -s "#selector" # Scope to CSS selector
 agent-browser snapshot -s "body" --path   # Include xpath and cssPath in refs
 agent-browser snapshot -s "body" --attrs  # Include element attributes in refs
+agent-browser snapshot -i --json       # JSON output for parsing
+```
-# Interaction (use @refs from snapshot)
-agent-browser click @e1               # Click element
-agent-browser fill @e2 "text"         # Clear and type text
-agent-browser type @e2 "text"         # Type without clearing
-agent-browser select @e1 "option"     # Select dropdown option
-agent-browser check @e1               # Check checkbox
-agent-browser press Enter             # Press key
-agent-browser scroll down 500         # Scroll page
+### Getting Information
-# Get information
-agent-browser get text @e1            # Get element text
+```bash
+agent-browser get text @e1            # Get element text content
 agent-browser get url                 # Get current URL
 agent-browser get title               # Get page title
+agent-browser get count ".item"       # Count matching elements
+agent-browser get box @e1             # Bounding box {x,y,width,height}
+agent-browser get styles @e1           # Computed styles
+agent-browser is visible @e1          # Visibility check
+agent-browser is enabled @e1          # Enabled check
+agent-browser is checked @e1          # Checked state
+```
-# Wait
-agent-browser wait @e1                # Wait for element
+### Waiting
+```bash
+agent-browser wait @e1                # Wait for element to appear
 agent-browser wait --load networkidle # Wait for network idle
-agent-browser wait --url "**/page"    # Wait for URL pattern
-agent-browser wait 2000               # Wait milliseconds
+agent-browser wait --load domcontentloaded  # Wait for DOM ready
+agent-browser wait --url "**/page"    # Wait for URL pattern match
+agent-browser wait --text "Hello"     # Wait for text on page
+agent-browser wait --fn "document.hidden === false"  # Wait for JS expression
+agent-browser wait --download         # Wait for download to complete
+agent-browser wait 2000               # Wait milliseconds (fixed delay)
+agent-browser wait --request "api/data"  # Wait for specific network request (background listener)
+```
-# Network monitoring
-agent-browser network requests                 # View network requests
-agent-browser network requests --filter "**/api/**"  # Filter requests
-agent-browser network requests --clear         # Clear history
-agent-browser network route "**/api/**" --abort  # Block requests
-agent-browser network route "**/api/**" --body '{}'  # Mock response
-agent-browser network unroute "**/api/**"     # Remove routes
+### Capture
-# Capture
+```bash
 agent-browser screenshot              # Screenshot to temp dir
 agent-browser screenshot --full       # Full page screenshot
+agent-browser screenshot output.png  # Save to file
 agent-browser pdf output.pdf          # Save as PDF
 ```
-## Human-like Mouse Movement
+### Network Monitoring
+```bash
+agent-browser network requests                 # View all network requests
+agent-browser network requests --filter "**/api/**"  # Filter by URL pattern
+agent-browser network requests --clear         # Clear request history
+agent-browser network requests --capture-response  # Capture response bodies
+agent-browser network requests --capture-response --type json  # Filter captured by content type
+agent-browser network requests --output ./captures/  # Save captures to directory
+agent-browser network route "**/api/**" --abort  # Block requests
+agent-browser network route "**/api/**" --body '{"users": []}'  # Mock response
+agent-browser network route "**/api/**" --status 404  # Mock status code
+agent-browser network unroute "**/api/**"     # Remove route
+```
+See [network-monitoring.md](references/network-monitoring.md) for advanced patterns.
+### Tabs & Windows
+```bash
+agent-browser tab list                # List all tabs
+agent-browser tab new                 # Open new tab
+agent-browser tab close 2             # Close tab by index
+agent-browser tab switch 0            # Switch to tab
+agent-browser window new              # Open new window
+```
+### Dialogs & Alerts
+```bash
+agent-browser dialog accept            # Accept alert/dialog
+agent-browser dialog dismiss           # Dismiss alert/dialog
+```
+### Browser State
+```bash
+agent-browser state save auth.json    # Save cookies/localStorage/session
+agent-browser state clear             # Clear all state
+agent-browser storage session dump     # Dump session storage
+agent-browser storage session load     # Load session storage
+agent-browser cookies set name value domain  # Set cookie
+agent-browser cookies export            # Export all cookies
+```
+### Debugging
+```bash
+agent-browser console "1+1"           # Evaluate JS in browser console
+agent-browser errors                   # Show recent page errors
+agent-browser highlight @e1            # Highlight element on page
+agent-browser trace start             # Start Chrome trace
+agent-browser trace stop ./trace.json  # Stop and save trace
+```
-Enable globally via environment variable to simulate natural mouse trajectories:
+### Session Management
 ```bash
-# Enable human mode (default: arc path type)
-export AGENT_BROWSER_HUMAN=1
-# Or specify path type
-export AGENT_BROWSER_HUMAN=bezier   # Bezier curve with overshoot
-export AGENT_BROWSER_HUMAN=arc      # Smooth arc (default, most natural)
-export AGENT_BROWSER_HUMAN=random   # Random path with jitter
-export AGENT_BROWSER_HUMAN=linear   # Straight line (fastest)
-# All interactions will use human-like movement
-agent-browser click @e1
-agent-browser fill @e1 "text"
-agent-browser type @e1 "text"
-agent-browser hover @e1
-agent-browser dblclick @e1
-# Wait with mouse wandering (when human mode enabled)
-agent-browser wait 3000  # Wanders mouse while waiting
-# Disable human mode
-unset AGENT_BROWSER_HUMAN
+agent-browser --session site1 open https://a.com   # Named session
+agent-browser --session site2 open https://b.com   # Parallel session
+agent-browser session list                       # List active sessions
+agent-browser connect ws://localhost:9222        # Connect to remote CDP browser
+agent-browser kill                                 # Kill daemon process
+agent-browser config                               # Show/edit config
+agent-browser config [--json]                      # Config as JSON
 ```
-**Features:**
-- Continues from last mouse position for realistic trajectories
-- Natural acceleration/deceleration curves
-- Randomized delays between movements
-- Four trajectory types: `arc` (default), `bezier`, `random`, `linear`
-- `wait <ms>` automatically does mouse wandering when enabled
+## Global Options
+These flags work with most commands:
+| Flag                       | Description                                    |
+| -------------------------- | ---------------------------------------------- |
+| `--session <name>`         | Named browser session                          |
+| `--json`                   | JSON output format                             |
+| `--headed`                 | Show visible browser window                    |
+| `--cdp <url>`              | Connect via Chrome DevTools Protocol directly  |
+| `-p/--provider`            | Provider: ios, browserbase, kernel, browseruse |
+| `--proxy <url>`            | HTTP/SOCKS5 proxy                              |
+| `--proxy-bypass <rules>`   | Proxy bypass rules                             |
+| `--headers 'K: V'`         | Extra HTTP headers per request                 |
+| `--state <path>`           | Restore browser state from file                |
+| `--profile <path>`         | Chrome profile directory                       |
+| `--args "<args>"`          | Extra Chromium launch arguments                |
+| `--user-agent <ua>`        | Custom User-Agent string                       |
+| `--executable-path <path>` | Browser binary path                            |
+| `--extension <path>`       | Load .crx Chrome extension                     |
+| `--ignore-https-errors`    | Ignore HTTPS certificate errors                |
+| `--allow-file-access`      | Allow file:// URLs                             |
+| `--timeout <ms>`           | Global operation timeout                       |
+| `--debug`                  | Verbose debug logging                          |
+Examples:
+```bash
+agent-browser --proxy http://proxy:8080 open https://example.com
+agent-browser --headed --debug open https://example.com
+agent-browser --user-agent "MyBot/1.0" open https://example.com
+```
 ## Common Patterns
@@ -193,7 +240,7 @@ agent-browser click @e3
 agent-browser wait --url "**/dashboard"
 agent-browser state save auth.json
-# Reuse in future sessions (use --state flag)
+# Reuse in future sessions
 agent-browser --state auth.json open https://app.example.com/dashboard
 ```
@@ -202,196 +249,199 @@ agent-browser --state auth.json open https://app.example.com/dashboard
 ```bash
 agent-browser open https://example.com/products
 agent-browser snapshot -i
-agent-browser get text @e5           # Get specific element text
-agent-browser get text body > page.txt  # Get all page text
-# JSON output for parsing
-agent-browser snapshot -i --json
-agent-browser get text @e1 --json
+agent-browser get text @e5           # Specific element
+agent-browser get text body > page.txt  # All page text
+agent-browser snapshot -i --json      # JSON for parsing
+agent-browser get text @e1 --json   # Element as JSON
 ```
-### API Interception
+### API Interception (Passive Capture)
-Passively capture API responses without making direct requests. Useful for sites with anti-scraping measures.
+Capture API responses without making direct requests:
 ```bash
-# 1. Open blank page first
 agent-browser open "about:blank"
-# 2. Start request listener in background
 (agent-browser wait --request "api/users" --timeout 30000 > response.json) &
-WAIT_PID=$!
 sleep 1
-# 3. Navigate to trigger the API call
 agent-browser open "https://example.com/user/profile"
-# 4. Wait for response
-wait $WAIT_PID
-# 5. Process captured data
+wait $!
 jq '.body' response.json
 ```
-Example: Capture Douyin user videos
-```bash
-agent-browser open "about:blank"
-(agent-browser wait --request "aweme/post" --timeout 30000 > /tmp/douyin.json) &
-sleep 1
-agent-browser open "https://www.douyin.com/user/xxx"
-sleep 5
-wait
-jq '.body.aweme_list[:10] | map({id, desc, stats})' /tmp/douyin.json
-```
 ### Network Monitoring & API Mocking
-Monitor, filter, and mock network requests for testing and debugging.
 ```bash
-# View all network requests
-agent-browser network requests
-# Filter requests by pattern
 agent-browser network requests --filter "**/api/**"
-# Clear request history
-agent-browser network requests --clear
-# Mock API responses
 agent-browser network route "**/api/users" --body '{"users": []}'
-# Block unwanted requests (ads, tracking)
 agent-browser network route "**/ads/**" --abort
-# Remove routes
 agent-browser network unroute "**/api/users"
 ```
-See [network-monitoring.md](references/network-monitoring.md) for detailed network monitoring patterns.
 ### Parallel Sessions
 ```bash
 agent-browser --session site1 open https://site-a.com
 agent-browser --session site2 open https://site-b.com
 agent-browser --session site1 snapshot -i
-agent-browser --session site2 snapshot -i
 agent-browser session list
 ```
-### Visual Browser (Debugging)
-```bash
-agent-browser --headed open https://example.com
-agent-browser highlight @e1          # Highlight element
-agent-browser record start demo.webm # Record session
-```
 ### Local Files (PDFs, HTML)
 ```bash
-# Open local files with file:// URLs
-agent-browser --allow-file-access open file:///path/to/document.pdf
+agent-browser --allow-file-access open file:///path/to/doc.pdf
 agent-browser --allow-file-access open file:///path/to/page.html
 agent-browser screenshot output.png
 ```
-### iOS Simulator (Mobile Safari)
+### Working with Iframes
+Use `--in-frame` to operate inside iframes:
 ```bash
-# List available iOS simulators
-agent-browser device list
+agent-browser snapshot --in-frame "#my-iframe"
+agent-browser snapshot --in-frame "#outer/inner"  # Nested path
+agent-browser click @e1 --in-frame "#container/frame"
+agent-browser fill #user "admin" --in-frame "#container/login-frame"
+```
-# Launch Safari on a specific device
-agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+Frame path syntax: `#id-or-name`, `#index` (position), `#parent/child` (nested).
-# Same workflow as desktop - snapshot, interact, re-snapshot
-agent-browser -p ios snapshot -i
-agent-browser -p ios click @e1          # Click/tap element
-agent-browser -p ios fill @e2 "text"
-agent-browser -p ios scroll down 500    # Scroll gesture
+### Semantic Locators (Alternative to Refs)
-# Take screenshot
-agent-browser -p ios screenshot mobile.png
+When refs are unavailable, use semantic locators:
-# Close session (shuts down simulator)
-agent-browser -p ios close
+```bash
+agent-browser find text "Sign In" click
+agent-browser find label "Email" fill "user@test.com"
+agent-browser find role button click --name "Submit"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
 ```
-**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
+### Proxy Configuration
-**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
+```bash
+agent-browser --proxy http://proxy:8080 open https://example.com
+agent-browser --proxy socks5://proxy:1080 open https://example.com
+agent-browser --proxy http://user:pass@proxy:8080 --proxy-bypass "localhost,*.internal" open https://example.com
+```
-**Note:** iOS uses standard commands like `click`, `fill`, `scroll` instead of mobile-specific aliases like `tap` or `swipe`.
+## Advanced Features
-## Ref Lifecycle (Important)
+### Recording & Replaying Workflows
-Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
+For test automation and workflow capture:
-- Clicking links or buttons that navigate
-- Form submissions
-- Dynamic content loading (dropdowns, modals)
+```bash
+agent-browser recorder start --session my-test
+agent-browser open https://example.com/form
+agent-browser snapshot -i
+agent-browser fill @e1 "user@example.com"
+agent-browser click @e3
+agent-browser recorder stop --output test-workflow.yaml
+agent-browser recorder replay test-workflow.yaml
+```
+See [recorder.md](references/recorder.md) for details.
+### Human-like Mouse Movement
+Simulate natural mouse trajectories via environment variable:
 ```bash
-agent-browser click @e5              # Navigates to new page
-agent-browser snapshot -i            # MUST re-snapshot
-agent-browser click @e1              # Use new refs
+export AGENT_BROWSER_HUMAN=1           # Enable (default: arc path)
+export AGENT_BROWSER_HUMAN=bezier     # Bezier curve with overshoot
+export AGENT_BROWSER_HUMAN=random     # Random path with jitter
+export AGENT_BROWSER_HUMAN=linear     # Straight line (fastest)
+agent-browser click @e1              # Uses human trajectory
+agent-browser wait 3000              # Mouse wandering while waiting
+unset AGENT_BROWSER_HUMAN           # Disable
 ```
-**Important for Shell Scripts:** Refs are session-specific and cannot be used in standalone shell scripts. When converting interactive workflows to scripts, use semantic locators or CSS selectors instead. See [references/snapshot-refs.md](references/snapshot-refs.md#converting-to-shell-scripts) for details.
+Features: continuous position tracking, acceleration curves, 4 trajectory types, auto-wandering on wait.
-## Semantic Locators (Alternative to Refs)
+### Viewer / Streaming Mode
-When refs are unavailable or unreliable, use semantic locators:
+Real-time remote browser visualization with frame streaming over WebSocket.
 ```bash
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "user@test.com"
-agent-browser find role button click --name "Submit"
-agent-browser find placeholder "Search" type "query"
-agent-browser find testid "submit-btn" click
+# Start viewer after opening a page
+agent-browser open https://example.com
+agent-browser viewer                    # Opens viewer URL in browser
+agent-browser viewer --json              # Get connection details as JSON
 ```
-## Proxy Configuration
+**Architecture:** Browser -> Daemon (IPC) -> Standalone Server (:5005) -> Viewer (WebSocket)
+**Element Crop Mode:** Stream can be cropped to a specific DOM element's bounds. Coordinates auto-map to element-local space.
+See [viewer-mode.md](references/viewer-mode.md) for architecture details, troubleshooting, and element mode.
-Configure proxy for geo-testing, rate limiting avoidance, and corporate environments:
+### Mobile Remote Control (Touch Devices)
+When viewer is opened on a phone/tablet, it automatically enters **mobile mode** with touch-optimized UI:
+- **Touchpad**: Bottom-area gesture surface (tap=click, drag=move cursor, long-press=drag, 2-finger=scroll)
+- **Input Panel**: Tap remote input field -> local text input appears -> syncs to remote via `input_fill`
+- **Virtual Keyboard Toolbar**: Tab, Arrows, Enter, Backspace, Escape
+- **IME Support**: Chinese/Japanese composition (pinyin etc.) — intermediate input NOT sent to remote
+- **DeviceMode**: Auto-detects device type, switches UI dynamically on resize/orientationchange/matchMedia
+See [mobile-viewer.md](references/mobile-viewer.md) for touchpad gestures, input panel flow, DeviceMode architecture.
+### iOS Simulator (Appium)
+Native iOS automation via Xcode + Appium:
 ```bash
-# Basic proxy via global option
-agent-browser --proxy http://proxy.example.com:8080 open https://example.com
+agent-browser device list                                    # List simulators
+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+agent-browser -p ios snapshot -i && agent-browser -p ios click @e1
+agent-browser -p ios close                                        # Shuts down simulator
+```
+Requires: macOS + Xcode + `npm install -g appium && appium driver install xcuitest`.
+Note: Mobile viewer mode (above) works on ANY phone browser via web viewer — no simulator needed.
-# HTTPS proxy
-agent-browser --proxy https://proxy.example.com:8080 open https://example.com
+### Cloud Browser Providers
-# SOCKS5 proxy
-agent-browser --proxy socks5://proxy.example.com:1080 open https://example.com
+Connect to managed browser services:
-# Authenticated proxy
-agent-browser --proxy http://user:pass@proxy.example.com:8080 open https://example.com
+```bash
+BROWSERBASE_API_KEY=key agent-browser --provider browserbase open https://example.com
+KERNEL_API_KEY=key agent-browser --provider kernel open https://example.com
+BROWSERUSE_API_KEY=key agent-browser --provider browseruse open https://example.com
+```
+Useful for: geo-distributed testing, IP diversity, team sharing, parallel scaling.
+## Ref Lifecycle (Important)
-# Proxy with bypass list
-agent-browser --proxy http://proxy.example.com:8080 --proxy-bypass "localhost,*.internal.com" open https://example.com
+Refs (`@e1`, `@e2`) are invalidated when the page changes. Always re-snapshot after navigation, form submission, or dynamic content loading:
-# Verify proxy is working (check IP)
-agent-browser --proxy http://proxy.example.com:8080 open https://httpbin.org/ip
-agent-browser get text body
+```bash
+agent-browser click @e5              # Navigates to new page
+agent-browser snapshot -i            # MUST re-snapshot
+agent-browser click @e1              # Use new refs
 ```
-**Proxy Validation:** The proxy setting is actively enforced - if you specify an unreachable proxy server, navigation will fail with a connection error, confirming the proxy configuration is being used (not just ignored).
-## Deep-Dive Documentation
-| Reference | When to Use |
-|-----------|-------------|
-| [references/commands.md](references/commands.md) | Full command reference with all options |
-| [references/data-extraction.md](references/data-extraction.md) | **Data extraction patterns: DOM, JS variables, API interception, infinite scroll, iframe** |
-| [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
-| [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
-| [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
-| [references/video-recording.md](references/video-recording.md) | Video recording for debugging |
-| [references/recorder.md](references/recorder.md) | **Action recording & replay for test automation** |
-| [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
-| [references/network-monitoring.md](references/network-monitoring.md) | **Network request monitoring, API mocking, request blocking** |
+Refs are session-specific. For shell scripts, use semantic locators or CSS selectors instead. See [snapshot-refs.md](references/snapshot-refs.md).
+## Reference Docs
+| Reference                                                 | Content                                                       |
+| --------------------------------------------------------- | ------------------------------------------------------------- |
+| [commands.md](references/commands.md)                     | Complete command reference with all options                   |
+| [data-extraction.md](references/data-extraction.md)       | DOM, JS variables, API interception, infinite scroll, iframe  |
+| [snapshot-refs.md](references/snapshot-refs.md)           | Ref lifecycle, invalidation rules, shell script conversion    |
+| [session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping     |
+| [authentication.md](references/authentication.md)         | Login flows, OAuth, 2FA handling, state reuse                 |
+| [video-recording.md](references/video-recording.md)       | Video recording for debugging                                 |
+| [recorder.md](references/recorder.md)                     | Action recording & replay for test automation                 |
+| [proxy-support.md](references/proxy-support.md)           | Proxy config, geo-testing, rotating proxies                   |
+| [network-monitoring.md](references/network-monitoring.md) | Request monitoring, API mocking, request blocking             |
+| [viewer-mode.md](references/viewer-mode.md)               | Streaming viewer, element crop, architecture, troubleshooting |
+| [mobile-viewer.md](references/mobile-viewer.md)           | Touchpad, input panel, IME/CJK support, DeviceMode            |
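
The ref lifecycle rule in the updated SKILL.md (re-snapshot after anything that changes the page) can be wrapped in a small helper. The sketch below is an illustration only and is not part of the package: `click_then_snapshot` and the `AB` variable are hypothetical names. It relies solely on subcommands that appear in the diff above (`click`, `wait --load networkidle`, `snapshot -i`).

```bash
#!/usr/bin/env bash
# Hypothetical: AB names the CLI to invoke, so it can be replaced by a stub in tests.
AB="${AB:-agent-browser}"

# Click a ref, wait for the page to settle, then take a fresh snapshot
# so that later commands use refs from the new page state.
click_then_snapshot() {
  local ref="$1"
  "$AB" click "$ref" || return 1
  "$AB" wait --load networkidle || return 1
  "$AB" snapshot -i
}
```

Usage: `click_then_snapshot @e5` prints the new snapshot, and refs for any follow-up command should be read from that output rather than reused from the earlier one. Keeping the CLI name in a variable is a design choice made for this sketch so that the sequence can be checked without a browser.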