@dyyz1993/agent-browser 0.11.0 → 0.11.2
- package/bin/agent-browser-linux-x64 +0 -0
- package/dist/cli/help.js +1 -1
- package/dist/openapi.js +1 -1
- package/dist/stream-server-standalone.js +3 -3
- package/dist/viewer-script.js +54 -54
- package/package.json +1 -1
- package/skills/agent-browser/SKILL.md +279 -229
- package/skills/agent-browser/references/mobile-viewer.md +188 -0
- package/skills/agent-browser/references/viewer-mode.md +148 -0
- package/skills/agent-browser/templates/api-interception.sh +3 -1
- package/skills/agent-browser/templates/data-extraction.sh +8 -4
- package/skills/agent-browser/templates/form-automation.sh +18 -23
- package/skills/agent-browser/templates/network-intercept-crawl.sh +1 -0
- package/skills/agent-browser/templates/recorder-workflow.sh +51 -0
- package/skills/agent-browser/templates/viewer-remote.sh +41 -0
- package/bin/agent-browser-darwin-arm64 +0 -0
- package/scripts/check_goods_container.js +0 -35
- package/scripts/check_page_content.js +0 -36
- package/scripts/click_applause_rate.js +0 -30
- package/scripts/e2e-test-recorder.ts +0 -584
- package/scripts/explore_jd_page.js +0 -31
- package/scripts/extract_all_jd_data.js +0 -80
- package/scripts/extract_jd_product_detail.js +0 -62
- package/scripts/extract_jd_products_correct_links.js +0 -78
- package/scripts/extract_jd_products_final.js +0 -80
- package/scripts/extract_jd_reviews.js +0 -48
- package/scripts/extract_jd_seafood_final.js +0 -78
- package/scripts/extract_multiple_products.js +0 -77
- package/scripts/extract_products_no_scroll.js +0 -68
- package/scripts/extract_products_simple.js +0 -68
- package/scripts/find_applause_rate.js +0 -26
- package/scripts/find_jd_links.js +0 -28
- package/scripts/find_main_content.js +0 -20
- package/scripts/find_product_cards.js +0 -38
- package/scripts/find_root_content.js +0 -26
- package/scripts/find_unique_products.js +0 -55
- package/scripts/get_jd_product_detail.js +0 -16
- package/scripts/get_jd_products.js +0 -23
- package/scripts/get_jd_seafood_products.js +0 -44
- package/scripts/get_product_details_from_images.js +0 -54
- package/scripts/scroll_and_get_products.js +0 -47
- package/scripts/scroll_deep_and_find.js +0 -45
- package/scripts/verify-baidu-enter.ts +0 -116
- package/scripts/verify-form.sh +0 -67
- package/scripts/verify-login.sh +0 -65
- package/scripts/verify-recording.sh +0 -80
- package/scripts/verify-upload.sh +0 -41
- package/skills/agent-browser/references/profiling.md +0 -120
|
@@ -1,12 +1,12 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: agent-browser
|
|
3
|
-
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
|
|
3
|
+
description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, viewer/streaming mode, mobile remote control, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", "view remote browser", "mobile browsing", or any task requiring programmatic web interaction.
|
|
4
4
|
allowed-tools: Bash(agent-browser:*)
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
# Browser Automation with agent-browser
|
|
8
8
|
|
|
9
|
-
##
|
|
9
|
+
## Quick Start
|
|
10
10
|
|
|
11
11
|
Every browser automation follows this pattern:
|
|
12
12
|
|
|
@@ -27,144 +27,191 @@ agent-browser wait --load networkidle
|
|
|
27
27
|
agent-browser snapshot -i # Check result
|
|
28
28
|
```
|
|
29
29
|
|
|
30
|
-
|
|
30
|
+
## Essential Commands
|
|
31
31
|
|
|
32
|
-
|
|
32
|
+
### Navigation
|
|
33
33
|
|
|
34
34
|
```bash
|
|
35
|
-
#
|
|
36
|
-
agent-browser
|
|
37
|
-
|
|
38
|
-
#
|
|
39
|
-
agent-browser
|
|
40
|
-
agent-browser snapshot -i
|
|
41
|
-
agent-browser fill @e1 "user@example.com"
|
|
42
|
-
agent-browser fill @e2 "password123"
|
|
43
|
-
agent-browser click @e3
|
|
44
|
-
|
|
45
|
-
# Stop and save
|
|
46
|
-
agent-browser recorder stop --output test-workflow.yaml
|
|
47
|
-
|
|
48
|
-
# Replay later
|
|
49
|
-
agent-browser recorder replay test-workflow.yaml
|
|
35
|
+
agent-browser open <url> # Navigate (aliases: goto, navigate)
|
|
36
|
+
agent-browser back # Go back
|
|
37
|
+
agent-browser forward # Go forward
|
|
38
|
+
agent-browser reload # Reload page
|
|
39
|
+
agent-browser close # Close browser (aliases: quit, exit)
|
|
50
40
|
```
|
|
51
41
|
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
## Working with Iframes
|
|
55
|
-
|
|
56
|
-
Use `--in-frame` to operate inside iframes. The path uses iframe name/id or index:
|
|
42
|
+
### Element Interaction
|
|
57
43
|
|
|
58
44
|
```bash
|
|
59
|
-
|
|
60
|
-
agent-browser
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
agent-browser
|
|
64
|
-
|
|
65
|
-
|
|
66
|
-
agent-browser
|
|
67
|
-
agent-browser
|
|
68
|
-
agent-browser
|
|
69
|
-
agent-browser
|
|
70
|
-
agent-browser
|
|
45
|
+
agent-browser click @e1 # Click element
|
|
46
|
+
agent-browser dblclick @e1 # Double-click
|
|
47
|
+
agent-browser fill @e2 "text" # Clear and type text
|
|
48
|
+
agent-browser type @e2 "text" # Type without clearing
|
|
49
|
+
agent-browser select @e1 "option" # Select dropdown option
|
|
50
|
+
agent-browser check @e1 # Check checkbox
|
|
51
|
+
agent-browser uncheck @e1 # Uncheck checkbox
|
|
52
|
+
agent-browser press Enter # Press key (alias: key)
|
|
53
|
+
agent-browser keydown / keyup # Raw key down / up
|
|
54
|
+
agent-browser hover @e1 # Hover over element
|
|
55
|
+
agent-browser focus @e1 # Focus element
|
|
56
|
+
agent-browser drag @e1 @e2 # Drag from e1 to e2
|
|
57
|
+
agent-browser upload @e1 "/path" # Upload file
|
|
58
|
+
agent-browser download @e1 "/path" # Download resource
|
|
71
59
|
```
|
|
72
60
|
|
|
73
|
-
###
|
|
74
|
-
|
|
75
|
-
The frame path supports:
|
|
76
|
-
- **ID/Name**: `#frame-id` or `#frame-name`
|
|
77
|
-
- **Index**: `#0`, `#1` (by position)
|
|
78
|
-
- **Nested**: `#parent/child/grandchild`
|
|
61
|
+
### Scrolling
|
|
79
62
|
|
|
80
|
-
|
|
81
|
-
-
|
|
82
|
-
-
|
|
83
|
-
|
|
84
|
-
- `#0/1` - First iframe's second child
|
|
63
|
+
```bash
|
|
64
|
+
agent-browser scroll down 500 # Scroll pixels
|
|
65
|
+
agent-browser scrollintoview @e1 # Scroll element into view
|
|
66
|
+
```
|
|
85
67
|
|
|
86
|
-
|
|
68
|
+
### Snapshot & Inspection
|
|
87
69
|
|
|
88
70
|
```bash
|
|
89
|
-
# Navigation
|
|
90
|
-
agent-browser open <url> # Navigate (aliases: goto, navigate)
|
|
91
|
-
agent-browser close # Close browser
|
|
92
|
-
|
|
93
|
-
# Snapshot
|
|
94
71
|
agent-browser snapshot -i # Interactive elements with refs (recommended)
|
|
95
|
-
agent-browser snapshot -i -C # Include cursor-interactive elements
|
|
72
|
+
agent-browser snapshot -i -C # Include cursor-interactive elements
|
|
96
73
|
agent-browser snapshot -s "#selector" # Scope to CSS selector
|
|
97
74
|
agent-browser snapshot -s "body" --path # Include xpath and cssPath in refs
|
|
98
75
|
agent-browser snapshot -s "body" --attrs # Include element attributes in refs
|
|
76
|
+
agent-browser snapshot -i --json # JSON output for parsing
|
|
77
|
+
```
|
|
99
78
|
|
|
100
|
-
|
|
101
|
-
agent-browser click @e1 # Click element
|
|
102
|
-
agent-browser fill @e2 "text" # Clear and type text
|
|
103
|
-
agent-browser type @e2 "text" # Type without clearing
|
|
104
|
-
agent-browser select @e1 "option" # Select dropdown option
|
|
105
|
-
agent-browser check @e1 # Check checkbox
|
|
106
|
-
agent-browser press Enter # Press key
|
|
107
|
-
agent-browser scroll down 500 # Scroll page
|
|
79
|
+
### Getting Information
|
|
108
80
|
|
|
109
|
-
|
|
110
|
-
agent-browser get text @e1 # Get element text
|
|
81
|
+
```bash
|
|
82
|
+
agent-browser get text @e1 # Get element text content
|
|
111
83
|
agent-browser get url # Get current URL
|
|
112
84
|
agent-browser get title # Get page title
|
|
85
|
+
agent-browser get count ".item" # Count matching elements
|
|
86
|
+
agent-browser get box @e1 # Bounding box {x,y,width,height}
|
|
87
|
+
agent-browser get styles @e1 # Computed styles
|
|
88
|
+
agent-browser is visible @e1 # Visibility check
|
|
89
|
+
agent-browser is enabled @e1 # Enabled check
|
|
90
|
+
agent-browser is checked @e1 # Checked state
|
|
91
|
+
```
|
|
113
92
|
|
|
114
|
-
|
|
115
|
-
|
|
93
|
+
### Waiting
|
|
94
|
+
|
|
95
|
+
```bash
|
|
96
|
+
agent-browser wait @e1 # Wait for element to appear
|
|
116
97
|
agent-browser wait --load networkidle # Wait for network idle
|
|
117
|
-
agent-browser wait --
|
|
118
|
-
agent-browser wait
|
|
98
|
+
agent-browser wait --load domcontentloaded # Wait for DOM ready
|
|
99
|
+
agent-browser wait --url "**/page" # Wait for URL pattern match
|
|
100
|
+
agent-browser wait --text "Hello" # Wait for text on page
|
|
101
|
+
agent-browser wait --fn "document.hidden === false" # Wait for JS expression
|
|
102
|
+
agent-browser wait --download # Wait for download to complete
|
|
103
|
+
agent-browser wait 2000 # Wait milliseconds (fixed delay)
|
|
104
|
+
agent-browser wait --request "api/data" # Wait for specific network request (background listener)
|
|
105
|
+
```
|
|
119
106
|
|
|
120
|
-
|
|
121
|
-
agent-browser network requests # View network requests
|
|
122
|
-
agent-browser network requests --filter "**/api/**" # Filter requests
|
|
123
|
-
agent-browser network requests --clear # Clear history
|
|
124
|
-
agent-browser network route "**/api/**" --abort # Block requests
|
|
125
|
-
agent-browser network route "**/api/**" --body '{}' # Mock response
|
|
126
|
-
agent-browser network unroute "**/api/**" # Remove routes
|
|
107
|
+
### Capture
|
|
127
108
|
|
|
128
|
-
|
|
109
|
+
```bash
|
|
129
110
|
agent-browser screenshot # Screenshot to temp dir
|
|
130
111
|
agent-browser screenshot --full # Full page screenshot
|
|
112
|
+
agent-browser screenshot output.png # Save to file
|
|
131
113
|
agent-browser pdf output.pdf # Save as PDF
|
|
132
114
|
```
|
|
133
115
|
|
|
134
|
-
|
|
116
|
+
### Network Monitoring
|
|
117
|
+
|
|
118
|
+
```bash
|
|
119
|
+
agent-browser network requests # View all network requests
|
|
120
|
+
agent-browser network requests --filter "**/api/**" # Filter by URL pattern
|
|
121
|
+
agent-browser network requests --clear # Clear request history
|
|
122
|
+
agent-browser network requests --capture-response # Capture response bodies
|
|
123
|
+
agent-browser network requests --capture-response --type json # Filter captured by content type
|
|
124
|
+
agent-browser network requests --output ./captures/ # Save captures to directory
|
|
125
|
+
agent-browser network route "**/api/**" --abort # Block requests
|
|
126
|
+
agent-browser network route "**/api/**" --body '{"users": []}' # Mock response
|
|
127
|
+
agent-browser network route "**/api/**" --status 404 # Mock status code
|
|
128
|
+
agent-browser network unroute "**/api/**" # Remove route
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
See [network-monitoring.md](references/network-monitoring.md) for advanced patterns.
|
|
132
|
+
|
|
133
|
+
### Tabs & Windows
|
|
134
|
+
|
|
135
|
+
```bash
|
|
136
|
+
agent-browser tab list # List all tabs
|
|
137
|
+
agent-browser tab new # Open new tab
|
|
138
|
+
agent-browser tab close 2 # Close tab by index
|
|
139
|
+
agent-browser tab switch 0 # Switch to tab
|
|
140
|
+
agent-browser window new # Open new window
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
### Dialogs & Alerts
|
|
144
|
+
|
|
145
|
+
```bash
|
|
146
|
+
agent-browser dialog accept # Accept alert/dialog
|
|
147
|
+
agent-browser dialog dismiss # Dismiss alert/dialog
|
|
148
|
+
```
|
|
149
|
+
|
|
150
|
+
### Browser State
|
|
151
|
+
|
|
152
|
+
```bash
|
|
153
|
+
agent-browser state save auth.json # Save cookies/localStorage/session
|
|
154
|
+
agent-browser state clear # Clear all state
|
|
155
|
+
agent-browser storage session dump # Dump session storage
|
|
156
|
+
agent-browser storage session load # Load session storage
|
|
157
|
+
agent-browser cookies set name value domain # Set cookie
|
|
158
|
+
agent-browser cookies export # Export all cookies
|
|
159
|
+
```
|
|
160
|
+
|
|
161
|
+
### Debugging
|
|
162
|
+
|
|
163
|
+
```bash
|
|
164
|
+
agent-browser console "1+1" # Evaluate JS in browser console
|
|
165
|
+
agent-browser errors # Show recent page errors
|
|
166
|
+
agent-browser highlight @e1 # Highlight element on page
|
|
167
|
+
agent-browser trace start # Start Chrome trace
|
|
168
|
+
agent-browser trace stop ./trace.json # Stop and save trace
|
|
169
|
+
```
|
|
135
170
|
|
|
136
|
-
|
|
171
|
+
### Session Management
|
|
137
172
|
|
|
138
173
|
```bash
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
142
|
-
#
|
|
143
|
-
|
|
144
|
-
|
|
145
|
-
|
|
146
|
-
export AGENT_BROWSER_HUMAN=linear # Straight line (fastest)
|
|
147
|
-
|
|
148
|
-
# All interactions will use human-like movement
|
|
149
|
-
agent-browser click @e1
|
|
150
|
-
agent-browser fill @e1 "text"
|
|
151
|
-
agent-browser type @e1 "text"
|
|
152
|
-
agent-browser hover @e1
|
|
153
|
-
agent-browser dblclick @e1
|
|
154
|
-
|
|
155
|
-
# Wait with mouse wandering (when human mode enabled)
|
|
156
|
-
agent-browser wait 3000 # Wanders mouse while waiting
|
|
157
|
-
|
|
158
|
-
# Disable human mode
|
|
159
|
-
unset AGENT_BROWSER_HUMAN
|
|
174
|
+
agent-browser --session site1 open https://a.com # Named session
|
|
175
|
+
agent-browser --session site2 open https://b.com # Parallel session
|
|
176
|
+
agent-browser session list # List active sessions
|
|
177
|
+
agent-browser connect ws://localhost:9222 # Connect to remote CDP browser
|
|
178
|
+
agent-browser kill # Kill daemon process
|
|
179
|
+
agent-browser config # Show/edit config
|
|
180
|
+
agent-browser config [--json] # Config as JSON
|
|
160
181
|
```
|
|
161
182
|
|
|
162
|
-
|
|
163
|
-
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
|
|
183
|
+
## Global Options
|
|
184
|
+
|
|
185
|
+
These flags work with most commands:
|
|
186
|
+
|
|
187
|
+
| Flag | Description |
|
|
188
|
+
| -------------------------- | ---------------------------------------------- |
|
|
189
|
+
| `--session <name>` | Named browser session |
|
|
190
|
+
| `--json` | JSON output format |
|
|
191
|
+
| `--headed` | Show visible browser window |
|
|
192
|
+
| `--cdp <url>` | Connect via Chrome DevTools Protocol directly |
|
|
193
|
+
| `-p/--provider` | Provider: ios, browserbase, kernel, browseruse |
|
|
194
|
+
| `--proxy <url>` | HTTP/SOCKS5 proxy |
|
|
195
|
+
| `--proxy-bypass <rules>` | Proxy bypass rules |
|
|
196
|
+
| `--headers 'K: V'` | Extra HTTP headers per request |
|
|
197
|
+
| `--state <path>` | Restore browser state from file |
|
|
198
|
+
| `--profile <path>` | Chrome profile directory |
|
|
199
|
+
| `--args "<args>"` | Extra Chromium launch arguments |
|
|
200
|
+
| `--user-agent <ua>` | Custom User-Agent string |
|
|
201
|
+
| `--executable-path <path>` | Browser binary path |
|
|
202
|
+
| `--extension <path>` | Load .crx Chrome extension |
|
|
203
|
+
| `--ignore-https-errors` | Ignore HTTPS certificate errors |
|
|
204
|
+
| `--allow-file-access` | Allow file:// URLs |
|
|
205
|
+
| `--timeout <ms>` | Global operation timeout |
|
|
206
|
+
| `--debug` | Verbose debug logging |
|
|
207
|
+
|
|
208
|
+
Examples:
|
|
209
|
+
|
|
210
|
+
```bash
|
|
211
|
+
agent-browser --proxy http://proxy:8080 open https://example.com
|
|
212
|
+
agent-browser --headed --debug open https://example.com
|
|
213
|
+
agent-browser --user-agent "MyBot/1.0" open https://example.com
|
|
214
|
+
```
|
|
168
215
|
|
|
169
216
|
## Common Patterns
|
|
170
217
|
|
|
@@ -193,7 +240,7 @@ agent-browser click @e3
|
|
|
193
240
|
agent-browser wait --url "**/dashboard"
|
|
194
241
|
agent-browser state save auth.json
|
|
195
242
|
|
|
196
|
-
# Reuse in future sessions
|
|
243
|
+
# Reuse in future sessions
|
|
197
244
|
agent-browser --state auth.json open https://app.example.com/dashboard
|
|
198
245
|
```
|
|
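The state-reuse step above can be guarded so a script fails early when no saved state exists. This is a minimal sketch: `open_with_saved_state` is a hypothetical helper, and `AGENT_BROWSER_BIN` is an assumed override variable used only by this example (it is not an option of the tool). Only the `--state <path> open <url>` form shown above is used.

```bash
# Hypothetical helper: reuse a saved state file if it exists and is non-empty.
# AGENT_BROWSER_BIN is an assumption of this sketch, used to swap the binary.
open_with_saved_state() {
  ab="${AGENT_BROWSER_BIN:-agent-browser}"
  state_file="$1"
  url="$2"
  if [ -s "$state_file" ]; then
    # Same invocation as the pattern above: restore state, then open the page.
    "$ab" --state "$state_file" open "$url"
  else
    echo "No saved state at $state_file; log in first, then run: agent-browser state save $state_file" >&2
    return 1
  fi
}

# Usage (requires agent-browser and a previously saved auth.json):
#   open_with_saved_state auth.json https://app.example.com/dashboard
```

Checking with `-s` rather than `-e` also rejects an empty file, which would otherwise look like valid state.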
199
246
|
|
|
@@ -202,196 +249,199 @@ agent-browser --state auth.json open https://app.example.com/dashboard
|
|
|
202
249
|
```bash
|
|
203
250
|
agent-browser open https://example.com/products
|
|
204
251
|
agent-browser snapshot -i
|
|
205
|
-
agent-browser get text @e5 #
|
|
206
|
-
agent-browser get text body > page.txt #
|
|
207
|
-
|
|
208
|
-
#
|
|
209
|
-
agent-browser snapshot -i --json
|
|
210
|
-
agent-browser get text @e1 --json
|
|
252
|
+
agent-browser get text @e5 # Specific element
|
|
253
|
+
agent-browser get text body > page.txt # All page text
|
|
254
|
+
agent-browser snapshot -i --json # JSON for parsing
|
|
255
|
+
agent-browser get text @e1 --json # Element as JSON
|
|
211
256
|
```
|
|
212
257
|
|
|
213
|
-
### API Interception
|
|
258
|
+
### API Interception (Passive Capture)
|
|
214
259
|
|
|
215
|
-
|
|
260
|
+
Capture API responses without making direct requests:
|
|
216
261
|
|
|
217
262
|
```bash
|
|
218
|
-
# 1. Open blank page first
|
|
219
263
|
agent-browser open "about:blank"
|
|
220
|
-
|
|
221
|
-
# 2. Start request listener in background
|
|
222
264
|
(agent-browser wait --request "api/users" --timeout 30000 > response.json) &
|
|
223
|
-
WAIT_PID=$!
|
|
224
265
|
sleep 1
|
|
225
|
-
|
|
226
|
-
# 3. Navigate to trigger the API call
|
|
227
266
|
agent-browser open "https://example.com/user/profile"
|
|
228
|
-
|
|
229
|
-
# 4. Wait for response
|
|
230
|
-
wait $WAIT_PID
|
|
231
|
-
|
|
232
|
-
# 5. Process captured data
|
|
267
|
+
wait $!
|
|
233
268
|
jq '.body' response.json
|
|
234
269
|
```
|
|
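The passive-capture sequence above (start the listener in the background, then navigate, then wait for the listener) could be wrapped in a function so the ordering is not repeated by hand. This is a sketch under assumptions: `capture_api_response` is a hypothetical helper, `AGENT_BROWSER_BIN` is an assumed override variable for this example only, and the one-second pause and 30000 ms timeout are copied from the snippet above.

```bash
# Hypothetical wrapper around the passive-capture pattern shown above.
# Arguments: request pattern, page URL, output file for the captured response.
capture_api_response() {
  ab="${AGENT_BROWSER_BIN:-agent-browser}"
  pattern="$1"
  url="$2"
  out="$3"
  "$ab" open "about:blank" || return 1
  # Start the listener first so the request is not missed.
  "$ab" wait --request "$pattern" --timeout 30000 > "$out" &
  listener_pid=$!
  sleep 1                      # Give the listener time to attach
  "$ab" open "$url"            # Navigation triggers the API call
  wait "$listener_pid"         # Returns the listener's exit status
}

# Usage (requires agent-browser and jq):
#   capture_api_response "api/users" "https://example.com/user/profile" response.json \
#     && jq '.body' response.json
```

Storing the listener PID explicitly keeps `wait` correct even if another background job is started later in the same script.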
235
270
|
|
|
236
|
-
Example: Capture Douyin user videos
|
|
237
|
-
```bash
|
|
238
|
-
agent-browser open "about:blank"
|
|
239
|
-
(agent-browser wait --request "aweme/post" --timeout 30000 > /tmp/douyin.json) &
|
|
240
|
-
sleep 1
|
|
241
|
-
agent-browser open "https://www.douyin.com/user/xxx"
|
|
242
|
-
sleep 5
|
|
243
|
-
wait
|
|
244
|
-
jq '.body.aweme_list[:10] | map({id, desc, stats})' /tmp/douyin.json
|
|
245
|
-
```
|
|
246
|
-
|
|
247
271
|
### Network Monitoring & API Mocking
|
|
248
272
|
|
|
249
|
-
Monitor, filter, and mock network requests for testing and debugging.
|
|
250
|
-
|
|
251
273
|
```bash
|
|
252
|
-
# View all network requests
|
|
253
|
-
agent-browser network requests
|
|
254
|
-
|
|
255
|
-
# Filter requests by pattern
|
|
256
274
|
agent-browser network requests --filter "**/api/**"
|
|
257
|
-
|
|
258
|
-
# Clear request history
|
|
259
|
-
agent-browser network requests --clear
|
|
260
|
-
|
|
261
|
-
# Mock API responses
|
|
262
275
|
agent-browser network route "**/api/users" --body '{"users": []}'
|
|
263
|
-
|
|
264
|
-
# Block unwanted requests (ads, tracking)
|
|
265
276
|
agent-browser network route "**/ads/**" --abort
|
|
266
|
-
|
|
267
|
-
# Remove routes
|
|
268
277
|
agent-browser network unroute "**/api/users"
|
|
269
278
|
```
|
|
270
279
|
|
|
271
|
-
See [network-monitoring.md](references/network-monitoring.md) for detailed network monitoring patterns.
|
|
272
|
-
|
|
273
280
|
### Parallel Sessions
|
|
274
281
|
|
|
275
282
|
```bash
|
|
276
283
|
agent-browser --session site1 open https://site-a.com
|
|
277
284
|
agent-browser --session site2 open https://site-b.com
|
|
278
|
-
|
|
279
285
|
agent-browser --session site1 snapshot -i
|
|
280
|
-
agent-browser --session site2 snapshot -i
|
|
281
|
-
|
|
282
286
|
agent-browser session list
|
|
283
287
|
```
|
|
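When the number of sites grows, the parallel-session pattern above can be driven from a list. A minimal sketch, assuming `name=url` pairs as input: `open_in_sessions` is a hypothetical helper, and `AGENT_BROWSER_BIN` is an assumed override variable used only here.

```bash
# Hypothetical helper: open each "name=url" pair in its own named session,
# then list the active sessions.
open_in_sessions() {
  ab="${AGENT_BROWSER_BIN:-agent-browser}"
  for pair in "$@"; do
    name="${pair%%=*}"   # Text before the first "="
    url="${pair#*=}"     # Text after the first "=" (query strings survive)
    "$ab" --session "$name" open "$url" || return 1
  done
  "$ab" session list
}

# Usage (requires agent-browser):
#   open_in_sessions site1=https://site-a.com site2=https://site-b.com
```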
284
288
|
|
|
285
|
-
### Visual Browser (Debugging)
|
|
286
|
-
|
|
287
|
-
```bash
|
|
288
|
-
agent-browser --headed open https://example.com
|
|
289
|
-
agent-browser highlight @e1 # Highlight element
|
|
290
|
-
agent-browser record start demo.webm # Record session
|
|
291
|
-
```
|
|
292
|
-
|
|
293
289
|
### Local Files (PDFs, HTML)
|
|
294
290
|
|
|
295
291
|
```bash
|
|
296
|
-
|
|
297
|
-
agent-browser --allow-file-access open file:///path/to/document.pdf
|
|
292
|
+
agent-browser --allow-file-access open file:///path/to/doc.pdf
|
|
298
293
|
agent-browser --allow-file-access open file:///path/to/page.html
|
|
299
294
|
agent-browser screenshot output.png
|
|
300
295
|
```
|
|
301
296
|
|
|
302
|
-
###
|
|
297
|
+
### Working with Iframes
|
|
298
|
+
|
|
299
|
+
Use `--in-frame` to operate inside iframes:
|
|
303
300
|
|
|
304
301
|
```bash
|
|
305
|
-
|
|
306
|
-
agent-browser
|
|
302
|
+
agent-browser snapshot --in-frame "#my-iframe"
|
|
303
|
+
agent-browser snapshot --in-frame "#outer/inner" # Nested path
|
|
304
|
+
agent-browser click @e1 --in-frame "#container/frame"
|
|
305
|
+
agent-browser fill #user "admin" --in-frame "#container/login-frame"
|
|
306
|
+
```
|
|
307
307
|
|
|
308
|
-
|
|
309
|
-
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
|
|
308
|
+
Frame path syntax: `#id-or-name`, `#index` (position), `#parent/child` (nested).
|
|
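The frame path syntax above is plain string composition, so a script can build it from individual segments. The sketch below assumes only the `#parent/child` form stated above; `frame_path` is a hypothetical helper, not part of agent-browser.

```bash
# Hypothetical helper: join iframe segments (ids, names, or indexes)
# into the "#parent/child" path form used by --in-frame.
frame_path() {
  path=""
  for segment in "$@"; do
    if [ -z "$path" ]; then
      path="#$segment"          # First segment gets the leading "#"
    else
      path="$path/$segment"     # Later segments are nested with "/"
    fi
  done
  printf '%s\n' "$path"
}

# Usage (requires agent-browser and an open page):
#   agent-browser snapshot --in-frame "$(frame_path outer inner)"
```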
310
309
|
|
|
311
|
-
|
|
312
|
-
agent-browser -p ios snapshot -i
|
|
313
|
-
agent-browser -p ios click @e1 # Click/tap element
|
|
314
|
-
agent-browser -p ios fill @e2 "text"
|
|
315
|
-
agent-browser -p ios scroll down 500 # Scroll gesture
|
|
310
|
+
### Semantic Locators (Alternative to Refs)
|
|
316
311
|
|
|
317
|
-
|
|
318
|
-
agent-browser -p ios screenshot mobile.png
|
|
312
|
+
When refs are unavailable, use semantic locators:
|
|
319
313
|
|
|
320
|
-
|
|
321
|
-
agent-browser
|
|
314
|
+
```bash
|
|
315
|
+
agent-browser find text "Sign In" click
|
|
316
|
+
agent-browser find label "Email" fill "user@test.com"
|
|
317
|
+
agent-browser find role button click --name "Submit"
|
|
318
|
+
agent-browser find placeholder "Search" type "query"
|
|
319
|
+
agent-browser find testid "submit-btn" click
|
|
322
320
|
```
|
|
323
321
|
|
|
324
|
-
|
|
322
|
+
### Proxy Configuration
|
|
325
323
|
|
|
326
|
-
|
|
324
|
+
```bash
|
|
325
|
+
agent-browser --proxy http://proxy:8080 open https://example.com
|
|
326
|
+
agent-browser --proxy socks5://proxy:1080 open https://example.com
|
|
327
|
+
agent-browser --proxy http://user:pass@proxy:8080 --proxy-bypass "localhost,*.internal" open https://example.com
|
|
328
|
+
```
|
|
327
329
|
|
|
328
|
-
|
|
330
|
+
## Advanced Features
|
|
329
331
|
|
|
330
|
-
|
|
332
|
+
### Recording & Replaying Workflows
|
|
331
333
|
|
|
332
|
-
|
|
334
|
+
For test automation and workflow capture:
|
|
333
335
|
|
|
334
|
-
|
|
335
|
-
-
|
|
336
|
-
-
|
|
336
|
+
```bash
|
|
337
|
+
agent-browser recorder start --session my-test
|
|
338
|
+
agent-browser open https://example.com/form
|
|
339
|
+
agent-browser snapshot -i
|
|
340
|
+
agent-browser fill @e1 "user@example.com"
|
|
341
|
+
agent-browser click @e3
|
|
342
|
+
agent-browser recorder stop --output test-workflow.yaml
|
|
343
|
+
agent-browser recorder replay test-workflow.yaml
|
|
344
|
+
```
|
|
345
|
+
|
|
346
|
+
See [recorder.md](references/recorder.md) for details.
|
|
347
|
+
|
|
348
|
+
### Human-like Mouse Movement
|
|
349
|
+
|
|
350
|
+
Simulate natural mouse trajectories via environment variable:
|
|
337
351
|
|
|
338
352
|
```bash
|
|
339
|
-
|
|
340
|
-
|
|
341
|
-
|
|
353
|
+
export AGENT_BROWSER_HUMAN=1 # Enable (default: arc path)
|
|
354
|
+
export AGENT_BROWSER_HUMAN=bezier # Bezier curve with overshoot
|
|
355
|
+
export AGENT_BROWSER_HUMAN=random # Random path with jitter
|
|
356
|
+
export AGENT_BROWSER_HUMAN=linear # Straight line (fastest)
|
|
357
|
+
|
|
358
|
+
agent-browser click @e1 # Uses human trajectory
|
|
359
|
+
agent-browser wait 3000 # Mouse wandering while waiting
|
|
360
|
+
unset AGENT_BROWSER_HUMAN # Disable
|
|
342
361
|
```
|
|
343
362
|
|
|
344
|
-
|
|
363
|
+
Features: continuous position tracking, acceleration curves, 4 trajectory types, auto-wandering on wait.
|
|
345
364
|
|
|
346
|
-
|
|
365
|
+
### Viewer / Streaming Mode
|
|
347
366
|
|
|
348
|
-
|
|
367
|
+
Real-time remote browser visualization with frame streaming over WebSocket.
|
|
349
368
|
|
|
350
369
|
```bash
|
|
351
|
-
|
|
352
|
-
agent-browser
|
|
353
|
-
agent-browser
|
|
354
|
-
agent-browser
|
|
355
|
-
agent-browser find testid "submit-btn" click
|
|
370
|
+
# Start viewer after opening a page
|
|
371
|
+
agent-browser open https://example.com
|
|
372
|
+
agent-browser viewer # Opens viewer URL in browser
|
|
373
|
+
agent-browser viewer --json # Get connection details as JSON
|
|
356
374
|
```
|
|
357
375
|
|
|
358
|
-
|
|
376
|
+
**Architecture:** Browser -> Daemon (IPC) -> Standalone Server (:5005) -> Viewer (WebSocket)
|
|
377
|
+
|
|
378
|
+
**Element Crop Mode:** Stream can be cropped to a specific DOM element's bounds. Coordinates auto-map to element-local space.
|
|
379
|
+
|
|
380
|
+
See [viewer-mode.md](references/viewer-mode.md) for architecture details, troubleshooting, and element mode.
|
|
359
381
|
|
|
360
|
-
|
|
382
|
+
### Mobile Remote Control (Touch Devices)
|
|
383
|
+
|
|
384
|
+
When the viewer is opened on a phone or tablet, it automatically enters **mobile mode** with a touch-optimized UI:
|
|
385
|
+
|
|
386
|
+
- **Touchpad**: Bottom-area gesture surface (tap=click, drag=move cursor, long-press=drag, 2-finger=scroll)
|
|
387
|
+
- **Input Panel**: Tap remote input field -> local text input appears -> syncs to remote via `input_fill`
|
|
388
|
+
- **Virtual Keyboard Toolbar**: Tab, Arrows, Enter, Backspace, Escape
|
|
389
|
+
- **IME Support**: Chinese/Japanese composition (pinyin etc.); intermediate composition input is NOT sent to the remote
|
|
390
|
+
- **DeviceMode**: Auto-detects device type, switches UI dynamically on resize/orientationchange/matchMedia
|
|
391
|
+
|
|
392
|
+
See [mobile-viewer.md](references/mobile-viewer.md) for touchpad gestures, input panel flow, and DeviceMode architecture.
|
|
393
|
+
|
|
394
|
+
### iOS Simulator (Appium)
|
|
395
|
+
|
|
396
|
+
Native iOS automation via Xcode + Appium:
|
|
361
397
|
|
|
362
398
|
```bash
|
|
363
|
-
|
|
364
|
-
agent-browser --
|
|
399
|
+
agent-browser device list # List simulators
|
|
400
|
+
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
|
|
401
|
+
agent-browser -p ios snapshot -i && agent-browser -p ios click @e1
|
|
402
|
+
agent-browser -p ios close # Shuts down simulator
|
|
403
|
+
```
|
|
404
|
+
|
|
405
|
+
Requires: macOS + Xcode + `npm install -g appium && appium driver install xcuitest`.
|
|
406
|
+
|
|
407
|
+
Note: Mobile viewer mode (above) works in ANY phone browser via the web viewer; no simulator is needed.
|
|
365
408
|
|
|
366
|
-
|
|
367
|
-
agent-browser --proxy https://proxy.example.com:8080 open https://example.com
|
|
409
|
+
### Cloud Browser Providers
|
|
368
410
|
|
|
369
|
-
|
|
370
|
-
agent-browser --proxy socks5://proxy.example.com:1080 open https://example.com
|
|
411
|
+
Connect to managed browser services:
|
|
371
412
|
|
|
372
|
-
|
|
373
|
-
agent-browser --
|
|
413
|
+
```bash
|
|
414
|
+
BROWSERBASE_API_KEY=key agent-browser --provider browserbase open https://example.com
|
|
415
|
+
KERNEL_API_KEY=key agent-browser --provider kernel open https://example.com
|
|
416
|
+
BROWSERUSE_API_KEY=key agent-browser --provider browseruse open https://example.com
|
|
417
|
+
```
|
|
418
|
+
|
|
419
|
+
Useful for: geo-distributed testing, IP diversity, team sharing, parallel scaling.
|
|
420
|
+
|
|
421
|
+
## Ref Lifecycle (Important)
|
|
374
422
|
|
|
375
|
-
|
|
376
|
-
agent-browser --proxy http://proxy.example.com:8080 --proxy-bypass "localhost,*.internal.com" open https://example.com
|
|
423
|
+
Refs (`@e1`, `@e2`) are invalidated when the page changes. Always re-snapshot after navigation, form submission, or dynamic content loading:
|
|
377
424
|
|
|
378
|
-
|
|
379
|
-
agent-browser
|
|
380
|
-
agent-browser
|
|
425
|
+
```bash
|
|
426
|
+
agent-browser click @e5 # Navigates to new page
|
|
427
|
+
agent-browser snapshot -i # MUST re-snapshot
|
|
428
|
+
agent-browser click @e1 # Use new refs
|
|
381
429
|
```
|
|
382
430
|
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
##
|
|
386
|
-
|
|
387
|
-
| Reference
|
|
388
|
-
|
|
389
|
-
| [
|
|
390
|
-
| [
|
|
391
|
-
| [
|
|
392
|
-
| [
|
|
393
|
-
| [
|
|
394
|
-
| [
|
|
395
|
-
| [
|
|
396
|
-
| [
|
|
397
|
-
| [
|
|
431
|
+
Refs are session-specific. For shell scripts, use semantic locators or CSS selectors instead. See [snapshot-refs.md](references/snapshot-refs.md).
|
|
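Because refs go stale after any page change, one option is to pair every navigating click with a fresh snapshot. The following is a sketch, not a built-in command: `click_and_resnapshot` is a hypothetical wrapper, and `AGENT_BROWSER_BIN` is an assumed override variable used only by this example. It chains three commands that appear earlier in this file.

```bash
# Hypothetical wrapper: click a ref, wait for the page to settle,
# then take a new snapshot so later commands use valid refs.
click_and_resnapshot() {
  ab="${AGENT_BROWSER_BIN:-agent-browser}"
  "$ab" click "$1" || return 1                 # May navigate; old refs are now stale
  "$ab" wait --load networkidle || return 1    # Let the new page finish loading
  "$ab" snapshot -i                            # Prints the new refs
}

# Usage (requires agent-browser and an open page):
#   click_and_resnapshot @e5
```

The refs printed by the final snapshot replace all earlier ones; anything captured before the click should be treated as invalid.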
432
|
+
|
|
433
|
+
## Reference Docs
|
|
434
|
+
|
|
435
|
+
| Reference | Content |
|
|
436
|
+
| --------------------------------------------------------- | ------------------------------------------------------------- |
|
|
437
|
+
| [commands.md](references/commands.md) | Complete command reference with all options |
|
|
438
|
+
| [data-extraction.md](references/data-extraction.md) | DOM, JS variables, API interception, infinite scroll, iframe |
|
|
439
|
+
| [snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, shell script conversion |
|
|
440
|
+
| [session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
|
|
441
|
+
| [authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
|
|
442
|
+
| [video-recording.md](references/video-recording.md) | Video recording for debugging |
|
|
443
|
+
| [recorder.md](references/recorder.md) | Action recording & replay for test automation |
|
|
444
|
+
| [proxy-support.md](references/proxy-support.md) | Proxy config, geo-testing, rotating proxies |
|
|
445
|
+
| [network-monitoring.md](references/network-monitoring.md) | Request monitoring, API mocking, request blocking |
|
|
446
|
+
| [viewer-mode.md](references/viewer-mode.md) | Streaming viewer, element crop, architecture, troubleshooting |
|
|
447
|
+
| [mobile-viewer.md](references/mobile-viewer.md) | Touchpad, input panel, IME/CJK support, DeviceMode |
|