@dyyz1993/agent-browser 0.11.0 → 0.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48)
  1. package/bin/agent-browser-linux-x64 +0 -0
  2. package/dist/cli/help.js +1 -1
  3. package/dist/openapi.js +1 -1
  4. package/dist/stream-server-standalone.js +3 -3
  5. package/dist/viewer-script.js +54 -54
  6. package/package.json +1 -1
  7. package/skills/agent-browser/SKILL.md +279 -229
  8. package/skills/agent-browser/references/mobile-viewer.md +188 -0
  9. package/skills/agent-browser/references/viewer-mode.md +148 -0
  10. package/skills/agent-browser/templates/api-interception.sh +3 -1
  11. package/skills/agent-browser/templates/data-extraction.sh +8 -4
  12. package/skills/agent-browser/templates/form-automation.sh +18 -23
  13. package/skills/agent-browser/templates/network-intercept-crawl.sh +1 -0
  14. package/skills/agent-browser/templates/recorder-workflow.sh +51 -0
  15. package/skills/agent-browser/templates/viewer-remote.sh +41 -0
  16. package/bin/agent-browser-darwin-arm64 +0 -0
  17. package/scripts/check_goods_container.js +0 -35
  18. package/scripts/check_page_content.js +0 -36
  19. package/scripts/click_applause_rate.js +0 -30
  20. package/scripts/e2e-test-recorder.ts +0 -584
  21. package/scripts/explore_jd_page.js +0 -31
  22. package/scripts/extract_all_jd_data.js +0 -80
  23. package/scripts/extract_jd_product_detail.js +0 -62
  24. package/scripts/extract_jd_products_correct_links.js +0 -78
  25. package/scripts/extract_jd_products_final.js +0 -80
  26. package/scripts/extract_jd_reviews.js +0 -48
  27. package/scripts/extract_jd_seafood_final.js +0 -78
  28. package/scripts/extract_multiple_products.js +0 -77
  29. package/scripts/extract_products_no_scroll.js +0 -68
  30. package/scripts/extract_products_simple.js +0 -68
  31. package/scripts/find_applause_rate.js +0 -26
  32. package/scripts/find_jd_links.js +0 -28
  33. package/scripts/find_main_content.js +0 -20
  34. package/scripts/find_product_cards.js +0 -38
  35. package/scripts/find_root_content.js +0 -26
  36. package/scripts/find_unique_products.js +0 -55
  37. package/scripts/get_jd_product_detail.js +0 -16
  38. package/scripts/get_jd_products.js +0 -23
  39. package/scripts/get_jd_seafood_products.js +0 -44
  40. package/scripts/get_product_details_from_images.js +0 -54
  41. package/scripts/scroll_and_get_products.js +0 -47
  42. package/scripts/scroll_deep_and_find.js +0 -45
  43. package/scripts/verify-baidu-enter.ts +0 -116
  44. package/scripts/verify-form.sh +0 -67
  45. package/scripts/verify-login.sh +0 -65
  46. package/scripts/verify-recording.sh +0 -80
  47. package/scripts/verify-upload.sh +0 -41
  48. package/skills/agent-browser/references/profiling.md +0 -120
@@ -1,12 +1,12 @@
1
1
  ---
2
2
  name: agent-browser
3
- description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
3
+ description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, viewer/streaming mode, mobile remote control, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", "view remote browser", "mobile browsing", or any task requiring programmatic web interaction.
4
4
  allowed-tools: Bash(agent-browser:*)
5
5
  ---
6
6
 
7
7
  # Browser Automation with agent-browser
8
8
 
9
- ## Core Workflow
9
+ ## Quick Start
10
10
 
11
11
  Every browser automation follows this pattern:
12
12
 
@@ -27,144 +27,191 @@ agent-browser wait --load networkidle
27
27
  agent-browser snapshot -i # Check result
28
28
  ```
29
29
 
30
- ### Recording & Replaying Workflows
30
+ ## Essential Commands
31
31
 
32
- For test automation and workflow capture:
32
+ ### Navigation
33
33
 
34
34
  ```bash
35
- # Start recording
36
- agent-browser recorder start --session my-test
37
-
38
- # Perform workflow
39
- agent-browser open https://example.com/form
40
- agent-browser snapshot -i
41
- agent-browser fill @e1 "user@example.com"
42
- agent-browser fill @e2 "password123"
43
- agent-browser click @e3
44
-
45
- # Stop and save
46
- agent-browser recorder stop --output test-workflow.yaml
47
-
48
- # Replay later
49
- agent-browser recorder replay test-workflow.yaml
35
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
36
+ agent-browser back # Go back
37
+ agent-browser forward # Go forward
38
+ agent-browser reload # Reload page
39
+ agent-browser close # Close browser (aliases: quit, exit)
50
40
  ```
51
41
 
52
- See [recorder.md](references/recorder.md) for detailed recording workflows.
53
-
54
- ## Working with Iframes
55
-
56
- Use `--in-frame` to operate inside iframes. The path uses iframe name/id or index:
42
+ ### Element Interaction
57
43
 
58
44
  ```bash
59
- # Direct iframe by ID or name
60
- agent-browser snapshot --in-frame "#my-iframe"
61
-
62
- # Nested iframe using path (name/id or index)
63
- agent-browser snapshot --in-frame "#outer-frame/inner-frame"
64
-
65
- # Example: Click element inside nested cross-origin iframe
66
- agent-browser open https://example.com
67
- agent-browser snapshot --in-frame "#iframe-container"
68
- agent-browser click @e1 --in-frame "#iframe-container/login-frame"
69
- agent-browser fill #username "admin" --in-frame "#iframe-container/login-frame"
70
- agent-browser get value #username --in-frame "#iframe-container/login-frame"
45
+ agent-browser click @e1 # Click element
46
+ agent-browser dblclick @e1 # Double-click
47
+ agent-browser fill @e2 "text" # Clear and type text
48
+ agent-browser type @e2 "text" # Type without clearing
49
+ agent-browser select @e1 "option" # Select dropdown option
50
+ agent-browser check @e1 # Check checkbox
51
+ agent-browser uncheck @e1 # Uncheck checkbox
52
+ agent-browser press Enter # Press key (alias: key)
53
+ agent-browser keydown / keyup # Raw key down / up
54
+ agent-browser hover @e1 # Hover over element
55
+ agent-browser focus @e1 # Focus element
56
+ agent-browser drag @e1 @e2 # Drag from e1 to e2
57
+ agent-browser upload @e1 "/path" # Upload file
58
+ agent-browser download @e1 "/path" # Download resource
71
59
  ```
72
60
 
73
- ### Frame Path Syntax
74
-
75
- The frame path supports:
76
- - **ID/Name**: `#frame-id` or `#frame-name`
77
- - **Index**: `#0`, `#1` (by position)
78
- - **Nested**: `#parent/child/grandchild`
61
+ ### Scrolling
79
62
 
80
- Examples:
81
- - `#my-iframe` - Single iframe
82
- - `#0` - First iframe
83
- - `#outer-iframe/login-frame` - Nested iframes by name
84
- - `#0/1` - First iframe's second child
63
+ ```bash
64
+ agent-browser scroll down 500 # Scroll pixels
65
+ agent-browser scrollintoview @e1 # Scroll element into view
66
+ ```
85
67
 
86
- ## Essential Commands
68
+ ### Snapshot & Inspection
87
69
 
88
70
  ```bash
89
- # Navigation
90
- agent-browser open <url> # Navigate (aliases: goto, navigate)
91
- agent-browser close # Close browser
92
-
93
- # Snapshot
94
71
  agent-browser snapshot -i # Interactive elements with refs (recommended)
95
- agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
72
+ agent-browser snapshot -i -C # Include cursor-interactive elements
96
73
  agent-browser snapshot -s "#selector" # Scope to CSS selector
97
74
  agent-browser snapshot -s "body" --path # Include xpath and cssPath in refs
98
75
  agent-browser snapshot -s "body" --attrs # Include element attributes in refs
76
+ agent-browser snapshot -i --json # JSON output for parsing
77
+ ```
99
78
 
100
- # Interaction (use @refs from snapshot)
101
- agent-browser click @e1 # Click element
102
- agent-browser fill @e2 "text" # Clear and type text
103
- agent-browser type @e2 "text" # Type without clearing
104
- agent-browser select @e1 "option" # Select dropdown option
105
- agent-browser check @e1 # Check checkbox
106
- agent-browser press Enter # Press key
107
- agent-browser scroll down 500 # Scroll page
79
+ ### Getting Information
108
80
 
109
- # Get information
110
- agent-browser get text @e1 # Get element text
81
+ ```bash
82
+ agent-browser get text @e1 # Get element text content
111
83
  agent-browser get url # Get current URL
112
84
  agent-browser get title # Get page title
85
+ agent-browser get count ".item" # Count matching elements
86
+ agent-browser get box @e1 # Bounding box {x,y,width,height}
87
+ agent-browser get styles @e1 # Computed styles
88
+ agent-browser is visible @e1 # Visibility check
89
+ agent-browser is enabled @e1 # Enabled check
90
+ agent-browser is checked @e1 # Checked state
91
+ ```
113
92
 
114
- # Wait
115
- agent-browser wait @e1 # Wait for element
93
+ ### Waiting
94
+
95
+ ```bash
96
+ agent-browser wait @e1 # Wait for element to appear
116
97
  agent-browser wait --load networkidle # Wait for network idle
117
- agent-browser wait --url "**/page" # Wait for URL pattern
118
- agent-browser wait 2000 # Wait milliseconds
98
+ agent-browser wait --load domcontentloaded # Wait for DOM ready
99
+ agent-browser wait --url "**/page" # Wait for URL pattern match
100
+ agent-browser wait --text "Hello" # Wait for text on page
101
+ agent-browser wait --fn "document.hidden === false" # Wait for JS expression
102
+ agent-browser wait --download # Wait for download to complete
103
+ agent-browser wait 2000 # Wait milliseconds (fixed delay)
104
+ agent-browser wait --request "api/data" # Wait for specific network request (background listener)
105
+ ```
119
106
 
120
- # Network monitoring
121
- agent-browser network requests # View network requests
122
- agent-browser network requests --filter "**/api/**" # Filter requests
123
- agent-browser network requests --clear # Clear history
124
- agent-browser network route "**/api/**" --abort # Block requests
125
- agent-browser network route "**/api/**" --body '{}' # Mock response
126
- agent-browser network unroute "**/api/**" # Remove routes
107
+ ### Capture
127
108
 
128
- # Capture
109
+ ```bash
129
110
  agent-browser screenshot # Screenshot to temp dir
130
111
  agent-browser screenshot --full # Full page screenshot
112
+ agent-browser screenshot output.png # Save to file
131
113
  agent-browser pdf output.pdf # Save as PDF
132
114
  ```
133
115
 
134
- ## Human-like Mouse Movement
116
+ ### Network Monitoring
117
+
118
+ ```bash
119
+ agent-browser network requests # View all network requests
120
+ agent-browser network requests --filter "**/api/**" # Filter by URL pattern
121
+ agent-browser network requests --clear # Clear request history
122
+ agent-browser network requests --capture-response # Capture response bodies
123
+ agent-browser network requests --capture-response --type json # Filter captured by content type
124
+ agent-browser network requests --output ./captures/ # Save captures to directory
125
+ agent-browser network route "**/api/**" --abort # Block requests
126
+ agent-browser network route "**/api/**" --body '{"users": []}' # Mock response
127
+ agent-browser network route "**/api/**" --status 404 # Mock status code
128
+ agent-browser network unroute "**/api/**" # Remove route
129
+ ```
130
+
131
+ See [network-monitoring.md](references/network-monitoring.md) for advanced patterns.
132
+
133
+ ### Tabs & Windows
134
+
135
+ ```bash
136
+ agent-browser tab list # List all tabs
137
+ agent-browser tab new # Open new tab
138
+ agent-browser tab close 2 # Close tab by index
139
+ agent-browser tab switch 0 # Switch to tab
140
+ agent-browser window new # Open new window
141
+ ```
142
+
143
+ ### Dialogs & Alerts
144
+
145
+ ```bash
146
+ agent-browser dialog accept # Accept alert/dialog
147
+ agent-browser dialog dismiss # Dismiss alert/dialog
148
+ ```
149
+
150
+ ### Browser State
151
+
152
+ ```bash
153
+ agent-browser state save auth.json # Save cookies/localStorage/session
154
+ agent-browser state clear # Clear all state
155
+ agent-browser storage session dump # Dump session storage
156
+ agent-browser storage session load # Load session storage
157
+ agent-browser cookies set name value domain # Set cookie
158
+ agent-browser cookies export # Export all cookies
159
+ ```
160
+
161
+ ### Debugging
162
+
163
+ ```bash
164
+ agent-browser console "1+1" # Evaluate JS in browser console
165
+ agent-browser errors # Show recent page errors
166
+ agent-browser highlight @e1 # Highlight element on page
167
+ agent-browser trace start # Start Chrome trace
168
+ agent-browser trace stop ./trace.json # Stop and save trace
169
+ ```
135
170
 
136
- Enable globally via environment variable to simulate natural mouse trajectories:
171
+ ### Session Management
137
172
 
138
173
  ```bash
139
- # Enable human mode (default: arc path type)
140
- export AGENT_BROWSER_HUMAN=1
141
-
142
- # Or specify path type
143
- export AGENT_BROWSER_HUMAN=bezier # Bezier curve with overshoot
144
- export AGENT_BROWSER_HUMAN=arc # Smooth arc (default, most natural)
145
- export AGENT_BROWSER_HUMAN=random # Random path with jitter
146
- export AGENT_BROWSER_HUMAN=linear # Straight line (fastest)
147
-
148
- # All interactions will use human-like movement
149
- agent-browser click @e1
150
- agent-browser fill @e1 "text"
151
- agent-browser type @e1 "text"
152
- agent-browser hover @e1
153
- agent-browser dblclick @e1
154
-
155
- # Wait with mouse wandering (when human mode enabled)
156
- agent-browser wait 3000 # Wanders mouse while waiting
157
-
158
- # Disable human mode
159
- unset AGENT_BROWSER_HUMAN
174
+ agent-browser --session site1 open https://a.com # Named session
175
+ agent-browser --session site2 open https://b.com # Parallel session
176
+ agent-browser session list # List active sessions
177
+ agent-browser connect ws://localhost:9222 # Connect to remote CDP browser
178
+ agent-browser kill # Kill daemon process
179
+ agent-browser config # Show/edit config
180
+ agent-browser config [--json] # Config as JSON
160
181
  ```
161
182
 
162
- **Features:**
163
- - Continues from last mouse position for realistic trajectories
164
- - Natural acceleration/deceleration curves
165
- - Randomized delays between movements
166
- - Four trajectory types: `arc` (default), `bezier`, `random`, `linear`
167
- - `wait <ms>` automatically does mouse wandering when enabled
183
+ ## Global Options
184
+
185
+ These flags work with most commands:
186
+
187
+ | Flag | Description |
188
+ | -------------------------- | ---------------------------------------------- |
189
+ | `--session <name>` | Named browser session |
190
+ | `--json` | JSON output format |
191
+ | `--headed` | Show visible browser window |
192
+ | `--cdp <url>` | Connect via Chrome DevTools Protocol directly |
193
+ | `-p/--provider` | Provider: ios, browserbase, kernel, browseruse |
194
+ | `--proxy <url>` | HTTP/SOCKS5 proxy |
195
+ | `--proxy-bypass <rules>` | Proxy bypass rules |
196
+ | `--headers 'K: V'` | Extra HTTP headers per request |
197
+ | `--state <path>` | Restore browser state from file |
198
+ | `--profile <path>` | Chrome profile directory |
199
+ | `--args "<args>"` | Extra Chromium launch arguments |
200
+ | `--user-agent <ua>` | Custom User-Agent string |
201
+ | `--executable-path <path>` | Browser binary path |
202
+ | `--extension <path>` | Load .crx Chrome extension |
203
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors |
204
+ | `--allow-file-access` | Allow file:// URLs |
205
+ | `--timeout <ms>` | Global operation timeout |
206
+ | `--debug` | Verbose debug logging |
207
+
208
+ Examples:
209
+
210
+ ```bash
211
+ agent-browser --proxy http://proxy:8080 open https://example.com
212
+ agent-browser --headed --debug open https://example.com
213
+ agent-browser --user-agent "MyBot/1.0" open https://example.com
214
+ ```
168
215
 
169
216
  ## Common Patterns
170
217
 
@@ -193,7 +240,7 @@ agent-browser click @e3
193
240
  agent-browser wait --url "**/dashboard"
194
241
  agent-browser state save auth.json
195
242
 
196
- # Reuse in future sessions (use --state flag)
243
+ # Reuse in future sessions
197
244
  agent-browser --state auth.json open https://app.example.com/dashboard
198
245
  ```
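The save-then-reuse flow above can be wrapped so a script only passes `--state` when a saved file actually exists. This is a minimal sketch: `open_with_state` and the `AGENT_BROWSER_BIN` override are hypothetical names introduced here for illustration, not part of the CLI. Only `--state <path>` and `open <url>` come from the skill file.

```bash
# Assumed override hook so the helper can be pointed at a different binary.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# Hypothetical helper: open a URL, restoring saved state when it is available.
open_with_state() {
  local state_file="$1" url="$2"
  local -a flags=()

  # Only pass --state when the file exists and is non-empty.
  if [ -s "$state_file" ]; then
    flags=(--state "$state_file")
  fi

  "$AGENT_BROWSER_BIN" "${flags[@]}" open "$url"
}
```

Usage would look like `open_with_state auth.json https://app.example.com/dashboard`. When `auth.json` is missing, the call degrades to a plain `open`, and the login steps would then need to run again.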
199
246
 
@@ -202,196 +249,199 @@ agent-browser --state auth.json open https://app.example.com/dashboard
202
249
  ```bash
203
250
  agent-browser open https://example.com/products
204
251
  agent-browser snapshot -i
205
- agent-browser get text @e5 # Get specific element text
206
- agent-browser get text body > page.txt # Get all page text
207
-
208
- # JSON output for parsing
209
- agent-browser snapshot -i --json
210
- agent-browser get text @e1 --json
252
+ agent-browser get text @e5 # Specific element
253
+ agent-browser get text body > page.txt # All page text
254
+ agent-browser snapshot -i --json # JSON for parsing
255
+ agent-browser get text @e1 --json # Element as JSON
211
256
  ```
212
257
 
213
- ### API Interception
258
+ ### API Interception (Passive Capture)
214
259
 
215
- Passively capture API responses without making direct requests. Useful for sites with anti-scraping measures.
260
+ Capture API responses without making direct requests:
216
261
 
217
262
  ```bash
218
- # 1. Open blank page first
219
263
  agent-browser open "about:blank"
220
-
221
- # 2. Start request listener in background
222
264
  (agent-browser wait --request "api/users" --timeout 30000 > response.json) &
223
- WAIT_PID=$!
224
265
  sleep 1
225
-
226
- # 3. Navigate to trigger the API call
227
266
  agent-browser open "https://example.com/user/profile"
228
-
229
- # 4. Wait for response
230
- wait $WAIT_PID
231
-
232
- # 5. Process captured data
267
+ wait $!
233
268
  jq '.body' response.json
234
269
  ```
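The passive-capture steps above (blank page, background listener, navigation, `wait`) can be sketched as one function that also checks whether anything was captured. `capture_api` and `AGENT_BROWSER_BIN` are hypothetical names, and the sketch assumes the listener exits non-zero on timeout, which the skill file does not state.

```bash
# Assumed override hook so the helper can be pointed at a different binary.
AGENT_BROWSER_BIN="${AGENT_BROWSER_BIN:-agent-browser}"

# Hypothetical helper: capture one matching API response while a page loads.
capture_api() {
  local pattern="$1" url="$2" out="$3" timeout_ms="${4:-30000}"

  "$AGENT_BROWSER_BIN" open "about:blank" || return 1

  # Start the listener before navigating so the request is not missed.
  "$AGENT_BROWSER_BIN" wait --request "$pattern" --timeout "$timeout_ms" > "$out" &
  local listener_pid=$!
  sleep 1  # give the listener a moment to attach

  # Navigate to trigger the API call; stop the listener if navigation fails.
  "$AGENT_BROWSER_BIN" open "$url" || { kill "$listener_pid" 2>/dev/null; return 1; }

  # Assumption: a timed-out listener returns a non-zero status.
  if ! wait "$listener_pid"; then
    echo "no request matching '$pattern' was captured" >&2
    return 1
  fi

  # Succeed only if the capture file is non-empty.
  [ -s "$out" ]
}
```

A call such as `capture_api "api/users" "https://example.com/user/profile" response.json && jq '.body' response.json` mirrors the snippet above. Keeping the listener PID in a variable avoids relying on `$!` after later commands.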
235
270
 
236
- Example: Capture Douyin user videos
237
- ```bash
238
- agent-browser open "about:blank"
239
- (agent-browser wait --request "aweme/post" --timeout 30000 > /tmp/douyin.json) &
240
- sleep 1
241
- agent-browser open "https://www.douyin.com/user/xxx"
242
- sleep 5
243
- wait
244
- jq '.body.aweme_list[:10] | map({id, desc, stats})' /tmp/douyin.json
245
- ```
246
-
247
271
  ### Network Monitoring & API Mocking
248
272
 
249
- Monitor, filter, and mock network requests for testing and debugging.
250
-
251
273
  ```bash
252
- # View all network requests
253
- agent-browser network requests
254
-
255
- # Filter requests by pattern
256
274
  agent-browser network requests --filter "**/api/**"
257
-
258
- # Clear request history
259
- agent-browser network requests --clear
260
-
261
- # Mock API responses
262
275
  agent-browser network route "**/api/users" --body '{"users": []}'
263
-
264
- # Block unwanted requests (ads, tracking)
265
276
  agent-browser network route "**/ads/**" --abort
266
-
267
- # Remove routes
268
277
  agent-browser network unroute "**/api/users"
269
278
  ```
270
279
 
271
- See [network-monitoring.md](references/network-monitoring.md) for detailed network monitoring patterns.
272
-
273
280
  ### Parallel Sessions
274
281
 
275
282
  ```bash
276
283
  agent-browser --session site1 open https://site-a.com
277
284
  agent-browser --session site2 open https://site-b.com
278
-
279
285
  agent-browser --session site1 snapshot -i
280
- agent-browser --session site2 snapshot -i
281
-
282
286
  agent-browser session list
283
287
  ```
284
288
 
285
- ### Visual Browser (Debugging)
286
-
287
- ```bash
288
- agent-browser --headed open https://example.com
289
- agent-browser highlight @e1 # Highlight element
290
- agent-browser record start demo.webm # Record session
291
- ```
292
-
293
289
  ### Local Files (PDFs, HTML)
294
290
 
295
291
  ```bash
296
- # Open local files with file:// URLs
297
- agent-browser --allow-file-access open file:///path/to/document.pdf
292
+ agent-browser --allow-file-access open file:///path/to/doc.pdf
298
293
  agent-browser --allow-file-access open file:///path/to/page.html
299
294
  agent-browser screenshot output.png
300
295
  ```
301
296
 
302
- ### iOS Simulator (Mobile Safari)
297
+ ### Working with Iframes
298
+
299
+ Use `--in-frame` to operate inside iframes:
303
300
 
304
301
  ```bash
305
- # List available iOS simulators
306
- agent-browser device list
302
+ agent-browser snapshot --in-frame "#my-iframe"
303
+ agent-browser snapshot --in-frame "#outer/inner" # Nested path
304
+ agent-browser click @e1 --in-frame "#container/frame"
305
+ agent-browser fill "#user" "admin" --in-frame "#container/login-frame"
306
+ ```
307
307
 
308
- # Launch Safari on a specific device
309
- agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
308
+ Frame path syntax: `#id-or-name`, `#index` (position), `#parent/child` (nested).
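To make the frame path syntax concrete, the sketch below splits a path into segments and labels each one. It only illustrates how the syntax reads from the description above. `frame_segments` and `segment_kind` are hypothetical helpers, and the CLI's real parser may treat edge cases differently.

```bash
# Hypothetical helper: print one frame-path segment per line.
frame_segments() {
  local path="${1#\#}"   # drop the leading '#'
  local IFS='/'          # nested frames are separated by '/'
  local -a parts
  read -r -a parts <<< "$path"
  printf '%s\n' "${parts[@]}"
}

# Hypothetical helper: an all-digit segment is read as a positional index,
# anything else as a frame id or name.
segment_kind() {
  case "$1" in
    ''|*[!0-9]*) echo "id-or-name" ;;
    *)           echo "index" ;;
  esac
}
```

For example, `frame_segments "#container/login-frame"` yields `container` and `login-frame`, both of which `segment_kind` labels `id-or-name`, while `#0` yields a single `index` segment.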
310
309
 
311
- # Same workflow as desktop - snapshot, interact, re-snapshot
312
- agent-browser -p ios snapshot -i
313
- agent-browser -p ios click @e1 # Click/tap element
314
- agent-browser -p ios fill @e2 "text"
315
- agent-browser -p ios scroll down 500 # Scroll gesture
310
+ ### Semantic Locators (Alternative to Refs)
316
311
 
317
- # Take screenshot
318
- agent-browser -p ios screenshot mobile.png
312
+ When refs are unavailable, use semantic locators:
319
313
 
320
- # Close session (shuts down simulator)
321
- agent-browser -p ios close
314
+ ```bash
315
+ agent-browser find text "Sign In" click
316
+ agent-browser find label "Email" fill "user@test.com"
317
+ agent-browser find role button click --name "Submit"
318
+ agent-browser find placeholder "Search" type "query"
319
+ agent-browser find testid "submit-btn" click
322
320
  ```
323
321
 
324
- **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
322
+ ### Proxy Configuration
325
323
 
326
- **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
324
+ ```bash
325
+ agent-browser --proxy http://proxy:8080 open https://example.com
326
+ agent-browser --proxy socks5://proxy:1080 open https://example.com
327
+ agent-browser --proxy http://user:pass@proxy:8080 --proxy-bypass "localhost,*.internal" open https://example.com
328
+ ```
327
329
 
328
- **Note:** iOS uses standard commands like `click`, `fill`, `scroll` instead of mobile-specific aliases like `tap` or `swipe`.
330
+ ## Advanced Features
329
331
 
330
- ## Ref Lifecycle (Important)
332
+ ### Recording & Replaying Workflows
331
333
 
332
- Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
334
+ For test automation and workflow capture:
333
335
 
334
- - Clicking links or buttons that navigate
335
- - Form submissions
336
- - Dynamic content loading (dropdowns, modals)
336
+ ```bash
337
+ agent-browser recorder start --session my-test
338
+ agent-browser open https://example.com/form
339
+ agent-browser snapshot -i
340
+ agent-browser fill @e1 "user@example.com"
341
+ agent-browser click @e3
342
+ agent-browser recorder stop --output test-workflow.yaml
343
+ agent-browser recorder replay test-workflow.yaml
344
+ ```
345
+
346
+ See [recorder.md](references/recorder.md) for details.
347
+
348
+ ### Human-like Mouse Movement
349
+
350
+ Simulate natural mouse trajectories via environment variable:
337
351
 
338
352
  ```bash
339
- agent-browser click @e5 # Navigates to new page
340
- agent-browser snapshot -i # MUST re-snapshot
341
- agent-browser click @e1 # Use new refs
353
+ export AGENT_BROWSER_HUMAN=1 # Enable (default: arc path)
354
+ export AGENT_BROWSER_HUMAN=bezier # Bezier curve with overshoot
355
+ export AGENT_BROWSER_HUMAN=random # Random path with jitter
356
+ export AGENT_BROWSER_HUMAN=linear # Straight line (fastest)
357
+
358
+ agent-browser click @e1 # Uses human trajectory
359
+ agent-browser wait 3000 # Mouse wandering while waiting
360
+ unset AGENT_BROWSER_HUMAN # Disable
342
361
  ```
343
362
 
344
- **Important for Shell Scripts:** Refs are session-specific and cannot be used in standalone shell scripts. When converting interactive workflows to scripts, use semantic locators or CSS selectors instead. See [references/snapshot-refs.md](references/snapshot-refs.md#converting-to-shell-scripts) for details.
363
+ Features: continuous position tracking, acceleration curves, 4 trajectory types, auto-wandering on wait.
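A script that sets `AGENT_BROWSER_HUMAN` may want to check the value before launching. The sketch below maps a value to the trajectory it selects. `human_path_type` is a hypothetical helper, and accepting `arc` as an explicit value is taken from the 0.11.0 text removed in this diff, so treat that case as an assumption for 0.11.2.

```bash
# Hypothetical helper: report which trajectory a given value would select.
human_path_type() {
  case "${1:-}" in
    '')                   echo "disabled" ;;   # variable unset or empty
    1|arc)                echo "arc" ;;        # default path type
    bezier|random|linear) echo "$1" ;;
    *)                    echo "unknown"; return 1 ;;
  esac
}
```

Calling `human_path_type "${AGENT_BROWSER_HUMAN:-}"` before a run gives a quick sanity check. How the CLI itself handles an unrecognized value is not described in the source.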
345
364
 
346
- ## Semantic Locators (Alternative to Refs)
365
+ ### Viewer / Streaming Mode
347
366
 
348
- When refs are unavailable or unreliable, use semantic locators:
367
+ Real-time remote browser visualization with frame streaming over WebSocket.
349
368
 
350
369
  ```bash
351
- agent-browser find text "Sign In" click
352
- agent-browser find label "Email" fill "user@test.com"
353
- agent-browser find role button click --name "Submit"
354
- agent-browser find placeholder "Search" type "query"
355
- agent-browser find testid "submit-btn" click
370
+ # Start viewer after opening a page
371
+ agent-browser open https://example.com
372
+ agent-browser viewer # Opens viewer URL in browser
373
+ agent-browser viewer --json # Get connection details as JSON
356
374
  ```
357
375
 
358
- ## Proxy Configuration
376
+ **Architecture:** Browser -> Daemon (IPC) -> Standalone Server (:5005) -> Viewer (WebSocket)
377
+
378
+ **Element Crop Mode:** Stream can be cropped to a specific DOM element's bounds. Coordinates auto-map to element-local space.
379
+
380
+ See [viewer-mode.md](references/viewer-mode.md) for architecture details, troubleshooting, and element mode.
359
381
 
360
- Configure proxy for geo-testing, rate limiting avoidance, and corporate environments:
382
+ ### Mobile Remote Control (Touch Devices)
383
+
384
+ When the viewer is opened on a phone/tablet, it automatically enters **mobile mode** with a touch-optimized UI:
385
+
386
+ - **Touchpad**: Bottom-area gesture surface (tap=click, drag=move cursor, long-press=drag, 2-finger=scroll)
387
+ - **Input Panel**: Tap remote input field -> local text input appears -> syncs to remote via `input_fill`
388
+ - **Virtual Keyboard Toolbar**: Tab, Arrows, Enter, Backspace, Escape
389
+ - **IME Support**: Chinese/Japanese composition (pinyin etc.) — intermediate input NOT sent to remote
390
+ - **DeviceMode**: Auto-detects device type, switches UI dynamically on resize/orientationchange/matchMedia
391
+
392
+ See [mobile-viewer.md](references/mobile-viewer.md) for touchpad gestures, input panel flow, DeviceMode architecture.
393
+
394
+ ### iOS Simulator (Appium)
395
+
396
+ Native iOS automation via Xcode + Appium:
361
397
 
362
398
  ```bash
363
- # Basic proxy via global option
364
- agent-browser --proxy http://proxy.example.com:8080 open https://example.com
399
+ agent-browser device list # List simulators
400
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
401
+ agent-browser -p ios snapshot -i && agent-browser -p ios click @e1
402
+ agent-browser -p ios close # Shuts down simulator
403
+ ```
404
+
405
+ Requires: macOS + Xcode + `npm install -g appium && appium driver install xcuitest`.
406
+
407
+ Note: Mobile viewer mode (above) works on ANY phone browser via web viewer — no simulator needed.
365
408
 
366
- # HTTPS proxy
367
- agent-browser --proxy https://proxy.example.com:8080 open https://example.com
409
+ ### Cloud Browser Providers
368
410
 
369
- # SOCKS5 proxy
370
- agent-browser --proxy socks5://proxy.example.com:1080 open https://example.com
411
+ Connect to managed browser services:
371
412
 
372
- # Authenticated proxy
373
- agent-browser --proxy http://user:pass@proxy.example.com:8080 open https://example.com
413
+ ```bash
414
+ BROWSERBASE_API_KEY=key agent-browser --provider browserbase open https://example.com
415
+ KERNEL_API_KEY=key agent-browser --provider kernel open https://example.com
416
+ BROWSERUSE_API_KEY=key agent-browser --provider browseruse open https://example.com
417
+ ```
418
+
419
+ Useful for: geo-distributed testing, IP diversity, team sharing, parallel scaling.
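Each provider above reads its key from a differently named environment variable, so a pre-flight check can catch a missing key before the browser is requested. This is a sketch under that assumption. `provider_key_var` and `require_provider_key` are hypothetical helpers, and `ios` is left out because the source shows no API key for it.

```bash
# Hypothetical helper: map a provider name to the variable shown above.
provider_key_var() {
  case "$1" in
    browserbase) echo "BROWSERBASE_API_KEY" ;;
    kernel)      echo "KERNEL_API_KEY" ;;
    browseruse)  echo "BROWSERUSE_API_KEY" ;;
    *)           return 1 ;;
  esac
}

# Hypothetical helper: return 0 if the key is set, 1 if missing, 2 if the
# provider is not one of the three listed above.
require_provider_key() {
  local var
  var="$(provider_key_var "$1")" || { echo "unknown provider: $1" >&2; return 2; }
  if [ -z "${!var:-}" ]; then
    echo "$var is not set" >&2
    return 1
  fi
}
```

A script could then run `require_provider_key kernel && agent-browser --provider kernel open https://example.com`.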
420
+
421
+ ## Ref Lifecycle (Important)
374
422
 
375
- # Proxy with bypass list
376
- agent-browser --proxy http://proxy.example.com:8080 --proxy-bypass "localhost,*.internal.com" open https://example.com
423
+ Refs (`@e1`, `@e2`) are invalidated when the page changes. Always re-snapshot after navigation, form submission, or dynamic content loading:
377
424
 
378
- # Verify proxy is working (check IP)
379
- agent-browser --proxy http://proxy.example.com:8080 open https://httpbin.org/ip
380
- agent-browser get text body
425
+ ```bash
426
+ agent-browser click @e5 # Navigates to new page
427
+ agent-browser snapshot -i # MUST re-snapshot
428
+ agent-browser click @e1 # Use new refs
381
429
  ```
382
430
 
383
- **Proxy Validation:** The proxy setting is actively enforced - if you specify an unreachable proxy server, navigation will fail with a connection error, confirming the proxy configuration is being used (not just ignored).
384
-
385
- ## Deep-Dive Documentation
386
-
387
- | Reference | When to Use |
388
- |-----------|-------------|
389
- | [references/commands.md](references/commands.md) | Full command reference with all options |
390
- | [references/data-extraction.md](references/data-extraction.md) | **Data extraction patterns: DOM, JS variables, API interception, infinite scroll, iframe** |
391
- | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
392
- | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
393
- | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
394
- | [references/video-recording.md](references/video-recording.md) | Video recording for debugging |
395
- | [references/recorder.md](references/recorder.md) | **Action recording & replay for test automation** |
396
- | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
397
- | [references/network-monitoring.md](references/network-monitoring.md) | **Network request monitoring, API mocking, request blocking** |
431
+ Refs are session-specific. For shell scripts, use semantic locators or CSS selectors instead. See [snapshot-refs.md](references/snapshot-refs.md).
432
+
433
+ ## Reference Docs
434
+
435
+ | Reference | Content |
436
+ | --------------------------------------------------------- | ------------------------------------------------------------- |
437
+ | [commands.md](references/commands.md) | Complete command reference with all options |
438
+ | [data-extraction.md](references/data-extraction.md) | DOM, JS variables, API interception, infinite scroll, iframe |
439
+ | [snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, shell script conversion |
440
+ | [session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
441
+ | [authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
442
+ | [video-recording.md](references/video-recording.md) | Video recording for debugging |
443
+ | [recorder.md](references/recorder.md) | Action recording & replay for test automation |
444
+ | [proxy-support.md](references/proxy-support.md) | Proxy config, geo-testing, rotating proxies |
445
+ | [network-monitoring.md](references/network-monitoring.md) | Request monitoring, API mocking, request blocking |
446
+ | [viewer-mode.md](references/viewer-mode.md) | Streaming viewer, element crop, architecture, troubleshooting |
447
+ | [mobile-viewer.md](references/mobile-viewer.md) | Touchpad, input panel, IME/CJK support, DeviceMode |