@dyyz1993/agent-browser 0.9.2

Files changed (187)
  1. package/LICENSE +202 -0
  2. package/README.md +907 -0
  3. package/bin/agent-browser-darwin-arm64 +0 -0
  4. package/bin/agent-browser.js +120 -0
  5. package/dist/__tests__/e2e/utils/test-helpers.d.ts +5 -0
  6. package/dist/__tests__/e2e/utils/test-helpers.d.ts.map +1 -0
  7. package/dist/__tests__/e2e/utils/test-helpers.js +22 -0
  8. package/dist/__tests__/e2e/utils/test-helpers.js.map +1 -0
  9. package/dist/__tests__/test-iframe.d.ts +2 -0
  10. package/dist/__tests__/test-iframe.d.ts.map +1 -0
  11. package/dist/__tests__/test-iframe.js +52 -0
  12. package/dist/__tests__/test-iframe.js.map +1 -0
  13. package/dist/__tests__/utils/parseCli.d.ts +20 -0
  14. package/dist/__tests__/utils/parseCli.d.ts.map +1 -0
  15. package/dist/__tests__/utils/parseCli.js +1086 -0
  16. package/dist/__tests__/utils/parseCli.js.map +1 -0
  17. package/dist/actions.d.ts +50 -0
  18. package/dist/actions.d.ts.map +1 -0
  19. package/dist/actions.js +2164 -0
  20. package/dist/actions.js.map +1 -0
  21. package/dist/browser.d.ts +556 -0
  22. package/dist/browser.d.ts.map +1 -0
  23. package/dist/browser.js +2599 -0
  24. package/dist/browser.js.map +1 -0
  25. package/dist/cli/commands.d.ts +8 -0
  26. package/dist/cli/commands.d.ts.map +1 -0
  27. package/dist/cli/commands.js +1038 -0
  28. package/dist/cli/commands.js.map +1 -0
  29. package/dist/cli/connection.d.ts +50 -0
  30. package/dist/cli/connection.d.ts.map +1 -0
  31. package/dist/cli/connection.js +595 -0
  32. package/dist/cli/connection.js.map +1 -0
  33. package/dist/cli/flags.d.ts +36 -0
  34. package/dist/cli/flags.d.ts.map +1 -0
  35. package/dist/cli/flags.js +206 -0
  36. package/dist/cli/flags.js.map +1 -0
  37. package/dist/cli/help.d.ts +4 -0
  38. package/dist/cli/help.d.ts.map +1 -0
  39. package/dist/cli/help.js +1024 -0
  40. package/dist/cli/help.js.map +1 -0
  41. package/dist/cli/output.d.ts +14 -0
  42. package/dist/cli/output.d.ts.map +1 -0
  43. package/dist/cli/output.js +456 -0
  44. package/dist/cli/output.js.map +1 -0
  45. package/dist/cli-new.d.ts +3 -0
  46. package/dist/cli-new.d.ts.map +1 -0
  47. package/dist/cli-new.js +308 -0
  48. package/dist/cli-new.js.map +1 -0
  49. package/dist/cli-old.d.ts +3 -0
  50. package/dist/cli-old.d.ts.map +1 -0
  51. package/dist/cli-old.js +1101 -0
  52. package/dist/cli-old.js.map +1 -0
  53. package/dist/cli.d.ts +3 -0
  54. package/dist/cli.d.ts.map +1 -0
  55. package/dist/cli.js +403 -0
  56. package/dist/cli.js.map +1 -0
  57. package/dist/content-detection.d.ts +18 -0
  58. package/dist/content-detection.d.ts.map +1 -0
  59. package/dist/content-detection.js +68 -0
  60. package/dist/content-detection.js.map +1 -0
  61. package/dist/daemon.d.ts +55 -0
  62. package/dist/daemon.d.ts.map +1 -0
  63. package/dist/daemon.js +426 -0
  64. package/dist/daemon.js.map +1 -0
  65. package/dist/diff.d.ts +42 -0
  66. package/dist/diff.d.ts.map +1 -0
  67. package/dist/diff.js +166 -0
  68. package/dist/diff.js.map +1 -0
  69. package/dist/human-mouse.d.ts +31 -0
  70. package/dist/human-mouse.d.ts.map +1 -0
  71. package/dist/human-mouse.js +184 -0
  72. package/dist/human-mouse.js.map +1 -0
  73. package/dist/ios-actions.d.ts +11 -0
  74. package/dist/ios-actions.d.ts.map +1 -0
  75. package/dist/ios-actions.js +228 -0
  76. package/dist/ios-actions.js.map +1 -0
  77. package/dist/ios-manager.d.ts +266 -0
  78. package/dist/ios-manager.d.ts.map +1 -0
  79. package/dist/ios-manager.js +1076 -0
  80. package/dist/ios-manager.js.map +1 -0
  81. package/dist/message-bridge.d.ts +10 -0
  82. package/dist/message-bridge.d.ts.map +1 -0
  83. package/dist/message-bridge.js +60 -0
  84. package/dist/message-bridge.js.map +1 -0
  85. package/dist/protocol.d.ts +26 -0
  86. package/dist/protocol.d.ts.map +1 -0
  87. package/dist/protocol.js +912 -0
  88. package/dist/protocol.js.map +1 -0
  89. package/dist/recorder/binding.d.ts +24 -0
  90. package/dist/recorder/binding.d.ts.map +1 -0
  91. package/dist/recorder/binding.js +215 -0
  92. package/dist/recorder/binding.js.map +1 -0
  93. package/dist/recorder/index.d.ts +4 -0
  94. package/dist/recorder/index.d.ts.map +1 -0
  95. package/dist/recorder/index.js +4 -0
  96. package/dist/recorder/index.js.map +1 -0
  97. package/dist/recorder/inject.js +1913 -0
  98. package/dist/recorder/recorder.d.ts +19 -0
  99. package/dist/recorder/recorder.d.ts.map +1 -0
  100. package/dist/recorder/recorder.js +101 -0
  101. package/dist/recorder/recorder.js.map +1 -0
  102. package/dist/recorder/store.d.ts +22 -0
  103. package/dist/recorder/store.d.ts.map +1 -0
  104. package/dist/recorder/store.js +150 -0
  105. package/dist/recorder/store.js.map +1 -0
  106. package/dist/recorder/types.d.ts +73 -0
  107. package/dist/recorder/types.d.ts.map +1 -0
  108. package/dist/recorder/types.js +5 -0
  109. package/dist/recorder/types.js.map +1 -0
  110. package/dist/snapshot.d.ts +81 -0
  111. package/dist/snapshot.d.ts.map +1 -0
  112. package/dist/snapshot.js +1348 -0
  113. package/dist/snapshot.js.map +1 -0
  114. package/dist/stream-server-standalone.d.ts +38 -0
  115. package/dist/stream-server-standalone.d.ts.map +1 -0
  116. package/dist/stream-server-standalone.js +494 -0
  117. package/dist/stream-server-standalone.js.map +1 -0
  118. package/dist/stream-server.d.ts +214 -0
  119. package/dist/stream-server.d.ts.map +1 -0
  120. package/dist/stream-server.js +811 -0
  121. package/dist/stream-server.js.map +1 -0
  122. package/dist/types.d.ts +914 -0
  123. package/dist/types.d.ts.map +1 -0
  124. package/dist/types.js +4 -0
  125. package/dist/types.js.map +1 -0
  126. package/dist/viewer-html.d.ts +2 -0
  127. package/dist/viewer-html.d.ts.map +1 -0
  128. package/dist/viewer-html.js +185 -0
  129. package/dist/viewer-html.js.map +1 -0
  130. package/dist/viewer-script.d.ts +47 -0
  131. package/dist/viewer-script.d.ts.map +1 -0
  132. package/dist/viewer-script.js +586 -0
  133. package/dist/viewer-script.js.map +1 -0
  134. package/package.json +86 -0
  135. package/scripts/build-all-platforms.sh +68 -0
  136. package/scripts/check-version-sync.js +39 -0
  137. package/scripts/check_goods_container.js +35 -0
  138. package/scripts/check_page_content.js +36 -0
  139. package/scripts/click_applause_rate.js +30 -0
  140. package/scripts/copy-native.js +36 -0
  141. package/scripts/copy-recorder.js +21 -0
  142. package/scripts/e2e-test-recorder.ts +584 -0
  143. package/scripts/explore_jd_page.js +31 -0
  144. package/scripts/extract_all_jd_data.js +80 -0
  145. package/scripts/extract_jd_product_detail.js +62 -0
  146. package/scripts/extract_jd_products_correct_links.js +78 -0
  147. package/scripts/extract_jd_products_final.js +80 -0
  148. package/scripts/extract_jd_reviews.js +48 -0
  149. package/scripts/extract_jd_seafood_final.js +78 -0
  150. package/scripts/extract_multiple_products.js +77 -0
  151. package/scripts/extract_products_no_scroll.js +68 -0
  152. package/scripts/extract_products_simple.js +68 -0
  153. package/scripts/find_applause_rate.js +26 -0
  154. package/scripts/find_jd_links.js +28 -0
  155. package/scripts/find_main_content.js +20 -0
  156. package/scripts/find_product_cards.js +38 -0
  157. package/scripts/find_root_content.js +26 -0
  158. package/scripts/find_unique_products.js +55 -0
  159. package/scripts/get_jd_product_detail.js +16 -0
  160. package/scripts/get_jd_products.js +23 -0
  161. package/scripts/get_jd_seafood_products.js +44 -0
  162. package/scripts/get_product_details_from_images.js +54 -0
  163. package/scripts/postinstall.js +235 -0
  164. package/scripts/scroll_and_get_products.js +47 -0
  165. package/scripts/scroll_deep_and_find.js +45 -0
  166. package/scripts/sync-version.js +69 -0
  167. package/scripts/verify-baidu-enter.ts +116 -0
  168. package/skills/agent-browser/SKILL.md +310 -0
  169. package/skills/agent-browser/references/authentication.md +198 -0
  170. package/skills/agent-browser/references/commands.md +471 -0
  171. package/skills/agent-browser/references/data-extraction.md +377 -0
  172. package/skills/agent-browser/references/proxy-support.md +188 -0
  173. package/skills/agent-browser/references/session-management.md +197 -0
  174. package/skills/agent-browser/references/snapshot-refs.md +379 -0
  175. package/skills/agent-browser/references/video-recording.md +173 -0
  176. package/skills/agent-browser/templates/api-interception.sh +53 -0
  177. package/skills/agent-browser/templates/authenticated-session.sh +97 -0
  178. package/skills/agent-browser/templates/capture-workflow.sh +69 -0
  179. package/skills/agent-browser/templates/data-extraction.sh +210 -0
  180. package/skills/agent-browser/templates/form-automation.sh +62 -0
  181. package/skills/skill-creator/LICENSE.txt +202 -0
  182. package/skills/skill-creator/SKILL.md +356 -0
  183. package/skills/skill-creator/references/output-patterns.md +82 -0
  184. package/skills/skill-creator/references/workflows.md +28 -0
  185. package/skills/skill-creator/scripts/init_skill.py +303 -0
  186. package/skills/skill-creator/scripts/package_skill.py +113 -0
  187. package/skills/skill-creator/scripts/quick_validate.py +95 -0
package/README.md ADDED
@@ -0,0 +1,907 @@
1
+ # agent-browser
2
+
3
+ > **Fork of [vercel-labs/agent-browser](https://github.com/vercel-labs/agent-browser)** with additional enhancements and features.
4
+
5
+ Headless browser automation CLI for AI agents. Fast Rust CLI with Node.js fallback.
6
+
7
+ ## Installation
8
+
9
+ ### npm (recommended)
10
+
11
+ ```bash
12
+ npm install -g agent-browser
13
+ agent-browser install # Download Chromium
14
+ ```
15
+
16
+ ### Homebrew (macOS)
17
+
18
+ ```bash
19
+ brew install agent-browser
20
+ agent-browser install # Download Chromium
21
+ ```
22
+
23
+ ### From Source
24
+
25
+ ```bash
26
+ git clone https://github.com/xuyingzhou/agent-browser
27
+ cd agent-browser
28
+ pnpm install
29
+ pnpm build
30
+ pnpm build:native # Requires Rust (https://rustup.rs)
31
+ pnpm link --global # Makes agent-browser available globally
32
+ agent-browser install
33
+ ```
34
+
35
+ ### Linux Dependencies
36
+
37
+ On Linux, install system dependencies:
38
+
39
+ ```bash
40
+ agent-browser install --with-deps
41
+ # or manually: npx playwright install-deps chromium
42
+ ```
43
+
44
+ ## Quick Start
45
+
46
+ ```bash
47
+ agent-browser open example.com
48
+ agent-browser snapshot # Get accessibility tree with refs
49
+ agent-browser click @e2 # Click by ref from snapshot
50
+ agent-browser fill @e3 "test@example.com" # Fill by ref
51
+ agent-browser get text @e1 # Get text by ref
52
+ agent-browser screenshot page.png
53
+ agent-browser close
54
+ ```
55
+
56
+ ### Traditional Selectors (also supported)
57
+
58
+ ```bash
59
+ agent-browser click "#submit"
60
+ agent-browser fill "#email" "test@example.com"
61
+ agent-browser find role button click --name "Submit"
62
+ ```
63
+
64
+ ## Commands
65
+
66
+ ### Core Commands
67
+
68
+ ```bash
69
+ agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
70
+ agent-browser click <sel> # Click element
71
+ agent-browser dblclick <sel> # Double-click element
72
+ agent-browser focus <sel> # Focus element
73
+ agent-browser type <sel> <text> # Type into element
74
+ agent-browser fill <sel> <text> # Clear and fill
75
+ agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
76
+ agent-browser keydown <key> # Hold key down
77
+ agent-browser keyup <key> # Release key
78
+ agent-browser hover <sel> # Hover element
79
+ agent-browser select <sel> <val> # Select dropdown option
80
+ agent-browser check <sel> # Check checkbox
81
+ agent-browser uncheck <sel> # Uncheck checkbox
82
+ agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
83
+ agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
84
+ agent-browser drag <src> <tgt> # Drag and drop
85
+ agent-browser upload <sel> <files> # Upload files
86
+ agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
87
+ agent-browser pdf <path> # Save as PDF
88
+ agent-browser snapshot # Accessibility tree with refs (best for AI)
89
+ agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
90
+ agent-browser connect <port> # Connect to browser via CDP
91
+ agent-browser close # Close browser (aliases: quit, exit)
92
+ ```
93
+
94
+ ### Get Info
95
+
96
+ ```bash
97
+ agent-browser get text <sel> # Get text content
98
+ agent-browser get html <sel> # Get innerHTML
99
+ agent-browser get value <sel> # Get input value
100
+ agent-browser get attr <sel> <attr> # Get attribute
101
+ agent-browser get title # Get page title
102
+ agent-browser get url # Get current URL
103
+ agent-browser get count <sel> # Count matching elements
104
+ agent-browser get box <sel> # Get bounding box
105
+ ```
106
+
107
+ ### Check State
108
+
109
+ ```bash
110
+ agent-browser is visible <sel> # Check if visible
111
+ agent-browser is enabled <sel> # Check if enabled
112
+ agent-browser is checked <sel> # Check if checked
113
+ ```
114
+
115
+ ### Find Elements (Semantic Locators)
116
+
117
+ ```bash
118
+ agent-browser find role <role> <action> [value] # By ARIA role
119
+ agent-browser find text <text> <action> # By text content
120
+ agent-browser find label <label> <action> [value] # By label
121
+ agent-browser find placeholder <ph> <action> [value] # By placeholder
122
+ agent-browser find alt <text> <action> # By alt text
123
+ agent-browser find title <text> <action> # By title attr
124
+ agent-browser find testid <id> <action> [value] # By data-testid
125
+ agent-browser find first <sel> <action> [value] # First match
126
+ agent-browser find last <sel> <action> [value] # Last match
127
+ agent-browser find nth <n> <sel> <action> [value] # Nth match
128
+ ```
129
+
130
+ **Actions:** `click`, `fill`, `check`, `hover`, `text`
131
+
132
+ **Examples:**
133
+ ```bash
134
+ agent-browser find role button click --name "Submit"
135
+ agent-browser find text "Sign In" click
136
+ agent-browser find label "Email" fill "test@test.com"
137
+ agent-browser find first ".item" click
138
+ agent-browser find nth 2 "a" text
139
+ ```
140
+
141
+ ### Wait
142
+
143
+ ```bash
144
+ agent-browser wait <selector> # Wait for element to be visible
145
+ agent-browser wait <ms> # Wait for time (milliseconds)
146
+ agent-browser wait --text "Welcome" # Wait for text to appear
147
+ agent-browser wait --url "**/dash" # Wait for URL pattern
148
+ agent-browser wait --load networkidle # Wait for load state
149
+ agent-browser wait --fn "window.ready === true" # Wait for JS condition
150
+ ```
151
+
152
+ **Load states:** `load`, `domcontentloaded`, `networkidle`
153
+
154
+ ### Mouse Control
155
+
156
+ ```bash
157
+ agent-browser mouse move <x> <y> # Move mouse
158
+ agent-browser mouse down [button] # Press button (left/right/middle)
159
+ agent-browser mouse up [button] # Release button
160
+ agent-browser mouse wheel <dy> [dx] # Scroll wheel
161
+ ```
162
+
163
+ ### Browser Settings
164
+
165
+ ```bash
166
+ agent-browser set viewport <w> <h> # Set viewport size
167
+ agent-browser set device <name> # Emulate device ("iPhone 14")
168
+ agent-browser set geo <lat> <lng> # Set geolocation
169
+ agent-browser set offline [on|off] # Toggle offline mode
170
+ agent-browser set headers <json> # Extra HTTP headers
171
+ agent-browser set credentials <u> <p> # HTTP basic auth
172
+ agent-browser set media [dark|light] # Emulate color scheme
173
+ ```
174
+
175
+ ### Cookies & Storage
176
+
177
+ ```bash
178
+ agent-browser cookies # Get all cookies
179
+ agent-browser cookies set <name> <val> # Set cookie
180
+ agent-browser cookies clear # Clear cookies
181
+
182
+ agent-browser storage local # Get all localStorage
183
+ agent-browser storage local <key> # Get specific key
184
+ agent-browser storage local set <k> <v> # Set value
185
+ agent-browser storage local clear # Clear all
186
+
187
+ agent-browser storage session # Same for sessionStorage
188
+ ```
189
+
190
+ ### Network
191
+
192
+ ```bash
193
+ agent-browser network route <url> # Intercept requests
194
+ agent-browser network route <url> --abort # Block requests
195
+ agent-browser network route <url> --body <json> # Mock response
196
+ agent-browser network unroute [url] # Remove routes
197
+ agent-browser network requests # View tracked requests
198
+ agent-browser network requests --filter api # Filter requests
199
+ ```
200
+
201
+ ### Tabs & Windows
202
+
203
+ ```bash
204
+ agent-browser tab # List tabs
205
+ agent-browser tab new [url] # New tab (optionally with URL)
206
+ agent-browser tab <n> # Switch to tab n
207
+ agent-browser tab close [n] # Close tab
208
+ agent-browser window new # New window
209
+ ```
210
+
211
+ ### Frames
212
+
213
+ ```bash
214
+ agent-browser frame <sel> # Switch to iframe
215
+ agent-browser frame main # Back to main frame
216
+ ```
217
+
218
+ ### Dialogs
219
+
220
+ ```bash
221
+ agent-browser dialog accept [text] # Accept (with optional prompt text)
222
+ agent-browser dialog dismiss # Dismiss
223
+ ```
224
+
225
+ ### Debug
226
+
227
+ ```bash
228
+ agent-browser trace start [path] # Start recording trace
229
+ agent-browser trace stop [path] # Stop and save trace
230
+ agent-browser console # View console messages (log, error, warn, info)
231
+ agent-browser console --clear # Clear console
232
+ agent-browser errors # View page errors (uncaught JavaScript exceptions)
233
+ agent-browser errors --clear # Clear errors
234
+ agent-browser highlight <sel> # Highlight element
235
+ agent-browser state save <path> # Save auth state
236
+ agent-browser state load <path> # Load auth state
237
+ ```
238
+
239
+ ### Navigation
240
+
241
+ ```bash
242
+ agent-browser back # Go back
243
+ agent-browser forward # Go forward
244
+ agent-browser reload # Reload page
245
+ ```
246
+
247
+ ### Setup
248
+
249
+ ```bash
250
+ agent-browser install # Download Chromium browser
251
+ agent-browser install --with-deps # Also install system deps (Linux)
252
+ ```
253
+
254
+ ## Sessions
255
+
256
+ Run multiple isolated browser instances:
257
+
258
+ ```bash
259
+ # Different sessions
260
+ agent-browser --session agent1 open site-a.com
261
+ agent-browser --session agent2 open site-b.com
262
+
263
+ # Or via environment variable
264
+ AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
265
+
266
+ # List active sessions
267
+ agent-browser session list
268
+ # Output:
269
+ # Active sessions:
270
+ # -> default
271
+ # agent1
272
+
273
+ # Show current session
274
+ agent-browser session
275
+ ```
276
+
277
+ Each session has its own:
278
+ - Browser instance
279
+ - Cookies and storage
280
+ - Navigation history
281
+ - Authentication state
282
+
283
+ ## Persistent Profiles
284
+
285
+ By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
286
+
287
+ ```bash
288
+ # Use a persistent profile directory
289
+ agent-browser --profile ~/.myapp-profile open myapp.com
290
+
291
+ # Login once, then reuse the authenticated session
292
+ agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
293
+
294
+ # Or via environment variable
295
+ AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
296
+ ```
297
+
298
+ The profile directory stores:
299
+ - Cookies and localStorage
300
+ - IndexedDB data
301
+ - Service workers
302
+ - Browser cache
303
+ - Login sessions
304
+
305
+ **Tip**: Use different profile paths for different projects to keep their browser state isolated.
306
+
307
+ ## Snapshot Options
308
+
309
+ The `snapshot` command supports filtering to reduce output size:
310
+
311
+ ```bash
312
+ agent-browser snapshot # Full accessibility tree
313
+ agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
314
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
315
+ agent-browser snapshot -c # Compact (remove empty structural elements)
316
+ agent-browser snapshot -d 3 # Limit depth to 3 levels
317
+ agent-browser snapshot -s "#main" # Scope to CSS selector
318
+ agent-browser snapshot -i -c -d 5 # Combine options
319
+ ```
320
+
321
+ | Option | Description |
322
+ |--------|-------------|
323
+ | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
324
+ | `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
325
+ | `-c, --compact` | Remove empty structural elements |
326
+ | `-d, --depth <n>` | Limit tree depth |
327
+ | `-s, --selector <sel>` | Scope to CSS selector |
328
+
329
+ The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
330
+
331
+ ## Options
332
+
333
+ | Option | Description |
334
+ |--------|-------------|
335
+ | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
336
+ | `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
337
+ | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
338
+ | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
339
+ | `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
340
+ | `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
341
+ | `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
342
+ | `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
343
+ | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
344
+ | `--json` | JSON output (for agents) |
345
+ | `--full, -f` | Full page screenshot |
346
+ | `--name, -n` | Locator name filter |
347
+ | `--exact` | Exact text match |
348
+ | `--headed` | Show browser window (not headless) |
349
+ | `--cdp <port>` | Connect via Chrome DevTools Protocol |
350
+ | `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
351
+ | `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
352
+ | `--debug` | Debug output |
353
+
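The `--args` row above says launch args may be comma or newline separated. A minimal sketch of that splitting rule follows. `splitLaunchArgs` is a hypothetical name used only for illustration, and the CLI's real parser may behave differently, for example with a flag whose value itself contains a comma.

```typescript
// Hypothetical helper, not part of agent-browser: splits a raw --args /
// AGENT_BROWSER_ARGS string on commas or newlines and drops empty entries.
function splitLaunchArgs(raw: string): string[] {
  return raw
    .split(/[,\n]/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}
```

Under this rule a value such as `--window-size=1280,720` would be split in two, so it is worth checking how the real CLI treats such flags before relying on them.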
354
+ ## Selectors
355
+
356
+ ### Refs (Recommended for AI)
357
+
358
+ Refs provide deterministic element selection from snapshots:
359
+
360
+ ```bash
361
+ # 1. Get snapshot with refs
362
+ agent-browser snapshot
363
+ # Output:
364
+ # - heading "Example Domain" [ref=e1] [level=1]
365
+ # - button "Submit" [ref=e2]
366
+ # - textbox "Email" [ref=e3]
367
+ # - link "Learn more" [ref=e4]
368
+
369
+ # 2. Use refs to interact
370
+ agent-browser click @e2 # Click the button
371
+ agent-browser fill @e3 "test@example.com" # Fill the textbox
372
+ agent-browser get text @e1 # Get heading text
373
+ agent-browser hover @e4 # Hover the link
374
+ ```
375
+
376
+ **Why use refs?**
377
+ - **Deterministic**: A ref points to the exact element from the snapshot
378
+ - **Fast**: No DOM re-query needed
379
+ - **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
380
+
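To illustrate the snapshot-then-ref workflow above, here is a small sketch that pulls refs out of plain-text snapshot output. It assumes lines shaped like the sample output shown in this section (`- role "name" [ref=eN]`). `parseSnapshotRefs` and `SnapshotRef` are hypothetical names, not part of agent-browser, and real snapshot output may contain line shapes this pattern does not cover.

```typescript
// Hypothetical types and helper for illustration only.
interface SnapshotRef {
  ref: string; // e.g. "e2"
  role: string; // e.g. "button"
  name: string; // accessible name, "" when the line has none
}

// Matches lines such as: - button "Submit" [ref=e2]
const REF_LINE = /^\s*-\s+(\w+)(?:\s+"([^"]*)")?.*\[ref=(\w+)\]/;

function parseSnapshotRefs(snapshot: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const line of snapshot.split("\n")) {
    const match = REF_LINE.exec(line);
    if (match === null) continue; // skip lines without a ref
    const [, role = "", name = "", ref = ""] = match;
    refs.push({ ref, role, name });
  }
  return refs;
}
```

A caller could then look up the entry with role `button` and name `Submit` and pass `@` plus its `ref` to `agent-browser click`.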
381
+ ### CSS Selectors
382
+
383
+ ```bash
384
+ agent-browser click "#id"
385
+ agent-browser click ".class"
386
+ agent-browser click "div > button"
387
+ ```
388
+
389
+ ### Text & XPath
390
+
391
+ ```bash
392
+ agent-browser click "text=Submit"
393
+ agent-browser click "xpath=//button"
394
+ ```
395
+
396
+ ### Semantic Locators
397
+
398
+ ```bash
399
+ agent-browser find role button click --name "Submit"
400
+ agent-browser find label "Email" fill "test@test.com"
401
+ ```
402
+
403
+ ## Agent Mode
404
+
405
+ Use `--json` for machine-readable output:
406
+
407
+ ```bash
408
+ agent-browser snapshot --json
409
+ # Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
410
+
411
+ agent-browser get text @e1 --json
412
+ agent-browser is visible @e2 --json
413
+ ```
414
+
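As a sketch of how an agent might consume the `--json` output above, the following looks up a ref by role and name. It assumes only the response fields visible in the example (`success`, `data.snapshot`, `data.refs`); the shape of error responses is not documented here, so anything else is treated as "not found". `findRef` is a hypothetical helper name.

```typescript
// Shapes inferred from the example response above; treat them as assumptions.
interface RefInfo {
  role: string;
  name?: string;
}

interface SnapshotResponse {
  success: boolean;
  data?: { snapshot: string; refs: Record<string, RefInfo> };
}

// Returns a selector such as "@e2" for the first ref matching role and name.
function findRef(raw: string, role: string, name: string): string | undefined {
  const parsed = JSON.parse(raw) as SnapshotResponse;
  if (!parsed.success || !parsed.data) return undefined;
  for (const [ref, info] of Object.entries(parsed.data.refs)) {
    if (info.role === role && info.name === name) return `@${ref}`;
  }
  return undefined;
}
```

The returned string can be passed directly as the selector argument to commands such as `click` or `fill`.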
415
+ ### Optimal AI Workflow
416
+
417
+ ```bash
418
+ # 1. Navigate and get snapshot
419
+ agent-browser open example.com
420
+ agent-browser snapshot -i --json # AI parses tree and refs
421
+
422
+ # 2. AI identifies target refs from snapshot
423
+ # 3. Execute actions using refs
424
+ agent-browser click @e2
425
+ agent-browser fill @e3 "input text"
426
+
427
+ # 4. Get new snapshot if page changed
428
+ agent-browser snapshot -i --json
429
+ ```
430
+
431
+ ## Headed Mode
432
+
433
+ Show the browser window for debugging:
434
+
435
+ ```bash
436
+ agent-browser open example.com --headed
437
+ ```
438
+
439
+ This opens a visible browser window instead of running headless.
440
+
441
+ ## Authenticated Sessions
442
+
443
+ Use `--headers` to set HTTP headers for a specific origin, enabling authentication without login flows:
444
+
445
+ ```bash
446
+ # Headers are scoped to api.example.com only
447
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
448
+
449
+ # Requests to api.example.com include the auth header
450
+ agent-browser snapshot -i --json
451
+ agent-browser click @e2
452
+
453
+ # Navigate to another domain - headers are NOT sent (safe!)
454
+ agent-browser open other-site.com
455
+ ```
456
+
457
+ This is useful for:
458
+ - **Skipping login flows** - Authenticate via headers instead of UI
459
+ - **Switching users** - Start new sessions with different auth tokens
460
+ - **API testing** - Access protected endpoints directly
461
+ - **Security** - Headers are scoped to the origin, not leaked to other domains
462
+
463
+ To set headers for multiple origins, use `--headers` with each `open` command:
464
+
465
+ ```bash
466
+ agent-browser open api.example.com --headers '{"Authorization": "Bearer token1"}'
467
+ agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
468
+ ```
469
+
470
+ For global headers (all domains), use `set headers`:
471
+
472
+ ```bash
473
+ agent-browser set headers '{"X-Custom-Header": "value"}'
474
+ ```
475
+
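The origin scoping described in this section can be pictured with a small sketch. This is not the package's implementation; `ScopedHeaders` is a hypothetical class that only models the rule "headers set for one origin are returned for requests to that origin and nowhere else". It uses full URLs with a scheme, since the standard `URL` constructor requires one.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical model of per-origin header scoping, for illustration only.
class ScopedHeaders {
  private readonly byOrigin = new Map<string, HeaderMap>();

  // Remember headers for the origin (scheme + host + port) of `url`.
  set(url: string, headers: HeaderMap): void {
    this.byOrigin.set(new URL(url).origin, headers);
  }

  // Headers to attach to a request; empty when the origin has none.
  forRequest(url: string): HeaderMap {
    return this.byOrigin.get(new URL(url).origin) ?? {};
  }
}
```

Keying on the origin rather than the full URL means any path on the same host receives the headers, while a different host, scheme, or port does not.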
476
+ ## Custom Browser Executable
477
+
478
+ Use a custom browser executable instead of the bundled Chromium. This is useful for:
479
+ - **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
480
+ - **System browsers**: Use an existing Chrome/Chromium installation
481
+ - **Custom builds**: Use modified browser builds
482
+
483
+ ### CLI Usage
484
+
485
+ ```bash
486
+ # Via flag
487
+ agent-browser --executable-path /path/to/chromium open example.com
488
+
489
+ # Via environment variable
490
+ AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
491
+ ```
492
+
493
+ ### Serverless Example (Vercel/AWS Lambda)
494
+
495
+ ```typescript
496
+ import chromium from '@sparticuz/chromium';
497
+ import { BrowserManager } from 'agent-browser';
498
+
499
+ export async function handler() {
500
+   const browser = new BrowserManager();
501
+   await browser.launch({
502
+     executablePath: await chromium.executablePath(),
503
+     headless: true,
504
+   });
505
+   // ... use browser
506
+ }
507
+ ```
508
+
509
+ ## Local Files
510
+
511
+ Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
512
+
513
+ ```bash
514
+ # Enable file access (required for JavaScript to access local files)
515
+ agent-browser --allow-file-access open file:///path/to/document.pdf
516
+ agent-browser --allow-file-access open file:///path/to/page.html
517
+
518
+ # Take screenshot of a local PDF
519
+ agent-browser --allow-file-access open file:///Users/me/report.pdf
520
+ agent-browser screenshot report.png
521
+ ```
522
+
523
+ The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
524
+ - Load and render local files
525
+ - Access other local files via JavaScript (XHR, fetch)
526
+ - Load local resources (images, scripts, stylesheets)
527
+
528
+ **Note:** This flag only works with Chromium. For security, it's disabled by default.
529
+
530
+ ## CDP Mode
531
+
532
+ Connect to an existing browser via Chrome DevTools Protocol:
533
+
534
+ ```bash
535
+ # Start Chrome with: google-chrome --remote-debugging-port=9222
536
+
537
+ # Connect once, then run commands without --cdp
538
+ agent-browser connect 9222
539
+ agent-browser snapshot
540
+ agent-browser tab
541
+ agent-browser close
542
+
543
+ # Or pass --cdp on each command
544
+ agent-browser --cdp 9222 snapshot
545
+
546
+ # Connect to remote browser via WebSocket URL
547
+ agent-browser --cdp "wss://your-browser-service.com/cdp?token=..." snapshot
548
+ ```
549
+
550
+ The `--cdp` flag accepts either:
551
+ - A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
552
+ - A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
553
+
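The two accepted forms of `--cdp` listed above can be sketched as a small resolver. `resolveCdpTarget` is a hypothetical function, not the CLI's code; it maps a port to `http://localhost:{port}` and passes `ws://` or `wss://` URLs through unchanged. Rejecting everything else is an assumption of this sketch.

```typescript
// Hypothetical resolver for a --cdp value, following the two cases above.
function resolveCdpTarget(value: string): string {
  // Full WebSocket URL: use as-is for remote browser services.
  if (/^wss?:\/\//.test(value)) return value;

  // Bare port number: local connection over HTTP.
  if (/^\d+$/.test(value)) {
    const port = Number(value);
    if (port >= 1 && port <= 65535) return `http://localhost:${port}`;
  }

  throw new Error(`Unsupported --cdp value: ${value}`);
}
```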
554
+ This enables control of:
555
+ - Electron apps
556
+ - Chrome/Chromium instances with remote debugging
557
+ - WebView2 applications
558
+ - Any browser exposing a CDP endpoint
559
+
560
+ ## Streaming (Browser Preview)
561
+
562
+ Stream the browser viewport via WebSocket for live preview or "pair browsing" where a human can watch and interact alongside an AI agent.
563
+
564
+ ### Enable Streaming
565
+
566
+ Set the `AGENT_BROWSER_STREAM_PORT` environment variable:
567
+
568
+ ```bash
569
+ AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
570
+ ```
571
+
572
+ This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
573
+
574
+ ### WebSocket Protocol
575
+
576
+ Connect to `ws://localhost:9223` to receive frames and send input:
577
+
578
+ **Receive frames:**
579
+ ```json
580
+ {
581
+   "type": "frame",
582
+   "data": "<base64-encoded-jpeg>",
583
+   "metadata": {
584
+     "deviceWidth": 1280,
585
+     "deviceHeight": 720,
586
+     "pageScaleFactor": 1,
587
+     "offsetTop": 0,
588
+     "scrollOffsetX": 0,
589
+     "scrollOffsetY": 0
590
+   }
591
+ }
592
+ ```
593
+
594
+ **Send mouse events:**
595
+ ```json
596
+ {
597
+   "type": "input_mouse",
598
+   "eventType": "mousePressed",
599
+   "x": 100,
600
+   "y": 200,
601
+   "button": "left",
602
+   "clickCount": 1
603
+ }
604
+ ```
605
+
606
+ **Send keyboard events:**
607
+ ```json
608
+ {
609
+   "type": "input_keyboard",
610
+   "eventType": "keyDown",
611
+   "key": "Enter",
612
+   "code": "Enter"
613
+ }
614
+ ```
615
+
616
+ **Send touch events:**
617
+ ```json
618
+ {
619
+   "type": "input_touch",
620
+   "eventType": "touchStart",
621
+   "touchPoints": [{ "x": 100, "y": 200 }]
622
+ }
623
+ ```
624
+
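The message shapes above translate naturally into types for a viewer client. The sketch below copies its field names from those JSON examples and assumes nothing beyond them. `parseFrameMessage` and `mousePressedAt` are hypothetical helpers, and no WebSocket connection is opened here, so the code only covers decoding and encoding.

```typescript
// Frame message as shown in the "Receive frames" example.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded JPEG
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Mouse input message as shown in the "Send mouse events" example.
interface MouseInputMessage {
  type: "input_mouse";
  eventType: string;
  x: number;
  y: number;
  button: "left" | "right" | "middle";
  clickCount: number;
}

// Returns the frame when `raw` is a frame message, otherwise null.
function parseFrameMessage(raw: string): FrameMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;
  const msg = value as Partial<FrameMessage>;
  if (msg.type !== "frame" || typeof msg.data !== "string") return null;
  if (!msg.metadata || typeof msg.metadata !== "object") return null;
  return msg as FrameMessage;
}

// Serializes a single left-button "mousePressed" event at (x, y).
function mousePressedAt(x: number, y: number): string {
  const msg: MouseInputMessage = {
    type: "input_mouse",
    eventType: "mousePressed",
    x,
    y,
    button: "left",
    clickCount: 1,
  };
  return JSON.stringify(msg);
}
```

A client would pass each incoming WebSocket text message to `parseFrameMessage`, render `data` as a JPEG when the result is not null, and send the string from `mousePressedAt` when the user clicks the preview.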
625
+ ### Programmatic API
626
+
627
+ For advanced use, control streaming directly via the protocol:
628
+
629
+ ```typescript
630
+ import { BrowserManager } from 'agent-browser';
631
+
632
+ const browser = new BrowserManager();
633
+ await browser.launch({ headless: true });
634
+ await browser.navigate('https://example.com');
635
+
636
+ // Start screencast
637
+ await browser.startScreencast((frame) => {
638
+   // frame.data is base64-encoded image
639
+   // frame.metadata contains viewport info
640
+   console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
641
+ }, {
642
+   format: 'jpeg',
643
+   quality: 80,
644
+   maxWidth: 1280,
645
+   maxHeight: 720,
646
+ });
647
+
648
+ // Inject mouse events
649
+ await browser.injectMouseEvent({
650
+   type: 'mousePressed',
651
+   x: 100,
652
+   y: 200,
653
+   button: 'left',
654
+ });
655
+
656
+ // Inject keyboard events
657
+ await browser.injectKeyboardEvent({
658
+   type: 'keyDown',
659
+   key: 'Enter',
660
+   code: 'Enter',
661
+ });
662
+
663
+ // Stop when done
664
+ await browser.stopScreencast();
665
+ ```
666
+
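Inside the screencast callback, `frame.data` has to be decoded before it can be written to disk or displayed. The sketch below assumes the `jpeg` format requested above and a Node.js runtime (for `Buffer`); `decodeFrame` is a hypothetical helper, not part of the agent-browser API.

```typescript
// Hypothetical helper: turn the base64 payload of one frame into raw bytes.
function decodeFrame(data: string): Buffer {
  const bytes = Buffer.from(data, "base64");
  // JPEG data begins with the bytes FF D8 (start-of-image marker).
  if (bytes.length < 2 || bytes[0] !== 0xff || bytes[1] !== 0xd8) {
    throw new Error("frame data does not look like a JPEG");
  }
  return bytes;
}
```

A callback could then pass `decodeFrame(frame.data)` to `fs.writeFileSync` to save a snapshot of the viewport.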
667
+ ## Architecture
668
+
669
+ agent-browser uses a client-daemon architecture:
670
+
671
+ 1. **Node.js CLI** (default) - Direct Node.js implementation, bypasses Rust layer
672
+ 2. **Rust CLI** (optional) - Fast native binary, requires separate build
673
+ 3. **Node.js Daemon** - Manages Playwright browser instance
674
+
675
+ The default Node.js CLI communicates with the daemon directly, so no Rust dependencies are needed and full functionality is maintained.
676
+
677
+ The daemon starts automatically on first command and persists between commands for fast subsequent operations.
678
+
679
+ **Browser Engine:** Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
680
+
681
+ ## Platforms
682
+
683
+ | Platform | Binary | Fallback |
684
+ |----------|--------|----------|
685
+ | macOS ARM64 | Native Rust | Node.js |
686
+ | macOS x64 | Native Rust | Node.js |
687
+ | Linux ARM64 | Native Rust | Node.js |
688
+ | Linux x64 | Native Rust | Node.js |
689
+ | Windows x64 | Native Rust | Node.js |
690
+
691
+ ## Usage with AI Agents
692
+
693
+ ### Just ask the agent
694
+
695
+ The simplest approach - just tell your agent to use it:
696
+
697
+ ```
698
+ Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
699
+ ```
700
+
701
+ The `--help` output is comprehensive and most agents can figure it out from there.
702
+
703
+ ### AI Coding Assistants
704
+
705
+ Add the skill to your AI coding assistant for richer context:
706
+
707
+ ```bash
708
+ npx skills add xuyingzhou/agent-browser
709
+ ```
710
+
711
+ This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.
712
+
713
+ ### AGENTS.md / CLAUDE.md
714
+
715
+ For more consistent results, add to your project or global instructions file:
716
+
717
+ ```markdown
718
+ ## Browser Automation
719
+
720
+ Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
721
+
722
+ Core workflow:
723
+ 1. `agent-browser open <url>` - Navigate to page
724
+ 2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
725
+ 3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
726
+ 4. Re-snapshot after page changes
727
+ ```
728
+
729
+ ## Integrations
730
+
731
+ ### iOS Simulator
732
+
733
+ Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
734
+
735
+ **Setup:**
736
+
737
+ ```bash
738
+ # Install Appium and XCUITest driver
739
+ npm install -g appium
740
+ appium driver install xcuitest
741
+ ```
742
+
743
+ **Usage:**
744
+
745
+ ```bash
746
+ # List available iOS simulators
747
+ agent-browser device list
748
+
749
+ # Launch Safari on a specific device
750
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
751
+
752
+ # Same commands as desktop
753
+ agent-browser -p ios snapshot -i
754
+ agent-browser -p ios tap @e1
755
+ agent-browser -p ios fill @e2 "text"
756
+ agent-browser -p ios screenshot mobile.png
757
+
758
+ # Mobile-specific commands
759
+ agent-browser -p ios swipe up
760
+ agent-browser -p ios swipe down 500
761
+
762
+ # Close session
763
+ agent-browser -p ios close
764
+ ```
765
+
766
+ Or use environment variables:
767
+
768
+ ```bash
769
+ export AGENT_BROWSER_PROVIDER=ios
770
+ export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
771
+ agent-browser open https://example.com
772
+ ```
773
+
774
+ | Variable | Description |
775
+ |----------|-------------|
776
+ | `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
777
+ | `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
778
+ | `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
779
+
780
+ **Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
781
+
782
+ **Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
783
+
784
+ #### Real Device Support
785
+
786
+ Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
787
+
788
+ **1. Get your device UDID:**
789
+ ```bash
790
+ xcrun xctrace list devices
791
+ # or
792
+ system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
793
+ ```
794
+
795
+ **2. Sign WebDriverAgent (one-time):**
796
+ ```bash
797
+ # Open the WebDriverAgent Xcode project
798
+ cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
799
+ open WebDriverAgent.xcodeproj
800
+ ```
801
+
802
+ In Xcode:
803
+ - Select the `WebDriverAgentRunner` target
804
+ - Go to Signing & Capabilities
805
+ - Select your Team (requires Apple Developer account, free tier works)
806
+ - Let Xcode manage signing automatically
807
+
808
+ **3. Use with agent-browser:**
809
+ ```bash
810
+ # Connect device via USB, then:
811
+ agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
812
+
813
+ # Or use the device name if unique
814
+ agent-browser -p ios --device "John's iPhone" open https://example.com
815
+ ```
816
+
817
+ **Real device notes:**
818
+ - First run installs WebDriverAgent to the device (may require Trust prompt)
819
+ - Device must be unlocked and connected via USB
820
+ - Slightly slower initial connection than simulator
821
+ - Tests against real Safari performance and behavior
822
+
823
+ ### Browserbase
824
+
825
+ [Browserbase](https://browserbase.com) provides remote browser infrastructure that simplifies deploying browsing agents. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
826
+
827
+ To enable Browserbase, use the `-p` flag:
828
+
829
+ ```bash
830
+ export BROWSERBASE_API_KEY="your-api-key"
831
+ export BROWSERBASE_PROJECT_ID="your-project-id"
832
+ agent-browser -p browserbase open https://example.com
833
+ ```
834
+
835
+ Or use environment variables for CI/scripts:
836
+
837
+ ```bash
838
+ export AGENT_BROWSER_PROVIDER=browserbase
839
+ export BROWSERBASE_API_KEY="your-api-key"
840
+ export BROWSERBASE_PROJECT_ID="your-project-id"
841
+ agent-browser open https://example.com
842
+ ```
843
+
844
+ When enabled, agent-browser connects to a Browserbase session instead of launching a local browser. All commands work identically.
845
+
846
+ Get your API key and project ID from the [Browserbase Dashboard](https://browserbase.com/overview).
847
+
848
+ ### Browser Use
849
+
850
+ [Browser Use](https://browser-use.com) provides cloud browser infrastructure for AI agents. Use it when running agent-browser in environments where a local browser isn't available (serverless, CI/CD, etc.).
851
+
852
+ To enable Browser Use, use the `-p` flag:
853
+
854
+ ```bash
855
+ export BROWSER_USE_API_KEY="your-api-key"
856
+ agent-browser -p browseruse open https://example.com
857
+ ```
858
+
859
+ Or use environment variables for CI/scripts:
860
+
861
+ ```bash
862
+ export AGENT_BROWSER_PROVIDER=browseruse
863
+ export BROWSER_USE_API_KEY="your-api-key"
864
+ agent-browser open https://example.com
865
+ ```
866
+
867
+ When enabled, agent-browser connects to a Browser Use cloud session instead of launching a local browser. All commands work identically.
868
+
869
+ Get your API key from the [Browser Use Cloud Dashboard](https://cloud.browser-use.com/settings?tab=api-keys). Free credits are available to get started, with pay-as-you-go pricing afterward.
870
+
871
+ ### Kernel
872
+
873
+ [Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
874
+
875
+ To enable Kernel, use the `-p` flag:
876
+
877
+ ```bash
878
+ export KERNEL_API_KEY="your-api-key"
879
+ agent-browser -p kernel open https://example.com
880
+ ```
881
+
882
+ Or use environment variables for CI/scripts:
883
+
884
+ ```bash
885
+ export AGENT_BROWSER_PROVIDER=kernel
886
+ export KERNEL_API_KEY="your-api-key"
887
+ agent-browser open https://example.com
888
+ ```
889
+
890
+ Optional configuration via environment variables:
891
+
892
+ | Variable | Description | Default |
893
+ |----------|-------------|---------|
894
+ | `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
895
+ | `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
896
+ | `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
897
+ | `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
898
+
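The defaults in the table can be read as a small resolution step over the environment. The sketch below only illustrates that reading; `readKernelOptions` and `KernelOptions` are hypothetical names, and how agent-browser itself parses these variables (for example, how it treats invalid values) is an assumption.

```typescript
// Hypothetical illustration of the table above; not the package's own parser.
interface KernelOptions {
  headless: boolean;
  stealth: boolean;
  timeoutSeconds: number;
  profileName: string | undefined;
}

function readKernelOptions(env: Record<string, string | undefined>): KernelOptions {
  // Unset variables fall back to the documented default; otherwise only "true" enables.
  const bool = (value: string | undefined, fallback: boolean): boolean =>
    value === undefined ? fallback : value === "true";
  const timeout = Number(env.KERNEL_TIMEOUT_SECONDS ?? "300");
  return {
    headless: bool(env.KERNEL_HEADLESS, false),
    stealth: bool(env.KERNEL_STEALTH, true),
    // Assumption: fall back to 300 when the value is not a positive number.
    timeoutSeconds: Number.isFinite(timeout) && timeout > 0 ? timeout : 300,
    profileName: env.KERNEL_PROFILE_NAME || undefined,
  };
}
```

Calling `readKernelOptions(process.env)` with none of the variables set would yield a headed, stealth-enabled session with a 300-second timeout and no profile.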
899
+ When enabled, agent-browser connects to a Kernel cloud session instead of launching a local browser. All commands work identically.
900
+
901
+ **Profile Persistence:** When `KERNEL_PROFILE_NAME` is set, the profile will be created if it doesn't already exist. Cookies, logins, and session data are automatically saved back to the profile when the browser session ends, making them available for future sessions.
902
+
903
+ Get your API key from the [Kernel Dashboard](https://dashboard.onkernel.com).
904
+
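Each cloud provider above needs its own credentials in the environment. In CI it can help to check for them before invoking the CLI. The following is a sketch under that assumption: `REQUIRED_ENV` and `missingEnv` are hypothetical names, and the variable lists are taken from the examples in this section only.

```typescript
// Required variables per provider, as listed in the sections above.
const REQUIRED_ENV: Record<string, string[]> = {
  browserbase: ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
  browseruse: ["BROWSER_USE_API_KEY"],
  kernel: ["KERNEL_API_KEY"],
};

// Return the names of required variables that are unset or empty.
// Providers not listed here (an assumption: they need no credentials) return [].
function missingEnv(provider: string, env: Record<string, string | undefined>): string[] {
  const required = REQUIRED_ENV[provider] ?? [];
  return required.filter((name) => !env[name]);
}
```

A script could call `missingEnv("browserbase", process.env)` and exit with a clear message when the result is non-empty, instead of failing later when the session is created.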
905
+ ## License
906
+
907
+ Apache-2.0