@kritchoff/agent-browser 0.9.52 → 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +82 -849
- package/bin/agent-browser.js +2 -1
- package/package.json +3 -14
- package/README.sdk.md +0 -129
- package/bin/agent-browser-linux-x64 +0 -0
- package/scripts/copy-native.js +0 -36
- package/scripts/fast_reset.sh +0 -117
- package/scripts/postinstall.js +0 -235
- package/scripts/snapshot_manager.sh +0 -293
- package/scripts/vaccine-run +0 -26
- package/sdk.sh +0 -176
- package/start.sh +0 -109
package/README.md
@@ -1,903 +1,136 @@
-# agent-browser
+# @kritchoff/agent-browser
 
-
+<p align="center">
+<strong>The Ultimate Headless Android Browser SDK for AI Agents</strong>
+</p>
 
-
+This SDK provides a **Real Android Browser** (WootzApp) wrapped in a Docker container, controlled by a high-speed, Playwright-like TypeScript daemon.
 
-
+It is specifically designed for AI Agents to navigate the mobile web, bypass bot detection, and generate LLM-friendly semantic trees (`AXTree`).
 
-
-npm install -g agent-browser
-agent-browser install # Download Chromium
-```
-
-### Homebrew (macOS)
-
-```bash
-brew install agent-browser
-agent-browser install # Download Chromium
-```
-
-### From Source
-
-```bash
-git clone https://github.com/vercel-labs/agent-browser
-cd agent-browser
-pnpm install
-pnpm build
-pnpm build:native # Requires Rust (https://rustup.rs)
-pnpm link --global # Makes agent-browser available globally
-agent-browser install
-```
-
-### Linux Dependencies
-
-On Linux, install system dependencies:
-
-```bash
-agent-browser install --with-deps
-# or manually: npx playwright install-deps chromium
-```
-
-## Quick Start
-
-```bash
-agent-browser open example.com
-agent-browser snapshot # Get accessibility tree with refs
-agent-browser click @e2 # Click by ref from snapshot
-agent-browser fill @e3 "test@example.com" # Fill by ref
-agent-browser get text @e1 # Get text by ref
-agent-browser screenshot page.png
-agent-browser close
-```
-
-### Traditional Selectors (also supported)
-
-```bash
-agent-browser click "#submit"
-agent-browser fill "#email" "test@example.com"
-agent-browser find role button click --name "Submit"
-```
-
-## Commands
-
-### Core Commands
-
-```bash
-agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
-agent-browser click <sel> # Click element
-agent-browser dblclick <sel> # Double-click element
-agent-browser focus <sel> # Focus element
-agent-browser type <sel> <text> # Type into element
-agent-browser fill <sel> <text> # Clear and fill
-agent-browser press <key> # Press key (Enter, Tab, Control+a) (alias: key)
-agent-browser keydown <key> # Hold key down
-agent-browser keyup <key> # Release key
-agent-browser hover <sel> # Hover element
-agent-browser select <sel> <val> # Select dropdown option
-agent-browser check <sel> # Check checkbox
-agent-browser uncheck <sel> # Uncheck checkbox
-agent-browser scroll <dir> [px] # Scroll (up/down/left/right)
-agent-browser scrollintoview <sel> # Scroll element into view (alias: scrollinto)
-agent-browser drag <src> <tgt> # Drag and drop
-agent-browser upload <sel> <files> # Upload files
-agent-browser screenshot [path] # Take screenshot (--full for full page, saves to a temporary directory if no path)
-agent-browser pdf <path> # Save as PDF
-agent-browser snapshot # Accessibility tree with refs (best for AI)
-agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
-agent-browser connect <port> # Connect to browser via CDP
-agent-browser close # Close browser (aliases: quit, exit)
-```
-
-### Get Info
-
-```bash
-agent-browser get text <sel> # Get text content
-agent-browser get html <sel> # Get innerHTML
-agent-browser get value <sel> # Get input value
-agent-browser get attr <sel> <attr> # Get attribute
-agent-browser get title # Get page title
-agent-browser get url # Get current URL
-agent-browser get count <sel> # Count matching elements
-agent-browser get box <sel> # Get bounding box
-```
-
-### Check State
-
-```bash
-agent-browser is visible <sel> # Check if visible
-agent-browser is enabled <sel> # Check if enabled
-agent-browser is checked <sel> # Check if checked
-```
-
-### Find Elements (Semantic Locators)
-
-```bash
-agent-browser find role <role> <action> [value] # By ARIA role
-agent-browser find text <text> <action> # By text content
-agent-browser find label <label> <action> [value] # By label
-agent-browser find placeholder <ph> <action> [value] # By placeholder
-agent-browser find alt <text> <action> # By alt text
-agent-browser find title <text> <action> # By title attr
-agent-browser find testid <id> <action> [value] # By data-testid
-agent-browser find first <sel> <action> [value] # First match
-agent-browser find last <sel> <action> [value] # Last match
-agent-browser find nth <n> <sel> <action> [value] # Nth match
-```
-
-**Actions:** `click`, `fill`, `check`, `hover`, `text`
-
-**Examples:**
-```bash
-agent-browser find role button click --name "Submit"
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "test@test.com"
-agent-browser find first ".item" click
-agent-browser find nth 2 "a" text
-```
-
-### Wait
-
-```bash
-agent-browser wait <selector> # Wait for element to be visible
-agent-browser wait <ms> # Wait for time (milliseconds)
-agent-browser wait --text "Welcome" # Wait for text to appear
-agent-browser wait --url "**/dash" # Wait for URL pattern
-agent-browser wait --load networkidle # Wait for load state
-agent-browser wait --fn "window.ready === true" # Wait for JS condition
-```
-
-**Load states:** `load`, `domcontentloaded`, `networkidle`
-
-### Mouse Control
-
-```bash
-agent-browser mouse move <x> <y> # Move mouse
-agent-browser mouse down [button] # Press button (left/right/middle)
-agent-browser mouse up [button] # Release button
-agent-browser mouse wheel <dy> [dx] # Scroll wheel
-```
-
-### Browser Settings
-
-```bash
-agent-browser set viewport <w> <h> # Set viewport size
-agent-browser set device <name> # Emulate device ("iPhone 14")
-agent-browser set geo <lat> <lng> # Set geolocation
-agent-browser set offline [on|off] # Toggle offline mode
-agent-browser set headers <json> # Extra HTTP headers
-agent-browser set credentials <u> <p> # HTTP basic auth
-agent-browser set media [dark|light] # Emulate color scheme
-```
-
-### Cookies & Storage
-
-```bash
-agent-browser cookies # Get all cookies
-agent-browser cookies set <name> <val> # Set cookie
-agent-browser cookies clear # Clear cookies
-
-agent-browser storage local # Get all localStorage
-agent-browser storage local <key> # Get specific key
-agent-browser storage local set <k> <v> # Set value
-agent-browser storage local clear # Clear all
-
-agent-browser storage session # Same for sessionStorage
-```
-
-### Network
-
-```bash
-agent-browser network route <url> # Intercept requests
-agent-browser network route <url> --abort # Block requests
-agent-browser network route <url> --body <json> # Mock response
-agent-browser network unroute [url] # Remove routes
-agent-browser network requests # View tracked requests
-agent-browser network requests --filter api # Filter requests
-```
-
-### Tabs & Windows
-
-```bash
-agent-browser tab # List tabs
-agent-browser tab new [url] # New tab (optionally with URL)
-agent-browser tab <n> # Switch to tab n
-agent-browser tab close [n] # Close tab
-agent-browser window new # New window
-```
-
-### Frames
-
-```bash
-agent-browser frame <sel> # Switch to iframe
-agent-browser frame main # Back to main frame
-```
-
-### Dialogs
-
-```bash
-agent-browser dialog accept [text] # Accept (with optional prompt text)
-agent-browser dialog dismiss # Dismiss
-```
-
-### Debug
-
-```bash
-agent-browser trace start [path] # Start recording trace
-agent-browser trace stop [path] # Stop and save trace
-agent-browser console # View console messages (log, error, warn, info)
-agent-browser console --clear # Clear console
-agent-browser errors # View page errors (uncaught JavaScript exceptions)
-agent-browser errors --clear # Clear errors
-agent-browser highlight <sel> # Highlight element
-agent-browser state save <path> # Save auth state
-agent-browser state load <path> # Load auth state
-```
-
-### Navigation
-
-```bash
-agent-browser back # Go back
-agent-browser forward # Go forward
-agent-browser reload # Reload page
-```
-
-### Setup
-
-```bash
-agent-browser install # Download Chromium browser
-agent-browser install --with-deps # Also install system deps (Linux)
-```
-
-## Sessions
-
-Run multiple isolated browser instances:
-
-```bash
-# Different sessions
-agent-browser --session agent1 open site-a.com
-agent-browser --session agent2 open site-b.com
-
-# Or via environment variable
-AGENT_BROWSER_SESSION=agent1 agent-browser click "#btn"
-
-# List active sessions
-agent-browser session list
-# Output:
-# Active sessions:
-# -> default
-# agent1
-
-# Show current session
-agent-browser session
-```
-
-Each session has its own:
-- Browser instance
-- Cookies and storage
-- Navigation history
-- Authentication state
-
-## Persistent Profiles
-
-By default, browser state (cookies, localStorage, login sessions) is ephemeral and lost when the browser closes. Use `--profile` to persist state across browser restarts:
-
-```bash
-# Use a persistent profile directory
-agent-browser --profile ~/.myapp-profile open myapp.com
-
-# Login once, then reuse the authenticated session
-agent-browser --profile ~/.myapp-profile open myapp.com/dashboard
-
-# Or via environment variable
-AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
-```
-
-The profile directory stores:
-- Cookies and localStorage
-- IndexedDB data
-- Service workers
-- Browser cache
-- Login sessions
-
-**Tip**: Use different profile paths for different projects to keep their browser state isolated.
-
-## Snapshot Options
-
-The `snapshot` command supports filtering to reduce output size:
-
-```bash
-agent-browser snapshot # Full accessibility tree
-agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
-agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
-agent-browser snapshot -c # Compact (remove empty structural elements)
-agent-browser snapshot -d 3 # Limit depth to 3 levels
-agent-browser snapshot -s "#main" # Scope to CSS selector
-agent-browser snapshot -i -c -d 5 # Combine options
-```
-
-| Option | Description |
-|--------|-------------|
-| `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
-| `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
-| `-c, --compact` | Remove empty structural elements |
-| `-d, --depth <n>` | Limit tree depth |
-| `-s, --selector <sel>` | Scope to CSS selector |
-
-The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
-
-## Options
-
-| Option | Description |
-|--------|-------------|
-| `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
-| `--profile <path>` | Persistent browser profile directory (or `AGENT_BROWSER_PROFILE` env) |
-| `--headers <json>` | Set HTTP headers scoped to the URL's origin |
-| `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
-| `--args <args>` | Browser launch args, comma or newline separated (or `AGENT_BROWSER_ARGS` env) |
-| `--user-agent <ua>` | Custom User-Agent string (or `AGENT_BROWSER_USER_AGENT` env) |
-| `--proxy <url>` | Proxy server URL with optional auth (or `AGENT_BROWSER_PROXY` env) |
-| `--proxy-bypass <hosts>` | Hosts to bypass proxy (or `AGENT_BROWSER_PROXY_BYPASS` env) |
-| `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
-| `--json` | JSON output (for agents) |
-| `--full, -f` | Full page screenshot |
-| `--name, -n` | Locator name filter |
-| `--exact` | Exact text match |
-| `--headed` | Show browser window (not headless) |
-| `--cdp <port>` | Connect via Chrome DevTools Protocol |
-| `--ignore-https-errors` | Ignore HTTPS certificate errors (useful for self-signed certs) |
-| `--allow-file-access` | Allow file:// URLs to access local files (Chromium only) |
-| `--debug` | Debug output |
-
-## Selectors
-
-### Refs (Recommended for AI)
-
-Refs provide deterministic element selection from snapshots:
-
-```bash
-# 1. Get snapshot with refs
-agent-browser snapshot
-# Output:
-# - heading "Example Domain" [ref=e1] [level=1]
-# - button "Submit" [ref=e2]
-# - textbox "Email" [ref=e3]
-# - link "Learn more" [ref=e4]
-
-# 2. Use refs to interact
-agent-browser click @e2 # Click the button
-agent-browser fill @e3 "test@example.com" # Fill the textbox
-agent-browser get text @e1 # Get heading text
-agent-browser hover @e4 # Hover the link
-```
-
-**Why use refs?**
-- **Deterministic**: Ref points to exact element from snapshot
-- **Fast**: No DOM re-query needed
-- **AI-friendly**: Snapshot + ref workflow is optimal for LLMs
-
-### CSS Selectors
-
-```bash
-agent-browser click "#id"
-agent-browser click ".class"
-agent-browser click "div > button"
-```
-
-### Text & XPath
-
-```bash
-agent-browser click "text=Submit"
-agent-browser click "xpath=//button"
-```
-
-### Semantic Locators
-
-```bash
-agent-browser find role button click --name "Submit"
-agent-browser find label "Email" fill "test@test.com"
-```
-
-## Agent Mode
-
-Use `--json` for machine-readable output:
-
-```bash
-agent-browser snapshot --json
-# Returns: {"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"},...}}}
-
-agent-browser get text @e1 --json
-agent-browser is visible @e2 --json
-```
-
-### Optimal AI Workflow
+---
 
-
-# 1. Navigate and get snapshot
-agent-browser open example.com
-agent-browser snapshot -i --json # AI parses tree and refs
-
-# 2. AI identifies target refs from snapshot
-# 3. Execute actions using refs
-agent-browser click @e2
-agent-browser fill @e3 "input text"
+## 🌟 Key Features
 
-
-
-
+- **Cross-Platform**: Works natively on **Windows, macOS (Intel & Apple Silicon), and Linux** using a pure Node.js Orchestrator (No WSL or bash required).
+- **Zero-Config Setup**: Automatically downloads and orchestrates the required Docker containers.
+- **Hyper-Speed Warm Boots**: Uses advanced VDI Volume Mounting to boot the Android environment in **< 5 seconds** after the first run.
+- **Fast Resets**: Cleans the browser state via Android userspace reboot in **~15 seconds**.
+- **Playwright Parity**: Control the mobile browser using standard Playwright commands (`click`, `type`, `waitForSelector`).
+- **Semantic AXTree**: Built-in `snapshot()` method generates a clean, text-based UI tree optimized for LLM reasoning.
 
-
+---
 
-
+## 🛠️ Prerequisites
 
-
-
-
+1. **Docker Engine**: Must be installed and running.
+- *Windows/Mac Users*: Install [Docker Desktop](https://www.docker.com/products/docker-desktop/).
+- *Linux Users*: Ensure your user is in the `docker` group (`sudo usermod -aG docker $USER`).
+2. **Node.js**: v18+ is required.
 
-
+---
 
-##
+## 📦 Installation
 
-
+Install the SDK in your project:
 
 ```bash
-
-agent-browser open api.example.com --headers '{"Authorization": "Bearer <token>"}'
-
-# Requests to api.example.com include the auth header
-agent-browser snapshot -i --json
-agent-browser click @e2
-
-# Navigate to another domain - headers are NOT sent (safe!)
-agent-browser open other-site.com
+npm install @kritchoff/agent-browser
 ```
 
-
-- **Skipping login flows** - Authenticate via headers instead of UI
-- **Switching users** - Start new sessions with different auth tokens
-- **API testing** - Access protected endpoints directly
-- **Security** - Headers are scoped to the origin, not leaked to other domains
-
-To set headers for multiple origins, use `--headers` with each `open` command:
-
+*(Optional but recommended)* Install `tsx` to run TypeScript files natively without pre-compiling:
 ```bash
-
-agent-browser open api.acme.com --headers '{"Authorization": "Bearer token2"}'
+npm install -D tsx
 ```
 
-
-
-```bash
-agent-browser set headers '{"X-Custom-Header": "value"}'
-```
-
-## Custom Browser Executable
-
-Use a custom browser executable instead of the bundled Chromium. This is useful for:
-- **Serverless deployment**: Use lightweight Chromium builds like `@sparticuz/chromium` (~50MB vs ~684MB)
-- **System browsers**: Use an existing Chrome/Chromium installation
-- **Custom builds**: Use modified browser builds
-
-### CLI Usage
-
-```bash
-# Via flag
-agent-browser --executable-path /path/to/chromium open example.com
+---
 
-
-AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open example.com
-```
+## 🚀 Quick Start Guide
 
-
+Create a file named `agent.ts`:
 
 ```typescript
-import
-import { BrowserManager } from 'agent-browser';
-
-export async function handler() {
-const browser = new BrowserManager();
-await browser.launch({
-executablePath: await chromium.executablePath(),
-headless: true,
-});
-// ... use browser
-}
-```
-
-## Local Files
-
-Open and interact with local files (PDFs, HTML, etc.) using `file://` URLs:
-
-```bash
-# Enable file access (required for JavaScript to access local files)
-agent-browser --allow-file-access open file:///path/to/document.pdf
-agent-browser --allow-file-access open file:///path/to/page.html
-
-# Take screenshot of a local PDF
-agent-browser --allow-file-access open file:///Users/me/report.pdf
-agent-browser screenshot report.png
-```
-
-The `--allow-file-access` flag adds Chromium flags (`--allow-file-access-from-files`, `--allow-file-access`) that allow `file://` URLs to:
-- Load and render local files
-- Access other local files via JavaScript (XHR, fetch)
-- Load local resources (images, scripts, stylesheets)
-
-**Note:** This flag only works with Chromium. For security, it's disabled by default.
-
-## CDP Mode
-
-Connect to an existing browser via Chrome DevTools Protocol:
-
-```bash
-# Start Chrome with: google-chrome --remote-debugging-port=9222
-
-# Connect once, then run commands without --cdp
-agent-browser connect 9222
-agent-browser snapshot
-agent-browser tab
-agent-browser close
+import { WootzAgent } from '@kritchoff/agent-browser';
 
-
-
+async function main() {
+// 1. Initialize the controller
+const agent = new WootzAgent();
 
-
-
-
-
-The `--cdp` flag accepts either:
-- A port number (e.g., `9222`) for local connections via `http://localhost:{port}`
-- A full WebSocket URL (e.g., `wss://...` or `ws://...`) for remote browser services
-
-This enables control of:
-- Electron apps
-- Chrome/Chromium instances with remote debugging
-- WebView2 applications
-- Any browser exposing a CDP endpoint
+console.log('🚀 Booting Environment...');
+// First run: Downloads 3GB image and cold boots (~90s).
+// Next run: Instant Hyper-Speed Warm Boot (~5s).
+await agent.start();
 
-
+console.log('🌐 Navigating to Google...');
+await agent.navigate('https://google.com');
 
-
+console.log('📸 Capturing Semantic Tree for LLM...');
+const uiTree = await agent.snapshot();
+console.log(uiTree);
 
-
+console.log('⌨️ Typing and Searching...');
+await agent.type('textarea[name="q"]', 'WootzApp AI');
+await agent.press('Enter');
 
-
-
-
-AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
-```
-
-This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
-
-### WebSocket Protocol
-
-Connect to `ws://localhost:9223` to receive frames and send input:
-
-**Receive frames:**
-```json
-{
-"type": "frame",
-"data": "<base64-encoded-jpeg>",
-"metadata": {
-"deviceWidth": 1280,
-"deviceHeight": 720,
-"pageScaleFactor": 1,
-"offsetTop": 0,
-"scrollOffsetX": 0,
-"scrollOffsetY": 0
-}
-}
-```
+console.log('🧹 Fast Reset for next task...');
+// Wipes all tabs, cookies, and cache in ~15s
+await agent.reset();
 
-
-
-
-"type": "input_mouse",
-"eventType": "mousePressed",
-"x": 100,
-"y": 200,
-"button": "left",
-"clickCount": 1
+console.log('🛑 Shutting down...');
+// Completely destroys containers and releases ports
+await agent.stop();
 }
-```
-
-**Send keyboard events:**
-```json
-{
-"type": "input_keyboard",
-"eventType": "keyDown",
-"key": "Enter",
-"code": "Enter"
-}
-```
-
-**Send touch events:**
-```json
-{
-"type": "input_touch",
-"eventType": "touchStart",
-"touchPoints": [{ "x": 100, "y": 200 }]
-}
-```
-
-### Programmatic API
-
-For advanced use, control streaming directly via the protocol:
-
-```typescript
-import { BrowserManager } from 'agent-browser';
-
-const browser = new BrowserManager();
-await browser.launch({ headless: true });
-await browser.navigate('https://example.com');
-
-// Start screencast
-await browser.startScreencast((frame) => {
-// frame.data is base64-encoded image
-// frame.metadata contains viewport info
-console.log('Frame received:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
-}, {
-format: 'jpeg',
-quality: 80,
-maxWidth: 1280,
-maxHeight: 720,
-});
-
-// Inject mouse events
-await browser.injectMouseEvent({
-type: 'mousePressed',
-x: 100,
-y: 200,
-button: 'left',
-});
-
-// Inject keyboard events
-await browser.injectKeyboardEvent({
-type: 'keyDown',
-key: 'Enter',
-code: 'Enter',
-});
-
-// Stop when done
-await browser.stopScreencast();
-```
-
665
|
-
## Architecture
|
|
666
|
-
|
|
667
|
-
agent-browser uses a client-daemon architecture:
|
|
668
|
-
|
|
669
|
-
1. **Rust CLI** (fast native binary) - Parses commands, communicates with daemon
|
|
670
|
-
2. **Node.js Daemon** - Manages Playwright browser instance
|
|
671
|
-
3. **Fallback** - If native binary unavailable, uses Node.js directly
|
|
672
|
-
|
|
673
|
-
The daemon starts automatically on first command and persists between commands for fast subsequent operations.
|
|
674
|
-
|
|
675
|
-
**Browser Engine:** Uses Chromium by default. The daemon also supports Firefox and WebKit via the Playwright protocol.
|
|
676
|
-
|
|
677
|
-
## Platforms
|
|
678
|
-
|
|
679
|
-
| Platform | Binary | Fallback |
|
|
680
|
-
|----------|--------|----------|
|
|
681
|
-
| macOS ARM64 | Native Rust | Node.js |
|
|
682
|
-
| macOS x64 | Native Rust | Node.js |
|
|
683
|
-
| Linux ARM64 | Native Rust | Node.js |
|
|
684
|
-
| Linux x64 | Native Rust | Node.js |
|
|
685
|
-
| Windows x64 | Native Rust | Node.js |
|
|
686
|
-
|
|
687
|
-
## Usage with AI Agents
|
|
688
|
-
|
|
689
|
-
### Just ask the agent
|
|
690
|
-
|
|
691
|
-
The simplest approach - just tell your agent to use it:
|
|
692
|
-
|
|
693
|
-
```
|
|
694
|
-
Use agent-browser to test the login flow. Run agent-browser --help to see available commands.
|
|
695
|
-
```
|
|
696
83
|
|
|
697
|
-
|
|
698
|
-
|
|
699
|
-
### AI Coding Assistants
|
|
700
|
-
|
|
701
|
-
Add the skill to your AI coding assistant for richer context:
|
|
702
|
-
|
|
703
|
-
```bash
|
|
704
|
-
npx skills add vercel-labs/agent-browser
|
|
705
|
-
```
|
|
706
|
-
|
|
707
|
-
This works with Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot, Goose, OpenCode, and Windsurf.
|
|
708
|
-
|
|
709
|
-
### AGENTS.md / CLAUDE.md
|
|
710
|
-
|
|
711
|
-
For more consistent results, add to your project or global instructions file:
|
|
712
|
-
|
|
713
|
-
```markdown
|
|
714
|
-
## Browser Automation
|
|
715
|
-
|
|
716
|
-
Use `agent-browser` for web automation. Run `agent-browser --help` for all commands.
|
|
717
|
-
|
|
718
|
-
Core workflow:
|
|
719
|
-
1. `agent-browser open <url>` - Navigate to page
|
|
720
|
-
2. `agent-browser snapshot -i` - Get interactive elements with refs (@e1, @e2)
|
|
721
|
-
3. `agent-browser click @e1` / `fill @e2 "text"` - Interact using refs
|
|
722
|
-
4. Re-snapshot after page changes
|
|
723
|
-
```
````diff
-
-## Integrations
-
-### iOS Simulator
-
-Control real Mobile Safari in the iOS Simulator for authentic mobile web testing. Requires macOS with Xcode.
-
-**Setup:**
-
-```bash
-# Install Appium and XCUITest driver
-npm install -g appium
-appium driver install xcuitest
-```
-
-**Usage:**
-
-```bash
-# List available iOS simulators
-agent-browser device list
-
-# Launch Safari on a specific device
-agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
-
-# Same commands as desktop
-agent-browser -p ios snapshot -i
-agent-browser -p ios tap @e1
-agent-browser -p ios fill @e2 "text"
-agent-browser -p ios screenshot mobile.png
-
-# Mobile-specific commands
-agent-browser -p ios swipe up
-agent-browser -p ios swipe down 500
-
-# Close session
-agent-browser -p ios close
-```
-
-Or use environment variables:
-
-```bash
-export AGENT_BROWSER_PROVIDER=ios
-export AGENT_BROWSER_IOS_DEVICE="iPhone 16 Pro"
-agent-browser open https://example.com
-```
-
-| Variable | Description |
-|----------|-------------|
-| `AGENT_BROWSER_PROVIDER` | Set to `ios` to enable iOS mode |
-| `AGENT_BROWSER_IOS_DEVICE` | Device name (e.g., "iPhone 16 Pro", "iPad Pro") |
-| `AGENT_BROWSER_IOS_UDID` | Device UDID (alternative to device name) |
-
-**Supported devices:** All iOS Simulators available in Xcode (iPhones, iPads), plus real iOS devices.
-
-**Note:** The iOS provider boots the simulator, starts Appium, and controls Safari. First launch takes ~30-60 seconds; subsequent commands are fast.
````
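The removed iOS section offers two equivalent ways to pick a device: the `-p ios --device` flags, or the `AGENT_BROWSER_*` environment variables from its table. The sketch below builds either form. `iosEnv`, `iosArgs` and `IosTarget` are illustrative helpers, not part of the package, and refusing to set both a name and a UDID is an assumption based on the table describing the UDID as an alternative to the device name.

```typescript
// Environment passed to a child process.
type Env = Record<string, string | undefined>;

// Hypothetical target description: a simulator/device name or a UDID.
interface IosTarget {
  device?: string; // e.g. "iPhone 16 Pro"
  udid?: string; // alternative to the device name
}

// Env-var form: AGENT_BROWSER_PROVIDER=ios plus either the device name or the UDID.
function iosEnv(target: IosTarget, base: Env = {}): Env {
  if (!target.device && !target.udid) {
    throw new Error("set either device or udid");
  }
  if (target.device && target.udid) {
    // Assumption: treat the two as mutually exclusive.
    throw new Error("device and udid are alternatives; set only one");
  }
  const env: Env = { ...base, AGENT_BROWSER_PROVIDER: "ios" };
  if (target.device) env.AGENT_BROWSER_IOS_DEVICE = target.device;
  if (target.udid) env.AGENT_BROWSER_IOS_UDID = target.udid;
  return env;
}

// Flag form: -p ios --device "<name or UDID>" followed by the subcommand.
function iosArgs(device: string, command: string[]): string[] {
  return ["-p", "ios", "--device", device, ...command];
}
```

The returned object can be passed as the `env` option of `spawn` or `execFileSync` from `node:child_process`, for example `iosEnv({ device: "iPhone 16 Pro" }, process.env)`.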
````diff
-
-#### Real Device Support
-
-Appium also supports real iOS devices connected via USB. This requires additional one-time setup:
-
-**1. Get your device UDID:**
-```bash
-xcrun xctrace list devices
-# or
-system_profiler SPUSBDataType | grep -A 5 "iPhone\|iPad"
-```
-
-**2. Sign WebDriverAgent (one-time):**
-```bash
-# Open the WebDriverAgent Xcode project
-cd ~/.appium/node_modules/appium-xcuitest-driver/node_modules/appium-webdriveragent
-open WebDriverAgent.xcodeproj
-```
-
-In Xcode:
-- Select the `WebDriverAgentRunner` target
-- Go to Signing & Capabilities
-- Select your Team (requires Apple Developer account, free tier works)
-- Let Xcode manage signing automatically
-
-**3. Use with agent-browser:**
-```bash
-# Connect device via USB, then:
-agent-browser -p ios --device "<DEVICE_UDID>" open https://example.com
-
-# Or use the device name if unique
-agent-browser -p ios --device "John's iPhone" open https://example.com
-```
-
-**Real device notes:**
-- First run installs WebDriverAgent to the device (may require Trust prompt)
-- Device must be unlocked and connected via USB
-- Slightly slower initial connection than simulator
-- Tests against real Safari performance and behavior
-
-### Browserbase
-
-[Browserbase](https://browserbase.com) provides remote browser infrastructure to make deployment of agentic browsing agents easy. Use it when running the agent-browser CLI in an environment where a local browser isn't feasible.
-
-To enable Browserbase, use the `-p` flag:
-
-```bash
-export BROWSERBASE_API_KEY="your-api-key"
-export BROWSERBASE_PROJECT_ID="your-project-id"
-agent-browser -p browserbase open https://example.com
+main().catch(console.error);
 ```
 
-
-
+Run your agent:
 ```bash
-
-export BROWSERBASE_API_KEY="your-api-key"
-export BROWSERBASE_PROJECT_ID="your-project-id"
-agent-browser open https://example.com
+npx tsx agent.ts
 ```
 
-
-
-Get your API key and project ID from the [Browserbase Dashboard](https://browserbase.com/overview).
````
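Each remote provider in the removed README is configured through environment variables (Browserbase here; the Browser Use and Kernel sections of the same README follow the same pattern). A pre-flight check can report missing variables before the CLI is launched with `-p`. This is a sketch: `REQUIRED_ENV` and `missingEnv` are hypothetical, and the map lists only the variables shown in the README, not necessarily everything each provider accepts.

```typescript
// Variables each provider needs, keyed by the -p value used in the README.
const REQUIRED_ENV: Record<string, string[]> = {
  browserbase: ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"],
  browseruse: ["BROWSER_USE_API_KEY"],
  kernel: ["KERNEL_API_KEY"],
};

// Return the names of required variables that are unset or empty.
function missingEnv(provider: string, env: Record<string, string | undefined>): string[] {
  const required = REQUIRED_ENV[provider];
  if (required === undefined) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return required.filter((name) => !env[name]);
}
```

Calling `missingEnv("browserbase", process.env)` before spawning `agent-browser -p browserbase ...` turns a remote authentication failure into an immediate, local error message.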
````diff
-
-### Browser Use
+---
 
-
+## 💻 CLI Usage (Global Install)
 
-
+You can use the SDK directly from your terminal to debug or control the browser manually without writing code. The CLI is automatically installed when you install the package globally.
 
 ```bash
-
-agent-browser -p browseruse open https://example.com
-```
-
-Or use environment variables for CI/scripts:
+npm install -g @kritchoff/agent-browser
 
-
-
-export BROWSER_USE_API_KEY="your-api-key"
-agent-browser open https://example.com
-```
+# 1. Start the environment (Takes ~90s first time, ~5s after)
+agent-browser start
 
-
-
-
-
-### Kernel
-
-[Kernel](https://www.kernel.sh) provides cloud browser infrastructure for AI agents with features like stealth mode and persistent profiles.
+# 2. Run commands interactively
+agent-browser navigate https://news.ycombinator.com
+agent-browser click ".titleline a"
+agent-browser snapshot
 
-
+# 3. Clean the browser for a new session
+agent-browser reset
 
-
-
-agent-browser -p kernel open https://example.com
+# 4. Stop and clean up containers
+agent-browser stop
 ```
 
-
-
-```bash
-export AGENT_BROWSER_PROVIDER=kernel
-export KERNEL_API_KEY="your-api-key"
-agent-browser open https://example.com
-```
+---
````
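The new CLI section walks through a `start`, commands, `reset`, `stop` lifecycle. When scripting it, a `try`/`finally` keeps the containers from being left running if a command fails midway. `withSession` and the injected `run` callback below are hypothetical; only the subcommand names (`start`, `navigate`, `snapshot`, `stop`) come from the README.

```typescript
// A runner takes CLI arguments and returns stdout as text (injectable for testing).
type Runner = (args: string[]) => string;

// Start the environment, run the caller's commands, and always stop afterwards.
function withSession<T>(run: Runner, body: (run: Runner) => T): T {
  run(["start"]);
  try {
    return body(run);
  } finally {
    // Runs on success and on failure, so containers are cleaned up either way.
    run(["stop"]);
  }
}
```

For example, `withSession(run, (r) => { r(["navigate", "https://news.ycombinator.com"]); r(["click", ".titleline a"]); return r(["snapshot"]); })` mirrors steps 1, 2 and 4; `r(["reset"])` can be issued inside `body` between tasks that should reuse the same containers.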
````diff
 
-
+## 📖 Complete API Reference
 
-
-|----------|-------------|---------|
-| `KERNEL_HEADLESS` | Run browser in headless mode (`true`/`false`) | `false` |
-| `KERNEL_STEALTH` | Enable stealth mode to avoid bot detection (`true`/`false`) | `true` |
-| `KERNEL_TIMEOUT_SECONDS` | Session timeout in seconds | `300` |
-| `KERNEL_PROFILE_NAME` | Browser profile name for persistent cookies/logins (created if it doesn't exist) | (none) |
+For a complete list of all available commands (clicking, typing, tabbing, network interception, storage), please read the **[COMMANDS.md](./COMMANDS.md)** file.
````
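The removed Kernel table lists four optional variables with defaults (`false`, `true`, `300`, none). The sketch below shows how those defaults combine into one config object. `readKernelConfig` and `KernelConfig` are hypothetical, and the fallback for malformed values (anything other than `true`/`false`, or a non-positive timeout) is an assumption, since the README does not say how agent-browser parses them.

```typescript
interface KernelConfig {
  headless: boolean;
  stealth: boolean;
  timeoutSeconds: number;
  profileName?: string;
}

// Accept only the literal strings "true"/"false"; otherwise use the default (assumption).
function parseBool(value: string | undefined, fallback: boolean): boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  return fallback;
}

// Apply the defaults from the Kernel table: headless false, stealth true, timeout 300 s.
function readKernelConfig(env: Record<string, string | undefined>): KernelConfig {
  const timeout = Number(env.KERNEL_TIMEOUT_SECONDS); // NaN when unset
  return {
    headless: parseBool(env.KERNEL_HEADLESS, false),
    stealth: parseBool(env.KERNEL_STEALTH, true),
    timeoutSeconds: Number.isFinite(timeout) && timeout > 0 ? timeout : 300,
    profileName: env.KERNEL_PROFILE_NAME || undefined, // no profile by default
  };
}
```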
````diff
 
-
+---
 
-
+## ❓ Troubleshooting
 
-
+### `Error: Timed out waiting for Agent Daemon on port 32001`
+- **Cause**: The Android container took too long to download or boot, or your machine is slow. (We wait 3 minutes by default).
+- **Fix**: Run `agent.stop()` (or `agent-browser stop`) and try `agent.start()` again. The Docker images might still be downloading in the background.
````
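The timeout entry says the SDK waits up to 3 minutes for the daemon on port 32001. To check whether the daemon ever comes up (for example while Docker images are still downloading), a TCP poll like the one below can help. It is a sketch, not the SDK's own readiness check: `waitForPort` and `probe` are hypothetical, and it assumes the port is published on `127.0.0.1`.

```typescript
import * as net from "node:net";

// One connection attempt: resolves true if something accepts on host:port.
function probe(port: number, host: string, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED while booting
    socket.setTimeout(timeoutMs, () => finish(false)); // guard against hung connects
  });
}

// Poll until the port accepts connections or the deadline passes.
// Defaults mirror the README: port 32001 and a 3-minute wait.
async function waitForPort(
  port = 32001,
  host = "127.0.0.1",
  timeoutMs = 3 * 60_000,
  intervalMs = 1_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await probe(port, host, intervalMs)) return;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Timed out waiting for port ${port} on ${host}`);
}
```

If the poll times out while `docker ps` still shows the container starting, the image download or emulator boot is the likely bottleneck rather than the daemon itself.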
````diff
 
-
+### `net::ERR_NAME_NOT_RESOLVED`
+- **Cause**: The Android Emulator temporarily lost its internet connection after a Warm Boot.
+- **Fix**: The SDK automatically toggles Airplane Mode to fix this DHCP issue. If it persists, ensure your host machine has a stable internet connection and your VPN/Firewall isn't blocking Docker bridge networks.
````
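The SDK is said to toggle Airplane Mode automatically on `net::ERR_NAME_NOT_RESOLVED`. If an agent still hits the error, retrying the failed action with a short, growing delay is a reasonable extra guard. `retryOnDnsError` is hypothetical and assumes the error surfaces with that string in its message; any other error is rethrown immediately.

```typescript
// Retry an async action (e.g. a navigation right after a warm boot) only when it
// fails with net::ERR_NAME_NOT_RESOLVED; other errors propagate at once.
async function retryOnDnsError<T>(
  action: () => Promise<T>,
  attempts = 3,
  delayMs = 2_000,
): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await action();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      const isDns = message.includes("net::ERR_NAME_NOT_RESOLVED");
      if (!isDns || i >= attempts) throw err;
      // Linear backoff gives the emulator's network time to come back.
      await new Promise((r) => setTimeout(r, delayMs * i));
    }
  }
}
```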
````diff
 
-
+### `Selector "..." matched X elements (Strict Mode Violation)`
+- **Cause**: Playwright requires selectors to point to exactly one element.
+- **Fix**: Use more specific selectors, or use Playwright's `>> nth=0` pseudo-selector to pick the first match (e.g., `agent.click('a >> nth=0')`).
````
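The strict-mode fix relies on Playwright's `>> nth=N` syntax, where `N` is a zero-based index. A tiny helper keeps that string handling in one place; `nth` is a hypothetical name, while `agent.click` is the call shown in the README.

```typescript
// Append Playwright's nth= selector so only the index-th match (0-based) is used.
function nth(selector: string, index: number): string {
  if (!Number.isInteger(index)) {
    throw new Error(`index must be an integer, got ${index}`);
  }
  return `${selector} >> nth=${index}`;
}
```

`agent.click(nth("a", 0))` is then equivalent to `agent.click('a >> nth=0')`. A more specific selector is usually the better first fix, since index-based picks break when the page order changes.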