agent-browser 0.24.0 → 0.25.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +69 -9
- package/bin/agent-browser-darwin-arm64 +0 -0
- package/bin/agent-browser-darwin-x64 +0 -0
- package/bin/agent-browser-linux-arm64 +0 -0
- package/bin/agent-browser-linux-musl-arm64 +0 -0
- package/bin/agent-browser-linux-musl-x64 +0 -0
- package/bin/agent-browser-linux-x64 +0 -0
- package/bin/agent-browser-win32-x64.exe +0 -0
- package/package.json +1 -1
- package/skills/agent-browser/SKILL.md +117 -55
- package/skills/agentcore/SKILL.md +115 -0
package/README.md
CHANGED
````diff
@@ -130,6 +130,8 @@ agent-browser stream status # Show runtime streaming state and bound p
 agent-browser stream disable # Stop runtime WebSocket streaming
 agent-browser close # Close browser (aliases: quit, exit)
 agent-browser close --all # Close all active sessions
+agent-browser chat "<instruction>" # AI chat: natural language browser control (single-shot)
+agent-browser chat # AI chat: interactive REPL mode
 ```

 ### Get Info
````
````diff
@@ -203,21 +205,24 @@ agent-browser wait "#spinner" --state hidden

 ### Batch Execution

-Execute multiple commands in a single invocation
-
-when running multi-step workflows.
+Execute multiple commands in a single invocation. Commands can be passed as
+quoted arguments or piped as JSON via stdin. This avoids per-command process
+startup overhead when running multi-step workflows.

 ```bash
-#
+# Argument mode: each quoted argument is a full command
+agent-browser batch "open https://example.com" "snapshot -i" "screenshot"
+
+# With --bail to stop on first error
+agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"
+
+# Stdin mode: pipe commands as JSON
 echo '[
 ["open", "https://example.com"],
 ["snapshot", "-i"],
 ["click", "@e1"],
 ["screenshot", "result.png"]
 ]' | agent-browser batch --json
-
-# Stop on first error
-agent-browser batch --bail < commands.json
 ```

 ### Clipboard
````
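The batch hunk above documents two input modes: quoted arguments, and a JSON array of commands (each an array of strings) piped on stdin. When the command list comes from a script rather than a literal, hand-writing that JSON is error-prone. The following is a minimal sketch of a hypothetical helper, `to_batch_json` (not part of agent-browser), that reads one command per line, splits it on whitespace, and prints the array-of-arrays shape used in the `echo '[...]' | agent-browser batch --json` example. It assumes no single argument contains spaces; a value such as `"Jane Doe"` still needs argument mode.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-browser. Reads one command per line
# (whitespace-separated, no quoted arguments) and prints the JSON
# array-of-arrays shape used by the stdin-mode example.

# Escape backslashes and double quotes, then wrap the value in quotes.
json_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
}

to_batch_json() {
  local line word sep_cmd="" sep_arg
  local -a words
  printf '['
  while IFS= read -r line || [ -n "$line" ]; do
    read -ra words <<< "$line"
    [ "${#words[@]}" -eq 0 ] && continue   # skip blank lines
    printf '%s[' "$sep_cmd"
    sep_arg=""
    for word in "${words[@]}"; do
      printf '%s%s' "$sep_arg" "$(json_string "$word")"
      sep_arg=","
    done
    printf ']'
    sep_cmd=","
  done
  printf ']\n'
}

# Usage (requires agent-browser on PATH):
#   printf 'open https://example.com\nscreenshot\n' | to_batch_json | agent-browser batch --json
```

Splitting on whitespace keeps the helper short; supporting quoted arguments would need a real tokenizer, which is why argument mode stays the simpler choice for commands like `fill`.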
````diff
@@ -374,6 +379,7 @@ agent-browser provides multiple ways to persist login sessions so you don't re-a

 | Approach | Best for | Flag / Env |
 |----------|----------|------------|
+| **Chrome profile reuse** | Reuse your existing Chrome login state (cookies, sessions) with zero setup | `--profile <name>` / `AGENT_BROWSER_PROFILE` |
 | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile <path>` / `AGENT_BROWSER_PROFILE` |
 | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name <name>` / `AGENT_BROWSER_SESSION_NAME` |
 | **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` |
````
````diff
@@ -437,9 +443,31 @@ Each session has its own:
 - Navigation history
 - Authentication state

+## Chrome Profile Reuse
+
+The fastest way to use your existing login state: pass a Chrome profile name to `--profile`:
+
+```bash
+# List available Chrome profiles
+agent-browser profiles
+
+# Reuse your default Chrome profile's login state
+agent-browser --profile Default open https://gmail.com
+
+# Use a named profile (by display name or directory name)
+agent-browser --profile "Work" open https://app.example.com
+
+# Or via environment variable
+AGENT_BROWSER_PROFILE=Default agent-browser open https://gmail.com
+```
+
+This copies your Chrome profile to a temp directory (read-only snapshot, no changes to your original profile), so the browser launches with your existing cookies and sessions.
+
+> **Note:** On Windows, close Chrome before using `--profile <name>` if Chrome is running, as some profile files may be locked.
+
 ## Persistent Profiles

-
+For a persistent custom profile directory that stores state across browser restarts, pass a path to `--profile`:

 ```bash
 # Use a persistent profile directory
````
````diff
@@ -525,6 +553,7 @@ The `snapshot` command supports filtering to reduce output size:
 ```bash
 agent-browser snapshot # Full accessibility tree
 agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
+agent-browser snapshot -i --urls # Interactive elements with link URLs
 agent-browser snapshot -c # Compact (remove empty structural elements)
 agent-browser snapshot -d 3 # Limit depth to 3 levels
 agent-browser snapshot -s "#main" # Scope to CSS selector
````
````diff
@@ -534,6 +563,7 @@ agent-browser snapshot -i -c -d 5 # Combine options
 | Option | Description |
 | ---------------------- | ----------------------------------------------------------------------- |
 | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
+| `-u, --urls` | Include href URLs for link elements |
 | `-c, --compact` | Remove empty structural elements |
 | `-d, --depth <n>` | Limit tree depth |
 | `-s, --selector <sel>` | Scope to CSS selector |
````
````diff
@@ -567,7 +597,7 @@ This is useful for multimodal AI models that can reason about visual layout, unl
 |--------|-------------|
 | `--session <name>` | Use isolated session (or `AGENT_BROWSER_SESSION` env) |
 | `--session-name <name>` | Auto-save/restore session state (or `AGENT_BROWSER_SESSION_NAME` env) |
-| `--profile <path>` |
+| `--profile <name\|path>` | Chrome profile name or persistent directory path (or `AGENT_BROWSER_PROFILE` env) |
 | `--state <path>` | Load storage state from JSON file (or `AGENT_BROWSER_STATE` env) |
 | `--headers <json>` | Set HTTP headers scoped to the URL's origin |
 | `--executable-path <path>` | Custom browser executable (or `AGENT_BROWSER_EXECUTABLE_PATH` env) |
````
````diff
@@ -598,6 +628,9 @@ This is useful for multimodal AI models that can reason about visual layout, unl
 | `--confirm-interactive` | Interactive confirmation prompts; auto-denies if stdin is not a TTY (or `AGENT_BROWSER_CONFIRM_INTERACTIVE` env) |
 | `--engine <name>` | Browser engine: `chrome` (default), `lightpanda` (or `AGENT_BROWSER_ENGINE` env) |
 | `--no-auto-dialog` | Disable automatic dismissal of `alert`/`beforeunload` dialogs (or `AGENT_BROWSER_NO_AUTO_DIALOG` env) |
+| `--model <name>` | AI model for chat command (or `AI_GATEWAY_MODEL` env) |
+| `-v`, `--verbose` | Show tool commands and their raw output (chat) |
+| `-q`, `--quiet` | Show only AI text responses, hide tool calls (chat) |
 | `--config <path>` | Use a custom config file (or `AGENT_BROWSER_CONFIG` env) |
 | `--debug` | Debug output |

````
````diff
@@ -627,6 +660,33 @@ The dashboard displays:
 - **Activity feed** -- chronological command/result stream with timing and expandable details
 - **Console output** -- browser console messages (log, warn, error)
 - **Session creation** -- create new sessions from the UI with local engines (Chrome, Lightpanda) or cloud providers (AgentCore, Browserbase, Browserless, Browser Use, Kernel)
+- **AI Chat** -- chat with an AI assistant directly in the dashboard (requires Vercel AI Gateway configuration)
+
+### AI Chat
+
+The dashboard includes an optional AI chat panel powered by the Vercel AI Gateway. The same functionality is available directly from the CLI via the `chat` command. Set these environment variables to enable AI chat:
+
+```bash
+export AI_GATEWAY_API_KEY=gw_your_key_here
+export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6 # optional, this is the default
+export AI_GATEWAY_URL=https://ai-gateway.vercel.sh # optional, this is the default
+```
+
+**CLI usage:**
+
+```bash
+agent-browser chat "open google.com and search for cats" # Single-shot
+agent-browser chat # Interactive REPL
+agent-browser -q chat "summarize this page" # Quiet mode (text only)
+agent-browser -v chat "fill in the login form" # Verbose (show command output)
+agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model
+```
+
+The `chat` command translates natural language instructions into agent-browser commands, executes them, and streams the AI response. In interactive mode, type `quit` to exit. Use `--json` for structured output suitable for agent consumption.
+
+**Dashboard usage:**
+
+The Chat tab is always visible in the dashboard. When `AI_GATEWAY_API_KEY` is set, the Rust server proxies requests to the gateway and streams responses back using the Vercel AI SDK's UI Message Stream protocol. Without the key, sending a message shows an error inline.

 ## Configuration

````
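The AI chat hunk above configures both the CLI `chat` command and the dashboard tab entirely through environment variables, with documented defaults for the model and gateway URL. A minimal preflight sketch using only those variable names and defaults: `chat_config` is a hypothetical helper (not part of agent-browser) that prints the effective settings and returns non-zero when `AI_GATEWAY_API_KEY` is unset, the case in which the diff says the dashboard reports an error inline.

```bash
#!/usr/bin/env bash
# Hypothetical preflight for `agent-browser chat`; not part of agent-browser.
# Defaults mirror the values documented in the README hunk.

chat_config() {
  if [ -z "${AI_GATEWAY_API_KEY:-}" ]; then
    echo "AI_GATEWAY_API_KEY is not set" >&2
    return 1
  fi
  printf 'model=%s\n' "${AI_GATEWAY_MODEL:-anthropic/claude-sonnet-4.6}"
  printf 'url=%s\n' "${AI_GATEWAY_URL:-https://ai-gateway.vercel.sh}"
}

# Usage (requires agent-browser on PATH):
#   chat_config && agent-browser -q chat "summarize this page"
```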
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
package/package.json
CHANGED

package/skills/agent-browser/SKILL.md
CHANGED
````diff
@@ -25,7 +25,7 @@ agent-browser snapshot -i
 agent-browser fill @e1 "user@example.com"
 agent-browser fill @e2 "password123"
 agent-browser click @e3
-agent-browser wait
+agent-browser wait 2000
 agent-browser snapshot -i # Check result
 ```

````
````diff
@@ -34,14 +34,14 @@ agent-browser snapshot -i # Check result
 Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

 ```bash
-# Chain open +
-agent-browser open https://example.com && agent-browser
+# Chain open + snapshot in one call (open already waits for page load)
+agent-browser open https://example.com && agent-browser snapshot -i

 # Chain multiple interactions
 agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3

 # Navigate and capture
-agent-browser open https://example.com && agent-browser
+agent-browser open https://example.com && agent-browser screenshot
 ```

 **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
````
````diff
@@ -61,7 +61,17 @@ agent-browser --state ./auth.json open https://app.example.com/dashboard

 State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.

-**Option 2:
+**Option 2: Chrome profile reuse (zero setup)**
+
+```bash
+# List available Chrome profiles
+agent-browser profiles
+
+# Reuse the user's existing Chrome login state
+agent-browser --profile Default open https://gmail.com
+```
+
+**Option 3: Persistent profile (for recurring tasks)**

 ```bash
 # First run: login manually or via automation
````
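The context line in this hunk says state files hold session tokens in plaintext and should be git-ignored and deleted when no longer needed. One way to automate both steps, sketched under the assumption that the state file is `auth.json` in the current directory (the name SKILL.md uses elsewhere); `ignore_state_file` and `cleanup_state_file` are hypothetical helpers, not agent-browser commands.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not agent-browser commands.

# Add the state file to .gitignore once (exact-line match, no duplicates).
ignore_state_file() {
  local file="$1"
  touch .gitignore
  # Make sure the existing last line is newline-terminated before appending.
  if [ -s .gitignore ] && [ -n "$(tail -c 1 .gitignore)" ]; then
    printf '\n' >> .gitignore
  fi
  grep -qxF -- "$file" .gitignore || printf '%s\n' "$file" >> .gitignore
}

# Delete the state file; intended for use from an EXIT trap.
cleanup_state_file() {
  rm -f -- "$1"
}

# Usage in a script that saves state (requires agent-browser on PATH):
#   ignore_state_file auth.json
#   trap 'cleanup_state_file auth.json' EXIT
#   agent-browser state save auth.json
```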
````diff
@@ -72,7 +82,7 @@ agent-browser --profile ~/.myapp open https://app.example.com/login
 agent-browser --profile ~/.myapp open https://app.example.com/dashboard
 ```

-**Option
+**Option 4: Session name (auto-save/restore cookies + localStorage)**

 ```bash
 agent-browser --session-name myapp open https://app.example.com/login
````
````diff
@@ -107,6 +117,11 @@ See [references/authentication.md](references/authentication.md) for OAuth, 2FA,
 ## Essential Commands

 ```bash
+# Batch: ALWAYS use batch for 2+ sequential commands. Commands run in order.
+agent-browser batch "open https://example.com" "snapshot -i"
+agent-browser batch "open https://example.com" "screenshot"
+agent-browser batch "click @e1" "wait 1000" "screenshot"
+
 # Navigation
 agent-browser open <url> # Navigate (aliases: goto, navigate)
 agent-browser close # Close browser
````
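The Essential Commands hunk now leads with `batch`, where each quoted argument is one complete command. When a script assembles that list, a bash array expanded as `"${cmds[@]}"` keeps every command as exactly one argument, spaces included. A minimal sketch; `run_batch` is a hypothetical wrapper, and `BATCH_RUNNER` is a variable of this sketch only (agent-browser does not read it) so the wrapper can be exercised without the real binary.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of agent-browser. BATCH_RUNNER is a variable
# of this sketch only; it defaults to the real CLI.

run_batch() {
  "${BATCH_RUNNER:-agent-browser}" batch "$@"
}

# One array element per command; "${cmds[@]}" passes each as one argument.
cmds=(
  "open https://example.com"
  "snapshot -i"
  "screenshot"
)

# Usage (requires agent-browser on PATH):
#   run_batch "${cmds[@]}"
#   run_batch --bail "${cmds[@]}"
```

Writing `${cmds[*]}`, or leaving the expansion unquoted, would instead join or re-split the commands, which is the mistake the quoted array form avoids.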
````diff
@@ -114,6 +129,7 @@ agent-browser close --all # Close all active sessions

 # Snapshot
 agent-browser snapshot -i # Interactive elements with refs (recommended)
+agent-browser snapshot -i --urls # Include href URLs for links
 agent-browser snapshot -s "#selector" # Scope to CSS selector

 # Interaction (use @refs from snapshot)
````
````diff
@@ -137,10 +153,10 @@ agent-browser get cdp-url # Get CDP WebSocket URL

 # Wait
 agent-browser wait @e1 # Wait for element
-agent-browser wait --load networkidle # Wait for network idle
-agent-browser wait --url "**/page" # Wait for URL pattern
 agent-browser wait 2000 # Wait milliseconds
-agent-browser wait --
+agent-browser wait --url "**/page" # Wait for URL pattern
+agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
+agent-browser wait --load networkidle # Wait for network idle (caution: see Pitfalls)
 agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
 agent-browser wait "#spinner" --state hidden # Wait for element to disappear

````
````diff
@@ -149,6 +165,14 @@ agent-browser download @e1 ./file.pdf # Click element to trigger downlo
 agent-browser wait --download ./output.zip # Wait for any download to complete
 agent-browser --download-path ./downloads open <url> # Set default download directory

+# Tab management
+agent-browser tab list # List all open tabs
+agent-browser tab new # Open a blank new tab
+agent-browser tab new https://example.com # Open URL in a new tab
+agent-browser tab 2 # Switch to tab by index (0-based)
+agent-browser tab close # Close the current tab
+agent-browser tab close 2 # Close tab by index
+
 # Network
 agent-browser network requests # Inspect tracked requests
 agent-browser network requests --type xhr,fetch # Filter by resource type
````
````diff
@@ -200,6 +224,13 @@ agent-browser diff screenshot --baseline before.png # Visual pixel diff
 agent-browser diff url <url1> <url2> # Compare two pages
 agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
 agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
+
+# Chat (AI natural language control)
+agent-browser chat "open google.com and search for cats" # Single-shot instruction
+agent-browser chat # Interactive REPL mode
+agent-browser -q chat "summarize this page" # Quiet (text only, no tool calls)
+agent-browser -v chat "fill in the login form" # Verbose (show command output)
+agent-browser --model openai/gpt-4o chat "take a screenshot" # Override model
 ```

 ## Streaming
````
````diff
@@ -208,35 +239,62 @@ Every session automatically starts a WebSocket stream server on an OS-assigned p

 ## Batch Execution

-
+ALWAYS use `batch` when running 2+ commands in sequence. Batch executes commands in order, so dependent commands (like navigate then screenshot) work correctly. Each quoted argument is a separate command.

 ```bash
-
-
-
-
-
-]' | agent-browser batch --json
+# Navigate and take a snapshot
+agent-browser batch "open https://example.com" "snapshot -i"
+
+# Navigate, snapshot, and screenshot in one call
+agent-browser batch "open https://example.com" "snapshot -i" "screenshot"

-#
+# Click, wait, then screenshot
+agent-browser batch "click @e1" "wait 1000" "screenshot"
+
+# With --bail to stop on first error
+agent-browser batch --bail "open https://example.com" "click @e1" "screenshot"
+```
+
+Only use a single command (not batch) when you need to read the output before deciding the next command. For example, you must run `snapshot -i` as a single command when you need to read the refs to decide what to click. After reading the snapshot, batch the remaining steps.
+
+Stdin mode is also supported for programmatic use:
+
+```bash
+echo '[["open","https://example.com"],["screenshot"]]' | agent-browser batch --json
 agent-browser batch --bail < commands.json
 ```

-
+## Efficiency Strategies
+
+These patterns minimize tool calls and token usage.
+
+**Use `--urls` to avoid re-navigation.** When you need to visit links from a page, use `snapshot -i --urls` to get all href URLs upfront. Then `open` each URL directly instead of clicking refs and navigating back.
+
+**Snapshot once, act many times.** Never re-snapshot the same page. Extract all needed info (refs, URLs, text) from a single snapshot, then batch the remaining actions.
+
+**Multi-page workflow (e.g. "visit N sites and screenshot each"):**
+
+```bash
+# 1. Get all URLs in one call
+agent-browser batch "open https://news.ycombinator.com" "snapshot -i --urls"
+# Read output to extract URLs, then visit each directly:
+# 2. One batch per target site
+agent-browser batch "open https://github.com/example/repo" "screenshot"
+agent-browser batch "open https://example.com/article" "screenshot"
+agent-browser batch "open https://other.com/page" "screenshot"
+```
+
+This approach uses 4 tool calls instead of 14+. Never go back to the listing page between visits.

 ## Common Patterns

 ### Form Submission

 ```bash
-
-agent-browser snapshot -i
-
-agent-browser fill @e2 "jane@example.com"
-agent-browser select @e3 "California"
-agent-browser check @e4
-agent-browser click @e5
-agent-browser wait --load networkidle
+# Navigate and get the form structure
+agent-browser batch "open https://example.com/signup" "snapshot -i"
+# Read the snapshot output to identify form refs, then fill and submit
+agent-browser batch "fill @e1 \"Jane Doe\"" "fill @e2 \"jane@example.com\"" "select @e3 \"California\"" "check @e4" "click @e5" "wait 2000"
 ```

 ### Authentication with Auth Vault (Recommended)
````
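The rewritten form and login patterns nest quoted values inside a single batch argument, as in `"fill @e1 \"Jane Doe\""`. Building those strings from shell variables by hand is easy to get wrong. A minimal sketch with a hypothetical helper, `fill_cmd` (not part of agent-browser): it rejects values containing a double quote, because this diff does not say whether the batch argument parser accepts escaped quotes. For such values, a standalone `agent-browser fill @eN "$value"` call avoids the nesting.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agent-browser: build one `fill` command
# suitable for use as a single batch argument.

fill_cmd() {
  local ref="$1" value="$2"
  case "$value" in
    *'"'*)
      # Escaping inside a batch argument is not documented, so refuse.
      echo "fill_cmd: value for $ref contains a double quote" >&2
      return 1
      ;;
  esac
  printf 'fill %s "%s"' "$ref" "$value"
}

# Usage (refs come from a prior `snapshot -i`; requires agent-browser on PATH):
#   u=$(fill_cmd @e1 "$USERNAME") && p=$(fill_cmd @e2 "$PASSWORD") &&
#     agent-browser batch "$u" "$p" "click @e3"
```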
````diff
@@ -261,17 +319,12 @@ agent-browser auth delete github

 ```bash
 # Login once and save state
-agent-browser open https://app.example.com/login
-
-agent-browser fill @e1 "$USERNAME"
-agent-browser fill @e2 "$PASSWORD"
-agent-browser click @e3
-agent-browser wait --url "**/dashboard"
-agent-browser state save auth.json
+agent-browser batch "open https://app.example.com/login" "snapshot -i"
+# Read snapshot to find form refs, then fill and submit
+agent-browser batch "fill @e1 \"$USERNAME\"" "fill @e2 \"$PASSWORD\"" "click @e3" "wait --url **/dashboard" "state save auth.json"

 # Reuse in future sessions
-agent-browser state load auth.json
-agent-browser open https://app.example.com/dashboard
+agent-browser batch "state load auth.json" "open https://app.example.com/dashboard"
 ```

 ### Session Persistence
````
````diff
@@ -301,8 +354,7 @@ agent-browser state clean --older-than 7
 Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.

 ```bash
-agent-browser open https://example.com/checkout
-agent-browser snapshot -i
+agent-browser batch "open https://example.com/checkout" "snapshot -i"
 # @e1 [heading] "Checkout"
 # @e2 [Iframe] "payment-frame"
 # @e3 [input] "Card number"
````
````diff
@@ -310,23 +362,19 @@ agent-browser snapshot -i
 # @e5 [button] "Pay"

 # Interact directly — no frame switch needed
-agent-browser fill @e3 "4111111111111111"
-agent-browser fill @e4 "12/28"
-agent-browser click @e5
+agent-browser batch "fill @e3 \"4111111111111111\"" "fill @e4 \"12/28\"" "click @e5"

 # To scope a snapshot to one iframe:
-agent-browser frame @e2
-agent-browser snapshot -i # Only iframe content
+agent-browser batch "frame @e2" "snapshot -i"
 agent-browser frame main # Return to main frame
 ```

 ### Data Extraction

 ```bash
-agent-browser open https://example.com/products
-
+agent-browser batch "open https://example.com/products" "snapshot -i"
+# Read snapshot to find element refs, then extract
 agent-browser get text @e5 # Get specific element text
-agent-browser get text body > page.txt # Get all page text

 # JSON output for parsing
 agent-browser snapshot -i --json
````
````diff
@@ -520,27 +568,29 @@ agent-browser diff url https://staging.example.com https://prod.example.com --sc

 ## Timeouts and Slow Pages

-The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds).
+The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds).

-
-# Wait for network activity to settle (best for slow pages)
-agent-browser wait --load networkidle
+**Important:** `open` already waits for the page `load` event before returning. In most cases, no additional wait is needed before taking a snapshot or screenshot. Only add an explicit wait when content loads asynchronously after the initial page load.

-
+```bash
+# Wait for a specific element to appear (preferred for dynamic content)
 agent-browser wait "#content"
 agent-browser wait @e1

+# Wait a fixed duration (good default for slow SPAs)
+agent-browser wait 2000
+
 # Wait for a specific URL pattern (useful after redirects)
 agent-browser wait --url "**/dashboard"

-# Wait for
-agent-browser wait --
+# Wait for text to appear on the page
+agent-browser wait --text "Results loaded"

-# Wait
-agent-browser wait
+# Wait for a JavaScript condition
+agent-browser wait --fn "document.querySelectorAll('.item').length > 0"
 ```

-
+**Avoid `wait --load networkidle`** unless you are certain the site has no persistent network activity. Ad-heavy sites, sites with analytics/tracking, and sites with websockets will cause `networkidle` to hang indefinitely. Prefer `wait 2000` or `wait <selector>` instead.

 ## JavaScript Dialogs (alert / confirm / prompt)

````
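The rewritten timeout section keeps the 25-second default (overridable in milliseconds with `AGENT_BROWSER_DEFAULT_TIMEOUT`) and stresses that `open` already waits for the `load` event, so an explicit wait is only needed for content that loads later. A sketch of both points with hypothetical helpers `seconds_to_ms` and `open_and_snapshot`; it assumes a batch argument such as `wait "#content"` is parsed the same way as the quoted `fill` values in SKILL.md. `BATCH_RUNNER` belongs to this sketch only and is not an agent-browser setting.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of agent-browser.

# AGENT_BROWSER_DEFAULT_TIMEOUT is in milliseconds; convert from whole seconds.
seconds_to_ms() {
  echo $(( $1 * 1000 ))
}

# `open` already waits for the load event, so only add an explicit wait when
# a selector for asynchronously loaded content is given. BATCH_RUNNER is this
# sketch's own variable and defaults to the real CLI.
open_and_snapshot() {
  local url="$1" selector="${2:-}" runner="${BATCH_RUNNER:-agent-browser}"
  if [ -n "$selector" ]; then
    "$runner" batch "open $url" "wait \"$selector\"" "snapshot -i"
  else
    "$runner" batch "open $url" "snapshot -i"
  fi
}

# Usage (requires agent-browser on PATH):
#   export AGENT_BROWSER_DEFAULT_TIMEOUT="$(seconds_to_ms 60)"
#   open_and_snapshot https://example.com "#content"
```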
````diff
@@ -754,6 +804,18 @@ agent-browser dashboard stop

 The dashboard runs independently of browser sessions on port 4848 (configurable with `--port`). All sessions automatically stream to the dashboard. Sessions can also be created from the dashboard UI with local engines or cloud providers.

+### Dashboard AI Chat
+
+The dashboard has an optional AI chat tab powered by the Vercel AI Gateway. Enable it by setting:
+
+```bash
+export AI_GATEWAY_API_KEY=gw_your_key_here
+export AI_GATEWAY_MODEL=anthropic/claude-sonnet-4.6 # optional default
+export AI_GATEWAY_URL=https://ai-gateway.vercel.sh # optional default
+```
+
+The Chat tab is always visible in the dashboard. Set `AI_GATEWAY_API_KEY` to enable AI responses.
+
 ## Ready-to-Use Templates

 | Template | Description |
````

package/skills/agentcore/SKILL.md
````diff
@@ -0,0 +1,115 @@
+---
+name: agentcore
+description: Run agent-browser on AWS Bedrock AgentCore cloud browsers. Use when the user wants to use AgentCore, run browser automation on AWS, use a cloud browser with AWS credentials, or needs a managed browser session backed by AWS infrastructure. Triggers include "use agentcore", "run on AWS", "cloud browser with AWS", "bedrock browser", "agentcore session", or any task requiring AWS-hosted browser automation.
+allowed-tools: Bash(agent-browser:*), Bash(npx agent-browser:*)
+---
+
+# AWS Bedrock AgentCore
+
+Run agent-browser on cloud browser sessions hosted by AWS Bedrock AgentCore. All standard agent-browser commands work identically; the only difference is where the browser runs.
+
+## Setup
+
+Credentials are resolved automatically:
+
+1. Environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, optionally `AWS_SESSION_TOKEN`)
+2. AWS CLI fallback (`aws configure export-credentials`), which supports SSO, IAM roles, and named profiles
+
+No additional setup is needed if the user already has working AWS credentials.
+
+## Core Workflow
+
+```bash
+# Open a page on an AgentCore cloud browser
+agent-browser -p agentcore open https://example.com
+
+# Everything else is the same as local Chrome
+agent-browser snapshot -i
+agent-browser click @e1
+agent-browser screenshot page.png
+agent-browser close
+```
+
+## Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AGENTCORE_REGION` | AWS region | `us-east-1` |
+| `AGENTCORE_BROWSER_ID` | Browser identifier | `aws.browser.v1` |
+| `AGENTCORE_PROFILE_ID` | Persistent browser profile (cookies, localStorage) | (none) |
+| `AGENTCORE_SESSION_TIMEOUT` | Session timeout in seconds | `3600` |
+| `AWS_PROFILE` | AWS CLI profile for credential resolution | `default` |
+
+## Persistent Profiles
+
+Use `AGENTCORE_PROFILE_ID` to persist browser state across sessions. This is useful for maintaining login sessions:
+
+```bash
+# First run: log in
+AGENTCORE_PROFILE_ID=my-app agent-browser -p agentcore open https://app.example.com/login
+agent-browser snapshot -i
+agent-browser fill @e1 "user@example.com"
+agent-browser fill @e2 "password"
+agent-browser click @e3
+agent-browser close
+
+# Future runs: already authenticated
+AGENTCORE_PROFILE_ID=my-app agent-browser -p agentcore open https://app.example.com/dashboard
+```
+
+## Live View
+
+When a session starts, AgentCore prints a Live View URL to stderr. Open it in a browser to watch the session in real time from the AWS Console:
+
+```
+Session: abc123-def456
+Live View: https://us-east-1.console.aws.amazon.com/bedrock-agentcore/browser/aws.browser.v1/session/abc123-def456#
+```
+
+## Region Selection
+
+```bash
+# Default: us-east-1
+agent-browser -p agentcore open https://example.com
+
+# Explicit region
+AGENTCORE_REGION=eu-west-1 agent-browser -p agentcore open https://example.com
+```
+
+## Credential Patterns
+
+```bash
+# Explicit credentials (CI/CD, scripts)
+export AWS_ACCESS_KEY_ID=AKIA...
+export AWS_SECRET_ACCESS_KEY=...
+agent-browser -p agentcore open https://example.com
+
+# SSO (interactive)
+aws sso login --profile my-profile
+AWS_PROFILE=my-profile agent-browser -p agentcore open https://example.com
+
+# IAM role / default credential chain
+agent-browser -p agentcore open https://example.com
+```
+
+## Using with AGENT_BROWSER_PROVIDER
+
+Set the provider via environment variable to avoid passing `-p agentcore` on every command:
+
+```bash
+export AGENT_BROWSER_PROVIDER=agentcore
+export AGENTCORE_REGION=us-east-2
+
+agent-browser open https://example.com
+agent-browser snapshot -i
+agent-browser click @e1
+agent-browser close
+```
+
+## Common Issues
+
+**"Failed to run aws CLI"** means AWS CLI is not installed or not in PATH. Either install it or set `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` directly.
+
+**"AWS CLI failed: ... Run 'aws sso login'"** means SSO credentials have expired. Run `aws sso login` to refresh them.
+
+**Session timeout:** The default is 3600 seconds (1 hour). For longer tasks, increase with `AGENTCORE_SESSION_TIMEOUT=7200`.
````
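The new AgentCore skill resolves credentials from environment variables first and falls back to the AWS CLI, and its table lists defaults for region and session timeout. A preflight sketch mirroring that documented order; `agentcore_preflight` is hypothetical, does not contact AWS, and only reports which credential source would be tried along with the effective region and timeout. Treating the environment as usable only when both key variables are set is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical preflight, not part of agent-browser. Mirrors the documented
# resolution order (env vars, then AWS CLI) without calling AWS.

agentcore_preflight() {
  local cred_source
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    cred_source="env"
  elif command -v aws >/dev/null 2>&1; then
    # The documented fallback uses `aws configure export-credentials`.
    cred_source="aws-cli"
  else
    echo "no AWS keys in the environment and aws CLI not on PATH" >&2
    return 1
  fi
  printf 'credentials=%s\n' "$cred_source"
  printf 'region=%s\n' "${AGENTCORE_REGION:-us-east-1}"
  printf 'timeout=%s\n' "${AGENTCORE_SESSION_TIMEOUT:-3600}"
}

# Usage (requires agent-browser on PATH):
#   agentcore_preflight && agent-browser -p agentcore open https://example.com
```

Because it never calls AWS, a passing preflight does not prove the credentials are valid; expired SSO sessions still surface as the `aws sso login` error listed under Common Issues.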