agent-browser 0.8.7 → 0.8.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Binary file
Binary file
Binary file
Binary file
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent-browser",
- "version": "0.8.7",
+ "version": "0.8.8",
  "description": "Headless browser automation CLI for AI agents",
  "type": "module",
  "main": "dist/daemon.js",
@@ -1,339 +1,172 @@
  ---
  name: agent-browser
- description: Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, test web applications, or extract information from web pages.
+ description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
  allowed-tools: Bash(agent-browser:*)
  ---
 
  # Browser Automation with agent-browser
 
- ## Quick start
+ ## Core Workflow
 
- ```bash
- agent-browser open <url> # Navigate to page
- agent-browser snapshot -i # Get interactive elements with refs
- agent-browser click @e1 # Click element by ref
- agent-browser fill @e2 "text" # Fill input by ref
- agent-browser close # Close browser
- ```
-
- ## Core workflow
-
- 1. Navigate: `agent-browser open <url>`
- 2. Snapshot: `agent-browser snapshot -i` (returns elements with refs like `@e1`, `@e2`)
- 3. Interact using refs from the snapshot
- 4. Re-snapshot after navigation or significant DOM changes
-
- ## Commands
-
- ### Navigation
-
- ```bash
- agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
- # Supports: https://, http://, file://, about:, data://
- # Auto-prepends https:// if no protocol given
- agent-browser back # Go back
- agent-browser forward # Go forward
- agent-browser reload # Reload page
- agent-browser close # Close browser (aliases: quit, exit)
- agent-browser connect 9222 # Connect to browser via CDP port
- ```
-
- ### Snapshot (page analysis)
-
- ```bash
- agent-browser snapshot # Full accessibility tree
- agent-browser snapshot -i # Interactive elements only (recommended)
- agent-browser snapshot -c # Compact output
- agent-browser snapshot -d 3 # Limit depth to 3
- agent-browser snapshot -s "#main" # Scope to CSS selector
- ```
-
- ### Interactions (use @refs from snapshot)
-
- ```bash
- agent-browser click @e1 # Click
- agent-browser dblclick @e1 # Double-click
- agent-browser focus @e1 # Focus element
- agent-browser fill @e2 "text" # Clear and type
- agent-browser type @e2 "text" # Type without clearing
- agent-browser press Enter # Press key (alias: key)
- agent-browser press Control+a # Key combination
- agent-browser keydown Shift # Hold key down
- agent-browser keyup Shift # Release key
- agent-browser hover @e1 # Hover
- agent-browser check @e1 # Check checkbox
- agent-browser uncheck @e1 # Uncheck checkbox
- agent-browser select @e1 "value" # Select dropdown option
- agent-browser select @e1 "a" "b" # Select multiple options
- agent-browser scroll down 500 # Scroll page (default: down 300px)
- agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
- agent-browser drag @e1 @e2 # Drag and drop
- agent-browser upload @e1 file.pdf # Upload files
- ```
-
- ### Get information
-
- ```bash
- agent-browser get text @e1 # Get element text
- agent-browser get html @e1 # Get innerHTML
- agent-browser get value @e1 # Get input value
- agent-browser get attr @e1 href # Get attribute
- agent-browser get title # Get page title
- agent-browser get url # Get current URL
- agent-browser get count ".item" # Count matching elements
- agent-browser get box @e1 # Get bounding box
- agent-browser get styles @e1 # Get computed styles (font, color, bg, etc.)
- ```
-
- ### Check state
-
- ```bash
- agent-browser is visible @e1 # Check if visible
- agent-browser is enabled @e1 # Check if enabled
- agent-browser is checked @e1 # Check if checked
- ```
-
- ### Screenshots & PDF
-
- ```bash
- agent-browser screenshot # Save to a temporary directory
- agent-browser screenshot path.png # Save to a specific path
- agent-browser screenshot --full # Full page
- agent-browser pdf output.pdf # Save as PDF
- ```
+ Every browser automation follows this pattern:
 
- ### Video recording
+ 1. **Navigate**: `agent-browser open <url>`
+ 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+ 3. **Interact**: Use refs to click, fill, select
+ 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
 
  ```bash
- agent-browser record start ./demo.webm # Start recording (uses current URL + state)
- agent-browser click @e1 # Perform actions
- agent-browser record stop # Stop and save video
- agent-browser record restart ./take2.webm # Stop current + start new recording
- ```
-
- Recording creates a fresh context but preserves cookies/storage from your session. If no URL is provided, it
- automatically returns to your current page. For smooth demos, explore first, then start recording.
-
- ### Wait
+ agent-browser open https://example.com/form
+ agent-browser snapshot -i
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
 
- ```bash
- agent-browser wait @e1 # Wait for element
- agent-browser wait 2000 # Wait milliseconds
- agent-browser wait --text "Success" # Wait for text (or -t)
- agent-browser wait --url "**/dashboard" # Wait for URL pattern (or -u)
- agent-browser wait --load networkidle # Wait for network idle (or -l)
- agent-browser wait --fn "window.ready" # Wait for JS condition (or -f)
+ agent-browser fill @e1 "user@example.com"
+ agent-browser fill @e2 "password123"
+ agent-browser click @e3
+ agent-browser wait --load networkidle
+ agent-browser snapshot -i # Check result
  ```
 
- ### Mouse control
+ ## Essential Commands
 
  ```bash
- agent-browser mouse move 100 200 # Move mouse
- agent-browser mouse down left # Press button
- agent-browser mouse up left # Release button
- agent-browser mouse wheel 100 # Scroll wheel
- ```
-
- ### Semantic locators (alternative to refs)
+ # Navigation
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
+ agent-browser close # Close browser
 
- ```bash
- agent-browser find role button click --name "Submit"
- agent-browser find text "Sign In" click
- agent-browser find text "Sign In" click --exact # Exact match only
- agent-browser find label "Email" fill "user@test.com"
- agent-browser find placeholder "Search" type "query"
- agent-browser find alt "Logo" click
- agent-browser find title "Close" click
- agent-browser find testid "submit-btn" click
- agent-browser find first ".item" click
- agent-browser find last ".item" click
- agent-browser find nth 2 "a" hover
- ```
+ # Snapshot
+ agent-browser snapshot -i # Interactive elements with refs (recommended)
+ agent-browser snapshot -s "#selector" # Scope to CSS selector
 
- ### Browser settings
+ # Interaction (use @refs from snapshot)
+ agent-browser click @e1 # Click element
+ agent-browser fill @e2 "text" # Clear and type text
+ agent-browser type @e2 "text" # Type without clearing
+ agent-browser select @e1 "option" # Select dropdown option
+ agent-browser check @e1 # Check checkbox
+ agent-browser press Enter # Press key
+ agent-browser scroll down 500 # Scroll page
 
- ```bash
- agent-browser set viewport 1920 1080 # Set viewport size
- agent-browser set device "iPhone 14" # Emulate device
- agent-browser set geo 37.7749 -122.4194 # Set geolocation (alias: geolocation)
- agent-browser set offline on # Toggle offline mode
- agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
- agent-browser set credentials user pass # HTTP basic auth (alias: auth)
- agent-browser set media dark # Emulate color scheme
- agent-browser set media light reduced-motion # Light mode + reduced motion
- ```
+ # Get information
+ agent-browser get text @e1 # Get element text
+ agent-browser get url # Get current URL
+ agent-browser get title # Get page title
 
- ### Cookies & Storage
+ # Wait
+ agent-browser wait @e1 # Wait for element
+ agent-browser wait --load networkidle # Wait for network idle
+ agent-browser wait --url "**/page" # Wait for URL pattern
+ agent-browser wait 2000 # Wait milliseconds
 
- ```bash
- agent-browser cookies # Get all cookies
- agent-browser cookies set name value # Set cookie
- agent-browser cookies clear # Clear cookies
- agent-browser storage local # Get all localStorage
- agent-browser storage local key # Get specific key
- agent-browser storage local set k v # Set value
- agent-browser storage local clear # Clear all
+ # Capture
+ agent-browser screenshot # Screenshot to temp dir
+ agent-browser screenshot --full # Full page screenshot
+ agent-browser pdf output.pdf # Save as PDF
  ```
 
- ### Network
+ ## Common Patterns
 
- ```bash
- agent-browser network route <url> # Intercept requests
- agent-browser network route <url> --abort # Block requests
- agent-browser network route <url> --body '{}' # Mock response
- agent-browser network unroute [url] # Remove routes
- agent-browser network requests # View tracked requests
- agent-browser network requests --filter api # Filter requests
- ```
-
- ### Tabs & Windows
+ ### Form Submission
 
  ```bash
- agent-browser tab # List tabs
- agent-browser tab new [url] # New tab
- agent-browser tab 2 # Switch to tab by index
- agent-browser tab close # Close current tab
- agent-browser tab close 2 # Close tab by index
- agent-browser window new # New window
+ agent-browser open https://example.com/signup
+ agent-browser snapshot -i
+ agent-browser fill @e1 "Jane Doe"
+ agent-browser fill @e2 "jane@example.com"
+ agent-browser select @e3 "California"
+ agent-browser check @e4
+ agent-browser click @e5
+ agent-browser wait --load networkidle
  ```
 
- ### Frames
+ ### Authentication with State Persistence
 
  ```bash
- agent-browser frame "#iframe" # Switch to iframe
- agent-browser frame main # Back to main frame
- ```
-
- ### Dialogs
+ # Login once and save state
+ agent-browser open https://app.example.com/login
+ agent-browser snapshot -i
+ agent-browser fill @e1 "$USERNAME"
+ agent-browser fill @e2 "$PASSWORD"
+ agent-browser click @e3
+ agent-browser wait --url "**/dashboard"
+ agent-browser state save auth.json
 
- ```bash
- agent-browser dialog accept [text] # Accept dialog
- agent-browser dialog dismiss # Dismiss dialog
+ # Reuse in future sessions
+ agent-browser state load auth.json
+ agent-browser open https://app.example.com/dashboard
  ```
 
- ### JavaScript
+ ### Data Extraction
 
  ```bash
- agent-browser eval "document.title" # Run JavaScript
- ```
-
- ## Global options
+ agent-browser open https://example.com/products
+ agent-browser snapshot -i
+ agent-browser get text @e5 # Get specific element text
+ agent-browser get text body > page.txt # Get all page text
 
- ```bash
- agent-browser --session <name> ... # Isolated browser session
- agent-browser --json ... # JSON output for parsing
- agent-browser --headed ... # Show browser window (not headless)
- agent-browser --full ... # Full page screenshot (-f)
- agent-browser --cdp <port> ... # Connect via Chrome DevTools Protocol
- agent-browser -p <provider> ... # Cloud browser provider (--provider)
- agent-browser --proxy <url> ... # Use proxy server
- agent-browser --headers <json> ... # HTTP headers scoped to URL's origin
- agent-browser --executable-path <p> # Custom browser executable
- agent-browser --extension <path> ... # Load browser extension (repeatable)
- agent-browser --help # Show help (-h)
- agent-browser --version # Show version (-V)
- agent-browser <command> --help # Show detailed help for a command
+ # JSON output for parsing
+ agent-browser snapshot -i --json
+ agent-browser get text @e1 --json
  ```
 
- ### Proxy support
+ ### Parallel Sessions
 
  ```bash
- agent-browser --proxy http://proxy.com:8080 open example.com
- agent-browser --proxy http://user:pass@proxy.com:8080 open example.com
- agent-browser --proxy socks5://proxy.com:1080 open example.com
- ```
+ agent-browser --session site1 open https://site-a.com
+ agent-browser --session site2 open https://site-b.com
 
- ## Environment variables
+ agent-browser --session site1 snapshot -i
+ agent-browser --session site2 snapshot -i
 
- ```bash
- AGENT_BROWSER_SESSION="mysession" # Default session name
- AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
- AGENT_BROWSER_EXTENSIONS="/ext1,/ext2" # Comma-separated extension paths
- AGENT_BROWSER_PROVIDER="your-cloud-browser-provider" # Cloud browser provider (select browseruse or browserbase)
- AGENT_BROWSER_STREAM_PORT="9223" # WebSocket streaming port
- AGENT_BROWSER_HOME="/path/to/agent-browser" # Custom install location (for daemon.js)
+ agent-browser session list
  ```
 
- ## Example: Form submission
+ ### Visual Browser (Debugging)
 
  ```bash
- agent-browser open https://example.com/form
- agent-browser snapshot -i
- # Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Submit" [ref=e3]
-
- agent-browser fill @e1 "user@example.com"
- agent-browser fill @e2 "password123"
- agent-browser click @e3
- agent-browser wait --load networkidle
- agent-browser snapshot -i # Check result
+ agent-browser --headed open https://example.com
+ agent-browser highlight @e1 # Highlight element
+ agent-browser record start demo.webm # Record session
  ```
 
- ## Example: Authentication with saved state
+ ## Ref Lifecycle (Important)
 
- ```bash
- # Login once
- agent-browser open https://app.example.com/login
- agent-browser snapshot -i
- agent-browser fill @e1 "username"
- agent-browser fill @e2 "password"
- agent-browser click @e3
- agent-browser wait --url "**/dashboard"
- agent-browser state save auth.json
-
- # Later sessions: load saved state
- agent-browser state load auth.json
- agent-browser open https://app.example.com/dashboard
- ```
+ Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
 
- ## Sessions (parallel browsers)
+ - Clicking links or buttons that navigate
+ - Form submissions
+ - Dynamic content loading (dropdowns, modals)
 
  ```bash
- agent-browser --session test1 open site-a.com
- agent-browser --session test2 open site-b.com
- agent-browser session list
+ agent-browser click @e5 # Navigates to new page
+ agent-browser snapshot -i # MUST re-snapshot
+ agent-browser click @e1 # Use new refs
  ```
 
- ## JSON output (for parsing)
-
- Add `--json` for machine-readable output:
-
- ```bash
- agent-browser snapshot -i --json
- agent-browser get text @e1 --json
- ```
+ ## Semantic Locators (Alternative to Refs)
 
- ## Debugging
+ When refs are unavailable or unreliable, use semantic locators:
 
  ```bash
- agent-browser --headed open example.com # Show browser window
- agent-browser --cdp 9222 snapshot # Connect via CDP port
- agent-browser connect 9222 # Alternative: connect command
- agent-browser console # View console messages
- agent-browser console --clear # Clear console
- agent-browser errors # View page errors
- agent-browser errors --clear # Clear errors
- agent-browser highlight @e1 # Highlight element
- agent-browser trace start # Start recording trace
- agent-browser trace stop trace.zip # Stop and save trace
- agent-browser record start ./debug.webm # Record video from current page
- agent-browser record stop # Save recording
+ agent-browser find text "Sign In" click
+ agent-browser find label "Email" fill "user@test.com"
+ agent-browser find role button click --name "Submit"
+ agent-browser find placeholder "Search" type "query"
+ agent-browser find testid "submit-btn" click
  ```
 
- ## Deep-dive documentation
-
- For detailed patterns and best practices, see:
+ ## Deep-Dive Documentation
 
- | Reference | Description |
+ | Reference | When to Use |
  |-----------|-------------|
+ | [references/commands.md](references/commands.md) | Full command reference with all options |
  | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
  | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
  | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
  | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
  | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
 
- ## Ready-to-use templates
-
- Executable workflow scripts for common patterns:
+ ## Ready-to-Use Templates
 
  | Template | Description |
  |----------|-------------|
@@ -341,16 +174,8 @@ Executable workflow scripts for common patterns:
  | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
  | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
 
- Usage:
  ```bash
  ./templates/form-automation.sh https://example.com/form
  ./templates/authenticated-session.sh https://app.example.com/login
  ./templates/capture-workflow.sh https://example.com ./output
  ```
-
- ## HTTPS Certificate Errors
-
- For sites with self-signed or invalid certificates:
- ```bash
- agent-browser open https://localhost:8443 --ignore-https-errors
- ```
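
Note on the hunks above: the new "Authentication with State Persistence" pattern (log in once, `state save`, later `state load`) could be wrapped in a small helper. The following is only a sketch, not part of the package: `ab`, `login_once`, and `AB_DRY_RUN` are hypothetical names, and the refs `@e1`, `@e2`, `@e3` and the `**/dashboard` URL pattern are copied from the diff's example, so a real page would need its refs checked against the snapshot output first.

```bash
#!/usr/bin/env bash
# Sketch only: ab, login_once and AB_DRY_RUN are hypothetical, not agent-browser features.

# Run agent-browser, or only print the command line when AB_DRY_RUN=1.
# Dry-run output includes every argument verbatim, so avoid it with real secrets.
ab() {
  if [ "${AB_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

# Reuse saved state when the state file exists; otherwise log in and save it.
# Assumes the login snapshot yields @e1 (user), @e2 (password), @e3 (submit),
# and that a successful login lands on a URL matching **/dashboard.
login_once() {
  local login_url="$1" state_file="$2"
  if [ -f "$state_file" ]; then
    ab state load "$state_file"
    return
  fi
  ab open "$login_url"
  ab snapshot -i
  ab fill @e1 "$USERNAME"
  ab fill @e2 "$PASSWORD"
  ab click @e3
  ab wait --url "**/dashboard"
  ab state save "$state_file"
}
```

A call such as `login_once https://app.example.com/login auth.json` followed by `ab open https://app.example.com/dashboard` mirrors the two halves of the pattern in the diff; setting `AB_DRY_RUN=1` first prints the commands without launching a browser.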
@@ -1,6 +1,20 @@
  # Authentication Patterns
 
- Patterns for handling login flows, session persistence, and authenticated browsing.
+ Login flows, session persistence, OAuth, 2FA, and authenticated browsing.
+
+ **Related**: [session-management.md](session-management.md) for state persistence details, [SKILL.md](../SKILL.md) for quick start.
+
+ ## Contents
+
+ - [Basic Login Flow](#basic-login-flow)
+ - [Saving Authentication State](#saving-authentication-state)
+ - [Restoring Authentication](#restoring-authentication)
+ - [OAuth / SSO Flows](#oauth--sso-flows)
+ - [Two-Factor Authentication](#two-factor-authentication)
+ - [HTTP Basic Auth](#http-basic-auth)
+ - [Cookie-Based Auth](#cookie-based-auth)
+ - [Token Refresh Handling](#token-refresh-handling)
+ - [Security Best Practices](#security-best-practices)
 
  ## Basic Login Flow