@every-env/compound-plugin 2.36.5 → 2.37.1

@@ -3,9 +3,9 @@ name: agent-browser
  description: Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
  ---
 
- # agent-browser: CLI Browser Automation
+ # Browser Automation with agent-browser
 
- Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
+ The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.
 
  ## Setup Check
 
@@ -23,284 +23,625 @@ agent-browser install # Downloads Chromium
 
  ## Core Workflow
 
- **The snapshot + ref pattern is optimal for LLMs:**
+ Every browser automation follows this pattern:
 
- 1. **Navigate** to URL
- 2. **Snapshot** to get interactive elements with refs
- 3. **Interact** using refs (@e1, @e2, etc.)
- 4. **Re-snapshot** after navigation or DOM changes
+ 1. **Navigate**: `agent-browser open <url>`
+ 2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+ 3. **Interact**: Use refs to click, fill, select
+ 4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
 
  ```bash
- # Step 1: Open URL
- agent-browser open https://example.com
+ agent-browser open https://example.com/form
+ agent-browser snapshot -i
+ # Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
 
- # Step 2: Get interactive elements with refs
- agent-browser snapshot -i --json
+ agent-browser fill @e1 "user@example.com"
+ agent-browser fill @e2 "password123"
+ agent-browser click @e3
+ agent-browser wait --load networkidle
+ agent-browser snapshot -i # Check result
+ ```
 
- # Step 3: Interact using refs
- agent-browser click @e1
- agent-browser fill @e2 "search query"
+ ## Command Chaining
 
- # Step 4: Re-snapshot after changes
- agent-browser snapshot -i
+ Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
+
+ ```bash
+ # Chain open + wait + snapshot in one call
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
+
+ # Chain multiple interactions
+ agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
+
+ # Navigate and capture
+ agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
  ```
 
- ## Key Commands
+ **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
+
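Editorial sketch (not part of the diffed package): the "run separately when you need to parse the output" case can be illustrated with a small filter that pulls refs out of captured snapshot text. `list_refs` is a hypothetical helper, and it assumes refs appear as `@e` followed by digits, as in the sample output in the Core Workflow example; the real snapshot layout may differ.

```bash
# Hypothetical helper (not an agent-browser command): print each element ref
# found on stdin once, in first-seen order.
# Assumes refs look like "@e" followed by digits, as in the sample output.
list_refs() {
  grep -o '@e[0-9][0-9]*' | awk '!seen[$0]++'
}

# Sample text copied from the Core Workflow example; in practice this would be
# the captured output of `agent-browser snapshot -i`.
sample='@e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"'
printf '%s\n' "$sample" | list_refs
```

The address `user@example.com` would not be mistaken for a ref here, because the pattern requires a digit directly after `@e`.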
+ ## Handling Authentication
 
- ### Navigation
+ When automating a site that requires login, choose the approach that fits:
+
+ **Option 1: Import auth from the user's browser (fastest for one-off tasks)**
 
  ```bash
- agent-browser open <url> # Navigate to URL
- agent-browser back # Go back
- agent-browser forward # Go forward
- agent-browser reload # Reload page
- agent-browser close # Close browser
+ # Connect to the user's running Chrome (they're already logged in)
+ agent-browser --auto-connect state save ./auth.json
+ # Use that auth state
+ agent-browser --state ./auth.json open https://app.example.com/dashboard
  ```
 
- ### Snapshots (Essential for AI)
+ State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.
+
+ **Option 2: Persistent profile (simplest for recurring tasks)**
 
  ```bash
- agent-browser snapshot # Full accessibility tree
- agent-browser snapshot -i # Interactive elements only (recommended)
- agent-browser snapshot -i --json # JSON output for parsing
- agent-browser snapshot -c # Compact (remove empty elements)
- agent-browser snapshot -d 3 # Limit depth
+ # First run: login manually or via automation
+ agent-browser --profile ~/.myapp open https://app.example.com/login
+ # ... fill credentials, submit ...
+
+ # All future runs: already authenticated
+ agent-browser --profile ~/.myapp open https://app.example.com/dashboard
  ```
 
- ### Interactions
+ **Option 3: Session name (auto-save/restore cookies + localStorage)**
 
  ```bash
- agent-browser click @e1 # Click element
- agent-browser dblclick @e1 # Double-click
- agent-browser fill @e1 "text" # Clear and fill input
- agent-browser type @e1 "text" # Type without clearing
- agent-browser press Enter # Press key
- agent-browser hover @e1 # Hover element
- agent-browser check @e1 # Check checkbox
- agent-browser uncheck @e1 # Uncheck checkbox
- agent-browser select @e1 "option" # Select dropdown option
- agent-browser scroll down 500 # Scroll (up/down/left/right)
- agent-browser scrollintoview @e1 # Scroll element into view
+ agent-browser --session-name myapp open https://app.example.com/login
+ # ... login flow ...
+ agent-browser close # State auto-saved
+
+ # Next time: state auto-restored
+ agent-browser --session-name myapp open https://app.example.com/dashboard
  ```
 
- ### Get Information
+ **Option 4: Auth vault (credentials stored encrypted, login by name)**
 
  ```bash
- agent-browser get text @e1 # Get element text
- agent-browser get html @e1 # Get element HTML
- agent-browser get value @e1 # Get input value
- agent-browser get attr href @e1 # Get attribute
- agent-browser get title # Get page title
- agent-browser get url # Get current URL
- agent-browser get count "button" # Count matching elements
+ echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
+ agent-browser auth login myapp
  ```
 
- ### Screenshots & PDFs
+ **Option 5: State file (manual save/load)**
 
  ```bash
- agent-browser screenshot # Viewport screenshot
- agent-browser screenshot --full # Full page
- agent-browser screenshot output.png # Save to file
- agent-browser screenshot --full output.png # Full page to file
- agent-browser pdf output.pdf # Save as PDF
+ # After logging in:
+ agent-browser state save ./auth.json
+ # In a future session:
+ agent-browser state load ./auth.json
+ agent-browser open https://app.example.com/dashboard
  ```
 
- ### Wait
+ See [references/authentication.md](references/authentication.md) for OAuth, 2FA, cookie-based auth, and token refresh patterns.
+
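Editorial sketch (not part of the diffed package): the authentication section warns that state files hold session tokens in plaintext and should be added to `.gitignore`. One way that step might look, assuming the state file is named `auth.json` as in the examples; `ignore_state_file` is a hypothetical helper, not an agent-browser command.

```bash
# Hypothetical helper: make sure a state file is listed in .gitignore,
# without adding a duplicate entry when it is already there.
ignore_state_file() {
  state_file="$1"
  gitignore="${2:-.gitignore}"   # optional second argument: which ignore file
  touch "$gitignore"
  # -x matches the whole line, -F treats the name as a fixed string
  grep -qxF -- "$state_file" "$gitignore" || printf '%s\n' "$state_file" >> "$gitignore"
}

# Demonstrated in a scratch directory so no real repository is touched.
scratch="$(mktemp -d)"
ignore_state_file auth.json "$scratch/.gitignore"
ignore_state_file auth.json "$scratch/.gitignore"   # second call changes nothing
cat "$scratch/.gitignore"
```

Deleting the state file once it is no longer needed is then a plain `rm ./auth.json`.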
+ ## Essential Commands
 
  ```bash
- agent-browser wait @e1 # Wait for element
- agent-browser wait 2000 # Wait milliseconds
- agent-browser wait "text" # Wait for text to appear
+ # Navigation
+ agent-browser open <url> # Navigate (aliases: goto, navigate)
+ agent-browser close # Close browser
+
+ # Snapshot
+ agent-browser snapshot -i # Interactive elements with refs (recommended)
+ agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
+ agent-browser snapshot -s "#selector" # Scope to CSS selector
+
+ # Interaction (use @refs from snapshot)
+ agent-browser click @e1 # Click element
+ agent-browser click @e1 --new-tab # Click and open in new tab
+ agent-browser fill @e2 "text" # Clear and type text
+ agent-browser type @e2 "text" # Type without clearing
+ agent-browser select @e1 "option" # Select dropdown option
+ agent-browser check @e1 # Check checkbox
+ agent-browser press Enter # Press key
+ agent-browser keyboard type "text" # Type at current focus (no selector)
+ agent-browser keyboard inserttext "text" # Insert without key events
+ agent-browser scroll down 500 # Scroll page
+ agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
+
+ # Get information
+ agent-browser get text @e1 # Get element text
+ agent-browser get url # Get current URL
+ agent-browser get title # Get page title
+ agent-browser get cdp-url # Get CDP WebSocket URL
+
+ # Wait
+ agent-browser wait @e1 # Wait for element
+ agent-browser wait --load networkidle # Wait for network idle
+ agent-browser wait --url "**/page" # Wait for URL pattern
+ agent-browser wait 2000 # Wait milliseconds
+ agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
+ agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
+ agent-browser wait "#spinner" --state hidden # Wait for element to disappear
+
+ # Downloads
+ agent-browser download @e1 ./file.pdf # Click element to trigger download
+ agent-browser wait --download ./output.zip # Wait for any download to complete
+ agent-browser --download-path ./downloads open <url> # Set default download directory
+
+ # Viewport & Device Emulation
+ agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
+ agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
+ agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
+
+ # Capture
+ agent-browser screenshot # Screenshot to temp dir
+ agent-browser screenshot --full # Full page screenshot
+ agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
+ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
+ agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
+ agent-browser pdf output.pdf # Save as PDF
+
+ # Clipboard
+ agent-browser clipboard read # Read text from clipboard
+ agent-browser clipboard write "Hello, World!" # Write text to clipboard
+ agent-browser clipboard copy # Copy current selection
+ agent-browser clipboard paste # Paste from clipboard
+
+ # Diff (compare page states)
+ agent-browser diff snapshot # Compare current vs last snapshot
+ agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
+ agent-browser diff screenshot --baseline before.png # Visual pixel diff
+ agent-browser diff url <url1> <url2> # Compare two pages
+ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
+ agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
  ```
 
- ## Semantic Locators (Alternative to Refs)
+ ## Common Patterns
+
+ ### Form Submission
 
  ```bash
- agent-browser find role button click --name "Submit"
- agent-browser find text "Sign up" click
- agent-browser find label "Email" fill "user@example.com"
- agent-browser find placeholder "Search..." fill "query"
+ agent-browser open https://example.com/signup
+ agent-browser snapshot -i
+ agent-browser fill @e1 "Jane Doe"
+ agent-browser fill @e2 "jane@example.com"
+ agent-browser select @e3 "California"
+ agent-browser check @e4
+ agent-browser click @e5
+ agent-browser wait --load networkidle
  ```
 
- ## Sessions (Parallel Browsers)
+ ### Authentication with Auth Vault (Recommended)
 
  ```bash
- # Run multiple independent browser sessions
- agent-browser --session browser1 open https://site1.com
- agent-browser --session browser2 open https://site2.com
+ # Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
+ # Recommended: pipe password via stdin to avoid shell history exposure
+ echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
 
- # List active sessions
- agent-browser session list
- ```
+ # Login using saved profile (LLM never sees password)
+ agent-browser auth login github
 
- ## Examples
+ # List/show/delete profiles
+ agent-browser auth list
+ agent-browser auth show github
+ agent-browser auth delete github
+ ```
 
- ### Login Flow
+ ### Authentication with State Persistence
 
  ```bash
+ # Login once and save state
  agent-browser open https://app.example.com/login
  agent-browser snapshot -i
- # Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
- agent-browser fill @e1 "user@example.com"
- agent-browser fill @e2 "password123"
+ agent-browser fill @e1 "$USERNAME"
+ agent-browser fill @e2 "$PASSWORD"
  agent-browser click @e3
- agent-browser wait 2000
- agent-browser snapshot -i # Verify logged in
+ agent-browser wait --url "**/dashboard"
+ agent-browser state save auth.json
+
+ # Reuse in future sessions
+ agent-browser state load auth.json
+ agent-browser open https://app.example.com/dashboard
  ```
 
- ### Search and Extract
+ ### Session Persistence
 
  ```bash
- agent-browser open https://news.ycombinator.com
- agent-browser snapshot -i --json
- # Parse JSON to find story links
- agent-browser get text @e12 # Get headline text
- agent-browser click @e12 # Click to open story
+ # Auto-save/restore cookies and localStorage across browser restarts
+ agent-browser --session-name myapp open https://app.example.com/login
+ # ... login flow ...
+ agent-browser close # State auto-saved to ~/.agent-browser/sessions/
+
+ # Next time, state is auto-loaded
+ agent-browser --session-name myapp open https://app.example.com/dashboard
+
+ # Encrypt state at rest
+ export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
+ agent-browser --session-name secure open https://app.example.com
+
+ # Manage saved states
+ agent-browser state list
+ agent-browser state show myapp-default.json
+ agent-browser state clear myapp
+ agent-browser state clean --older-than 7
  ```
 
- ### Form Filling
+ ### Data Extraction
 
  ```bash
- agent-browser open https://forms.example.com
+ agent-browser open https://example.com/products
  agent-browser snapshot -i
- agent-browser fill @e1 "John Doe"
- agent-browser fill @e2 "john@example.com"
- agent-browser select @e3 "United States"
- agent-browser check @e4 # Agree to terms
- agent-browser click @e5 # Submit button
- agent-browser screenshot confirmation.png
+ agent-browser get text @e5 # Get specific element text
+ agent-browser get text body > page.txt # Get all page text
+
+ # JSON output for parsing
+ agent-browser snapshot -i --json
+ agent-browser get text @e1 --json
  ```
 
- ### Debug Mode
+ ### Parallel Sessions
+
+ ```bash
+ agent-browser --session site1 open https://site-a.com
+ agent-browser --session site2 open https://site-b.com
+
+ agent-browser --session site1 snapshot -i
+ agent-browser --session site2 snapshot -i
+
+ agent-browser session list
+ ```
+
+ ### Connect to Existing Chrome
+
+ ```bash
+ # Auto-discover running Chrome with remote debugging enabled
+ agent-browser --auto-connect open https://example.com
+ agent-browser --auto-connect snapshot
+
+ # Or with explicit CDP port
+ agent-browser --cdp 9222 snapshot
+ ```
+
+ ### Color Scheme (Dark Mode)
+
+ ```bash
+ # Persistent dark mode via flag (applies to all pages and new tabs)
+ agent-browser --color-scheme dark open https://example.com
+
+ # Or via environment variable
+ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
+
+ # Or set during session (persists for subsequent commands)
+ agent-browser set media dark
+ ```
+
+ ### Viewport & Responsive Testing
+
+ ```bash
+ # Set a custom viewport size (default is 1280x720)
+ agent-browser set viewport 1920 1080
+ agent-browser screenshot desktop.png
+
+ # Test mobile-width layout
+ agent-browser set viewport 375 812
+ agent-browser screenshot mobile.png
+
+ # Retina/HiDPI: same CSS layout at 2x pixel density
+ # Screenshots stay at logical viewport size, but content renders at higher DPI
+ agent-browser set viewport 1920 1080 2
+ agent-browser screenshot retina.png
+
+ # Device emulation (sets viewport + user agent in one step)
+ agent-browser set device "iPhone 14"
+ agent-browser screenshot device.png
+ ```
+
+ The `scale` parameter (3rd argument) sets `window.devicePixelRatio` without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
+
+ ### Visual Browser (Debugging)
 
  ```bash
- # Run with visible browser window
  agent-browser --headed open https://example.com
- agent-browser --headed snapshot -i
- agent-browser --headed click @e1
+ agent-browser highlight @e1 # Highlight element
+ agent-browser inspect # Open Chrome DevTools for the active page
+ agent-browser record start demo.webm # Record session
+ agent-browser profiler start # Start Chrome DevTools profiling
+ agent-browser profiler stop trace.json # Stop and save profile (path optional)
  ```
 
- ## JSON Output
+ Use `AGENT_BROWSER_HEADED=1` to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
 
- Add `--json` for structured output:
+ ### Local Files (PDFs, HTML)
 
  ```bash
- agent-browser snapshot -i --json
+ # Open local files with file:// URLs
+ agent-browser --allow-file-access open file:///path/to/document.pdf
+ agent-browser --allow-file-access open file:///path/to/page.html
+ agent-browser screenshot output.png
  ```
 
- Returns:
+ ### iOS Simulator (Mobile Safari)
+
+ ```bash
+ # List available iOS simulators
+ agent-browser device list
+
+ # Launch Safari on a specific device
+ agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+
+ # Same workflow as desktop - snapshot, interact, re-snapshot
+ agent-browser -p ios snapshot -i
+ agent-browser -p ios tap @e1 # Tap (alias for click)
+ agent-browser -p ios fill @e2 "text"
+ agent-browser -p ios swipe up # Mobile-specific gesture
+
+ # Take screenshot
+ agent-browser -p ios screenshot mobile.png
+
+ # Close session (shuts down simulator)
+ agent-browser -p ios close
+ ```
+
+ **Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
+
+ **Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
+
+ ## Security
+
+ All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
+
+ ### Content Boundaries (Recommended for AI Agents)
+
+ Enable `--content-boundaries` to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
+
+ ```bash
+ export AGENT_BROWSER_CONTENT_BOUNDARIES=1
+ agent-browser snapshot
+ # Output:
+ # --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
+ # [accessibility tree]
+ # --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---
+ ```
+
+ ### Domain Allowlist
+
+ Restrict navigation to trusted domains. Wildcards like `*.example.com` also match the bare domain `example.com`. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:
+
+ ```bash
+ export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
+ agent-browser open https://example.com # OK
+ agent-browser open https://malicious.com # Blocked
+ ```
+
+ ### Action Policy
+
+ Use a policy file to gate destructive actions:
+
+ ```bash
+ export AGENT_BROWSER_ACTION_POLICY=./policy.json
+ ```
+
+ Example `policy.json`:
+
  ```json
- {
- "success": true,
- "data": {
- "refs": {
- "e1": {"name": "Submit", "role": "button"},
- "e2": {"name": "Email", "role": "textbox"}
- },
- "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
- }
- }
+ { "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
+ ```
+
+ Auth vault operations (`auth login`, etc.) bypass action policy but domain allowlist still applies.
+
+ ### Output Limits
+
+ Prevent context flooding from large pages:
+
+ ```bash
+ export AGENT_BROWSER_MAX_OUTPUT=50000
  ```
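Editorial sketch (not part of the diffed package): since every security feature in this section is opt-in, the settings can be grouped into one shell function and applied together. `harden_agent_browser` is a hypothetical name, the domain list and policy path are placeholders, and only the environment variables named in the section are used.

```bash
# Hypothetical wrapper: export the opt-in security settings from this section.
# $1: comma-separated domain allowlist, $2: path to an action policy file.
harden_agent_browser() {
  export AGENT_BROWSER_CONTENT_BOUNDARIES=1
  export AGENT_BROWSER_ALLOWED_DOMAINS="$1"
  export AGENT_BROWSER_ACTION_POLICY="$2"
  export AGENT_BROWSER_MAX_OUTPUT=50000   # same limit as the Output Limits example
}

harden_agent_browser "example.com,*.example.com" ./policy.json

# Show what later agent-browser commands in this shell would inherit.
env | grep '^AGENT_BROWSER_' | sort
```

Calling the function once at the top of an automation script keeps the allowlist and policy in one reviewable place.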
 
- ## Inspection & Debugging
+ ## Diffing (Verifying Changes)
 
- ### JavaScript Evaluation
+ Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
 
  ```bash
- agent-browser eval "document.title" # Evaluate JS expression
- agent-browser eval "JSON.stringify(localStorage)" # Return serialized data
- agent-browser eval "document.querySelectorAll('a').length" # Count elements
+ # Typical workflow: snapshot -> action -> diff
+ agent-browser snapshot -i # Take baseline snapshot
+ agent-browser click @e2 # Perform action
+ agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
+ ```
+
+ For visual regression testing or monitoring:
+
+ ```bash
+ # Save a baseline screenshot, then compare later
+ agent-browser screenshot baseline.png
+ # ... time passes or changes are made ...
+ agent-browser diff screenshot --baseline baseline.png
+
+ # Compare staging vs production
+ agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
+ ```
+
+ `diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
+
+ ## Timeouts and Slow Pages
+
+ The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
+
+ ```bash
+ # Wait for network activity to settle (best for slow pages)
+ agent-browser wait --load networkidle
+
+ # Wait for a specific element to appear
+ agent-browser wait "#content"
+ agent-browser wait @e1
+
+ # Wait for a specific URL pattern (useful after redirects)
+ agent-browser wait --url "**/dashboard"
+
+ # Wait for a JavaScript condition
+ agent-browser wait --fn "document.readyState === 'complete'"
+
+ # Wait a fixed duration (milliseconds) as a last resort
+ agent-browser wait 5000
  ```
 
- ### Console & Errors
+ When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
+
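Editorial sketch (not part of the diffed package): the default timeout is described in seconds while `AGENT_BROWSER_DEFAULT_TIMEOUT` takes milliseconds, so a unit slip is easy to make. The conversion might be wrapped like this; `seconds_to_ms` is a hypothetical helper and 60 seconds is an arbitrary example value.

```bash
# Hypothetical helper: convert whole seconds to the millisecond value
# that AGENT_BROWSER_DEFAULT_TIMEOUT expects.
seconds_to_ms() {
  echo "$(( $1 * 1000 ))"
}

# Raise the timeout from the 25-second default to 60 seconds (example value).
export AGENT_BROWSER_DEFAULT_TIMEOUT="$(seconds_to_ms 60)"
echo "$AGENT_BROWSER_DEFAULT_TIMEOUT"   # prints 60000
```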
+ ## Session Management and Cleanup
+
+ When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
 
  ```bash
- agent-browser console # Show browser console output
- agent-browser console --clear # Show and clear console
- agent-browser errors # Show JavaScript errors only
- agent-browser errors --clear # Show and clear errors
+ # Each agent gets its own isolated session
+ agent-browser --session agent1 open site-a.com
+ agent-browser --session agent2 open site-b.com
+
+ # Check active sessions
+ agent-browser session list
  ```
 
- ## Network
+ Always close your browser session when done to avoid leaked processes:
 
  ```bash
- agent-browser network requests # List captured requests
- agent-browser network requests --filter "api" # Filter by URL pattern
- agent-browser route "**/*.png" abort # Block matching requests
- agent-browser route "https://api.example.com/*" fulfill --status 200 --body '{"mock":true}' # Mock response
- agent-browser unroute "**/*.png" # Remove route handler
+ agent-browser close # Close default session
+ agent-browser --session agent1 close # Close specific session
  ```
 
- ## Storage
+ If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
 
- ### Cookies
+ To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
 
  ```bash
- agent-browser cookies get # Get all cookies
- agent-browser cookies get --name "session" # Get specific cookie
- agent-browser cookies set --name "token" --value "abc" # Set cookie
- agent-browser cookies clear # Clear all cookies
+ AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
  ```
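Editorial sketch (not part of the diffed package): the advice to always close a named session can be enforced with a shell `trap`, so the close command runs even when a step fails. `run_in_session` and the `AB` variable are hypothetical; `AB` defaults to the real CLI and can be pointed at a stub such as `echo` for a dry run. Only the `--session` flag and `close` command from this section are used.

```bash
# AB is a hypothetical variable naming the command to run; it defaults to the
# real CLI and can be replaced with a stub (for example AB=echo) for a dry run.
AB="${AB:-agent-browser}"

# Run a task and close the named session when the task ends,
# whether it succeeded or failed.
# $1: session name, remaining arguments: the command to run.
run_in_session() {
  (
    session="$1"; shift
    # The EXIT trap fires when this subshell ends, for any reason.
    trap '$AB --session "$session" close' EXIT
    "$@"
  )
}

# Dry run: substitute echo for the CLI to see which cleanup command would run.
(AB=echo; run_in_session agent1 echo "task for agent1 runs here")
```

In real use the task would itself be a script or function that passes `--session agent1` to each agent-browser call.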
244
508
 
245
- ### Local & Session Storage
509
+ ## Ref Lifecycle (Important)
510
+
511
+ Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
512
+
513
+ - Clicking links or buttons that navigate
514
+ - Form submissions
515
+ - Dynamic content loading (dropdowns, modals)
246
516
 
247
517
  ```bash
248
- agent-browser storage local # Get all localStorage
249
- agent-browser storage local --key "theme" # Get specific key
250
- agent-browser storage session # Get all sessionStorage
251
- agent-browser storage session --key "cart" # Get specific key
518
+ agent-browser click @e5 # Navigates to new page
519
+ agent-browser snapshot -i # MUST re-snapshot
520
+ agent-browser click @e1 # Use new refs
252
521
  ```
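Editorial sketch (not part of the diffed package): because refs go stale after any click that changes the page, the click, wait, and re-snapshot steps can be bundled so the fresh snapshot is never forgotten. `click_and_resnapshot` and the `AB` variable are hypothetical; the three commands chained inside are the ones shown in the Core Workflow example.

```bash
# AB is a hypothetical variable naming the command to run; it defaults to the
# real CLI and can be replaced with a stub (for example AB=echo) for a dry run.
AB="${AB:-agent-browser}"

# Click a ref, wait for the page to settle, then take a fresh snapshot.
# The && chain stops early if the click or the wait fails.
click_and_resnapshot() {
  $AB click "$1" && $AB wait --load networkidle && $AB snapshot -i
}

# Dry run: print the three commands instead of driving a browser.
(AB=echo; click_and_resnapshot @e5)
```

The refs printed by the final snapshot are the only ones that should be used for the next interaction.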
 
- ## Device & Settings
+ ## Annotated Screenshots (Vision Mode)
+
+ Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
 
  ```bash
- agent-browser set viewport 1920 1080 # Set viewport size
- agent-browser set device "iPhone 14" # Emulate device
- agent-browser set geo --lat 47.6 --lon -122.3 # Set geolocation
- agent-browser set offline true # Enable offline mode
- agent-browser set offline false # Disable offline mode
- agent-browser set media "prefers-color-scheme" "dark" # Set media feature
- agent-browser set headers '{"X-Custom":"value"}' # Set extra HTTP headers
- agent-browser set credentials "user" "pass" # Set HTTP auth credentials
+ agent-browser screenshot --annotate
+ # Output includes the image path and a legend:
+ # [1] @e1 button "Submit"
+ # [2] @e2 link "Home"
+ # [3] @e3 textbox "Email"
+ agent-browser click @e2 # Click using ref from annotated screenshot
  ```
 
- ## Element Debugging
+ Use annotated screenshots when:
+
+ - The page has unlabeled icon buttons or visual-only elements
+ - You need to verify visual layout or styling
+ - Canvas or chart elements are present (invisible to text snapshots)
+ - You need spatial reasoning about element positions
+
+ ## Semantic Locators (Alternative to Refs)
+
+ When refs are unavailable or unreliable, use semantic locators:
 
  ```bash
- agent-browser highlight @e1 # Highlight element visually
- agent-browser get box @e1 # Get bounding box (x, y, width, height)
- agent-browser get styles @e1 # Get computed styles
- agent-browser is visible @e1 # Check if element is visible
- agent-browser is enabled @e1 # Check if element is enabled
- agent-browser is checked @e1 # Check if checkbox/radio is checked
+ agent-browser find text "Sign In" click
+ agent-browser find label "Email" fill "user@test.com"
+ agent-browser find role button click --name "Submit"
+ agent-browser find placeholder "Search" type "query"
+ agent-browser find testid "submit-btn" click
  ```
 
- ## Recording & Tracing
+ ## JavaScript Evaluation (eval)
+
+ Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
 
  ```bash
- agent-browser trace start # Start recording trace
- agent-browser trace stop trace.zip # Stop and save trace file
- agent-browser record start # Start recording video
- agent-browser record stop video.webm # Stop and save recording
+ # Simple expressions work with regular quoting
+ agent-browser eval 'document.title'
+ agent-browser eval 'document.querySelectorAll("img").length'
+
+ # Complex JS: use --stdin with heredoc (RECOMMENDED)
+ agent-browser eval --stdin <<'EVALEOF'
+ JSON.stringify(
+ Array.from(document.querySelectorAll("img"))
+ .filter(i => !i.alt)
+ .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
+ )
+ EVALEOF
+
+ # Alternative: base64 encoding (avoids all shell escaping issues)
+ agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
+ ```
+
+ **Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
+
+ **Rules of thumb:**
+
+ - Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
+ - Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
+ - Programmatic/generated scripts -> use `eval -b` with base64
+
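Editorial sketch (not part of the diffed package): for the base64 route in the last rule of thumb, the encoding step can be checked on its own before anything is sent to the browser. `encode_js` is a hypothetical helper. It assumes a `base64` command whose `-d` flag decodes (older macOS releases use `-D` instead), and it strips the line wrapping that some implementations add to long output.

```bash
# Hypothetical helper: base64-encode a JavaScript snippet for `eval -b`.
# printf avoids the portability quirks of `echo -n`; tr removes any line
# breaks that base64 inserts into long output.
encode_js() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

js='Array.from(document.querySelectorAll("a")).map(a => a.href)'
encoded="$(encode_js "$js")"

# Round-trip check: decoding should give back the original snippet unchanged.
[ "$(printf '%s' "$encoded" | base64 -d)" = "$js" ] && echo "round trip ok"

# The encoded value would then be passed as: agent-browser eval -b "$encoded"
```

The round-trip check is a cheap way to confirm that quoting did not alter the script before it reaches `eval -b`.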
+ ## Configuration File
+
+ Create `agent-browser.json` in the project root for persistent settings:
+
+ ```json
+ {
+ "headed": true,
+ "proxy": "http://localhost:8080",
+ "profile": "./browser-data"
+ }
  ```
 
- ## Tabs & Windows
+ Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
+
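Editorial sketch (not part of the diffed package): the priority order for a single scalar setting can be pictured as "the last source that sets a value wins". The function below only illustrates that rule; it is not how agent-browser is implemented, `resolve_setting` is a hypothetical name, and it does not model the merge behaviour described for extensions.

```bash
# Illustration only: return the value from the highest-priority source that
# set one. Arguments are given lowest priority first; an empty string means
# that source did not set the value.
resolve_setting() {
  value=""
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      value="$candidate"
    fi
  done
  printf '%s\n' "$value"
}

# Order: user config, project config, env var, CLI flag.
# The user config says headed=true, the CLI flag says false: the flag wins.
resolve_setting "true" "" "" "false"   # prints false
```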
+ ## Browser Engine Selection
+
+ Use `--engine` to choose a local browser engine. The default is `chrome`.
 
  ```bash
- agent-browser tab list # List open tabs
- agent-browser tab new https://example.com # Open URL in new tab
- agent-browser tab close # Close current tab
- agent-browser tab 2 # Switch to tab by index
+ # Use Lightpanda (fast headless browser, requires separate install)
+ agent-browser --engine lightpanda open example.com
+
+ # Via environment variable
+ export AGENT_BROWSER_ENGINE=lightpanda
+ agent-browser open example.com
+
+ # With custom binary path
+ agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
  ```
 
- ## Advanced Mouse
+ Supported engines:
+ - `chrome` (default) -- Chrome/Chromium via CDP
+ - `lightpanda` -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
+
+ Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
+
+ ## Deep-Dive Documentation
+
+ | Reference | When to Use |
+ | -------------------------------------------------------------------- | --------------------------------------------------------- |
+ | [references/commands.md](references/commands.md) | Full command reference with all options |
+ | [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
+ | [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
+ | [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
+ | [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
+ | [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
+ | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
+
+ ## Ready-to-Use Templates
+
+ | Template | Description |
+ | ------------------------------------------------------------------------ | ----------------------------------- |
+ | [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
+ | [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
+ | [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
 
  ```bash
- agent-browser mouse move 100 200 # Move mouse to coordinates
- agent-browser mouse down # Press mouse button
- agent-browser mouse up # Release mouse button
- agent-browser mouse wheel 0 500 # Scroll (deltaX, deltaY)
- agent-browser drag @e1 @e2 # Drag from element to element
+ ./templates/form-automation.sh https://example.com/form
+ ./templates/authenticated-session.sh https://app.example.com/login
+ ./templates/capture-workflow.sh https://example.com ./output
  ```
 
  ## vs Playwright MCP