agent-browser-stealth 0.17.0-fork.2 → 0.24.0-fork.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (122)
  1. package/README.md +1256 -240
  2. package/bin/agent-browser-darwin-arm64 +0 -0
  3. package/bin/agent-browser-darwin-x64 +0 -0
  4. package/bin/agent-browser-linux-arm64 +0 -0
  5. package/bin/agent-browser-linux-x64 +0 -0
  6. package/bin/agent-browser-win32-x64.exe +0 -0
  7. package/bin/agent-browser.js +13 -2
  8. package/extensions/tab-group-cdp/content-script.js +425 -0
  9. package/extensions/tab-group-cdp/icons/icon.svg +7 -0
  10. package/extensions/tab-group-cdp/manifest.json +34 -0
  11. package/extensions/tab-group-cdp/page-bridge.js +133 -0
  12. package/extensions/tab-group-cdp/service-worker.js +2249 -0
  13. package/extensions/tab-group-cdp/sidepanel.css +258 -0
  14. package/extensions/tab-group-cdp/sidepanel.html +28 -0
  15. package/extensions/tab-group-cdp/sidepanel.js +1225 -0
  16. package/package.json +17 -69
  17. package/scripts/build-all-platforms.sh +6 -0
  18. package/scripts/check-version-sync.js +14 -2
  19. package/scripts/copy-native.js +8 -50
  20. package/scripts/postinstall.js +149 -165
  21. package/scripts/windows-debug/provision.sh +220 -0
  22. package/scripts/windows-debug/run.sh +92 -0
  23. package/scripts/windows-debug/start.sh +43 -0
  24. package/scripts/windows-debug/stop.sh +28 -0
  25. package/scripts/windows-debug/sync.sh +27 -0
  26. package/skills/agent-browser/SKILL.md +256 -159
  27. package/skills/agent-browser/references/authentication.md +101 -0
  28. package/skills/agent-browser/references/commands.md +34 -2
  29. package/skills/agent-browser/references/snapshot-refs.md +25 -0
  30. package/skills/agentcore/SKILL.md +115 -0
  31. package/skills/dogfood/SKILL.md +4 -2
  32. package/skills/electron/SKILL.md +26 -2
  33. package/skills/slack/SKILL.md +0 -9
  34. package/skills/slack/references/slack-tasks.md +2 -8
  35. package/skills/vercel-sandbox/SKILL.md +280 -0
  36. package/bin/agent-browser-local +0 -0
  37. package/bin/agent-browser-stealth +0 -0
  38. package/bin/agent-browser-stealth.d +0 -1
  39. package/dist/action-policy.d.ts +0 -14
  40. package/dist/action-policy.d.ts.map +0 -1
  41. package/dist/action-policy.js +0 -253
  42. package/dist/action-policy.js.map +0 -1
  43. package/dist/actions.d.ts +0 -21
  44. package/dist/actions.d.ts.map +0 -1
  45. package/dist/actions.js +0 -2139
  46. package/dist/actions.js.map +0 -1
  47. package/dist/auth-cli.d.ts +0 -2
  48. package/dist/auth-cli.d.ts.map +0 -1
  49. package/dist/auth-cli.js +0 -97
  50. package/dist/auth-cli.js.map +0 -1
  51. package/dist/auth-vault.d.ts +0 -36
  52. package/dist/auth-vault.d.ts.map +0 -1
  53. package/dist/auth-vault.js +0 -125
  54. package/dist/auth-vault.js.map +0 -1
  55. package/dist/browser.d.ts +0 -665
  56. package/dist/browser.d.ts.map +0 -1
  57. package/dist/browser.js +0 -3210
  58. package/dist/browser.js.map +0 -1
  59. package/dist/confirmation.d.ts +0 -8
  60. package/dist/confirmation.d.ts.map +0 -1
  61. package/dist/confirmation.js +0 -30
  62. package/dist/confirmation.js.map +0 -1
  63. package/dist/daemon.d.ts +0 -78
  64. package/dist/daemon.d.ts.map +0 -1
  65. package/dist/daemon.js +0 -744
  66. package/dist/daemon.js.map +0 -1
  67. package/dist/diff.d.ts +0 -18
  68. package/dist/diff.d.ts.map +0 -1
  69. package/dist/diff.js +0 -271
  70. package/dist/diff.js.map +0 -1
  71. package/dist/domain-filter.d.ts +0 -28
  72. package/dist/domain-filter.d.ts.map +0 -1
  73. package/dist/domain-filter.js +0 -149
  74. package/dist/domain-filter.js.map +0 -1
  75. package/dist/encryption.d.ts +0 -73
  76. package/dist/encryption.d.ts.map +0 -1
  77. package/dist/encryption.js +0 -171
  78. package/dist/encryption.js.map +0 -1
  79. package/dist/ios-actions.d.ts +0 -11
  80. package/dist/ios-actions.d.ts.map +0 -1
  81. package/dist/ios-actions.js +0 -228
  82. package/dist/ios-actions.js.map +0 -1
  83. package/dist/ios-manager.d.ts +0 -266
  84. package/dist/ios-manager.d.ts.map +0 -1
  85. package/dist/ios-manager.js +0 -1073
  86. package/dist/ios-manager.js.map +0 -1
  87. package/dist/protocol.d.ts +0 -26
  88. package/dist/protocol.d.ts.map +0 -1
  89. package/dist/protocol.js +0 -990
  90. package/dist/protocol.js.map +0 -1
  91. package/dist/snapshot.d.ts +0 -67
  92. package/dist/snapshot.d.ts.map +0 -1
  93. package/dist/snapshot.js +0 -514
  94. package/dist/snapshot.js.map +0 -1
  95. package/dist/state-utils.d.ts +0 -77
  96. package/dist/state-utils.d.ts.map +0 -1
  97. package/dist/state-utils.js +0 -178
  98. package/dist/state-utils.js.map +0 -1
  99. package/dist/stealth.d.ts +0 -41
  100. package/dist/stealth.d.ts.map +0 -1
  101. package/dist/stealth.js +0 -1743
  102. package/dist/stealth.js.map +0 -1
  103. package/dist/stream-server.d.ts +0 -117
  104. package/dist/stream-server.d.ts.map +0 -1
  105. package/dist/stream-server.js +0 -309
  106. package/dist/stream-server.js.map +0 -1
  107. package/dist/types.d.ts +0 -973
  108. package/dist/types.d.ts.map +0 -1
  109. package/dist/types.js +0 -2
  110. package/dist/types.js.map +0 -1
  111. package/scripts/check-creepjs-headless.js +0 -137
  112. package/scripts/check-daemon-pid-recovery.js +0 -148
  113. package/scripts/check-sannysoft-webdriver.js +0 -112
  114. package/scripts/check-stealth-regression.js +0 -199
  115. package/scripts/check-turnstile-testkey.ts +0 -125
  116. package/scripts/clawhub-sync.sh +0 -27
  117. package/scripts/sync-upstream.sh +0 -142
  118. package/scripts/verify-bundled-binaries.js +0 -71
  119. package/scripts/verify-native-version.js +0 -48
  120. package/scripts/verify-packed-host-binary.js +0 -88
  121. package/scripts/verify-registry-host-binary.js +0 -120
  122. package/skills/agent-browser-stealth/SKILL.md +0 -127
@@ -1,14 +1,12 @@
1
1
  ---
2
2
  name: agent-browser
3
3
  description: Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
4
- allowed-tools: Bash(npx agent-browser-stealth:*), Bash(npx agent-browser:*), Bash(agent-browser:*), Bash(abs:*)
4
+ allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*)
5
5
  ---
6
6
 
7
7
  # Browser Automation with agent-browser
8
8
 
9
- Install package: `pnpm add -g agent-browser-stealth` (CLI commands: `agent-browser`, `agent-browser-stealth`, and short alias `abs`). If global install is unavailable in your environment, use `pnpm dlx agent-browser-stealth <command>` for one-off runs.
10
-
11
- Use `agent-browser start` when you want to pre-warm the managed automation browser on `localhost:9333` before the actual navigation or task commands run.
9
+ The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. Run `agent-browser upgrade` to update to the latest version.
12
10
 
13
11
  ## Core Workflow
14
12
 
@@ -48,31 +46,85 @@ agent-browser open https://example.com && agent-browser wait --load networkidle
48
46
 
49
47
  **When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
50
48
 
49
+ ## Handling Authentication
50
+
51
+ When automating a site that requires login, choose the approach that fits:
52
+
53
+ **Option 1: Import auth from the user's browser (fastest for one-off tasks)**
54
+
55
+ ```bash
56
+ # Connect to the user's running Chrome (they're already logged in)
57
+ agent-browser --auto-connect state save ./auth.json
58
+ # Use that auth state
59
+ agent-browser --state ./auth.json open https://app.example.com/dashboard
60
+ ```
61
+
62
+ State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.
63
+
64
+ **Option 2: Persistent profile (simplest for recurring tasks)**
65
+
66
+ ```bash
67
+ # First run: login manually or via automation
68
+ agent-browser --profile ~/.myapp open https://app.example.com/login
69
+ # ... fill credentials, submit ...
70
+
71
+ # All future runs: already authenticated
72
+ agent-browser --profile ~/.myapp open https://app.example.com/dashboard
73
+ ```
74
+
75
+ **Option 3: Session name (auto-save/restore cookies + localStorage)**
76
+
77
+ ```bash
78
+ agent-browser --session-name myapp open https://app.example.com/login
79
+ # ... login flow ...
80
+ agent-browser close # State auto-saved
81
+
82
+ # Next time: state auto-restored
83
+ agent-browser --session-name myapp open https://app.example.com/dashboard
84
+ ```
85
+
86
+ **Option 4: Auth vault (credentials stored encrypted, login by name)**
87
+
88
+ ```bash
89
+ echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
90
+ agent-browser auth login myapp
91
+ ```
92
+
93
+ `auth login` navigates with `load` and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
94
+
95
+ **Option 5: State file (manual save/load)**
96
+
97
+ ```bash
98
+ # After logging in:
99
+ agent-browser state save ./auth.json
100
+ # In a future session:
101
+ agent-browser state load ./auth.json
102
+ agent-browser open https://app.example.com/dashboard
103
+ ```
104
+
105
+ See [references/authentication.md](references/authentication.md) for OAuth, 2FA, cookie-based auth, and token refresh patterns.
106
+
51
107
  ## Essential Commands
52
108
 
53
109
  ```bash
54
110
  # Navigation
55
- agent-browser start # Start/reuse managed browser on localhost:9333
56
111
  agent-browser open <url> # Navigate (aliases: goto, navigate)
57
- agent-browser --risk-mode block open <url> # Block if verification/captcha interstitial is detected
58
- agent-browser doctor # Diagnose CDP + sourceURL + tab-group plugin health
59
112
  agent-browser close # Close browser
60
- agent-browser --version # Show CLI version (fork builds include upstream/fork)
113
+ agent-browser close --all # Close all active sessions
61
114
 
62
115
  # Snapshot
63
116
  agent-browser snapshot -i # Interactive elements with refs (recommended)
64
- agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
65
117
  agent-browser snapshot -s "#selector" # Scope to CSS selector
66
118
 
67
119
  # Interaction (use @refs from snapshot)
68
120
  agent-browser click @e1 # Click element
69
121
  agent-browser click @e1 --new-tab # Click and open in new tab
70
122
  agent-browser fill @e2 "text" # Clear and type text
71
- agent-browser type @e2 "text" --delay 120 # Type without clearing (human-like pacing)
123
+ agent-browser type @e2 "text" # Type without clearing
72
124
  agent-browser select @e1 "option" # Select dropdown option
73
125
  agent-browser check @e1 # Check checkbox
74
126
  agent-browser press Enter # Press key
75
- agent-browser keyboard type "text" --delay 90 # Type at current focus (no selector)
127
+ agent-browser keyboard type "text" # Type at current focus (no selector)
76
128
  agent-browser keyboard inserttext "text" # Insert without key events
77
129
  agent-browser scroll down 500 # Scroll page
78
130
  agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
@@ -81,26 +133,66 @@ agent-browser scroll down 500 --selector "div.content" # Scroll within a specif
81
133
  agent-browser get text @e1 # Get element text
82
134
  agent-browser get url # Get current URL
83
135
  agent-browser get title # Get page title
136
+ agent-browser get cdp-url # Get CDP WebSocket URL
84
137
 
85
138
  # Wait
86
139
  agent-browser wait @e1 # Wait for element
87
140
  agent-browser wait --load networkidle # Wait for network idle
88
141
  agent-browser wait --url "**/page" # Wait for URL pattern
89
142
  agent-browser wait 2000 # Wait milliseconds
90
- agent-browser wait 2000-5000 # Random wait between 2-5 seconds
143
+ agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
144
+ agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
145
+ agent-browser wait "#spinner" --state hidden # Wait for element to disappear
91
146
 
92
147
  # Downloads
93
148
  agent-browser download @e1 ./file.pdf # Click element to trigger download
94
149
  agent-browser wait --download ./output.zip # Wait for any download to complete
95
150
  agent-browser --download-path ./downloads open <url> # Set default download directory
96
- agent-browser --tab-group "My Agent Group" open <url> # Override default tab-group base title
151
+
152
+ # Network
153
+ agent-browser network requests # Inspect tracked requests
154
+ agent-browser network requests --type xhr,fetch # Filter by resource type
155
+ agent-browser network requests --method POST # Filter by HTTP method
156
+ agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
157
+ agent-browser network request <requestId> # View full request/response detail
158
+ agent-browser network route "**/api/*" --abort # Block matching requests
159
+ agent-browser network har start # Start HAR recording
160
+ agent-browser network har stop ./capture.har # Stop and save HAR file
161
+
162
+ # Viewport & Device Emulation
163
+ agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
164
+ agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
165
+ agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
97
166
 
98
167
  # Capture
99
168
  agent-browser screenshot # Screenshot to temp dir
100
169
  agent-browser screenshot --full # Full page screenshot
101
170
  agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
171
+ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
172
+ agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
102
173
  agent-browser pdf output.pdf # Save as PDF
103
174
 
175
+ # Live preview / streaming
176
+ agent-browser stream enable # Start runtime WebSocket streaming on an auto-selected port
177
+ agent-browser stream enable --port 9223 # Bind a specific localhost port
178
+ agent-browser stream status # Inspect enabled state, port, connection, and screencasting
179
+ agent-browser stream disable # Stop runtime streaming and remove the .stream metadata file
180
+
181
+ # Clipboard
182
+ agent-browser clipboard read # Read text from clipboard
183
+ agent-browser clipboard write "Hello, World!" # Write text to clipboard
184
+ agent-browser clipboard copy # Copy current selection
185
+ agent-browser clipboard paste # Paste from clipboard
186
+
187
+ # Dialogs (alert, confirm, prompt, beforeunload)
188
+ # By default, alert and beforeunload dialogs are auto-accepted so they never block the agent.
189
+ # confirm and prompt dialogs still require explicit handling.
190
+ # Use --no-auto-dialog (or AGENT_BROWSER_NO_AUTO_DIALOG=1) to disable automatic handling.
191
+ agent-browser dialog accept # Accept dialog
192
+ agent-browser dialog accept "my input" # Accept prompt dialog with text
193
+ agent-browser dialog dismiss # Dismiss/cancel dialog
194
+ agent-browser dialog status # Check if a dialog is currently open
195
+
104
196
  # Diff (compare page states)
105
197
  agent-browser diff snapshot # Compare current vs last snapshot
106
198
  agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
@@ -110,6 +202,28 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
110
202
  agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
111
203
  ```
112
204
 
205
+ ## Streaming
206
+
207
+ Every session automatically starts a WebSocket stream server on an OS-assigned port. Use `agent-browser stream status` to see the bound port and connection state. Use `stream disable` to tear it down, and `stream enable --port <port>` to re-enable on a specific port.
208
+
209
+ ## Batch Execution
210
+
211
+ Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
212
+
213
+ ```bash
214
+ echo '[
215
+ ["open", "https://example.com"],
216
+ ["snapshot", "-i"],
217
+ ["click", "@e1"],
218
+ ["screenshot", "result.png"]
219
+ ]' | agent-browser batch --json
220
+
221
+ # Stop on first error
222
+ agent-browser batch --bail < commands.json
223
+ ```
224
+
225
+ Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&&` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
226
+
113
227
  ## Common Patterns
114
228
 
115
229
  ### Form Submission
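Reviewer's sketch, not part of the published package: the hunk above describes the `batch` input as a JSON array of string arrays read from stdin. The snippet below shows one way such a payload might be generated from a list of URLs. `build_batch_json` and the `page-N.png` naming are hypothetical, and the helper assumes the URLs contain no double quotes or backslashes, since it does no JSON escaping.

```bash
#!/bin/sh
# Hypothetical helper (not an agent-browser command): prints a JSON array
# holding an ["open", URL] and a ["screenshot", FILE] entry for each URL.
# Assumption: URLs contain no double quotes or backslashes (no escaping is done).
build_batch_json() {
  printf '['
  n=0
  for url in "$@"; do
    # Separate entries with a comma, but not before the first one.
    if [ "$n" -gt 0 ]; then
      printf ','
    fi
    n=$((n + 1))
    printf '["open","%s"],["screenshot","page-%d.png"]' "$url" "$n"
  done
  printf ']\n'
}

# Usage (requires agent-browser on PATH):
#   build_batch_json https://example.com https://example.org | agent-browser batch --bail
```

Open-then-screenshot fits the stated guidance for `batch`, because neither step needs output from the one before it. A flow that must read snapshot refs first would still run as separate commands.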
@@ -141,6 +255,8 @@ agent-browser auth show github
141
255
  agent-browser auth delete github
142
256
  ```
143
257
 
258
+ `auth login` waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
259
+
144
260
  ### Authentication with State Persistence
145
261
 
146
262
  ```bash
@@ -158,25 +274,10 @@ agent-browser state load auth.json
158
274
  agent-browser open https://app.example.com/dashboard
159
275
  ```
160
276
 
161
- ### Cookie Injection for Auth Callbacks
162
-
163
- ```bash
164
- # Before navigation: set by URL
165
- agent-browser cookies set session_id "abc123" --url https://app.example.com/api/auth/sso/callback
166
-
167
- # Explicit domain/path pair (must be provided together)
168
- agent-browser cookies set auth_token "xyz789" --domain .example.com --path /api
169
-
170
- # Or navigate first and rely on current URL
171
- agent-browser open https://app.example.com/api/auth/sso/callback
172
- agent-browser cookies set callback_token "token123"
173
- ```
174
-
175
277
  ### Session Persistence
176
278
 
177
279
  ```bash
178
280
  # Auto-save/restore cookies and localStorage across browser restarts
179
- # If --session-name is omitted, it defaults to "default"
180
281
  agent-browser --session-name myapp open https://app.example.com/login
181
282
  # ... login flow ...
182
283
  agent-browser close # State auto-saved to ~/.agent-browser/sessions/
@@ -195,6 +296,30 @@ agent-browser state clear myapp
195
296
  agent-browser state clean --older-than 7
196
297
  ```
197
298
 
299
+ ### Working with Iframes
300
+
301
+ Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
302
+
303
+ ```bash
304
+ agent-browser open https://example.com/checkout
305
+ agent-browser snapshot -i
306
+ # @e1 [heading] "Checkout"
307
+ # @e2 [Iframe] "payment-frame"
308
+ # @e3 [input] "Card number"
309
+ # @e4 [input] "Expiry"
310
+ # @e5 [button] "Pay"
311
+
312
+ # Interact directly — no frame switch needed
313
+ agent-browser fill @e3 "4111111111111111"
314
+ agent-browser fill @e4 "12/28"
315
+ agent-browser click @e5
316
+
317
+ # To scope a snapshot to one iframe:
318
+ agent-browser frame @e2
319
+ agent-browser snapshot -i # Only iframe content
320
+ agent-browser frame main # Return to main frame
321
+ ```
322
+
198
323
  ### Data Extraction
199
324
 
200
325
  ```bash
@@ -208,59 +333,30 @@ agent-browser snapshot -i --json
208
333
  agent-browser get text @e1 --json
209
334
  ```
210
335
 
211
- ### Parallel Workflows
336
+ ### Parallel Sessions
212
337
 
213
338
  ```bash
214
- agent-browser --parallel site1 open https://site-a.com
215
- agent-browser --parallel site2 open https://site-b.com
339
+ agent-browser --session site1 open https://site-a.com
340
+ agent-browser --session site2 open https://site-b.com
216
341
 
217
- agent-browser --parallel site1 snapshot -i
218
- agent-browser --parallel site2 snapshot -i
219
- ```
342
+ agent-browser --session site1 snapshot -i
343
+ agent-browser --session site2 snapshot -i
220
344
 
221
- Use `--parallel <name>` for stateless throughput tasks (navigation, extraction, checks). For login/auth continuity, use `--session-name` instead.
222
- Default-session commands intentionally reap all non-default daemon sessions (`parallel-*` and legacy named channels) to prevent stale daemon reuse.
345
+ agent-browser session list
346
+ ```
223
347
 
224
348
  ### Connect to Existing Chrome
225
349
 
226
- By default in this fork, commands without `--cdp` use a dedicated automation browser with this order:
227
-
228
- 1. Try CDP at `localhost:9333`
229
- 2. If unavailable, auto-start Chrome with the persistent profile `~/.agent-browser/chrome-bot-profile`
230
- 3. If managed `:9333` startup fails, exit with guidance
231
-
232
350
  ```bash
233
- # Explicitly attach to an existing manual Chrome session
351
+ # Auto-discover running Chrome with remote debugging enabled
234
352
  agent-browser --auto-connect open https://example.com
235
353
  agent-browser --auto-connect snapshot
236
354
 
237
355
  # Or with explicit CDP port
238
356
  agent-browser --cdp 9222 snapshot
239
-
240
- # Debug auto-attach behavior
241
- agent-browser --debug snapshot
242
-
243
- # Diagnose CDP + sourceURL + plugin handshake status
244
- agent-browser doctor
245
-
246
- # Avoid `open` timeout on challenge-heavy pages
247
- agent-browser --wait-until domcontentloaded open https://example.com
248
-
249
- # Deterministic Turnstile smoke check (official test key)
250
- pnpm run check:turnstile-testkey
251
357
  ```
252
358
 
253
- ### Daemon Lifecycle
254
-
255
- Daemons auto-shutdown after 10 minutes of inactivity.
256
-
257
- Use `--resident` for long-running workflows that should not auto-close:
258
-
259
- ```bash
260
- agent-browser --resident open https://example.com
261
- # keep running until explicit close
262
- agent-browser close
263
- ```
359
+ Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
264
360
 
265
361
  ### Color Scheme (Dark Mode)
266
362
 
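Reviewer's sketch, not part of the published package: the Parallel Sessions hunk above replaces `--parallel <name>` with `--session <name>`. Assuming named sessions are isolated from one another, as the new text suggests, a wrapper could drive several sites at once with background jobs. `open_in_sessions` and the `AB_BIN` override are hypothetical; the `site1`, `site2` names follow the diff's own example.

```bash
#!/bin/sh
# AB_BIN is a hypothetical override so the sketch can be dry-run (AB_BIN=echo).
AB_BIN="${AB_BIN:-agent-browser}"

# Hypothetical helper: open each URL in its own named session (site1, site2, ...)
# as a background job, snapshot it, then wait for every job to finish.
open_in_sessions() {
  i=0
  for url in "$@"; do
    i=$((i + 1))
    (
      "$AB_BIN" --session "site$i" open "$url" &&
        "$AB_BIN" --session "site$i" snapshot -i
    ) &
  done
  wait
}

# Usage (requires agent-browser on PATH):
#   open_in_sessions https://site-a.com https://site-b.com
#   agent-browser session list
```

Output from the background jobs can interleave, so a real script would likely redirect each session's snapshot to its own file.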
@@ -275,51 +371,41 @@ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
275
371
  agent-browser set media dark
276
372
  ```
277
373
 
278
- ### Tab Grouping
374
+ ### Viewport & Responsive Testing
279
375
 
280
376
  ```bash
281
- # CDP mode groups tabs when tab-group extension is installed
282
- agent-browser open https://example.com
377
+ # Set a custom viewport size (default is 1280x720)
378
+ agent-browser set viewport 1920 1080
379
+ agent-browser screenshot desktop.png
283
380
 
284
- # Override the default group title
285
- agent-browser --tab-group "My Agent Group" open https://example.com
381
+ # Test mobile-width layout
382
+ agent-browser set viewport 375 812
383
+ agent-browser screenshot mobile.png
286
384
 
287
- # Or via environment variable
288
- AGENT_BROWSER_TAB_GROUP="My Agent Group" agent-browser open https://example.com
289
-
290
- # Override expected extension ID if needed
291
- AGENT_BROWSER_TAB_GROUP_PLUGIN_ID="<extension-id>" agent-browser open https://example.com
292
- ```
293
-
294
- Notes:
295
-
296
- - Works in CDP mode via extension handshake.
297
- - Extension package name in Chrome: `agent-browser-stealth`.
298
- - Extension installed and reachable: tabs are grouped by session.
299
- - Extension missing/unavailable: silent no-op (no warning/error unless debug mode).
300
- - Default titles:
301
- - `default` session: `Agent Browser Stealth`
302
- - non-default session: `Agent Browser Stealth • <session>`
303
- - Additional extension-side capabilities:
304
- - Session window isolation + deterministic group colors.
305
- - Side panel browser controls: `open` / `back` / `forward` / `reload` + selector-based `click` / `fill` / `press`.
306
- - Side panel developer signals: page console events, fetch/xhr network events, command history, and on-demand DOM snapshots.
307
- - Side panel automation: action recording, workflow run, slash shortcut binding, and scheduled execution (daily/weekly/monthly/yearly).
308
- - Side panel operations: Focus / Keep Only This / Clean Empty Groups + isolation/auto-clean toggles.
309
- - Session allowlist policy editing and fallback blocking (`about:blank`).
310
- - Download auto-routing to `agent-browser-stealth/<session>/...`.
385
+ # Retina/HiDPI: same CSS layout at 2x pixel density
386
+ # Screenshots stay at logical viewport size, but content renders at higher DPI
387
+ agent-browser set viewport 1920 1080 2
388
+ agent-browser screenshot retina.png
389
+
390
+ # Device emulation (sets viewport + user agent in one step)
391
+ agent-browser set device "iPhone 14"
392
+ agent-browser screenshot device.png
393
+ ```
394
+
395
+ The `scale` parameter (3rd argument) sets `window.devicePixelRatio` without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
311
396
 
312
397
  ### Visual Browser (Debugging)
313
398
 
314
399
  ```bash
315
400
  agent-browser --headed open https://example.com
316
401
  agent-browser highlight @e1 # Highlight element
402
+ agent-browser inspect # Open Chrome DevTools for the active page
317
403
  agent-browser record start demo.webm # Record session
318
404
  agent-browser profiler start # Start Chrome DevTools profiling
319
405
  agent-browser profiler stop trace.json # Stop and save profile (path optional)
320
406
  ```
321
407
 
322
- Use `AGENT_BROWSER_HEADED=1` or `AGENT_BROWSER_HEADED=true` to enable headed mode via environment variable. In this fork, local launches and extension launches stay headed by default unless headless is explicitly requested.
408
+ Use `AGENT_BROWSER_HEADED=1` to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
323
409
 
324
410
  ### Local Files (PDFs, HTML)
325
411
 
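Reviewer's sketch, not part of the published package: the Viewport & Responsive Testing hunk above repeats a `set viewport` plus `screenshot` pair for each size. A small loop can express the same pattern. `capture_viewports`, the `NAME:WIDTHxHEIGHT` spec format, and the `AB_BIN` override are hypothetical conveniences; only `set viewport <w> <h>` and `screenshot <file>` come from the diff.

```bash
#!/bin/sh
# AB_BIN is a hypothetical override so the sketch can be dry-run (AB_BIN=echo).
AB_BIN="${AB_BIN:-agent-browser}"

# Hypothetical helper: for each NAME:WIDTHxHEIGHT spec, set the viewport and
# save a screenshot as NAME.png. Stops at the first failing command.
capture_viewports() {
  for spec in "$@"; do
    name=${spec%%:*}    # text before the first ':'
    size=${spec#*:}     # text after the first ':'
    width=${size%x*}    # text before the 'x'
    height=${size#*x}   # text after the 'x'
    "$AB_BIN" set viewport "$width" "$height" &&
      "$AB_BIN" screenshot "$name.png" || return 1
  done
}

# Usage (requires agent-browser on PATH and an open page):
#   capture_viewports desktop:1920x1080 mobile:375x812
```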
@@ -330,42 +416,6 @@ agent-browser --allow-file-access open file:///path/to/page.html
330
416
  agent-browser screenshot output.png
331
417
  ```
332
418
 
333
- ### Project Policy
334
-
335
- - `--profile` / `AGENT_BROWSER_PROFILE` are forbidden
336
- - `--channel` / `AGENT_BROWSER_CHANNEL` are forbidden
337
- - Default mode uses the managed CDP browser on `localhost:9333`; use `--auto-connect` only for explicit existing-browser attachment
338
-
339
- ### Stealth Mode (Always On)
340
-
341
- Stealth is always active -- no flags needed. All sessions automatically apply anti-detection patches (navigator.webdriver removal, UA override, plugin injection, WebGL masking, humanized interactions, etc.).
342
-
343
- Chromium launches in managed mode use Chrome channel by default for a genuine browser binary fingerprint.
344
-
345
- For best results against strong bot detection, use `--headed` and keep one stable `--session-name`.
346
-
347
- ### Auto Region Detection
348
-
349
- The browser automatically detects the target site's region from the URL TLD and sets matching locale, timezone, and Accept-Language headers. For example, navigating to `shopee.tw` sets locale `zh-TW` and timezone `Asia/Taipei`. This reduces server-side risk scoring from region-signal mismatches.
350
-
351
- Override: `AGENT_BROWSER_LOCALE`, `AGENT_BROWSER_TIMEZONE` env vars.
352
-
353
- ### Captcha Detection & Auto-Retry
354
-
355
- When a navigation lands on a captcha/verification page, behavior is controlled by `--risk-mode` (or `AGENT_BROWSER_RISK_MODE`):
356
-
357
- - `warn` (default): wait for auto-clear first, then retry up to 2 times with randomized backoff (3-7s), then return warning plus structured `riskSignals`
358
- - `block`: fail fast once a risk interstitial is detected
359
- - `off`: disable this detection/retry path
360
-
361
- Examples:
362
-
363
- ```bash
364
- agent-browser --risk-mode warn open https://example.com
365
- agent-browser --risk-mode block open https://example.com
366
- AGENT_BROWSER_RISK_MODE=off agent-browser open https://example.com
367
- ```
368
-
369
419
  ### iOS Simulator (Mobile Safari)
370
420
 
371
421
  ```bash
@@ -470,7 +520,7 @@ agent-browser diff url https://staging.example.com https://prod.example.com --sc
470
520
 
471
521
  ## Timeouts and Slow Pages
472
522
 
473
- The default Playwright timeout is 25 seconds for local browsers. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
523
+ The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
474
524
 
475
525
  ```bash
476
526
  # Wait for network activity to settle (best for slow pages)
@@ -488,35 +538,58 @@ agent-browser wait --fn "document.readyState === 'complete'"
488
538
 
489
539
  # Wait a fixed duration (milliseconds) as a last resort
490
540
  agent-browser wait 5000
491
-
492
- # Random wait between 2-5 seconds (useful for anti-detection)
493
- agent-browser wait 2000-5000
494
541
  ```
495
542
 
496
543
  When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
497
544
 
498
- ### Humanized Interactions
545
+ ## JavaScript Dialogs (alert / confirm / prompt)
546
+
547
+ When a page opens a JavaScript dialog (`alert()`, `confirm()`, or `prompt()`), it blocks all other browser commands (snapshot, screenshot, click, etc.) until the dialog is dismissed. If commands start timing out unexpectedly, check for a pending dialog:
548
+
549
+ ```bash
550
+ # Check if a dialog is blocking
551
+ agent-browser dialog status
552
+
553
+ # Accept the dialog (dismiss the alert / click OK)
554
+ agent-browser dialog accept
499
555
 
500
- agent-browser automatically humanizes interactions to avoid behavioral detection:
556
+ # Accept a prompt dialog with input text
557
+ agent-browser dialog accept "my input"
501
558
 
502
- - **Randomized typing**: `type --delay` varies each keystroke delay by +-40%
503
- - **Random wait ranges**: `wait 2000-5000` pauses for a random duration in that range
504
- - **Bezier curve mouse**: Before every `click`, the mouse moves along a natural-looking curve
559
+ # Dismiss the dialog (click Cancel)
560
+ agent-browser dialog dismiss
561
+ ```
505
562
 
506
- These behaviors are always active. For sensitive sites, combine with `--headed` and a stable `--session-name` for best results.
563
+ When a dialog is pending, all command responses include a `warning` field indicating the dialog type and message. In `--json` mode this appears as a `"warning"` key in the response object.
507
564
 
508
565
  ## Session Management and Cleanup
509
566
 
510
- `--session` is ignored in this fork. Runtime defaults to `default`; use `--parallel <name>` for isolated concurrent channels, and `--session-name` for persistence isolation.
511
- When a default-session command runs, non-default daemon sessions are reaped automatically.
567
+ When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
568
+
569
+ ```bash
570
+ # Each agent gets its own isolated session
571
+ agent-browser --session agent1 open site-a.com
572
+ agent-browser --session agent2 open site-b.com
573
+
574
+ # Check active sessions
575
+ agent-browser session list
576
+ ```
512
577
 
513
578
  Always close your browser session when done to avoid leaked processes:
514
579
 
515
580
  ```bash
516
- agent-browser close
581
+ agent-browser close # Close default session
582
+ agent-browser --session agent1 close # Close specific session
583
+ agent-browser close --all # Close all active sessions
517
584
  ```
518
585
 
519
- If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
586
+ If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up, or `agent-browser close --all` to shut down every session at once.
587
+
588
+ To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
589
+
590
+ ```bash
591
+ AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
592
+ ```
520
593
 
521
594
  ## Ref Lifecycle (Important)
522
595
 
@@ -601,7 +674,8 @@ Create `agent-browser.json` in the project root for persistent settings:
601
674
  ```json
602
675
  {
603
676
  "headed": true,
604
- "proxy": "http://localhost:8080"
677
+ "proxy": "http://localhost:8080",
678
+ "profile": "./browser-data"
605
679
  }
606
680
  ```
607
681
 
@@ -619,20 +693,24 @@ Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.
619
693
  | [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
620
694
  | [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
621
695
 
622
- ## Experimental: Native Mode
696
+ ## Cloud Providers
623
697
 
624
- agent-browser has an experimental native Rust daemon that communicates with Chrome directly via CDP, bypassing Node.js and Playwright entirely. It is opt-in and not recommended for production use yet.
698
+ Use `-p <provider>` (or `AGENT_BROWSER_PROVIDER`) to run against a cloud browser instead of launching a local Chrome instance. Supported providers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
699
+
700
+ ### AgentCore (AWS Bedrock)
625
701
 
626
702
  ```bash
627
- # Enable via flag
628
- agent-browser --native open example.com
703
+ # Credentials auto-resolved from env vars or AWS CLI (SSO, IAM roles, etc.)
704
+ agent-browser -p agentcore open https://example.com
629
705
 
630
- # Enable via environment variable (avoids passing --native every time)
631
- export AGENT_BROWSER_NATIVE=1
632
- agent-browser open example.com
706
+ # With persistent browser profile
707
+ AGENTCORE_PROFILE_ID=my-profile agent-browser -p agentcore open https://example.com
708
+
709
+ # With explicit region
710
+ AGENTCORE_REGION=eu-west-1 agent-browser -p agentcore open https://example.com
633
711
  ```
634
712
 
635
- The native daemon supports Chromium and Safari (via WebDriver). Firefox and WebKit are not yet supported. All core commands (navigate, snapshot, click, fill, screenshot, cookies, storage, tabs, eval, etc.) work identically in native mode. Use `agent-browser close` before switching between native and default mode within the same session.
713
+ Set `AWS_PROFILE` to select a named AWS profile.
636
714
 
637
715
  ## Browser Engine Selection
638
716
 
@@ -646,16 +724,35 @@ agent-browser --engine lightpanda open example.com
646
724
  export AGENT_BROWSER_ENGINE=lightpanda
647
725
  agent-browser open example.com
648
726
 
649
- # With a custom binary path
727
+ # With custom binary path
650
728
  agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
651
729
  ```
652
730
 
653
731
  Supported engines:
654
-
655
732
  - `chrome` (default) -- Chrome/Chromium via CDP
656
- - `lightpanda` -- Lightpanda headless browser via CDP
733
+ - `lightpanda` -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
734
+
735
+ Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
736
+
737
+ ## Observability Dashboard
738
+
739
+ The dashboard is a standalone background server that shows live browser viewports, command activity, and console output for all sessions.
740
+
741
+ ```bash
742
+ # Install the dashboard once
743
+ agent-browser dashboard install
744
+
745
+ # Start the dashboard server (background, port 4848)
746
+ agent-browser dashboard start
747
+
748
+ # All sessions are automatically visible in the dashboard
749
+ agent-browser open example.com
750
+
751
+ # Stop the dashboard
752
+ agent-browser dashboard stop
753
+ ```
657
754
 
658
- Lightpanda is headless-only and does not support `--extension`, `--state`, `--profile`, or `--allow-file-access`. Install it from https://lightpanda.io/docs/open-source/installation.
755
+ The dashboard runs independently of browser sessions on port 4848 (configurable with `--port`). All sessions automatically stream to the dashboard. Sessions can also be created from the dashboard UI with local engines or cloud providers.
659
756
 
660
757
  ## Ready-to-Use Templates
661
758