agent-browser 0.20.13 → 0.21.0

package/README.md CHANGED
@@ -187,6 +187,25 @@ agent-browser wait "#spinner" --state hidden
 
  **Load states:** `load`, `domcontentloaded`, `networkidle`
 
+ ### Batch Execution
+
+ Execute multiple commands in a single invocation by piping a JSON array of
+ string arrays to `batch`. This avoids per-command process startup overhead
+ when running multi-step workflows.
+
+ ```bash
+ # Pipe commands as JSON
+ echo '[
+ ["open", "https://example.com"],
+ ["snapshot", "-i"],
+ ["click", "@e1"],
+ ["screenshot", "result.png"]
+ ]' | agent-browser batch --json
+
+ # Stop on first error
+ agent-browser batch --bail < commands.json
+ ```
+
  ### Clipboard
 
  ```bash
@@ -241,6 +260,8 @@ agent-browser network route <url> --body <json> # Mock response
  agent-browser network unroute [url] # Remove routes
  agent-browser network requests # View tracked requests
  agent-browser network requests --filter api # Filter requests
+ agent-browser network har start # Start HAR recording
+ agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
  ```
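
The two new `network har` subcommands presumably bracket whatever traffic should end up in the capture. A minimal sketch of that flow follows. The `run` wrapper, the `DRY_RUN` variable, and the `har_file` path are hypothetical names introduced here for illustration; only the `agent-browser` subcommands themselves come from the diff. By default the wrapper only prints each command, so the sketch is safe to run without the CLI installed.

```bash
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that either echoes or executes an agent-browser command.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "agent-browser $*"
  else
    agent-browser "$@"
  fi
}

har_file="./capture.har"   # hypothetical output path

run network har start              # begin recording
run open "https://example.com"     # generate the traffic to capture
run network har stop "$har_file"   # stop recording and save the HAR file
```

Set `DRY_RUN=0` to execute the commands for real. Omitting the path on `har stop` should fall back to a temporary file, per the comment in the diff.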
 
  ### Tabs & Windows
@@ -539,7 +560,6 @@ This is useful for multimodal AI models that can reason about visual layout, unl
  | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
  | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
  | `--json` | JSON output (for agents) |
- | `--full, -f` | Full page screenshot |
  | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
  | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
  | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
@@ -877,6 +897,7 @@ Auto-connect discovers Chrome by:
 
  1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
  2. Falling back to probing common debugging ports (9222, 9229)
+ 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
 
  This is useful when:
 
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent-browser",
- "version": "0.20.13",
+ "version": "0.21.0",
  "description": "Headless browser automation CLI for AI agents",
  "type": "module",
  "files": [
@@ -147,6 +147,12 @@ agent-browser download @e1 ./file.pdf # Click element to trigger downlo
  agent-browser wait --download ./output.zip # Wait for any download to complete
  agent-browser --download-path ./downloads open <url> # Set default download directory
 
+ # Network
+ agent-browser network requests # Inspect tracked requests
+ agent-browser network route "**/api/*" --abort # Block matching requests
+ agent-browser network har start # Start HAR recording
+ agent-browser network har stop ./capture.har # Stop and save HAR file
+
  # Viewport & Device Emulation
  agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
  agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
@@ -175,6 +181,24 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
  agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
  ```
 
+ ## Batch Execution
+
+ Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
+
+ ```bash
+ echo '[
+ ["open", "https://example.com"],
+ ["snapshot", "-i"],
+ ["click", "@e1"],
+ ["screenshot", "result.png"]
+ ]' | agent-browser batch --json
+
+ # Stop on first error
+ agent-browser batch --bail < commands.json
+ ```
+
+ Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&&` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
+
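
The `commands.json` file used with `--bail` is not shown in the diff. A minimal sketch of how it might be prepared, assuming it takes the same array-of-string-arrays shape as the piped form: the two-step sequence is an illustrative choice that needs no intermediate output, and the `command -v` guard is added here only so the sketch runs safely where the CLI is not installed.

```bash
set -euo pipefail

commands_file="commands.json"   # same file name as in the --bail example

# Each inner array is one command: the subcommand followed by its arguments.
cat > "$commands_file" <<'EOF'
[
  ["open", "https://example.com"],
  ["screenshot", "result.png"]
]
EOF

# Invoke the CLI only when it is available on PATH.
if command -v agent-browser >/dev/null 2>&1; then
  agent-browser batch --bail < "$commands_file"
else
  echo "agent-browser not found; wrote $commands_file only"
fi
```

A sequence such as snapshot followed by `click @e1` is a poorer fit for a file like this, since the ref is only known after the snapshot output has been read.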
  ## Common Patterns
 
  ### Form Submission
@@ -245,6 +269,30 @@ agent-browser state clear myapp
  agent-browser state clean --older-than 7
  ```
 
+ ### Working with Iframes
+
+ Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
+
+ ```bash
+ agent-browser open https://example.com/checkout
+ agent-browser snapshot -i
+ # @e1 [heading] "Checkout"
+ # @e2 [Iframe] "payment-frame"
+ # @e3 [input] "Card number"
+ # @e4 [input] "Expiry"
+ # @e5 [button] "Pay"
+
+ # Interact directly — no frame switch needed
+ agent-browser fill @e3 "4111111111111111"
+ agent-browser fill @e4 "12/28"
+ agent-browser click @e5
+
+ # To scope a snapshot to one iframe:
+ agent-browser frame @e2
+ agent-browser snapshot -i # Only iframe content
+ agent-browser frame main # Return to main frame
+ ```
+
  ### Data Extraction
 
  ```bash
@@ -281,6 +329,8 @@ agent-browser --auto-connect snapshot
  agent-browser --cdp 9222 snapshot
  ```
 
+ Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
+
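
How agent-browser performs the direct WebSocket fallback is not shown in this diff. One plausible ingredient, sketched here as an assumption rather than the tool's actual logic, is that Chrome's `DevToolsActivePort` file holds the debugging port on its first line and the browser WebSocket path on its second, which is enough to build a `ws://` URL without the HTTP endpoints. The function name and the sample file contents below are hypothetical.

```bash
# ws_url_from_port_file FILE
# Print a ws:// URL built from a DevToolsActivePort-style file
# (line 1: port, line 2: browser WebSocket path).
ws_url_from_port_file() {
  local file="$1" port="" path=""
  {
    IFS= read -r port || true
    IFS= read -r path || true   # the last line may lack a trailing newline
  } < "$file"
  [ -n "$port" ] && [ -n "$path" ] || return 1
  printf 'ws://127.0.0.1:%s%s\n' "$port" "$path"
}

# Demo with sample contents (not a real browser session).
sample="$(mktemp)"
printf '9222\n/devtools/browser/example-id\n' > "$sample"
ws_url_from_port_file "$sample"   # prints ws://127.0.0.1:9222/devtools/browser/example-id
rm -f "$sample"
```

The function returns a non-zero status when either line is missing, so a caller can move on to port probing instead.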
  ### Color Scheme (Dark Mode)
 
  ```bash
@@ -177,10 +177,36 @@ agent-browser window new # New window
  ## Frames
 
  ```bash
- agent-browser frame "#iframe" # Switch to iframe
+ agent-browser frame "#iframe" # Switch to iframe by CSS selector
+ agent-browser frame @e3 # Switch to iframe by element ref
  agent-browser frame main # Back to main frame
  ```
 
+ ### Iframe support
+
+ Iframes are detected automatically during snapshots. When the main-frame snapshot runs, `Iframe` nodes are resolved and their content is inlined beneath the iframe element in the output (one level of nesting; iframes within iframes are not expanded).
+
+ ```bash
+ agent-browser snapshot -i
+ # @e3 [Iframe] "payment-frame"
+ # @e4 [input] "Card number"
+ # @e5 [button] "Pay"
+
+ # Interact directly — refs inside iframes already work
+ agent-browser fill @e4 "4111111111111111"
+ agent-browser click @e5
+
+ # Or switch frame context for scoped snapshots
+ agent-browser frame @e3 # Switch using element ref
+ agent-browser snapshot -i # Snapshot scoped to that iframe
+ agent-browser frame main # Return to main frame
+ ```
+
+ The `frame` command accepts:
+ - **Element refs** — `frame @e3` resolves the ref to an iframe element
+ - **CSS selectors** — `frame "#payment-iframe"` finds the iframe by selector
+ - **Frame name/URL** — matches against the browser's frame tree
+
  ## Dialogs
 
  ```bash
@@ -162,6 +162,31 @@ agent-browser snapshot @e9
  @e10 [radio] selected # Selected radio
  ```
 
+ ## Iframes
+
+ Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each `Iframe` node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like `click`, `fill`, and `type` work without manually switching frames.
+
+ ```bash
+ agent-browser snapshot -i
+ # @e1 [heading] "Checkout"
+ # @e2 [Iframe] "payment-frame"
+ # @e3 [input] "Card number"
+ # @e4 [input] "Expiry"
+ # @e5 [button] "Pay"
+ # @e6 [button] "Cancel"
+
+ # Interact with iframe elements directly using their refs
+ agent-browser fill @e3 "4111111111111111"
+ agent-browser fill @e4 "12/28"
+ agent-browser click @e5
+ ```
+
+ **Key details:**
+ - Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
+ - Cross-origin iframes that block accessibility tree access are silently skipped
+ - Empty iframes or iframes with no interactive content are omitted from the output
+ - To scope a snapshot to a single iframe, use `frame @ref` then `snapshot -i`
+
  ## Troubleshooting
 
  ### "Ref not found" Error