agent-browser 0.20.14 → 0.21.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -58,6 +58,16 @@ On Linux, install system dependencies:
58
58
  agent-browser install --with-deps
59
59
  ```
60
60
 
61
+ ### Updating
62
+
63
+ Upgrade to the latest version:
64
+
65
+ ```bash
66
+ agent-browser upgrade
67
+ ```
68
+
69
+ Detects your installation method (npm, Homebrew, or Cargo) and runs the appropriate update command automatically.
70
+
61
71
  ### Requirements
62
72
 
63
73
  - **Chrome** - Run `agent-browser install` to download Chrome from [Chrome for Testing](https://developer.chrome.com/blog/chrome-for-testing/) (Google's official automation channel). No Playwright or Node.js required for the daemon.
@@ -187,6 +197,25 @@ agent-browser wait "#spinner" --state hidden
187
197
 
188
198
  **Load states:** `load`, `domcontentloaded`, `networkidle`
189
199
 
200
+ ### Batch Execution
201
+
202
+ Execute multiple commands in a single invocation by piping a JSON array of
203
+ string arrays to `batch`. This avoids per-command process startup overhead
204
+ when running multi-step workflows.
205
+
206
+ ```bash
207
+ # Pipe commands as JSON
208
+ echo '[
209
+ ["open", "https://example.com"],
210
+ ["snapshot", "-i"],
211
+ ["click", "@e1"],
212
+ ["screenshot", "result.png"]
213
+ ]' | agent-browser batch --json
214
+
215
+ # Stop on first error
216
+ agent-browser batch --bail < commands.json
217
+ ```
218
+
190
219
  ### Clipboard
191
220
 
192
221
  ```bash
@@ -241,6 +270,8 @@ agent-browser network route <url> --body <json> # Mock response
241
270
  agent-browser network unroute [url] # Remove routes
242
271
  agent-browser network requests # View tracked requests
243
272
  agent-browser network requests --filter api # Filter requests
273
+ agent-browser network har start # Start HAR recording
274
+ agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
244
275
  ```
245
276
 
246
277
  ### Tabs & Windows
@@ -318,6 +349,7 @@ agent-browser reload # Reload page
318
349
  ```bash
319
350
  agent-browser install # Download Chrome from Chrome for Testing (Google's official automation channel)
320
351
  agent-browser install --with-deps # Also install system deps (Linux)
352
+ agent-browser upgrade # Upgrade agent-browser to the latest version
321
353
  ```
322
354
 
323
355
  ## Authentication
@@ -539,7 +571,6 @@ This is useful for multimodal AI models that can reason about visual layout, unl
539
571
  | `-p, --provider <name>` | Cloud browser provider (or `AGENT_BROWSER_PROVIDER` env) |
540
572
  | `--device <name>` | iOS device name, e.g. "iPhone 15 Pro" (or `AGENT_BROWSER_IOS_DEVICE` env) |
541
573
  | `--json` | JSON output (for agents) |
542
- | `--full, -f` | Full page screenshot |
543
574
  | `--annotate` | Annotated screenshot with numbered element labels (or `AGENT_BROWSER_ANNOTATE` env) |
544
575
  | `--screenshot-dir <path>` | Default screenshot output directory (or `AGENT_BROWSER_SCREENSHOT_DIR` env) |
545
576
  | `--screenshot-quality <n>` | JPEG quality 0-100 (or `AGENT_BROWSER_SCREENSHOT_QUALITY` env) |
@@ -877,6 +908,7 @@ Auto-connect discovers Chrome by:
877
908
 
878
909
  1. Reading Chrome's `DevToolsActivePort` file from the default user data directory
879
910
  2. Falling back to probing common debugging ports (9222, 9229)
911
+ 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection
880
912
 
881
913
  This is useful when:
882
914
 
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "agent-browser",
3
- "version": "0.20.14",
3
+ "version": "0.21.1",
4
4
  "description": "Headless browser automation CLI for AI agents",
5
5
  "type": "module",
6
6
  "files": [
@@ -6,7 +6,7 @@ allowed-tools: Bash(npx agent-browser:*), Bash(agent-browser:*)
6
6
 
7
7
  # Browser Automation with agent-browser
8
8
 
9
- The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.
9
+ The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update to the latest version.
10
10
 
11
11
  ## Core Workflow
12
12
 
@@ -147,6 +147,12 @@ agent-browser download @e1 ./file.pdf # Click element to trigger downlo
147
147
  agent-browser wait --download ./output.zip # Wait for any download to complete
148
148
  agent-browser --download-path ./downloads open <url> # Set default download directory
149
149
 
150
+ # Network
151
+ agent-browser network requests # Inspect tracked requests
152
+ agent-browser network route "**/api/*" --abort # Block matching requests
153
+ agent-browser network har start # Start HAR recording
154
+ agent-browser network har stop ./capture.har # Stop and save HAR file
155
+
150
156
  # Viewport & Device Emulation
151
157
  agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
152
158
  agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
@@ -175,6 +181,24 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
175
181
  agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
176
182
  ```
177
183
 
184
+ ## Batch Execution
185
+
186
+ Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
187
+
188
+ ```bash
189
+ echo '[
190
+ ["open", "https://example.com"],
191
+ ["snapshot", "-i"],
192
+ ["click", "@e1"],
193
+ ["screenshot", "result.png"]
194
+ ]' | agent-browser batch --json
195
+
196
+ # Stop on first error
197
+ agent-browser batch --bail < commands.json
198
+ ```
199
+
200
+ Use `batch` when you have a known sequence of commands that don't depend on intermediate output. Use separate commands or `&&` chaining when you need to parse output between steps (e.g., snapshot to discover refs, then interact).
201
+
178
202
  ## Common Patterns
179
203
 
180
204
  ### Form Submission
@@ -245,6 +269,30 @@ agent-browser state clear myapp
245
269
  agent-browser state clean --older-than 7
246
270
  ```
247
271
 
272
+ ### Working with Iframes
273
+
274
+ Iframe content is automatically inlined in snapshots. Refs inside iframes carry frame context, so you can interact with them directly.
275
+
276
+ ```bash
277
+ agent-browser open https://example.com/checkout
278
+ agent-browser snapshot -i
279
+ # @e1 [heading] "Checkout"
280
+ # @e2 [Iframe] "payment-frame"
281
+ # @e3 [input] "Card number"
282
+ # @e4 [input] "Expiry"
283
+ # @e5 [button] "Pay"
284
+
285
+ # Interact directly — no frame switch needed
286
+ agent-browser fill @e3 "4111111111111111"
287
+ agent-browser fill @e4 "12/28"
288
+ agent-browser click @e5
289
+
290
+ # To scope a snapshot to one iframe:
291
+ agent-browser frame @e2
292
+ agent-browser snapshot -i # Only iframe content
293
+ agent-browser frame main # Return to main frame
294
+ ```
295
+
248
296
  ### Data Extraction
249
297
 
250
298
  ```bash
@@ -281,6 +329,8 @@ agent-browser --auto-connect snapshot
281
329
  agent-browser --cdp 9222 snapshot
282
330
  ```
283
331
 
332
+ Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails.
333
+
284
334
  ### Color Scheme (Dark Mode)
285
335
 
286
336
  ```bash
@@ -177,10 +177,36 @@ agent-browser window new # New window
177
177
  ## Frames
178
178
 
179
179
  ```bash
180
- agent-browser frame "#iframe" # Switch to iframe
180
+ agent-browser frame "#iframe" # Switch to iframe by CSS selector
181
+ agent-browser frame @e3 # Switch to iframe by element ref
181
182
  agent-browser frame main # Back to main frame
182
183
  ```
183
184
 
185
+ ### Iframe support
186
+
187
+ Iframes are detected automatically during snapshots. When the main-frame snapshot runs, `Iframe` nodes are resolved and their content is inlined beneath the iframe element in the output (one level of nesting; iframes within iframes are not expanded).
188
+
189
+ ```bash
190
+ agent-browser snapshot -i
191
+ # @e3 [Iframe] "payment-frame"
192
+ # @e4 [input] "Card number"
193
+ # @e5 [button] "Pay"
194
+
195
+ # Interact directly — refs inside iframes already work
196
+ agent-browser fill @e4 "4111111111111111"
197
+ agent-browser click @e5
198
+
199
+ # Or switch frame context for scoped snapshots
200
+ agent-browser frame @e3 # Switch using element ref
201
+ agent-browser snapshot -i # Snapshot scoped to that iframe
202
+ agent-browser frame main # Return to main frame
203
+ ```
204
+
205
+ The `frame` command accepts:
206
+ - **Element refs** — `frame @e3` resolves the ref to an iframe element
207
+ - **CSS selectors** — `frame "#payment-iframe"` finds the iframe by selector
208
+ - **Frame name/URL** — matches against the browser's frame tree
209
+
184
210
  ## Dialogs
185
211
 
186
212
  ```bash
@@ -162,6 +162,31 @@ agent-browser snapshot @e9
162
162
  @e10 [radio] selected # Selected radio
163
163
  ```
164
164
 
165
+ ## Iframes
166
+
167
+ Snapshots automatically detect and inline iframe content. When the main-frame snapshot runs, each `Iframe` node is resolved and its child accessibility tree is included directly beneath it in the output. Refs assigned to elements inside iframes carry frame context, so interactions like `click`, `fill`, and `type` work without manually switching frames.
168
+
169
+ ```bash
170
+ agent-browser snapshot -i
171
+ # @e1 [heading] "Checkout"
172
+ # @e2 [Iframe] "payment-frame"
173
+ # @e3 [input] "Card number"
174
+ # @e4 [input] "Expiry"
175
+ # @e5 [button] "Pay"
176
+ # @e6 [button] "Cancel"
177
+
178
+ # Interact with iframe elements directly using their refs
179
+ agent-browser fill @e3 "4111111111111111"
180
+ agent-browser fill @e4 "12/28"
181
+ agent-browser click @e5
182
+ ```
183
+
184
+ **Key details:**
185
+ - Only one level of iframe nesting is expanded (iframes within iframes are not recursed)
186
+ - Cross-origin iframes that block accessibility tree access are silently skipped
187
+ - Empty iframes or iframes with no interactive content are omitted from the output
188
+ - To scope a snapshot to a single iframe, use `frame @ref` then `snapshot -i`
189
+
165
190
  ## Troubleshooting
166
191
 
167
192
  ### "Ref not found" Error