agent-browser 0.21.3 → 0.22.0

package/README.md CHANGED
@@ -270,6 +270,10 @@ agent-browser network route <url> --body <json> # Mock response
  agent-browser network unroute [url] # Remove routes
  agent-browser network requests # View tracked requests
  agent-browser network requests --filter api # Filter requests
+ agent-browser network requests --type xhr,fetch # Filter by resource type
+ agent-browser network requests --method POST # Filter by HTTP method
+ agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
+ agent-browser network request <requestId> # View full request/response detail
  agent-browser network har start # Start HAR recording
  agent-browser network har stop [output.har] # Stop and save HAR (temp path if omitted)
  ```
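Note on the new `--status` filter: the README comment lists three accepted shapes (`200`, `2xx`, `400-499`). The sketch below shows one way such a filter could be interpreted. `matchesStatus` is a hypothetical helper written for illustration; it is not taken from agent-browser's source, and the real matching rules may differ.

```javascript
// Hypothetical helper, for illustration only (not agent-browser's code).
// Interprets a --status style filter against a numeric HTTP status:
//   "200"     exact code
//   "2xx"     status class
//   "400-499" inclusive range
function matchesStatus(filter, status) {
  const exact = /^(\d{3})$/.exec(filter);
  if (exact) return status === Number(exact[1]);

  const cls = /^([1-5])xx$/i.exec(filter);
  if (cls) return Math.floor(status / 100) === Number(cls[1]);

  const range = /^(\d{3})-(\d{3})$/.exec(filter);
  if (range) return status >= Number(range[1]) && status <= Number(range[2]);

  // Assumption: an unrecognised filter matches nothing in this sketch.
  return false;
}

console.log(matchesStatus('2xx', 204));     // true
console.log(matchesStatus('400-499', 404)); // true
console.log(matchesStatus('200', 201));     // false
```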
@@ -486,7 +490,7 @@ agent-browser --session-name secure open example.com

  agent-browser includes security features for safe AI agent deployments. All features are opt-in -- existing workflows are unaffected until you explicitly enable a feature:

- - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
+ - **Authentication Vault** -- Store credentials locally (always encrypted), reference by name. The LLM never sees passwords. `auth login` navigates with `load` and then waits for login form selectors to appear (SPA-friendly, timeout follows the default action timeout). A key is auto-generated at `~/.agent-browser/.encryption-key` if `AGENT_BROWSER_ENCRYPTION_KEY` is not set: `echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin` then `agent-browser auth login github`
  - **Content Boundary Markers** -- Wrap page output in delimiters so LLMs can distinguish tool output from untrusted content: `--content-boundaries`
  - **Domain Allowlist** -- Restrict navigation to trusted domains (wildcards like `*.example.com` also match the bare domain): `--allowed-domains "example.com,*.example.com"`. Sub-resource requests (scripts, images, fetch) and WebSocket/EventSource connections to non-allowed domains are also blocked. Include any CDN domains your target pages depend on (e.g., `*.cdn.example.com`).
  - **Action Policy** -- Gate destructive actions with a static policy file: `--action-policy ./policy.json`
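Note on the Domain Allowlist context line: it states that a wildcard such as `*.example.com` also matches the bare domain. A minimal sketch of that rule follows, assuming plain hostname comparison. `isAllowedHost` is a hypothetical name; the actual matcher in agent-browser is not shown in this diff and may behave differently.

```javascript
// Hypothetical matcher, for illustration only (not agent-browser's code).
// A pattern "*.example.com" accepts "example.com" and any subdomain of it;
// a pattern without a wildcard must equal the host exactly.
function isAllowedHost(host, patterns) {
  const h = host.toLowerCase();
  return patterns.some((raw) => {
    const p = raw.trim().toLowerCase();
    if (p.startsWith('*.')) {
      const base = p.slice(2);
      // The leading dot in the suffix check keeps "evilexample.com" out.
      return h === base || h.endsWith('.' + base);
    }
    return h === p;
  });
}

const allowed = 'example.com,*.cdn.example.com'.split(',');
console.log(isAllowedHost('static.cdn.example.com', allowed)); // true
console.log(isAllowedHost('cdn.example.com', allowed));        // true
console.log(isAllowedHost('www.example.com', allowed));        // false
```

In this sketch, `www.example.com` is rejected because only the exact `example.com` entry and the `*.cdn.example.com` wildcard are listed, which is why the README example passes both `example.com` and `*.example.com`.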
@@ -511,7 +515,6 @@ The `snapshot` command supports filtering to reduce output size:
  ```bash
  agent-browser snapshot # Full accessibility tree
  agent-browser snapshot -i # Interactive elements only (buttons, inputs, links)
- agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
  agent-browser snapshot -c # Compact (remove empty structural elements)
  agent-browser snapshot -d 3 # Limit depth to 3 levels
  agent-browser snapshot -s "#main" # Scope to CSS selector
@@ -521,13 +524,10 @@ agent-browser snapshot -i -c -d 5 # Combine options
  | Option | Description |
  | ---------------------- | ----------------------------------------------------------------------- |
  | `-i, --interactive` | Only show interactive elements (buttons, links, inputs) |
- | `-C, --cursor` | Include cursor-interactive elements (cursor:pointer, onclick, tabindex) |
  | `-c, --compact` | Remove empty structural elements |
  | `-d, --depth <n>` | Limit tree depth |
  | `-s, --selector <sel>` | Scope to CSS selector |

- The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
-
  ## Annotated Screenshots

  The `--annotate` flag overlays numbered labels on interactive elements in the screenshot. Each label `[N]` corresponds to ref `@eN`, so the same refs work for both visual and text-based workflows.
@@ -0,0 +1 @@
+ pnpm
Binary file (7 entries)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "agent-browser",
-   "version": "0.21.3",
+   "version": "0.22.0",
    "description": "Headless browser automation CLI for AI agents",
    "type": "module",
    "files": [
@@ -80,6 +80,31 @@ async function downloadFile(url, dest) {
    });
  }

+ /**
+  * Detect which package manager ran this postinstall and write a marker file
+  * next to the binary so `agent-browser upgrade` can use the correct one
+  * without fragile path heuristics or slow subprocess probing.
+  *
+  * npm_config_user_agent is set by npm/pnpm/yarn/bun during lifecycle scripts,
+  * e.g. "pnpm/8.10.0 node/v20.10.0 linux x64"
+  */
+ function writeInstallMethod() {
+   const ua = process.env.npm_config_user_agent || '';
+   let method = '';
+   if (ua.startsWith('pnpm/')) method = 'pnpm';
+   else if (ua.startsWith('yarn/')) method = 'yarn';
+   else if (ua.startsWith('bun/')) method = 'bun';
+   else if (ua.startsWith('npm/')) method = 'npm';
+
+   if (method) {
+     try {
+       writeFileSync(join(binDir, '.install-method'), method);
+     } catch {
+       // Non-critical — upgrade will fall back to heuristics
+     }
+   }
+ }
+
  async function main() {
    // Check if binary already exists
    if (existsSync(binaryPath)) {
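Note on `writeInstallMethod`: the detection step can be read in isolation as a pure function over the user-agent string. The sketch below restates the prefix checks from the hunk above without touching the filesystem. The name `detectInstallMethod` is hypothetical and does not appear in the package; the sample user-agent strings other than the pnpm one are illustrative assumptions.

```javascript
// Hypothetical restatement of the prefix checks in writeInstallMethod().
// Returns the package manager name, or '' when the user agent is missing
// or unrecognised (in which case no marker file would be written).
function detectInstallMethod(userAgent) {
  const ua = userAgent || '';
  for (const name of ['pnpm', 'yarn', 'bun', 'npm']) {
    if (ua.startsWith(name + '/')) return name;
  }
  return '';
}

console.log(detectInstallMethod('pnpm/8.10.0 node/v20.10.0 linux x64')); // pnpm
console.log(detectInstallMethod('npm/10.2.3 node/v20.10.0 linux x64'));  // npm
console.log(detectInstallMethod(undefined) === '');                      // true
```

Because each check requires the trailing slash, `pnpm/...` is never mistaken for `npm/...`, so the order of the checks does not affect the result.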
@@ -88,10 +113,12 @@ async function main() {
        chmodSync(binaryPath, 0o755);
      }
      console.log(`✓ Native binary ready: ${binaryName}`);
-
+
+     writeInstallMethod();
+
      // On global installs, fix npm's bin entry to use native binary directly
      await fixGlobalInstallBin();
-
+
      showInstallReminder();
      return;
    }
@@ -106,12 +133,12 @@ async function main() {

    try {
      await downloadFile(DOWNLOAD_URL, binaryPath);
-
+
      // Make executable on Unix
      if (platform() !== 'win32') {
        chmodSync(binaryPath, 0o755);
      }
-
+
      console.log(`✓ Downloaded native binary: ${binaryName}`);
    } catch (err) {
      console.log(`Could not download native binary: ${err.message}`);
@@ -121,6 +148,8 @@ async function main() {
      console.log(' 2. Run: npm run build:native');
    }

+   writeInstallMethod();
+
    // On global installs, fix npm's bin entry to use native binary directly
    // This avoids the /bin/sh error on Windows and provides zero-overhead execution
    await fixGlobalInstallBin();
@@ -90,6 +90,8 @@ echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/l
  agent-browser auth login myapp
  ```

+ `auth login` navigates with `load` and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
+
  **Option 5: State file (manual save/load)**

  ```bash
@@ -111,7 +113,6 @@ agent-browser close # Close browser

  # Snapshot
  agent-browser snapshot -i # Interactive elements with refs (recommended)
- agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
  agent-browser snapshot -s "#selector" # Scope to CSS selector

  # Interaction (use @refs from snapshot)
@@ -149,6 +150,10 @@ agent-browser --download-path ./downloads open <url> # Set default download dir

  # Network
  agent-browser network requests # Inspect tracked requests
+ agent-browser network requests --type xhr,fetch # Filter by resource type
+ agent-browser network requests --method POST # Filter by HTTP method
+ agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
+ agent-browser network request <requestId> # View full request/response detail
  agent-browser network route "**/api/*" --abort # Block matching requests
  agent-browser network har start # Start HAR recording
  agent-browser network har stop ./capture.har # Stop and save HAR file
@@ -230,6 +235,8 @@ agent-browser auth show github
  agent-browser auth delete github
  ```

+ `auth login` waits for username/password/submit selectors before interacting, with a timeout tied to the default action timeout.
+
  ### Authentication with State Persistence

  ```bash
@@ -217,7 +217,6 @@ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser connect 9222
  ### Elements not appearing in snapshot

  - The app may use multiple webviews. Use `agent-browser tab` to list targets and switch to the right one
- - Use `agent-browser snapshot -i -C` to include cursor-interactive elements (divs with onclick handlers)

  ### Cannot type in input fields

@@ -235,15 +235,6 @@ agent-browser console
  agent-browser errors
  ```

- ### View raw HTML of an element
-
- ```bash
- # Snapshot shows the accessibility tree. If an element isn't there,
- # it may not be interactive (e.g., div instead of button)
- # Use snapshot -i -C to include cursor-interactive divs
- agent-browser snapshot -i -C
- ```
-
  ### Get current page state

  ```bash
@@ -334,19 +334,13 @@ If you can't find an element:
  agent-browser snapshot -i
  ```

- 3. **Try snapshot with extended range**
- ```bash
- # Include cursor-interactive elements (divs with onclick handlers)
- agent-browser snapshot -i -C
- ```
-
- 4. **Check current URL**
+ 3. **Check current URL**
  ```bash
  agent-browser get url
  # Verify you're in the right section
  ```

- 5. **Wait for page to load**
+ 4. **Wait for page to load**
  ```bash
  agent-browser wait --load networkidle
  agent-browser wait 1000