browserclaw 0.10.6 → 0.11.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,4 +1,4 @@
1
- <h2 align="center">🦞 BrowserClaw — Standalone OpenClaw browser module</h2>
1
+ <h2 align="center">🦞 BrowserClaw: AI-native browser automation for agents</h2>
2
2
 
3
3
  <p align="center">
4
4
  <a href="https://browserclaw.org"><img src="https://img.shields.io/badge/Live-browserclaw.org-orange" alt="Live" /></a>
@@ -10,20 +10,20 @@
10
10
 
11
11
  > **DISCLAIMER: This project is NOT affiliated with browserclaw.com in any form. We have no connection to that site and recommend treating it with caution.**
12
12
 
13
- Extracted and refined from [OpenClaw](https://github.com/openclaw/openclaw)'s browser automation module. A standalone, typed library for AI-friendly browser control with **snapshot + ref targeting** — no CSS selectors, no XPath, no vision, just numbered refs that map to interactive elements.
13
+ The AI-native browser automation library — born from [OpenClaw](https://github.com/openclaw/openclaw), built for agents. **Snapshot + ref targeting** — no CSS selectors, no XPath, no vision, just numbered refs that map to interactive elements.
14
14
 
15
15
  ```typescript
16
16
  import { BrowserClaw } from 'browserclaw';
17
17
 
18
- const browser = await BrowserClaw.launch({ headless: false });
19
- const page = await browser.open('https://example.com');
18
+ const browser = await BrowserClaw.launch({ url: 'https://example.com' });
19
+ const page = await browser.currentPage();
20
20
 
21
21
  // Snapshot — the core feature
22
22
  const { snapshot, refs } = await page.snapshot();
23
23
  // snapshot: AI-readable text tree
24
24
  // refs: { "e1": { role: "link", name: "More info" }, "e2": { role: "button", name: "Submit" } }
25
25
 
26
- await page.click('e1'); // Click by ref
26
+ await page.click('e1'); // Click by ref
27
27
  await page.type('e3', 'hello'); // Type by ref
28
28
  await browser.stop();
29
29
  ```
@@ -37,6 +37,7 @@ Most browser automation tools were built for humans writing test scripts. AI age
37
37
  - **browserclaw** gives the AI a **text snapshot** with numbered refs — the AI reads text (what it's best at) and returns a ref ID (deterministic targeting)
38
38
 
39
39
  The snapshot + ref pattern means:
40
+
40
41
  1. **Deterministic** — refs resolve to exact elements via Playwright locators, no guessing
41
42
  2. **Fast** — text snapshots are tiny compared to screenshots
42
43
  3. **Cheap** — no vision API calls, just text in/text out
@@ -46,15 +47,15 @@ The snapshot + ref pattern means:
46
47
 
47
48
  The AI browser automation space is moving fast. Here's how browserclaw compares to the major alternatives.
48
49
 
49
- | | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
50
- |:---|:---:|:---:|:---:|:---:|
51
- | Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :white_check_mark: |
52
- | No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
53
- | Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
54
- | Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: |
55
- | Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: |
56
- | Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
57
- | Embeddable in your own JS/TS agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: |
50
+ | | [browserclaw](https://github.com/idan-rubin/browserclaw) | [browser-use](https://github.com/browser-use/browser-use) | [Stagehand](https://github.com/browserbase/stagehand) | [Playwright MCP](https://github.com/microsoft/playwright-mcp) |
51
+ | :--------------------------------------- | :------------------------------------------------------: | :-------------------------------------------------------: | :---------------------------------------------------: | :-----------------------------------------------------------: |
52
+ | Ref → exact element, no guessing | :white_check_mark: | :heavy_minus_sign: | :x: | :white_check_mark: |
53
+ | No vision model in the loop | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
54
+ | Survives redesigns (semantic, not pixel) | :white_check_mark: | :heavy_minus_sign: | :white_check_mark: | :white_check_mark: |
55
+ | Fill 10 form fields in one call | :white_check_mark: | :x: | :x: | :x: |
56
+ | Interact with cross-origin iframes | :white_check_mark: | :white_check_mark: | :x: | :x: |
57
+ | Playwright engine (auto-wait, locators) | :white_check_mark: | :x: | :white_check_mark: | :white_check_mark: |
58
+ | Embeddable in your own JS/TS agent loop | :white_check_mark: | :x: | :heavy_minus_sign: | :x: |
58
59
 
59
60
  :white_check_mark: = Yes&ensp; :heavy_minus_sign: = Partial&ensp; :x: = No
60
61
 
@@ -88,7 +89,7 @@ When you're running the same multi-step workflow hundreds of times — filling f
88
89
 
89
90
  [browserclaw.org](https://browserclaw.org) is an open-source playground where you can type a prompt and watch an AI agent use browserclaw in a real browser — live. No setup, no API keys, just a text box and a browser stream.
90
91
 
91
- Want to run it yourself? The source is at [github.com/idan-rubin/browserclaw.agent](https://github.com/idan-rubin/browserclaw.agent) — spin it up with Docker or Node.js. Supports Groq, Gemini, OpenAI, and Anthropic out of the box.
92
+ Want to run it yourself? The source is at [github.com/idan-rubin/browserclaw-agent](https://github.com/idan-rubin/browserclaw-agent) — spin it up with Docker or Node.js. Supports Groq, Gemini, OpenAI, and Anthropic out of the box.
92
93
 
93
94
  ## Install
94
95
 
@@ -132,19 +133,23 @@ Requires a Chromium-based browser installed on the system (Chrome, Brave, Edge,
132
133
  ```typescript
133
134
  // Launch a new Chrome instance (auto-detects Chrome/Brave/Edge/Chromium)
134
135
  const browser = await BrowserClaw.launch({
135
- headless: false, // default: false (visible window)
136
+ url: 'https://example.com', // navigate initial tab (no extra tabs)
137
+ headless: false, // default: false (visible window)
136
138
  executablePath: '...', // optional: specific browser path
137
- cdpPort: 9222, // default: 9222
138
- noSandbox: false, // default: false (set true for Docker/CI)
139
- userDataDir: '...', // optional: custom user data directory
139
+ cdpPort: 9222, // default: 9222
140
+ noSandbox: false, // default: false (set true for Docker/CI)
141
+ ignoreHTTPSErrors: false, // default: false (set true for expired local dev certs)
142
+ userDataDir: '...', // optional: custom user data directory
140
143
  profileName: 'browserclaw', // profile name in Chrome title bar
141
- profileColor: '#FF4500', // profile accent color (hex)
144
+ profileColor: '#FF4500', // profile accent color (hex)
142
145
  chromeArgs: ['--start-maximized'], // additional Chrome flags
143
146
  });
144
147
 
145
- // Or connect to an already-running Chrome instance
146
- // (started with: chrome --remote-debugging-port=9222)
148
+ // Connect to an already-running Chrome instance
147
149
  const browser = await BrowserClaw.connect('http://localhost:9222');
150
+
151
+ // Auto-discovery: scans common CDP ports (9222-9226, 9229)
152
+ const browser = await BrowserClaw.connect();
148
153
  ```
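Port auto-discovery can be pictured as probing each port in turn. The sketch below assumes each candidate is checked through Chrome's standard DevTools HTTP endpoint `/json/version` and that the first port to answer wins. The port list comes from the comment above. The helper names (`candidateCdpEndpoints`, `discoverCdpEndpoint`) and the 1 s per-port timeout are hypothetical and do not describe browserclaw's actual implementation.

```typescript
// Ports listed in the auto-discovery comment: 9222-9226, then 9229.
const CANDIDATE_PORTS: number[] = [9222, 9223, 9224, 9225, 9226, 9229];

// Build the HTTP endpoints to probe, in scan order.
function candidateCdpEndpoints(host = "localhost"): string[] {
  return CANDIDATE_PORTS.map((port) => `http://${host}:${port}`);
}

// Hypothetical helper: return the first endpoint whose /json/version answers.
// Assumes Node 18+ (global fetch and AbortSignal.timeout).
async function discoverCdpEndpoint(
  host = "localhost",
  perPortTimeoutMs = 1000,
): Promise<string | undefined> {
  for (const endpoint of candidateCdpEndpoints(host)) {
    try {
      const res = await fetch(`${endpoint}/json/version`, {
        signal: AbortSignal.timeout(perPortTimeoutMs),
      });
      if (res.ok) return endpoint; // a DevTools endpoint responded on this port
    } catch {
      // Connection refused or timed out: try the next port.
    }
  }
  return undefined; // no reachable Chrome found on any candidate port
}
```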
149
154
 
150
155
  `connect()` checks that Chrome is reachable, then the internal CDP connection retries 3 times with increasing timeouts (5 s, 7 s, 9 s) — safe for Docker/CI where Chrome starts slowly.
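The retry schedule above (5 s, 7 s, 9 s) is a start value plus a fixed 2 s step. A minimal, library-agnostic sketch of that pattern follows. `retryTimeouts` and `withIncreasingTimeouts` are hypothetical names, and the `attempt` callback stands in for whatever connection call browserclaw makes internally.

```typescript
// Timeouts for each attempt: 3 attempts starting at 5 s, +2 s each (5 s, 7 s, 9 s).
function retryTimeouts(attempts = 3, firstMs = 5000, stepMs = 2000): number[] {
  return Array.from({ length: attempts }, (_, i) => firstMs + i * stepMs);
}

// Run `attempt` with each timeout in turn; return the first success,
// or rethrow the last error once every attempt has failed.
async function withIncreasingTimeouts<T>(
  attempt: (timeoutMs: number) => Promise<T>,
  timeouts: number[] = retryTimeouts(),
): Promise<T> {
  let lastError: unknown = new Error("no attempts configured");
  for (const timeoutMs of timeouts) {
    try {
      return await attempt(timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and move on to a longer timeout
    }
  }
  throw lastError;
}
```

With Playwright directly, for example, you could pass `(ms) => chromium.connectOverCDP('http://localhost:9222', { timeout: ms })` as `attempt`. Growing the timeout, rather than retrying at a fixed one, gives a slow-starting Chrome in Docker/CI more time on each try.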
@@ -156,16 +161,30 @@ const browser = await BrowserClaw.connect('http://localhost:9222');
156
161
  ```typescript
157
162
  const page = await browser.open('https://example.com');
158
163
  const current = await browser.currentPage(); // get active tab
159
- const tabs = await browser.tabs(); // list all tabs
164
+ const tabs = await browser.tabs(); // list all tabs
160
165
  const handle = browser.page(tabs[0].targetId); // wrap existing tab
161
- await browser.focus(tabId); // bring tab to front
162
- await browser.close(tabId); // close a tab
163
- await browser.stop(); // stop browser + cleanup
164
-
165
- page.id; // CDP target ID (use with focus/close/page)
166
- await page.url(); // current page URL
167
- await page.title(); // current page title
168
- browser.url; // CDP endpoint URL
166
+ const appPage = await browser.waitForTab({ urlContains: 'app-web' }); // wait for a tab whose URL contains 'app-web'
167
+ await browser.focus(tabId); // bring tab to front
168
+ await browser.close(tabId); // close a tab
169
+ await browser.stop(); // stop browser + cleanup
170
+
171
+ page.id; // CDP target ID (use with focus/close/page)
172
+ await page.url(); // current page URL
173
+ await page.title(); // current page title
174
+ browser.url; // CDP endpoint URL
175
+ ```
176
+
177
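+ Every tab is identified by its CDP target ID (`page.id` on a page handle, `targetId` in `tabs()` results); that ID is the handle you pass to `focus()`, `close()`, and `page()`: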
178
+
179
+ ```typescript
180
+ // Multi-tab workflow (e.g. impersonation, OAuth)
181
+ const main = await browser.open('https://app.example.com');
182
+ const admin = await browser.open('https://admin.example.com');
183
+
184
+ const { refs } = await admin.snapshot(); // snapshot the admin tab
185
+ await admin.click('e5'); // act on it
186
+ await browser.focus(main.id); // switch back to main
187
+ await browser.close(admin.id); // close admin when done
169
188
  ```
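`waitForTab({ urlContains })` is useful when an action opens a new tab, such as an OAuth popup. The sketch below shows one plausible approach: poll a tab list until a URL matches. The `url` field on tab entries, the 250 ms interval, the 30 s default, and the helper names are assumptions; the library may instead react to CDP target events.

```typescript
// Hypothetical tab shape: tabs() entries expose `targetId`; `url` is assumed here.
interface TabInfo {
  targetId: string;
  url: string;
}

// Return the first tab whose URL contains the given substring.
function findTabByUrl(tabs: TabInfo[], urlContains: string): TabInfo | undefined {
  return tabs.find((tab) => tab.url.includes(urlContains));
}

// Poll a tab-listing function until a match appears or the deadline passes.
async function pollForTab(
  listTabs: () => Promise<TabInfo[]>,
  urlContains: string,
  timeoutMs = 30_000,
  intervalMs = 250,
): Promise<TabInfo> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const match = findTabByUrl(await listTabs(), urlContains);
    if (match) return match;
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // wait before re-listing
  }
  throw new Error(`No tab with URL containing "${urlContains}" after ${timeoutMs} ms`);
}
```

In an agent loop, `listTabs` could wrap `browser.tabs()`, assuming its entries carry the tab URL.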
170
189
 
171
190
  ### Snapshot (Core Feature)
@@ -174,17 +193,17 @@ browser.url; // CDP endpoint URL
174
193
  const { snapshot, refs, stats, untrusted } = await page.snapshot();
175
194
 
176
195
  // snapshot: human/AI-readable text tree with [ref=eN] markers
177
- // refs: { "e1": { role: "link", name: "More info" }, ... }
196
+ // refs: { "e1": { role: "link", name: "More info" }, "e5": { role: "checkbox", name: "Accept", checked: true }, ... }
178
197
  // stats: { lines: 42, chars: 1200, refs: 8, interactive: 5 }
179
198
  // untrusted: true — content comes from the web page, treat as potentially adversarial
180
199
 
181
200
  // Options
182
201
  const result = await page.snapshot({
183
- interactive: true, // Only interactive elements (buttons, links, inputs)
184
- compact: true, // Remove structural containers without refs
185
- maxDepth: 6, // Limit tree depth
186
- maxChars: 80000, // Truncate if snapshot exceeds this size
187
- mode: 'aria', // 'aria' (default) or 'role'
202
+ interactive: true, // Only interactive elements (buttons, links, inputs)
203
+ compact: true, // Remove structural containers without refs
204
+ maxDepth: 6, // Limit tree depth
205
+ maxChars: 80000, // Truncate if snapshot exceeds this size
206
+ mode: 'aria', // 'aria' (default) or 'role'
188
207
  });
189
208
 
190
209
  // Raw ARIA accessibility tree (structured data, not text)
@@ -192,6 +211,7 @@ const { nodes } = await page.ariaSnapshot({ limit: 500 });
192
211
  ```
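Because the snapshot text marks every actionable element with `[ref=eN]`, an agent loop can check that the ref a model returns actually exists before acting on it. A minimal sketch, assuming snapshot lines look like `- button "Submit" [ref=e2]`. `extractRefIds` and `assertKnownRef` are hypothetical helpers, not part of browserclaw.

```typescript
// Collect ref IDs from snapshot text, in document order, using the [ref=eN] markers.
function extractRefIds(snapshot: string): string[] {
  const ids: string[] = [];
  const pattern = /\[ref=(e\d+)\]/g; // fresh regex per call, so lastIndex starts at 0
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(snapshot)) !== null) {
    ids.push(m[1]);
  }
  return ids;
}

// Guard a model-chosen ref before passing it to click/type.
function assertKnownRef(ref: string, refs: Record<string, unknown>): void {
  if (!Object.prototype.hasOwnProperty.call(refs, ref)) {
    throw new Error(`Unknown ref "${ref}"; take a fresh snapshot and retry`);
  }
}
```

A typical guard is `assertKnownRef(choice, refs)` right before `page.click(choice)`, which turns a hallucinated ref into a clear error the agent can recover from by re-snapshotting.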
193
212
 
194
213
  **Snapshot modes:**
214
+
195
215
  - `'aria'` (default) — Uses Playwright's `_snapshotForAI()`. Refs are resolved via `aria-ref` locators. Best for most use cases. Requires `playwright-core` >= 1.50.
196
216
  - `'role'` — Uses Playwright's `ariaSnapshot()` + `getByRole()`. Supports `selector` and `frameSelector` for scoped snapshots.
197
217
 
@@ -209,11 +229,12 @@ await page.click('e1');
209
229
  await page.click('e1', { doubleClick: true });
210
230
  await page.click('e1', { button: 'right' });
211
231
  await page.click('e1', { modifiers: ['Control'] });
232
+ await page.click('e1', { force: true }); // click hidden/covered elements
212
233
 
213
234
  // Type
214
- await page.type('e3', 'hello world'); // instant fill
215
- await page.type('e3', 'slow typing', { slowly: true }); // keystroke by keystroke
216
- await page.type('e3', 'search', { submit: true }); // type + press Enter
235
+ await page.type('e3', 'hello world'); // instant fill
236
+ await page.type('e3', 'slow typing', { slowly: true }); // keystroke by keystroke
237
+ await page.type('e3', 'search', { submit: true }); // type + press Enter
217
238
 
218
239
  // Other interactions
219
240
  await page.hover('e2');
@@ -234,7 +255,31 @@ await page.fill([
234
255
  ]);
235
256
  ```
236
257
 
237
- `fill()` field types: `'text'` (default) calls Playwright `fill()` with the string value. `'checkbox'` and `'radio'` call `setChecked()` truthy values are `true`, `1`, `'1'`, `'true'`. Type can be omitted and defaults to `'text'`. Empty ref throws.
258
+ `fill()` field types: `'text'` (default) calls Playwright `fill()` with the string value. `'checkbox'` and `'radio'` call `setChecked()` with `force: true` (works on hidden inputs behind custom styling). Truthy values are `true`, `1`, `'1'`, `'true'`. Type can be omitted and defaults to `'text'`. Empty ref throws.
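The `fill()` rules above can be restated as a small pure function. This sketch assumes a field shape of `{ ref, type?, value }` (the field names are hypothetical) and only plans the Playwright calls rather than making them. It mirrors the documented behavior: `type` falls back to `'text'`, an empty ref throws, and only `true`, `1`, `'1'`, and `'true'` check a box.

```typescript
type FillValue = string | number | boolean;
type FieldType = "text" | "checkbox" | "radio";

// Hypothetical field shape for illustration.
interface FillField {
  ref: string;
  type?: FieldType; // defaults to 'text'
  value: FillValue;
}

// The Playwright locator call each field would map to.
type FillAction =
  | { ref: string; kind: "fill"; text: string } // locator.fill(text)
  | { ref: string; kind: "setChecked"; checked: boolean }; // locator.setChecked(checked, { force: true })

// Only these values check a checkbox/radio; anything else unchecks it.
function toChecked(value: FillValue): boolean {
  return value === true || value === 1 || value === "1" || value === "true";
}

function planFill(fields: FillField[]): FillAction[] {
  return fields.map((field): FillAction => {
    if (!field.ref) throw new Error("fill(): empty ref");
    const type = field.type ?? "text";
    if (type === "text") {
      return { ref: field.ref, kind: "fill", text: String(field.value) };
    }
    return { ref: field.ref, kind: "setChecked", checked: toChecked(field.value) };
  });
}
```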
259
+
260
+ #### No-snapshot actions
261
+
262
+ These methods act on the page without taking a snapshot first. They are useful when you already know an element's text, role, CSS selector, or on-screen position and want to skip the snapshot + ref round-trip.
263
+
264
+ ```typescript
265
+ // Click by visible text or title attribute
266
+ await page.clickByText('Submit');
267
+ await page.clickByText('Save Changes', { exact: true });
268
+
269
+ // Click by ARIA role and accessible name
270
+ await page.clickByRole('button', 'Save');
271
+ await page.clickByRole('link', 'Settings');
272
+ await page.clickByRole('button', 'Create', { index: 1 }); // second match
273
+
274
+ // Click by CSS selector
275
+ await page.clickBySelector('#submit-btn');
276
+
277
+ // Click at page coordinates (for canvas elements, custom widgets)
278
+ await page.mouseClick(400, 300);
279
+
280
+ // Press and hold at coordinates (raw CDP events, bypasses automation detection)
281
+ await page.pressAndHold(400, 300, { holdMs: 5000, delay: 150 });
282
+ ```
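`clickByRole('button', 'Create', { index: 1 })` picks the second of several matching elements. If you already hold a `refs` map from `snapshot()`, a comparable lookup can be done agent-side. A sketch under stated assumptions: exact name matching, a zero-based `index` (consistent with the `// second match` comment), and the hypothetical helper name `findRefByRole`.

```typescript
// Shape of a refs entry, per the snapshot examples: role, name, optional checked.
interface RefInfo {
  role: string;
  name?: string;
  checked?: boolean;
}

// Hypothetical helper: ref ID of the index-th (0-based) element with the given
// role and exact accessible name, or undefined if there is no such match.
function findRefByRole(
  refs: Record<string, RefInfo>,
  role: string,
  name: string,
  index = 0,
): string | undefined {
  const matches = Object.entries(refs)
    .filter(([, info]) => info.role === role && info.name === name)
    .map(([id]) => id);
  return matches[index];
}
```

`Object.entries` preserves insertion order for non-numeric keys like `e1`, so this assumes the `refs` object was built in snapshot order.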
238
283
 
239
284
  #### Highlight
240
285
 
@@ -256,7 +301,7 @@ await uploadDone;
256
301
 
257
302
  #### Dialog Handling
258
303
 
259
- Handle JavaScript dialogs (alert, confirm, prompt). Arm the handler *before* the action that triggers the dialog.
304
+ Handle JavaScript dialogs (alert, confirm, prompt). Arm the handler _before_ the action that triggers the dialog.
260
305
 
261
306
  ```typescript
262
307
  const dialogDone = page.armDialog({ accept: true });
@@ -267,22 +312,33 @@ await dialogDone;
267
312
  const promptDone = page.armDialog({ accept: true, promptText: 'my answer' });
268
313
  await page.click('e6'); // triggers prompt()
269
314
  await promptDone;
315
+
316
+ // Persistent handler: called for every dialog until cleared
317
+ await page.onDialog((event) => {
318
+ console.log(`${event.type}: ${event.message}`);
319
+ event.accept(); // or event.dismiss()
320
+ });
321
+ await page.onDialog(undefined); // clear the handler
270
322
  ```
271
323
 
324
+ By default, unexpected dialogs are auto-dismissed to prevent `ProtocolError` crashes.
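With a persistent `onDialog` handler you can decide per dialog instead of relying on the default auto-dismiss. The sketch below separates the decision from the side effect so it can be tested without a browser. The event shape follows the `onDialog` example above; the deny-pattern policy and the names `decideDialog` and `makeDialogHandler` are hypothetical.

```typescript
// Event shape from the onDialog example: type, message, accept(), dismiss().
interface DialogEventLike {
  type: string; // e.g. 'alert', 'confirm', 'prompt', 'beforeunload'
  message: string;
  accept: () => void | Promise<void>;
  dismiss: () => void | Promise<void>;
}

type DialogDecision = "accept" | "dismiss";

// Pure policy: acknowledge alerts, accept confirms unless the message matches
// denyPattern, and dismiss everything else (prompts, beforeunload).
function decideDialog(type: string, message: string, denyPattern: RegExp): DialogDecision {
  if (type === "alert") return "accept";
  if (type === "confirm" && !denyPattern.test(message)) return "accept";
  return "dismiss";
}

// Adapter that applies the decision to a live dialog event.
function makeDialogHandler(denyPattern: RegExp): (event: DialogEventLike) => void | Promise<void> {
  return (event) =>
    decideDialog(event.type, event.message, denyPattern) === "accept"
      ? event.accept()
      : event.dismiss();
}
```

Registering it would look like `await page.onDialog(makeDialogHandler(/delete|remove/i));`. Avoid the `g` flag on `denyPattern`, since `RegExp.test` with `g` keeps state between calls.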
325
+
272
326
  ### Navigation & Waiting
273
327
 
274
328
  ```typescript
275
329
  await page.goto('https://example.com');
276
- await page.reload(); // reload the current page
277
- await page.goBack(); // navigate back in history
278
- await page.goForward(); // navigate forward in history
330
+ await page.reload(); // reload the current page
331
+ await page.goBack(); // navigate back in history
332
+ await page.goForward(); // navigate forward in history
279
333
  await page.waitFor({ loadState: 'networkidle' });
280
334
  await page.waitFor({ text: 'Welcome' });
281
335
  await page.waitFor({ textGone: 'Loading...' });
282
336
  await page.waitFor({ url: '**/dashboard' });
283
- await page.waitFor({ selector: '.loaded' }); // wait for CSS selector
284
- await page.waitFor({ fn: '() => document.readyState === "complete"' }); // custom JS
285
- await page.waitFor({ timeMs: 1000 }); // sleep
337
+ await page.waitFor({ selector: '.loaded' }); // wait for CSS selector
338
+ await page.waitFor({ fn: '() => document.readyState === "complete"' }); // custom JS (string)
339
+ await page.waitFor({ fn: () => document.title === 'Done' }); // custom JS (function)
340
+ await page.waitFor({ fn: (name) => document.querySelector('button')?.textContent === name, arg: 'Save' }); // with arg
341
+ await page.waitFor({ timeMs: 1000 }); // sleep
286
342
  await page.waitFor({ text: 'Ready', timeoutMs: 5000 }); // custom timeout
287
343
  ```
288
344
 
@@ -290,14 +346,14 @@ await page.waitFor({ text: 'Ready', timeoutMs: 5000 }); // custom timeout
290
346
 
291
347
  ```typescript
292
348
  // Screenshots
293
- const screenshot = await page.screenshot(); // viewport PNG → Buffer
294
- const fullPage = await page.screenshot({ fullPage: true }); // full scrollable page
295
- const element = await page.screenshot({ ref: 'e1' }); // specific element by ref
349
+ const screenshot = await page.screenshot(); // viewport PNG → Buffer
350
+ const fullPage = await page.screenshot({ fullPage: true }); // full scrollable page
351
+ const element = await page.screenshot({ ref: 'e1' }); // specific element by ref
296
352
  const bySelector = await page.screenshot({ element: '.hero' }); // by CSS selector
297
- const jpeg = await page.screenshot({ type: 'jpeg' }); // JPEG format
353
+ const jpeg = await page.screenshot({ type: 'jpeg' }); // JPEG format
298
354
 
299
355
  // PDF
300
- const pdf = await page.pdf(); // PDF export (headless only)
356
+ const pdf = await page.pdf(); // PDF export (headless only)
301
357
 
302
358
  // Labeled screenshot — numbered badges on each ref for visual debugging
303
359
  const { buffer, labels, skipped } = await page.screenshotWithLabels(['e1', 'e2', 'e3']);
@@ -331,17 +387,33 @@ console.log(resp.status, resp.body);
331
387
 
332
388
  Options: `timeoutMs` (default 30 s), `maxChars` (truncate body).
333
389
 
390
+ #### Wait For Request
391
+
392
+ Wait for a network request matching a URL pattern and get full request + response details, including POST body.
393
+
394
+ ```typescript
395
+ const reqPromise = page.waitForRequest('/api/submit', { method: 'POST' });
396
+ await page.click('e5'); // submit a form
397
+ const req = await reqPromise;
398
+ console.log(req.method, req.postData); // 'POST', '{"name":"Jane"}'
399
+ console.log(req.status, req.ok); // 200, true
400
+ console.log(req.responseBody); // '{"id":123}'
401
+ // { url, method, postData?, status, ok, responseBody?, truncated? }
402
+ ```
403
+
404
+ Options: `method` (filter by HTTP method), `timeoutMs` (default 30 s), `maxChars` (truncate response body).
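The matching and truncation behavior of `waitForRequest()` can be approximated with two small functions. This is a sketch, not the library's code: it assumes the URL pattern is a plain substring (as `'/api/submit'` suggests) and that the method filter ignores case. The returned fields follow the `{ responseBody?, truncated? }` part of the documented result shape.

```typescript
// Does a request match the URL pattern and optional method filter?
// Assumption: substring URL match, case-insensitive method comparison.
function requestMatches(url: string, method: string, pattern: string, filterMethod?: string): boolean {
  if (!url.includes(pattern)) return false;
  return filterMethod === undefined || method.toUpperCase() === filterMethod.toUpperCase();
}

// Truncate a response body to maxChars, setting `truncated` only when text was cut.
function truncateBody(body: string, maxChars?: number): { responseBody: string; truncated?: boolean } {
  if (maxChars === undefined || body.length <= maxChars) return { responseBody: body };
  return { responseBody: body.slice(0, maxChars), truncated: true };
}
```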
405
+
334
406
  ### Activity Monitoring
335
407
 
336
408
  Console messages, errors, and network requests are buffered automatically.
337
409
 
338
410
  ```typescript
339
- const logs = await page.consoleLogs(); // all messages
340
- const errors = await page.consoleLogs({ level: 'error' }); // errors only
341
- const recent = await page.consoleLogs({ clear: true }); // read and clear buffer
342
- const pageErrors = await page.pageErrors(); // uncaught exceptions
343
- const requests = await page.networkRequests({ filter: '/api' }); // filter by URL
344
- const fresh = await page.networkRequests({ clear: true }); // read and clear buffer
411
+ const logs = await page.consoleLogs(); // all messages
412
+ const errors = await page.consoleLogs({ level: 'error' }); // errors only
413
+ const recent = await page.consoleLogs({ clear: true }); // read and clear buffer
414
+ const pageErrors = await page.pageErrors(); // uncaught exceptions
415
+ const requests = await page.networkRequests({ filter: '/api' }); // filter by URL
416
+ const fresh = await page.networkRequests({ clear: true }); // read and clear buffer
345
417
  ```
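Console and network activity is buffered automatically and can be read with a filter and an optional `clear`. A minimal sketch of such a buffer, assuming a fixed capacity that drops the oldest entries and a `clear` that empties the whole buffer (not just the matched items). `ActivityBuffer` and its 500-entry default are hypothetical.

```typescript
class ActivityBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity = 500) {
    this.capacity = capacity;
  }

  // Append an entry, evicting the oldest once capacity is exceeded.
  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift();
  }

  // Return matching entries (all by default); optionally empty the buffer afterwards.
  read(options: { match?: (item: T) => boolean; clear?: boolean } = {}): T[] {
    const result = options.match ? this.items.filter(options.match) : [...this.items];
    if (options.clear) this.items = [];
    return result;
  }
}
```

Here `read({ match: (m) => m.level === 'error' })` plays the role of `consoleLogs({ level: 'error' })`, and `read({ clear: true })` the role of a read-and-clear call.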
346
418
 
347
419
  ### Storage
@@ -459,7 +531,7 @@ Contributions welcome! Please:
459
531
 
460
532
  ## Acknowledgments
461
533
 
462
- browserclaw is extracted and refined from the browser automation module in [OpenClaw](https://github.com/openclaw/openclaw), built by [Peter Steinberger](https://github.com/steipete) and an [amazing community of contributors](https://github.com/openclaw/openclaw?tab=readme-ov-file#community). The snapshot + ref system, CDP connection management, and Playwright integration originate from that project.
534
+ browserclaw was born from the browser automation module in [OpenClaw](https://github.com/openclaw/openclaw), built by [Peter Steinberger](https://github.com/steipete) and an [amazing community of contributors](https://github.com/openclaw/openclaw?tab=readme-ov-file#community). The snapshot + ref system, CDP connection management, and Playwright integration originate from that project.
463
535
 
464
536
  ## License
465
537