browserclaw 0.1.0

package/LICENSE ADDED
@@ -0,0 +1,22 @@
+ MIT License
+
+ Copyright (c) 2025 Peter Steinberger (OpenClaw)
+ Copyright (c) 2026 Idan Rubin
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,278 @@
+ # browserclaw
+
+ [![npm version](https://img.shields.io/npm/v/browserclaw.svg)](https://www.npmjs.com/package/browserclaw)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](./LICENSE)
+
+ A standalone, typed wrapper around [OpenClaw](https://github.com/openclaw/openclaw)'s browser automation module. Provides AI-friendly browser control with **snapshot + ref targeting** — no CSS selectors, no XPath, no vision, just numbered refs that map to interactive elements.
+
+ ```typescript
+ import { BrowserClaw } from 'browserclaw';
+
+ const browser = await BrowserClaw.launch({ headless: false });
+ const page = await browser.open('https://example.com');
+
+ // Snapshot — the core feature
+ const { snapshot, refs } = await page.snapshot();
+ // snapshot: AI-readable text tree
+ // refs: { "e1": { role: "link", name: "More info" }, "e2": { role: "button", name: "Submit" } }
+
+ await page.click('e1'); // Click by ref
+ await page.type('e3', 'hello'); // Type by ref
+ await browser.stop();
+ ```
+
+ ## Why browserclaw?
+
+ Most browser automation tools were built for humans writing test scripts. AI agents need something different:
+
+ - **Vision-based tools** (screenshot → click coordinates) are slow, expensive, and probabilistic
+ - **Selector-based tools** (CSS/XPath) are brittle and meaningless to an LLM
+ - **browserclaw** gives the AI a **text snapshot** with numbered refs — the AI reads text (what it's best at) and returns a ref ID (deterministic targeting)
+
+ The snapshot + ref pattern means:
+ 1. **Deterministic** — refs resolve to exact elements via Playwright's `getByRole()`, no guessing
+ 2. **Fast** — text snapshots are tiny compared to screenshots
+ 3. **Cheap** — no vision API calls, just text in/text out
+ 4. **Reliable** — built on Playwright, the most robust browser automation engine
+
+ ## Install
+
+ ```bash
+ npm install browserclaw
+ ```
+
+ Requires a Chromium-based browser installed on the system (Chrome, Brave, Edge, or Chromium). browserclaw auto-detects your installed browser — no need to install Playwright browsers separately.
+
+ ## How It Works
+
+ ```
+ ┌─────────────┐    snapshot()     ┌─────────────────────────────────┐
+ │  Web Page   │  ──────────────►  │  AI-readable text tree          │
+ │             │                   │                                 │
+ │  [buttons]  │                   │  - heading "Example Domain"     │
+ │  [links]    │                   │  - paragraph "This domain..."   │
+ │  [inputs]   │                   │  - link "More information" [e1] │
+ └─────────────┘                   └──────────────┬──────────────────┘
+
+                                         AI reads snapshot,
+                                         decides: click e1
+
+ ┌─────────────┐    click('e1')    ┌──────────────▼──────────────────┐
+ │  Web Page   │  ◄──────────────  │  Ref "e1" resolves to:          │
+ │ (navigated) │                   │  getByRole('link',              │
+ │             │                   │    { name: 'More information' })│
+ └─────────────┘                   └─────────────────────────────────┘
+ ```
+
+ 1. **Snapshot** a page → get an AI-readable text tree with numbered refs (`e1`, `e2`, `e3`...)
+ 2. **AI reads** the snapshot text and picks a ref to act on
+ 3. **Actions target refs** → browserclaw resolves each ref to a Playwright locator and executes the action
+
+ > **Note:** Refs are not stable across navigations or page changes. Always take a fresh snapshot before acting — if an action fails, re-snapshot and use the new refs.
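
The re-snapshot advice in the note above can be sketched as a small retry helper. This is only an illustration: `PageLike`, `ChooseRef`, and `clickWithFreshRefs` are hypothetical names, and the interface is a structural stand-in that mirrors the `page.snapshot()` and `page.click()` calls shown in this README rather than the package's exported types. The chooser stands in for whatever model call picks a ref.

```typescript
// Structural stand-ins for this sketch (assumptions, not browserclaw's exported types).
interface RefInfo { role: string; name?: string }
interface SnapshotResult { snapshot: string; refs: Record<string, RefInfo> }
interface PageLike {
  snapshot(): Promise<SnapshotResult>;
  click(ref: string): Promise<void>;
}

// Hypothetical chooser: a real agent would send snap.snapshot to an LLM here.
type ChooseRef = (snap: SnapshotResult) => string | undefined;

// Take a fresh snapshot before every attempt, so a failed click is retried
// against new refs instead of stale ones.
async function clickWithFreshRefs(
  page: PageLike,
  choose: ChooseRef,
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const snap = await page.snapshot();
    const ref = choose(snap);
    if (ref === undefined || !(ref in snap.refs)) {
      lastError = new Error(`no usable ref on attempt ${attempt}`);
      continue;
    }
    try {
      await page.click(ref);
      return ref; // the ref that was actually clicked
    } catch (err) {
      lastError = err; // refs may be stale; the next iteration re-snapshots
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

Because the helper only depends on the two methods it calls, a page returned by `browser.open()` should be usable wherever a `PageLike` is expected, provided its methods have compatible signatures.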
+
+ ## API
+
+ ### Launch & Connect
+
+ ```typescript
+ // Launch a new Chrome instance (auto-detects Chrome/Brave/Edge/Chromium)
+ const browser = await BrowserClaw.launch({
+ headless: false, // default: false (visible window)
+ executablePath: '...', // optional: specific browser path
+ cdpPort: 9222, // default: 9222
+ noSandbox: false, // default: false (set true for Docker/CI)
+ chromeArgs: ['--start-maximized'], // additional Chrome flags
+ });
+
+ // Or connect to an already-running Chrome instance
+ // (started with: chrome --remote-debugging-port=9222)
+ const browser = await BrowserClaw.connect('http://localhost:9222');
+ ```
+
+ ### Pages & Tabs
+
+ ```typescript
+ const page = await browser.open('https://example.com');
+ const current = await browser.currentPage(); // get active tab
+ const tabs = await browser.tabs(); // list all tabs
+ await browser.focus(tabId); // bring tab to front
+ await browser.close(tabId); // close a tab
+ await browser.stop(); // stop browser + cleanup
+ ```
+
+ ### Snapshot (Core Feature)
+
+ ```typescript
+ const { snapshot, refs, stats } = await page.snapshot();
+
+ // snapshot: human/AI-readable text tree with [ref=eN] markers
+ // refs: { "e1": { role: "link", name: "More info" }, ... }
+ // stats: { lines: 42, chars: 1200, refs: 8, interactive: 5 }
+
+ // Options
+ const result = await page.snapshot({
+ interactive: true, // Only interactive elements (buttons, links, inputs)
+ compact: true, // Remove structural containers without refs
+ maxDepth: 6, // Limit tree depth
+ maxChars: 80000, // Truncate if snapshot exceeds this size
+ mode: 'aria', // 'aria' (default) or 'role'
+ });
+
+ // Raw ARIA accessibility tree (structured data, not text)
+ const { nodes } = await page.ariaSnapshot({ limit: 500 });
+ ```
+
+ **Snapshot modes:**
+ - `'aria'` (default) — Uses Playwright's `_snapshotForAI()`. Refs are resolved via `aria-ref` locators. Best for most use cases. Requires `playwright-core` >= 1.50.
+ - `'role'` — Uses Playwright's `ariaSnapshot()` + `getByRole()`. Supports `selector` and `frameSelector` for scoped snapshots.
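
Since `refs` is a plain map from ref ID to role and name, it can also be searched in code without going through a model. The helper below is a minimal sketch under that assumption: `findRefs` and `RefMap` are hypothetical names, and the entry shape is taken from the `refs` comment in the snapshot example.

```typescript
// Entry shape as shown in the README's `refs` comment (an assumption about the real type).
interface RefInfo { role: string; name?: string }
type RefMap = Record<string, RefInfo>;

// Return the ref IDs whose role and/or accessible name match the query.
// A string name must match exactly; a RegExp is tested against the name.
function findRefs(
  refs: RefMap,
  query: { role?: string; name?: string | RegExp },
): string[] {
  return Object.entries(refs)
    .filter(([, info]) => {
      if (query.role !== undefined && info.role !== query.role) return false;
      if (query.name === undefined) return true;
      const name = info.name ?? '';
      return typeof query.name === 'string'
        ? name === query.name
        : query.name.test(name);
    })
    .map(([ref]) => ref);
}

const refs: RefMap = {
  e1: { role: 'link', name: 'More info' },
  e2: { role: 'button', name: 'Submit' },
};
console.log(findRefs(refs, { role: 'button' })); // logs [ 'e2' ]
```

This can be handy in tests or scripted flows where the target element is known in advance, while still benefiting from ref-based targeting.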
+
+ ### Actions
+
+ All actions target elements by ref ID from the most recent snapshot.
+
+ ```typescript
+ // Click
+ await page.click('e1');
+ await page.click('e1', { doubleClick: true });
+ await page.click('e1', { button: 'right' });
+ await page.click('e1', { modifiers: ['Control'] });
+
+ // Type
+ await page.type('e3', 'hello world'); // instant fill
+ await page.type('e3', 'slow typing', { slowly: true }); // keystroke by keystroke
+ await page.type('e3', 'search', { submit: true }); // type + press Enter
+
+ // Other interactions
+ await page.hover('e2');
+ await page.select('e5', 'Option A', 'Option B');
+ await page.drag('e1', 'e4');
+ await page.scrollIntoView('e7');
+
+ // Keyboard
+ await page.press('Enter');
+ await page.press('Control+a');
+ await page.press('Meta+Shift+p');
+
+ // Fill multiple form fields at once
+ await page.fill([
+ { ref: 'e2', type: 'text', value: 'Jane Doe' },
+ { ref: 'e4', type: 'text', value: 'jane@example.com' },
+ { ref: 'e6', type: 'checkbox', value: true },
+ ]);
+ ```
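
The field list passed to `page.fill()` can be derived from a snapshot's `refs` instead of hard-coding ref IDs. The sketch below assumes that text inputs appear in `refs` with the ARIA role `textbox`, that checkboxes appear as `checkbox`, and that each has an accessible name. `buildFillFields` and `FillField` are hypothetical names, and the field shape is copied from the example above.

```typescript
// Entry shape as shown in the README's `refs` comment (an assumption about the real type).
interface RefInfo { role: string; name?: string }

// Field shape copied from the page.fill() example; other field types are not covered here.
interface FillField { ref: string; type: 'text' | 'checkbox'; value: string | boolean }

// Map "accessible name -> value" pairs onto refs from the latest snapshot.
// Booleans are matched to checkboxes, strings to textboxes.
function buildFillFields(
  refs: Record<string, RefInfo>,
  values: Record<string, string | boolean>,
): FillField[] {
  const fields: FillField[] = [];
  for (const [label, value] of Object.entries(values)) {
    const isCheckbox = typeof value === 'boolean';
    const wantedRole = isCheckbox ? 'checkbox' : 'textbox';
    const match = Object.entries(refs).find(
      ([, info]) => info.role === wantedRole && info.name === label,
    );
    if (!match) {
      throw new Error(`no ${wantedRole} named "${label}" in snapshot refs`);
    }
    fields.push({ ref: match[0], type: isCheckbox ? 'checkbox' : 'text', value });
  }
  return fields;
}
```

Failing loudly on a missing label is a deliberate choice in this sketch: a stale or partial snapshot then surfaces as an error before any field is filled.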
+
+ ### Navigation & Waiting
+
+ ```typescript
+ await page.goto('https://example.com');
+ await page.waitFor({ loadState: 'networkidle' });
+ await page.waitFor({ text: 'Welcome' });
+ await page.waitFor({ textGone: 'Loading...' });
+ await page.waitFor({ url: '**/dashboard' });
+ await page.waitFor({ timeMs: 1000 }); // sleep
+ ```
+
+ ### Capture
+
+ ```typescript
+ const screenshot = await page.screenshot(); // viewport PNG
+ const fullPage = await page.screenshot({ fullPage: true }); // full scrollable page
+ const element = await page.screenshot({ ref: 'e1' }); // specific element
+ const jpeg = await page.screenshot({ type: 'jpeg' }); // JPEG format
+ const pdf = await page.pdf(); // PDF export (headless only)
+ ```
+
+ ### Activity Monitoring
+
+ Console messages, errors, and network requests are buffered automatically.
+
+ ```typescript
+ const logs = await page.consoleLogs(); // all messages
+ const errors = await page.consoleLogs({ level: 'error' }); // errors only
+ const pageErrors = await page.pageErrors(); // uncaught exceptions
+ const requests = await page.networkRequests({ filter: '/api' }); // filter by URL
+ const fresh = await page.networkRequests({ clear: true }); // read and clear buffer
+ ```
+
+ ### Storage
+
+ ```typescript
+ // Cookies
+ const cookies = await page.cookies();
+ await page.setCookie({ name: 'token', value: 'abc', url: 'https://example.com' });
+ await page.clearCookies();
+
+ // localStorage / sessionStorage
+ const values = await page.storageGet('local');
+ const token = await page.storageGet('local', 'authToken');
+ await page.storageSet('local', 'key', 'value');
+ await page.storageClear('session');
+ ```
+
+ ### Evaluate
+
+ Run JavaScript directly in the browser page context.
+
+ ```typescript
+ const title = await page.evaluate('() => document.title');
+ const text = await page.evaluate('(el) => el.textContent', { ref: 'e1' });
+ const count = await page.evaluate('() => document.querySelectorAll("img").length');
+ ```
+
+ #### `evaluateInAllFrames(fn)`
+
+ Run JavaScript in ALL frames on the page, including cross-origin iframes. Playwright bypasses the same-origin policy via CDP, making this essential for interacting with embedded payment forms (Stripe, etc.).
+
+ ```typescript
+ const results = await page.evaluateInAllFrames(`() => {
+ const el = document.querySelector('input[name="cardnumber"]');
+ return el ? 'found' : null;
+ }`);
+ // Returns: [{ frameUrl: '...', frameName: '...', result: 'found' }, ...]
+ ```
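
Most frames will not contain the element being probed, so the result array usually needs filtering. The sketch below assumes the entry shape shown in the `// Returns:` comment above and that frames without a match report `null`. `FrameResult` and `framesWithResult` are hypothetical names.

```typescript
// Entry shape taken from the "Returns:" comment; the generic parameter is an assumption.
interface FrameResult<T> {
  frameUrl: string;
  frameName: string;
  result: T | null | undefined;
}

// Keep only the frames whose evaluation produced a value.
function framesWithResult<T>(results: FrameResult<T>[]): FrameResult<T>[] {
  return results.filter((r) => r.result !== null && r.result !== undefined);
}

// Example input in the documented shape (URLs are illustrative only).
const sample: FrameResult<string>[] = [
  { frameUrl: 'https://shop.example/checkout', frameName: '', result: null },
  { frameUrl: 'https://pay.example/card', frameName: 'card-frame', result: 'found' },
];
console.log(framesWithResult(sample).map((r) => r.frameName)); // logs [ 'card-frame' ]
```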
+
+ ### Viewport
+
+ ```typescript
+ await page.resize(1280, 720);
+ ```
+
+ ## Examples
+
+ See the [`examples/`](./examples) directory for runnable demos:
+
+ - **[basic.ts](./examples/basic.ts)** — Navigate, snapshot, click a ref
+ - **[form-fill.ts](./examples/form-fill.ts)** — Fill a multi-field form using refs
+ - **[ai-agent.ts](./examples/ai-agent.ts)** — AI agent loop pattern with Claude/GPT
+
+ Run from the source tree:
+
+ ```bash
+ npx tsx examples/basic.ts
+ ```
+
+ ## Requirements
+
+ - **Node.js** >= 18
+ - **Chromium-based browser** installed (Chrome, Brave, Edge, or Chromium)
+ - **playwright-core** >= 1.50 (installed automatically as a dependency)
+
+ No need to install Playwright browsers — browserclaw uses your system's existing Chrome installation via CDP.
+
+ ## Contributing
+
+ Contributions welcome! Please:
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b my-feature`)
+ 3. Make your changes
+ 4. Run `npm run typecheck && npm run build` to verify
+ 5. Submit a pull request
+
+ ## Acknowledgments
+
+ browserclaw extracts and wraps the browser automation module from [OpenClaw](https://github.com/openclaw/openclaw) by [Peter Steinberger](https://github.com/steipete). The snapshot + ref system, CDP connection management, and Playwright integration originate from that project.
+
+ ## License
+
+ [MIT](./LICENSE)