browser-pilot 0.0.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 J S
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,539 @@
+ # browser-pilot
+
+ [![Docs](https://img.shields.io/badge/docs-API%20Reference-blue?style=flat&logo=gitbook&logoColor=white)](https://svilupp.github.io/browser-pilot/)
+ [![npm version](https://img.shields.io/npm/v/browser-pilot.svg)](https://www.npmjs.com/package/browser-pilot)
+ [![CI status](https://github.com/svilupp/browser-pilot/workflows/CI/badge.svg)](https://github.com/svilupp/browser-pilot/actions)
+ [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue?style=flat&logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
+ [![License](https://img.shields.io/npm/l/browser-pilot.svg)](https://github.com/svilupp/browser-pilot/blob/main/LICENSE)
+
+ Lightweight CDP-based browser automation for AI agents. Zero dependencies, works in Node.js, Bun, and Cloudflare Workers.
+
+ ```typescript
+ import { connect } from 'browser-pilot';
+
+ const browser = await connect({ provider: 'browserbase', apiKey: process.env.BROWSERBASE_API_KEY });
+ const page = await browser.page();
+
+ await page.goto('https://example.com/login');
+ await page.fill(['#email', 'input[type=email]'], 'user@example.com');
+ await page.fill(['#password', 'input[type=password]'], 'secret');
+ await page.submit(['#login-btn', 'button[type=submit]']);
+
+ const snapshot = await page.snapshot();
+ console.log(snapshot.text); // Accessibility tree as text
+
+ await browser.close();
+ ```
+
+ ## Why browser-pilot?
+
+ | Problem with Playwright/Puppeteer | browser-pilot Solution |
+ |-----------------------------------|------------------------|
+ | Won't run in Cloudflare Workers | Pure Web Standard APIs, zero Node.js dependencies |
+ | Bun CDP connection bugs | Custom CDP client that works everywhere |
+ | Single-selector API (fragile) | Multi-selector by default: `['#primary', '.fallback']` |
+ | No action batching (high latency) | Batch DSL: one call for entire sequences |
+ | No AI-optimized snapshots | Built-in accessibility tree extraction |
+
+ ## Installation
+
+ ```bash
+ bun add browser-pilot
+ # or
+ npm install browser-pilot
+ ```
+
+ ## Providers
+
+ ### BrowserBase (Recommended for production)
+
+ ```typescript
+ const browser = await connect({
+   provider: 'browserbase',
+   apiKey: process.env.BROWSERBASE_API_KEY,
+   projectId: process.env.BROWSERBASE_PROJECT_ID, // optional
+ });
+ ```
+
+ ### Browserless
+
+ ```typescript
+ const browser = await connect({
+   provider: 'browserless',
+   apiKey: process.env.BROWSERLESS_API_KEY,
+ });
+ ```
+
+ ### Generic (Local Chrome)
+
+ ```bash
+ # Start Chrome with remote debugging
+ chrome --remote-debugging-port=9222
+ ```
+
+ ```typescript
+ const browser = await connect({
+   provider: 'generic',
+   wsUrl: 'ws://localhost:9222/devtools/browser/...', // optional, auto-discovers
+ });
+ ```
+
+ ## Core Concepts
+
+ ### Multi-Selector (Robust Automation)
+
+ Every action accepts `string | string[]`. When given an array, it tries each selector in order until one matches:
+
+ ```typescript
+ // Tries #submit first, falls back to alternatives
+ await page.click(['#submit', 'button[type=submit]', '.submit-btn']);
+
+ // Cookie consent - try multiple common patterns
+ await page.click([
+   '#accept-cookies',
+   '.cookie-accept',
+   'button:has-text("Accept")',
+   '[data-testid="cookie-accept"]'
+ ], { optional: true, timeout: 3000 });
+ ```
+
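The fallback behaviour described above can be pictured as an ordered search over the candidate selectors. The sketch below is not browser-pilot's implementation: `firstMatching` and the `matches` callback are hypothetical names, with `matches` standing in for whatever element lookup the library actually performs.

```typescript
// Hypothetical helper: return the first selector for which `matches` is true.
// `matches` stands in for a real element lookup; it is not a browser-pilot API.
function firstMatching(
  selector: string | string[],
  matches: (candidate: string) => boolean,
): string | undefined {
  const candidates = Array.isArray(selector) ? selector : [selector];
  for (const candidate of candidates) {
    if (matches(candidate)) return candidate; // stop at the first hit
  }
  return undefined; // nothing matched; a caller could throw, or give up quietly
}

// Simulated page that only contains a generic submit button.
const present = new Set(['button[type=submit]']);
const used = firstMatching(
  ['#submit', 'button[type=submit]', '.submit-btn'],
  (s) => present.has(s),
);
console.log(used); // button[type=submit]
```

Ordering the array from most specific to most generic keeps the preferred selector in charge whenever it exists, and the generic ones only act as a safety net.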
+ ### Built-in Waiting
+
+ Every action automatically waits for the element to be visible before interacting:
+
+ ```typescript
+ // No separate waitFor needed - this waits automatically
+ await page.click('.dynamic-button', { timeout: 5000 });
+
+ // Explicit waiting when needed
+ await page.waitFor('.loading', { state: 'hidden' });
+ await page.waitForNavigation();
+ await page.waitForNetworkIdle();
+ ```
+
+ ### Batch Actions
+
+ Execute multiple actions in a single call with full result tracking:
+
+ ```typescript
+ const result = await page.batch([
+   { action: 'goto', url: 'https://example.com/login' },
+   { action: 'fill', selector: '#email', value: 'user@example.com' },
+   { action: 'fill', selector: '#password', value: 'secret' },
+   { action: 'submit', selector: '#login-btn' },
+   { action: 'wait', waitFor: 'navigation' },
+   { action: 'snapshot' },
+ ]);
+
+ console.log(result.success); // true if all steps succeeded
+ console.log(result.totalDurationMs); // total execution time
+ console.log(result.steps[5].result); // snapshot from the sixth step (index 5)
+ ```
+
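A batch result can be reduced to a one-line status for logs or for an agent's next prompt. The sketch below is illustrative only: the `StepResult` and `BatchResult` shapes are inferred from the fields this README shows (`success`, `steps`, `totalDurationMs`, `stoppedAtIndex`, `error`), and `summarizeBatch` is a hypothetical helper, not part of the package.

```typescript
// Shapes inferred from the fields shown in this README; the real types may differ.
interface StepResult {
  action: string;
  success: boolean;
  durationMs: number;
  result?: unknown;
  error?: string;
}

interface BatchResult {
  success: boolean;
  steps: StepResult[];
  totalDurationMs: number;
  stoppedAtIndex?: number;
}

// Hypothetical helper: one-line summary of a batch run.
function summarizeBatch(batch: BatchResult): string {
  if (batch.success) {
    return `ok: ${batch.steps.length} steps in ${batch.totalDurationMs}ms`;
  }
  // Prefer the reported index, otherwise look for the first failed step.
  const index = batch.stoppedAtIndex ?? batch.steps.findIndex((s) => !s.success);
  const failed = batch.steps[index];
  return `failed at step ${index} (${failed?.action ?? 'unknown'}): ${failed?.error ?? 'no error message'}`;
}

// Hand-written sample data, not real library output.
const sample: BatchResult = {
  success: false,
  steps: [
    { action: 'goto', success: true, durationMs: 1200 },
    { action: 'fill', success: false, durationMs: 30000, error: 'element not found' },
  ],
  totalDurationMs: 31200,
  stoppedAtIndex: 1,
};
console.log(summarizeBatch(sample)); // failed at step 1 (fill): element not found
```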
+ ### AI-Optimized Snapshots
+
+ Get the page state in a format suited to LLMs:
+
+ ```typescript
+ const snapshot = await page.snapshot();
+
+ // Structured accessibility tree
+ console.log(snapshot.accessibilityTree);
+
+ // Interactive elements with refs
+ console.log(snapshot.interactiveElements);
+ // [{ ref: 'e1', role: 'button', name: 'Submit', selector: '...' }, ...]
+
+ // Text representation for LLMs
+ console.log(snapshot.text);
+ // - main [ref=e1]
+ // - heading "Welcome" [ref=e2]
+ // - button "Get Started" [ref=e3]
+ // - textbox [ref=e4] placeholder="Email"
+ ```
+
+ ### Ref-Based Selectors
+
+ After taking a snapshot, use element refs directly as selectors:
+
+ ```typescript
+ const snapshot = await page.snapshot();
+ // Output shows: button "Submit" [ref=e4]
+
+ // Click using the ref - no fragile CSS needed
+ await page.click('ref:e4');
+
+ // Fill input by ref
+ await page.fill('ref:e23', 'hello@example.com');
+
+ // Combine ref with CSS fallbacks
+ await page.click(['ref:e4', '#submit', 'button[type=submit]']);
+ ```
+
+ Refs are stable until page navigation. Always take a fresh snapshot after navigating.
+
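An agent that only sees `snapshot.text` needs to turn a line such as `button "Get Started" [ref=e3]` into a `ref:` selector. The following is a minimal sketch under the assumption that the text format matches the sample lines shown in this README; `parseRefs` and `refSelector` are hypothetical helpers, and where available the structured `snapshot.interactiveElements` array is likely the more reliable source.

```typescript
// Hypothetical parser for the text snapshot format shown in this README,
// e.g. `- button "Get Started" [ref=e3]`. Not part of browser-pilot.
interface RefEntry {
  ref: string;
  role: string;
  name?: string;
}

function parseRefs(snapshotText: string): RefEntry[] {
  const entries: RefEntry[] = [];
  // Captures: role, optional quoted name, ref id.
  const pattern = /^\s*-\s*(\w+)(?:\s+"([^"]*)")?\s*\[ref=(\w+)\]/;
  for (const raw of snapshotText.split('\n')) {
    const m = pattern.exec(raw);
    if (m) entries.push({ ref: m[3], role: m[1], name: m[2] });
  }
  return entries;
}

// Build a `ref:` selector for the first element with the given role and name.
function refSelector(entries: RefEntry[], role: string, name: string): string | undefined {
  const hit = entries.find((e) => e.role === role && e.name === name);
  return hit ? `ref:${hit.ref}` : undefined;
}

const text = [
  '- main [ref=e1]',
  '- heading "Welcome" [ref=e2]',
  '- button "Get Started" [ref=e3]',
  '- textbox [ref=e4] placeholder="Email"',
].join('\n');
console.log(refSelector(parseRefs(text), 'button', 'Get Started')); // ref:e3
```

Because refs stop being valid after navigation, a map built this way should be discarded and rebuilt from a fresh snapshot whenever the page changes.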
+ ## Page API
+
+ ### Navigation
+
+ ```typescript
+ await page.goto(url, options?)
+ await page.reload(options?)
+ await page.goBack(options?)
+ await page.goForward(options?)
+
+ const url = await page.url()
+ const title = await page.title()
+ ```
+
+ ### Actions
+
+ All actions accept `string | string[]` for selectors:
+
+ ```typescript
+ await page.click(selector, options?)
+ await page.fill(selector, value, options?) // clears first by default
+ await page.type(selector, text, options?) // types character by character
+ await page.select(selector, value, options?) // native <select>
+ await page.select({ trigger, option, value, match }, options?) // custom dropdown
+ await page.check(selector, options?)
+ await page.uncheck(selector, options?)
+ await page.submit(selector, options?) // tries Enter, then click
+ await page.press(key)
+ await page.focus(selector, options?)
+ await page.hover(selector, options?)
+ await page.scroll(selector, options?)
+ ```
+
+ ### Waiting
+
+ ```typescript
+ await page.waitFor(selector, { state: 'visible' | 'hidden' | 'attached' | 'detached' })
+ await page.waitForNavigation(options?)
+ await page.waitForNetworkIdle({ idleTime: 500 })
+ ```
+
+ ### Content
+
+ ```typescript
+ const snapshot = await page.snapshot()
+ const text = await page.text(selector?)
+ const screenshot = await page.screenshot({ format: 'png', fullPage: true })
+ const result = await page.evaluate(() => document.title)
+ ```
+
+ ### Files
+
+ ```typescript
+ await page.setInputFiles(selector, [{ name: 'file.pdf', mimeType: 'application/pdf', buffer: data }])
+ const download = await page.waitForDownload(() => page.click('#download-btn'))
+ ```
+
+ ### Emulation
+
+ ```typescript
+ import { devices } from 'browser-pilot';
+
+ await page.emulate(devices['iPhone 14']); // Full device emulation
+ await page.setViewport({ width: 1280, height: 720, deviceScaleFactor: 2 });
+ await page.setUserAgent('Custom UA');
+ await page.setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
+ await page.setTimezone('America/New_York');
+ await page.setLocale('fr-FR');
+ ```
+
+ Devices: `iPhone 14`, `iPhone 14 Pro Max`, `Pixel 7`, `iPad Pro 11`, `Desktop Chrome`, `Desktop Firefox`
+
+ ### Request Interception
+
+ ```typescript
+ // Block images and fonts
+ await page.blockResources(['Image', 'Font']);
+
+ // Mock API responses
+ await page.route('**/api/users', { status: 200, body: { users: [] } });
+
+ // Full control
+ await page.intercept('*api*', async (request, actions) => {
+   if (request.url.includes('blocked')) await actions.fail();
+   else await actions.continue({ headers: { ...request.headers, 'X-Custom': 'value' } });
+ });
+ ```
+
+ ### Cookies & Storage
+
+ ```typescript
+ // Cookies
+ const cookies = await page.cookies();
+ await page.setCookie({ name: 'session', value: 'abc', domain: '.example.com' });
+ await page.clearCookies();
+
+ // localStorage / sessionStorage
+ await page.setLocalStorage('key', 'value');
+ const value = await page.getLocalStorage('key');
+ await page.clearLocalStorage();
+ ```
+
+ ### Console & Dialogs
+
+ ```typescript
+ // Capture console messages
+ await page.onConsole((msg) => console.log(`[${msg.type}] ${msg.text}`));
+
+ // Handle dialogs (alert, confirm, prompt)
+ await page.onDialog(async (dialog) => {
+   if (dialog.type === 'confirm') await dialog.accept();
+   else await dialog.dismiss();
+ });
+
+ // Collect messages during an action
+ const { result, messages } = await page.collectConsole(async () => {
+   return await page.click('#button');
+ });
+ ```
+
+ **Important:** Native browser dialogs (`alert()`, `confirm()`, `prompt()`) block all CDP commands until handled. Always set up a dialog handler before triggering actions that may show dialogs.
+
+ ### Iframes
+
+ Switch context to interact with iframe content:
+
+ ```typescript
+ // Switch to iframe
+ await page.switchToFrame('iframe#payment');
+
+ // Now actions target the iframe
+ await page.fill('#card-number', '4242424242424242');
+ await page.fill('#expiry', '12/25');
+
+ // Switch back to main document
+ await page.switchToMain();
+ await page.click('#submit-order');
+ ```
+
+ Note: Cross-origin iframes cannot be accessed due to browser security.
+
+ ### Options
+
+ ```typescript
+ interface ActionOptions {
+   timeout?: number; // default: 30000ms
+   optional?: boolean; // return false instead of throwing on failure
+ }
+ ```
+
+ ## CLI
+
+ The CLI provides session persistence for interactive workflows:
+
+ ```bash
+ # Connect to a browser
+ bp connect --provider browserbase --name my-session
+ bp connect --provider generic # auto-discovers local Chrome
+
+ # Execute actions
+ bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
+ bp exec -s my-session '[
+   {"action":"fill","selector":"#search","value":"browser automation"},
+   {"action":"submit","selector":"#search-form"}
+ ]'
+
+ # Get page state (note the refs in output)
+ bp snapshot -s my-session --format text
+ # Output: button "Submit" [ref=e4], textbox "Email" [ref=e5], ...
+
+ # Use refs from snapshot for reliable targeting
+ bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
+ bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'
+
+ # Handle native dialogs (alert/confirm/prompt)
+ bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'
+
+ # Other commands
+ bp text -s my-session --selector ".main-content"
+ bp screenshot -s my-session --output page.png
+ bp list # list all sessions
+ bp close -s my-session # close session
+ bp actions # show complete action reference
+ ```
+
+ ### CLI for AI Agents
+
+ The CLI is designed for AI agent tool calls. The recommended workflow:
+
+ 1. **Take a snapshot** to see the page structure with refs
+ 2. **Use refs** (`ref:e4`) for reliable element targeting
+ 3. **Batch actions** to reduce round trips
+
+ ```bash
+ # Step 1: Get page state with refs
+ bp snapshot --format text
+ # Output shows: button "Add to Cart" [ref=e12], textbox "Search" [ref=e5]
+
+ # Step 2: Use refs to interact (stable, no CSS guessing)
+ bp exec '[
+   {"action":"fill","selector":"ref:e5","value":"laptop"},
+   {"action":"click","selector":"ref:e12"},
+   {"action":"snapshot"}
+ ]' --output json
+ ```
+
+ Multi-selector fallbacks for robustness:
+ ```bash
+ bp exec '[
+   {"action":"click","selector":["ref:e4","#submit","button[type=submit]"]}
+ ]'
+ ```
+
+ Example output for the Step 2 batch above (`--output json`):
+ ```json
+ {
+   "success": true,
+   "steps": [
+     {"action": "fill", "success": true, "durationMs": 30},
+     {"action": "click", "success": true, "durationMs": 50, "selectorUsed": "ref:e12"},
+     {"action": "snapshot", "success": true, "durationMs": 100, "result": "..."}
+   ],
+   "totalDurationMs": 180
+ }
+ ```
+
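When an agent builds `bp exec` invocations programmatically, the JSON payload has to survive shell quoting. The sketch below shows one way to do that with POSIX single-quote escaping. `shellQuote` and `buildExecCommand` are hypothetical helpers, the `CliAction` shape covers only the fields used in the examples above, and the only CLI syntax relied on is the `bp exec -s <session> '<json>'` form already shown.

```typescript
// Action shape limited to the fields used in the CLI examples above (an assumption).
interface CliAction {
  action: string;
  selector?: string | string[];
  value?: string;
  url?: string;
}

// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build a `bp exec` command line for a list of actions.
function buildExecCommand(actions: CliAction[], session?: string): string {
  const parts = ['bp', 'exec'];
  if (session) parts.push('-s', shellQuote(session));
  parts.push(shellQuote(JSON.stringify(actions)));
  return parts.join(' ');
}

const command = buildExecCommand([{ action: 'click', selector: 'ref:e4' }], 'my-session');
console.log(command);
// bp exec -s 'my-session' '[{"action":"click","selector":"ref:e4"}]'
```

Where the runtime allows it, passing the JSON as a separate argv entry to a process-spawning API avoids shell quoting altogether; the string form is mainly useful for tools that accept a single command line.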
+ Run `bp actions` for complete action reference.
+
+ ## Examples
+
+ ### Login Flow with Error Handling
+
+ ```typescript
+ const result = await page.batch([
+   { action: 'goto', url: 'https://app.example.com/login' },
+   { action: 'fill', selector: ['#email', 'input[name=email]'], value: email },
+   { action: 'fill', selector: ['#password', 'input[name=password]'], value: password },
+   { action: 'click', selector: '.remember-me', optional: true },
+   { action: 'submit', selector: ['#login', 'button[type=submit]'] },
+ ], { onFail: 'stop' });
+
+ if (!result.success) {
+   console.error(`Failed at step ${result.stoppedAtIndex}: ${result.steps[result.stoppedAtIndex!].error}`);
+ }
+ ```
+
+ ### Custom Dropdown
+
+ ```typescript
+ // Using the custom select config
+ await page.select({
+   trigger: '.country-dropdown',
+   option: '.dropdown-option',
+   value: 'United States',
+   match: 'text', // or 'contains' or 'value'
+ });
+
+ // Or compose from primitives
+ await page.click('.country-dropdown');
+ await page.fill('.dropdown-search', 'United');
+ await page.click('.dropdown-option:has-text("United States")');
+ ```
+
+ ### Cloudflare Workers
+
+ ```typescript
+ export default {
+   async fetch(request: Request, env: Env): Promise<Response> {
+     const browser = await connect({
+       provider: 'browserbase',
+       apiKey: env.BROWSERBASE_API_KEY,
+     });
+
+     const page = await browser.page();
+     await page.goto('https://example.com');
+     const snapshot = await page.snapshot();
+
+     await browser.close();
+
+     return Response.json({ title: snapshot.title, elements: snapshot.interactiveElements });
+   },
+ };
+ ```
+
+ ### AI Agent Tool Definition
+
+ ```typescript
+ const browserTool = {
+   name: 'browser_action',
+   description: 'Execute browser actions and get page state',
+   parameters: {
+     type: 'object',
+     properties: {
+       actions: {
+         type: 'array',
+         items: {
+           type: 'object',
+           properties: {
+             action: { enum: ['goto', 'click', 'fill', 'submit', 'snapshot'] },
+             selector: { type: ['string', 'array'] },
+             value: { type: 'string' },
+             url: { type: 'string' },
+           },
+         },
+       },
+     },
+   },
+   execute: async ({ actions }) => {
+     const page = await getOrCreatePage();
+     return page.batch(actions);
+   },
+ };
+ ```
+
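Model-generated tool arguments are not guaranteed to follow the schema, so it can help to check them before handing them to `page.batch`. The sketch below is one possible pre-flight check, not something the package provides: `validateActions` is a hypothetical helper, the allowed action names are copied from the `enum` above, and the rules that `goto` needs a `url` and `fill` needs a `value` are assumptions made for illustration.

```typescript
// Action names taken from the tool definition's enum above.
const ALLOWED_ACTIONS = ['goto', 'click', 'fill', 'submit', 'snapshot'] as const;
type AllowedAction = (typeof ALLOWED_ACTIONS)[number];

interface ToolAction {
  action: AllowedAction;
  selector?: string | string[];
  value?: string;
  url?: string;
}

// Hypothetical validator: returns a list of problems, empty when the input looks usable.
function validateActions(input: unknown): string[] {
  if (!Array.isArray(input)) return ['actions must be an array'];
  const problems: string[] = [];
  input.forEach((item, i) => {
    if (typeof item !== 'object' || item === null) {
      problems.push(`step ${i}: not an object`);
      return;
    }
    const step = item as Partial<ToolAction>;
    const known = ALLOWED_ACTIONS.some((name) => name === step.action);
    if (!known) {
      problems.push(`step ${i}: unknown action ${String(step.action)}`);
    } else if (step.action === 'goto' && typeof step.url !== 'string') {
      problems.push(`step ${i}: goto needs a url`); // assumed rule
    } else if (step.action === 'fill' && typeof step.value !== 'string') {
      problems.push(`step ${i}: fill needs a value`); // assumed rule
    }
  });
  return problems;
}

const problems = validateActions([{ action: 'goto' }, { action: 'hack' }]);
console.log(problems);
```

Returning the problem list to the model instead of throwing gives the agent a chance to correct its own call on the next turn.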
+ ## Advanced
+
+ ### Direct CDP Access
+
+ ```typescript
+ const browser = await connect({ provider: 'generic' });
+ const cdp = browser.cdpClient;
+
+ // Send any CDP command
+ await cdp.send('Emulation.setDeviceMetricsOverride', {
+   width: 375,
+   height: 812,
+   deviceScaleFactor: 3,
+   mobile: true,
+ });
+ ```
+
+ ### Tracing
+
+ ```typescript
+ import { enableTracing } from 'browser-pilot';
+
+ enableTracing({ output: 'console' });
+ // [info] goto https://example.com ✓ (1200ms)
+ // [info] click #submit ✓ (50ms)
+ ```
+
+ ## AI Agent Integration
+
+ browser-pilot is designed for AI agents. Two resources for agent setup:
+
+ - **[llms.txt](./docs/llms.txt)** - Abbreviated reference for LLM context windows
+ - **[Claude Code Skill](./docs/skill/SKILL.md)** - Full skill for Claude Code agents
+
+ To use with Claude Code, copy `docs/skill/` to your project or reference it in your agent's context.
+
+ ## Documentation
+
+ See the [docs](./docs) folder for detailed documentation:
+
+ - [Getting Started](./docs/getting-started.md)
+ - [Providers](./docs/providers.md)
+ - [Multi-Selector Guide](./docs/guides/multi-selector.md)
+ - [Batch Actions](./docs/guides/batch-actions.md)
+ - [Snapshots](./docs/guides/snapshots.md)
+ - [CLI Reference](./docs/cli.md)
+ - [API Reference](./docs/api/page.md)
+
+ ## License
+
+ MIT