browser-pilot 0.0.13 → 0.0.15

package/README.md CHANGED
@@ -6,38 +6,19 @@
6
6
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue?style=flat&logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
7
7
  [![License](https://img.shields.io/npm/l/browser-pilot.svg)](https://github.com/svilupp/browser-pilot/blob/main/LICENSE)
8
8
 
9
- Lightweight CDP-based browser automation for AI agents. Zero dependencies, works in Node.js, Bun, and Cloudflare Workers.
9
+ Automation-first CDP browser control for AI agents.
10
10
 
11
- ```typescript
12
- import { connect } from 'browser-pilot';
13
-
14
- const browser = await connect({ provider: 'browserbase', apiKey: process.env.BROWSERBASE_API_KEY });
15
- const page = await browser.page();
16
-
17
- await page.goto('https://example.com/login');
18
- await page.fill(['#email', 'input[type=email]'], 'user@example.com');
19
- await page.fill(['#password', 'input[type=password]'], 'secret');
20
- await page.submit(['#login-btn', 'button[type=submit]']);
21
-
22
- const snapshot = await page.snapshot();
23
- console.log(snapshot.text); // Accessibility tree as text
11
+ Browser Pilot now teaches one workflow model:
24
12
 
25
- await browser.close();
26
- ```
27
-
28
- ## Why browser-pilot?
13
+ - inspect the page
14
+ - act in the browser
15
+ - record a manual workflow
16
+ - trace behavior over time
17
+ - exercise voice/media and browser conditions
29
18
 
30
- | Problem with Playwright/Puppeteer | browser-pilot Solution |
31
- |-----------------------------------|------------------------|
32
- | Won't run in Cloudflare Workers | Pure Web Standard APIs, zero Node.js dependencies |
33
- | Bun CDP connection bugs | Custom CDP client that works everywhere |
34
- | Single-selector API (fragile) | Multi-selector by default: `['#primary', '.fallback']` |
35
- | No action batching (high latency) | Batch DSL: one call for entire sequences |
36
- | No inline assertions (extra API calls to verify) | Built-in assertions: verify state within the same batch |
37
- | No AI-optimized snapshots | Built-in accessibility tree extraction |
38
- | No audio I/O for voice agents | Mic injection + output capture + Whisper transcription |
19
+ `record` and `trace` are two interfaces over the same capture system. `record` writes the canonical artifact. `trace` explains either a live session or a saved artifact.
39
20
 
40
- ## Installation
21
+ ## Install
41
22
 
42
23
  ```bash
43
24
  bun add browser-pilot
@@ -45,634 +26,118 @@ bun add browser-pilot
45
26
  npm install browser-pilot
46
27
  ```
47
28
 
48
- ## Providers
49
-
50
- ### BrowserBase (Recommended for production)
51
-
52
- ```typescript
53
- const browser = await connect({
54
- provider: 'browserbase',
55
- apiKey: process.env.BROWSERBASE_API_KEY,
56
- projectId: process.env.BROWSERBASE_PROJECT_ID, // optional
57
- });
58
- ```
59
-
60
- ### Browserless
61
-
62
- ```typescript
63
- const browser = await connect({
64
- provider: 'browserless',
65
- apiKey: process.env.BROWSERLESS_API_KEY,
66
- });
67
- ```
68
-
69
- ### Generic (Local Chrome)
29
+ For local Chrome:
70
30
 
71
31
  ```bash
72
- # Start Chrome with remote debugging
73
- chrome --remote-debugging-port=9222
74
- ```
75
-
76
- ```typescript
77
- const browser = await connect({
78
- provider: 'generic',
79
- wsUrl: 'ws://localhost:9222/devtools/browser/...', // optional, auto-discovers
80
- });
81
- ```
82
-
83
- ## Core Concepts
84
-
85
- ### Multi-Selector (Robust Automation)
86
-
87
- Every action accepts `string | string[]`. When given an array, tries each selector in order until one works:
88
-
89
- ```typescript
90
- // Tries #submit first, falls back to alternatives
91
- await page.click(['#submit', 'button[type=submit]', '.submit-btn']);
92
-
93
- // Cookie consent - try multiple common patterns
94
- await page.click([
95
- '#accept-cookies',
96
- '.cookie-accept',
97
- 'button:has-text("Accept")',
98
- '[data-testid="cookie-accept"]'
99
- ], { optional: true, timeout: 3000 });
100
- ```
101
-
102
- ### Built-in Waiting
103
-
104
- Every action automatically waits for the element to be visible before interacting:
105
-
106
- ```typescript
107
- // No separate waitFor needed - this waits automatically
108
- await page.click('.dynamic-button', { timeout: 5000 });
109
-
110
- // Explicit waiting when needed
111
- await page.waitFor('.loading', { state: 'hidden' });
112
- await page.waitForNavigation();
113
- await page.waitForNetworkIdle();
114
- ```
115
-
116
- ### Batch Actions
117
-
118
- Execute multiple actions in a single call with full result tracking:
119
-
120
- ```typescript
121
- const result = await page.batch([
122
- { action: 'goto', url: 'https://example.com/login' },
123
- { action: 'fill', selector: '#email', value: 'user@example.com' },
124
- { action: 'fill', selector: '#password', value: 'secret' },
125
- { action: 'submit', selector: '#login-btn' },
126
- { action: 'wait', waitFor: 'navigation' },
127
- { action: 'snapshot' },
128
- ]);
129
-
130
- console.log(result.success); // true if all steps succeeded
131
- console.log(result.totalDurationMs); // total execution time
132
- console.log(result.steps[5].result); // snapshot from step 5
133
- ```
134
-
135
- Assertion steps verify expected state within the same batch — no extra round trips. Available: `assertVisible`, `assertExists`, `assertText`, `assertUrl`, `assertValue`.
136
-
137
- ```typescript
138
- const result = await page.batch([
139
- { action: 'goto', url: 'https://example.com/login' },
140
- { action: 'fill', selector: '#email', value: 'user@example.com' },
141
- { action: 'fill', selector: '#password', value: 'secret' },
142
- { action: 'submit', selector: '#login-btn' },
143
- { action: 'assertUrl', expect: '/dashboard' },
144
- { action: 'assertVisible', selector: '.welcome-message' },
145
- ]);
146
- ```
147
-
148
- Any step supports `retry` and `retryDelay` for flaky or async content:
149
-
150
- ```typescript
151
- { action: 'assertVisible', selector: '.async-content', retry: 3, retryDelay: 1000 }
152
- ```
153
-
154
- ### AI-Optimized Snapshots
155
-
156
- Get the page state in a format perfect for LLMs:
157
-
158
- ```typescript
159
- const snapshot = await page.snapshot();
160
-
161
- // Structured accessibility tree
162
- console.log(snapshot.accessibilityTree);
163
-
164
- // Interactive elements with refs
165
- console.log(snapshot.interactiveElements);
166
- // [{ ref: 'e1', role: 'button', name: 'Submit', selector: '...' }, ...]
167
-
168
- // Text representation for LLMs
169
- console.log(snapshot.text);
170
- // - main ref:e1
171
- // - heading "Welcome" ref:e2
172
- // - button "Get Started" ref:e3
173
- // - textbox ref:e4 placeholder="Email"
174
- ```
175
-
176
- ### Ref-Based Selectors
177
-
178
- After taking a snapshot, use element refs directly as selectors:
179
-
180
- ```typescript
181
- const snapshot = await page.snapshot();
182
- // Output shows: button "Submit" ref:e4
183
-
184
- // Click using the ref - no fragile CSS needed
185
- await page.click('ref:e4');
186
-
187
- // Fill input by ref
188
- await page.fill('ref:e23', 'hello@example.com');
189
-
190
- // Combine ref with CSS fallbacks
191
- await page.click(['ref:e4', '#submit', 'button[type=submit]']);
32
+ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
33
+ --remote-debugging-port=9222 \
34
+ --user-data-dir=/tmp/browser-pilot-profile
192
35
  ```
193
36
 
194
- Refs are stable until page navigation. Always take a fresh snapshot after navigating.
195
- CLI note: refs are cached per session+URL after a snapshot, so you can reuse them across CLI calls
196
- until navigation changes the URL.
37
+ ## Choose the command by job
197
38
 
198
- ## Page API
199
-
200
- ### Navigation
201
-
202
- ```typescript
203
- await page.goto(url, options?)
204
- await page.reload(options?)
205
- await page.goBack(options?)
206
- await page.goForward(options?)
207
-
208
- const url = await page.url()
209
- const title = await page.title()
210
- ```
211
-
212
- ### Actions
213
-
214
- All actions accept `string | string[]` for selectors:
215
-
216
- ```typescript
217
- await page.click(selector, options?)
218
- await page.fill(selector, value, options?) // clears first by default
219
- await page.type(selector, text, options?) // types character by character
220
- await page.select(selector, value, options?) // native <select>
221
- await page.select({ trigger, option, value, match }, options?) // custom dropdown
222
- await page.check(selector, options?)
223
- await page.uncheck(selector, options?)
224
- await page.submit(selector, options?) // tries Enter, then click
225
- await page.press(key)
226
- await page.focus(selector, options?)
227
- await page.hover(selector, options?)
228
- await page.scroll(selector, options?)
229
- ```
230
-
231
- ### Waiting
232
-
233
- ```typescript
234
- await page.waitFor(selector, { state: 'visible' | 'hidden' | 'attached' | 'detached' })
235
- await page.waitForNavigation(options?)
236
- await page.waitForNetworkIdle({ idleTime: 500 })
237
- ```
238
-
239
- ### Content
240
-
241
- ```typescript
242
- const snapshot = await page.snapshot()
243
- const text = await page.text(selector?)
244
- const screenshot = await page.screenshot({ format: 'png', fullPage: true })
245
- const result = await page.evaluate(() => document.title)
246
- ```
247
-
248
- ### Files
249
-
250
- ```typescript
251
- await page.setInputFiles(selector, [{ name: 'file.pdf', mimeType: 'application/pdf', buffer: data }])
252
- const download = await page.waitForDownload(() => page.click('#download-btn'))
253
- ```
254
-
255
- ### Emulation
256
-
257
- ```typescript
258
- import { devices } from 'browser-pilot';
259
-
260
- await page.emulate(devices['iPhone 14']); // Full device emulation
261
- await page.setViewport({ width: 1280, height: 720, deviceScaleFactor: 2 });
262
- await page.setUserAgent('Custom UA');
263
- await page.setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
264
- await page.setTimezone('America/New_York');
265
- await page.setLocale('fr-FR');
266
- ```
39
+ | Job | Primary commands |
40
+ | --- | --- |
41
+ | Inspect page state | `snapshot`, `page`, `forms`, `text`, `targets`, `diagnose` |
42
+ | Act in the browser | `exec`, `run` |
43
+ | Capture a human demo | `record` |
44
+ | Investigate behavior over time | `trace` |
45
+ | Exercise voice/media | `audio` |
46
+ | Change browser conditions | `env` |
267
47
 
268
- Devices: `iPhone 14`, `iPhone 14 Pro Max`, `Pixel 7`, `iPad Pro 11`, `Desktop Chrome`, `Desktop Firefox`
269
-
270
- ### Audio I/O
271
-
272
- ```typescript
273
- // Set up audio input/output interception
274
- await page.setupAudio();
275
-
276
- // Play audio into the page's fake microphone
277
- await page.audioInput.play(wavBytes, { waitForEnd: true });
278
-
279
- // Capture audio output until silence
280
- const capture = await page.audioOutput.captureUntilSilence({ silenceTimeout: 5000 });
281
-
282
- // Full round-trip: play input → capture response
283
- const result = await page.audioRoundTrip({ input: wavBytes, silenceTimeout: 5000 });
284
-
285
- // Transcribe captured audio (requires OPENAI_API_KEY)
286
- import { transcribe } from 'browser-pilot';
287
- const { text } = await transcribe(capture);
288
- ```
289
-
290
- ### Request Interception
291
-
292
- ```typescript
293
- // Block images and fonts
294
- await page.blockResources(['Image', 'Font']);
295
-
296
- // Mock API responses
297
- await page.route('**/api/users', { status: 200, body: { users: [] } });
298
-
299
- // Full control
300
- await page.intercept('*api*', async (request, actions) => {
301
- if (request.url.includes('blocked')) await actions.fail();
302
- else await actions.continue({ headers: { ...request.headers, 'X-Custom': 'value' } });
303
- });
304
- ```
305
-
306
- ### Cookies & Storage
307
-
308
- ```typescript
309
- // Cookies
310
- const cookies = await page.cookies();
311
- await page.setCookie({ name: 'session', value: 'abc', domain: '.example.com' });
312
- await page.clearCookies();
313
-
314
- // localStorage / sessionStorage
315
- await page.setLocalStorage('key', 'value');
316
- const value = await page.getLocalStorage('key');
317
- await page.clearLocalStorage();
318
- ```
319
-
320
- ### Console & Dialogs
321
-
322
- ```typescript
323
- // Capture console messages
324
- await page.onConsole((msg) => console.log(`[${msg.type}] ${msg.text}`));
325
-
326
- // Handle dialogs (alert, confirm, prompt)
327
- await page.onDialog(async (dialog) => {
328
- if (dialog.type === 'confirm') await dialog.accept();
329
- else await dialog.dismiss();
330
- });
331
-
332
- // Collect messages during an action
333
- const { result, messages } = await page.collectConsole(async () => {
334
- return await page.click('#button');
335
- });
336
- ```
337
-
338
- **Important:** Native browser dialogs (`alert()`, `confirm()`, `prompt()`) block all CDP commands until handled. Always set up a dialog handler before triggering actions that may show dialogs.
339
-
340
- ### Iframes
341
-
342
- Switch context to interact with iframe content:
343
-
344
- ```typescript
345
- // Switch to iframe
346
- await page.switchToFrame('iframe#payment');
347
-
348
- // Now actions target the iframe
349
- await page.fill('#card-number', '4242424242424242');
350
- await page.fill('#expiry', '12/25');
351
-
352
- // Switch back to main document
353
- await page.switchToMain();
354
- await page.click('#submit-order');
355
- ```
356
-
357
- Note: Cross-origin iframes cannot be accessed due to browser security.
358
-
359
- ### Options
360
-
361
- ```typescript
362
- interface ActionOptions {
363
- timeout?: number; // default: 30000ms
364
- optional?: boolean; // return false instead of throwing on failure
365
- }
366
- ```
367
-
368
- ## CLI
369
-
370
- The CLI provides session persistence for interactive workflows:
48
+ ## Golden path 1: automate a page
371
49
 
372
50
  ```bash
373
- # Connect to a browser
374
- bp connect --provider browserbase --name my-session
375
- bp connect --provider generic # auto-discovers local Chrome
376
-
377
- # Execute actions
378
- bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
379
- bp exec -s my-session '[
380
- {"action":"fill","selector":"#search","value":"browser automation"},
381
- {"action":"submit","selector":"#search-form"}
382
- ]'
383
-
384
- # Get page state (note the refs in output)
385
- bp snapshot -s my-session --format text
386
- # Output: button "Submit" ref:e4, textbox "Email" ref:e5, ...
387
-
388
- # Use refs from snapshot for reliable targeting
389
- # Refs are cached per session+URL after snapshot
390
- bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
391
- bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'
392
-
393
- # Quick discovery commands
394
- bp page -s my-session # URL, title, headings, forms, interactive controls
395
- bp forms -s my-session # Structured form metadata only
396
- bp targets -s my-session # Browser tabs with targetIds
397
- bp connect --new-tab --url https://example.com --name fresh
398
-
399
- # Handle native dialogs (alert/confirm/prompt)
400
- bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'
401
-
402
- # Other commands
403
- bp text -s my-session --selector ".main-content"
404
- bp screenshot -s my-session --output page.png
405
- bp listen ws -m "*voice*" # monitor WebSocket traffic
406
- bp list # list all sessions
407
- bp close -s my-session # close session
408
- bp actions # show complete action reference
409
- bp run workflow.json # run a workflow file
410
-
411
- # Actions with inline assertions (no extra bp eval needed)
412
- bp exec '[
413
- {"action":"goto","url":"https://example.com/login"},
414
- {"action":"fill","selector":"#email","value":"user@example.com"},
415
- {"action":"submit","selector":"form"},
416
- {"action":"assertUrl","expect":"/dashboard"},
51
+ bp connect --provider generic --name dev
52
+ bp snapshot -i -s dev
53
+ bp exec -s dev '[
54
+ {"action":"fill","selector":"ref:e5","value":"user@example.com"},
55
+ {"action":"click","selector":"ref:e7"},
417
56
  {"action":"assertText","expect":"Welcome"}
418
57
  ]'
419
58
  ```
420
59
 
421
- ### CLI for AI Agents
60
+ Use `bp snapshot -i` first. Refs are the default targeting strategy.
422
61
 
423
- The CLI is designed for AI agent tool calls. The recommended workflow:
424
-
425
- 1. **Take snapshot** to see the page structure with refs
426
- 2. **Use refs** (`ref:e4`) for reliable element targeting
427
- 3. **Batch actions** to reduce round trips
62
+ ## Golden path 2: capture a manual workflow and derive automation
428
63
 
429
64
  ```bash
430
- # Step 1: Get page state with refs
431
- bp snapshot --format text
432
- # Output shows: button "Add to Cart" ref:e12, textbox "Search" ref:e5
433
-
434
- # Step 2: Use refs to interact (stable, no CSS guessing)
435
- bp exec '[
436
- {"action":"fill","selector":"ref:e5","value":"laptop"},
437
- {"action":"click","selector":"ref:e12"},
438
- {"action":"snapshot"}
439
- ]' --format json
65
+ bp record -s demo --profile automation -f ./artifacts/demo.recording.json
66
+ # perform the flow manually, then stop with Ctrl+C
67
+ bp record summary ./artifacts/demo.recording.json
68
+ bp record derive ./artifacts/demo.recording.json -o workflow.json
69
+ bp run workflow.json
440
70
  ```
441
71
 
442
- Multi-selector fallbacks for robustness:
443
- ```bash
444
- bp exec '[
445
- {"action":"click","selector":["ref:e4","#submit","button[type=submit]"]}
446
- ]'
447
- ```
448
-
449
- Output:
450
- ```json
451
- {
452
- "success": true,
453
- "steps": [
454
- {"action": "fill", "success": true, "durationMs": 30},
455
- {"action": "click", "success": true, "durationMs": 50, "selectorUsed": "ref:e12"},
456
- {"action": "snapshot", "success": true, "durationMs": 100, "result": "..."}
457
- ],
458
- "totalDurationMs": 180
459
- }
460
- ```
461
-
462
- Run `bp actions` for complete action reference.
463
-
464
- ### Voice Agent Testing
72
+ Do not start by opening the raw artifact. Use `record summary`, `record inspect`, or `trace summary --view ...` first.
465
73
 
466
- Test audio-based AI apps (voice assistants, phone agents) by injecting microphone input and capturing spoken responses.
467
-
468
- > **Full guide:** [Voice Agent Testing Guide](./docs/guides/voice-agent-testing.md)
74
+ ## Golden path 3: debug a realtime or voice session
469
75
 
470
76
  ```bash
471
- export OPENAI_API_KEY=sk-... # Required for --transcribe
472
-
473
- # Validate audio pipeline
474
- bp audio check -s my-session
475
- # Output: "READY for roundtrip" with agent AudioContext detected
476
-
477
- # Full round-trip: send audio prompt → wait for response → transcribe
478
- bp audio roundtrip -i prompt.wav --transcribe --silence-timeout 1500
479
- # Output: { "transcript": "Welcome! I'd be happy to help...", "latencyMs": 5200, ... }
480
-
481
- # Save response audio for manual review
482
- bp audio roundtrip -i prompt.wav -o response.wav --transcribe
483
- ```
484
-
485
- **Important:** Audio overrides must be injected before the voice agent initializes. Use `bp audio check` to validate the pipeline. See the [full guide](./docs/guides/voice-agent-testing.md) for setup order and troubleshooting.
486
-
487
- Programmatic API:
488
-
489
- ```typescript
490
- await page.setupAudio();
491
-
492
- const result = await page.audioRoundTrip({
493
- input: audioBytes,
494
- silenceTimeout: 1500,
495
- });
496
-
497
- import { transcribe } from 'browser-pilot';
498
- const { text } = await transcribe(result.audio);
499
- console.log(text); // "Welcome! I'd be happy to help..."
77
+ bp connect --provider generic --name realtime
78
+ bp trace start -s realtime --timeout 20000
79
+ # reproduce the issue in the app
80
+ bp trace summary -s realtime --view ws
81
+ bp trace summary -s realtime --view console
500
82
  ```
501
83
 
502
- ### Recording Browser Actions
503
-
504
- Record human interactions to create automation recipes:
84
+ Voice workflow:
505
85
 
506
86
  ```bash
507
- # Auto-connect to local Chrome and record (creates new session)
508
- bp record
509
-
510
- # Use most recent session
511
- bp record -s
512
-
513
- # Use specific session with custom output file
514
- bp record -s my-session -f login-flow.json
515
-
516
- # Review and edit the recording
517
- cat recording.json
518
-
519
- # Replay the recording
520
- bp exec -s my-session --file recording.json
521
- ```
522
-
523
- The output format is compatible with `page.batch()`:
524
- ```json
525
- {
526
- "recordedAt": "2026-01-06T10:00:00.000Z",
527
- "startUrl": "https://example.com",
528
- "duration": 15000,
529
- "steps": [
530
- { "action": "fill", "selector": ["[data-testid=\"email\"]", "#email"], "value": "user@example.com" },
531
- { "action": "click", "selector": ["[data-testid=\"submit\"]", "#login-btn"] }
532
- ]
533
- }
534
- ```
535
-
536
- **Notes:**
537
- - Password fields are automatically redacted as `[REDACTED]`
538
- - Selectors are multi-selector arrays ordered by reliability (data attributes > IDs > CSS paths)
539
- - Edit the JSON to adjust selectors or add `optional: true` flags
540
-
541
- ## Examples
542
-
543
- ### Login Flow with Error Handling
544
-
545
- ```typescript
546
- const result = await page.batch([
547
- { action: 'goto', url: 'https://app.example.com/login' },
548
- { action: 'fill', selector: ['#email', 'input[name=email]'], value: email },
549
- { action: 'fill', selector: ['#password', 'input[name=password]'], value: password },
550
- { action: 'click', selector: '.remember-me', optional: true },
551
- { action: 'submit', selector: ['#login', 'button[type=submit]'] },
552
- ], { onFail: 'stop' });
553
-
554
- if (!result.success) {
555
- console.error(`Failed at step ${result.stoppedAtIndex}: ${result.steps[result.stoppedAtIndex!].error}`);
556
- }
87
+ bp audio setup -s realtime
88
+ bp exec -s realtime '{"action":"goto","url":"https://my-voice-app.com"}'
89
+ bp audio check -s realtime
90
+ bp audio roundtrip -s realtime -i prompt.wav --transcribe -o response.wav
91
+ bp trace summary -s realtime --view voice
557
92
  ```
558
93
 
559
- ### Custom Dropdown
94
+ ## Golden path 4: exercise failure modes
560
95
 
561
- ```typescript
562
- // Using the custom select config
563
- await page.select({
564
- trigger: '.country-dropdown',
565
- option: '.dropdown-option',
566
- value: 'United States',
567
- match: 'text', // or 'contains' or 'value'
568
- });
569
-
570
- // Or compose from primitives
571
- await page.click('.country-dropdown');
572
- await page.fill('.dropdown-search', 'United');
573
- await page.click('.dropdown-option:has-text("United States")');
96
+ ```bash
97
+ bp env permissions grant -s realtime microphone
98
+ bp env network offline -s realtime --duration 5000
99
+ bp trace watch -s realtime --view ws --assert profile:reconnect --timeout 15000
100
+ bp env visibility hidden -s realtime
574
101
  ```
575
102
 
576
- ### Cloudflare Workers
103
+ ## What is new in the model
577
104
 
578
- ```typescript
579
- export default {
580
- async fetch(request: Request, env: Env): Promise<Response> {
581
- const browser = await connect({
582
- provider: 'browserbase',
583
- apiKey: env.BROWSERBASE_API_KEY,
584
- });
585
-
586
- const page = await browser.page();
587
- await page.goto('https://example.com');
588
- const snapshot = await page.snapshot();
589
-
590
- await browser.close();
591
-
592
- return Response.json({ title: snapshot.title, elements: snapshot.interactiveElements });
593
- },
594
- };
595
- ```
105
+ - One canonical artifact model with `version: 2`
106
+ - One canonical trace event stream for recording, live trace, and session logs
107
+ - Trace-backed waits and assertions in `exec` / `run`
108
+ - `listen` preserved as a compatibility alias to `trace tail`
109
+ - `audio` for active control, `trace` for explanation, `env` for browser-state controls
596
110
 
597
- ### AI Agent Tool Definition
111
+ ## Programmatic example
598
112
 
599
113
  ```typescript
600
- const browserTool = {
601
- name: 'browser_action',
602
- description: 'Execute browser actions and get page state',
603
- parameters: {
604
- type: 'object',
605
- properties: {
606
- actions: {
607
- type: 'array',
608
- items: {
609
- type: 'object',
610
- properties: {
611
- action: { enum: ['goto', 'click', 'fill', 'submit', 'snapshot', 'assertVisible', 'assertExists', 'assertText', 'assertUrl', 'assertValue'] },
612
- selector: { type: ['string', 'array'] },
613
- value: { type: 'string' },
614
- url: { type: 'string' },
615
- },
616
- },
617
- },
618
- },
619
- },
620
- execute: async ({ actions }) => {
621
- const page = await getOrCreatePage();
622
- return page.batch(actions);
623
- },
624
- };
625
- ```
626
-
627
- ## Advanced
628
-
629
- ### Direct CDP Access
114
+ import { connect } from 'browser-pilot';
630
115
 
631
- ```typescript
632
116
  const browser = await connect({ provider: 'generic' });
633
- const cdp = browser.cdpClient;
634
-
635
- // Send any CDP command
636
- await cdp.send('Emulation.setDeviceMetricsOverride', {
637
- width: 375,
638
- height: 812,
639
- deviceScaleFactor: 3,
640
- mobile: true,
641
- });
642
- ```
643
-
644
- ### Tracing
117
+ const page = await browser.page();
645
118
 
646
- ```typescript
647
- import { enableTracing } from 'browser-pilot';
119
+ await page.batch([
120
+ { action: 'goto', url: 'https://example.com/login' },
121
+ { action: 'fill', selector: ['#email', 'input[type=email]'], value: 'user@example.com' },
122
+ { action: 'submit', selector: 'form' },
123
+ { action: 'assertUrl', expect: '/dashboard' },
124
+ ]);
648
125
 
649
- enableTracing({ output: 'console' });
650
- // [info] goto https://example.com ✓ (1200ms)
651
- // [info] click #submit ✓ (50ms)
126
+ await browser.close();
652
127
  ```
653
128
 
654
- ## AI Agent Integration
655
-
656
- browser-pilot is designed for AI agents. Two resources for agent setup:
657
-
658
- - **[llms.txt](./docs/llms.txt)** - Abbreviated reference for LLM context windows
659
- - **[Claude Code Skill](./docs/skill/SKILL.md)** - Full skill for Claude Code agents
660
-
661
- To use with Claude Code, copy `docs/skill/` to your project or reference it in your agent's context.
662
-
663
- ## Documentation
664
-
665
- See the [docs](./docs) folder for detailed documentation:
129
+ ## Guides
666
130
 
667
- - [Getting Started](./docs/getting-started.md)
668
- - [Providers](./docs/providers.md)
669
- - [Multi-Selector Guide](./docs/guides/multi-selector.md)
670
- - [Batch Actions](./docs/guides/batch-actions.md)
671
- - [Snapshots](./docs/guides/snapshots.md)
672
- - [Voice Agent Testing](./docs/guides/voice-agent-testing.md)
673
- - [CLI Reference](./docs/cli.md)
674
- - [API Reference](./docs/api/page.md)
131
+ - [CLI guide](./docs/cli.md)
132
+ - [Automation workflows](./docs/guides/automation-workflows.md)
133
+ - [Action recording](./docs/guides/action-recording.md)
134
+ - [Trace workflows](./docs/guides/trace-workflows.md)
135
+ - [Realtime debugging](./docs/guides/realtime-debugging.md)
136
+ - [Voice agent testing](./docs/guides/voice-agent-testing.md)
137
+ - [Artifact analysis](./docs/guides/artifact-analysis.md)
138
+ - [LLM contract](./docs/llms.txt)
675
139
 
676
- ## License
140
+ ## Compatibility notes
677
141
 
678
- MIT
142
+ - Prefer `--debug` for transport logging. `--trace` still works as a legacy alias.
143
+ - Prefer `bp trace tail ...`. `bp listen ...` still works as a compatibility alias.
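
The programmatic example added in 0.0.15 discards the value returned by `page.batch()`. The removed 0.0.13 text described that result as carrying `success`, `steps`, `totalDurationMs`, and `stoppedAtIndex`. Assuming that shape still holds in 0.0.15 (this diff does not confirm it), a caller could report a failed step roughly as sketched below. `StepResult`, `BatchResult`, and `describeBatch` are hypothetical names defined locally so the sketch runs without the package installed; they are not imported from browser-pilot.

```typescript
// Hypothetical local types mirroring the result shape shown in the removed 0.0.13 README.
// They are NOT imported from browser-pilot; the real 0.0.15 types may differ.
interface StepResult {
  action: string;
  success: boolean;
  durationMs: number;
  error?: string;
}

interface BatchResult {
  success: boolean;
  steps: StepResult[];
  totalDurationMs: number;
  stoppedAtIndex?: number;
}

// Turn a batch result into a one-line, log-friendly summary.
function describeBatch(result: BatchResult): string {
  if (result.success) {
    return `ok: ${result.steps.length} steps in ${result.totalDurationMs}ms`;
  }
  // Prefer the reported stop index; otherwise fall back to the first failed step.
  const index = result.stoppedAtIndex ?? result.steps.findIndex((s) => !s.success);
  const step = index >= 0 ? result.steps[index] : undefined;
  if (!step) {
    return "failed: no failing step reported";
  }
  return `failed at step ${index} (${step.action}): ${step.error ?? "unknown error"}`;
}

// Sample data standing in for the value of `await page.batch([...])`.
const sample: BatchResult = {
  success: false,
  steps: [
    { action: "goto", success: true, durationMs: 1200 },
    { action: "assertUrl", success: false, durationMs: 30, error: "URL mismatch" },
  ],
  totalDurationMs: 1230,
  stoppedAtIndex: 1,
};

console.log(describeBatch(sample));
```

In real use, `sample` would be replaced by the awaited result of `page.batch([...])`, provided the installed version still returns these fields.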