browser-pilot 0.0.14 → 0.0.15

package/README.md CHANGED
@@ -6,38 +6,19 @@
6
6
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue?style=flat&logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
7
7
  [![License](https://img.shields.io/npm/l/browser-pilot.svg)](https://github.com/svilupp/browser-pilot/blob/main/LICENSE)
8
8
 
9
- Lightweight CDP-based browser automation for AI agents. Zero dependencies, works in Node.js, Bun, and Cloudflare Workers.
9
+ Automation-first CDP browser control for AI agents.
10
10
 
11
- ```typescript
12
- import { connect } from 'browser-pilot';
13
-
14
- const browser = await connect({ provider: 'browserbase', apiKey: process.env.BROWSERBASE_API_KEY });
15
- const page = await browser.page();
11
+ Browser Pilot now teaches one workflow model:
16
12
 
17
- await page.goto('https://example.com/login');
18
- await page.fill(['#email', 'input[type=email]'], 'user@example.com');
19
- await page.fill(['#password', 'input[type=password]'], 'secret');
20
- await page.submit(['#login-btn', 'button[type=submit]']);
13
+ - inspect the page
14
+ - act in the browser
15
+ - record a manual workflow
16
+ - trace behavior over time
17
+ - exercise voice/media and browser conditions
21
18
 
22
- const snapshot = await page.snapshot();
23
- console.log(snapshot.text); // Accessibility tree as text
24
-
25
- await browser.close();
26
- ```
19
+ `record` and `trace` are two interfaces over the same capture system. `record` writes the canonical artifact. `trace` explains either a live session or a saved artifact.
27
20
 
28
- ## Why browser-pilot?
29
-
30
- | Problem with Playwright/Puppeteer | browser-pilot Solution |
31
- |-----------------------------------|------------------------|
32
- | Won't run in Cloudflare Workers | Pure Web Standard APIs, zero Node.js dependencies |
33
- | Bun CDP connection bugs | Custom CDP client that works everywhere |
34
- | Single-selector API (fragile) | Multi-selector by default: `['#primary', '.fallback']` |
35
- | No action batching (high latency) | Batch DSL: one call for entire sequences |
36
- | No inline assertions (extra API calls to verify) | Built-in assertions: verify state within the same batch |
37
- | No AI-optimized snapshots | Built-in accessibility tree extraction |
38
- | No audio I/O for voice agents | Mic injection + output capture + Whisper transcription |
39
-
40
- ## Installation
21
+ ## Install
41
22
 
42
23
  ```bash
43
24
  bun add browser-pilot
@@ -45,690 +26,118 @@ bun add browser-pilot
45
26
  npm install browser-pilot
46
27
  ```
47
28
 
48
- ## Providers
49
-
50
- ### BrowserBase (Recommended for production)
51
-
52
- ```typescript
53
- const browser = await connect({
54
- provider: 'browserbase',
55
- apiKey: process.env.BROWSERBASE_API_KEY,
56
- projectId: process.env.BROWSERBASE_PROJECT_ID, // optional
57
- });
58
- ```
59
-
60
- ### Browserless
61
-
62
- ```typescript
63
- const browser = await connect({
64
- provider: 'browserless',
65
- apiKey: process.env.BROWSERLESS_API_KEY,
66
- });
67
- ```
68
-
69
- ### Generic (Local Chrome)
29
+ For local Chrome:
70
30
 
71
31
  ```bash
72
- # Start Chrome with remote debugging
73
- chrome --remote-debugging-port=9222
74
- ```
75
-
76
- ```typescript
77
- const browser = await connect({
78
- provider: 'generic',
79
- wsUrl: 'ws://localhost:9222/devtools/browser/...', // optional, auto-discovers
80
- });
81
- ```
82
-
83
- ## Core Concepts
84
-
85
- ### Multi-Selector (Robust Automation)
86
-
87
- Every action accepts `string | string[]`. When given an array, tries each selector in order until one works:
88
-
89
- ```typescript
90
- // Tries #submit first, falls back to alternatives
91
- await page.click(['#submit', 'button[type=submit]', '.submit-btn']);
92
-
93
- // Cookie consent - try multiple common patterns
94
- await page.click([
95
- '#accept-cookies',
96
- '.cookie-accept',
97
- 'button:has-text("Accept")',
98
- '[data-testid="cookie-accept"]'
99
- ], { optional: true, timeout: 3000 });
100
- ```
101
-
102
- ### Built-in Waiting
103
-
104
- Every action automatically waits for the element to be visible before interacting:
105
-
106
- ```typescript
107
- // No separate waitFor needed - this waits automatically
108
- await page.click('.dynamic-button', { timeout: 5000 });
109
-
110
- // Explicit waiting when needed
111
- await page.waitFor('.loading', { state: 'hidden' });
112
- await page.waitForNavigation();
113
- await page.waitForNetworkIdle();
114
- ```
115
-
116
- ### Batch Actions
117
-
118
- Execute multiple actions in a single call with full result tracking:
119
-
120
- ```typescript
121
- const result = await page.batch([
122
- { action: 'goto', url: 'https://example.com/login' },
123
- { action: 'fill', selector: '#email', value: 'user@example.com' },
124
- { action: 'fill', selector: '#password', value: 'secret' },
125
- { action: 'submit', selector: '#login-btn' },
126
- { action: 'wait', waitFor: 'navigation' },
127
- { action: 'snapshot' },
128
- ]);
129
-
130
- console.log(result.success); // true if all steps succeeded
131
- console.log(result.totalDurationMs); // total execution time
132
- console.log(result.steps[5].result); // snapshot from step 5
133
- ```
134
-
135
- Assertion steps verify expected state within the same batch — no extra round trips. Available: `assertVisible`, `assertExists`, `assertText`, `assertUrl`, `assertValue`.
136
-
137
- ```typescript
138
- const result = await page.batch([
139
- { action: 'goto', url: 'https://example.com/login' },
140
- { action: 'fill', selector: '#email', value: 'user@example.com' },
141
- { action: 'fill', selector: '#password', value: 'secret' },
142
- { action: 'submit', selector: '#login-btn' },
143
- { action: 'assertUrl', expect: '/dashboard' },
144
- { action: 'assertVisible', selector: '.welcome-message' },
145
- ]);
146
- ```
147
-
148
- Any step supports `retry` and `retryDelay` for flaky or async content:
149
-
150
- ```typescript
151
- { action: 'assertVisible', selector: '.async-content', retry: 3, retryDelay: 1000 }
32
+ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
33
+ --remote-debugging-port=9222 \
34
+ --user-data-dir=/tmp/browser-pilot-profile
152
35
  ```
153
36
 
154
- ### AI-Optimized Snapshots
37
+ ## Choose the command by job
155
38
 
156
- Get the page state in a format perfect for LLMs:
157
-
158
- ```typescript
159
- const snapshot = await page.snapshot();
160
-
161
- // Structured accessibility tree
162
- console.log(snapshot.accessibilityTree);
163
-
164
- // Interactive elements with refs
165
- console.log(snapshot.interactiveElements);
166
- // [{ ref: 'e1', role: 'button', name: 'Submit', selector: '...' }, ...]
167
-
168
- // Text representation for LLMs
169
- console.log(snapshot.text);
170
- // - main ref:e1
171
- // - heading "Welcome" ref:e2
172
- // - button "Get Started" ref:e3
173
- // - textbox ref:e4 placeholder="Email"
174
- ```
175
-
176
- ### Ref-Based Selectors
177
-
178
- After taking a snapshot, use element refs directly as selectors:
179
-
180
- ```typescript
181
- const snapshot = await page.snapshot();
182
- // Output shows: button "Submit" ref:e4
183
-
184
- // Click using the ref - no fragile CSS needed
185
- await page.click('ref:e4');
186
-
187
- // Fill input by ref
188
- await page.fill('ref:e23', 'hello@example.com');
189
-
190
- // Combine ref with CSS fallbacks
191
- await page.click(['ref:e4', '#submit', 'button[type=submit]']);
192
- ```
193
-
194
- Refs are stable until page navigation. Always take a fresh snapshot after navigating.
195
- CLI note: refs are cached per session+URL after a snapshot, so you can reuse them across CLI calls
196
- until navigation changes the URL.
197
-
198
- ## Page API
199
-
200
- ### Navigation
201
-
202
- ```typescript
203
- await page.goto(url, options?)
204
- await page.reload(options?)
205
- await page.goBack(options?)
206
- await page.goForward(options?)
207
-
208
- const url = await page.url()
209
- const title = await page.title()
210
- ```
211
-
212
- ### Actions
213
-
214
- All actions accept `string | string[]` for selectors:
215
-
216
- ```typescript
217
- await page.click(selector, options?)
218
- await page.fill(selector, value, options?) // clears first by default
219
- await page.type(selector, text, options?) // types character by character
220
- await page.select(selector, value, options?) // native <select>
221
- await page.select({ trigger, option, value, match }, options?) // custom dropdown
222
- await page.check(selector, options?)
223
- await page.uncheck(selector, options?)
224
- await page.submit(selector, options?) // tries Enter, then click
225
- await page.press(key)
226
- await page.focus(selector, options?)
227
- await page.hover(selector, options?)
228
- await page.scroll(selector, options?)
229
- ```
230
-
231
- ### Waiting
232
-
233
- ```typescript
234
- await page.waitFor(selector, { state: 'visible' | 'hidden' | 'attached' | 'detached' })
235
- await page.waitForNavigation(options?)
236
- await page.waitForNetworkIdle({ idleTime: 500 })
237
- ```
238
-
239
- ### Content
240
-
241
- ```typescript
242
- const snapshot = await page.snapshot()
243
- const text = await page.text(selector?)
244
- const screenshot = await page.screenshot({ format: 'png', fullPage: true })
245
- const result = await page.evaluate(() => document.title)
246
- ```
247
-
248
- ### Files
249
-
250
- ```typescript
251
- await page.setInputFiles(selector, [{ name: 'file.pdf', mimeType: 'application/pdf', buffer: data }])
252
- const download = await page.waitForDownload(() => page.click('#download-btn'))
253
- ```
254
-
255
- ### Emulation
256
-
257
- ```typescript
258
- import { devices } from 'browser-pilot';
259
-
260
- await page.emulate(devices['iPhone 14']); // Full device emulation
261
- await page.setViewport({ width: 1280, height: 720, deviceScaleFactor: 2 });
262
- await page.setUserAgent('Custom UA');
263
- await page.setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
264
- await page.setTimezone('America/New_York');
265
- await page.setLocale('fr-FR');
266
- ```
267
-
268
- Devices: `iPhone 14`, `iPhone 14 Pro Max`, `Pixel 7`, `iPad Pro 11`, `Desktop Chrome`, `Desktop Firefox`
269
-
270
- ### Audio I/O
271
-
272
- ```typescript
273
- // Set up audio input/output interception
274
- await page.setupAudio();
275
-
276
- // Play audio into the page's fake microphone
277
- await page.audioInput.play(wavBytes, { waitForEnd: true });
278
-
279
- // Capture audio output until silence
280
- const capture = await page.audioOutput.captureUntilSilence({ silenceTimeout: 5000 });
281
-
282
- // Full round-trip: play input → capture response
283
- const result = await page.audioRoundTrip({ input: wavBytes, silenceTimeout: 5000 });
284
-
285
- // Transcribe captured audio (requires OPENAI_API_KEY)
286
- import { transcribe } from 'browser-pilot';
287
- const { text } = await transcribe(capture);
288
- ```
39
+ | Job | Primary commands |
40
+ | --- | --- |
41
+ | Inspect page state | `snapshot`, `page`, `forms`, `text`, `targets`, `diagnose` |
42
+ | Act in the browser | `exec`, `run` |
43
+ | Capture a human demo | `record` |
44
+ | Investigate behavior over time | `trace` |
45
+ | Exercise voice/media | `audio` |
46
+ | Change browser conditions | `env` |
289
47
 
290
- ### Request Interception
291
-
292
- ```typescript
293
- // Block images and fonts
294
- await page.blockResources(['Image', 'Font']);
295
-
296
- // Mock API responses
297
- await page.route('**/api/users', { status: 200, body: { users: [] } });
298
-
299
- // Full control
300
- await page.intercept('*api*', async (request, actions) => {
301
- if (request.url.includes('blocked')) await actions.fail();
302
- else await actions.continue({ headers: { ...request.headers, 'X-Custom': 'value' } });
303
- });
304
- ```
305
-
306
- ### Cookies & Storage
307
-
308
- ```typescript
309
- // Cookies
310
- const cookies = await page.cookies();
311
- await page.setCookie({ name: 'session', value: 'abc', domain: '.example.com' });
312
- await page.clearCookies();
313
-
314
- // localStorage / sessionStorage
315
- await page.setLocalStorage('key', 'value');
316
- const value = await page.getLocalStorage('key');
317
- await page.clearLocalStorage();
318
- ```
319
-
320
- ### Console & Dialogs
321
-
322
- ```typescript
323
- // Capture console messages
324
- await page.onConsole((msg) => console.log(`[${msg.type}] ${msg.text}`));
325
-
326
- // Handle dialogs (alert, confirm, prompt)
327
- await page.onDialog(async (dialog) => {
328
- if (dialog.type === 'confirm') await dialog.accept();
329
- else await dialog.dismiss();
330
- });
331
-
332
- // Collect messages during an action
333
- const { result, messages } = await page.collectConsole(async () => {
334
- return await page.click('#button');
335
- });
336
- ```
337
-
338
- **Important:** Native browser dialogs (`alert()`, `confirm()`, `prompt()`) block all CDP commands until handled. Always set up a dialog handler before triggering actions that may show dialogs.
339
-
340
- ### Iframes
341
-
342
- Switch context to interact with iframe content:
343
-
344
- ```typescript
345
- // Switch to iframe
346
- await page.switchToFrame('iframe#payment');
347
-
348
- // Now actions target the iframe
349
- await page.fill('#card-number', '4242424242424242');
350
- await page.fill('#expiry', '12/25');
351
-
352
- // Switch back to main document
353
- await page.switchToMain();
354
- await page.click('#submit-order');
355
- ```
356
-
357
- Note: Cross-origin iframes cannot be accessed due to browser security.
358
-
359
- ### Options
360
-
361
- ```typescript
362
- interface ActionOptions {
363
- timeout?: number; // default: 30000ms
364
- optional?: boolean; // return false instead of throwing on failure
365
- }
366
- ```
367
-
368
- ## CLI
369
-
370
- The CLI provides session persistence for interactive workflows:
48
+ ## Golden path 1: automate a page
371
49
 
372
50
  ```bash
373
- # Connect to a browser
374
- bp connect --provider browserbase --name my-session
375
- bp connect --provider generic # auto-discovers local Chrome
376
- bp connect --no-daemon # skip daemon (direct WebSocket only)
377
- bp connect --daemon-idle 30 # custom idle timeout (minutes)
378
-
379
- # Execute actions
380
- bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
381
- bp exec -s my-session '[
382
- {"action":"fill","selector":"#search","value":"browser automation"},
383
- {"action":"submit","selector":"#search-form"}
384
- ]'
385
-
386
- # Get page state (note the refs in output)
387
- bp snapshot -s my-session --format text
388
- # Output: button "Submit" ref:e4, textbox "Email" ref:e5, ...
389
-
390
- # Use refs from snapshot for reliable targeting
391
- # Refs are cached per session+URL after snapshot
392
- bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
393
- bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'
394
-
395
- # Quick discovery commands
396
- bp page -s my-session # URL, title, headings, forms, interactive controls
397
- bp forms -s my-session # Structured form metadata only
398
- bp targets -s my-session # Browser tabs with targetIds
399
- bp connect --new-tab --url https://example.com --name fresh
400
-
401
- # Handle native dialogs (alert/confirm/prompt)
402
- bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'
403
- bp exec --record '[{"action":"click","selector":"#checkout"},{"action":"assertText","expect":"Thanks"}]'
404
-
405
- # Other commands
406
- bp text -s my-session --selector ".main-content"
407
- bp screenshot -s my-session --output page.png
408
- bp listen ws -m "*voice*" # monitor WebSocket traffic
409
- bp list # list all sessions
410
- bp clean --max-size 500MB # trim old sessions by disk usage
411
- bp close -s my-session # close session
412
- bp actions # show complete action reference
413
- bp run workflow.json # run a workflow file
414
-
415
- # Daemon management
416
- bp daemon status # check daemon health
417
- bp daemon stop # stop daemon for default session
418
- bp daemon logs # view daemon log
419
-
420
- # Actions with inline assertions (no extra bp eval needed)
421
- bp exec '[
422
- {"action":"goto","url":"https://example.com/login"},
423
- {"action":"fill","selector":"#email","value":"user@example.com"},
424
- {"action":"submit","selector":"form"},
425
- {"action":"assertUrl","expect":"/dashboard"},
51
+ bp connect --provider generic --name dev
52
+ bp snapshot -i -s dev
53
+ bp exec -s dev '[
54
+ {"action":"fill","selector":"ref:e5","value":"user@example.com"},
55
+ {"action":"click","selector":"ref:e7"},
426
56
  {"action":"assertText","expect":"Welcome"}
427
57
  ]'
428
58
  ```
429
59
 
430
- ### CLI for AI Agents
431
-
432
- The CLI is designed for AI agent tool calls. The recommended workflow:
60
+ Use `bp snapshot -i` first. Refs are the default targeting strategy.
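The ref-first flow in this hunk can be sketched as a small helper that pulls `ref:eN` tokens out of snapshot text and turns them into `exec` steps. This is an illustration only: `extractRefs` and `fillStep` are hypothetical helpers, not part of browser-pilot, and the line shape (`- textbox "Email" ref:e5`) is assumed from the sample output in the 0.0.14 README; the actual output of `bp snapshot -i` may differ.

```typescript
// Hypothetical helpers: not part of browser-pilot.
interface RefEntry {
  ref: string;   // e.g. "e5"
  role: string;  // first word on the line, e.g. "textbox"
  name?: string; // quoted name directly after the role, if present
}

// Assumes one element per line, shaped like: - textbox "Email" ref:e5
function extractRefs(snapshotText: string): RefEntry[] {
  const entries: RefEntry[] = [];
  for (const rawLine of snapshotText.split('\n')) {
    const line = rawLine.trim().replace(/^-\s*/, '');
    const refMatch = /\bref:(e\d+)\b/.exec(line);
    if (!refMatch) continue; // skip lines that carry no ref
    const role = line.split(/\s+/)[0];
    const nameMatch = /^\S+\s+"([^"]*)"/.exec(line);
    entries.push({ ref: refMatch[1], role, name: nameMatch?.[1] });
  }
  return entries;
}

// Builds a fill step that targets a textbox by its name via its ref.
function fillStep(entries: RefEntry[], name: string, value: string) {
  const target = entries.find((e) => e.role === 'textbox' && e.name === name);
  if (!target) throw new Error(`no textbox named "${name}" in snapshot`);
  return { action: 'fill', selector: `ref:${target.ref}`, value };
}

// Sample snapshot text (made up for this sketch).
const sample = [
  '- heading "Welcome" ref:e2',
  '- textbox "Email" ref:e5',
  '- button "Submit" ref:e7',
].join('\n');

const steps = [fillStep(extractRefs(sample), 'Email', 'user@example.com')];
console.log(JSON.stringify(steps));
// prints [{"action":"fill","selector":"ref:e5","value":"user@example.com"}]
```

The resulting JSON could then be passed to `bp exec -s dev '...'`. Since the old README notes that refs are only stable until navigation, a helper like this would need a fresh snapshot after each page change.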
433
61
 
434
- 1. **Take snapshot** to see the page structure with refs
435
- 2. **Use refs** (`ref:e4`) for reliable element targeting
436
- 3. **Batch actions** to reduce round trips
62
+ ## Golden path 2: capture a manual workflow and derive automation
437
63
 
438
64
  ```bash
439
- # Step 1: Get page state with refs
440
- bp snapshot --format text
441
- # Output shows: button "Add to Cart" ref:e12, textbox "Search" ref:e5
442
-
443
- # Step 2: Use refs to interact (stable, no CSS guessing)
444
- bp exec '[
445
- {"action":"fill","selector":"ref:e5","value":"laptop"},
446
- {"action":"click","selector":"ref:e12"},
447
- {"action":"snapshot"}
448
- ]' --format json
449
- ```
450
-
451
- Multi-selector fallbacks for robustness:
452
- ```bash
453
- bp exec '[
454
- {"action":"click","selector":["ref:e4","#submit","button[type=submit]"]}
455
- ]'
456
- ```
457
-
458
- Output:
459
- ```json
460
- {
461
- "success": true,
462
- "steps": [
463
- {"action": "fill", "success": true, "durationMs": 30},
464
- {"action": "click", "success": true, "durationMs": 50, "selectorUsed": "ref:e12"},
465
- {"action": "snapshot", "success": true, "durationMs": 100, "result": "..."}
466
- ],
467
- "totalDurationMs": 180
468
- }
469
- ```
470
-
471
- Run `bp actions` for complete action reference.
472
-
473
- ### Voice Agent Testing
474
-
475
- Test audio-based AI apps (voice assistants, phone agents) by injecting microphone input and capturing spoken responses.
476
-
477
- > **Full guide:** [Voice Agent Testing Guide](./docs/guides/voice-agent-testing.md)
478
-
479
- ```bash
480
- export OPENAI_API_KEY=sk-... # Required for --transcribe
481
-
482
- # Validate audio pipeline
483
- bp audio check -s my-session
484
- # Output: "READY for roundtrip" with agent AudioContext detected
485
-
486
- # Full round-trip: send audio prompt → wait for response → transcribe
487
- bp audio roundtrip -i prompt.wav --transcribe --silence-timeout 1500
488
- # Output: { "transcript": "Welcome! I'd be happy to help...", "latencyMs": 5200, ... }
489
-
490
- # Save response audio for manual review
491
- bp audio roundtrip -i prompt.wav -o response.wav --transcribe
492
- ```
493
-
494
- **Important:** Audio overrides must be injected before the voice agent initializes. Use `bp audio check` to validate the pipeline. See the [full guide](./docs/guides/voice-agent-testing.md) for setup order and troubleshooting.
495
-
496
- Programmatic API:
497
-
498
- ```typescript
499
- await page.setupAudio();
500
-
501
- const result = await page.audioRoundTrip({
502
- input: audioBytes,
503
- silenceTimeout: 1500,
504
- });
505
-
506
- import { transcribe } from 'browser-pilot';
507
- const { text } = await transcribe(result.audio);
508
- console.log(text); // "Welcome! I'd be happy to help..."
65
+ bp record -s demo --profile automation -f ./artifacts/demo.recording.json
66
+ # perform the flow manually, then stop with Ctrl+C
67
+ bp record summary ./artifacts/demo.recording.json
68
+ bp record derive ./artifacts/demo.recording.json -o workflow.json
69
+ bp run workflow.json
509
70
  ```
510
71
 
511
- ### Recording Browser Actions
72
+ Do not start by opening the raw artifact. Use `record summary`, `record inspect`, or `trace summary --view ...` first.
512
73
 
513
- Record human interactions to create automation recipes:
74
+ ## Golden path 3: debug a realtime or voice session
514
75
 
515
76
  ```bash
516
- # Auto-connect to local Chrome and record (creates new session)
517
- bp record
518
-
519
- # Use most recent session
520
- bp record -s
521
-
522
- # Use specific session with custom output file
523
- bp record -s my-session -f login-flow.json
524
-
525
- # Review and edit the recording
526
- cat recording.json
527
-
528
- # Replay the recording
529
- bp exec -s my-session --file recording.json
530
- ```
531
-
532
- The output format is compatible with `page.batch()`:
533
- ```json
534
- {
535
- "recordedAt": "2026-01-06T10:00:00.000Z",
536
- "startUrl": "https://example.com",
537
- "duration": 15000,
538
- "steps": [
539
- { "action": "fill", "selector": ["[data-testid=\"email\"]", "#email"], "value": "user@example.com" },
540
- { "action": "click", "selector": ["[data-testid=\"submit\"]", "#login-btn"] }
541
- ]
542
- }
77
+ bp connect --provider generic --name realtime
78
+ bp trace start -s realtime --timeout 20000
79
+ # reproduce the issue in the app
80
+ bp trace summary -s realtime --view ws
81
+ bp trace summary -s realtime --view console
543
82
  ```
544
83
 
545
- **Notes:**
546
- - Sensitive fields are automatically redacted as `[REDACTED]` based on input settings such as `type="password"`, `type="hidden"`, and secret/autofill hints like `autocomplete="one-time-code"` or `cc-number`
547
- - Selectors are multi-selector arrays ordered by reliability (data attributes > IDs > CSS paths)
548
- - Edit the JSON to adjust selectors or add `optional: true` flags
549
-
550
- ### Screenshot Trail During Replay
551
-
552
- Capture a lightweight visual trail while replaying steps. Enable recording at the session level so all `bp exec` calls are captured automatically:
84
+ Voice workflow:
553
85
 
554
86
  ```bash
555
- # Enable recording for the entire session
556
- bp connect --provider generic --name my-session --record
557
-
558
- # All exec calls now produce screenshots frames accumulate in one manifest
559
- bp exec -s my-session '[
560
- {"action":"goto","url":"https://example.com/login"},
561
- {"action":"fill","selector":"#email","value":"user@example.com"},
562
- {"action":"submit","selector":"form"}
563
- ]'
564
- bp exec -s my-session '{"action":"assertUrl","expect":"/dashboard"}'
565
-
566
- # Or enable recording on a single exec call
567
- bp exec --record '[{"action":"click","selector":"#checkout"}]'
568
- ```
569
-
570
- This writes `recording.json` plus a `screenshots/` directory in the session directory. Sensitive field values are redacted in both the manifest and the screenshot overlays. See the [Action Recording Guide](./docs/guides/action-recording.md) for options like `--record-format`, `--record-quality`, and `--no-highlights`.
571
-
572
- ## Examples
573
-
574
- ### Login Flow with Error Handling
575
-
576
- ```typescript
577
- const result = await page.batch([
578
- { action: 'goto', url: 'https://app.example.com/login' },
579
- { action: 'fill', selector: ['#email', 'input[name=email]'], value: email },
580
- { action: 'fill', selector: ['#password', 'input[name=password]'], value: password },
581
- { action: 'click', selector: '.remember-me', optional: true },
582
- { action: 'submit', selector: ['#login', 'button[type=submit]'] },
583
- ], { onFail: 'stop' });
584
-
585
- if (!result.success) {
586
- console.error(`Failed at step ${result.stoppedAtIndex}: ${result.steps[result.stoppedAtIndex!].error}`);
587
- }
87
+ bp audio setup -s realtime
88
+ bp exec -s realtime '{"action":"goto","url":"https://my-voice-app.com"}'
89
+ bp audio check -s realtime
90
+ bp audio roundtrip -s realtime -i prompt.wav --transcribe -o response.wav
91
+ bp trace summary -s realtime --view voice
588
92
  ```
589
93
 
590
- ### Custom Dropdown
591
-
592
- ```typescript
593
- // Using the custom select config
594
- await page.select({
595
- trigger: '.country-dropdown',
596
- option: '.dropdown-option',
597
- value: 'United States',
598
- match: 'text', // or 'contains' or 'value'
599
- });
600
-
601
- // Or compose from primitives
602
- await page.click('.country-dropdown');
603
- await page.fill('.dropdown-search', 'United');
604
- await page.click('.dropdown-option:has-text("United States")');
605
- ```
606
-
607
- ### WebSocket Daemon
608
-
609
- By default, `bp connect` spawns a lightweight background daemon that holds the CDP WebSocket open. Subsequent CLI commands connect via Unix socket (~5-15ms) instead of re-establishing WebSocket (~280-1030ms per command).
94
+ ## Golden path 4: exercise failure modes
610
95
 
611
96
  ```bash
612
- # Daemon spawns automatically on connect
613
- bp connect --provider generic --name dev
614
-
615
- # Subsequent commands use the fast daemon path
616
- bp exec -s dev '{"action":"snapshot"}' # ~5-15ms overhead instead of ~280ms
617
-
618
- # Manage the daemon
619
- bp daemon status # check health, PID, uptime
620
- bp daemon stop # stop daemon
621
- bp daemon logs # view daemon log
622
-
623
- # Disable daemon for direct WebSocket
624
- bp connect --no-daemon
97
+ bp env permissions grant -s realtime microphone
98
+ bp env network offline -s realtime --duration 5000
99
+ bp trace watch -s realtime --view ws --assert profile:reconnect --timeout 15000
100
+ bp env visibility hidden -s realtime
625
101
  ```
626
102
 
627
- The daemon is transparent if it dies or becomes stale, CLI commands fall back to direct WebSocket silently. Each session gets its own daemon with a 60-minute idle timeout.
628
-
629
- ### Cloudflare Workers
103
+ ## What is new in the model
630
104
 
631
- > Note: Cloudflare Workers' Node-compat runtime can expose parts of `node:net` with compatibility flags, but browser-pilot's daemon fast-path is intentionally CLI/Node-specific (Unix domain sockets + local background process). In Workers, use the normal direct WebSocket path shown below.
632
-
633
- ```typescript
634
- export default {
635
- async fetch(request: Request, env: Env): Promise<Response> {
636
- const browser = await connect({
637
- provider: 'browserbase',
638
- apiKey: env.BROWSERBASE_API_KEY,
639
- });
640
-
641
- const page = await browser.page();
642
- await page.goto('https://example.com');
643
- const snapshot = await page.snapshot();
644
-
645
- await browser.close();
646
-
647
- return Response.json({ title: snapshot.title, elements: snapshot.interactiveElements });
648
- },
649
- };
650
- ```
105
+ - One canonical artifact model with `version: 2`
106
+ - One canonical trace event stream for recording, live trace, and session logs
107
+ - Trace-backed waits and assertions in `exec` / `run`
108
+ - `listen` preserved as a compatibility alias to `trace tail`
109
+ - `audio` for active control, `trace` for explanation, `env` for browser-state controls
651
110
 
652
- ### AI Agent Tool Definition
111
+ ## Programmatic example
653
112
 
654
113
  ```typescript
655
- const browserTool = {
656
- name: 'browser_action',
657
- description: 'Execute browser actions and get page state',
658
- parameters: {
659
- type: 'object',
660
- properties: {
661
- actions: {
662
- type: 'array',
663
- items: {
664
- type: 'object',
665
- properties: {
666
- action: { enum: ['goto', 'click', 'fill', 'submit', 'snapshot', 'assertVisible', 'assertExists', 'assertText', 'assertUrl', 'assertValue'] },
667
- selector: { type: ['string', 'array'] },
668
- value: { type: 'string' },
669
- url: { type: 'string' },
670
- },
671
- },
672
- },
673
- },
674
- },
675
- execute: async ({ actions }) => {
676
- const page = await getOrCreatePage();
677
- return page.batch(actions);
678
- },
679
- };
680
- ```
681
-
682
- ## Advanced
683
-
684
- ### Direct CDP Access
114
+ import { connect } from 'browser-pilot';
685
115
 
686
- ```typescript
687
116
  const browser = await connect({ provider: 'generic' });
688
- const cdp = browser.cdpClient;
689
-
690
- // Send any CDP command
691
- await cdp.send('Emulation.setDeviceMetricsOverride', {
692
- width: 375,
693
- height: 812,
694
- deviceScaleFactor: 3,
695
- mobile: true,
696
- });
697
- ```
698
-
699
- ### Tracing
117
+ const page = await browser.page();
700
118
 
701
- ```typescript
702
- import { enableTracing } from 'browser-pilot';
119
+ await page.batch([
120
+ { action: 'goto', url: 'https://example.com/login' },
121
+ { action: 'fill', selector: ['#email', 'input[type=email]'], value: 'user@example.com' },
122
+ { action: 'submit', selector: 'form' },
123
+ { action: 'assertUrl', expect: '/dashboard' },
124
+ ]);
703
125
 
704
- enableTracing({ output: 'console' });
705
- // [info] goto https://example.com ✓ (1200ms)
706
- // [info] click #submit ✓ (50ms)
126
+ await browser.close();
707
127
  ```
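The new programmatic example discards the value returned by `page.batch()`. A minimal sketch of checking it follows, assuming the result shape documented in the 0.0.14 README (`success`, `steps`, `totalDurationMs`, `stoppedAtIndex`, per-step `error`) still holds in 0.0.15. The interfaces are declared locally for illustration, `error` is assumed to be a string, and `describeBatch` is a hypothetical helper, not a browser-pilot export.

```typescript
// Assumed result shape, based on the 0.0.14 README's page.batch() examples.
interface StepResult {
  action: string;
  success: boolean;
  durationMs: number;
  error?: string; // assumption: the failure reason is a string
}

interface BatchResult {
  success: boolean;
  steps: StepResult[];
  totalDurationMs: number;
  stoppedAtIndex?: number;
}

// Hypothetical helper: turns a batch result into a one-line report.
function describeBatch(result: BatchResult): string {
  if (result.success) {
    return `ok: ${result.steps.length} steps in ${result.totalDurationMs}ms`;
  }
  // Prefer the reported stop index; otherwise look for the first failed step.
  const index = result.stoppedAtIndex ?? result.steps.findIndex((s) => !s.success);
  const failed = result.steps[index];
  const reason = failed?.error ?? 'unknown error';
  return `failed at step ${index} (${failed?.action ?? 'unknown'}): ${reason}`;
}

// Sample data (made up for this sketch).
const sampleResult: BatchResult = {
  success: false,
  steps: [
    { action: 'goto', success: true, durationMs: 1200 },
    { action: 'fill', success: true, durationMs: 30 },
    { action: 'submit', success: true, durationMs: 50 },
    { action: 'assertUrl', success: false, durationMs: 120, error: 'expected URL to contain /dashboard' },
  ],
  totalDurationMs: 1400,
  stoppedAtIndex: 3,
};

console.log(describeBatch(sampleResult));
// prints: failed at step 3 (assertUrl): expected URL to contain /dashboard
```

In real use the argument would be the awaited return value of `page.batch([...])` rather than a hand-built object.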
708
128
 
709
- ## AI Agent Integration
710
-
711
- browser-pilot is designed for AI agents. Two resources for agent setup:
712
-
713
- - **[llms.txt](./docs/llms.txt)** - Abbreviated reference for LLM context windows
714
- - **[Claude Code Skill](./docs/automating-browsers/SKILL.md)** - Full skill for Claude Code agents
715
-
716
- To use with Claude Code, copy `docs/automating-browsers/` to your project or reference it in your agent's context.
717
-
718
- ## Documentation
719
-
720
- See the [docs](./docs) folder for detailed documentation:
129
+ ## Guides
721
130
 
722
- - [Getting Started](./docs/getting-started.md)
723
- - [Providers](./docs/providers.md)
724
- - [Action Recording](./docs/guides/action-recording.md)
725
- - [Multi-Selector Guide](./docs/guides/multi-selector.md)
726
- - [Batch Actions](./docs/guides/batch-actions.md)
727
- - [Snapshots](./docs/guides/snapshots.md)
728
- - [Voice Agent Testing](./docs/guides/voice-agent-testing.md)
729
- - [CLI Reference](./docs/cli.md)
730
- - [API Reference](./docs/api/page.md)
131
+ - [CLI guide](./docs/cli.md)
132
+ - [Automation workflows](./docs/guides/automation-workflows.md)
133
+ - [Action recording](./docs/guides/action-recording.md)
134
+ - [Trace workflows](./docs/guides/trace-workflows.md)
135
+ - [Realtime debugging](./docs/guides/realtime-debugging.md)
136
+ - [Voice agent testing](./docs/guides/voice-agent-testing.md)
137
+ - [Artifact analysis](./docs/guides/artifact-analysis.md)
138
+ - [LLM contract](./docs/llms.txt)
731
139
 
732
- ## License
140
+ ## Compatibility notes
733
141
 
734
- MIT
142
+ - Prefer `--debug` for transport logging. `--trace` still works as a legacy alias.
143
+ - Prefer `bp trace tail ...`. `bp listen ...` still works as a compatibility alias.
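For scripts that still use the legacy spellings, the two compatibility notes above amount to a simple argument rewrite. The sketch below is hypothetical and is not how browser-pilot resolves aliases internally; `modernizeArgs` is an invented name, and it assumes the arguments after `listen` carry over to `trace tail` unchanged, which the README does not state.

```typescript
// Hypothetical migration helper: not browser-pilot's implementation.
// Rewrites the legacy spellings named in the compatibility notes:
//   bp listen ...  ->  bp trace tail ...
//   --trace        ->  --debug
function modernizeArgs(args: string[]): string[] {
  const out: string[] = [];
  args.forEach((arg, i) => {
    if (i === 0 && arg === 'listen') {
      out.push('trace', 'tail'); // only the leading subcommand is rewritten
    } else if (arg === '--trace') {
      out.push('--debug');
    } else {
      out.push(arg);
    }
  });
  return out;
}

console.log(modernizeArgs(['listen', 'ws', '--trace']).join(' '));
// prints: trace tail ws --debug
```

Because both legacy forms still work in 0.0.15, a rewrite like this is optional and only useful for keeping scripts on the preferred spellings.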