browser-pilot 0.0.14 → 0.0.16

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (44)
  1. package/README.md +89 -667
  2. package/dist/actions.cjs +1073 -41
  3. package/dist/actions.d.cts +11 -3
  4. package/dist/actions.d.ts +11 -3
  5. package/dist/actions.mjs +1 -1
  6. package/dist/browser-ZCR6AA4D.mjs +11 -0
  7. package/dist/browser.cjs +1431 -62
  8. package/dist/browser.d.cts +4 -4
  9. package/dist/browser.d.ts +4 -4
  10. package/dist/browser.mjs +4 -4
  11. package/dist/cdp.cjs +5 -1
  12. package/dist/cdp.d.cts +1 -1
  13. package/dist/cdp.d.ts +1 -1
  14. package/dist/cdp.mjs +1 -1
  15. package/dist/{chunk-7NDR6V7S.mjs → chunk-6GBYX7C2.mjs} +1405 -528
  16. package/dist/{chunk-KIFB526Y.mjs → chunk-BVZALQT4.mjs} +5 -1
  17. package/dist/chunk-DTVRFXKI.mjs +35 -0
  18. package/dist/chunk-EZNZ72VA.mjs +563 -0
  19. package/dist/{chunk-SPSZZH22.mjs → chunk-LCNFBXB5.mjs} +9 -33
  20. package/dist/{chunk-IN5HPAPB.mjs → chunk-NNEHWWHL.mjs} +28 -10
  21. package/dist/chunk-TJ5B56NV.mjs +804 -0
  22. package/dist/{chunk-XMJABKCF.mjs → chunk-V3VLBQAM.mjs} +1073 -41
  23. package/dist/cli.mjs +2799 -1176
  24. package/dist/{client-Ck2nQksT.d.cts → client-B5QBRgIy.d.cts} +2 -0
  25. package/dist/{client-Ck2nQksT.d.ts → client-B5QBRgIy.d.ts} +2 -0
  26. package/dist/{client-3AFV2IAF.mjs → client-JWWZWO6L.mjs} +4 -2
  27. package/dist/index.cjs +1441 -52
  28. package/dist/index.d.cts +5 -5
  29. package/dist/index.d.ts +5 -5
  30. package/dist/index.mjs +19 -7
  31. package/dist/page-IUUTJ3SW.mjs +7 -0
  32. package/dist/providers.cjs +637 -2
  33. package/dist/providers.d.cts +2 -2
  34. package/dist/providers.d.ts +2 -2
  35. package/dist/providers.mjs +17 -3
  36. package/dist/{types-CjT0vClo.d.ts → types-BflRmiDz.d.cts} +17 -3
  37. package/dist/{types-BSoh5v1Y.d.cts → types-BzM-IfsL.d.ts} +17 -3
  38. package/dist/types-DeVSWhXj.d.cts +142 -0
  39. package/dist/types-DeVSWhXj.d.ts +142 -0
  40. package/package.json +1 -1
  41. package/dist/browser-LZTEHUDI.mjs +0 -9
  42. package/dist/chunk-BRAFQUMG.mjs +0 -229
  43. package/dist/types--wXNHUwt.d.cts +0 -56
  44. package/dist/types--wXNHUwt.d.ts +0 -56
package/README.md CHANGED
@@ -6,38 +6,19 @@
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0-blue?style=flat&logo=typescript&logoColor=white)](https://www.typescriptlang.org/)
  [![License](https://img.shields.io/npm/l/browser-pilot.svg)](https://github.com/svilupp/browser-pilot/blob/main/LICENSE)
 
- Lightweight CDP-based browser automation for AI agents. Zero dependencies, works in Node.js, Bun, and Cloudflare Workers.
+ Automation-first CDP browser control for AI agents.
 
- ```typescript
- import { connect } from 'browser-pilot';
-
- const browser = await connect({ provider: 'browserbase', apiKey: process.env.BROWSERBASE_API_KEY });
- const page = await browser.page();
+ Browser Pilot now teaches one workflow model:
 
- await page.goto('https://example.com/login');
- await page.fill(['#email', 'input[type=email]'], 'user@example.com');
- await page.fill(['#password', 'input[type=password]'], 'secret');
- await page.submit(['#login-btn', 'button[type=submit]']);
+ - inspect the page
+ - act in the browser
+ - record a manual workflow
+ - trace behavior over time
+ - exercise voice/media and browser conditions
 
- const snapshot = await page.snapshot();
- console.log(snapshot.text); // Accessibility tree as text
-
- await browser.close();
- ```
+ `record` and `trace` are two interfaces over the same capture system. `record` writes the canonical artifact. `trace` explains either a live session or a saved artifact.
 
- ## Why browser-pilot?
-
- | Problem with Playwright/Puppeteer | browser-pilot Solution |
- |-----------------------------------|------------------------|
- | Won't run in Cloudflare Workers | Pure Web Standard APIs, zero Node.js dependencies |
- | Bun CDP connection bugs | Custom CDP client that works everywhere |
- | Single-selector API (fragile) | Multi-selector by default: `['#primary', '.fallback']` |
- | No action batching (high latency) | Batch DSL: one call for entire sequences |
- | No inline assertions (extra API calls to verify) | Built-in assertions: verify state within the same batch |
- | No AI-optimized snapshots | Built-in accessibility tree extraction |
- | No audio I/O for voice agents | Mic injection + output capture + Whisper transcription |
-
- ## Installation
+ ## Install
 
  ```bash
  bun add browser-pilot
@@ -45,690 +26,131 @@ bun add browser-pilot
  npm install browser-pilot
  ```
 
- ## Providers
-
- ### BrowserBase (Recommended for production)
-
- ```typescript
- const browser = await connect({
- provider: 'browserbase',
- apiKey: process.env.BROWSERBASE_API_KEY,
- projectId: process.env.BROWSERBASE_PROJECT_ID, // optional
- });
- ```
-
- ### Browserless
-
- ```typescript
- const browser = await connect({
- provider: 'browserless',
- apiKey: process.env.BROWSERLESS_API_KEY,
- });
- ```
-
- ### Generic (Local Chrome)
+ For local Chrome on Chrome 144+:
 
  ```bash
- # Start Chrome with remote debugging
- chrome --remote-debugging-port=9222
- ```
-
- ```typescript
- const browser = await connect({
- provider: 'generic',
- wsUrl: 'ws://localhost:9222/devtools/browser/...', // optional, auto-discovers
- });
- ```
-
- ## Core Concepts
-
- ### Multi-Selector (Robust Automation)
-
- Every action accepts `string | string[]`. When given an array, tries each selector in order until one works:
-
- ```typescript
- // Tries #submit first, falls back to alternatives
- await page.click(['#submit', 'button[type=submit]', '.submit-btn']);
-
- // Cookie consent - try multiple common patterns
- await page.click([
- '#accept-cookies',
- '.cookie-accept',
- 'button:has-text("Accept")',
- '[data-testid="cookie-accept"]'
- ], { optional: true, timeout: 3000 });
- ```
-
- ### Built-in Waiting
-
- Every action automatically waits for the element to be visible before interacting:
-
- ```typescript
- // No separate waitFor needed - this waits automatically
- await page.click('.dynamic-button', { timeout: 5000 });
-
- // Explicit waiting when needed
- await page.waitFor('.loading', { state: 'hidden' });
- await page.waitForNavigation();
- await page.waitForNetworkIdle();
- ```
-
- ### Batch Actions
-
- Execute multiple actions in a single call with full result tracking:
-
- ```typescript
- const result = await page.batch([
- { action: 'goto', url: 'https://example.com/login' },
- { action: 'fill', selector: '#email', value: 'user@example.com' },
- { action: 'fill', selector: '#password', value: 'secret' },
- { action: 'submit', selector: '#login-btn' },
- { action: 'wait', waitFor: 'navigation' },
- { action: 'snapshot' },
- ]);
-
- console.log(result.success); // true if all steps succeeded
- console.log(result.totalDurationMs); // total execution time
- console.log(result.steps[5].result); // snapshot from step 5
- ```
-
- Assertion steps verify expected state within the same batch — no extra round trips. Available: `assertVisible`, `assertExists`, `assertText`, `assertUrl`, `assertValue`.
-
- ```typescript
- const result = await page.batch([
- { action: 'goto', url: 'https://example.com/login' },
- { action: 'fill', selector: '#email', value: 'user@example.com' },
- { action: 'fill', selector: '#password', value: 'secret' },
- { action: 'submit', selector: '#login-btn' },
- { action: 'assertUrl', expect: '/dashboard' },
- { action: 'assertVisible', selector: '.welcome-message' },
- ]);
- ```
-
- Any step supports `retry` and `retryDelay` for flaky or async content:
-
- ```typescript
- { action: 'assertVisible', selector: '.async-content', retry: 3, retryDelay: 1000 }
- ```
-
- ### AI-Optimized Snapshots
-
- Get the page state in a format perfect for LLMs:
-
- ```typescript
- const snapshot = await page.snapshot();
-
- // Structured accessibility tree
- console.log(snapshot.accessibilityTree);
-
- // Interactive elements with refs
- console.log(snapshot.interactiveElements);
- // [{ ref: 'e1', role: 'button', name: 'Submit', selector: '...' }, ...]
-
- // Text representation for LLMs
- console.log(snapshot.text);
- // - main ref:e1
- // - heading "Welcome" ref:e2
- // - button "Get Started" ref:e3
- // - textbox ref:e4 placeholder="Email"
- ```
-
- ### Ref-Based Selectors
-
- After taking a snapshot, use element refs directly as selectors:
-
- ```typescript
- const snapshot = await page.snapshot();
- // Output shows: button "Submit" ref:e4
-
- // Click using the ref - no fragile CSS needed
- await page.click('ref:e4');
-
- // Fill input by ref
- await page.fill('ref:e23', 'hello@example.com');
-
- // Combine ref with CSS fallbacks
- await page.click(['ref:e4', '#submit', 'button[type=submit]']);
- ```
-
- Refs are stable until page navigation. Always take a fresh snapshot after navigating.
- CLI note: refs are cached per session+URL after a snapshot, so you can reuse them across CLI calls
- until navigation changes the URL.
-
- ## Page API
-
- ### Navigation
-
- ```typescript
- await page.goto(url, options?)
- await page.reload(options?)
- await page.goBack(options?)
- await page.goForward(options?)
-
- const url = await page.url()
- const title = await page.title()
+ # 1. Start Chrome normally
+ # 2. Open chrome://inspect/#remote-debugging
+ # 3. Enable remote debugging, then run:
+ bp connect
  ```
 
- ### Actions
-
- All actions accept `string | string[]` for selectors:
-
- ```typescript
- await page.click(selector, options?)
- await page.fill(selector, value, options?) // clears first by default
- await page.type(selector, text, options?) // types character by character
- await page.select(selector, value, options?) // native <select>
- await page.select({ trigger, option, value, match }, options?) // custom dropdown
- await page.check(selector, options?)
- await page.uncheck(selector, options?)
- await page.submit(selector, options?) // tries Enter, then click
- await page.press(key)
- await page.focus(selector, options?)
- await page.hover(selector, options?)
- await page.scroll(selector, options?)
- ```
-
- ### Waiting
-
- ```typescript
- await page.waitFor(selector, { state: 'visible' | 'hidden' | 'attached' | 'detached' })
- await page.waitForNavigation(options?)
- await page.waitForNetworkIdle({ idleTime: 500 })
- ```
-
- ### Content
-
- ```typescript
- const snapshot = await page.snapshot()
- const text = await page.text(selector?)
- const screenshot = await page.screenshot({ format: 'png', fullPage: true })
- const result = await page.evaluate(() => document.title)
- ```
-
- ### Files
-
- ```typescript
- await page.setInputFiles(selector, [{ name: 'file.pdf', mimeType: 'application/pdf', buffer: data }])
- const download = await page.waitForDownload(() => page.click('#download-btn'))
- ```
-
- ### Emulation
-
- ```typescript
- import { devices } from 'browser-pilot';
-
- await page.emulate(devices['iPhone 14']); // Full device emulation
- await page.setViewport({ width: 1280, height: 720, deviceScaleFactor: 2 });
- await page.setUserAgent('Custom UA');
- await page.setGeolocation({ latitude: 37.7749, longitude: -122.4194 });
- await page.setTimezone('America/New_York');
- await page.setLocale('fr-FR');
- ```
-
- Devices: `iPhone 14`, `iPhone 14 Pro Max`, `Pixel 7`, `iPad Pro 11`, `Desktop Chrome`, `Desktop Firefox`
-
- ### Audio I/O
-
- ```typescript
- // Set up audio input/output interception
- await page.setupAudio();
-
- // Play audio into the page's fake microphone
- await page.audioInput.play(wavBytes, { waitForEnd: true });
+ Tip: try plain `bp connect` first. Only add `--channel` or `--user-data-dir` if auto-discovery finds more than one eligible profile.
 
- // Capture audio output until silence
- const capture = await page.audioOutput.captureUntilSilence({ silenceTimeout: 5000 });
+ Use `bp connect --channel beta` or `bp connect --user-data-dir <path>` when more than one Chrome profile is eligible.
 
- // Full round-trip: play input capture response
- const result = await page.audioRoundTrip({ input: wavBytes, silenceTimeout: 5000 });
+ Legacy/manual fallback still works with a separate debug profile:
 
- // Transcribe captured audio (requires OPENAI_API_KEY)
- import { transcribe } from 'browser-pilot';
- const { text } = await transcribe(capture);
- ```
-
- ### Request Interception
-
- ```typescript
- // Block images and fonts
- await page.blockResources(['Image', 'Font']);
-
- // Mock API responses
- await page.route('**/api/users', { status: 200, body: { users: [] } });
-
- // Full control
- await page.intercept('*api*', async (request, actions) => {
- if (request.url.includes('blocked')) await actions.fail();
- else await actions.continue({ headers: { ...request.headers, 'X-Custom': 'value' } });
- });
- ```
-
- ### Cookies & Storage
-
- ```typescript
- // Cookies
- const cookies = await page.cookies();
- await page.setCookie({ name: 'session', value: 'abc', domain: '.example.com' });
- await page.clearCookies();
-
- // localStorage / sessionStorage
- await page.setLocalStorage('key', 'value');
- const value = await page.getLocalStorage('key');
- await page.clearLocalStorage();
- ```
-
- ### Console & Dialogs
-
- ```typescript
- // Capture console messages
- await page.onConsole((msg) => console.log(`[${msg.type}] ${msg.text}`));
-
- // Handle dialogs (alert, confirm, prompt)
- await page.onDialog(async (dialog) => {
- if (dialog.type === 'confirm') await dialog.accept();
- else await dialog.dismiss();
- });
-
- // Collect messages during an action
- const { result, messages } = await page.collectConsole(async () => {
- return await page.click('#button');
- });
- ```
-
- **Important:** Native browser dialogs (`alert()`, `confirm()`, `prompt()`) block all CDP commands until handled. Always set up a dialog handler before triggering actions that may show dialogs.
-
- ### Iframes
-
- Switch context to interact with iframe content:
-
- ```typescript
- // Switch to iframe
- await page.switchToFrame('iframe#payment');
-
- // Now actions target the iframe
- await page.fill('#card-number', '4242424242424242');
- await page.fill('#expiry', '12/25');
-
- // Switch back to main document
- await page.switchToMain();
- await page.click('#submit-order');
+ ```bash
+ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+ --remote-debugging-port=9222 \
+ --user-data-dir=/tmp/browser-pilot-profile
  ```
 
- Note: Cross-origin iframes cannot be accessed due to browser security.
-
- ### Options
-
- ```typescript
- interface ActionOptions {
- timeout?: number; // default: 30000ms
- optional?: boolean; // return false instead of throwing on failure
- }
- ```
+ ## Choose the command by job
 
- ## CLI
+ | Job | Primary commands |
+ | --- | --- |
+ | Inspect page state | `snapshot`, `page`, `forms`, `text`, `targets`, `diagnose` |
+ | Act in the browser | `exec`, `run` |
+ | Capture a human demo | `record` |
+ | Investigate behavior over time | `trace` |
+ | Exercise voice/media | `audio` |
+ | Change browser conditions | `env` |
 
- The CLI provides session persistence for interactive workflows:
+ ## Golden path 1: automate a page
 
  ```bash
- # Connect to a browser
- bp connect --provider browserbase --name my-session
- bp connect --provider generic # auto-discovers local Chrome
- bp connect --no-daemon # skip daemon (direct WebSocket only)
- bp connect --daemon-idle 30 # custom idle timeout (minutes)
-
- # Execute actions
- bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
- bp exec -s my-session '[
- {"action":"fill","selector":"#search","value":"browser automation"},
- {"action":"submit","selector":"#search-form"}
- ]'
-
- # Get page state (note the refs in output)
- bp snapshot -s my-session --format text
- # Output: button "Submit" ref:e4, textbox "Email" ref:e5, ...
-
- # Use refs from snapshot for reliable targeting
- # Refs are cached per session+URL after snapshot
- bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
- bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'
-
- # Quick discovery commands
- bp page -s my-session # URL, title, headings, forms, interactive controls
- bp forms -s my-session # Structured form metadata only
- bp targets -s my-session # Browser tabs with targetIds
- bp connect --new-tab --url https://example.com --name fresh
-
- # Handle native dialogs (alert/confirm/prompt)
- bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'
- bp exec --record '[{"action":"click","selector":"#checkout"},{"action":"assertText","expect":"Thanks"}]'
-
- # Other commands
- bp text -s my-session --selector ".main-content"
- bp screenshot -s my-session --output page.png
- bp listen ws -m "*voice*" # monitor WebSocket traffic
- bp list # list all sessions
- bp clean --max-size 500MB # trim old sessions by disk usage
- bp close -s my-session # close session
- bp actions # show complete action reference
- bp run workflow.json # run a workflow file
-
- # Daemon management
- bp daemon status # check daemon health
- bp daemon stop # stop daemon for default session
- bp daemon logs # view daemon log
-
- # Actions with inline assertions (no extra bp eval needed)
- bp exec '[
- {"action":"goto","url":"https://example.com/login"},
- {"action":"fill","selector":"#email","value":"user@example.com"},
- {"action":"submit","selector":"form"},
- {"action":"assertUrl","expect":"/dashboard"},
+ bp connect --name dev
+ bp snapshot -i -s dev
+ bp exec -s dev '[
+ {"action":"fill","selector":"ref:e5","value":"user@example.com"},
+ {"action":"click","selector":"ref:e7"},
  {"action":"assertText","expect":"Welcome"}
  ]'
  ```
 
- ### CLI for AI Agents
+ Use `bp snapshot -i` first. Refs are the default targeting strategy.
 
- The CLI is designed for AI agent tool calls. The recommended workflow:
-
- 1. **Take snapshot** to see the page structure with refs
- 2. **Use refs** (`ref:e4`) for reliable element targeting
- 3. **Batch actions** to reduce round trips
+ ## Golden path 2: capture a manual workflow and derive automation
 
  ```bash
- # Step 1: Get page state with refs
- bp snapshot --format text
- # Output shows: button "Add to Cart" ref:e12, textbox "Search" ref:e5
-
- # Step 2: Use refs to interact (stable, no CSS guessing)
- bp exec '[
- {"action":"fill","selector":"ref:e5","value":"laptop"},
- {"action":"click","selector":"ref:e12"},
- {"action":"snapshot"}
- ]' --format json
+ bp record -s demo --profile automation -f ./artifacts/demo.recording.json
+ # perform the flow manually, then stop with Ctrl+C
+ bp record summary ./artifacts/demo.recording.json
+ bp record derive ./artifacts/demo.recording.json -o workflow.json
+ bp run workflow.json
  ```
 
- Multi-selector fallbacks for robustness:
- ```bash
- bp exec '[
- {"action":"click","selector":["ref:e4","#submit","button[type=submit]"]}
- ]'
- ```
-
- Output:
- ```json
- {
- "success": true,
- "steps": [
- {"action": "fill", "success": true, "durationMs": 30},
- {"action": "click", "success": true, "durationMs": 50, "selectorUsed": "ref:e12"},
- {"action": "snapshot", "success": true, "durationMs": 100, "result": "..."}
- ],
- "totalDurationMs": 180
- }
- ```
-
- Run `bp actions` for complete action reference.
-
- ### Voice Agent Testing
-
- Test audio-based AI apps (voice assistants, phone agents) by injecting microphone input and capturing spoken responses.
+ Do not start by opening the raw artifact. Use `record summary`, `record inspect`, or `trace summary --view ...` first.
 
- > **Full guide:** [Voice Agent Testing Guide](./docs/guides/voice-agent-testing.md)
+ ## Golden path 3: debug a realtime or voice session
 
  ```bash
- export OPENAI_API_KEY=sk-... # Required for --transcribe
-
- # Validate audio pipeline
- bp audio check -s my-session
- # Output: "READY for roundtrip" with agent AudioContext detected
-
- # Full round-trip: send audio prompt → wait for response → transcribe
- bp audio roundtrip -i prompt.wav --transcribe --silence-timeout 1500
- # Output: { "transcript": "Welcome! I'd be happy to help...", "latencyMs": 5200, ... }
-
- # Save response audio for manual review
- bp audio roundtrip -i prompt.wav -o response.wav --transcribe
- ```
-
- **Important:** Audio overrides must be injected before the voice agent initializes. Use `bp audio check` to validate the pipeline. See the [full guide](./docs/guides/voice-agent-testing.md) for setup order and troubleshooting.
-
- Programmatic API:
-
- ```typescript
- await page.setupAudio();
-
- const result = await page.audioRoundTrip({
- input: audioBytes,
- silenceTimeout: 1500,
- });
-
- import { transcribe } from 'browser-pilot';
- const { text } = await transcribe(result.audio);
- console.log(text); // "Welcome! I'd be happy to help..."
+ bp connect --name realtime
+ bp trace start -s realtime --timeout 20000
+ # reproduce the issue in the app
+ bp trace summary -s realtime --view ws
+ bp trace summary -s realtime --view console
  ```
 
- ### Recording Browser Actions
-
- Record human interactions to create automation recipes:
+ Voice workflow:
 
  ```bash
- # Auto-connect to local Chrome and record (creates new session)
- bp record
-
- # Use most recent session
- bp record -s
-
- # Use specific session with custom output file
- bp record -s my-session -f login-flow.json
-
- # Review and edit the recording
- cat recording.json
-
- # Replay the recording
- bp exec -s my-session --file recording.json
+ bp audio setup -s realtime
+ bp exec -s realtime '{"action":"goto","url":"https://my-voice-app.com"}'
+ bp audio check -s realtime
+ bp audio roundtrip -s realtime -i prompt.wav --transcribe -o response.wav
+ bp trace summary -s realtime --view voice
  ```
 
- The output format is compatible with `page.batch()`:
- ```json
- {
- "recordedAt": "2026-01-06T10:00:00.000Z",
- "startUrl": "https://example.com",
- "duration": 15000,
- "steps": [
- { "action": "fill", "selector": ["[data-testid=\"email\"]", "#email"], "value": "user@example.com" },
- { "action": "click", "selector": ["[data-testid=\"submit\"]", "#login-btn"] }
- ]
- }
- ```
-
- **Notes:**
- - Sensitive fields are automatically redacted as `[REDACTED]` based on input settings such as `type="password"`, `type="hidden"`, and secret/autofill hints like `autocomplete="one-time-code"` or `cc-number`
- - Selectors are multi-selector arrays ordered by reliability (data attributes > IDs > CSS paths)
- - Edit the JSON to adjust selectors or add `optional: true` flags
-
- ### Screenshot Trail During Replay
-
- Capture a lightweight visual trail while replaying steps. Enable recording at the session level so all `bp exec` calls are captured automatically:
+ ## Golden path 4: exercise failure modes
 
  ```bash
- # Enable recording for the entire session
- bp connect --provider generic --name my-session --record
-
- # All exec calls now produce screenshots — frames accumulate in one manifest
- bp exec -s my-session '[
- {"action":"goto","url":"https://example.com/login"},
- {"action":"fill","selector":"#email","value":"user@example.com"},
- {"action":"submit","selector":"form"}
- ]'
- bp exec -s my-session '{"action":"assertUrl","expect":"/dashboard"}'
-
- # Or enable recording on a single exec call
- bp exec --record '[{"action":"click","selector":"#checkout"}]'
+ bp env permissions grant -s realtime microphone
+ bp env network offline -s realtime --duration 5000
+ bp trace watch -s realtime --view ws --assert profile:reconnect --timeout 15000
+ bp env visibility hidden -s realtime
  ```
 
- This writes `recording.json` plus a `screenshots/` directory in the session directory. Sensitive field values are redacted in both the manifest and the screenshot overlays. See the [Action Recording Guide](./docs/guides/action-recording.md) for options like `--record-format`, `--record-quality`, and `--no-highlights`.
-
- ## Examples
+ ## What is new in the model
 
- ### Login Flow with Error Handling
-
- ```typescript
- const result = await page.batch([
- { action: 'goto', url: 'https://app.example.com/login' },
- { action: 'fill', selector: ['#email', 'input[name=email]'], value: email },
- { action: 'fill', selector: ['#password', 'input[name=password]'], value: password },
- { action: 'click', selector: '.remember-me', optional: true },
- { action: 'submit', selector: ['#login', 'button[type=submit]'] },
- ], { onFail: 'stop' });
-
- if (!result.success) {
- console.error(`Failed at step ${result.stoppedAtIndex}: ${result.steps[result.stoppedAtIndex!].error}`);
- }
- ```
+ - One canonical artifact model with `version: 2`
+ - One canonical trace event stream for recording, live trace, and session logs
+ - Trace-backed waits and assertions in `exec` / `run`
+ - `listen` preserved as a compatibility alias to `trace tail`
+ - `audio` for active control, `trace` for explanation, `env` for browser-state controls
 
- ### Custom Dropdown
+ ## Programmatic example
 
  ```typescript
- // Using the custom select config
- await page.select({
- trigger: '.country-dropdown',
- option: '.dropdown-option',
- value: 'United States',
- match: 'text', // or 'contains' or 'value'
- });
-
- // Or compose from primitives
- await page.click('.country-dropdown');
- await page.fill('.dropdown-search', 'United');
- await page.click('.dropdown-option:has-text("United States")');
- ```
-
- ### WebSocket Daemon
-
- By default, `bp connect` spawns a lightweight background daemon that holds the CDP WebSocket open. Subsequent CLI commands connect via Unix socket (~5-15ms) instead of re-establishing WebSocket (~280-1030ms per command).
-
- ```bash
- # Daemon spawns automatically on connect
- bp connect --provider generic --name dev
-
- # Subsequent commands use the fast daemon path
- bp exec -s dev '{"action":"snapshot"}' # ~5-15ms overhead instead of ~280ms
-
- # Manage the daemon
- bp daemon status # check health, PID, uptime
- bp daemon stop # stop daemon
- bp daemon logs # view daemon log
-
- # Disable daemon for direct WebSocket
- bp connect --no-daemon
- ```
-
- The daemon is transparent — if it dies or becomes stale, CLI commands fall back to direct WebSocket silently. Each session gets its own daemon with a 60-minute idle timeout.
-
- ### Cloudflare Workers
-
- > Note: Cloudflare Workers' Node-compat runtime can expose parts of `node:net` with compatibility flags, but browser-pilot's daemon fast-path is intentionally CLI/Node-specific (Unix domain sockets + local background process). In Workers, use the normal direct WebSocket path shown below.
-
- ```typescript
- export default {
- async fetch(request: Request, env: Env): Promise<Response> {
- const browser = await connect({
- provider: 'browserbase',
- apiKey: env.BROWSERBASE_API_KEY,
- });
-
- const page = await browser.page();
- await page.goto('https://example.com');
- const snapshot = await page.snapshot();
-
- await browser.close();
-
- return Response.json({ title: snapshot.title, elements: snapshot.interactiveElements });
- },
- };
- ```
-
- ### AI Agent Tool Definition
-
- ```typescript
- const browserTool = {
- name: 'browser_action',
- description: 'Execute browser actions and get page state',
- parameters: {
- type: 'object',
- properties: {
- actions: {
- type: 'array',
- items: {
- type: 'object',
- properties: {
- action: { enum: ['goto', 'click', 'fill', 'submit', 'snapshot', 'assertVisible', 'assertExists', 'assertText', 'assertUrl', 'assertValue'] },
- selector: { type: ['string', 'array'] },
- value: { type: 'string' },
- url: { type: 'string' },
- },
- },
- },
- },
- },
- execute: async ({ actions }) => {
- const page = await getOrCreatePage();
- return page.batch(actions);
- },
- };
- ```
-
- ## Advanced
-
- ### Direct CDP Access
+ import { connect } from 'browser-pilot';
 
- ```typescript
  const browser = await connect({ provider: 'generic' });
- const cdp = browser.cdpClient;
-
- // Send any CDP command
- await cdp.send('Emulation.setDeviceMetricsOverride', {
- width: 375,
- height: 812,
- deviceScaleFactor: 3,
- mobile: true,
- });
- ```
-
- ### Tracing
+ const page = await browser.page();
 
- ```typescript
- import { enableTracing } from 'browser-pilot';
+ await page.batch([
+ { action: 'goto', url: 'https://example.com/login' },
+ { action: 'fill', selector: ['#email', 'input[type=email]'], value: 'user@example.com' },
+ { action: 'submit', selector: 'form' },
+ { action: 'assertUrl', expect: '/dashboard' },
+ ]);
 
- enableTracing({ output: 'console' });
- // [info] goto https://example.com ✓ (1200ms)
- // [info] click #submit ✓ (50ms)
+ await browser.close();
  ```
 
- ## AI Agent Integration
-
- browser-pilot is designed for AI agents. Two resources for agent setup:
-
- - **[llms.txt](./docs/llms.txt)** - Abbreviated reference for LLM context windows
- - **[Claude Code Skill](./docs/automating-browsers/SKILL.md)** - Full skill for Claude Code agents
-
- To use with Claude Code, copy `docs/automating-browsers/` to your project or reference it in your agent's context.
-
- ## Documentation
-
- See the [docs](./docs) folder for detailed documentation:
+ ## Guides
 
- - [Getting Started](./docs/getting-started.md)
- - [Providers](./docs/providers.md)
- - [Action Recording](./docs/guides/action-recording.md)
- - [Multi-Selector Guide](./docs/guides/multi-selector.md)
- - [Batch Actions](./docs/guides/batch-actions.md)
- - [Snapshots](./docs/guides/snapshots.md)
- - [Voice Agent Testing](./docs/guides/voice-agent-testing.md)
- - [CLI Reference](./docs/cli.md)
- - [API Reference](./docs/api/page.md)
+ - [CLI guide](./docs/cli.md)
+ - [Automation workflows](./docs/guides/automation-workflows.md)
+ - [Action recording](./docs/guides/action-recording.md)
+ - [Trace workflows](./docs/guides/trace-workflows.md)
+ - [Realtime debugging](./docs/guides/realtime-debugging.md)
+ - [Voice agent testing](./docs/guides/voice-agent-testing.md)
+ - [Artifact analysis](./docs/guides/artifact-analysis.md)
+ - [LLM contract](./docs/llms.txt)
 
- ## License
+ ## Compatibility notes
 
- MIT
+ - Prefer `--debug` for transport logging. `--trace` still works as a legacy alias.
+ - Prefer `bp trace tail ...`. `bp listen ...` still works as a compatibility alias.
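The programmatic example added in 0.0.16 passes `selector: ['#email', 'input[type=email]']`. The 0.0.14 README removed in this diff described array selectors as trying each selector in order until one works, and showed `onFail: 'stop'` reporting the failing step through `stoppedAtIndex`. The sketch below models only that control flow, under stated assumptions: `runBatch`, `firstMatching`, and the `exists` lookup callback are hypothetical names, the lookup is synchronous, and waiting, timeouts, refs, and real action semantics are ignored. It is not browser-pilot's implementation.

```typescript
// Hypothetical types modelling only the fields this sketch needs;
// they are not browser-pilot's exported types.
type Step = { action: string; selector?: string | string[] };

type StepResult = {
  action: string;
  success: boolean;
  selectorUsed?: string;
  error?: string;
};

type BatchResult = {
  success: boolean;
  steps: StepResult[];
  stoppedAtIndex?: number;
};

// Assumption: a synchronous "does this selector match?" check stands in
// for the real (asynchronous, waiting) element lookup.
type Lookup = (selector: string) => boolean;

// Try each candidate in array order and return the first one that matches.
function firstMatching(selector: string | string[], exists: Lookup): string | undefined {
  const candidates = Array.isArray(selector) ? selector : [selector];
  return candidates.find((candidate) => exists(candidate));
}

// Run steps in order and stop at the first failure, mirroring `onFail: 'stop'`.
function runBatch(steps: Step[], exists: Lookup): BatchResult {
  const results: StepResult[] = [];
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    if (step.selector === undefined) {
      // Selector-less steps (e.g. goto) are treated as successful in this sketch.
      results.push({ action: step.action, success: true });
      continue;
    }
    const used = firstMatching(step.selector, exists);
    if (used === undefined) {
      results.push({
        action: step.action,
        success: false,
        error: `no selector matched: ${JSON.stringify(step.selector)}`,
      });
      return { success: false, steps: results, stoppedAtIndex: i };
    }
    results.push({ action: step.action, success: true, selectorUsed: used });
  }
  return { success: true, steps: results };
}

// Usage: only the fallback selector and the form exist on this fake page.
const present = new Set(['input[type=email]', 'form']);
const result = runBatch(
  [
    { action: 'goto' },
    { action: 'fill', selector: ['#email', 'input[type=email]'] },
    { action: 'submit', selector: 'form' },
    { action: 'click', selector: ['#missing'] },
  ],
  (selector) => present.has(selector),
);
console.log(result.steps[1].selectorUsed); // input[type=email]
console.log(result.stoppedAtIndex); // 3
```

Passing the lookup in as a callback keeps the ordering rule testable without a browser; a real driver would presumably replace `exists` with an asynchronous DOM query that also waits for the element, as the removed "Built-in Waiting" section describes.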