browser-pilot 0.0.12 → 0.0.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -167,10 +167,10 @@ console.log(snapshot.interactiveElements);
 
  // Text representation for LLMs
  console.log(snapshot.text);
- // - main [ref=e1]
- // - heading "Welcome" [ref=e2]
- // - button "Get Started" [ref=e3]
- // - textbox [ref=e4] placeholder="Email"
+ // - main ref:e1
+ // - heading "Welcome" ref:e2
+ // - button "Get Started" ref:e3
+ // - textbox ref:e4 placeholder="Email"
  ```
 
  ### Ref-Based Selectors
@@ -179,7 +179,7 @@ After taking a snapshot, use element refs directly as selectors:
 
  ```typescript
  const snapshot = await page.snapshot();
- // Output shows: button "Submit" [ref=e4]
+ // Output shows: button "Submit" ref:e4
 
  // Click using the ref - no fragile CSS needed
  await page.click('ref:e4');
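In 0.0.14 the text snapshot prints refs as `ref:eN`, which is the same string `page.click()` accepts as a selector, so a ref can be copied from the output without rewriting `[ref=eN]` into `ref:eN`. A minimal sketch of pulling role, accessible name, and selector out of such lines, assuming each line follows the `- role "name" ref:eN ...` shape shown in the README; `parseSnapshotText`, `parseSnapshotLine`, and `SnapshotEntry` are hypothetical helpers, not part of browser-pilot.

```typescript
// Hypothetical helper (not part of browser-pilot): extracts role, optional
// quoted name, and the ref selector from lines of `snapshot.text`, assuming
// the `- role "name" ref:eN ...` layout shown in the README diff.
interface SnapshotEntry {
  role: string;
  name?: string;
  selector: string; // e.g. "ref:e4", usable directly as a selector
}

// "- " prefix, role token, optional quoted name, then the ref token.
const LINE_PATTERN = /^\s*-\s+(\S+)(?:\s+"([^"]*)")?\s+(ref:e\d+)\b/;

function parseSnapshotLine(line: string): SnapshotEntry | undefined {
  const match = LINE_PATTERN.exec(line);
  if (!match) return undefined;
  const [, role, name, selector] = match;
  return name === undefined ? { role, selector } : { role, name, selector };
}

function parseSnapshotText(text: string): SnapshotEntry[] {
  return text
    .split("\n")
    .map(parseSnapshotLine)
    .filter((entry): entry is SnapshotEntry => entry !== undefined);
}
```

Each `selector` value can then be passed straight to calls such as `page.click()`, as in the README's `page.click('ref:e4')`.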
@@ -373,6 +373,8 @@ The CLI provides session persistence for interactive workflows:
  # Connect to a browser
  bp connect --provider browserbase --name my-session
  bp connect --provider generic # auto-discovers local Chrome
+ bp connect --no-daemon # skip daemon (direct WebSocket only)
+ bp connect --daemon-idle 30 # custom idle timeout (minutes)
 
  # Execute actions
  bp exec -s my-session '{"action":"goto","url":"https://example.com"}'
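`--daemon-idle 30` sets the daemon's idle timeout in minutes; the README's daemon section gives 60 minutes as the default. The README does not say how the timeout is implemented. A common approach is a resettable timer that each incoming command pushes back, sketched below with a hypothetical `IdleTimer` class and `minutesToMs` helper; this is an illustration of the pattern, not browser-pilot's code.

```typescript
// Hypothetical sketch of a resettable idle timer (not browser-pilot's code):
// every touch() restarts the countdown; onIdle fires only after `idleMs`
// elapses with no activity.
class IdleTimer {
  private handle: ReturnType<typeof setTimeout> | undefined;

  constructor(private readonly idleMs: number, private readonly onIdle: () => void) {
    this.touch();
  }

  // Call on every incoming command to postpone shutdown.
  touch(): void {
    this.stop();
    this.handle = setTimeout(() => {
      this.handle = undefined;
      this.onIdle();
    }, this.idleMs);
  }

  // Cancel the countdown entirely (e.g. on an explicit stop).
  stop(): void {
    if (this.handle !== undefined) {
      clearTimeout(this.handle);
      this.handle = undefined;
    }
  }
}

// Converts a CLI value given in minutes, as with `--daemon-idle 30`.
const minutesToMs = (minutes: number): number => minutes * 60_000;
```

A daemon built this way would call `touch()` per request and shut itself down in `onIdle`.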
@@ -383,25 +385,38 @@ bp exec -s my-session '[
 
  # Get page state (note the refs in output)
  bp snapshot -s my-session --format text
- # Output: button "Submit" [ref=e4], textbox "Email" [ref=e5], ...
+ # Output: button "Submit" ref:e4, textbox "Email" ref:e5, ...
 
  # Use refs from snapshot for reliable targeting
  # Refs are cached per session+URL after snapshot
  bp exec -s my-session '{"action":"click","selector":"ref:e4"}'
  bp exec -s my-session '{"action":"fill","selector":"ref:e5","value":"test@example.com"}'
 
+ # Quick discovery commands
+ bp page -s my-session # URL, title, headings, forms, interactive controls
+ bp forms -s my-session # Structured form metadata only
+ bp targets -s my-session # Browser tabs with targetIds
+ bp connect --new-tab --url https://example.com --name fresh
+
  # Handle native dialogs (alert/confirm/prompt)
  bp exec --dialog accept '{"action":"click","selector":"#delete-btn"}'
+ bp exec --record '[{"action":"click","selector":"#checkout"},{"action":"assertText","expect":"Thanks"}]'
 
  # Other commands
  bp text -s my-session --selector ".main-content"
  bp screenshot -s my-session --output page.png
  bp listen ws -m "*voice*" # monitor WebSocket traffic
  bp list # list all sessions
+ bp clean --max-size 500MB # trim old sessions by disk usage
  bp close -s my-session # close session
  bp actions # show complete action reference
  bp run workflow.json # run a workflow file
 
+ # Daemon management
+ bp daemon status # check daemon health
+ bp daemon stop # stop daemon for default session
+ bp daemon logs # view daemon log
+
  # Actions with inline assertions (no extra bp eval needed)
  bp exec '[
  {"action":"goto","url":"https://example.com/login"},
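`bp clean --max-size 500MB` is described only as trimming old sessions by disk usage. One plausible reading is oldest-first eviction until the total fits the budget. The sketch below assumes that reading; the size parser (decimal units, 1MB = 1,000,000 bytes), the `SessionInfo` shape, and `planCleanup` are hypothetical and may not match how browser-pilot interprets the flag.

```typescript
// Hypothetical sketch of "trim old sessions by disk usage": evict the oldest
// sessions until the remaining total fits under the limit. Decimal units
// (1KB = 1000 bytes) are an assumption, not browser-pilot's documented behavior.
interface SessionInfo {
  name: string;
  bytes: number;      // disk usage of the session directory
  lastUsedMs: number; // epoch millis of last activity
}

const UNITS: Record<string, number> = { B: 1, KB: 1e3, MB: 1e6, GB: 1e9 };

function parseSize(input: string): number {
  const match = /^(\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?$/i.exec(input.trim());
  if (!match) throw new Error(`Unrecognized size: ${input}`);
  const unit = (match[2] ?? "B").toUpperCase();
  return Math.round(parseFloat(match[1]) * UNITS[unit]);
}

// Returns the names of sessions to delete, oldest first.
function planCleanup(sessions: SessionInfo[], maxSize: string): string[] {
  const limit = parseSize(maxSize);
  let total = sessions.reduce((sum, s) => sum + s.bytes, 0);
  const oldestFirst = [...sessions].sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  const toDelete: string[] = [];
  for (const session of oldestFirst) {
    if (total <= limit) break;
    toDelete.push(session.name);
    total -= session.bytes;
  }
  return toDelete;
}
```

Returning a plan instead of deleting directly keeps the eviction policy testable apart from filesystem access.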
@@ -423,7 +438,7 @@ The CLI is designed for AI agent tool calls. The recommended workflow:
  ```bash
  # Step 1: Get page state with refs
  bp snapshot --format text
- # Output shows: button "Add to Cart" [ref=e12], textbox "Search" [ref=e5]
+ # Output shows: button "Add to Cart" ref:e12, textbox "Search" ref:e5
 
  # Step 2: Use refs to interact (stable, no CSS guessing)
  bp exec '[
@@ -528,10 +543,32 @@ The output format is compatible with `page.batch()`:
  ```
 
  **Notes:**
- - Password fields are automatically redacted as `[REDACTED]`
+ - Sensitive fields are automatically redacted as `[REDACTED]` based on input settings such as `type="password"`, `type="hidden"`, and secret/autofill hints like `autocomplete="one-time-code"` or `cc-number`
  - Selectors are multi-selector arrays ordered by reliability (data attributes > IDs > CSS paths)
  - Edit the JSON to adjust selectors or add `optional: true` flags
 
+ ### Screenshot Trail During Replay
+
+ Capture a lightweight visual trail while replaying steps. Enable recording at the session level so all `bp exec` calls are captured automatically:
+
+ ```bash
+ # Enable recording for the entire session
+ bp connect --provider generic --name my-session --record
+
+ # All exec calls now produce screenshots — frames accumulate in one manifest
+ bp exec -s my-session '[
+ {"action":"goto","url":"https://example.com/login"},
+ {"action":"fill","selector":"#email","value":"user@example.com"},
+ {"action":"submit","selector":"form"}
+ ]'
+ bp exec -s my-session '{"action":"assertUrl","expect":"/dashboard"}'
+
+ # Or enable recording on a single exec call
+ bp exec --record '[{"action":"click","selector":"#checkout"}]'
+ ```
+
+ This writes `recording.json` plus a `screenshots/` directory in the session directory. Sensitive field values are redacted in both the manifest and the screenshot overlays. See the [Action Recording Guide](./docs/guides/action-recording.md) for options like `--record-format`, `--record-quality`, and `--no-highlights`.
+
  ## Examples
 
  ### Login Flow with Error Handling
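The updated redaction note says sensitive fields are detected from input attributes (`type="password"`, `type="hidden"`) and autofill hints (`autocomplete="one-time-code"`, `cc-number`). A minimal sketch of such a check, assuming a plain attribute object rather than a live DOM element. `isSensitiveField` and `redactValue` are hypothetical, and the tokens beyond `one-time-code` and `cc-number` (`cc-csc`, `cc-exp`, `current-password`, `new-password`, all from the HTML autofill token list) are an assumption about what the README's "secret/autofill hints" covers.

```typescript
// Hypothetical sketch (not browser-pilot's implementation) of attribute-based
// redaction for recorded field values.
interface FieldAttributes {
  type?: string;
  autocomplete?: string;
}

const SENSITIVE_TYPES = new Set(["password", "hidden"]);

// From the README: one-time-code, cc-number. The others are assumed additions
// taken from the HTML autofill token list.
const SENSITIVE_AUTOCOMPLETE = new Set([
  "one-time-code", "cc-number", "cc-csc", "cc-exp",
  "current-password", "new-password",
]);

function isSensitiveField(attrs: FieldAttributes): boolean {
  if (attrs.type && SENSITIVE_TYPES.has(attrs.type.toLowerCase())) return true;
  // autocomplete may hold several space-separated tokens, e.g. "billing cc-number".
  const tokens = (attrs.autocomplete ?? "").toLowerCase().split(/\s+/);
  return tokens.some((token) => SENSITIVE_AUTOCOMPLETE.has(token));
}

function redactValue(attrs: FieldAttributes, value: string): string {
  return isSensitiveField(attrs) ? "[REDACTED]" : value;
}
```

Splitting `autocomplete` into tokens matters because the attribute can carry section and billing/shipping prefixes in front of the field name.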
@@ -567,8 +604,32 @@ await page.fill('.dropdown-search', 'United');
  await page.click('.dropdown-option:has-text("United States")');
  ```
 
+ ### WebSocket Daemon
+
+ By default, `bp connect` spawns a lightweight background daemon that holds the CDP WebSocket open. Subsequent CLI commands connect via Unix socket (~5-15ms) instead of re-establishing WebSocket (~280-1030ms per command).
+
+ ```bash
+ # Daemon spawns automatically on connect
+ bp connect --provider generic --name dev
+
+ # Subsequent commands use the fast daemon path
+ bp exec -s dev '{"action":"snapshot"}' # ~5-15ms overhead instead of ~280ms
+
+ # Manage the daemon
+ bp daemon status # check health, PID, uptime
+ bp daemon stop # stop daemon
+ bp daemon logs # view daemon log
+
+ # Disable daemon for direct WebSocket
+ bp connect --no-daemon
+ ```
+
+ The daemon is transparent — if it dies or becomes stale, CLI commands fall back to direct WebSocket silently. Each session gets its own daemon with a 60-minute idle timeout.
+
  ### Cloudflare Workers
 
+ > Note: Cloudflare Workers' Node-compat runtime can expose parts of `node:net` with compatibility flags, but browser-pilot's daemon fast-path is intentionally CLI/Node-specific (Unix domain sockets + local background process). In Workers, use the normal direct WebSocket path shown below.
+
  ```typescript
  export default {
  async fetch(request: Request, env: Env): Promise<Response> {
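The WebSocket Daemon section above states that CLI commands silently fall back to a direct WebSocket when the daemon is dead or stale. A minimal Node sketch of that try-fast-then-fall-back pattern over a Unix domain socket using `node:net`. The socket path, the 200 ms connect timeout, and the `connectUnixSocket`/`connectWithFallback` helpers are hypothetical, and the README does not describe how the real CLI detects staleness.

```typescript
import * as net from "node:net";

// Hypothetical sketch (not browser-pilot's code): prefer a local daemon over a
// Unix domain socket, and fall back to a direct path if the daemon is missing,
// unresponsive, or fails mid-request.
function connectUnixSocket(path: string, timeoutMs: number): Promise<net.Socket> {
  return new Promise((resolve, reject) => {
    const socket = net.createConnection({ path });
    const timer = setTimeout(() => {
      socket.destroy();
      reject(new Error(`daemon connect timed out after ${timeoutMs}ms`));
    }, timeoutMs);
    socket.once("connect", () => {
      clearTimeout(timer);
      resolve(socket);
    });
    socket.once("error", (err) => {
      clearTimeout(timer);
      socket.destroy();
      reject(err);
    });
  });
}

async function connectWithFallback<T>(
  socketPath: string,
  viaDaemon: (socket: net.Socket) => Promise<T>,
  direct: () => Promise<T>,
): Promise<T> {
  // A missing socket file (ENOENT), refused connection, or timeout means no live daemon.
  const socket = await connectUnixSocket(socketPath, 200).catch(() => undefined);
  if (!socket) return direct();
  try {
    return await viaDaemon(socket);
  } catch {
    // Daemon died or errored mid-request: silently take the slow path.
    return direct();
  } finally {
    socket.end();
  }
}
```

Keeping the fallback inside one helper means callers never learn which path served them, which matches the README's description of the daemon as transparent. As the Workers note says, this approach is Node-specific.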
@@ -650,9 +711,9 @@ enableTracing({ output: 'console' });
  browser-pilot is designed for AI agents. Two resources for agent setup:
 
  - **[llms.txt](./docs/llms.txt)** - Abbreviated reference for LLM context windows
- - **[Claude Code Skill](./docs/skill/SKILL.md)** - Full skill for Claude Code agents
+ - **[Claude Code Skill](./docs/automating-browsers/SKILL.md)** - Full skill for Claude Code agents
 
- To use with Claude Code, copy `docs/skill/` to your project or reference it in your agent's context.
+ To use with Claude Code, copy `docs/automating-browsers/` to your project or reference it in your agent's context.
 
  ## Documentation
 
@@ -660,6 +721,7 @@ See the [docs](./docs) folder for detailed documentation:
 
  - [Getting Started](./docs/getting-started.md)
  - [Providers](./docs/providers.md)
+ - [Action Recording](./docs/guides/action-recording.md)
  - [Multi-Selector Guide](./docs/guides/multi-selector.md)
  - [Batch Actions](./docs/guides/batch-actions.md)
  - [Snapshots](./docs/guides/snapshots.md)