agent-browser 0.22.1 → 0.22.3

package/README.md CHANGED
@@ -125,6 +125,9 @@ agent-browser pdf <path> # Save as PDF
  agent-browser snapshot # Accessibility tree with refs (best for AI)
  agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
  agent-browser connect <port> # Connect to browser via CDP
+ agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
+ agent-browser stream status # Show runtime streaming state and bound port
+ agent-browser stream disable # Stop runtime WebSocket streaming
  agent-browser close # Close browser (aliases: quit, exit)
  ```
 
@@ -300,8 +303,11 @@ agent-browser frame main # Back to main frame
  ```bash
  agent-browser dialog accept [text] # Accept (with optional prompt text)
  agent-browser dialog dismiss # Dismiss
+ agent-browser dialog status # Check if a dialog is currently open
  ```
 
+ When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
+
  ### Diff
 
  ```bash
@@ -922,13 +928,26 @@ Stream the browser viewport via WebSocket for live preview or "pair browsing" wh
 
  ### Enable Streaming
 
- Set the `AGENT_BROWSER_STREAM_PORT` environment variable:
+ For an already-running session, enable streaming at runtime:
+
+ ```bash
+ agent-browser stream enable
+ agent-browser stream status
+ agent-browser stream disable
+ ```
+
+ `stream enable` binds an available localhost port automatically unless you pass `--port <port>`.
+ Use `stream status` to inspect whether streaming is enabled, which port is active, whether a browser is attached, and whether screencasting is active.
+
+ If you want streaming to be available immediately when the daemon starts, set `AGENT_BROWSER_STREAM_PORT` before the first command in that session:
 
  ```bash
  AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
  ```
 
- This starts a WebSocket server on the specified port that streams the browser viewport and accepts input events.
+ The environment variable only affects daemon startup. For sessions that are already running, use `agent-browser stream enable` instead.
+
+ Once enabled, the WebSocket server streams the browser viewport and accepts input events.
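
Reviewer note: the enable / status / disable lifecycle added in this hunk can be wrapped so that streaming is always torn down after a task. The following is a minimal sketch and not part of the package: `with_stream` is a hypothetical helper name, and it relies only on the `stream enable [--port <port>]`, `stream status`, and `stream disable` commands shown in the diff.

```bash
# Hypothetical helper (not shipped with agent-browser): enable runtime
# streaming, run the given command, then disable streaming again.
# Usage: with_stream <port-or-empty-string> <command> [args...]
with_stream() {
  local port rc
  port="$1"
  shift

  # An empty port lets `stream enable` pick an available localhost port.
  if [ -n "$port" ]; then
    agent-browser stream enable --port "$port" || return 1
  else
    agent-browser stream enable || return 1
  fi

  # Print the streaming state and bound port for whoever wants to connect.
  agent-browser stream status

  # Run the caller's command and remember its exit status.
  "$@"
  rc=$?

  # Stop streaming whether or not the command succeeded.
  agent-browser stream disable
  return "$rc"
}
```

For example, `with_stream 9223 agent-browser open example.com` would stream on port 9223 for the duration of the `open` command, while `with_stream "" ...` leaves port selection to `stream enable`.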
 
  ### WebSocket Protocol
 
Binary files changed (7 entries; file names not shown)
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "agent-browser",
- "version": "0.22.1",
+ "version": "0.22.3",
  "description": "Headless browser automation CLI for AI agents",
  "type": "module",
  "files": [
@@ -171,12 +171,24 @@ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
  agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
  agent-browser pdf output.pdf # Save as PDF
 
+ # Live preview / streaming
+ agent-browser stream enable # Start runtime WebSocket streaming on an auto-selected port
+ agent-browser stream enable --port 9223 # Bind a specific localhost port
+ agent-browser stream status # Inspect enabled state, port, connection, and screencasting
+ agent-browser stream disable # Stop runtime streaming and remove the .stream metadata file
+
  # Clipboard
  agent-browser clipboard read # Read text from clipboard
  agent-browser clipboard write "Hello, World!" # Write text to clipboard
  agent-browser clipboard copy # Copy current selection
  agent-browser clipboard paste # Paste from clipboard
 
+ # Dialogs (alert, confirm, prompt)
+ agent-browser dialog accept # Accept dialog
+ agent-browser dialog accept "my input" # Accept prompt dialog with text
+ agent-browser dialog dismiss # Dismiss/cancel dialog
+ agent-browser dialog status # Check if a dialog is currently open
+
  # Diff (compare page states)
  agent-browser diff snapshot # Compare current vs last snapshot
  agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
@@ -186,6 +198,12 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
  agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
  ```
 
+ ## Runtime Streaming
+
+ Use `agent-browser stream enable` when you need a live WebSocket preview for an already-running session. This is the preferred runtime path because it does not require restarting the daemon. `stream enable` creates the server, `stream status` reports the bound port and connection state, and `stream disable` tears it down cleanly.
+
+ If streaming must be present from the first daemon command, `AGENT_BROWSER_STREAM_PORT` still works at daemon startup, but that environment variable is not retroactive for sessions that are already running.
+
  ## Batch Execution
 
  Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
@@ -522,6 +540,26 @@ agent-browser wait 5000
 
  When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
 
+ ## JavaScript Dialogs (alert / confirm / prompt)
+
+ When a page opens a JavaScript dialog (`alert()`, `confirm()`, or `prompt()`), it blocks all other browser commands (snapshot, screenshot, click, etc.) until the dialog is dismissed. If commands start timing out unexpectedly, check for a pending dialog:
+
+ ```bash
+ # Check if a dialog is blocking
+ agent-browser dialog status
+
+ # Accept the dialog (dismiss the alert / click OK)
+ agent-browser dialog accept
+
+ # Accept a prompt dialog with input text
+ agent-browser dialog accept "my input"
+
+ # Dismiss the dialog (click Cancel)
+ agent-browser dialog dismiss
+ ```
+
+ When a dialog is pending, all command responses include a `warning` field indicating the dialog type and message. In `--json` mode this appears as a `"warning"` key in the response object.
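
Reviewer note: a script can use the `"warning"` key described above to notice a blocking dialog early. The sketch below is illustrative only. `has_dialog_warning` and `run_json` are hypothetical helper names, the check is a plain text match and not a JSON parser, and passing `--json` after the subcommand is an assumption about flag placement; the exact shape of the `warning` value is not specified in this diff, so the sketch only tests for the key's presence.

```bash
# Hypothetical helper: exit 0 when the JSON text on stdin contains a
# "warning" key. Crude text match: a "warning" key nested anywhere in
# the response would also match.
has_dialog_warning() {
  grep -q '"warning"[[:space:]]*:'
}

# Hypothetical wrapper: run an agent-browser command in --json mode,
# print the response, and return 2 if a dialog warning is present.
# Assumption: --json is accepted after the subcommand and its arguments.
run_json() {
  local out
  out=$(agent-browser "$@" --json) || return $?
  printf '%s\n' "$out"
  if printf '%s\n' "$out" | has_dialog_warning; then
    echo "dialog pending: check 'agent-browser dialog status'" >&2
    return 2
  fi
  return 0
}
```

A caller could then run `run_json snapshot` and, on exit status 2, follow up with `agent-browser dialog accept` or `agent-browser dialog dismiss` before retrying.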
+
  ## Session Management and Cleanup
 
  When running multiple agents or automations concurrently, always use named sessions to avoid conflicts: