agent-browser 0.22.1 → 0.22.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +21 -2
- package/bin/agent-browser-darwin-arm64 +0 -0
- package/bin/agent-browser-darwin-x64 +0 -0
- package/bin/agent-browser-linux-arm64 +0 -0
- package/bin/agent-browser-linux-musl-arm64 +0 -0
- package/bin/agent-browser-linux-musl-x64 +0 -0
- package/bin/agent-browser-linux-x64 +0 -0
- package/bin/agent-browser-win32-x64.exe +0 -0
- package/package.json +1 -1
- package/skills/agent-browser/SKILL.md +38 -0
package/README.md
CHANGED
@@ -125,6 +125,9 @@ agent-browser pdf <path> # Save as PDF
 agent-browser snapshot # Accessibility tree with refs (best for AI)
 agent-browser eval <js> # Run JavaScript (-b for base64, --stdin for piped input)
 agent-browser connect <port> # Connect to browser via CDP
+agent-browser stream enable [--port <port>] # Start runtime WebSocket streaming
+agent-browser stream status # Show runtime streaming state and bound port
+agent-browser stream disable # Stop runtime WebSocket streaming
 agent-browser close # Close browser (aliases: quit, exit)
 ```
 
@@ -300,8 +303,11 @@ agent-browser frame main # Back to main frame
 ```bash
 agent-browser dialog accept [text] # Accept (with optional prompt text)
 agent-browser dialog dismiss # Dismiss
+agent-browser dialog status # Check if a dialog is currently open
 ```
 
+When a JavaScript dialog is pending, all command responses include a `warning` field with the dialog type and message.
+
 ### Diff
 
 ```bash
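The `warning` field added in the hunk above could be checked from a script. The following is a minimal sketch, not part of the package: `AB_BIN`, `has_warning`, and `run_checked` are hypothetical names, the response is assumed to be JSON text printed on stdout in `--json` mode, and the check is a crude text match rather than real JSON parsing.

```bash
# Sketch only: AB_BIN, has_warning and run_checked are hypothetical helpers.
AB_BIN="${AB_BIN:-agent-browser}"

# Succeeds when the JSON text on stdin appears to contain a "warning" key.
# This is a plain text match; a JSON parser would be more robust.
has_warning() {
  grep -q '"warning"[[:space:]]*:'
}

# Runs a command in --json mode (assumed to print the response on stdout),
# notes a pending dialog on stderr, and echoes the raw response.
run_checked() {
  local out
  out="$("$AB_BIN" "$@" --json)" || return $?
  if printf '%s\n' "$out" | has_warning; then
    echo "pending dialog reported; run: $AB_BIN dialog status" >&2
  fi
  printf '%s\n' "$out"
}
```

A call such as `run_checked snapshot` would then pass the response through unchanged while flagging a pending dialog on stderr.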
@@ -922,13 +928,26 @@ Stream the browser viewport via WebSocket for live preview or "pair browsing" wh
 
 ### Enable Streaming
 
-
+For an already-running session, enable streaming at runtime:
+
+```bash
+agent-browser stream enable
+agent-browser stream status
+agent-browser stream disable
+```
+
+`stream enable` binds an available localhost port automatically unless you pass `--port <port>`.
+Use `stream status` to inspect whether streaming is enabled, which port is active, whether a browser is attached, and whether screencasting is active.
+
+If you want streaming to be available immediately when the daemon starts, set `AGENT_BROWSER_STREAM_PORT` before the first command in that session:
 
 ```bash
 AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
 ```
 
-
+The environment variable only affects daemon startup. For sessions that are already running, use `agent-browser stream enable` instead.
+
+Once enabled, the WebSocket server streams the browser viewport and accepts input events.
 
 ### WebSocket Protocol
 
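The enable / status / disable sequence described in the hunk above could be wrapped so that streaming is always turned off again. This is a rough sketch under stated assumptions: `AB_BIN` and `with_stream` are hypothetical names, and only the `stream enable [--port <port>]`, `stream status`, and `stream disable` commands come from the diff itself.

```bash
# Sketch only: AB_BIN and with_stream are hypothetical helpers.
AB_BIN="${AB_BIN:-agent-browser}"

# Usage: with_stream <port-or-empty> <command> [args...]
# Enables runtime streaming (auto-selected port when <port> is empty),
# prints the streaming status, runs the command, then disables streaming
# even if the command failed. Returns the command's exit code.
with_stream() {
  local port="$1" rc=0
  shift
  if [ -n "$port" ]; then
    "$AB_BIN" stream enable --port "$port" || return $?
  else
    "$AB_BIN" stream enable || return $?
  fi
  "$AB_BIN" stream status
  "$@" || rc=$?
  "$AB_BIN" stream disable
  return "$rc"
}
```

For example, `with_stream 9223 agent-browser snapshot` would stream on port 9223 only for the duration of the snapshot. For streaming from the very first command of a session, the `AGENT_BROWSER_STREAM_PORT` route in the diff remains the documented option.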
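package/bin/agent-browser-darwin-arm64
Binary file
package/bin/agent-browser-darwin-x64
Binary file
package/bin/agent-browser-linux-arm64
Binary file
package/bin/agent-browser-linux-musl-arm64
Binary file
package/bin/agent-browser-linux-musl-x64
Binary file
package/bin/agent-browser-linux-x64
Binary file
package/bin/agent-browser-win32-x64.exe
Binary file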
package/package.json
CHANGED
package/skills/agent-browser/SKILL.md
CHANGED
@@ -171,12 +171,24 @@ agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
 agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
 agent-browser pdf output.pdf # Save as PDF
 
+# Live preview / streaming
+agent-browser stream enable # Start runtime WebSocket streaming on an auto-selected port
+agent-browser stream enable --port 9223 # Bind a specific localhost port
+agent-browser stream status # Inspect enabled state, port, connection, and screencasting
+agent-browser stream disable # Stop runtime streaming and remove the .stream metadata file
+
 # Clipboard
 agent-browser clipboard read # Read text from clipboard
 agent-browser clipboard write "Hello, World!" # Write text to clipboard
 agent-browser clipboard copy # Copy current selection
 agent-browser clipboard paste # Paste from clipboard
 
+# Dialogs (alert, confirm, prompt)
+agent-browser dialog accept # Accept dialog
+agent-browser dialog accept "my input" # Accept prompt dialog with text
+agent-browser dialog dismiss # Dismiss/cancel dialog
+agent-browser dialog status # Check if a dialog is currently open
+
 # Diff (compare page states)
 agent-browser diff snapshot # Compare current vs last snapshot
 agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
@@ -186,6 +198,12 @@ agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait str
 agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
 ```
 
+## Runtime Streaming
+
+Use `agent-browser stream enable` when you need a live WebSocket preview for an already-running session. This is the preferred runtime path because it does not require restarting the daemon. `stream enable` creates the server, `stream status` reports the bound port and connection state, and `stream disable` tears it down cleanly.
+
+If streaming must be present from the first daemon command, `AGENT_BROWSER_STREAM_PORT` still works at daemon startup, but that environment variable is not retroactive for sessions that are already running.
+
 ## Batch Execution
 
 Execute multiple commands in a single invocation by piping a JSON array of string arrays to `batch`. This avoids per-command process startup overhead when running multi-step workflows.
@@ -522,6 +540,26 @@ agent-browser wait 5000
 
 When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
 
+## JavaScript Dialogs (alert / confirm / prompt)
+
+When a page opens a JavaScript dialog (`alert()`, `confirm()`, or `prompt()`), it blocks all other browser commands (snapshot, screenshot, click, etc.) until the dialog is dismissed. If commands start timing out unexpectedly, check for a pending dialog:
+
+```bash
+# Check if a dialog is blocking
+agent-browser dialog status
+
+# Accept the dialog (dismiss the alert / click OK)
+agent-browser dialog accept
+
+# Accept a prompt dialog with input text
+agent-browser dialog accept "my input"
+
+# Dismiss the dialog (click Cancel)
+agent-browser dialog dismiss
+```
+
+When a dialog is pending, all command responses include a `warning` field indicating the dialog type and message. In `--json` mode this appears as a `"warning"` key in the response object.
+
 ## Session Management and Cleanup
 
 When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
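The advice in the dialogs hunk above (check for a pending dialog when commands start timing out) could be automated roughly as follows. This sketch assumes that a blocked or timed-out command exits with a non-zero status, which the diff does not state; `AB_BIN` and `run_or_report_dialog` are hypothetical names.

```bash
# Sketch only: AB_BIN and run_or_report_dialog are hypothetical helpers.
AB_BIN="${AB_BIN:-agent-browser}"

# Runs an agent-browser command. If it fails (assumed: non-zero exit),
# prints the output of `dialog status` on stderr so a blocking
# alert/confirm/prompt becomes visible. Returns the command's exit code.
run_or_report_dialog() {
  local rc=0
  "$AB_BIN" "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "command failed (exit $rc); checking for a pending dialog:" >&2
    "$AB_BIN" dialog status >&2 || true
  fi
  return "$rc"
}
```

The helper deliberately does not accept or dismiss the dialog itself, since choosing between `dialog accept` and `dialog dismiss` depends on what the page is asking.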