@browserbasehq/browse-cli 0.1.0

Files changed (4)
  1. package/LICENSE +21 -0
  2. package/README.md +310 -0
  3. package/dist/index.js +1297 -0
  4. package/package.json +70 -0
package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Browserbase, Inc.
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,310 @@
1
+ # Browse CLI
2
+
3
+ Browser automation CLI for AI agents. Built on [Stagehand](https://github.com/browserbase/stagehand), providing raw browser control without requiring LLM integration.
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ npm install -g @browserbasehq/browse-cli
9
+ ```
10
+
11
+ Requires Chrome/Chromium installed on the system.
12
+
13
+ ## Quick Start
14
+
15
+ ```bash
16
+ # Navigate to a URL (auto-starts browser daemon)
17
+ browse open https://example.com
18
+
19
+ # Take a snapshot to get element refs
20
+ browse snapshot -c
21
+
22
+ # Click an element by ref
23
+ browse click @0-5
24
+
25
+ # Type text
26
+ browse type "Hello, world!"
27
+
28
+ # Take a screenshot
29
+ browse screenshot ./page.png
30
+
31
+ # Stop the browser
32
+ browse stop
33
+ ```
34
+
35
+ ## How It Works
36
+
37
+ Browse uses a daemon architecture for fast, stateful interactions:
38
+
39
+ 1. **First command** auto-starts a Chrome browser daemon
40
+ 2. **Subsequent commands** reuse the same browser session
41
+ 3. **State persists** between commands (cookies, refs, etc.)
42
+ 4. **Multiple sessions** supported via `--session` or `BROWSE_SESSION` env var
43
+
44
+ ### Self-Healing Sessions
45
+
46
+ The CLI automatically recovers from stale sessions. If the daemon or Chrome crashes, it:
47
+ 1. Detects the failure
48
+ 2. Cleans up stale processes and files
49
+ 3. Restarts the daemon
50
+ 4. Retries the command
51
+
52
+ Agents don't need to handle recovery - commands "just work".
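The recovery sequence above can be approximated from the outside with a small wrapper. This is only a sketch of the detect, clean up, restart, retry idea, not the CLI's internal implementation. `browse_with_retry` is a hypothetical helper, and it relies only on the documented `browse stop --force` and `browse start` commands.

```shell
# Hypothetical helper (not part of the CLI): run a browse command and, if it
# fails, force-stop the daemon, start it again, and retry exactly once.
browse_with_retry() {
  if browse "$@"; then
    return 0
  fi
  echo "browse failed; restarting daemon and retrying" >&2
  browse stop --force >/dev/null 2>&1   # clean up whatever is left behind
  browse start >/dev/null 2>&1          # bring the daemon back
  browse "$@"                           # single retry; its exit status is returned
}

# Usage (requires the CLI to be installed):
#   browse_with_retry open https://example.com
```

Since the CLI already performs this recovery itself, a wrapper like this should rarely be needed in practice.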
53
+
54
+ ## Commands
55
+
56
+ ### Navigation
57
+
58
+ ```bash
59
+ browse open <url> [--wait load|domcontentloaded|networkidle] [-t|--timeout ms]
60
+ browse reload
61
+ browse back
62
+ browse forward
63
+ ```
64
+
65
+ The `--timeout` flag (default: 30000ms) controls how long to wait for the page load state. Use longer timeouts for slow-loading pages:
66
+
67
+ ```bash
68
+ browse open https://slow-site.com --timeout 60000
69
+ ```
70
+
71
+ ### Click Actions
72
+
73
+ ```bash
74
+ browse click <ref> [-b left|right|middle] [-c count] # Click by ref (e.g., @0-5)
75
+ browse click_xy <x> <y> [--button] [--xpath] # Click at coordinates
76
+ ```
77
+
78
+ ### Coordinate Actions
79
+
80
+ ```bash
81
+ browse hover <x> <y> [--xpath]
82
+ browse scroll <x> <y> <deltaX> <deltaY> [--xpath]
83
+ browse drag <fromX> <fromY> <toX> <toY> [--steps n] [--xpath]
84
+ ```
85
+
86
+ ### Keyboard
87
+
88
+ ```bash
89
+ browse type <text> [-d delay] [--mistakes]
90
+ browse press <key> # e.g., Enter, Tab, Cmd+A
91
+ ```
92
+
93
+ ### Forms
94
+
95
+ ```bash
96
+ browse fill <selector> <value> [--no-press-enter]
97
+ browse select <selector> <values...>
98
+ browse highlight <selector> [-d duration]
99
+ ```
100
+
101
+ ### Page Info
102
+
103
+ ```bash
104
+ browse get url
105
+ browse get title
106
+ browse get text <selector>
107
+ browse get html <selector>
108
+ browse get value <selector>
109
+ browse get box <selector> # Returns center coordinates
110
+
111
+ browse snapshot [-c|--compact] # Accessibility tree with refs
112
+ browse screenshot [path] [-f|--full-page] [-t png|jpeg]
113
+ ```
114
+
115
+ ### Waiting
116
+
117
+ ```bash
118
+ browse wait load [state]
119
+ browse wait selector <selector> [-t timeout] [-s visible|hidden|attached|detached]
120
+ browse wait timeout <ms>
121
+ ```
122
+
123
+ ### Multi-Tab
124
+
125
+ ```bash
126
+ browse pages # List all tabs
127
+ browse newpage [url] # Open new tab
128
+ browse tab_switch <n> # Switch to tab by index
129
+ browse tab_close [n] # Close tab (default: last)
130
+ ```
131
+
132
+ ### Network Capture
133
+
134
+ Capture HTTP requests to the filesystem for inspection:
135
+
136
+ ```bash
137
+ browse network on # Start capturing requests
138
+ browse network off # Stop capturing
139
+ browse network path # Get capture directory path
140
+ browse network clear # Clear captured requests
141
+ ```
142
+
143
+ Captured requests are saved as directories:
144
+
145
+ ```
146
+ /tmp/browse-default-network/
147
+   001-GET-api.github.com-repos/
148
+     request.json # method, url, headers, body
149
+     response.json # status, headers, body, duration
150
+ ```
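To work with captures from a script, the directory name can be split into its parts. The sketch below assumes every capture directory follows the `<sequence>-<METHOD>-<rest>` pattern seen in the single example above. `parse_capture_name` is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical helper: split a capture directory name such as
# "001-GET-api.github.com-repos" into its sequence number, HTTP method, and
# the remaining host/path label. Assumes the <seq>-<METHOD>-<rest> naming.
parse_capture_name() {
  local name seq rest method label
  name=$(basename "$1")
  seq=${name%%-*}       # text before the first hyphen
  rest=${name#*-}       # text after the first hyphen
  method=${rest%%-*}    # text before the next hyphen
  label=${rest#*-}      # everything else (host and path fragment)
  printf '%s %s %s\n' "$seq" "$method" "$label"
}

parse_capture_name "/tmp/browse-default-network/001-GET-api.github.com-repos/"
# prints: 001 GET api.github.com-repos
```

The host and path are left joined because hostnames can contain hyphens themselves, so the name alone does not say where one ends and the other begins.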
151
+
152
+ ### Daemon Control
153
+
154
+ ```bash
155
+ browse start # Explicitly start daemon
156
+ browse stop [--force] # Stop daemon
157
+ browse status # Check daemon status
158
+ ```
159
+
160
+ ## Global Options
161
+
162
+ | Option | Description |
163
+ |--------|-------------|
164
+ | `--session <name>` | Session name for multiple browsers (default: "default") |
165
+ | `--headless` | Run Chrome in headless mode |
166
+ | `--headed` | Run Chrome with visible window (default) |
167
+ | `--ws <url>` | Connect to existing Chrome via CDP WebSocket |
168
+ | `--json` | Output as JSON |
169
+
170
+ ## Environment Variables
171
+
172
+ | Variable | Description |
173
+ |----------|-------------|
174
+ | `BROWSE_SESSION` | Default session name (alternative to `--session`) |
175
+
176
+ ## Element References
177
+
178
+ After running `browse snapshot`, you can reference elements by their ref ID:
179
+
180
+ ```bash
181
+ # Get snapshot with refs
182
+ browse snapshot -c
183
+
184
+ # Output includes refs like [0-5], [1-2], etc.
185
+ # RootWebArea "Example" url="https://example.com"
186
+ # [0-0] link "Home"
187
+ # [0-1] link "About"
188
+ # [0-2] button "Sign In"
189
+
190
+ # Click using ref (multiple formats supported)
191
+ browse click @0-2 # @ prefix
192
+ browse click 0-2 # Plain ref
193
+ browse click ref=0-2 # Explicit prefix
194
+ ```
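The three accepted spellings all name the same ref. A script that receives refs from mixed sources might normalize them first, and the sketch below is one way to do that in plain shell. `normalize_ref` is a hypothetical helper and says nothing about how the CLI itself parses refs.

```shell
# Hypothetical helper: reduce "@0-2", "ref=0-2", or "0-2" to the bare "0-2".
normalize_ref() {
  local ref
  ref="$1"
  ref=${ref#@}       # drop a leading "@", if present
  ref=${ref#ref=}    # drop a leading "ref=", if present
  printf '%s\n' "$ref"
}

for r in @0-2 0-2 ref=0-2; do
  normalize_ref "$r"   # each iteration prints 0-2
done
```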
195
+
196
+ The full snapshot output includes mappings:
197
+ - **xpathMap**: Cross-frame XPath selectors
198
+ - **cssMap**: Fast CSS selectors when available
199
+ - **urlMap**: Extracted URLs from links
200
+
201
+ ## Multiple Sessions
202
+
203
+ Run multiple browser instances simultaneously:
204
+
205
+ ```bash
206
+ # Terminal 1
207
+ BROWSE_SESSION=session1 browse open https://google.com
208
+
209
+ # Terminal 2
210
+ BROWSE_SESSION=session2 browse open https://github.com
211
+
212
+ # Or use --session flag
213
+ browse --session work open https://slack.com
214
+ browse --session personal open https://twitter.com
215
+ ```
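When one script drives several sessions, a thin wrapper keeps the session name next to each command. This is a sketch that assumes only the documented `BROWSE_SESSION` variable. `in_session` is a hypothetical helper.

```shell
# Hypothetical helper: run one browse command inside a named session by
# setting BROWSE_SESSION for that invocation.
in_session() {
  local session
  session="$1"
  shift
  BROWSE_SESSION="$session" browse "$@"
}

# Usage (requires the CLI to be installed):
#   in_session work open https://slack.com
#   in_session personal open https://twitter.com
```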
216
+
217
+ ## Direct CDP Connection
218
+
219
+ Connect to an existing Chrome instance:
220
+
221
+ ```bash
222
+ # Start Chrome with remote debugging
223
+ google-chrome --remote-debugging-port=9222
224
+
225
+ # Connect via WebSocket
226
+ browse --ws ws://localhost:9222/devtools/browser/... open https://example.com
227
+ ```
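The browser-level WebSocket URL usually has to be looked up from the running Chrome instance. Chrome's remote debugging port serves a `/json/version` endpoint whose JSON response includes a `webSocketDebuggerUrl` field, and the sketch below extracts it with `sed`. `extract_ws_url` is a hypothetical helper, and the port in the usage lines assumes the `--remote-debugging-port=9222` example above.

```shell
# Hypothetical helper: read a /json/version response on stdin and print the
# value of its "webSocketDebuggerUrl" field.
extract_ws_url() {
  sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (requires Chrome running with remote debugging and the CLI installed):
#   ws_url=$(curl -s http://localhost:9222/json/version | extract_ws_url)
#   browse --ws "$ws_url" open https://example.com
```

A JSON-aware tool would be more robust than `sed`; the pattern here is kept simple to avoid extra dependencies.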
228
+
229
+ ## Optimal AI Workflow
230
+
231
+ 1. **Navigate** to target page (browser auto-starts)
232
+ 2. **Snapshot** to get the accessibility tree with refs
233
+ 3. **Click/Fill** using refs directly (e.g., `@0-5`)
234
+ 4. **Re-snapshot** after actions to verify state changes
235
+ 5. **Stop** when done
236
+
237
+ ```bash
238
+ browse open https://example.com
239
+ browse snapshot -c
240
+ # [0-5] textbox: Search
241
+ # [0-8] button: Submit
242
+ browse fill @0-5 "my query"
243
+ browse click @0-8
244
+ browse snapshot -c # Verify result
245
+ browse stop
246
+ ```
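Step 3 depends on pulling the right ref out of the snapshot text. The sketch below does that with `grep` and `sed`, assuming each snapshot line carries its ref in square brackets before the role and label, as in the examples in this README. `find_ref` is a hypothetical helper, and the real snapshot format may differ.

```shell
# Hypothetical helper: read snapshot text on stdin and print the ref of the
# first line containing the given label, e.g. "[0-8] button: Submit" -> 0-8.
find_ref() {
  grep -F -- "$1" \
    | sed -n 's/^[^[]*\[\([0-9][0-9]*-[0-9][0-9]*\)].*/\1/p' \
    | head -n 1
}

# Usage (requires the CLI to be installed):
#   ref=$(browse snapshot -c | find_ref "Submit")
#   browse click "@$ref"
```

Matching on a label substring is fragile when several elements share similar text, so re-snapshotting after the action (step 4) remains the way to confirm that the intended element was hit.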
247
+
248
+ ## Troubleshooting
249
+
250
+ ### Chrome not found
251
+
252
+ The CLI uses your system Chrome/Chromium. If not found:
253
+
254
+ ```bash
255
+ # macOS - Install Chrome or set path
256
+ export CHROME_PATH=/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
257
+
258
+ # Linux - Install chromium
259
+ sudo apt install chromium-browser
260
+ ```
261
+
262
+ ### Stale daemon
263
+
264
+ If the daemon becomes unresponsive:
265
+
266
+ ```bash
267
+ browse stop --force
268
+ ```
269
+
270
+ ### Permission denied on socket
271
+
272
+ ```bash
273
+ # Clean up stale socket files
274
+ rm -f /tmp/browse-*.sock /tmp/browse-*.pid
275
+ ```
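A blanket `rm` also removes the files of sessions that are still running. A more cautious sketch checks each PID file first and removes only those whose process is gone. It assumes each `.pid` file contains the daemon's process ID and that the matching socket shares its base name, neither of which this README confirms. `clean_stale_sessions` is a hypothetical helper.

```shell
# Hypothetical helper: remove browse PID/socket files whose daemon process is
# no longer running. The directory argument defaults to /tmp.
# Assumption: browse-<session>.pid holds a PID and sits next to
# browse-<session>.sock.
clean_stale_sessions() {
  local dir pidfile pid
  dir="${1:-/tmp}"
  for pidfile in "$dir"/browse-*.pid; do
    [ -e "$pidfile" ] || continue          # the glob matched nothing
    pid=$(cat "$pidfile")
    # Signal 0 only tests whether the process can be signalled. It also fails
    # for processes owned by another user, which are then treated as stale.
    if ! kill -0 "$pid" 2>/dev/null; then
      rm -f "$pidfile" "${pidfile%.pid}.sock"
      echo "removed stale files for $(basename "$pidfile" .pid)"
    fi
  done
}

# Usage:
#   clean_stale_sessions        # scans /tmp
```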
276
+
277
+ ## Platform Support
278
+
279
+ - macOS (Intel and Apple Silicon)
280
+ - Linux (x64 and arm64)
281
+
282
+ Windows is not supported natively; use WSL. Native support would require a TCP socket implementation.
283
+
284
+ ## Development
285
+
286
+ ```bash
287
+ # Clone and setup (in monorepo)
288
+ cd packages/cli
289
+ npm install # Install dependencies first!
290
+ npm run build # Build the CLI
291
+
292
+ # Run without building (for development)
293
+ npm run dev -- <command>
294
+
295
+ # Or with tsx directly
296
+ npx tsx src/index.ts <command>
297
+
298
+ # Run linting and formatting
299
+ npm run lint
300
+ npm run format
301
+ ```
302
+
303
+ ## License
304
+
305
+ MIT - see [LICENSE](./LICENSE)
306
+
307
+ ## Related
308
+
309
+ - [Stagehand](https://github.com/browserbase/stagehand) - AI web browser automation framework
310
+ - [Browserbase](https://browserbase.com) - Cloud browser infrastructure