@ulpi/browse 0.7.5 → 0.10.0

package/README.md CHANGED
@@ -1,336 +1,499 @@
1
1
  # @ulpi/browse
2
2
 
3
- **The headless browser CLI built for AI agents, not humans.**
3
+ Headless browser CLI for AI coding agents. Persistent Chromium daemon via Playwright, ~100ms per command after startup.
4
4
 
5
- When AI agents browse the web, the bottleneck isn't Chromium — it's **what gets dumped into the context window**. [`@playwright/mcp`](https://github.com/microsoft/playwright-mcp) sends the full accessibility snapshot on every navigate, click, and keystroke. On a real e-commerce page, that's **~16,000 tokens per action** — automatically, whether the agent needs it or not.
5
+ ## Installation
6
6
 
7
- Ten actions and you've burned **146K tokens — 73% of a 200K context window** — just on browser output. That leaves almost nothing for the agent to actually think.
7
+ ### Global Installation (recommended)
8
8
 
9
- `@ulpi/browse` flips this. Navigation returns 11 tokens. Clicks return 15 tokens. The agent requests a page snapshot **only when it needs one** — and can filter to interactive elements only, cutting another 2-6x.
9
+ ```bash
10
+ npm install -g @ulpi/browse
11
+ ```
10
12
 
11
- **Same 10 actions: ~11K tokens. 6% of context. 13x less than @playwright/mcp.**
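The percentages and the 13x ratio quoted in the removed intro can be re-derived from its own rounded figures (146K and ~11K tokens against a 200K window). The sketch below is only a worked check of that arithmetic; `pct_of_window` and `ratio` are illustrative helper names, not part of `browse`.

```bash
# pct_of_window <tokens> <window>: share of the context window as an
# integer percentage, rounded to the nearest whole percent.
pct_of_window() {
  echo $(( ($1 * 100 + $2 / 2) / $2 ))
}

# ratio <a> <b>: a divided by b, rounded to the nearest integer.
ratio() {
  echo $(( ($1 + $2 / 2) / $2 ))
}

pct_of_window 146000 200000   # prints 73
pct_of_window 11000 200000    # prints 6
ratio 146000 11000            # prints 13
```

The results match the quoted "73%", "6%" and "13x", so the three claims are at least internally consistent with the rounded totals.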
13
+ Requires [Bun](https://bun.sh) runtime. Chromium is installed automatically via Playwright on first `npm install`.
12
14
 
13
- ## Benchmarks
15
+ ### Project Installation (local dependency)
14
16
 
15
- ### vs Agent Browser & Browser-Use (Token Cost)
17
+ ```bash
18
+ npm install @ulpi/browse
19
+ ```
16
20
 
17
- Tested on 3 sites across multi-step browsing flows (navigate, snapshot, scroll, search, extract text):
21
+ Then use via `package.json` scripts or by invoking `browse` directly.
18
22
 
19
- **browse is 2.4-2.8x cheaper on tokens, 1.3-2.6x faster, and uses 7% of context vs 17-20%.**
23
+ ### From Source
20
24
 
21
- | Tool | Total Tokens | Total Time | Context Used (200K) |
22
- |------|-------------:|-----------:|--------------------:|
23
- | **browse** | **14,134** | **28.5s** | **7.1%** |
24
- | agent-browser | 39,414 | 36.2s | 19.7% |
25
- | browser-use | 34,281 | 72.7s | 17.1% |
25
+ ```bash
26
+ git clone https://github.com/ulpi-io/browse
27
+ cd browse
28
+ bun install
29
+ bun run src/cli.ts goto https://example.com # Dev mode
30
+ bun run build # Build standalone binary
31
+ ```
26
32
 
27
- **Per site:**
33
+ ## Quick Start
28
34
 
29
- | Site | browse tokens | agent-browser tokens | browser-use tokens | browse time | agent-browser time | browser-use time |
30
- |------|-------:|-------------:|------------:|------:|------:|------:|
31
- | amazon.com | 7,531 | 11,596 | 20,508 | 10.1s | 12.9s | 21.9s |
32
- | bbc.com | 4,032 | 24,861 | 8,827 | 9.8s | 13.5s | 29.9s |
33
- | booking.com | 2,571 | 2,957 | 4,946 | 8.6s | 9.8s | 20.9s |
35
+ ```bash
36
+ browse goto https://example.com
37
+ browse snapshot -i # Get interactive elements with refs
38
+ browse click @e2 # Click by ref from snapshot
39
+ browse fill @e3 "test@example.com" # Fill input by ref
40
+ browse text # Get visible page text
41
+ browse screenshot page.png
42
+ browse stop
43
+ ```
34
44
 
35
- browse uses **2.4x fewer tokens** than browser-use and **2.8x fewer** than agent-browser — and completes **2.5x faster** than browser-use across the same workflows.
45
+ ### The Ref Workflow
36
46
 
37
- ### vs @playwright/mcp (Architecture)
47
+ Every `snapshot` assigns refs (`@e1`, `@e2`, ...) to elements. Use refs as selectors in any command — no CSS selector construction needed:
38
48
 
39
- @playwright/mcp dumps the full accessibility snapshot on every action (navigate, click, type). browse returns ~15 tokens per action — the agent requests a snapshot only when it needs one:
49
+ ```bash
50
+ $ browse snapshot -i
51
+ @e1 [button] "Submit"
52
+ @e2 [link] "Home"
53
+ @e3 [textbox] "Email"
40
54
 
41
- | | @playwright/mcp | @ulpi/browse |
42
- |---|---:|---:|
43
- | Tokens on `navigate` | ~14,578 (auto-dumped) | **~11** (one-liner) |
44
- | Tokens on `click` | ~14,578 (auto-dumped) | **~15** (one-liner) |
45
- | 10-action session | ~145,780 | **~11,388** |
46
- | Context consumed (200K) | **73%** | **6%** |
55
+ $ browse click @e1 # Click the Submit button
56
+ Clicked @e1
47
57
 
48
- The agent decides when to see the page. Most actions don't need a snapshot.
58
+ $ browse fill @e3 "user@example.com" # Fill the Email field
59
+ Filled @e3
60
+ ```
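A script or agent wrapper may want to look up a ref by role and name instead of reading the list by eye. The following is a minimal sketch, assuming `snapshot -i` prints one element per line in the `@eN [role] "name"` shape shown in the sample output; `ref_for` is a hypothetical helper, not a `browse` command.

```bash
# ref_for <role> <name>: read snapshot text on stdin and print the ref of
# the first line whose role and quoted name both match. Prints nothing if
# no line matches.
ref_for() {
  awk -v role="[$1]" -v name="\"$2\"" '
    $2 == role && index($0, name) > 0 { print $1; exit }
  '
}
```

Under the same format assumption, `browse click "$(browse snapshot -i | ref_for button Submit)"` would click the Submit button without hard-coding `@e1`.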
49
61
 
50
- Rerun: `bun run benchmark`
62
+ ### Traditional Selectors (also supported)
51
63
 
52
- ## Why It's Faster
64
+ ```bash
65
+ browse click "#submit"
66
+ browse fill ".email-input" "test@example.com"
67
+ browse click "text=Submit"
68
+ ```
53
69
 
54
- ### 1. You Control What Enters the Context
70
+ ## Commands
55
71
 
56
- ```
57
- @playwright/mcp browser_navigate → 51,150 tokens (full snapshot, every time)
72
+ ### Navigation
58
73
 
59
- browse goto → 11 tokens ("Navigated to https://... (200)")
60
- browse text → 4,970 tokens (clean visible text, when you need it)
61
- browse snap -i → 15,072 tokens (interactive elements + refs, when you need it)
74
+ ```bash
75
+ browse goto <url> # Navigate to URL
76
+ browse back # Go back
77
+ browse forward # Go forward
78
+ browse reload # Reload page
79
+ browse url # Get current URL
62
80
  ```
63
81
 
64
- You pick the right view for the task. Reading prices? Use `text`. Need to click something? Use `snapshot -i`. Just navigating? `goto` is enough.
82
+ ### Content Extraction
65
83
 
66
- ### 2. Ref-Based Interaction — No Selector Construction
84
+ ```bash
85
+ browse text # Visible text (clean, no DOM mutation)
86
+ browse html [sel] # Full HTML or element innerHTML
87
+ browse links # All links as "text -> href"
88
+ browse forms # Form structure as JSON
89
+ browse accessibility # Raw ARIA snapshot tree
90
+ ```
67
91
 
68
- After `snapshot`, every element gets a ref (`@e1`, `@e2`, ...) backed by a Playwright Locator. The agent doesn't waste tokens constructing CSS selectors:
92
+ ### Interaction
69
93
 
70
94
  ```bash
71
- $ browse snapshot -i
72
- @e1 [button] "Help 24/7"
73
- @e2 [link] "Mumzworld"
74
- @e3 [searchbox]
75
- @e4 [link] "Sign In"
76
- @e5 [link] "Cart"
95
+ browse click <sel> # Click element
96
+ browse rightclick <sel> # Right-click element (context menu)
97
+ browse dblclick <sel> # Double-click element
98
+ browse fill <sel> <val> # Clear and fill input
99
+ browse select <sel> <val> # Select dropdown option
100
+ browse hover <sel> # Hover element
101
+ browse focus <sel> # Focus element
102
+ browse tap <sel> # Tap element (requires touch context via emulate)
103
+ browse check <sel> # Check checkbox
104
+ browse uncheck <sel> # Uncheck checkbox
105
+ browse type <text> # Type text via keyboard (current focus)
106
+ browse press <key> # Press key (Enter, Tab, etc.)
107
+ browse keydown <key> # Key down event
108
+ browse keyup <key> # Key up event
109
+ browse keyboard inserttext <text> # Insert text without key events
110
+ browse scroll [sel|up|down] # Scroll element into view or direction
111
+ browse scrollinto <sel> # Scroll element into view (explicit)
112
+ browse swipe <dir> [px] # Swipe up/down/left/right (touch events)
113
+ browse drag <src> <tgt> # Drag and drop
114
+ browse highlight <sel> # Highlight element with visual overlay
115
+ browse download <sel> [path] # Download file triggered by click
116
+ browse upload <sel> <files...> # Upload files to input
117
+ ```
77
118
 
78
- $ browse fill @e3 "strollers"
79
- Filled @e3
119
+ ### Mouse Control
80
120
 
81
- $ browse press Enter
82
- Pressed Enter
121
+ ```bash
122
+ browse mouse move <x> <y> # Move mouse to coordinates
123
+ browse mouse down [button] # Press mouse button (left/right/middle)
124
+ browse mouse up [button] # Release mouse button
125
+ browse mouse wheel <dy> [dx] # Scroll wheel
83
126
  ```
84
127
 
85
- ### 3. Cursor-Interactive Detection — What ARIA Misses
128
+ ### Settings
129
+
130
+ ```bash
131
+ browse set geo <lat> <lng> # Set geolocation
132
+ browse set media <scheme> # Set color scheme (dark/light/no-preference)
133
+ ```
86
134
 
87
- Modern SPAs use `<div onclick>`, `cursor: pointer`, `tabindex`, and `data-action` for interactivity. These are **invisible** to accessibility trees — both @playwright/mcp and raw `ariaSnapshot()` miss them.
135
+ ### Wait
88
136
 
89
137
  ```bash
90
- $ browse snapshot -i -C
91
- @e1 [button] "Submit"
92
- @e2 [textbox] "Email"
138
+ browse wait <selector> # Wait for element
139
+ browse wait <selector> --state hidden # Wait for element to disappear
140
+ browse wait <ms> # Wait for milliseconds
141
+ browse wait --url <pattern> # Wait for URL
142
+ browse wait --text "Welcome" # Wait for text to appear in page
143
+ browse wait --fn "js expr" # Wait for JavaScript condition
144
+ browse wait --load <state> # Wait for load state (load/domcontentloaded/networkidle)
145
+ browse wait --network-idle # Wait for network idle
146
+ ```
93
147
 
94
- [cursor-interactive]
95
- @e3 [div.card] "Add to cart" (cursor:pointer)
96
- @e4 [span.close] "Close dialog" (onclick)
97
- @e5 [div.menu] "Open Menu" (data-action)
148
+ ### Snapshot
149
+
150
+ ```bash
151
+ browse snapshot # Full accessibility tree
152
+ browse snapshot -i # Interactive elements only (terse flat list)
153
+ browse snapshot -i -f # Interactive elements, full indented tree
154
+ browse snapshot -i -C # Include cursor-interactive elements (onclick, cursor:pointer)
155
+ browse snapshot -V # Viewport only — elements visible on screen
156
+ browse snapshot -c # Compact — remove empty structural elements
157
+ browse snapshot -d 3 # Limit depth to 3 levels
158
+ browse snapshot -s "#main" # Scope to CSS selector
159
+ browse snapshot -i -c -d 5 # Combine options
98
160
  ```
99
161
 
100
- Every detected element gets a ref. `browse click @e3` just works.
162
+ | Flag | Description |
163
+ |------|-------------|
164
+ | `-i` | Interactive elements only (buttons, links, inputs) — terse flat list |
165
+ | `-f` | Full — indented tree with props and children (use with `-i`) |
166
+ | `-V` | Viewport — only elements visible in current viewport |
167
+ | `-c` | Compact — remove empty structural elements |
168
+ | `-C` | Cursor-interactive — detect divs with `cursor:pointer`, `onclick`, `tabindex` |
169
+ | `-d N` | Limit tree depth |
170
+ | `-s <sel>` | Scope to CSS selector |
101
171
 
102
- ### 4. 75 Purpose-Built Commands vs Generic Tools
172
+ The `-C` flag catches modern SPA patterns that ARIA trees miss — `<div onclick>`, `cursor: pointer`, `tabindex`, and `data-action` elements.
103
173
 
104
- @playwright/mcp has ~15 tools. For anything beyond navigate/click/type, you write JavaScript via `browser_evaluate`. `browse` has purpose-built commands that return structured, minimal output:
174
+ ### Find Elements
105
175
 
106
- | Need | @playwright/mcp | browse |
107
- |------|----------------|--------|
108
- | Page text | `browser_evaluate` + custom JS | `text` |
109
- | Form fields | `browser_evaluate` + custom JS | `forms` → structured JSON |
110
- | All links | `browser_evaluate` + custom JS | `links` → `Text → URL` |
111
- | Network log | Not available | `network` |
112
- | Cookies | Not available | `cookies` |
113
- | Performance | Not available | `perf` |
114
- | Page diff | Not available | `diff <url1> <url2>` |
115
- | Snapshot diff | Not available | `snapshot-diff` |
116
- | Responsive screenshots | Not available | `responsive` |
117
- | Device emulation | Not available | `emulate iphone` |
118
- | Input value | `browser_evaluate` + custom JS | `value <sel>` |
119
- | Element count | `browser_evaluate` + custom JS | `count <sel>` |
120
- | iframe targeting | Not available | `frame <sel>` / `frame main` |
121
- | Network mocking | Not available | `route <pattern> block\|fulfill` |
122
- | Offline mode | Not available | `offline on\|off` |
123
- | State persistence | Not available | `state save\|load` |
124
- | Credential vault | Not available | `auth save\|login\|list` |
125
- | HAR recording | Not available | `har start\|stop` |
126
- | Video recording | Not available | `video start [dir]\|stop\|status` |
127
- | Clipboard access | Not available | `clipboard [write <text>]` |
128
- | Element finding | Not available | `find role\|text\|label\|placeholder\|testid` |
129
- | DevTools inspect | Not available | `inspect` |
130
- | Domain restriction | Not available | `--allowed-domains` |
131
- | Prompt injection defense | Not available | `--content-boundaries` |
132
- | JSON output mode | Not available | `--json` |
176
+ ```bash
177
+ browse find role <role> [name] # By ARIA role
178
+ browse find text <text> # By text content
179
+ browse find label <label> # By label
180
+ browse find placeholder <placeholder> # By placeholder
181
+ browse find testid <id> # By data-testid
182
+ browse find alt <text> # By alt text
183
+ browse find title <text> # By title attribute
184
+ browse find first <sel> # First matching element
185
+ browse find last <sel> # Last matching element
186
+ browse find nth <n> <sel> # Nth matching element (0-indexed)
187
+ ```
133
188
 
134
- ### 5. Persistent Daemon — 100ms Commands
189
+ ### Inspection
135
190
 
191
+ ```bash
192
+ browse js <expr> # Evaluate JavaScript expression
193
+ browse eval <file> # Evaluate JavaScript file
194
+ browse css <sel> <prop> # Get computed CSS property
195
+ browse attrs <sel> # Get element attributes as JSON
196
+ browse element-state <sel> # Element state (visible, enabled, checked, etc.)
197
+ browse value <sel> # Get input/select value
198
+ browse count <sel> # Count elements matching selector
199
+ browse box <sel> # Get bounding box as JSON {x, y, width, height}
200
+ browse clipboard [write <text>] # Read or write clipboard
201
+ browse console [--clear] # Console log buffer
202
+ browse errors [--clear] # Page errors only (filtered from console)
203
+ browse network [--clear] # Network request buffer
204
+ browse cookies # Browser cookies as JSON
205
+ browse storage [set <k> <v>] # localStorage/sessionStorage
206
+ browse perf # Navigation timing (dns, ttfb, load)
207
+ browse devices [filter] # List available device names
136
208
  ```
137
- First command: ~2s (server + Chromium startup, once)
138
- Every command after: ~100-200ms (HTTP to localhost)
209
+
210
+ ### Visual
211
+
212
+ ```bash
213
+ browse screenshot [path] # Take screenshot (viewport)
214
+ browse screenshot --full [path] # Full-page screenshot
215
+ browse screenshot <sel|@ref> [path] # Screenshot specific element
216
+ browse screenshot --clip x,y,w,h [path] # Screenshot clipped region
217
+ browse screenshot --annotate [path] # Annotated screenshot with numbered labels
218
+ browse pdf [path] # Save page as PDF
219
+ browse responsive [prefix] # Mobile/tablet/desktop screenshots
139
220
  ```
140
221
 
141
- @playwright/mcp starts a new browser per MCP session. `browse` keeps the server running across commands with auto-shutdown after 30 min idle. Crash recovery is built in — the CLI detects a dead server and restarts transparently.
222
+ ### Compare
142
223
 
143
- ### 6. Multi-Agent Sessions — Parallel Browsing on One Chromium
224
+ ```bash
225
+ browse diff <url1> <url2> # Text diff between two pages
226
+ browse snapshot-diff # Diff current vs last snapshot
227
+ browse screenshot-diff <baseline> [current] # Pixel-level visual diff
228
+ ```
144
229
 
145
- Run multiple AI agents in parallel, each with its own isolated browser session, sharing a single Chromium process. Each session gets its own tabs, refs, cookies, localStorage, and console/network buffers — zero cross-talk.
230
+ ### Tabs
146
231
 
147
232
  ```bash
148
- # Agent A researches strollers on mumzworld
149
- browse --session agent-a goto https://www.mumzworld.com
150
- browse --session agent-a snapshot -i
151
- browse --session agent-a fill @e3 "strollers"
152
- browse --session agent-a press Enter
233
+ browse tabs # List all tabs
234
+ browse tab <id> # Switch to tab
235
+ browse newtab [url] # Open new tab
236
+ browse closetab [id] # Close tab
237
+ ```
153
238
 
154
- # Agent B checks competitor pricing on amazon — simultaneously
155
- browse --session agent-b goto https://www.amazon.com
156
- browse --session agent-b snapshot -i
157
- browse --session agent-b fill @e6 "baby stroller"
158
- browse --session agent-b press Enter
239
+ ### Frames
159
240
 
160
- # Or set once via env var
161
- export BROWSE_SESSION=agent-a
162
- browse text # runs in agent-a's session
241
+ ```bash
242
+ browse frame <sel> # Switch to iframe
243
+ browse frame main # Back to main frame
163
244
  ```
164
245
 
165
- Under the hood, each session is a separate Playwright `BrowserContext` on the shared Chromium — same isolation model as browser profiles (separate cookies, storage, cache). One process, no extra memory for multiple Chromium instances.
246
+ ### Device Emulation
166
247
 
248
+ ```bash
249
+ browse emulate "iPhone 14" # Emulate device
250
+ browse emulate reset # Reset to desktop (1920x1080)
251
+ browse devices # List all available devices
252
+ browse devices iphone # Filter device list
253
+ browse viewport 1280x720 # Set viewport size
167
254
  ```
168
- browse --session <id> <command>
169
-
170
- Persistent server (one Chromium process)
171
-
172
- SessionManager
173
- ├── "default" → BrowserContext → tabs, refs, cookies, buffers
174
- ├── "agent-a" → BrowserContext → tabs, refs, cookies, buffers
175
- └── "agent-b" → BrowserContext → tabs, refs, cookies, buffers
255
+
256
+ 100+ devices: iPhone 12–17, Pixel 5–7, iPad, Galaxy, and all Playwright built-ins.
257
+
258
+ ### Cookies
259
+
260
+ ```bash
261
+ browse cookie <name>=<value> # Set cookie (simple)
262
+ browse cookie set <n> <v> [--domain --secure ...] # Set cookie with options
263
+ browse cookie clear # Clear all cookies
264
+ browse cookie export <file> # Export cookies to JSON
265
+ browse cookie import <file> # Import cookies from JSON
266
+ browse cookies # Read all cookies
176
267
  ```
177
268
 
178
- **Session management:**
269
+ ### Network
270
+
179
271
  ```bash
180
- browse sessions # list active sessions with tab counts
181
- browse session-close agent-a # close a session (frees its tabs/context)
182
- browse status # shows total session count
272
+ browse route <pattern> block # Block matching requests
273
+ browse route <pattern> fulfill <status> [body] # Mock response
274
+ browse route clear # Remove all routes
275
+ browse offline [on|off] # Toggle offline mode
276
+ browse header <name>:<value> # Set extra HTTP header
277
+ browse useragent <string> # Set user agent
183
278
  ```
184
279
 
185
- Sessions auto-close after the idle timeout (default 30 min). The server shuts down when all sessions are idle. Without `--session`, everything runs in a `"default"` session — fully backward compatible.
280
+ ### Dialogs
186
281
 
187
- For full process isolation (separate Chromium instances), use `BROWSE_PORT` to run independent servers.
282
+ ```bash
283
+ browse dialog # Last dialog info
284
+ browse dialog-accept [text] # Accept next dialog (optional prompt text)
285
+ browse dialog-dismiss # Dismiss next dialog
286
+ ```
188
287
 
189
- ## Install
288
+ ### Recording
190
289
 
191
290
  ```bash
192
- npm install -g @ulpi/browse
291
+ browse har start # Start HAR recording
292
+ browse har stop [path] # Stop and save HAR file
293
+
294
+ browse video start [dir] # Start video recording (WebM)
295
+ browse video stop # Stop recording
296
+ browse video status # Check recording status
297
+
298
+ browse record start # Record browsing commands as you go
299
+ browse record stop # Stop recording
300
+ browse record status # Check recording status
301
+ browse record export browse [path] # Export as chain-compatible JSON (replay with browse chain)
302
+ browse record export replay [path] # Export as Chrome DevTools Recorder (Playwright/Puppeteer)
193
303
  ```
194
304
 
195
- Requires [Bun](https://bun.sh) runtime. Chromium is installed automatically via Playwright.
305
+ ### State & Auth
306
+
307
+ ```bash
308
+ browse state save [name] # Save cookies + localStorage
309
+ browse state load [name] # Restore saved state
310
+ browse state list # List saved states
311
+ browse state show [name] # Show state details
312
+
313
+ browse auth save <name> <url> <user> <pass> # Save encrypted credential
314
+ browse auth save <name> <url> <user> --password-stdin # Password from stdin
315
+ browse auth login <name> # Auto-login with saved credential
316
+ browse auth list # List saved credentials
317
+ browse auth delete <name> # Delete credential
318
+ ```
196
319
 
197
- ### Claude Code Skill
320
+ ### Multi-Step (Chaining)
198
321
 
199
- Install via [skills.sh](https://skills.sh) (works across Claude Code, Cursor, Cline, Windsurf, and 15+ agents):
322
+ Execute a sequence of commands in one call:
200
323
 
201
324
  ```bash
202
- npx skills add https://github.com/ulpi-io/skills --skill browse
325
+ echo '[["goto","https://example.com"],["snapshot","-i"],["text"]]' | browse chain
203
326
  ```
204
327
 
205
- Or install directly into your project:
328
+ ### Server Control
206
329
 
207
330
  ```bash
208
- browse install-skill
331
+ browse status # Server health report
332
+ browse instances # List all running browse servers
333
+ browse doctor # System check (Bun, Playwright, Chromium)
334
+ browse upgrade # Self-update via npm
335
+ browse stop # Stop server
336
+ browse restart # Restart server
337
+ browse inspect # Open DevTools (requires BROWSE_DEBUG_PORT)
209
338
  ```
210
339
 
211
- Both copy the skill definition to `.claude/skills/browse/SKILL.md` and add all browse commands to permissions — no more approval prompts.
340
+ ### Setup
341
+
342
+ ```bash
343
+ browse install-skill [path] # Install Claude Code skill
344
+ ```
212
345
 
213
- ## Real-World Example: E-Commerce Flow
346
+ ## Sessions
214
347
 
215
- Agent browses mumzworld.com (search, find a product, add to cart, checkout):
348
+ Run multiple AI agents in parallel, each with isolated browser state, sharing one Chromium process:
216
349
 
217
350
  ```bash
218
- browse goto https://www.mumzworld.com
219
- browse snapshot -i # find searchbox → @e3
220
- browse fill @e3 "strollers"
221
- browse press Enter
351
+ # Agent A
352
+ browse --session agent-a goto https://site-a.com
353
+ browse --session agent-a snapshot -i
354
+ browse --session agent-a click @e3
222
355
 
223
- browse text # scan prices in results
224
- browse goto "https://www.mumzworld.com/en/doona-infant-car-seat..."
356
+ # Agent B (simultaneously)
357
+ browse --session agent-b goto https://site-b.com
358
+ browse --session agent-b snapshot -i
359
+ browse --session agent-b fill @e2 "query"
225
360
 
226
- browse snapshot -i # find Add to Cart → @e54
227
- browse click @e54
361
+ # Or set once via env var
362
+ export BROWSE_SESSION=agent-a
363
+ browse text
364
+ ```
228
365
 
229
- browse snapshot -i -s "[role=dialog]" # scope to cart modal
230
- browse click @e3 # "View Cart"
366
+ Each session has its own:
367
+ - Browser context (cookies, storage, cache)
368
+ - Tabs and navigation history
369
+ - Refs from snapshots
370
+ - Console and network buffers
231
371
 
232
- browse snapshot -i # find Checkout → @e52
233
- browse click @e52
372
+ ```bash
373
+ browse sessions # List active sessions
374
+ browse session-close agent-a # Close a session
375
+ browse status # Shows total session count
234
376
  ```
235
377
 
236
- **12 steps. ~24K tokens total.** With @playwright/mcp: **~240K tokens** for the same flow (every action dumps a full snapshot).
378
+ Sessions auto-close after the idle timeout (default 30 min). Without `--session`, everything runs in a `"default"` session.
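Because each session is isolated, several `browse` invocations can run at the same time against the shared server. The sketch below shows one way to fan a navigation out across sessions using shell background jobs; `run_in_sessions` is a hypothetical wrapper, and only the `--session` flag and the `goto` command come from this README.

```bash
# run_in_sessions <url> <session>...: open <url> in every named session
# concurrently, then wait for all of the calls to finish.
run_in_sessions() {
  local url="$1" session
  shift
  for session in "$@"; do
    browse --session "$session" goto "$url" &
  done
  wait  # block until every background `browse` call has returned
}
```

For example, `run_in_sessions https://example.com agent-a agent-b` issues both navigations in parallel. Output order is not guaranteed, since the jobs finish independently.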
237
379
 
238
- ## Command Reference
380
+ For full process isolation (separate Chromium instances), use `BROWSE_PORT` to run independent servers.
239
381
 
240
- ### Navigation
241
- `goto <url>` | `back` | `forward` | `reload` | `url`
382
+ ## Security
242
383
 
243
- ### Content Extraction
244
- `text` | `html [sel]` | `links` | `forms` | `accessibility`
384
+ All security features are opt-in — existing workflows are unaffected until you explicitly enable a feature.
245
385
 
246
- ### Interaction
247
- `click <sel>` | `dblclick <sel>` | `fill <sel> <val>` | `select <sel> <val>` | `hover <sel>` | `focus <sel>` | `check <sel>` | `uncheck <sel>` | `drag <src> <tgt>` | `type <text>` | `press <key>` | `keydown <key>` | `keyup <key>` | `scroll [sel|up|down]` | `wait <sel|--url|--network-idle>` | `viewport <WxH>` | `highlight <sel>` | `download <sel> [path]`
386
+ ### Domain Allowlist
387
+
388
+ Restrict navigation and sub-resource requests to trusted domains:
248
389
 
249
- ### Snapshot & Refs
390
+ ```bash
391
+ browse --allowed-domains "example.com,*.example.com" goto https://example.com
392
+ # Or via env var
393
+ BROWSE_ALLOWED_DOMAINS="example.com,*.api.io" browse goto https://example.com
250
394
  ```
251
- snapshot [-i] [-f] [-V] [-c] [-C] [-d N] [-s sel]
252
- -i Interactive elements only, terse flat list (minimal tokens)
253
- -f Full — indented tree with props and children (use with -i)
254
- -V Viewport: only elements visible in current viewport
255
- -c Compact — remove empty structural nodes
256
- -C Cursor-interactive: detect hidden clickable elements
257
- -d N Limit tree depth
258
- -s Scope to CSS selector
395
+
396
+ Blocks HTTP requests, WebSocket, EventSource, and `sendBeacon` to non-allowed domains. Wildcards like `*.example.com` match the bare domain and all subdomains.
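The wildcard rule can be illustrated with a small matcher. This is a sketch of the matching behavior as described here (an exact entry matches only itself; `*.example.com` matches `example.com` and any subdomain), not the package's actual implementation; `domain_allowed` is a hypothetical helper and it assumes a comma-separated list with no spaces.

```bash
# domain_allowed <host> <comma-separated-patterns>: exit 0 if the host
# matches any pattern, 1 otherwise.
domain_allowed() {
  local host="$1" rest="$2," pattern base
  while [ -n "$rest" ]; do
    pattern=${rest%%,*}   # next comma-separated entry
    rest=${rest#*,}       # drop it from the remaining list
    case $pattern in
      '*.'*)
        base=${pattern#'*.'}
        case $host in
          # wildcard entry: the bare domain or any subdomain of it
          "$base" | *."$base") return 0 ;;
        esac
        ;;
      ?*)
        # plain entry: exact match only
        if [ "$host" = "$pattern" ]; then return 0; fi
        ;;
    esac
  done
  return 1
}
```

With the list `example.com,*.api.io`, the hosts `example.com`, `api.io` and `v1.api.io` pass, while `sub.example.com` and `notapi.io` do not.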
397
+
398
+ ### Action Policy
399
+
400
+ Gate commands with a `browse-policy.json` file:
401
+
402
+ ```json
403
+ { "default": "allow", "deny": ["js", "eval"], "confirm": ["goto"] }
259
404
  ```
260
- After snapshot, use `@e1`, `@e2`... as selectors in any command.
261
405
 
262
- ### Snapshot Diff
263
- `snapshot-diff` — compare current page against last snapshot.
406
+ Precedence: deny > confirm > allow > default. Hot-reloads on file change — no server restart needed.
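The precedence order can be read as a chain of checks, the first hit winning. The sketch below is illustrative only: it takes the lists as comma-separated arguments rather than reading `browse-policy.json`, and both `in_list` and `policy_decision` are hypothetical names.

```bash
# in_list <item> <comma-separated-list>: exit 0 if the item is in the list.
in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# policy_decision <command> <default> <deny> <confirm> <allow>:
# print the decision for a command, applying deny > confirm > allow > default.
policy_decision() {
  if in_list "$1" "$3"; then
    echo deny
  elif in_list "$1" "$4"; then
    echo confirm
  elif in_list "$1" "$5"; then
    echo allow
  else
    echo "$2"
  fi
}
```

With the example policy above (default `allow`, deny `js,eval`, confirm `goto`), `js` resolves to `deny`, `goto` to `confirm`, and `click` falls through to the default `allow`. A command listed under both deny and confirm resolves to `deny`.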
264
407
 
265
- ### Device Emulation
266
- `emulate <device>` | `emulate reset` | `devices [filter]`
408
+ ### Credential Vault
267
409
 
268
- 100+ devices: iPhone 12-17, Pixel 5-7, iPad, Galaxy, and all Playwright built-ins.
410
+ Encrypted credential storage (AES-256-GCM). The LLM never sees passwords:
269
411
 
270
- ### Inspection
271
- `js <expr>` | `eval <file>` | `css <sel> <prop>` | `attrs <sel>` | `element-state <sel>` | `value <sel>` | `count <sel>` | `clipboard [write <text>]` | `console [--clear]` | `network [--clear]` | `cookies` | `storage [set <k> <v>]` | `perf`
412
+ ```bash
413
+ echo "mypassword" | browse auth save github https://github.com/login myuser --password-stdin
414
+ browse auth login github # Auto-navigates, detects form, fills + submits
415
+ browse auth list # List saved credentials (no passwords shown)
416
+ ```
272
417
 
273
- ### Visual
274
- `screenshot [path]` | `screenshot --annotate` | `pdf [path]` | `responsive [prefix]`
418
+ Key is auto-generated at `.browse/.encryption-key` or set via `BROWSE_ENCRYPTION_KEY`.
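When saving a credential from a script, `--password-stdin` keeps the password out of the command's arguments. The sketch below wraps the documented `auth save ... --password-stdin` form; the `save_credential` function and the `SITE_PASSWORD` variable are hypothetical names chosen for illustration.

```bash
# save_credential <name> <login-url> <user>: pipe the password held in
# SITE_PASSWORD to `browse auth save` over stdin instead of passing it
# as an argument.
save_credential() {
  local name="$1" url="$2" user="$3"
  if [ -z "${SITE_PASSWORD:-}" ]; then
    echo "SITE_PASSWORD is not set" >&2
    return 1
  fi
  printf '%s\n' "$SITE_PASSWORD" |
    browse auth save "$name" "$url" "$user" --password-stdin
}
```

A caller would set `SITE_PASSWORD` from a secret store or an interactive prompt and then run `save_credential github https://github.com/login myuser`.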
275
419
 
276
- ### Compare
277
- `diff <url1> <url2>` — text diff between two pages.
278
- `screenshot-diff <baseline> [current]` — pixel-level visual regression testing.
420
+ ### Content Boundaries
279
421
 
280
- ### Find
281
- `find role|text|label|placeholder|testid <query> [name]` — semantic element locators.
422
+ Wrap page output in CSPRNG nonce-delimited markers so LLMs can distinguish tool output from untrusted page content:
282
423
 
283
- ### Multi-Step
284
424
  ```bash
285
- echo '[["goto","https://example.com"],["text"]]' | browse chain
425
+ browse --content-boundaries text
286
426
  ```
287
427
 
288
- ### Tabs
289
- `tabs` | `tab <id>` | `newtab [url]` | `closetab [id]`
428
+ ### JSON Output
290
429
 
291
- ### Frames
292
- `frame <sel>` | `frame main`
430
+ Machine-readable output for agent frameworks:
293
431
 
294
- ### Sessions
295
- `sessions` | `session-close <id>`
432
+ ```bash
433
+ browse --json snapshot -i
434
+ # Returns: {"success": true, "data": "...", "command": "snapshot"}
435
+ ```
296
436
 
297
- ### Network
298
- `route <pattern> block` | `route <pattern> fulfill <status> [body]` | `route clear` | `offline [on|off]`
437
+ ## Configuration
438
+
439
+ Create a `browse.json` file at your project root to set persistent defaults:
440
+
441
+ ```json
442
+ {
443
+ "session": "my-agent",
444
+ "json": true,
445
+ "contentBoundaries": true,
446
+ "allowedDomains": ["example.com", "*.api.io"],
447
+ "idleTimeout": 3600000,
448
+ "viewport": "1280x720",
449
+ "device": "iPhone 14",
450
+ "runtime": "playwright"
451
+ }
452
+ ```
299
453
 
300
- ### State & Auth
301
- `state save [name]` | `state load [name]` | `state list` | `state show [name]` | `auth save <name> <url> <user> <pass>` | `auth login <name>` | `auth list` | `auth delete <name>`
454
+ CLI flags and environment variables override config file values.
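The override rule amounts to picking the first value that is set. The sketch below assumes the order flag, then environment variable, then config file; the README only states that flags and environment variables both beat the config file, so the relative order of the first two is an assumption, and `resolve_setting` is a hypothetical helper.

```bash
# resolve_setting <flag-value> <env-value> <config-value>: print the first
# non-empty value; exit 1 if none is set.
# Assumed order: CLI flag, then environment variable, then browse.json.
resolve_setting() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}
```

For the session setting, `resolve_setting "" "$BROWSE_SESSION" "my-agent"` would yield the `BROWSE_SESSION` value when it is set and fall back to the `browse.json` value `my-agent` otherwise.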
302
455
 
303
- ### Recording
304
- `har start` | `har stop [path]` | `video start [dir]` | `video stop` | `video status`
456
+ ## Usage with AI Agents
305
457
 
306
- ### Debug
307
- `inspect` — open DevTools debugger (requires `BROWSE_DEBUG_PORT`).
458
+ ### Claude Code (recommended)
308
459
 
309
- ### Server Control
310
- `status` | `instances` | `cookie <n>=<v>` | `header <n>:<v>` | `useragent <str>` | `stop` | `restart`
460
+ Install as a Claude Code skill via [skills.sh](https://skills.sh):
311
461
 
312
- ## Architecture
462
+ ```bash
463
+ npx skills add https://github.com/ulpi-io/skills --skill browse
464
+ ```
313
465
 
466
+ Or install directly:
467
+
468
+ ```bash
469
+ browse install-skill
314
470
  ```
315
- browse [--session <id>] <command>
316
-
317
-
318
- CLI (thin HTTP client)
319
- X-Browse-Session: <id>
320
-
321
-
322
- Persistent server (localhost, auto-started)
323
-
324
- SessionManager
325
- ├── Session "default" → BrowserContext + tabs + refs + buffers
326
- ├── Session "agent-a" → BrowserContext + tabs + refs + buffers
327
- └── Session "agent-b" → BrowserContext + tabs + refs + buffers
328
-
329
-
330
- Chromium (Playwright, headless, shared)
471
+
472
+ Both copy the skill definition to `.claude/skills/browse/SKILL.md` and add all browse commands to permissions — no more approval prompts.
473
+
474
+ ### CLAUDE.md / AGENTS.md
475
+
476
+ Add to your project instructions:
477
+
478
+ ```markdown
479
+ ## Browser Automation
480
+
481
+ Use `browse` for web automation. Run `browse --help` for all commands.
482
+
483
+ Core workflow:
484
+ 1. `browse goto <url>` — Navigate to page
485
+ 2. `browse snapshot -i` — Get interactive elements with refs (@e1, @e2)
486
+ 3. `browse click @e1` / `fill @e2 "text"` — Interact using refs
487
+ 4. Re-snapshot after page changes
488
+ ```
489
+
490
+ ### Just ask the agent
491
+
492
+ ```
493
+ Use browse to test the login flow. Run browse --help to see available commands.
331
494
  ```
332
495
 
333
- ## CLI Options
496
+ ## Options
334
497
 
335
498
  | Flag | Description |
336
499
  |------|-------------|
@@ -338,112 +501,93 @@ browse [--session <id>] <command>
338
501
  | `--json` | Wrap output as `{success, data, command}` |
339
502
  | `--content-boundaries` | Wrap page content in nonce-delimited markers |
340
503
  | `--allowed-domains <d,d>` | Block navigation/resources outside allowlist |
341
- | `--headed` | Run browser in headed (visible) mode |
504
+ | `--max-output <n>` | Truncate output to N characters |
505
+ | `--headed` | Show browser window (not headless) |
342
506
 
343
507
  ## Environment Variables
344
508
 
345
509
  | Variable | Default | Description |
346
510
  |----------|---------|-------------|
347
- | `BROWSE_PORT` | auto 9400-10400 | Fixed server port |
511
+ | `BROWSE_PORT` | auto (9400–10400) | Fixed server port |
348
512
  | `BROWSE_PORT_START` | 9400 | Start of port scan range |
349
513
  | `BROWSE_SESSION` | (none) | Default session ID for all commands |
350
- | `BROWSE_INSTANCE` | auto (PPID) | Instance ID for multi-Claude isolation |
351
- | `BROWSE_IDLE_TIMEOUT` | 1800000 (30m) | Idle shutdown in ms |
514
+ | `BROWSE_INSTANCE` | auto (PPID) | Instance ID for multi-agent isolation |
515
+ | `BROWSE_IDLE_TIMEOUT` | 1800000 (30m) | Idle auto-shutdown in ms |
352
516
  | `BROWSE_TIMEOUT` | (none) | Override all command timeouts (ms) |
353
- | `BROWSE_LOCAL_DIR` | `.browse/` or `/tmp` | State/log directory |
517
+ | `BROWSE_LOCAL_DIR` | `.browse/` or `/tmp` | State/log/screenshot directory |
354
518
  | `BROWSE_JSON` | (none) | Set to `1` for JSON output mode |
355
519
  | `BROWSE_CONTENT_BOUNDARIES` | (none) | Set to `1` for nonce-delimited output |
356
520
  | `BROWSE_ALLOWED_DOMAINS` | (none) | Comma-separated domain allowlist |
357
- | `BROWSE_HEADED` | (none) | Set to `1` for headed (visible) browser mode |
521
+ | `BROWSE_MAX_OUTPUT` | (none) | Truncate output to N characters |
522
+ | `BROWSE_HEADED` | (none) | Set to `1` for headed browser mode |
523
+ | `BROWSE_CDP_URL` | (none) | Connect to remote Chrome via CDP |
358
524
  | `BROWSE_PROXY` | (none) | Proxy server URL |
359
525
  | `BROWSE_PROXY_BYPASS` | (none) | Proxy bypass list |
360
- | `BROWSE_CDP_URL` | (none) | Connect to remote Chrome via CDP |
361
526
  | `BROWSE_SERVER_SCRIPT` | auto-detected | Override path to server.ts |
362
- | `BROWSE_DEBUG_PORT` | (none) | Port for DevTools debugging (inspect command) |
527
+ | `BROWSE_DEBUG_PORT` | (none) | Port for DevTools debugging |
363
528
  | `BROWSE_POLICY` | browse-policy.json | Path to action policy file |
364
- | `BROWSE_CONFIRM_ACTIONS` | (none) | Comma-separated commands requiring confirmation |
529
+ | `BROWSE_CONFIRM_ACTIONS` | (none) | Commands requiring confirmation |
365
530
  | `BROWSE_ENCRYPTION_KEY` | auto-generated | 64-char hex AES key for credential vault |
366
- | `BROWSE_AUTH_PASSWORD` | (none) | Password for auth save (alt to `--password-stdin`) |
531
+ | `BROWSE_AUTH_PASSWORD` | (none) | Password for `auth save` (alt to `--password-stdin`) |
532
+ | `BROWSE_RUNTIME` | playwright | Browser runtime (playwright, rebrowser, lightpanda) |
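
The table gives a default scan start of 9400 and an automatic range of 9400–10400 when `BROWSE_PORT` is unset. The sketch below shows one way those two variables could be combined into a port plan. It is not the package's actual logic: the 1000-port window is inferred from the documented range, and the assumption that the window moves with `BROWSE_PORT_START` is unconfirmed. The names `PortPlan`, `parsePort`, and `planPorts` are hypothetical.

```typescript
interface PortPlan {
  fixedPort: number | null; // set when BROWSE_PORT pins the server port
  scanStart: number;        // first port tried during auto-selection
  scanEnd: number;          // last port tried during auto-selection
}

const DEFAULT_PORT_START = 9400; // documented default for BROWSE_PORT_START
const ASSUMED_SCAN_WIDTH = 1000; // assumption: inferred from the 9400–10400 range

// Convert an environment string into a valid TCP port or throw.
function parsePort(name: string, raw: string): number {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(name + " must be a port number, got: " + raw);
  }
  return port;
}

// Build a plan from an environment-like record (for example process.env).
function planPorts(env: Record<string, string | undefined>): PortPlan {
  const startRaw = env["BROWSE_PORT_START"];
  const fixedRaw = env["BROWSE_PORT"];
  const scanStart = startRaw
    ? parsePort("BROWSE_PORT_START", startRaw)
    : DEFAULT_PORT_START;
  return {
    fixedPort: fixedRaw ? parsePort("BROWSE_PORT", fixedRaw) : null,
    scanStart,
    scanEnd: Math.min(scanStart + ASSUMED_SCAN_WIDTH, 65535),
  };
}
```

With an empty environment this yields the documented 9400–10400 window; setting `BROWSE_PORT` pins the port and makes the scan range irrelevant.
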
367
533
 
368
- ## Acknowledgments
534
+ ## Architecture
535
+
536
+ ```
537
+ browse [--session <id>] <command>
538
+ |
539
+ CLI (thin HTTP client)
540
+ |
541
+ Persistent server (localhost, auto-started)
542
+ |
543
+ SessionManager
544
+ ├── "default" → BrowserContext → tabs, refs, cookies, buffers
545
+ ├── "agent-a" → BrowserContext → tabs, refs, cookies, buffers
546
+ └── "agent-b" → BrowserContext → tabs, refs, cookies, buffers
547
+ |
548
+ Chromium (Playwright, headless, shared)
549
+ ```
550
+
551
+ - **First command:** ~2s (server + Chromium startup, once)
552
+ - **Every command after:** ~100–200ms (HTTP to localhost)
553
+ - Server auto-starts on first command, auto-shuts down after 30 min idle
554
+ - Crash recovery: CLI detects dead server and restarts transparently
555
+ - State file: `.browse/browse-server.json` (pid, port, token)
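
The first two bullets above imply a simple cost model: one cold start of about 2 s, then roughly 100 to 200 ms per command. The TypeScript sketch below does that arithmetic for a session of N commands. The constants come from the bullets; the function name `estimateSessionLatency` and the assumption that every later command falls inside the quoted range are illustrative only.

```typescript
interface LatencyRange {
  minMs: number;
  maxMs: number;
}

const FIRST_COMMAND_MS = 2000;   // "~2s (server + Chromium startup, once)"
const WARM_COMMAND_MIN_MS = 100; // lower bound of "~100–200ms"
const WARM_COMMAND_MAX_MS = 200; // upper bound of "~100–200ms"

// Estimate total wall-clock time for a session of `commandCount` commands,
// where only the first command pays the startup cost.
function estimateSessionLatency(commandCount: number): LatencyRange {
  if (!Number.isInteger(commandCount) || commandCount < 1) {
    throw new RangeError("commandCount must be a positive integer");
  }
  const warmCommands = commandCount - 1;
  return {
    minMs: FIRST_COMMAND_MS + warmCommands * WARM_COMMAND_MIN_MS,
    maxMs: FIRST_COMMAND_MS + warmCommands * WARM_COMMAND_MAX_MS,
  };
}
```

For a 10-command session this gives 2.9 to 3.8 s, so the startup cost falls from all of the first command's time to roughly half to two thirds of a short session's total, and shrinks further as the session grows.
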
369
556
 
370
- Inspired by and originally derived from the `/browse` skill in [gstack](https://github.com/garrytan/gstack) by Garry Tan. The core architecture — persistent Chromium daemon, thin CLI client, ref-based element selection via ARIA snapshots — comes from gstack.
557
+ ## Benchmarks
558
+
559
+ ### vs Agent Browser & Browser-Use (Token Cost)
560
+
561
+ Tested on 3 sites across multi-step browsing flows — navigate, snapshot, scroll, search, extract text:
562
+
563
+ | Tool | Total Tokens | Total Time | Context Used (200K) |
564
+ |------|-------------:|-----------:|--------------------:|
565
+ | **browse** | **14,134** | **28.5s** | **7.1%** |
566
+ | agent-browser | 39,414 | 36.2s | 19.7% |
567
+ | browser-use | 34,281 | 72.7s | 17.1% |
568
+
569
+ browse uses **2.4x fewer tokens** than browser-use, **2.8x fewer** than agent-browser, and completes **2.5x faster** than browser-use.
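
The ratios and context percentages quoted above can be re-derived from the table. The sketch below does so in TypeScript, using only the token and time figures from the table and the 200K window from its header; the variable and function names are illustrative.

```typescript
const BENCH_CONTEXT_WINDOW = 200000; // "Context Used (200K)" column

interface BenchRow {
  tool: string;
  tokens: number;
  seconds: number;
}

const browseRow: BenchRow = { tool: "browse", tokens: 14134, seconds: 28.5 };
const benchRows: BenchRow[] = [
  browseRow,
  { tool: "agent-browser", tokens: 39414, seconds: 36.2 },
  { tool: "browser-use", tokens: 34281, seconds: 72.7 },
];

// Round to one decimal place, matching the precision used in the table.
function roundToTenth(value: number): number {
  return Math.round(value * 10) / 10;
}

const benchSummary = benchRows.map((row) => ({
  tool: row.tool,
  contextPercent: roundToTenth((row.tokens / BENCH_CONTEXT_WINDOW) * 100),
  tokensVsBrowse: roundToTenth(row.tokens / browseRow.tokens),
  timeVsBrowse: row.seconds / browseRow.seconds, // left unrounded
}));

console.log(benchSummary);
```

The token ratios come out at 2.8 for agent-browser and 2.4 for browser-use, matching the summary sentence. The unrounded time ratio for browser-use is about 2.55, so the quoted 2.5x is slightly conservative.
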
570
+
571
+ ### vs @playwright/mcp (Architecture)
572
+
573
+ @playwright/mcp dumps the full accessibility snapshot on every action. browse returns ~15 tokens per action — the agent requests a snapshot only when needed:
574
+
575
+ | | @playwright/mcp | browse |
576
+ |---|---:|---:|
577
+ | Tokens on `navigate` | ~14,578 (auto-dumped) | **~11** |
578
+ | Tokens on `click` | ~14,578 (auto-dumped) | **~15** |
579
+ | 10-action session | ~145,780 | **~11,388** |
580
+ | Context consumed (200K) | **73%** | **6%** |
581
+
582
+ Rerun: `bun run benchmark`
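
The session totals in the @playwright/mcp comparison follow from the per-action figure. A short TypeScript check of that arithmetic is sketched below; the browse session total of ~11,388 tokens is taken from the table as given (the README does not break it down per action), and the names are illustrative.

```typescript
const MCP_CONTEXT_WINDOW = 200000;   // "Context consumed (200K)" row
const MCP_TOKENS_PER_ACTION = 14578; // navigate and click rows
const BROWSE_SESSION_TOKENS = 11388; // "10-action session" row, taken as given
const SESSION_ACTIONS = 10;

// Share of the context window consumed, rounded to a whole percent.
function contextPercent(tokens: number): number {
  return Math.round((tokens / MCP_CONTEXT_WINDOW) * 100);
}

const mcpSessionTokens = MCP_TOKENS_PER_ACTION * SESSION_ACTIONS;

const sessionComparison = {
  mcpSessionTokens,
  mcpContextPercent: contextPercent(mcpSessionTokens),
  browseContextPercent: contextPercent(BROWSE_SESSION_TOKENS),
  tokenRatio: mcpSessionTokens / BROWSE_SESSION_TOKENS,
};

console.log(sessionComparison);
```

Ten auto-dumped snapshots reproduce the 145,780-token total and the 73% and 6% context figures, and the token ratio works out to a little under 13.
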
371
583
 
372
584
  ## Changelog
373
585
 
374
- ### v0.7.0 Token Optimization
375
-
376
- - `snapshot -i` now outputs terse flat list by default (no indentation, no props, names truncated to 30 chars)
377
- - `-f` flag for full indented ARIA tree with props/children (the old `-i` behavior)
378
- - `-V` flag for viewport-only snapshot: filters to elements visible in the current viewport (BBC: 189 → 28 elements, ~85% reduction)
379
- - `browse version` / `--version` / `-V` — print CLI version
380
- - 2.4-2.8x fewer tokens than browser-use and agent-browser across real-world benchmarks
381
-
382
- ### v0.4.0 — Video Recording
383
-
384
- - `video start [dir]` | `video stop` | `video status` — compositor-level WebM recording
385
- - Works with local and remote (CDP) browsers
386
-
387
- ### v0.3.0 — Headed Mode, Clipboard, DevTools
388
-
389
- - `--headed` flag — run browser in visible mode for debugging and demos
390
- - `clipboard [write <text>]` — read and write clipboard contents
391
- - `inspect` command — open DevTools debugger via `BROWSE_DEBUG_PORT`
392
- - `screenshot --annotate` — pixel-annotated PNG with numbered badges
393
- - `instances` command — list all running browse servers
394
- - `BROWSE_DEBUG_PORT` env var for DevTools debugging
395
-
396
- ### v0.2.0 — Security, Interactions, DX
397
-
398
- **Commands:**
399
- - `dblclick`, `focus`, `check`, `uncheck`, `drag`, `keydown`, `keyup` — interaction commands
400
- - `frame <sel>` / `frame main` — iframe targeting
401
- - `value <sel>`, `count <sel>` — element inspection
402
- - `scroll up/down` — viewport-relative scrolling
403
- - `wait --url`, `wait --network-idle` — navigation/network wait variants
404
- - `highlight <sel>` — visual element debugging
405
- - `download <sel> [path]` — file download
406
- - `route <pattern> block/fulfill` — network request interception and mocking
407
- - `offline on/off` — offline mode toggle
408
- - `state save/load` — persist and restore cookies + localStorage (all origins)
409
- - `har start/stop` — HAR recording and export
410
- - `video start/stop/status` — video recording (WebM, compositor-level, works with remote CDP)
411
- - `screenshot-diff` — pixel-level visual regression testing
412
- - `find role/text/label/placeholder/testid` — semantic element locators
413
-
414
- **Security:**
415
- - `--allowed-domains` — domain allowlist (HTTP + WebSocket/EventSource/sendBeacon)
416
- - `browse-policy.json` — action policy gate (allow/deny/confirm per command)
417
- - `auth save/login/list/delete` — AES-256-GCM encrypted credential vault
418
- - `--content-boundaries` — CSPRNG nonce wrapping for prompt injection defense
419
-
420
- **DX:**
421
- - `--json` — structured output mode for agent frameworks
422
- - `browse.json` config file support
423
- - AI-friendly error messages — Playwright errors rewritten to actionable hints
424
- - Per-session output folders (`.browse/sessions/{id}/`)
425
-
426
- **Infrastructure:**
427
- - Auto-instance servers via PPID — multi-Claude isolation
428
- - CDP remote connection (`BROWSE_CDP_URL`)
429
- - Proxy support (`BROWSE_PROXY`)
430
- - Compiled binary self-spawn mode
431
- - Orphaned server cleanup
432
-
433
- ### v0.1.0 — Foundation
434
-
435
- **Commands:**
436
- - `emulate` / `devices` — device emulation (100+ devices)
437
- - `snapshot -C` — cursor-interactive detection
438
- - `snapshot-diff` — before/after comparison with ref-number stripping
439
- - `dialog` / `dialog-accept` / `dialog-dismiss` — dialog handling
440
- - `upload` — file upload
441
- - `screenshot --annotate` — numbered badge overlay with legend
442
-
443
- **Infrastructure:**
444
- - Session multiplexing — multiple agents share one Chromium
445
- - Safe retry classification — read vs write commands
446
- - TreeWalker text extraction — no MutationObserver triggers
586
+ See [CHANGELOG.md](CHANGELOG.md) for full release history.
587
+
588
+ ## Acknowledgments
589
+
590
+ Inspired by and originally derived from the `/browse` skill in [gstack](https://github.com/garrytan/gstack) by Garry Tan.
447
591
 
448
592
  ## License
449
593