@ulpi/browse 0.7.4 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,327 +1,499 @@
1
1
  # @ulpi/browse
2
2
 
3
- **The headless browser CLI built for AI agents not humans.**
3
+ Headless browser CLI for AI coding agents. Persistent Chromium daemon via Playwright, ~100ms per command after startup.
4
4
 
5
- When AI agents browse the web, the bottleneck isn't Chromium — it's **what gets dumped into the context window**. [`@playwright/mcp`](https://github.com/microsoft/playwright-mcp) sends the full accessibility snapshot on every navigate, click, and keystroke. On a real e-commerce page, that's **~16,000 tokens per action** — automatically, whether the agent needs it or not.
5
+ ## Installation
6
6
 
7
- Ten actions and you've burned **146K tokens — 73% of a 200K context window** — just on browser output. That leaves almost nothing for the agent to actually think.
7
+ ### Global Installation (recommended)
8
8
 
9
- `@ulpi/browse` flips this. Navigation returns 11 tokens. Clicks return 15 tokens. The agent requests a page snapshot **only when it needs one** — and can filter to interactive elements only, cutting another 2-6x.
9
+ ```bash
10
+ npm install -g @ulpi/browse
11
+ ```
10
12
 
11
- **Same 10 actions: ~11K tokens. 6% of context. 13x less than @playwright/mcp.**
13
+ Requires [Bun](https://bun.sh) runtime. Chromium is installed automatically via Playwright on first `npm install`.
12
14
 
13
- ## Benchmarks (Measured)
15
+ ### Project Installation (local dependency)
14
16
 
15
- Tested on 4 e-commerce sites (mumzworld, amazon, ebay, nike) across homepage, search results, and product detail pages ([raw data](BENCHMARKS.md)):
17
+ ```bash
18
+ npm install @ulpi/browse
19
+ ```
16
20
 
17
- | Site | Page | @playwright/mcp navigate | browse snapshot -i | Reduction |
18
- |------|------|-------------------------:|-------------------:|----------:|
19
- | mumzworld.com | Homepage | ~51,151 | ~15,072 | **3x** |
20
- | mumzworld.com | Search | ~13,860 | ~3,614 | **4x** |
21
- | mumzworld.com | PDP | ~10,071 | ~3,084 | **3x** |
22
- | amazon.com | Homepage | ~10,431 | ~2,150 | **5x** |
23
- | amazon.com | Search | ~19,458 | ~3,644 | **5x** |
24
- | ebay.com | Homepage | ~4,641 | ~1,557 | **3x** |
25
- | ebay.com | Search | ~35,929 | ~7,088 | **5x** |
26
- | ebay.com | PDP | ~1,294 | ~678 | **2x** |
27
- | nike.com | Homepage | ~2,495 | ~816 | **3x** |
28
- | nike.com | Search | ~7,998 | ~2,678 | **3x** |
29
- | nike.com | PDP | ~3,034 | ~989 | **3x** |
30
- | **TOTAL** | **11 pages** | **~160,362** | **~41,370** | **4x** |
21
+ Then use via `package.json` scripts or by invoking `browse` directly.
31
22
 
32
- And that's the per-snapshot comparison. The real gap is architectural — @playwright/mcp dumps a snapshot on every action (navigate, click, type). `browse` only returns ~15 tokens per action:
23
+ ### From Source
33
24
 
34
- | | @playwright/mcp | @ulpi/browse |
35
- |---|---:|---:|
36
- | Tokens on `navigate` | ~14,578 (auto-dumped) | **~11** (one-liner) |
37
- | Tokens on `click` | ~14,578 (auto-dumped) | **~15** (one-liner) |
38
- | 10-action session | ~145,780 | **~11,388** |
39
- | Context consumed (200K) | **73%** | **6%** |
25
+ ```bash
26
+ git clone https://github.com/ulpi-io/browse
27
+ cd browse
28
+ bun install
29
+ bun run src/cli.ts goto https://example.com # Dev mode
30
+ bun run build # Build standalone binary
31
+ ```
40
32
 
41
- The agent decides when to see the page. Most actions don't need a snapshot.
33
+ ## Quick Start
42
34
 
43
- Rerun: `bun run benchmark`
35
+ ```bash
36
+ browse goto https://example.com
37
+ browse snapshot -i # Get interactive elements with refs
38
+ browse click @e2 # Click by ref from snapshot
39
+ browse fill @e3 "test@example.com" # Fill input by ref
40
+ browse text # Get visible page text
41
+ browse screenshot page.png
42
+ browse stop
43
+ ```
44
+
45
+ ### The Ref Workflow
44
46
 
45
- ## Why It's Faster
47
+ Every `snapshot` assigns refs (`@e1`, `@e2`, ...) to elements. Use refs as selectors in any command — no CSS selector construction needed:
48
+
49
+ ```bash
50
+ $ browse snapshot -i
51
+ @e1 [button] "Submit"
52
+ @e2 [link] "Home"
53
+ @e3 [textbox] "Email"
54
+
55
+ $ browse click @e1 # Click the Submit button
56
+ Clicked @e1
57
+
58
+ $ browse fill @e3 "user@example.com" # Fill the Email field
59
+ Filled @e3
60
+ ```
46
61
 
47
- ### 1. You Control What Enters the Context
62
+ ### Traditional Selectors (also supported)
48
63
 
64
+ ```bash
65
+ browse click "#submit"
66
+ browse fill ".email-input" "test@example.com"
67
+ browse click "text=Submit"
49
68
  ```
50
- @playwright/mcp browser_navigate → 51,150 tokens (full snapshot, every time)
51
69
 
52
- browse goto → 11 tokens ("Navigated to https://... (200)")
53
- browse text → 4,970 tokens (clean visible text, when you need it)
54
- browse snap -i → 15,072 tokens (interactive elements + refs, when you need it)
70
+ ## Commands
71
+
72
+ ### Navigation
73
+
74
+ ```bash
75
+ browse goto <url> # Navigate to URL
76
+ browse back # Go back
77
+ browse forward # Go forward
78
+ browse reload # Reload page
79
+ browse url # Get current URL
55
80
  ```
56
81
 
57
- You pick the right view for the task. Reading prices? Use `text`. Need to click something? Use `snapshot -i`. Just navigating? `goto` is enough.
82
+ ### Content Extraction
58
83
 
59
- ### 2. Ref-Based Interaction — No Selector Construction
84
+ ```bash
85
+ browse text # Visible text (clean, no DOM mutation)
86
+ browse html [sel] # Full HTML or element innerHTML
87
+ browse links # All links as "text -> href"
88
+ browse forms # Form structure as JSON
89
+ browse accessibility # Raw ARIA snapshot tree
90
+ ```
60
91
 
61
- After `snapshot`, every element gets a ref (`@e1`, `@e2`, ...) backed by a Playwright Locator. The agent doesn't waste tokens constructing CSS selectors:
92
+ ### Interaction
62
93
 
63
94
  ```bash
64
- $ browse snapshot -i
65
- @e1 [button] "Help 24/7"
66
- @e2 [link] "Mumzworld"
67
- @e3 [searchbox]
68
- @e4 [link] "Sign In"
69
- @e5 [link] "Cart"
95
+ browse click <sel> # Click element
96
+ browse rightclick <sel> # Right-click element (context menu)
97
+ browse dblclick <sel> # Double-click element
98
+ browse fill <sel> <val> # Clear and fill input
99
+ browse select <sel> <val> # Select dropdown option
100
+ browse hover <sel> # Hover element
101
+ browse focus <sel> # Focus element
102
+ browse tap <sel> # Tap element (requires touch context via emulate)
103
+ browse check <sel> # Check checkbox
104
+ browse uncheck <sel> # Uncheck checkbox
105
+ browse type <text> # Type text via keyboard (current focus)
106
+ browse press <key> # Press key (Enter, Tab, etc.)
107
+ browse keydown <key> # Key down event
108
+ browse keyup <key> # Key up event
109
+ browse keyboard inserttext <text> # Insert text without key events
110
+ browse scroll [sel|up|down] # Scroll element into view or direction
111
+ browse scrollinto <sel> # Scroll element into view (explicit)
112
+ browse swipe <dir> [px] # Swipe up/down/left/right (touch events)
113
+ browse drag <src> <tgt> # Drag and drop
114
+ browse highlight <sel> # Highlight element with visual overlay
115
+ browse download <sel> [path] # Download file triggered by click
116
+ browse upload <sel> <files...> # Upload files to input
117
+ ```
70
118
 
71
- $ browse fill @e3 "strollers"
72
- Filled @e3
119
+ ### Mouse Control
73
120
 
74
- $ browse press Enter
75
- Pressed Enter
121
+ ```bash
122
+ browse mouse move <x> <y> # Move mouse to coordinates
123
+ browse mouse down [button] # Press mouse button (left/right/middle)
124
+ browse mouse up [button] # Release mouse button
125
+ browse mouse wheel <dy> [dx] # Scroll wheel
76
126
  ```
77
127
 
78
- ### 3. Cursor-Interactive Detection — What ARIA Misses
128
+ ### Settings
129
+
130
+ ```bash
131
+ browse set geo <lat> <lng> # Set geolocation
132
+ browse set media <scheme> # Set color scheme (dark/light/no-preference)
133
+ ```
79
134
 
80
- Modern SPAs use `<div onclick>`, `cursor: pointer`, `tabindex`, and `data-action` for interactivity. These are **invisible** to accessibility trees — both @playwright/mcp and raw `ariaSnapshot()` miss them.
135
+ ### Wait
81
136
 
82
137
  ```bash
83
- $ browse snapshot -i -C
84
- @e1 [button] "Submit"
85
- @e2 [textbox] "Email"
138
+ browse wait <selector> # Wait for element
139
+ browse wait <selector> --state hidden # Wait for element to disappear
140
+ browse wait <ms> # Wait for milliseconds
141
+ browse wait --url <pattern> # Wait for URL
142
+ browse wait --text "Welcome" # Wait for text to appear in page
143
+ browse wait --fn "js expr" # Wait for JavaScript condition
144
+ browse wait --load <state> # Wait for load state (load/domcontentloaded/networkidle)
145
+ browse wait --network-idle # Wait for network idle
146
+ ```
147
+
148
+ ### Snapshot
86
149
 
87
- [cursor-interactive]
88
- @e3 [div.card] "Add to cart" (cursor:pointer)
89
- @e4 [span.close] "Close dialog" (onclick)
90
- @e5 [div.menu] "Open Menu" (data-action)
150
+ ```bash
151
+ browse snapshot # Full accessibility tree
152
+ browse snapshot -i # Interactive elements only (terse flat list)
153
+ browse snapshot -i -f # Interactive elements, full indented tree
154
+ browse snapshot -i -C # Include cursor-interactive elements (onclick, cursor:pointer)
155
+ browse snapshot -V # Viewport only — elements visible on screen
156
+ browse snapshot -c # Compact — remove empty structural elements
157
+ browse snapshot -d 3 # Limit depth to 3 levels
158
+ browse snapshot -s "#main" # Scope to CSS selector
159
+ browse snapshot -i -c -d 5 # Combine options
91
160
  ```
92
161
 
93
- Every detected element gets a ref. `browse click @e3` just works.
162
+ | Flag | Description |
163
+ |------|-------------|
164
+ | `-i` | Interactive elements only (buttons, links, inputs) — terse flat list |
165
+ | `-f` | Full — indented tree with props and children (use with `-i`) |
166
+ | `-V` | Viewport — only elements visible in current viewport |
167
+ | `-c` | Compact — remove empty structural elements |
168
+ | `-C` | Cursor-interactive — detect divs with `cursor:pointer`, `onclick`, `tabindex` |
169
+ | `-d N` | Limit tree depth |
170
+ | `-s <sel>` | Scope to CSS selector |
94
171
 
95
- ### 4. 75 Purpose-Built Commands vs Generic Tools
172
+ The `-C` flag catches modern SPA patterns that ARIA trees miss — `<div onclick>`, `cursor: pointer`, `tabindex`, and `data-action` elements.
96
173
 
97
- @playwright/mcp has ~15 tools. For anything beyond navigate/click/type, you write JavaScript via `browser_evaluate`. `browse` has purpose-built commands that return structured, minimal output:
174
+ ### Find Elements
98
175
 
99
- | Need | @playwright/mcp | browse |
100
- |------|----------------|--------|
101
- | Page text | `browser_evaluate` + custom JS | `text` |
102
- | Form fields | `browser_evaluate` + custom JS | `forms` → structured JSON |
103
- | All links | `browser_evaluate` + custom JS | `links` → `Text → URL` |
104
- | Network log | Not available | `network` |
105
- | Cookies | Not available | `cookies` |
106
- | Performance | Not available | `perf` |
107
- | Page diff | Not available | `diff <url1> <url2>` |
108
- | Snapshot diff | Not available | `snapshot-diff` |
109
- | Responsive screenshots | Not available | `responsive` |
110
- | Device emulation | Not available | `emulate iphone` |
111
- | Input value | `browser_evaluate` + custom JS | `value <sel>` |
112
- | Element count | `browser_evaluate` + custom JS | `count <sel>` |
113
- | iframe targeting | Not available | `frame <sel>` / `frame main` |
114
- | Network mocking | Not available | `route <pattern> block\|fulfill` |
115
- | Offline mode | Not available | `offline on\|off` |
116
- | State persistence | Not available | `state save\|load` |
117
- | Credential vault | Not available | `auth save\|login\|list` |
118
- | HAR recording | Not available | `har start\|stop` |
119
- | Video recording | Not available | `video start [dir]\|stop\|status` |
120
- | Clipboard access | Not available | `clipboard [write <text>]` |
121
- | Element finding | Not available | `find role\|text\|label\|placeholder\|testid` |
122
- | DevTools inspect | Not available | `inspect` |
123
- | Domain restriction | Not available | `--allowed-domains` |
124
- | Prompt injection defense | Not available | `--content-boundaries` |
125
- | JSON output mode | Not available | `--json` |
176
+ ```bash
177
+ browse find role <role> [name] # By ARIA role
178
+ browse find text <text> # By text content
179
+ browse find label <label> # By label
180
+ browse find placeholder <placeholder> # By placeholder
181
+ browse find testid <id> # By data-testid
182
+ browse find alt <text> # By alt text
183
+ browse find title <text> # By title attribute
184
+ browse find first <sel> # First matching element
185
+ browse find last <sel> # Last matching element
186
+ browse find nth <n> <sel> # Nth matching element (0-indexed)
187
+ ```
126
188
 
127
- ### 5. Persistent Daemon — 100ms Commands
189
+ ### Inspection
128
190
 
191
+ ```bash
192
+ browse js <expr> # Evaluate JavaScript expression
193
+ browse eval <file> # Evaluate JavaScript file
194
+ browse css <sel> <prop> # Get computed CSS property
195
+ browse attrs <sel> # Get element attributes as JSON
196
+ browse element-state <sel> # Element state (visible, enabled, checked, etc.)
197
+ browse value <sel> # Get input/select value
198
+ browse count <sel> # Count elements matching selector
199
+ browse box <sel> # Get bounding box as JSON {x, y, width, height}
200
+ browse clipboard [write <text>] # Read or write clipboard
201
+ browse console [--clear] # Console log buffer
202
+ browse errors [--clear] # Page errors only (filtered from console)
203
+ browse network [--clear] # Network request buffer
204
+ browse cookies # Browser cookies as JSON
205
+ browse storage [set <k> <v>] # localStorage/sessionStorage
206
+ browse perf # Navigation timing (dns, ttfb, load)
207
+ browse devices [filter] # List available device names
129
208
  ```
130
- First command: ~2s (server + Chromium startup, once)
131
- Every command after: ~100-200ms (HTTP to localhost)
209
+
210
+ ### Visual
211
+
212
+ ```bash
213
+ browse screenshot [path] # Take screenshot (viewport)
214
+ browse screenshot --full [path] # Full-page screenshot
215
+ browse screenshot <sel|@ref> [path] # Screenshot specific element
216
+ browse screenshot --clip x,y,w,h [path] # Screenshot clipped region
217
+ browse screenshot --annotate [path] # Annotated screenshot with numbered labels
218
+ browse pdf [path] # Save page as PDF
219
+ browse responsive [prefix] # Mobile/tablet/desktop screenshots
132
220
  ```
133
221
 
134
- @playwright/mcp starts a new browser per MCP session. `browse` keeps the server running across commands with auto-shutdown after 30 min idle. Crash recovery is built in — the CLI detects a dead server and restarts transparently.
222
+ ### Compare
135
223
 
136
- ### 6. Multi-Agent Sessions — Parallel Browsing on One Chromium
224
+ ```bash
225
+ browse diff <url1> <url2> # Text diff between two pages
226
+ browse snapshot-diff # Diff current vs last snapshot
227
+ browse screenshot-diff <baseline> [current] # Pixel-level visual diff
228
+ ```
137
229
 
138
- Run multiple AI agents in parallel, each with its own isolated browser session, sharing a single Chromium process. Each session gets its own tabs, refs, cookies, localStorage, and console/network buffers — zero cross-talk.
230
+ ### Tabs
139
231
 
140
232
  ```bash
141
- # Agent A researches strollers on mumzworld
142
- browse --session agent-a goto https://www.mumzworld.com
143
- browse --session agent-a snapshot -i
144
- browse --session agent-a fill @e3 "strollers"
145
- browse --session agent-a press Enter
233
+ browse tabs # List all tabs
234
+ browse tab <id> # Switch to tab
235
+ browse newtab [url] # Open new tab
236
+ browse closetab [id] # Close tab
237
+ ```
146
238
 
147
- # Agent B checks competitor pricing on amazon — simultaneously
148
- browse --session agent-b goto https://www.amazon.com
149
- browse --session agent-b snapshot -i
150
- browse --session agent-b fill @e6 "baby stroller"
151
- browse --session agent-b press Enter
239
+ ### Frames
152
240
 
153
- # Or set once via env var
154
- export BROWSE_SESSION=agent-a
155
- browse text # runs in agent-a's session
241
+ ```bash
242
+ browse frame <sel> # Switch to iframe
243
+ browse frame main # Back to main frame
156
244
  ```
157
245
 
158
- Under the hood, each session is a separate Playwright `BrowserContext` on the shared Chromium — same isolation model as browser profiles (separate cookies, storage, cache). One process, no extra memory for multiple Chromium instances.
246
+ ### Device Emulation
159
247
 
248
+ ```bash
249
+ browse emulate "iPhone 14" # Emulate device
250
+ browse emulate reset # Reset to desktop (1920x1080)
251
+ browse devices # List all available devices
252
+ browse devices iphone # Filter device list
253
+ browse viewport 1280x720 # Set viewport size
160
254
  ```
161
- browse --session <id> <command>
162
-
163
- Persistent server (one Chromium process)
164
-
165
- SessionManager
166
- ├── "default" → BrowserContext → tabs, refs, cookies, buffers
167
- ├── "agent-a" → BrowserContext → tabs, refs, cookies, buffers
168
- └── "agent-b" → BrowserContext → tabs, refs, cookies, buffers
255
+
256
+ 100+ devices: iPhone 12–17, Pixel 5–7, iPad, Galaxy, and all Playwright built-ins.
257
+
258
+ ### Cookies
259
+
260
+ ```bash
261
+ browse cookie <name>=<value> # Set cookie (simple)
262
+ browse cookie set <n> <v> [--domain --secure ...] # Set cookie with options
263
+ browse cookie clear # Clear all cookies
264
+ browse cookie export <file> # Export cookies to JSON
265
+ browse cookie import <file> # Import cookies from JSON
266
+ browse cookies # Read all cookies
169
267
  ```
170
268
 
171
- **Session management:**
269
+ ### Network
270
+
172
271
  ```bash
173
- browse sessions # list active sessions with tab counts
174
- browse session-close agent-a # close a session (frees its tabs/context)
175
- browse status # shows total session count
272
+ browse route <pattern> block # Block matching requests
273
+ browse route <pattern> fulfill <status> [body] # Mock response
274
+ browse route clear # Remove all routes
275
+ browse offline [on|off] # Toggle offline mode
276
+ browse header <name>:<value> # Set extra HTTP header
277
+ browse useragent <string> # Set user agent
176
278
  ```
177
279
 
178
- Sessions auto-close after the idle timeout (default 30 min). The server shuts down when all sessions are idle. Without `--session`, everything runs in a `"default"` session — fully backward compatible.
280
+ ### Dialogs
179
281
 
180
- For full process isolation (separate Chromium instances), use `BROWSE_PORT` to run independent servers.
282
+ ```bash
283
+ browse dialog # Last dialog info
284
+ browse dialog-accept [text] # Accept next dialog (optional prompt text)
285
+ browse dialog-dismiss # Dismiss next dialog
286
+ ```
181
287
 
182
- ## Install
288
+ ### Recording
183
289
 
184
290
  ```bash
185
- npm install -g @ulpi/browse
291
+ browse har start # Start HAR recording
292
+ browse har stop [path] # Stop and save HAR file
293
+
294
+ browse video start [dir] # Start video recording (WebM)
295
+ browse video stop # Stop recording
296
+ browse video status # Check recording status
297
+
298
+ browse record start # Record browsing commands as you go
299
+ browse record stop # Stop recording
300
+ browse record status # Check recording status
301
+ browse record export browse [path] # Export as chain-compatible JSON (replay with browse chain)
302
+ browse record export replay [path] # Export as Chrome DevTools Recorder (Playwright/Puppeteer)
186
303
  ```
187
304
 
188
- Requires [Bun](https://bun.sh) runtime. Chromium is installed automatically via Playwright.
305
+ ### State & Auth
189
306
 
190
- ### Claude Code Skill
307
+ ```bash
308
+ browse state save [name] # Save cookies + localStorage
309
+ browse state load [name] # Restore saved state
310
+ browse state list # List saved states
311
+ browse state show [name] # Show state details
312
+
313
+ browse auth save <name> <url> <user> <pass> # Save encrypted credential
314
+ browse auth save <name> <url> <user> --password-stdin # Password from stdin
315
+ browse auth login <name> # Auto-login with saved credential
316
+ browse auth list # List saved credentials
317
+ browse auth delete <name> # Delete credential
318
+ ```
191
319
 
192
- Install via [skills.sh](https://skills.sh) (works across Claude Code, Cursor, Cline, Windsurf, and 15+ agents):
320
+ ### Multi-Step (Chaining)
321
+
322
+ Execute a sequence of commands in one call:
193
323
 
194
324
  ```bash
195
- npx skills add https://github.com/ulpi-io/skills --skill browse
325
+ echo '[["goto","https://example.com"],["snapshot","-i"],["text"]]' | browse chain
196
326
  ```
197
327
 
198
- Or install directly into your project:
328
+ ### Server Control
199
329
 
200
330
  ```bash
201
- browse install-skill
331
+ browse status # Server health report
332
+ browse instances # List all running browse servers
333
+ browse doctor # System check (Bun, Playwright, Chromium)
334
+ browse upgrade # Self-update via npm
335
+ browse stop # Stop server
336
+ browse restart # Restart server
337
+ browse inspect # Open DevTools (requires BROWSE_DEBUG_PORT)
202
338
  ```
203
339
 
204
- Both copy the skill definition to `.claude/skills/browse/SKILL.md` and add all browse commands to permissions — no more approval prompts.
340
+ ### Setup
205
341
 
206
- ## Real-World Example: E-Commerce Flow
342
+ ```bash
343
+ browse install-skill [path] # Install Claude Code skill
344
+ ```
345
+
346
+ ## Sessions
207
347
 
208
- Agent browses mumzworld.com search, find a product, add to cart, checkout:
348
+ Run multiple AI agents in parallel, each with isolated browser state, sharing one Chromium process:
209
349
 
210
350
  ```bash
211
- browse goto https://www.mumzworld.com
212
- browse snapshot -i # find searchbox → @e3
213
- browse fill @e3 "strollers"
214
- browse press Enter
351
+ # Agent A
352
+ browse --session agent-a goto https://site-a.com
353
+ browse --session agent-a snapshot -i
354
+ browse --session agent-a click @e3
215
355
 
216
- browse text # scan prices in results
217
- browse goto "https://www.mumzworld.com/en/doona-infant-car-seat..."
356
+ # Agent B (simultaneously)
357
+ browse --session agent-b goto https://site-b.com
358
+ browse --session agent-b snapshot -i
359
+ browse --session agent-b fill @e2 "query"
218
360
 
219
- browse snapshot -i # find Add to Cart → @e54
220
- browse click @e54
361
+ # Or set once via env var
362
+ export BROWSE_SESSION=agent-a
363
+ browse text
364
+ ```
221
365
 
222
- browse snapshot -i -s "[role=dialog]" # scope to cart modal
223
- browse click @e3 # "View Cart"
366
+ Each session has its own:
367
+ - Browser context (cookies, storage, cache)
368
+ - Tabs and navigation history
369
+ - Refs from snapshots
370
+ - Console and network buffers
224
371
 
225
- browse snapshot -i # find Checkout → @e52
226
- browse click @e52
372
+ ```bash
373
+ browse sessions # List active sessions
374
+ browse session-close agent-a # Close a session
375
+ browse status # Shows total session count
227
376
  ```
228
377
 
229
- **12 steps. ~24K tokens total.** With @playwright/mcp: **~240K tokens** for the same flow (every action dumps a full snapshot).
378
+ Sessions auto-close after the idle timeout (default 30 min). Without `--session`, everything runs in a `"default"` session.
230
379
 
231
- ## Command Reference
380
+ For full process isolation (separate Chromium instances), use `BROWSE_PORT` to run independent servers.
232
381
 
233
- ### Navigation
234
- `goto <url>` | `back` | `forward` | `reload` | `url`
382
+ ## Security
235
383
 
236
- ### Content Extraction
237
- `text` | `html [sel]` | `links` | `forms` | `accessibility`
384
+ All security features are opt-in — existing workflows are unaffected until you explicitly enable a feature.
238
385
 
239
- ### Interaction
240
- `click <sel>` | `dblclick <sel>` | `fill <sel> <val>` | `select <sel> <val>` | `hover <sel>` | `focus <sel>` | `check <sel>` | `uncheck <sel>` | `drag <src> <tgt>` | `type <text>` | `press <key>` | `keydown <key>` | `keyup <key>` | `scroll [sel|up|down]` | `wait <sel|--url|--network-idle>` | `viewport <WxH>` | `highlight <sel>` | `download <sel> [path]`
386
+ ### Domain Allowlist
387
+
388
+ Restrict navigation and sub-resource requests to trusted domains:
241
389
 
242
- ### Snapshot & Refs
390
+ ```bash
391
+ browse --allowed-domains "example.com,*.example.com" goto https://example.com
392
+ # Or via env var
393
+ BROWSE_ALLOWED_DOMAINS="example.com,*.api.io" browse goto https://example.com
243
394
  ```
244
- snapshot [-i] [-c] [-C] [-d N] [-s sel]
245
- -i Interactive elements only (buttons, links, inputs)
246
- -c Compact — remove empty structural nodes
247
- -C Cursor-interactive detect hidden clickable elements
248
- -d N Limit tree depth
249
- -s Scope to CSS selector
395
+
396
+ Blocks HTTP requests, WebSocket, EventSource, and `sendBeacon` to non-allowed domains. Wildcards like `*.example.com` match the bare domain and all subdomains.
397
+
398
+ ### Action Policy
399
+
400
+ Gate commands with a `browse-policy.json` file:
401
+
402
+ ```json
403
+ { "default": "allow", "deny": ["js", "eval"], "confirm": ["goto"] }
250
404
  ```
251
- After snapshot, use `@e1`, `@e2`... as selectors in any command.
252
405
 
253
- ### Snapshot Diff
254
- `snapshot-diff` — compare current page against last snapshot.
406
+ Precedence: deny > confirm > allow > default. Hot-reloads on file change — no server restart needed.
255
407
 
256
- ### Device Emulation
257
- `emulate <device>` | `emulate reset` | `devices [filter]`
408
+ ### Credential Vault
258
409
 
259
- 100+ devices: iPhone 12-17, Pixel 5-7, iPad, Galaxy, and all Playwright built-ins.
410
+ Encrypted credential storage (AES-256-GCM). The LLM never sees passwords:
260
411
 
261
- ### Inspection
262
- `js <expr>` | `eval <file>` | `css <sel> <prop>` | `attrs <sel>` | `element-state <sel>` | `value <sel>` | `count <sel>` | `clipboard [write <text>]` | `console [--clear]` | `network [--clear]` | `cookies` | `storage [set <k> <v>]` | `perf`
412
+ ```bash
413
+ echo "mypassword" | browse auth save github https://github.com/login myuser --password-stdin
414
+ browse auth login github # Auto-navigates, detects form, fills + submits
415
+ browse auth list # List saved credentials (no passwords shown)
416
+ ```
263
417
 
264
- ### Visual
265
- `screenshot [path]` | `screenshot --annotate` | `pdf [path]` | `responsive [prefix]`
418
+ Key is auto-generated at `.browse/.encryption-key` or set via `BROWSE_ENCRYPTION_KEY`.
266
419
 
267
- ### Compare
268
- `diff <url1> <url2>` — text diff between two pages.
269
- `screenshot-diff <baseline> [current]` — pixel-level visual regression testing.
420
+ ### Content Boundaries
270
421
 
271
- ### Find
272
- `find role|text|label|placeholder|testid <query> [name]` — semantic element locators.
422
+ Wrap page output in CSPRNG nonce-delimited markers so LLMs can distinguish tool output from untrusted page content:
273
423
 
274
- ### Multi-Step
275
424
  ```bash
276
- echo '[["goto","https://example.com"],["text"]]' | browse chain
425
+ browse --content-boundaries text
277
426
  ```
278
427
 
279
- ### Tabs
280
- `tabs` | `tab <id>` | `newtab [url]` | `closetab [id]`
428
+ ### JSON Output
281
429
 
282
- ### Frames
283
- `frame <sel>` | `frame main`
430
+ Machine-readable output for agent frameworks:
284
431
 
285
- ### Sessions
286
- `sessions` | `session-close <id>`
432
+ ```bash
433
+ browse --json snapshot -i
434
+ # Returns: {"success": true, "data": "...", "command": "snapshot"}
435
+ ```
287
436
 
288
- ### Network
289
- `route <pattern> block` | `route <pattern> fulfill <status> [body]` | `route clear` | `offline [on|off]`
437
+ ## Configuration
438
+
439
+ Create a `browse.json` file at your project root to set persistent defaults:
440
+
441
+ ```json
442
+ {
443
+ "session": "my-agent",
444
+ "json": true,
445
+ "contentBoundaries": true,
446
+ "allowedDomains": ["example.com", "*.api.io"],
447
+ "idleTimeout": 3600000,
448
+ "viewport": "1280x720",
449
+ "device": "iPhone 14",
450
+ "runtime": "playwright"
451
+ }
452
+ ```
290
453
 
291
- ### State & Auth
292
- `state save [name]` | `state load [name]` | `state list` | `state show [name]` | `auth save <name> <url> <user> <pass>` | `auth login <name>` | `auth list` | `auth delete <name>`
454
+ CLI flags and environment variables override config file values.
293
455
 
294
- ### Recording
295
- `har start` | `har stop [path]` | `video start [dir]` | `video stop` | `video status`
456
+ ## Usage with AI Agents
296
457
 
297
- ### Debug
298
- `inspect` — open DevTools debugger (requires `BROWSE_DEBUG_PORT`).
458
+ ### Claude Code (recommended)
299
459
 
300
- ### Server Control
301
- `status` | `instances` | `cookie <n>=<v>` | `header <n>:<v>` | `useragent <str>` | `stop` | `restart`
460
+ Install as a Claude Code skill via [skills.sh](https://skills.sh):
302
461
 
303
- ## Architecture
462
+ ```bash
463
+ npx skills add https://github.com/ulpi-io/skills --skill browse
464
+ ```
465
+
466
+ Or install directly:
304
467
 
468
+ ```bash
469
+ browse install-skill
305
470
  ```
306
- browse [--session <id>] <command>
307
-
308
-
309
- CLI (thin HTTP client)
310
- X-Browse-Session: <id>
311
-
312
-
313
- Persistent server (localhost, auto-started)
314
-
315
- SessionManager
316
- ├── Session "default" → BrowserContext + tabs + refs + buffers
317
- ├── Session "agent-a" → BrowserContext + tabs + refs + buffers
318
- └── Session "agent-b" → BrowserContext + tabs + refs + buffers
319
-
320
-
321
- Chromium (Playwright, headless, shared)
471
+
472
+ Both copy the skill definition to `.claude/skills/browse/SKILL.md` and add all browse commands to permissions — no more approval prompts.
473
+
474
+ ### CLAUDE.md / AGENTS.md
475
+
476
+ Add to your project instructions:
477
+
478
+ ```markdown
479
+ ## Browser Automation
480
+
481
+ Use `browse` for web automation. Run `browse --help` for all commands.
482
+
483
+ Core workflow:
484
+ 1. `browse goto <url>` — Navigate to page
485
+ 2. `browse snapshot -i` — Get interactive elements with refs (@e1, @e2)
486
+ 3. `browse click @e1` / `fill @e2 "text"` — Interact using refs
487
+ 4. Re-snapshot after page changes
488
+ ```
489
+
490
+ ### Just ask the agent
491
+
492
+ ```
493
+ Use browse to test the login flow. Run browse --help to see available commands.
322
494
  ```
323
495
 
324
- ## CLI Options
496
+ ## Options
325
497
 
326
498
  | Flag | Description |
327
499
  |------|-------------|
@@ -329,99 +501,93 @@ browse [--session <id>] <command>
329
501
  | `--json` | Wrap output as `{success, data, command}` |
330
502
  | `--content-boundaries` | Wrap page content in nonce-delimited markers |
331
503
  | `--allowed-domains <d,d>` | Block navigation/resources outside allowlist |
332
- | `--headed` | Run browser in headed (visible) mode |
504
+ | `--max-output <n>` | Truncate output to N characters |
505
+ | `--headed` | Show browser window (not headless) |
333
506
 
334
507
  ## Environment Variables
335
508
 
336
509
  | Variable | Default | Description |
337
510
  |----------|---------|-------------|
338
- | `BROWSE_PORT` | auto 9400-10400 | Fixed server port |
511
+ | `BROWSE_PORT` | auto (9400-10400) | Fixed server port |
339
512
  | `BROWSE_PORT_START` | 9400 | Start of port scan range |
340
513
  | `BROWSE_SESSION` | (none) | Default session ID for all commands |
341
- | `BROWSE_INSTANCE` | auto (PPID) | Instance ID for multi-Claude isolation |
342
- | `BROWSE_IDLE_TIMEOUT` | 1800000 (30m) | Idle shutdown in ms |
514
+ | `BROWSE_INSTANCE` | auto (PPID) | Instance ID for multi-agent isolation |
515
+ | `BROWSE_IDLE_TIMEOUT` | 1800000 (30m) | Idle auto-shutdown in ms |
343
516
  | `BROWSE_TIMEOUT` | (none) | Override all command timeouts (ms) |
344
- | `BROWSE_LOCAL_DIR` | `.browse/` or `/tmp` | State/log directory |
517
+ | `BROWSE_LOCAL_DIR` | `.browse/` or `/tmp` | State/log/screenshot directory |
345
518
  | `BROWSE_JSON` | (none) | Set to `1` for JSON output mode |
346
519
  | `BROWSE_CONTENT_BOUNDARIES` | (none) | Set to `1` for nonce-delimited output |
347
520
  | `BROWSE_ALLOWED_DOMAINS` | (none) | Comma-separated domain allowlist |
348
- | `BROWSE_HEADED` | (none) | Set to `1` for headed (visible) browser mode |
521
+ | `BROWSE_MAX_OUTPUT` | (none) | Truncate output to N characters |
522
+ | `BROWSE_HEADED` | (none) | Set to `1` for headed browser mode |
523
+ | `BROWSE_CDP_URL` | (none) | Connect to remote Chrome via CDP |
349
524
  | `BROWSE_PROXY` | (none) | Proxy server URL |
350
525
  | `BROWSE_PROXY_BYPASS` | (none) | Proxy bypass list |
351
- | `BROWSE_CDP_URL` | (none) | Connect to remote Chrome via CDP |
352
526
  | `BROWSE_SERVER_SCRIPT` | auto-detected | Override path to server.ts |
353
- | `BROWSE_DEBUG_PORT` | (none) | Port for DevTools debugging (inspect command) |
527
+ | `BROWSE_DEBUG_PORT` | (none) | Port for DevTools debugging |
354
528
  | `BROWSE_POLICY` | browse-policy.json | Path to action policy file |
355
- | `BROWSE_CONFIRM_ACTIONS` | (none) | Comma-separated commands requiring confirmation |
529
+ | `BROWSE_CONFIRM_ACTIONS` | (none) | Commands requiring confirmation |
356
530
  | `BROWSE_ENCRYPTION_KEY` | auto-generated | 64-char hex AES key for credential vault |
357
- | `BROWSE_AUTH_PASSWORD` | (none) | Password for auth save (alt to `--password-stdin`) |
531
+ | `BROWSE_AUTH_PASSWORD` | (none) | Password for `auth save` (alt to `--password-stdin`) |
532
+ | `BROWSE_RUNTIME` | playwright | Browser runtime (playwright, rebrowser, lightpanda) |
358
533
 
359
- ## Acknowledgments
534
+ ## Architecture
535
+
536
+ ```
537
+ browse [--session <id>] <command>
538
+ |
539
+ CLI (thin HTTP client)
540
+ |
541
+ Persistent server (localhost, auto-started)
542
+ |
543
+ SessionManager
544
+ ├── "default" → BrowserContext → tabs, refs, cookies, buffers
545
+ ├── "agent-a" → BrowserContext → tabs, refs, cookies, buffers
546
+ └── "agent-b" → BrowserContext → tabs, refs, cookies, buffers
547
+ |
548
+ Chromium (Playwright, headless, shared)
549
+ ```
550
+
551
+ - **First command:** ~2s (server + Chromium startup, once)
552
+ - **Every command after:** ~100–200ms (HTTP to localhost)
553
+ - Server auto-starts on first command, auto-shuts down after 30 min idle
554
+ - Crash recovery: CLI detects dead server and restarts transparently
555
+ - State file: `.browse/browse-server.json` (pid, port, token)
556
+
557
+ ## Benchmarks
558
+
559
+ ### vs Agent Browser & Browser-Use (Token Cost)
560
+
561
+ Tested on 3 sites across multi-step browsing flows — navigate, snapshot, scroll, search, extract text:
562
+
563
+ | Tool | Total Tokens | Total Time | Context Used (200K) |
564
+ |------|-------------:|-----------:|--------------------:|
565
+ | **browse** | **14,134** | **28.5s** | **7.1%** |
566
+ | agent-browser | 39,414 | 36.2s | 19.7% |
567
+ | browser-use | 34,281 | 72.7s | 17.1% |
360
568
 
361
- Inspired by and originally derived from the `/browse` skill in [gstack](https://github.com/garrytan/gstack) by Garry Tan. The core architecture — persistent Chromium daemon, thin CLI client, ref-based element selection via ARIA snapshots — comes from gstack.
569
+ browse uses **2.4x fewer tokens** than browser-use, **2.8x fewer** than agent-browser, and completes **2.5x faster** than browser-use.
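The summary ratios can be re-derived from the table above. The following is a minimal shell sketch that only redoes the arithmetic on the published figures; it does not rerun the benchmark. The `ratio` helper and the variable names are illustrative and not part of browse.

```bash
# Recompute the summary ratios from the benchmark table values.
# Every number here is copied from the table; nothing is re-measured.

# ratio A B -> prints A / B rounded to two decimals (awk handles the floats)
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

browse_tokens=14134
agent_browser_tokens=39414
browser_use_tokens=34281
browse_seconds=28.5
browser_use_seconds=72.7
context_window=200000

tokens_vs_browser_use=$(ratio "$browser_use_tokens" "$browse_tokens")
tokens_vs_agent_browser=$(ratio "$agent_browser_tokens" "$browse_tokens")
speed_vs_browser_use=$(ratio "$browser_use_seconds" "$browse_seconds")
# Share of a 200K context window used by browse, as a percentage with one decimal
browse_context_pct=$(awk -v t="$browse_tokens" -v c="$context_window" 'BEGIN { printf "%.1f", 100 * t / c }')

echo "fewer tokens than browser-use:   ${tokens_vs_browser_use}x"
echo "fewer tokens than agent-browser: ${tokens_vs_agent_browser}x"
echo "faster than browser-use:         ${speed_vs_browser_use}x"
echo "browse context used (200K):      ${browse_context_pct}%"
```

The script prints 2.43x, 2.79x, 2.55x and 7.1%, which the summary rounds to 2.4x, 2.8x and 2.5x.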
570
+
571
+ ### vs @playwright/mcp (Architecture)
572
+
573
+ @playwright/mcp dumps the full accessibility snapshot on every action. browse returns ~15 tokens per action — the agent requests a snapshot only when needed:
574
+
575
+ | | @playwright/mcp | browse |
576
+ |---|---:|---:|
577
+ | Tokens on `navigate` | ~14,578 (auto-dumped) | **~11** |
578
+ | Tokens on `click` | ~14,578 (auto-dumped) | **~15** |
579
+ | 10-action session | ~145,780 | **~11,388** |
580
+ | Context consumed (200K) | **73%** | **6%** |
581
+
582
+ Rerun: `bun run benchmark`
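The 10-action row of the @playwright/mcp table follows from the per-action figure. This is a minimal shell sketch of that arithmetic, assuming each of the ten actions auto-dumps the same ~14,578-token snapshot; the variable names are illustrative only.

```bash
# Re-derive the 10-action totals and context shares from the table figures.
per_action_snapshot=14578   # tokens auto-dumped by @playwright/mcp per action
actions=10
context_window=200000
browse_session_tokens=11388 # measured browse total for the same 10 actions

# Integer shell arithmetic is enough for the token total
mcp_session_tokens=$((per_action_snapshot * actions))
# Percentages need floating point, so delegate to awk (rounded to whole percent)
mcp_context_pct=$(awk -v t="$mcp_session_tokens" -v c="$context_window" 'BEGIN { printf "%.0f", 100 * t / c }')
browse_context_pct=$(awk -v t="$browse_session_tokens" -v c="$context_window" 'BEGIN { printf "%.0f", 100 * t / c }')

echo "@playwright/mcp: ${mcp_session_tokens} tokens (${mcp_context_pct}% of 200K)"
echo "browse:          ${browse_session_tokens} tokens (${browse_context_pct}% of 200K)"
```

This yields 145,780 tokens (73%) for @playwright/mcp against 11,388 tokens (6%) for browse, matching the table.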
362
583
 
363
584
  ## Changelog
364
585
 
365
- ### v0.3.0 Headed Mode, Clipboard, DevTools
366
-
367
- - `--headed` flag — run browser in visible mode for debugging and demos
368
- - `clipboard [write <text>]` — read and write clipboard contents
369
- - `inspect` command: open DevTools debugger via `BROWSE_DEBUG_PORT`
370
- - `screenshot --annotate` — pixel-annotated PNG with numbered badges
371
- - `instances` command — list all running browse servers
372
- - `BROWSE_DEBUG_PORT` env var for DevTools debugging
373
-
374
- ### v0.2.0 — Security, Interactions, DX
375
-
376
- **Commands:**
377
- - `dblclick`, `focus`, `check`, `uncheck`, `drag`, `keydown`, `keyup` — interaction commands
378
- - `frame <sel>` / `frame main` — iframe targeting
379
- - `value <sel>`, `count <sel>` — element inspection
380
- - `scroll up/down` — viewport-relative scrolling
381
- - `wait --url`, `wait --network-idle` — navigation/network wait variants
382
- - `highlight <sel>` — visual element debugging
383
- - `download <sel> [path]` — file download
384
- - `route <pattern> block/fulfill` — network request interception and mocking
385
- - `offline on/off` — offline mode toggle
386
- - `state save/load` — persist and restore cookies + localStorage (all origins)
387
- - `har start/stop` — HAR recording and export
388
- - `video start/stop/status` — video recording (WebM, compositor-level, works with remote CDP)
389
- - `screenshot-diff` — pixel-level visual regression testing
390
- - `find role/text/label/placeholder/testid` — semantic element locators
391
-
392
- **Security:**
393
- - `--allowed-domains` — domain allowlist (HTTP + WebSocket/EventSource/sendBeacon)
394
- - `browse-policy.json` — action policy gate (allow/deny/confirm per command)
395
- - `auth save/login/list/delete` — AES-256-GCM encrypted credential vault
396
- - `--content-boundaries` — CSPRNG nonce wrapping for prompt injection defense
397
-
398
- **DX:**
399
- - `--json` — structured output mode for agent frameworks
400
- - `browse.json` config file support
401
- - AI-friendly error messages — Playwright errors rewritten to actionable hints
402
- - Per-session output folders (`.browse/sessions/{id}/`)
403
-
404
- **Infrastructure:**
405
- - Auto-instance servers via PPID — multi-Claude isolation
406
- - CDP remote connection (`BROWSE_CDP_URL`)
407
- - Proxy support (`BROWSE_PROXY`)
408
- - Compiled binary self-spawn mode
409
- - Orphaned server cleanup
410
-
411
- ### v0.1.0 — Foundation
412
-
413
- **Commands:**
414
- - `emulate` / `devices` — device emulation (100+ devices)
415
- - `snapshot -C` — cursor-interactive detection
416
- - `snapshot-diff` — before/after comparison with ref-number stripping
417
- - `dialog` / `dialog-accept` / `dialog-dismiss` — dialog handling
418
- - `upload` — file upload
419
- - `screenshot --annotate` — numbered badge overlay with legend
420
-
421
- **Infrastructure:**
422
- - Session multiplexing — multiple agents share one Chromium
423
- - Safe retry classification — read vs write commands
424
- - TreeWalker text extraction — no MutationObserver triggers
586
+ See [CHANGELOG.md](CHANGELOG.md) for full release history.
587
+
588
+ ## Acknowledgments
589
+
590
+ Inspired by and originally derived from the `/browse` skill in [gstack](https://github.com/garrytan/gstack) by Garry Tan.
425
591
 
426
592
  ## License
427
593