browsercontrol 0.1.1__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (21)
  1. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/PKG-INFO +226 -221
  2. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/README.md +225 -220
  3. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/browser.py +76 -7
  4. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/server.py +15 -0
  5. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/__init__.py +5 -3
  6. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/content.py +8 -1
  7. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/devtools.py +120 -5
  8. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/forms.py +45 -3
  9. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/interaction.py +1 -2
  10. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/navigation.py +1 -2
  11. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/recording.py +1 -2
  12. browsercontrol-0.1.2/browsercontrol/tools/tabs.py +91 -0
  13. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/pyproject.toml +1 -1
  14. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/uv.lock +1 -1
  15. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/.gitignore +0 -0
  16. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/CONTRIBUTING.md +0 -0
  17. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/LICENSE +0 -0
  18. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/assets/logo.png +0 -0
  19. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/__init__.py +0 -0
  20. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/__main__.py +0 -0
  21. {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/config.py +1 -1
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: browsercontrol
- Version: 0.1.1
+ Version: 0.1.2
  Summary: MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
  Project-URL: Homepage, https://github.com/adityasasidhar/browsercontrol
  Project-URL: Repository, https://github.com/adityasasidhar/browsercontrol
@@ -25,188 +25,107 @@ Requires-Dist: playwright>=1.49.0
  Description-Content-Type: text/markdown

  <p align="center">
- <img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="120">
+ <img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="140">
  </p>

- <h1 align="center">🌐 BrowserControl</h1>
+ <h1 align="center">BrowserControl</h1>

  <p align="center">
- <strong>Give your AI agent real browser superpowers.</strong>
+ <strong>Give your AI agent real browser superpowers.</strong><br>
+ <sub>Vision-first browser automation for Claude, Gemini, and any MCP-compatible AI agent.</sub>
  </p>

  <p align="center">
- <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
- <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
- <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-Compatible-purple.svg" alt="MCP"></a>
- <a href="https://github.com/adityasasidhar/browsercontrol"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
+ <a href="https://pypi.org/project/browsercontrol/"><img src="https://img.shields.io/pypi/v/browsercontrol?color=blue&label=PyPI" alt="PyPI"></a>
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-3776ab.svg?logo=python&logoColor=white" alt="Python 3.11+"></a>
+ <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a>
+ <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-compatible-7c3aed.svg" alt="MCP Compatible"></a>
+ <a href="https://github.com/adityasasidhar/browsercontrol/stargazers"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
  </p>

  <p align="center">
  <a href="#-quick-start">Quick Start</a> •
- <a href="#-features">Features</a> •
+ <a href="#-the-secret-set-of-marks-som">How It Works</a> •
  <a href="#-available-tools">Tools</a> •
- <a href="#%EF%B8%8F-configuration">Configuration</a> •
- <a href="#-examples">Examples</a>
+ <a href="#-configuration">Configuration</a> •
+ <a href="#-examples">Examples</a>
+ <a href="#-contributing">Contributing</a>
  </p>

  ---

- Ever wished Claude, Gemini, or your custom AI agent could actually browse the web? Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?
+ > **Ever wished Claude or Gemini could actually browse the web?** Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?

- **BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach** inspired by Google's AntiGravity IDE.
+ **BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach**—no CSS selectors, no XPath, no guessing. Just point at numbers.

- ## ✨ What Makes This Different
-
- | Traditional Web Access | BrowserControl |
- |------------------------|----------------|
- | Fetch static HTML | See the **rendered page** |
- | Parse complex DOM | Point at **numbered elements** |
- | Guess at selectors | Just say **"click 5"** |
- | No JavaScript support | Full **dynamic content** |
- | No login persistence | **Persistent sessions** |
- | No debugging tools | **Console, Network, Errors** |
-
- ### 🎯 The Secret: Set of Marks (SoM)
-
- Every screenshot comes annotated with **numbered red boxes** on interactive elements:
-
- ```
- Found 15 interactive elements:
- [1] button - Sign In
- [2] input - Search...
- [3] a - Products
- [4] a - Pricing
- [5] button - Get Started
- ```
-
- Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**
-
- ---
+ <br>

- ## 🏆 Why BrowserControl Beats Every Alternative
-
- ### Head-to-Head Comparison
-
- | Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
- |---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
- | **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
- | **No Extra AI Calls** | ✅ Zero | ❌ Parses tree | ❌ GPT-4V per action | ❌ Vision model | ❌ Query model |
- | **Developer Tools** | ✅ 6 tools | ❌ None | ❌ None | ❌ None | ❌ None |
- | **Session Recording** | ✅ Built-in | ❌ Manual | ❌ None | ❌ None | ❌ None |
- | **Persistent Sessions** | ✅ Automatic | ⚠️ Manual setup | ❌ None | ❌ None | ❌ None |
- | **MCP Native** | ✅ FastMCP | ✅ Official | ❌ Python SDK | ⚠️ Custom | ❌ REST API |
- | **Install Complexity** | ✅ `pip install` | ⚠️ npx + config | ❌ Docker + setup | ⚠️ Docker | ❌ Cloud signup |
- | **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
- | **Cost per Action** | ✅ $0 | ✅ $0 | ❌ ~$0.01-0.05 | ❌ ~$0.01-0.05 | ❌ API fees |
- | **Offline/Local** | ✅ 100% local | ✅ Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | ❌ Cloud only |
-
- ### 🎯 Key Advantages
+ ## What Makes This Different

- #### 1. **Token Efficiency = Faster + Cheaper**
+ <table>
+ <tr>
+ <td width="50%">

+ ### ❌ Traditional Approach
  ```
- Other tools send: BrowserControl sends:
- ─────────────────── ─────────────────────
- Full DOM tree "click(5)"
- (5,000+ tokens) (3 tokens)
- or
- Base64 screenshot Element ID + summary
- (10,000+ tokens) (100 tokens)
- ```
-
- **Result**: 50-100x fewer tokens per action = faster responses, lower costs.
-
- #### 2. **No Extra AI Calls Required**
-
- | Tool | AI Calls per Click |
- |------|-------------------|
- | **BrowserControl** | 0 (just `click(5)`) |
- | Stagehand | 1-2 (vision + action) |
- | Browser-Use | 1-2 (vision + planning) |
- | AgentQL | 1 (query interpretation) |
-
- **Result**: No vision API costs, no rate limits, works offline.
-
- #### 3. **Developer Tools No One Else Has**
-
- ```python
- # Only BrowserControl can do this:
- get_console_logs() # See browser errors
- get_network_requests() # Monitor API calls
- get_page_errors() # Catch JS exceptions
- run_in_console(code) # Debug in real-time
- inspect_element(5) # Get computed styles
- get_page_performance() # Core Web Vitals
+ "Find the button with class 'btn-primary'
+ that contains 'Submit' and is inside
+ form#contact-form..."
  ```
+ - Parse complex DOM structures
+ - Guess at CSS selectors
+ - No JavaScript support
+ - No login persistence
+ - No debugging tools

- **Other tools**: Navigate, click, type... that's it.
-
- #### 4. **Session Recording Built-In**
+ </td>
+ <td width="50%">

+ ### ✅ BrowserControl
  ```
- start_recording() → Browse around → stop_recording()
-
- 📹 session_20260108.zip
- (View with Playwright trace viewer)
+ "click(7)"
  ```
+ - See the **rendered page** with numbered elements
+ - Just say **"click 5"** or **"type in 3"**
+ - Full **dynamic JavaScript** support
+ - **Persistent sessions** across restarts
+ - Complete **DevTools access**

- **Other tools**: No recording. Debug from memory.
+ </td>
+ </tr>
+ </table>

- #### 5. **True Persistence**
-
- | What Persists | BrowserControl | Others |
- |---------------|:--------------:|:------:|
- | Cookies | ✅ | ❌ |
- | localStorage | ✅ | ❌ |
- | Session tokens | ✅ | ❌ |
- | Login state | ✅ | ❌ |
- | Browser history | ✅ | ❌ |
+ <br>

- **Result**: Log in once, stay logged in across sessions.
+ ## 🎯 The Secret: Set of Marks (SoM)

- #### 6. **Simpler Mental Model**
+ Every screenshot comes annotated with **numbered red boxes** on interactive elements:

  ```
- Other tools:
- "Find the button with class 'btn-primary' that contains text 'Submit'
- and is a descendant of form#contact-form..."
-
- BrowserControl:
- "click(7)"
+ Found 15 interactive elements:
+ [1] button - Sign In
+ [2] input - Search...
+ [3] a - Products
+ [4] a - Pricing
+ [5] button - Get Started
  ```

- ### 📊 Real-World Performance
-
- | Scenario | BrowserControl | Vision-Based Tools |
- |----------|:--------------:|:------------------:|
- | Click a button | ~50ms | ~2-5 seconds |
- | Fill a form (5 fields) | ~500ms | ~15-30 seconds |
- | Navigate + act | ~1 second | ~5-10 seconds |
- | Debug console errors | ✅ Instant | ❌ Not possible |
-
- ### 💰 Cost Comparison (1000 actions/month)
-
- | Tool | Monthly Cost |
- |------|-------------|
- | **BrowserControl** | **$0** (fully local) |
- | Stagehand (GPT-4V) | ~$30-50 |
- | Browser-Use (Claude Vision) | ~$20-40 |
- | AgentQL | ~$50+ (API fees) |
+ Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**

- ---
+ <br>

  ## 🚀 Quick Start

  ### Installation

  ```bash
- # Install with pip
+ # Using pip
  pip install browsercontrol

- # Or with uv (recommended)
+ # Or with uv (recommended for faster installs)
  uv add browsercontrol

- # That's it! Chromium is auto-installed on first run
+ # Chromium is auto-installed on first run—no extra steps needed!
  ```

  ### Run the Server
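The numbered listing in the Set of Marks section above is plain text that the agent reads alongside the annotated screenshot. The following is a minimal sketch of how such a listing could be produced. It is not browsercontrol's code. The `Mark` record and the `format_marks` helper are hypothetical names, and numbering elements from 1 in detection order is an assumption that happens to match the sample output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """A detected interactive element (hypothetical structure, for illustration)."""

    tag: str    # HTML tag name, e.g. "button", "input", "a"
    label: str  # visible text, or placeholder text for inputs


def format_marks(marks: list[Mark]) -> str:
    """Render elements as '[n] tag - label' lines, numbered from 1 like the SoM boxes."""
    lines = [f"Found {len(marks)} interactive elements:"]
    for number, mark in enumerate(marks, start=1):
        lines.append(f"[{number}] {mark.tag} - {mark.label}")
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        Mark("button", "Sign In"),
        Mark("input", "Search..."),
        Mark("a", "Products"),
    ]
    print(format_marks(page))
```

Each element takes one short line, so the listing stays small however large the underlying DOM is.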
@@ -215,7 +134,7 @@ uv add browsercontrol
  # Using the CLI
  browsercontrol

- # Or as a module
+ # Or as a Python module
  python -m browsercontrol

  # Or with FastMCP
@@ -224,7 +143,24 @@ fastmcp run browsercontrol.server:mcp
 
  ### Connect to Claude Desktop

- Add to `~/.config/Claude/claude_desktop_config.json`:
+ Add to your Claude configuration file:
+
+ <details>
+ <summary><b>📁 macOS</b> — <code>~/Library/Application Support/Claude/claude_desktop_config.json</code></summary>
+
+ ```json
+ {
+ "mcpServers": {
+ "browsercontrol": {
+ "command": "browsercontrol"
+ }
+ }
+ }
+ ```
+ </details>
+
+ <details>
+ <summary><b>📁 Linux</b> — <code>~/.config/Claude/claude_desktop_config.json</code></summary>

  ```json
  {
@@ -235,60 +171,100 @@ Add to `~/.config/Claude/claude_desktop_config.json`:
  }
  }
  ```
+ </details>

- Then just ask Claude:
+ <details>
+ <summary><b>📁 Windows</b> — <code>%APPDATA%\Claude\claude_desktop_config.json</code></summary>
+
+ ```json
+ {
+ "mcpServers": {
+ "browsercontrol": {
+ "command": "browsercontrol"
+ }
+ }
+ }
+ ```
+ </details>

+ Then ask Claude:
  > *"Go to GitHub and star the browsercontrol repo"*

  Claude will navigate, find the star button, and click it—showing you screenshots along the way!

- ---
+ <br>

- ## 🎯 Features
+ ## 🥊 Head-to-Head Comparison

- ### 1. Set of Marks (SoM) - Vision-First Interaction
+ | Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
+ |---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
+ | **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
+ | **Multi-Tab Support** | ✅ Full control | ⚠️ Implicit | ⚠️ Implicit | ⚠️ Basic | ❌ None |
+ | **Cookie Management** | ✅ Direct tools | ⚠️ JS only | ⚠️ JS only | ⚠️ Basic | ❌ None |
+ | **File Uploads** | ✅ Native tool | ⚠️ Manual | ❌ No | ❌ No | ❌ No |
+ | **Developer Tools** | ✅ 8 tools | ❌ None | ❌ None | ❌ None | ❌ None |
+ | **Session Recording** | ✅ Built-in | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
+ | **Persistent Sessions** | ✅ Automatic | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
+ | **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
+ | **100% Local/Offline** | ✅ Yes | ✅ Yes | ❌ Needs LLM API | ❌ Needs LLM API | ❌ Cloud only |
+ | **Monthly Cost (1k actions)** | **$0** | $0 | ~$30-50 | ~$20-40 | ~$50+ |

- Every action returns an annotated screenshot with numbered elements. Your AI agent can:
- - **See** the page exactly as a human would
- - **Identify** clickable elements by number
- - **Act** with simple commands like `click(5)`
+ <br>

- ### 2. 🔧 Developer Tools
+ ## 💪 Key Advantages

- Built-in debugging tools for web development:
+ ### 1. Multi-Tab Orchestration
+ Unlike other tools that get "lost" when a new window opens:
+ - `list_tabs()` — See every open page, title, and URL
+ - `switch_tab(index)` — Multitask between different sites
+ - `create_tab(url)` — Open references or parallel workflows

- | Tool | Description |
- |------|-------------|
- | `get_console_logs()` | Capture browser console (errors, warnings, logs) |
- | `get_network_requests()` | Monitor API calls, status codes, timing |
- | `get_page_errors()` | See JavaScript exceptions and crashes |
- | `run_in_console(code)` | Execute JS in browser console |
- | `inspect_element(id)` | Get computed styles, dimensions, properties |
- | `get_page_performance()` | Page load time, Core Web Vitals, memory |
-
- ### 3. 🎬 Session Recording
+ ### 2. Session & Cookie Management
+ Stop fighting with login forms. Inject or inspect session state directly:
+ - `set_cookie()` Log in instantly by injecting an auth token
+ - `get_cookies()` Debug session issues or export state
+ - `clear_cookies()` Fresh start without clearing the whole profile

- Record browser sessions for debugging and documentation:
+ ### 3. Reliable File Uploads
+ Most AI agents fail when they hit a `<input type="file">`. BrowserControl uses native browser engine hooks:
+ - `upload_file(id, path)` — Just point at the button and the local file

- | Tool | Description |
- |------|-------------|
- | `start_recording()` | Begin recording the session |
- | `stop_recording()` | Save recording (Playwright trace format) |
- | `take_snapshot()` | Save screenshot + HTML + URL |
- | `list_recordings()` | View all saved sessions |
+ ### 4. Developer Tools Suite
+ Debug like a pro with tools no one else provides:
+ ```python
+ get_console_logs() # See browser errors
+ get_network_requests() # Monitor API calls
+ get_page_errors() # Catch JS exceptions
+ run_in_console(code) # Debug in real-time
+ inspect_element(5) # Get computed styles
+ get_page_performance() # Core Web Vitals
+ ```

- View recordings with:
- ```bash
- npx playwright show-trace ~/.browsercontrol/recordings/session.zip
+ ### 5. Session Recording
+ ```
+ start_recording() → Browse around → stop_recording()
+
+ session_20260202.zip
+ (View with Playwright trace viewer)
  ```

- ### 4. 💾 Persistent Sessions
+ ### 6. Dynamic Viewport Control
+ Test responsive designs or emulate mobile screens on the fly:
+ - `set_viewport(width, height)` — Change resolution without restarting

- - Cookies, localStorage, and session data persist across restarts
- - Stay logged into websites
- - Maintain shopping carts, preferences, etc.
+ ### 7. True Persistence

- ---
+ | What Persists | BrowserControl | Others |
+ |---------------|:--------------:|:------:|
+ | Cookies | ✅ | ❌ |
+ | localStorage | ✅ | ❌ |
+ | Session tokens | ✅ | ❌ |
+ | Login state | ✅ | ❌ |
+ | Browser history | ✅ | ❌ |
+
+ **Result**: Log in once, stay logged in across sessions.
+
+ <br>

  ## 🛠️ Available Tools
 
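The three Claude Desktop snippets above (macOS, Linux, Windows) differ only in where the file lives, so the entry can also be merged programmatically without clobbering servers that are already configured. This is a minimal sketch. `claude_config_path` and `with_browsercontrol` are hypothetical helpers and are not part of browsercontrol. The paths are the three listed in the README. Reading and writing the file and restarting Claude Desktop are left out.

```python
import json
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def claude_config_path(system: str, home: str, appdata: str = "") -> PurePath:
    """Return the claude_desktop_config.json location listed in the README.

    `system` takes platform.system() values; `appdata` is %APPDATA% on Windows.
    """
    if system == "Darwin":  # macOS
        return PurePosixPath(home, "Library", "Application Support", "Claude",
                             "claude_desktop_config.json")
    if system == "Linux":
        return PurePosixPath(home, ".config", "Claude", "claude_desktop_config.json")
    if system == "Windows":
        return PureWindowsPath(appdata, "Claude", "claude_desktop_config.json")
    raise ValueError(f"unsupported platform: {system!r}")


def with_browsercontrol(config: dict) -> dict:
    """Return a copy of `config` with the browsercontrol server added.

    Other entries under "mcpServers" are kept; the input dict is not modified.
    """
    servers = dict(config.get("mcpServers", {}))
    servers["browsercontrol"] = {"command": "browsercontrol"}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(claude_config_path("Linux", "/home/alice"))
    print(json.dumps(with_browsercontrol(existing), indent=2))
```

On a real machine you would pass `platform.system()`, `str(Path.home())`, and `os.environ.get("APPDATA", "")`, then `json.load` and `json.dump` the file at the returned path.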
@@ -299,26 +275,36 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
  | `go_back()` | Navigate back |
  | `go_forward()` | Navigate forward |
  | `refresh_page()` | Reload the page |
- | `scroll(direction, amount)` | Scroll the page |
+ | `scroll(direction, amount)` | Scroll up/down/left/right |

  ### Interaction
  | Tool | Description |
  |------|-------------|
  | `click(element_id)` | Click element by number |
  | `click_at(x, y)` | Click at coordinates |
- | `type_text(element_id, text)` | Type into input |
+ | `type_text(element_id, text)` | Type into input field |
  | `press_key(key)` | Press keyboard key (Enter, Tab, etc.) |
  | `hover(element_id)` | Hover over element |
  | `scroll_to_element(element_id)` | Scroll element into view |
- | `wait(seconds)` | Wait for loading |
+ | `upload_file(element_id, path)` | Upload a file to an input |
+ | `wait(seconds)` | Wait for page loading |
+
+ ### Tab Management
+ | Tool | Description |
+ |------|-------------|
+ | `create_tab(url)` | Open a new browser tab |
+ | `switch_tab(index)` | Switch to a tab by its index |
+ | `close_tab(index)` | Close a specific tab |
+ | `list_tabs()` | List all open tabs and URLs |

  ### Forms
  | Tool | Description |
  |------|-------------|
  | `select_option(element_id, option)` | Select dropdown option |
  | `check_checkbox(element_id)` | Toggle checkbox |
+ | `upload_file(element_id, file_path)` | Upload file to input |

- ### Content
+ ### Content Extraction
  | Tool | Description |
  |------|-------------|
  | `get_page_content()` | Get page as markdown |
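The new Tab Management tools address tabs by position (`switch_tab(index)`, `close_tab(index)`). Closing a tab therefore plausibly renumbers every tab after it, and an agent would be wise to call `list_tabs()` again before switching. The sketch below models that bookkeeping with a plain list, which has the same shape as Playwright's `BrowserContext.pages`. `TabList` is hypothetical and is not browsercontrol's `tabs.py`. Both the 0-based indexing and the focus rule after closing a tab are assumptions the README does not state.

```python
class TabList:
    """Hypothetical model of index-addressed tabs.

    Plain URLs stand in for Playwright Page objects; indices are 0-based (assumed).
    """

    def __init__(self) -> None:
        self.urls: list[str] = []
        self.active = -1  # -1 means no tab is open

    def create_tab(self, url: str) -> int:
        self.urls.append(url)
        self.active = len(self.urls) - 1  # assume the new tab takes focus
        return self.active

    def switch_tab(self, index: int) -> str:
        self._check(index)
        self.active = index
        return self.urls[index]

    def close_tab(self, index: int) -> None:
        self._check(index)
        del self.urls[index]  # every tab after `index` shifts down by one
        if self.active > index:
            self.active -= 1
        elif self.active == index:
            # Focus the tab that slid into this slot, else the new last tab (-1 if none).
            self.active = min(index, len(self.urls) - 1)

    def list_tabs(self) -> list[tuple[int, str]]:
        return list(enumerate(self.urls))

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self.urls):
            raise IndexError(f"no tab at index {index}; {len(self.urls)} open")


if __name__ == "__main__":
    tabs = TabList()
    for url in ("https://example.com", "https://docs.python.org", "https://github.com"):
        tabs.create_tab(url)
    tabs.close_tab(0)
    print(tabs.list_tabs())  # github.com is now at index 1, not 2
```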
@@ -335,6 +321,11 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
  | `get_page_errors()` | JavaScript errors |
  | `run_in_console(code)` | Execute JS in console |
  | `inspect_element(id)` | Element styles/properties |
+ | `get_cookies()` | List browser cookies |
+ | `set_cookie(name, value, ...)` | Set a cookie |
+ | `delete_cookie(name)` | Remove a cookie |
+ | `clear_cookies()` | Clear all cookies |
+ | `set_viewport(width, height)` | Change window size |
  | `get_page_performance()` | Load times, Web Vitals |

  ### Recording
@@ -345,7 +336,7 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
  | `take_snapshot()` | Save screenshot + HTML |
  | `list_recordings()` | View saved sessions |

- ---
+ <br>

  ## ⚙️ Configuration
 
@@ -358,28 +349,27 @@ Configure via environment variables:
  | `BROWSER_VIEWPORT_HEIGHT` | `720` | Viewport height in pixels |
  | `BROWSER_TIMEOUT` | `30000` | Navigation timeout (ms) |
  | `BROWSER_USER_DATA_DIR` | `~/.browsercontrol/user_data` | Browser profile path |
- | `BROWSER_EXTENSION_PATH` | - | Path to browser extension |
- | `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
+ | `BROWSER_EXTENSION_PATH` | | Path to browser extension |
+ | `LOG_LEVEL` | `INFO` | Logging verbosity |

- ### Examples
+ **Examples:**

  ```bash
  # Run with visible browser (for debugging)
  BROWSER_HEADLESS=false browsercontrol

- # Custom viewport for mobile testing
+ # Mobile viewport emulation
  BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol

  # Verbose logging
  LOG_LEVEL=DEBUG browsercontrol
  ```

- ---
-
- ## 📚 Examples
+ <br>

- ### Example 1: Web Research
+ ## 📖 Examples

+ ### Web Research
  ```
  You: "Go to Wikipedia and find out when Python was created"
 
@@ -391,8 +381,7 @@ Claude: *navigates to wikipedia.org*
  "Python was created by Guido van Rossum and first released in 1991."
  ```

- ### Example 2: Debug a Web App
-
+ ### Debug a Web App
  ```
  You: "Go to my app at localhost:3000 and check for any errors"
 
@@ -408,8 +397,7 @@ Claude: *navigates to localhost:3000*
  "The /api/users endpoint returned 404. Check your API route."
  ```

- ### Example 3: Automated Testing
-
+ ### Automated Testing with Recording
  ```
  You: "Test the login flow on my site. Start recording."
 
@@ -421,11 +409,10 @@ Claude: *calls start_recording()*
  *verifies successful redirect*
  *calls stop_recording()*

- "Login test passed! Recording saved to session_20260108.zip"
+ "Login test passed! Recording saved to session_20260202.zip"
  ```

- ### Example 4: Form Automation
-
+ ### Form Automation
  ```
  You: "Fill out the contact form on example.com"
 
@@ -438,7 +425,7 @@ Claude: *navigates to example.com/contact*
  "Form submitted successfully!"
  ```

- ---
+ <br>

  ## 🏗️ Architecture
 
@@ -456,16 +443,16 @@ Claude: *navigates to example.com/contact*
 
  ### How It Works

- 1. **AI sends command**: `click(5)`
- 2. **Server finds element**: Looks up element #5 from the last screenshot
- 3. **Browser acts**: Clicks at the element's coordinates
- 4. **Capture state**: Takes new screenshot, detects elements
- 5. **Annotate**: Draws numbered boxes on interactive elements
- 6. **Return to AI**: Sends annotated image + element list
+ 1. **AI sends command** `click(5)`
+ 2. **Server finds element** Looks up element #5 from the last screenshot
+ 3. **Browser acts** Clicks at the element's coordinates
+ 4. **Capture state** Takes new screenshot, detects elements
+ 5. **Annotate** Draws numbered boxes on interactive elements
+ 6. **Return to AI** Sends annotated image + element list

- ---
+ <br>

- ## 📦 Project Structure
+ ## 📁 Project Structure

  ```
  browsercontrol/
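In the "How It Works" steps above, steps 2 and 3 come down to a lookup in the element table captured with the previous screenshot, followed by a click at a coordinate. This is a minimal sketch under two stated assumptions: each element number maps to a bounding box, and the click lands at the centre of that box. `Box` and `click_target` are hypothetical names, and the real server may resolve elements differently. With Playwright, the resulting point could be passed to `page.mouse.click(x, y)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a marked element, in page pixels (hypothetical structure)."""

    x: float
    y: float
    width: float
    height: float


def click_target(last_marks: dict[int, Box], element_id: int) -> tuple[float, float]:
    """Resolve an element number from the last screenshot to (x, y) click coordinates."""
    box = last_marks.get(element_id)
    if box is None:
        # Numbers only mean something relative to the most recent annotated screenshot.
        raise LookupError(f"element {element_id} is not in the last screenshot")
    # Aim at the centre of the box rather than its top-left corner.
    return (box.x + box.width / 2, box.y + box.height / 2)


if __name__ == "__main__":
    marks = {5: Box(x=100, y=40, width=80, height=30)}
    print(click_target(marks, 5))  # (140.0, 55.0)
```

Failing loudly on an unknown number, rather than guessing, pushes the agent to request a fresh screenshot after the page changes.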
@@ -480,54 +467,72 @@ browsercontrol/
  ├── forms.py # Form handling tools
  ├── content.py # Content extraction tools
  ├── devtools.py # Developer tools
- └── recording.py # Session recording tools
+ ├── recording.py # Session recording tools
+ └── tabs.py # Tab management tools
  ```

- ---
+ <br>

  ## 🔧 Troubleshooting

- ### "Missing X server" Error
+ <details>
+ <summary><b>"Missing X server" Error</b></summary>

  Set `BROWSER_HEADLESS=true` or run with xvfb:
  ```bash
  xvfb-run browsercontrol
  ```
+ </details>

- ### Browser Not Starting
+ <details>
+ <summary><b>Browser Not Starting</b></summary>

  Chromium auto-installs on first run. If it fails, install manually:
  ```bash
  python -m playwright install chromium
  ```
+ </details>

- ### Session Not Persisting
+ <details>
+ <summary><b>Session Not Persisting</b></summary>

  Check that `BROWSER_USER_DATA_DIR` is writable:
  ```bash
  ls -la ~/.browsercontrol/
  ```
+ </details>

- ### Connection Refused
+ <details>
+ <summary><b>Connection Refused</b></summary>

  Ensure no other instance is running:
  ```bash
  pkill -f browsercontrol
  browsercontrol
  ```
+ </details>

- ---
+ <details>
+ <summary><b>View Session Recordings</b></summary>
+
+ Open recordings in the Playwright trace viewer:
+ ```bash
+ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
+ ```
+ </details>
+
+ <br>

  ## 🤝 Contributing

- Contributions are welcome! Some ideas:
+ Contributions are welcome! Check out our [Contributing Guide](CONTRIBUTING.md) for details.

- - [ ] Multi-tab support
+ **Ideas for contributions:**
  - [ ] Firefox/WebKit support
  - [ ] DOM diffing (detect changes)
- - [ ] Accessibility audit
+ - [ ] Accessibility audit tools
  - [ ] Mobile emulation presets
- - [ ] Cookie import/export
+ - [ ] Cookie import/export files

  ```bash
  # Clone and install
@@ -542,28 +547,28 @@ uv run pytest
  uv run fastmcp dev browsercontrol/server.py
  ```

- ---
+ <br>

  ## 📄 License

- MIT License - Use it however you want.
+ [MIT License](LICENSE) Use it however you want.

- ---
+ <br>

  ## 🙏 Acknowledgments

- - Inspired by the browser control capabilities in **Google's AntiGravity IDE**
+ - Vision-first approach inspired by **Google's AntiGravity IDE**
  - Built with [FastMCP](https://gofastmcp.com) and [Playwright](https://playwright.dev)
  - Thanks to the MCP community for making AI-tool integration accessible

  ---

  <p align="center">
- <strong>Built with ❤️ for the AI agent community.</strong>
+ <strong>Built for AI agents that need to see the web.</strong>
  </p>

  <p align="center">
  <a href="https://github.com/adityasasidhar/browsercontrol">⭐ Star on GitHub</a> •
- <a href="https://github.com/adityasasidhar/browsercontrol/issues">Report Bug</a> •
- <a href="https://github.com/adityasasidhar/browsercontrol/issues">Request Feature</a>
+ <a href="https://github.com/adityasasidhar/browsercontrol/issues">🐛 Report Bug</a> •
+ <a href="https://github.com/adityasasidhar/browsercontrol/issues">💡 Request Feature</a>
  </p>