browsercontrol 0.1.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- browsercontrol/__init__.py +8 -0
- browsercontrol/__main__.py +19 -0
- browsercontrol/browser.py +417 -0
- browsercontrol/config.py +61 -0
- browsercontrol/server.py +89 -0
- browsercontrol/tools/__init__.py +17 -0
- browsercontrol/tools/content.py +135 -0
- browsercontrol/tools/devtools.py +355 -0
- browsercontrol/tools/forms.py +96 -0
- browsercontrol/tools/interaction.py +204 -0
- browsercontrol/tools/navigation.py +163 -0
- browsercontrol/tools/recording.py +221 -0
- browsercontrol-0.1.0.dist-info/METADATA +569 -0
- browsercontrol-0.1.0.dist-info/RECORD +17 -0
- browsercontrol-0.1.0.dist-info/WHEEL +4 -0
- browsercontrol-0.1.0.dist-info/entry_points.txt +2 -0
- browsercontrol-0.1.0.dist-info/licenses/LICENSE +21 -0
@@ -0,0 +1,569 @@
Metadata-Version: 2.4
Name: browsercontrol
Version: 0.1.0
Summary: MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
Project-URL: Homepage, https://github.com/adityasasidhar/browsercontrol
Project-URL: Repository, https://github.com/adityasasidhar/browsercontrol
Author: Aditya Sasidhar
License: MIT
License-File: LICENSE
Keywords: agent,ai,automation,browser,llm,mcp,playwright
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: fastmcp>=2.14.2
Requires-Dist: markdownify>=0.14.1
Requires-Dist: pillow>=11.0.0
Requires-Dist: playwright>=1.49.0
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/logo.png" alt="BrowserControl" width="120">
</p>

<h1 align="center">🌐 BrowserControl</h1>

<p align="center">
  <strong>Give your AI agent real browser superpowers.</strong>
</p>

<p align="center">
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
  <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
  <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-Compatible-purple.svg" alt="MCP"></a>
  <a href="https://github.com/adityasasidhar/browsercontrol"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
</p>

<p align="center">
  <a href="#-quick-start">Quick Start</a> •
  <a href="#-features">Features</a> •
  <a href="#-available-tools">Tools</a> •
  <a href="#%EF%B8%8F-configuration">Configuration</a> •
  <a href="#-examples">Examples</a>
</p>

---

Ever wished Claude, Gemini, or your custom AI agent could actually browse the web? Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?

**BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach** inspired by Google's AntiGravity IDE.

## ✨ What Makes This Different

| Traditional Web Access | BrowserControl |
|------------------------|----------------|
| Fetch static HTML | See the **rendered page** |
| Parse complex DOM | Point at **numbered elements** |
| Guess at selectors | Just say **"click 5"** |
| No JavaScript support | Full **dynamic content** |
| No login persistence | **Persistent sessions** |
| No debugging tools | **Console, Network, Errors** |

### 🎯 The Secret: Set of Marks (SoM)

Every screenshot comes annotated with **numbered red boxes** on interactive elements:

```
Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started
```

Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**

---

## 🏆 Why BrowserControl Beats Every Alternative

### Head-to-Head Comparison

| Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
| **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
| **No Extra AI Calls** | ✅ Zero | ❌ Parses tree | ❌ GPT-4V per action | ❌ Vision model | ❌ Query model |
| **Developer Tools** | ✅ 6 tools | ❌ None | ❌ None | ❌ None | ❌ None |
| **Session Recording** | ✅ Built-in | ❌ Manual | ❌ None | ❌ None | ❌ None |
| **Persistent Sessions** | ✅ Automatic | ⚠️ Manual setup | ❌ None | ❌ None | ❌ None |
| **MCP Native** | ✅ FastMCP | ✅ Official | ❌ Python SDK | ⚠️ Custom | ❌ REST API |
| **Install Complexity** | ✅ `pip install` | ⚠️ npx + config | ❌ Docker + setup | ⚠️ Docker | ❌ Cloud signup |
| **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
| **Cost per Action** | ✅ $0 | ✅ $0 | ❌ ~$0.01-0.05 | ❌ ~$0.01-0.05 | ❌ API fees |
| **Offline/Local** | ✅ 100% local | ✅ Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | ❌ Cloud only |

### 🎯 Key Advantages

#### 1. **Token Efficiency = Faster + Cheaper**

```
Other tools send:          BrowserControl sends:
───────────────────        ─────────────────────
Full DOM tree              "click(5)"
(5,000+ tokens)            (3 tokens)
      or
Base64 screenshot          Element ID + summary
(10,000+ tokens)           (100 tokens)
```

**Result**: 50-100x fewer tokens per action = faster responses, lower costs.
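
The 50-100x figure follows directly from the numbers in the diagram above. The short calculation below reproduces it; the token counts are this README's own estimates rather than measurements, and `reduction_factor` is a name chosen for illustration.

```python
# Token counts copied from the diagram above (the README's estimates, not measurements).
DOM_TREE_TOKENS = 5_000
SCREENSHOT_TOKENS = 10_000
SUMMARY_TOKENS = 100  # "Element ID + summary"


def reduction_factor(baseline_tokens: int, compact_tokens: int) -> float:
    """How many times smaller the compact payload is than the baseline."""
    return baseline_tokens / compact_tokens


dom_ratio = reduction_factor(DOM_TREE_TOKENS, SUMMARY_TOKENS)
screenshot_ratio = reduction_factor(SCREENSHOT_TOKENS, SUMMARY_TOKENS)
print(f"{dom_ratio:.0f}x vs. a DOM tree, {screenshot_ratio:.0f}x vs. a base64 screenshot")
```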

#### 2. **No Extra AI Calls Required**

| Tool | AI Calls per Click |
|------|-------------------|
| **BrowserControl** | 0 (just `click(5)`) |
| Stagehand | 1-2 (vision + action) |
| Browser-Use | 1-2 (vision + planning) |
| AgentQL | 1 (query interpretation) |

**Result**: No vision API costs, no rate limits, works offline.

#### 3. **Developer Tools No One Else Has**

```python
# Only BrowserControl can do this:
get_console_logs()       # See browser errors
get_network_requests()   # Monitor API calls
get_page_errors()        # Catch JS exceptions
run_in_console(code)     # Debug in real-time
inspect_element(5)       # Get computed styles
get_page_performance()   # Core Web Vitals
```

**Other tools**: Navigate, click, type... that's it.

#### 4. **Session Recording Built-In**

```
start_recording() → Browse around → stop_recording()
                          ↓
              📹 session_20260108.zip
        (View with Playwright trace viewer)
```

**Other tools**: No recording. Debug from memory.

#### 5. **True Persistence**

| What Persists | BrowserControl | Others |
|---------------|:--------------:|:------:|
| Cookies | ✅ | ❌ |
| localStorage | ✅ | ❌ |
| Session tokens | ✅ | ❌ |
| Login state | ✅ | ❌ |
| Browser history | ✅ | ❌ |

**Result**: Log in once, stay logged in across sessions.

#### 6. **Simpler Mental Model**

```
❌ Other tools:
   "Find the button with class 'btn-primary' that contains text 'Submit'
    and is a descendant of form#contact-form..."

✅ BrowserControl:
   "click(7)"
```

### 📊 Real-World Performance

| Scenario | BrowserControl | Vision-Based Tools |
|----------|:--------------:|:------------------:|
| Click a button | ~50ms | ~2-5 seconds |
| Fill a form (5 fields) | ~500ms | ~15-30 seconds |
| Navigate + act | ~1 second | ~5-10 seconds |
| Debug console errors | ✅ Instant | ❌ Not possible |

### 💰 Cost Comparison (1000 actions/month)

| Tool | Monthly Cost |
|------|-------------|
| **BrowserControl** | **$0** (fully local) |
| Stagehand (GPT-4V) | ~$30-50 |
| Browser-Use (Claude Vision) | ~$20-40 |
| AgentQL | ~$50+ (API fees) |
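
As a rough cross-check, the per-action range quoted in the comparison table (~$0.01-0.05 for vision-based tools) can be scaled to 1000 actions. This is a back-of-the-envelope sketch; `monthly_cost_range` is a hypothetical helper, and the inputs are this README's estimates. The resulting $10-50 band brackets the narrower per-tool figures in the table above.

```python
def monthly_cost_range(actions: int, low_per_action: float, high_per_action: float) -> tuple[float, float]:
    """Scale a per-action cost range (in dollars) to a monthly total."""
    return (actions * low_per_action, actions * high_per_action)


ACTIONS_PER_MONTH = 1000
low, high = monthly_cost_range(ACTIONS_PER_MONTH, 0.01, 0.05)
print(f"Vision-based tools: roughly ${low:.0f}-${high:.0f} per month")
```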

---

## 🚀 Quick Start

### Installation

```bash
# Install with pip
pip install browsercontrol

# Or with uv (recommended)
uv add browsercontrol

# That's it! Chromium is auto-installed on first run
```

### Run the Server

```bash
# Using the CLI
browsercontrol

# Or as a module
python -m browsercontrol

# Or with FastMCP
fastmcp run browsercontrol.server:mcp
```

### Connect to Claude Desktop

Add the following to your Claude Desktop configuration file. On Linux this is `~/.config/Claude/claude_desktop_config.json`; on macOS it is `~/Library/Application Support/Claude/claude_desktop_config.json`, and on Windows `%APPDATA%\Claude\claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "browsercontrol": {
      "command": "browsercontrol"
    }
  }
}
```
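
If the configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that with the standard library; `add_mcp_server` is a hypothetical helper, not something shipped with BrowserControl or Claude Desktop, and it assumes the file is plain JSON with a top-level `mcpServers` object as in the snippet above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str) -> dict:
    """Add or update one entry under "mcpServers", leaving other entries untouched."""
    config = {}
    if config_path.exists():
        # Treat an empty file as an empty configuration.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (not executed here, since it would edit a real config file):
# add_mcp_server(
#     Path.home() / ".config/Claude/claude_desktop_config.json",
#     name="browsercontrol",
#     command="browsercontrol",
# )
```

Restart Claude Desktop after editing the file so the new server entry is picked up.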

Then just ask Claude:

> *"Go to GitHub and star the browsercontrol repo"*

Claude will navigate, find the star button, and click it, showing you screenshots along the way!

---

## 🎯 Features

### 1. Set of Marks (SoM) - Vision-First Interaction

Every action returns an annotated screenshot with numbered elements. Your AI agent can:
- **See** the page exactly as a human would
- **Identify** clickable elements by number
- **Act** with simple commands like `click(5)`

### 2. 🔧 Developer Tools

Built-in debugging tools for web development:

| Tool | Description |
|------|-------------|
| `get_console_logs()` | Capture browser console (errors, warnings, logs) |
| `get_network_requests()` | Monitor API calls, status codes, timing |
| `get_page_errors()` | See JavaScript exceptions and crashes |
| `run_in_console(code)` | Execute JS in browser console |
| `inspect_element(id)` | Get computed styles, dimensions, properties |
| `get_page_performance()` | Page load time, Core Web Vitals, memory |

### 3. 🎬 Session Recording

Record browser sessions for debugging and documentation:

| Tool | Description |
|------|-------------|
| `start_recording()` | Begin recording the session |
| `stop_recording()` | Save recording (Playwright trace format) |
| `take_snapshot()` | Save screenshot + HTML + URL |
| `list_recordings()` | View all saved sessions |

View recordings with:
```bash
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
```
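
To pick which trace to open, it can help to list the saved archives newest first. The snippet below is a standalone sketch that assumes recordings are `.zip` files in `~/.browsercontrol/recordings`, as the command above suggests; `list_trace_archives` is a hypothetical helper and is not the implementation of the `list_recordings()` tool.

```python
from pathlib import Path


def list_trace_archives(recordings_dir: Path) -> list[Path]:
    """Return the .zip files in recordings_dir, newest first (empty if the directory is missing)."""
    if not recordings_dir.is_dir():
        return []
    return sorted(
        recordings_dir.glob("*.zip"),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )


# Assumed location, taken from the show-trace command above.
for archive in list_trace_archives(Path.home() / ".browsercontrol" / "recordings"):
    print(archive)
```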

### 4. 💾 Persistent Sessions

- Cookies, localStorage, and session data persist across restarts
- Stay logged into websites
- Maintain shopping carts, preferences, etc.

---

## 🛠️ Available Tools

### Navigation
| Tool | Description |
|------|-------------|
| `navigate_to(url)` | Go to a URL |
| `go_back()` | Navigate back |
| `go_forward()` | Navigate forward |
| `refresh_page()` | Reload the page |
| `scroll(direction, amount)` | Scroll the page |

### Interaction
| Tool | Description |
|------|-------------|
| `click(element_id)` | Click element by number |
| `click_at(x, y)` | Click at coordinates |
| `type_text(element_id, text)` | Type into input |
| `press_key(key)` | Press keyboard key (Enter, Tab, etc.) |
| `hover(element_id)` | Hover over element |
| `scroll_to_element(element_id)` | Scroll element into view |
| `wait(seconds)` | Wait for loading |

### Forms
| Tool | Description |
|------|-------------|
| `select_option(element_id, option)` | Select dropdown option |
| `check_checkbox(element_id)` | Toggle checkbox |

### Content
| Tool | Description |
|------|-------------|
| `get_page_content()` | Get page as markdown |
| `get_text(element_id)` | Get element text |
| `get_page_info()` | Get URL and title |
| `run_javascript(script)` | Execute JavaScript |
| `screenshot(annotate, full_page)` | Take screenshot |

### Developer Tools
| Tool | Description |
|------|-------------|
| `get_console_logs()` | Browser console output |
| `get_network_requests()` | API calls and responses |
| `get_page_errors()` | JavaScript errors |
| `run_in_console(code)` | Execute JS in console |
| `inspect_element(id)` | Element styles/properties |
| `get_page_performance()` | Load times, Web Vitals |

### Recording
| Tool | Description |
|------|-------------|
| `start_recording()` | Begin session recording |
| `stop_recording()` | Save recording |
| `take_snapshot()` | Save screenshot + HTML |
| `list_recordings()` | View saved sessions |

---

## ⚙️ Configuration

Configure via environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `BROWSER_HEADLESS` | `true` | Run without visible window |
| `BROWSER_VIEWPORT_WIDTH` | `1280` | Viewport width in pixels |
| `BROWSER_VIEWPORT_HEIGHT` | `720` | Viewport height in pixels |
| `BROWSER_TIMEOUT` | `30000` | Navigation timeout (ms) |
| `BROWSER_USER_DATA_DIR` | `~/.browsercontrol/user_data` | Browser profile path |
| `BROWSER_EXTENSION_PATH` | - | Path to browser extension |
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |

### Examples

```bash
# Run with visible browser (for debugging)
BROWSER_HEADLESS=false browsercontrol

# Custom viewport for mobile testing
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol

# Verbose logging
LOG_LEVEL=DEBUG browsercontrol
```

---

## 📚 Examples

### Example 1: Web Research

```
You: "Go to Wikipedia and find out when Python was created"

Claude: *navigates to wikipedia.org*
        *types "Python programming language" in search*
        *clicks search result*
        *reads the page*

        "Python was created by Guido van Rossum and first released in 1991."
```

### Example 2: Debug a Web App

```
You: "Go to my app at localhost:3000 and check for any errors"

Claude: *navigates to localhost:3000*
        *calls get_console_logs()*

        "I found 2 errors:
         [ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
         [ERROR] Failed to load resource: 404 /api/users"

        *calls get_network_requests()*

        "The /api/users endpoint returned 404. Check your API route."
```

### Example 3: Automated Testing

```
You: "Test the login flow on my site. Start recording."

Claude: *calls start_recording()*
        *navigates to site*
        *types email in element 3*
        *types password in element 4*
        *clicks Login button (element 5)*
        *verifies successful redirect*
        *calls stop_recording()*

        "Login test passed! Recording saved to session_20260108.zip"
```

### Example 4: Form Automation

```
You: "Fill out the contact form on example.com"

Claude: *navigates to example.com/contact*
        *types name in element 2*
        *types email in element 3*
        *types message in element 4*
        *clicks Submit (element 5)*

        "Form submitted successfully!"
```

---

## 🏗️ Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   AI Agent      │────▶│  BrowserControl  │────▶│   Browser   │
│ (Claude/Gemini) │◀────│   MCP Server     │◀────│ (Chromium)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
        │                        │                      │
        │     "click(5)"         │    mouse.click()     │
        │◀───────────────────────│◀─────────────────────│
        │    [annotated          │    [screenshot +     │
        │     screenshot]        │     element map]     │
```

### How It Works

1. **AI sends command**: `click(5)`
2. **Server finds element**: Looks up element #5 from the last screenshot
3. **Browser acts**: Clicks at the element's coordinates
4. **Capture state**: Takes new screenshot, detects elements
5. **Annotate**: Draws numbered boxes on interactive elements
6. **Return to AI**: Sends annotated image + element list
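
Steps 2 and 3 above can be sketched as a lookup in an element map followed by a click at the centre of the element's bounding box. This is an illustration under stated assumptions, not the code in `browser.py`: `MarkedElement`, `resolve_click`, the bounding-box fields, and the choice of the box centre as the click point are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedElement:
    """Hypothetical record for one numbered box from the last screenshot."""
    element_id: int
    tag: str
    label: str
    x: float       # left edge of the bounding box, in pixels
    y: float       # top edge of the bounding box, in pixels
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        """Point to click: the middle of the bounding box."""
        return (self.x + self.width / 2, self.y + self.height / 2)


def resolve_click(element_map: dict[int, MarkedElement], element_id: int) -> tuple[float, float]:
    """Step 2: look up the ID. Step 3 would pass the result to the browser's mouse click."""
    try:
        element = element_map[element_id]
    except KeyError:
        raise ValueError(f"Element {element_id} is not in the last screenshot") from None
    return element.center()


element_map = {
    5: MarkedElement(5, "button", "Get Started", x=100, y=40, width=120, height=30),
}
click_point = resolve_click(element_map, 5)
```

Because IDs refer to the last screenshot, a stale ID should fail loudly rather than click somewhere arbitrary, which is why the sketch raises on an unknown ID.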

---

## 📦 Project Structure

```
browsercontrol/
├── __init__.py      # Package exports
├── __main__.py      # CLI entry point
├── server.py        # MCP server setup
├── browser.py       # BrowserManager with SoM
├── config.py        # Environment configuration
└── tools/
    ├── navigation.py   # Navigation tools
    ├── interaction.py  # Click, type, hover tools
    ├── forms.py        # Form handling tools
    ├── content.py      # Content extraction tools
    ├── devtools.py     # Developer tools
    └── recording.py    # Session recording tools
```

---

## 🔧 Troubleshooting

### "Missing X server" Error

Set `BROWSER_HEADLESS=true` or run with xvfb:
```bash
xvfb-run browsercontrol
```

### Browser Not Starting

Chromium auto-installs on first run. If it fails, install manually:
```bash
python -m playwright install chromium
```

### Session Not Persisting

Check that `BROWSER_USER_DATA_DIR` is writable:
```bash
ls -la ~/.browsercontrol/
```
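
The same check can be scripted. The sketch below reads `BROWSER_USER_DATA_DIR` (falling back to the documented default) and reports whether the profile directory, or its nearest existing ancestor, is writable by the current user; `check_profile_dir` is a hypothetical helper written for this example.

```python
import os
from pathlib import Path


def check_profile_dir(raw_path: str) -> tuple[bool, str]:
    """Return (ok, message) describing whether the profile directory can be written to."""
    path = Path(raw_path).expanduser()
    if not path.exists():
        # A missing directory is fine as long as the nearest existing ancestor is writable.
        parent = next(p for p in path.parents if p.exists())
        ok = os.access(parent, os.W_OK | os.X_OK)
        state = "writable" if ok else "NOT writable"
        return ok, f"{path} does not exist; nearest existing ancestor {parent} is {state}"
    if not path.is_dir():
        return False, f"{path} exists but is not a directory"
    ok = os.access(path, os.W_OK | os.X_OK)
    state = "writable" if ok else "NOT writable"
    return ok, f"{path} is {state}"


profile_dir = os.environ.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
writable, message = check_profile_dir(profile_dir)
print(message)
```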

### Connection Refused

Ensure no other instance is running:
```bash
pkill -f browsercontrol
browsercontrol
```

---

## 🤝 Contributing

Contributions are welcome! Some ideas:

- [ ] Multi-tab support
- [ ] Firefox/WebKit support
- [ ] DOM diffing (detect changes)
- [ ] Accessibility audit
- [ ] Mobile emulation presets
- [ ] Cookie import/export

```bash
# Clone and install
git clone https://github.com/adityasasidhar/browsercontrol
cd browsercontrol
uv sync

# Run tests
uv run pytest

# Run in development
uv run fastmcp dev browsercontrol/server.py
```

---

## 📄 License

MIT License - Use it however you want.

---

## 🙏 Acknowledgments

- Inspired by the browser control capabilities in **Google's AntiGravity IDE**
- Built with [FastMCP](https://gofastmcp.com) and [Playwright](https://playwright.dev)
- Thanks to the MCP community for making AI-tool integration accessible

---

<p align="center">
  <strong>Built with ❤️ for the AI agent community.</strong>
</p>

<p align="center">
  <a href="https://github.com/adityasasidhar/browsercontrol">⭐ Star on GitHub</a> •
  <a href="https://github.com/adityasasidhar/browsercontrol/issues">Report Bug</a> •
  <a href="https://github.com/adityasasidhar/browsercontrol/issues">Request Feature</a>
</p>
@@ -0,0 +1,17 @@
browsercontrol/__init__.py,sha256=E6j-Cvot13Rvhw51cPoMS5oWUrYvWeDflfLL8GGjxcA,154
browsercontrol/__main__.py,sha256=RuuFpmjiukaJ49ZuxhZnUnThzJJgu01i2CCkDoZmJNM,262
browsercontrol/browser.py,sha256=YSiyBDWheY7WJWxRPoxk-vCKG64DYRQ0YwmX5Hs0QPE,15448
browsercontrol/config.py,sha256=tFst4bzqGfzzL6MEQgLWj6Ce6BvplDqrb-bZKDQqoW4,1742
browsercontrol/server.py,sha256=eSKG7_KLNm7Rfx0ynYsb0iC69vHpiiij6ClJhjI2-S8,2266
browsercontrol/tools/__init__.py,sha256=EbcDOYxh2HwOuw8rAoi4MPxmGq-3w61_xLiw6CT-3PM,627
browsercontrol/tools/content.py,sha256=n-Nk861QuqI1iFGxogOuGvm5r6F159OR4Bqrrssfx4c,5117
browsercontrol/tools/devtools.py,sha256=_i77QC-Rq4dO8Bxd8bds2BIOkltn9zV7_1V6cNfQa5c,14439
browsercontrol/tools/forms.py,sha256=v1yOqfAWbJveWxArZ4AIQQqSvLmbbpOVDdX_ebJiofw,3751
browsercontrol/tools/interaction.py,sha256=OPwyRb26eeinpvlmdE7G9nSivNhj-6eekwgOfCMjYqg,8016
browsercontrol/tools/navigation.py,sha256=7DQn8rp9FdM4k7dteVq8LOn51dRIU50vvfa8cEtgaj0,6733
browsercontrol/tools/recording.py,sha256=g-SnSq9ZaEBdhpcmBOtULiKyCAMgBu9zVtrvEpYKdp4,8577
browsercontrol-0.1.0.dist-info/METADATA,sha256=D18g7vROiymir9JTjzaaQmRj0neKOzJE0aGDycGwYUg,17608
browsercontrol-0.1.0.dist-info/WHEEL,sha256=WLgqFyCfm_KASv4WHyYy0P3pM_m7J5L9k2skdKLirC8,87
browsercontrol-0.1.0.dist-info/entry_points.txt,sha256=D5xqrNJa55ODigK69ugHY13MHJ7cuKC5UKR3V-AtnyA,64
browsercontrol-0.1.0.dist-info/licenses/LICENSE,sha256=FROq7ig9Fh69Doibiy8sbSHUG6AO9Y4beZlQvgCrnPY,1072
browsercontrol-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Aditya Sasidhar

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.