browsercontrol 0.1.1__tar.gz → 0.1.2__tar.gz
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/PKG-INFO +226 -221
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/README.md +225 -220
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/browser.py +76 -7
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/server.py +15 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/__init__.py +5 -3
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/content.py +8 -1
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/devtools.py +120 -5
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/forms.py +45 -3
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/interaction.py +1 -2
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/navigation.py +1 -2
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/tools/recording.py +1 -2
- browsercontrol-0.1.2/browsercontrol/tools/tabs.py +91 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/pyproject.toml +1 -1
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/uv.lock +1 -1
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/.gitignore +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/CONTRIBUTING.md +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/LICENSE +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/assets/logo.png +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/__init__.py +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/__main__.py +0 -0
- {browsercontrol-0.1.1 → browsercontrol-0.1.2}/browsercontrol/config.py +1 -1
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: browsercontrol
-Version: 0.1.
+Version: 0.1.2
 Summary: MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
 Project-URL: Homepage, https://github.com/adityasasidhar/browsercontrol
 Project-URL: Repository, https://github.com/adityasasidhar/browsercontrol
@@ -25,188 +25,107 @@ Requires-Dist: playwright>=1.49.0
|
|
|
25
25
|
Description-Content-Type: text/markdown
|
|
26
26
|
|
|
27
27
|
<p align="center">
|
|
28
|
-
<img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="
|
|
28
|
+
<img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="140">
|
|
29
29
|
</p>
|
|
30
30
|
|
|
31
|
-
<h1 align="center"
|
|
31
|
+
<h1 align="center">BrowserControl</h1>
|
|
32
32
|
|
|
33
33
|
<p align="center">
|
|
34
|
-
<strong>Give your AI agent real browser superpowers.</strong>
|
|
34
|
+
<strong>Give your AI agent real browser superpowers.</strong><br>
|
|
35
|
+
<sub>Vision-first browser automation for Claude, Gemini, and any MCP-compatible AI agent.</sub>
|
|
35
36
|
</p>
|
|
36
37
|
|
|
37
38
|
<p align="center">
|
|
38
|
-
<a href="https://
|
|
39
|
-
<a href="https://
|
|
40
|
-
<a href="https://
|
|
41
|
-
<a href="https://
|
|
39
|
+
<a href="https://pypi.org/project/browsercontrol/"><img src="https://img.shields.io/pypi/v/browsercontrol?color=blue&label=PyPI" alt="PyPI"></a>
|
|
40
|
+
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-3776ab.svg?logo=python&logoColor=white" alt="Python 3.11+"></a>
|
|
41
|
+
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License: MIT"></a>
|
|
42
|
+
<a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-compatible-7c3aed.svg" alt="MCP Compatible"></a>
|
|
43
|
+
<a href="https://github.com/adityasasidhar/browsercontrol/stargazers"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
|
|
42
44
|
</p>
|
|
43
45
|
|
|
44
46
|
<p align="center">
|
|
45
47
|
<a href="#-quick-start">Quick Start</a> •
|
|
46
|
-
<a href="#-
|
|
48
|
+
<a href="#-the-secret-set-of-marks-som">How It Works</a> •
|
|
47
49
|
<a href="#-available-tools">Tools</a> •
|
|
48
|
-
<a href="
|
|
49
|
-
<a href="#-examples">Examples</a>
|
|
50
|
+
<a href="#-configuration">Configuration</a> •
|
|
51
|
+
<a href="#-examples">Examples</a> •
|
|
52
|
+
<a href="#-contributing">Contributing</a>
|
|
50
53
|
</p>
|
|
51
54
|
|
|
52
55
|
---
|
|
53
56
|
|
|
54
|
-
Ever wished Claude
|
|
57
|
+
> **Ever wished Claude or Gemini could actually browse the web?** Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?
|
|
55
58
|
|
|
56
|
-
**BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach
|
|
59
|
+
**BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach**—no CSS selectors, no XPath, no guessing. Just point at numbers.
|
|
57
60
|
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
| Traditional Web Access | BrowserControl |
|
|
61
|
-
|------------------------|----------------|
|
|
62
|
-
| Fetch static HTML | See the **rendered page** |
|
|
63
|
-
| Parse complex DOM | Point at **numbered elements** |
|
|
64
|
-
| Guess at selectors | Just say **"click 5"** |
|
|
65
|
-
| No JavaScript support | Full **dynamic content** |
|
|
66
|
-
| No login persistence | **Persistent sessions** |
|
|
67
|
-
| No debugging tools | **Console, Network, Errors** |
|
|
68
|
-
|
|
69
|
-
### 🎯 The Secret: Set of Marks (SoM)
|
|
70
|
-
|
|
71
|
-
Every screenshot comes annotated with **numbered red boxes** on interactive elements:
|
|
72
|
-
|
|
73
|
-
```
|
|
74
|
-
Found 15 interactive elements:
|
|
75
|
-
[1] button - Sign In
|
|
76
|
-
[2] input - Search...
|
|
77
|
-
[3] a - Products
|
|
78
|
-
[4] a - Pricing
|
|
79
|
-
[5] button - Get Started
|
|
80
|
-
```
|
|
81
|
-
|
|
82
|
-
Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**
|
|
83
|
-
|
|
84
|
-
---
|
|
61
|
+
<br>
|
|
85
62
|
|
|
86
|
-
##
|
|
87
|
-
|
|
88
|
-
### Head-to-Head Comparison
|
|
89
|
-
|
|
90
|
-
| Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|
|
91
|
-
|---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
|
|
92
|
-
| **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
|
|
93
|
-
| **No Extra AI Calls** | ✅ Zero | ❌ Parses tree | ❌ GPT-4V per action | ❌ Vision model | ❌ Query model |
|
|
94
|
-
| **Developer Tools** | ✅ 6 tools | ❌ None | ❌ None | ❌ None | ❌ None |
|
|
95
|
-
| **Session Recording** | ✅ Built-in | ❌ Manual | ❌ None | ❌ None | ❌ None |
|
|
96
|
-
| **Persistent Sessions** | ✅ Automatic | ⚠️ Manual setup | ❌ None | ❌ None | ❌ None |
|
|
97
|
-
| **MCP Native** | ✅ FastMCP | ✅ Official | ❌ Python SDK | ⚠️ Custom | ❌ REST API |
|
|
98
|
-
| **Install Complexity** | ✅ `pip install` | ⚠️ npx + config | ❌ Docker + setup | ⚠️ Docker | ❌ Cloud signup |
|
|
99
|
-
| **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
|
|
100
|
-
| **Cost per Action** | ✅ $0 | ✅ $0 | ❌ ~$0.01-0.05 | ❌ ~$0.01-0.05 | ❌ API fees |
|
|
101
|
-
| **Offline/Local** | ✅ 100% local | ✅ Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | ❌ Cloud only |
|
|
102
|
-
|
|
103
|
-
### 🎯 Key Advantages
|
|
63
|
+
## ✨ What Makes This Different
|
|
104
64
|
|
|
105
|
-
|
|
65
|
+
<table>
|
|
66
|
+
<tr>
|
|
67
|
+
<td width="50%">
|
|
106
68
|
|
|
69
|
+
### ❌ Traditional Approach
|
|
107
70
|
```
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
(5,000+ tokens) (3 tokens)
|
|
112
|
-
or
|
|
113
|
-
Base64 screenshot Element ID + summary
|
|
114
|
-
(10,000+ tokens) (100 tokens)
|
|
115
|
-
```
|
|
116
|
-
|
|
117
|
-
**Result**: 50-100x fewer tokens per action = faster responses, lower costs.
|
|
118
|
-
|
|
119
|
-
#### 2. **No Extra AI Calls Required**
|
|
120
|
-
|
|
121
|
-
| Tool | AI Calls per Click |
|
|
122
|
-
|------|-------------------|
|
|
123
|
-
| **BrowserControl** | 0 (just `click(5)`) |
|
|
124
|
-
| Stagehand | 1-2 (vision + action) |
|
|
125
|
-
| Browser-Use | 1-2 (vision + planning) |
|
|
126
|
-
| AgentQL | 1 (query interpretation) |
|
|
127
|
-
|
|
128
|
-
**Result**: No vision API costs, no rate limits, works offline.
|
|
129
|
-
|
|
130
|
-
#### 3. **Developer Tools No One Else Has**
|
|
131
|
-
|
|
132
|
-
```python
|
|
133
|
-
# Only BrowserControl can do this:
|
|
134
|
-
get_console_logs() # See browser errors
|
|
135
|
-
get_network_requests() # Monitor API calls
|
|
136
|
-
get_page_errors() # Catch JS exceptions
|
|
137
|
-
run_in_console(code) # Debug in real-time
|
|
138
|
-
inspect_element(5) # Get computed styles
|
|
139
|
-
get_page_performance() # Core Web Vitals
|
|
71
|
+
"Find the button with class 'btn-primary'
|
|
72
|
+
that contains 'Submit' and is inside
|
|
73
|
+
form#contact-form..."
|
|
140
74
|
```
|
|
75
|
+
- Parse complex DOM structures
|
|
76
|
+
- Guess at CSS selectors
|
|
77
|
+
- No JavaScript support
|
|
78
|
+
- No login persistence
|
|
79
|
+
- No debugging tools
|
|
141
80
|
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
#### 4. **Session Recording Built-In**
|
|
81
|
+
</td>
|
|
82
|
+
<td width="50%">
|
|
145
83
|
|
|
84
|
+
### ✅ BrowserControl
|
|
146
85
|
```
|
|
147
|
-
|
|
148
|
-
↓
|
|
149
|
-
📹 session_20260108.zip
|
|
150
|
-
(View with Playwright trace viewer)
|
|
86
|
+
"click(7)"
|
|
151
87
|
```
|
|
88
|
+
- See the **rendered page** with numbered elements
|
|
89
|
+
- Just say **"click 5"** or **"type in 3"**
|
|
90
|
+
- Full **dynamic JavaScript** support
|
|
91
|
+
- **Persistent sessions** across restarts
|
|
92
|
+
- Complete **DevTools access**
|
|
152
93
|
|
|
153
|
-
|
|
94
|
+
</td>
|
|
95
|
+
</tr>
|
|
96
|
+
</table>
|
|
154
97
|
|
|
155
|
-
|
|
156
|
-
|
|
157
|
-
| What Persists | BrowserControl | Others |
|
|
158
|
-
|---------------|:--------------:|:------:|
|
|
159
|
-
| Cookies | ✅ | ❌ |
|
|
160
|
-
| localStorage | ✅ | ❌ |
|
|
161
|
-
| Session tokens | ✅ | ❌ |
|
|
162
|
-
| Login state | ✅ | ❌ |
|
|
163
|
-
| Browser history | ✅ | ❌ |
|
|
98
|
+
<br>
|
|
164
99
|
|
|
165
|
-
|
|
100
|
+
## 🎯 The Secret: Set of Marks (SoM)
|
|
166
101
|
|
|
167
|
-
|
|
102
|
+
Every screenshot comes annotated with **numbered red boxes** on interactive elements:
|
|
168
103
|
|
|
169
104
|
```
|
|
170
|
-
|
|
171
|
-
|
|
172
|
-
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
|
|
105
|
+
Found 15 interactive elements:
|
|
106
|
+
[1] button - Sign In
|
|
107
|
+
[2] input - Search...
|
|
108
|
+
[3] a - Products
|
|
109
|
+
[4] a - Pricing
|
|
110
|
+
[5] button - Get Started
|
|
176
111
|
```
|
|
177
112
|
|
|
178
|
-
|
|
179
|
-
|
|
180
|
-
| Scenario | BrowserControl | Vision-Based Tools |
|
|
181
|
-
|----------|:--------------:|:------------------:|
|
|
182
|
-
| Click a button | ~50ms | ~2-5 seconds |
|
|
183
|
-
| Fill a form (5 fields) | ~500ms | ~15-30 seconds |
|
|
184
|
-
| Navigate + act | ~1 second | ~5-10 seconds |
|
|
185
|
-
| Debug console errors | ✅ Instant | ❌ Not possible |
|
|
186
|
-
|
|
187
|
-
### 💰 Cost Comparison (1000 actions/month)
|
|
188
|
-
|
|
189
|
-
| Tool | Monthly Cost |
|
|
190
|
-
|------|-------------|
|
|
191
|
-
| **BrowserControl** | **$0** (fully local) |
|
|
192
|
-
| Stagehand (GPT-4V) | ~$30-50 |
|
|
193
|
-
| Browser-Use (Claude Vision) | ~$20-40 |
|
|
194
|
-
| AgentQL | ~$50+ (API fees) |
|
|
113
|
+
Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**
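To make the element list concrete, here is a minimal sketch of how a client could turn that text listing into a lookup table keyed by element number. The names `MarkedElement` and `parse_element_list` are hypothetical and not part of BrowserControl; the sketch only assumes the `[id] tag - label` line format shown above.

```python
import re
from typing import NamedTuple


class MarkedElement(NamedTuple):
    """One numbered element from a Set of Marks listing (hypothetical type)."""
    element_id: int
    tag: str
    label: str


# Matches lines such as "[1] button - Sign In".
LINE_PATTERN = re.compile(r"^\[(\d+)\]\s+(\S+)\s+-\s+(.*)$")


def parse_element_list(listing: str) -> dict[int, MarkedElement]:
    """Build an id -> element mapping, skipping lines that are not entries."""
    elements: dict[int, MarkedElement] = {}
    for line in listing.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            element_id = int(match.group(1))
            elements[element_id] = MarkedElement(
                element_id, match.group(2), match.group(3).strip()
            )
    return elements


listing = """Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started"""

elements = parse_element_list(listing)
print(elements[1])
```

An agent that has this mapping can check that an ID exists before it issues `click(1)`.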
|
|
195
114
|
|
|
196
|
-
|
|
115
|
+
<br>
|
|
197
116
|
|
|
198
117
|
## 🚀 Quick Start
|
|
199
118
|
|
|
200
119
|
### Installation
|
|
201
120
|
|
|
202
121
|
```bash
|
|
203
|
-
#
|
|
122
|
+
# Using pip
|
|
204
123
|
pip install browsercontrol
|
|
205
124
|
|
|
206
|
-
# Or with uv (recommended)
|
|
125
|
+
# Or with uv (recommended for faster installs)
|
|
207
126
|
uv add browsercontrol
|
|
208
127
|
|
|
209
|
-
#
|
|
128
|
+
# Chromium is auto-installed on first run—no extra steps needed!
|
|
210
129
|
```
|
|
211
130
|
|
|
212
131
|
### Run the Server
|
|
@@ -215,7 +134,7 @@ uv add browsercontrol
 # Using the CLI
 browsercontrol
 
-# Or as a module
+# Or as a Python module
 python -m browsercontrol
 
 # Or with FastMCP
@@ -224,7 +143,24 @@ fastmcp run browsercontrol.server:mcp
|
|
|
224
143
|
|
|
225
144
|
### Connect to Claude Desktop
|
|
226
145
|
|
|
227
|
-
Add to
|
|
146
|
+
Add to your Claude configuration file:
|
|
147
|
+
|
|
148
|
+
<details>
|
|
149
|
+
<summary><b>📁 macOS</b> — <code>~/Library/Application Support/Claude/claude_desktop_config.json</code></summary>
|
|
150
|
+
|
|
151
|
+
```json
|
|
152
|
+
{
|
|
153
|
+
"mcpServers": {
|
|
154
|
+
"browsercontrol": {
|
|
155
|
+
"command": "browsercontrol"
|
|
156
|
+
}
|
|
157
|
+
}
|
|
158
|
+
}
|
|
159
|
+
```
|
|
160
|
+
</details>
|
|
161
|
+
|
|
162
|
+
<details>
|
|
163
|
+
<summary><b>📁 Linux</b> — <code>~/.config/Claude/claude_desktop_config.json</code></summary>
|
|
228
164
|
|
|
229
165
|
```json
|
|
230
166
|
{
|
|
@@ -235,60 +171,100 @@ Add to `~/.config/Claude/claude_desktop_config.json`:
|
|
|
235
171
|
}
|
|
236
172
|
}
|
|
237
173
|
```
|
|
174
|
+
</details>
|
|
238
175
|
|
|
239
|
-
|
|
176
|
+
<details>
|
|
177
|
+
<summary><b>📁 Windows</b> — <code>%APPDATA%\Claude\claude_desktop_config.json</code></summary>
|
|
178
|
+
|
|
179
|
+
```json
|
|
180
|
+
{
|
|
181
|
+
"mcpServers": {
|
|
182
|
+
"browsercontrol": {
|
|
183
|
+
"command": "browsercontrol"
|
|
184
|
+
}
|
|
185
|
+
}
|
|
186
|
+
}
|
|
187
|
+
```
|
|
188
|
+
</details>
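
If the configuration file already lists other MCP servers, the entry has to be merged rather than pasted over the file. The following is a rough sketch of that merge using only the standard library; the helper name `add_browsercontrol_server` is hypothetical, and the sketch works on the file's text so that reading and writing the actual path is left to you.

```python
import json


def add_browsercontrol_server(config_text: str) -> str:
    """Return config JSON with a browsercontrol entry added under mcpServers."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Same entry as in the README snippets above.
    servers["browsercontrol"] = {"command": "browsercontrol"}
    return json.dumps(config, indent=2)


existing = '{"mcpServers": {"other": {"command": "other-server"}}}'
merged = add_browsercontrol_server(existing)
print(merged)
```

Existing entries such as the placeholder `other` server are preserved; only the `browsercontrol` key is added or overwritten.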
|
|
240
189
|
|
|
190
|
+
Then ask Claude:
|
|
241
191
|
> *"Go to GitHub and star the browsercontrol repo"*
|
|
242
192
|
|
|
243
193
|
Claude will navigate, find the star button, and click it—showing you screenshots along the way!
|
|
244
194
|
|
|
245
|
-
|
|
195
|
+
<br>
|
|
246
196
|
|
|
247
|
-
##
|
|
197
|
+
## 🥊 Head-to-Head Comparison
|
|
248
198
|
|
|
249
|
-
|
|
199
|
+
| Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|
|
200
|
+
|---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
|
|
201
|
+
| **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
|
|
202
|
+
| **Multi-Tab Support** | ✅ Full control | ⚠️ Implicit | ⚠️ Implicit | ⚠️ Basic | ❌ None |
|
|
203
|
+
| **Cookie Management** | ✅ Direct tools | ⚠️ JS only | ⚠️ JS only | ⚠️ Basic | ❌ None |
|
|
204
|
+
| **File Uploads** | ✅ Native tool | ⚠️ Manual | ❌ No | ❌ No | ❌ No |
|
|
205
|
+
| **Developer Tools** | ✅ 8 tools | ❌ None | ❌ None | ❌ None | ❌ None |
|
|
206
|
+
| **Session Recording** | ✅ Built-in | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
|
|
207
|
+
| **Persistent Sessions** | ✅ Automatic | ⚠️ Manual | ❌ None | ❌ None | ❌ None |
|
|
208
|
+
| **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
|
|
209
|
+
| **100% Local/Offline** | ✅ Yes | ✅ Yes | ❌ Needs LLM API | ❌ Needs LLM API | ❌ Cloud only |
|
|
210
|
+
| **Monthly Cost (1k actions)** | **$0** | $0 | ~$30-50 | ~$20-40 | ~$50+ |
|
|
250
211
|
|
|
251
|
-
|
|
252
|
-
- **See** the page exactly as a human would
|
|
253
|
-
- **Identify** clickable elements by number
|
|
254
|
-
- **Act** with simple commands like `click(5)`
|
|
212
|
+
<br>
|
|
255
213
|
|
|
256
|
-
|
|
214
|
+
## 💪 Key Advantages
|
|
257
215
|
|
|
258
|
-
|
|
216
|
+
### 1. Multi-Tab Orchestration
|
|
217
|
+
Unlike other tools that get "lost" when a new window opens:
|
|
218
|
+
- `list_tabs()` — See every open page, title, and URL
|
|
219
|
+
- `switch_tab(index)` — Multitask between different sites
|
|
220
|
+
- `create_tab(url)` — Open references or parallel workflows
|
|
259
221
|
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
|
|
265
|
-
| `run_in_console(code)` | Execute JS in browser console |
|
|
266
|
-
| `inspect_element(id)` | Get computed styles, dimensions, properties |
|
|
267
|
-
| `get_page_performance()` | Page load time, Core Web Vitals, memory |
|
|
268
|
-
|
|
269
|
-
### 3. 🎬 Session Recording
|
|
222
|
+
### 2. Session & Cookie Management
|
|
223
|
+
Stop fighting with login forms. Inject or inspect session state directly:
|
|
224
|
+
- `set_cookie()` — Log in instantly by injecting an auth token
|
|
225
|
+
- `get_cookies()` — Debug session issues or export state
|
|
226
|
+
- `clear_cookies()` — Fresh start without clearing the whole profile
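
The README elides the full signature of `set_cookie(name, value, ...)`, so the parameters beyond `name` and `value` in this sketch are assumptions. It shows the cookie dictionary shape that Playwright's `BrowserContext.add_cookies` accepts, which is presumably what such a tool would hand to the browser; `build_cookie` is a hypothetical helper, not a BrowserControl function.

```python
from typing import Optional


def build_cookie(
    name: str,
    value: str,
    domain: str,
    path: str = "/",
    secure: bool = True,
    http_only: bool = True,
    expires: Optional[float] = None,
) -> dict:
    """Build a cookie dict in the shape Playwright's add_cookies expects."""
    if not name:
        raise ValueError("cookie name must not be empty")
    cookie = {
        "name": name,
        "value": value,
        "domain": domain,
        "path": path,
        "secure": secure,
        "httpOnly": http_only,
    }
    # Omitting "expires" yields a session cookie.
    if expires is not None:
        cookie["expires"] = expires
    return cookie


auth_cookie = build_cookie("session", "abc123", "example.com")
print(auth_cookie)
```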
|
|
270
227
|
|
|
271
|
-
|
|
228
|
+
### 3. Reliable File Uploads
|
|
229
|
+
Most AI agents fail when they hit a `<input type="file">`. BrowserControl uses native browser engine hooks:
|
|
230
|
+
- `upload_file(id, path)` — Just point at the button and the local file
|
|
272
231
|
|
|
273
|
-
|
|
274
|
-
|
|
275
|
-
|
|
276
|
-
|
|
277
|
-
|
|
278
|
-
|
|
232
|
+
### 4. Developer Tools Suite
|
|
233
|
+
Debug like a pro with tools no one else provides:
|
|
234
|
+
```python
|
|
235
|
+
get_console_logs() # See browser errors
|
|
236
|
+
get_network_requests() # Monitor API calls
|
|
237
|
+
get_page_errors() # Catch JS exceptions
|
|
238
|
+
run_in_console(code) # Debug in real-time
|
|
239
|
+
inspect_element(5) # Get computed styles
|
|
240
|
+
get_page_performance() # Core Web Vitals
|
|
241
|
+
```
|
|
279
242
|
|
|
280
|
-
|
|
281
|
-
```
|
|
282
|
-
|
|
243
|
+
### 5. Session Recording
|
|
244
|
+
```
|
|
245
|
+
start_recording() → Browse around → stop_recording()
|
|
246
|
+
↓
|
|
247
|
+
session_20260202.zip
|
|
248
|
+
(View with Playwright trace viewer)
|
|
283
249
|
```
|
|
284
250
|
|
|
285
|
-
###
|
|
251
|
+
### 6. Dynamic Viewport Control
|
|
252
|
+
Test responsive designs or emulate mobile screens on the fly:
|
|
253
|
+
- `set_viewport(width, height)` — Change resolution without restarting
|
|
286
254
|
|
|
287
|
-
|
|
288
|
-
- Stay logged into websites
|
|
289
|
-
- Maintain shopping carts, preferences, etc.
|
|
255
|
+
### 7. True Persistence
|
|
290
256
|
|
|
291
|
-
|
|
257
|
+
| What Persists | BrowserControl | Others |
|
|
258
|
+
|---------------|:--------------:|:------:|
|
|
259
|
+
| Cookies | ✅ | ❌ |
|
|
260
|
+
| localStorage | ✅ | ❌ |
|
|
261
|
+
| Session tokens | ✅ | ❌ |
|
|
262
|
+
| Login state | ✅ | ❌ |
|
|
263
|
+
| Browser history | ✅ | ❌ |
|
|
264
|
+
|
|
265
|
+
**Result**: Log in once, stay logged in across sessions.
|
|
266
|
+
|
|
267
|
+
<br>
|
|
292
268
|
|
|
293
269
|
## 🛠️ Available Tools
|
|
294
270
|
|
|
@@ -299,26 +275,36 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
|
|
|
299
275
|
| `go_back()` | Navigate back |
|
|
300
276
|
| `go_forward()` | Navigate forward |
|
|
301
277
|
| `refresh_page()` | Reload the page |
|
|
302
|
-
| `scroll(direction, amount)` | Scroll
|
|
278
|
+
| `scroll(direction, amount)` | Scroll up/down/left/right |
|
|
303
279
|
|
|
304
280
|
### Interaction
|
|
305
281
|
| Tool | Description |
|
|
306
282
|
|------|-------------|
|
|
307
283
|
| `click(element_id)` | Click element by number |
|
|
308
284
|
| `click_at(x, y)` | Click at coordinates |
|
|
309
|
-
| `type_text(element_id, text)` | Type into input |
|
|
285
|
+
| `type_text(element_id, text)` | Type into input field |
|
|
310
286
|
| `press_key(key)` | Press keyboard key (Enter, Tab, etc.) |
|
|
311
287
|
| `hover(element_id)` | Hover over element |
|
|
312
288
|
| `scroll_to_element(element_id)` | Scroll element into view |
|
|
313
|
-
| `
|
|
289
|
+
| `upload_file(element_id, path)` | Upload a file to an input |
|
|
290
|
+
| `wait(seconds)` | Wait for page loading |
|
|
291
|
+
|
|
292
|
+
### Tab Management
|
|
293
|
+
| Tool | Description |
|
|
294
|
+
|------|-------------|
|
|
295
|
+
| `create_tab(url)` | Open a new browser tab |
|
|
296
|
+
| `switch_tab(index)` | Switch to a tab by its index |
|
|
297
|
+
| `close_tab(index)` | Close a specific tab |
|
|
298
|
+
| `list_tabs()` | List all open tabs and URLs |
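
Index-based tab tools raise one practical question: what happens to the indices when a tab closes? The sketch below models one plausible answer in plain Python. It is not the code in `tabs.py`; the class `TabList`, zero-based indexing, and the rule for picking the next active tab are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tab:
    url: str


@dataclass
class TabList:
    """Hypothetical model of index-addressed tabs with one active tab."""
    tabs: list[Tab] = field(default_factory=list)
    active: int = -1  # -1 means no tab is open

    def _check(self, index: int) -> None:
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")

    def create_tab(self, url: str) -> int:
        """Open a tab at the end of the list and make it active."""
        self.tabs.append(Tab(url))
        self.active = len(self.tabs) - 1
        return self.active

    def switch_tab(self, index: int) -> None:
        self._check(index)
        self.active = index

    def close_tab(self, index: int) -> None:
        """Remove a tab and keep `active` pointing at a valid tab."""
        self._check(index)
        del self.tabs[index]
        if not self.tabs:
            self.active = -1
        elif index < self.active or self.active >= len(self.tabs):
            # Later tabs shifted left by one, or the last tab was closed.
            self.active -= 1

    def list_tabs(self) -> list[str]:
        """Render each tab as '[index] marker url'; '*' marks the active tab."""
        return [
            f"[{i}] {'*' if i == self.active else ' '} {tab.url}"
            for i, tab in enumerate(self.tabs)
        ]


tab_list = TabList()
tab_list.create_tab("https://a.example")
tab_list.create_tab("https://b.example")
tab_list.create_tab("https://c.example")
tab_list.close_tab(0)  # remaining tabs shift down by one index
print(tab_list.list_tabs())
```

Because closing a tab renumbers the ones after it, an agent should call `list_tabs()` again before reusing an index it saw earlier.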
|
|
314
299
|
|
|
315
300
|
### Forms
|
|
316
301
|
| Tool | Description |
|
|
317
302
|
|------|-------------|
|
|
318
303
|
| `select_option(element_id, option)` | Select dropdown option |
|
|
319
304
|
| `check_checkbox(element_id)` | Toggle checkbox |
|
|
305
|
+
| `upload_file(element_id, file_path)` | Upload file to input |
|
|
320
306
|
|
|
321
|
-
### Content
|
|
307
|
+
### Content Extraction
|
|
322
308
|
| Tool | Description |
|
|
323
309
|
|------|-------------|
|
|
324
310
|
| `get_page_content()` | Get page as markdown |
|
|
@@ -335,6 +321,11 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
|
|
|
335
321
|
| `get_page_errors()` | JavaScript errors |
|
|
336
322
|
| `run_in_console(code)` | Execute JS in console |
|
|
337
323
|
| `inspect_element(id)` | Element styles/properties |
|
|
324
|
+
| `get_cookies()` | List browser cookies |
|
|
325
|
+
| `set_cookie(name, value, ...)` | Set a cookie |
|
|
326
|
+
| `delete_cookie(name)` | Remove a cookie |
|
|
327
|
+
| `clear_cookies()` | Clear all cookies |
|
|
328
|
+
| `set_viewport(width, height)` | Change window size |
|
|
338
329
|
| `get_page_performance()` | Load times, Web Vitals |
|
|
339
330
|
|
|
340
331
|
### Recording
|
|
@@ -345,7 +336,7 @@ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
|
|
|
345
336
|
| `take_snapshot()` | Save screenshot + HTML |
|
|
346
337
|
| `list_recordings()` | View saved sessions |
|
|
347
338
|
|
|
348
|
-
|
|
339
|
+
<br>
|
|
349
340
|
|
|
350
341
|
## ⚙️ Configuration
|
|
351
342
|
|
|
@@ -358,28 +349,27 @@ Configure via environment variables:
|
|
|
358
349
|
| `BROWSER_VIEWPORT_HEIGHT` | `720` | Viewport height in pixels |
|
|
359
350
|
| `BROWSER_TIMEOUT` | `30000` | Navigation timeout (ms) |
|
|
360
351
|
| `BROWSER_USER_DATA_DIR` | `~/.browsercontrol/user_data` | Browser profile path |
|
|
361
|
-
| `BROWSER_EXTENSION_PATH` |
|
|
362
|
-
| `LOG_LEVEL` | `INFO` | Logging
|
|
352
|
+
| `BROWSER_EXTENSION_PATH` | — | Path to browser extension |
|
|
353
|
+
| `LOG_LEVEL` | `INFO` | Logging verbosity |
|
|
363
354
|
|
|
364
|
-
|
|
355
|
+
**Examples:**
|
|
365
356
|
|
|
366
357
|
```bash
|
|
367
358
|
# Run with visible browser (for debugging)
|
|
368
359
|
BROWSER_HEADLESS=false browsercontrol
|
|
369
360
|
|
|
370
|
-
#
|
|
361
|
+
# Mobile viewport emulation
|
|
371
362
|
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
|
|
372
363
|
|
|
373
364
|
# Verbose logging
|
|
374
365
|
LOG_LEVEL=DEBUG browsercontrol
|
|
375
366
|
```
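
For readers curious how such variables typically map to settings, here is a minimal sketch that reads four of them with the defaults listed in the table above. It is not the contents of `config.py`; the `BrowserSettings` class and `load_settings` function are hypothetical, and variables whose defaults are not visible in this diff are left out.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserSettings:
    """Hypothetical container for a subset of the documented settings."""
    viewport_height: int
    timeout_ms: int
    user_data_dir: Path
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> BrowserSettings:
    """Read settings from an environment mapping, using the table's defaults."""
    env = os.environ if env is None else env
    return BrowserSettings(
        viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSER_TIMEOUT", "30000")),
        user_data_dir=Path(
            env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
        ).expanduser(),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"BROWSER_VIEWPORT_HEIGHT": "812", "LOG_LEVEL": "debug"})
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.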
|
|
376
367
|
|
|
377
|
-
|
|
378
|
-
|
|
379
|
-
## 📚 Examples
|
|
368
|
+
<br>
|
|
380
369
|
|
|
381
|
-
|
|
370
|
+
## 📖 Examples
|
|
382
371
|
|
|
372
|
+
### Web Research
|
|
383
373
|
```
|
|
384
374
|
You: "Go to Wikipedia and find out when Python was created"
|
|
385
375
|
|
|
@@ -391,8 +381,7 @@ Claude: *navigates to wikipedia.org*
|
|
|
391
381
|
"Python was created by Guido van Rossum and first released in 1991."
|
|
392
382
|
```
|
|
393
383
|
|
|
394
|
-
###
|
|
395
|
-
|
|
384
|
+
### Debug a Web App
|
|
396
385
|
```
|
|
397
386
|
You: "Go to my app at localhost:3000 and check for any errors"
|
|
398
387
|
|
|
@@ -408,8 +397,7 @@ Claude: *navigates to localhost:3000*
|
|
|
408
397
|
"The /api/users endpoint returned 404. Check your API route."
|
|
409
398
|
```
|
|
410
399
|
|
|
411
|
-
###
|
|
412
|
-
|
|
400
|
+
### Automated Testing with Recording
|
|
413
401
|
```
|
|
414
402
|
You: "Test the login flow on my site. Start recording."
|
|
415
403
|
|
|
@@ -421,11 +409,10 @@ Claude: *calls start_recording()*
|
|
|
421
409
|
*verifies successful redirect*
|
|
422
410
|
*calls stop_recording()*
|
|
423
411
|
|
|
424
|
-
"Login test passed! Recording saved to
|
|
412
|
+
"Login test passed! Recording saved to session_20260202.zip"
|
|
425
413
|
```
|
|
426
414
|
|
|
427
|
-
###
|
|
428
|
-
|
|
415
|
+
### Form Automation
|
|
429
416
|
```
|
|
430
417
|
You: "Fill out the contact form on example.com"
|
|
431
418
|
|
|
@@ -438,7 +425,7 @@ Claude: *navigates to example.com/contact*
|
|
|
438
425
|
"Form submitted successfully!"
|
|
439
426
|
```
|
|
440
427
|
|
|
441
|
-
|
|
428
|
+
<br>
|
|
442
429
|
|
|
443
430
|
## 🏗️ Architecture
|
|
444
431
|
|
|
@@ -456,16 +443,16 @@ Claude: *navigates to example.com/contact*
|
|
|
456
443
|
|
|
457
444
|
### How It Works
|
|
458
445
|
|
|
459
|
-
1. **AI sends command
|
|
460
|
-
2. **Server finds element
|
|
461
|
-
3. **Browser acts
|
|
462
|
-
4. **Capture state
|
|
463
|
-
5. **Annotate
|
|
464
|
-
6. **Return to AI
|
|
446
|
+
1. **AI sends command** — `click(5)`
|
|
447
|
+
2. **Server finds element** — Looks up element #5 from the last screenshot
|
|
448
|
+
3. **Browser acts** — Clicks at the element's coordinates
|
|
449
|
+
4. **Capture state** — Takes new screenshot, detects elements
|
|
450
|
+
5. **Annotate** — Draws numbered boxes on interactive elements
|
|
451
|
+
6. **Return to AI** — Sends annotated image + element list
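
Steps 2, 3, and 5 above can be sketched as a small registry that maps element numbers to bounding boxes and is rebuilt after every screenshot. This is an illustration under stated assumptions, not BrowserControl's implementation: `BoundingBox`, `ElementRegistry`, and the choice to click the center of the box are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        """Point inside the box that a click would target."""
        return (self.x + self.width / 2, self.y + self.height / 2)


class ElementRegistry:
    """Maps the numbers drawn on the last screenshot to element boxes."""

    def __init__(self) -> None:
        self._boxes: dict[int, BoundingBox] = {}

    def refresh(self, boxes: list[BoundingBox]) -> None:
        """Renumber elements from 1 after each new screenshot."""
        self._boxes = {i: box for i, box in enumerate(boxes, start=1)}

    def click_point(self, element_id: int) -> tuple[float, float]:
        """Resolve an element number to page coordinates."""
        if element_id not in self._boxes:
            raise KeyError(f"element {element_id} is not in the last screenshot")
        return self._boxes[element_id].center()


registry = ElementRegistry()
registry.refresh([BoundingBox(0, 0, 100, 40), BoundingBox(10, 50, 20, 20)])
print(registry.click_point(2))
```

Because the registry is replaced on every capture, an ID from an older screenshot may point at a different element or at nothing, which is why the lookup raises instead of guessing.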
|
|
465
452
|
|
|
466
|
-
|
|
453
|
+
<br>
|
|
467
454
|
|
|
468
|
-
##
|
|
455
|
+
## 📁 Project Structure
|
|
469
456
|
|
|
470
457
|
```
|
|
471
458
|
browsercontrol/
|
|
@@ -480,54 +467,72 @@ browsercontrol/
|
|
|
480
467
|
├── forms.py # Form handling tools
|
|
481
468
|
├── content.py # Content extraction tools
|
|
482
469
|
├── devtools.py # Developer tools
|
|
483
|
-
|
|
470
|
+
├── recording.py # Session recording tools
|
|
471
|
+
└── tabs.py # Tab management tools
|
|
484
472
|
```
|
|
485
473
|
|
|
486
|
-
|
|
474
|
+
<br>
|
|
487
475
|
|
|
488
476
|
## 🔧 Troubleshooting
|
|
489
477
|
|
|
490
|
-
|
|
478
|
+
<details>
|
|
479
|
+
<summary><b>"Missing X server" Error</b></summary>
|
|
491
480
|
|
|
492
481
|
Set `BROWSER_HEADLESS=true` or run with xvfb:
|
|
493
482
|
```bash
|
|
494
483
|
xvfb-run browsercontrol
|
|
495
484
|
```
|
|
485
|
+
</details>
|
|
496
486
|
|
|
497
|
-
|
|
487
|
+
<details>
|
|
488
|
+
<summary><b>Browser Not Starting</b></summary>
|
|
498
489
|
|
|
499
490
|
Chromium auto-installs on first run. If it fails, install manually:
|
|
500
491
|
```bash
|
|
501
492
|
python -m playwright install chromium
|
|
502
493
|
```
|
|
494
|
+
</details>
|
|
503
495
|
|
|
504
|
-
|
|
496
|
+
<details>
|
|
497
|
+
<summary><b>Session Not Persisting</b></summary>
|
|
505
498
|
|
|
506
499
|
Check that `BROWSER_USER_DATA_DIR` is writable:
|
|
507
500
|
```bash
|
|
508
501
|
ls -la ~/.browsercontrol/
|
|
509
502
|
```
|
|
503
|
+
</details>
|
|
510
504
|
|
|
511
|
-
|
|
505
|
+
<details>
|
|
506
|
+
<summary><b>Connection Refused</b></summary>
|
|
512
507
|
|
|
513
508
|
Ensure no other instance is running:
|
|
514
509
|
```bash
|
|
515
510
|
pkill -f browsercontrol
|
|
516
511
|
browsercontrol
|
|
517
512
|
```
|
|
513
|
+
</details>
|
|
518
514
|
|
|
519
|
-
|
|
515
|
+
<details>
|
|
516
|
+
<summary><b>View Session Recordings</b></summary>
|
|
517
|
+
|
|
518
|
+
Open recordings in the Playwright trace viewer:
|
|
519
|
+
```bash
|
|
520
|
+
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
|
|
521
|
+
```
|
|
522
|
+
</details>
|
|
523
|
+
|
|
524
|
+
<br>
|
|
520
525
|
|
|
521
526
|
## 🤝 Contributing
|
|
522
527
|
|
|
523
|
-
Contributions are welcome!
|
|
528
|
+
Contributions are welcome! Check out our [Contributing Guide](CONTRIBUTING.md) for details.
|
|
524
529
|
|
|
525
|
-
|
|
530
|
+
**Ideas for contributions:**
|
|
526
531
|
- [ ] Firefox/WebKit support
|
|
527
532
|
- [ ] DOM diffing (detect changes)
|
|
528
|
-
- [ ] Accessibility audit
|
|
533
|
+
- [ ] Accessibility audit tools
|
|
529
534
|
- [ ] Mobile emulation presets
|
|
530
|
-
- [ ] Cookie import/export
|
|
535
|
+
- [ ] Cookie import/export files
|
|
531
536
|
|
|
532
537
|
```bash
|
|
533
538
|
# Clone and install
|
|
@@ -542,28 +547,28 @@ uv run pytest
|
|
|
542
547
|
uv run fastmcp dev browsercontrol/server.py
|
|
543
548
|
```
|
|
544
549
|
|
|
545
|
-
|
|
550
|
+
<br>
|
|
546
551
|
|
|
547
552
|
## 📄 License
|
|
548
553
|
|
|
549
|
-
MIT License
|
|
554
|
+
[MIT License](LICENSE) — Use it however you want.
|
|
550
555
|
|
|
551
|
-
|
|
556
|
+
<br>
|
|
552
557
|
|
|
553
558
|
## 🙏 Acknowledgments
|
|
554
559
|
|
|
555
|
-
-
|
|
560
|
+
- Vision-first approach inspired by **Google's AntiGravity IDE**
|
|
556
561
|
- Built with [FastMCP](https://gofastmcp.com) and [Playwright](https://playwright.dev)
|
|
557
562
|
- Thanks to the MCP community for making AI-tool integration accessible
|
|
558
563
|
|
|
559
564
|
---
|
|
560
565
|
|
|
561
566
|
<p align="center">
|
|
562
|
-
<strong>Built
|
|
567
|
+
<strong>Built for AI agents that need to see the web.</strong>
|
|
563
568
|
</p>
|
|
564
569
|
|
|
565
570
|
<p align="center">
|
|
566
571
|
<a href="https://github.com/adityasasidhar/browsercontrol">⭐ Star on GitHub</a> •
|
|
567
|
-
<a href="https://github.com/adityasasidhar/browsercontrol/issues"
|
|
568
|
-
<a href="https://github.com/adityasasidhar/browsercontrol/issues"
|
|
572
|
+
<a href="https://github.com/adityasasidhar/browsercontrol/issues">🐛 Report Bug</a> •
|
|
573
|
+
<a href="https://github.com/adityasasidhar/browsercontrol/issues">💡 Request Feature</a>
|
|
569
574
|
</p>
|