browsercontrol 0.1.1__tar.gz
- browsercontrol-0.1.1/.gitignore +18 -0
- browsercontrol-0.1.1/CONTRIBUTING.md +84 -0
- browsercontrol-0.1.1/LICENSE +21 -0
- browsercontrol-0.1.1/PKG-INFO +569 -0
- browsercontrol-0.1.1/README.md +543 -0
- browsercontrol-0.1.1/assets/logo.png +0 -0
- browsercontrol-0.1.1/browsercontrol/__init__.py +8 -0
- browsercontrol-0.1.1/browsercontrol/__main__.py +19 -0
- browsercontrol-0.1.1/browsercontrol/browser.py +431 -0
- browsercontrol-0.1.1/browsercontrol/config.py +61 -0
- browsercontrol-0.1.1/browsercontrol/server.py +89 -0
- browsercontrol-0.1.1/browsercontrol/tools/__init__.py +17 -0
- browsercontrol-0.1.1/browsercontrol/tools/content.py +135 -0
- browsercontrol-0.1.1/browsercontrol/tools/devtools.py +355 -0
- browsercontrol-0.1.1/browsercontrol/tools/forms.py +96 -0
- browsercontrol-0.1.1/browsercontrol/tools/interaction.py +204 -0
- browsercontrol-0.1.1/browsercontrol/tools/navigation.py +163 -0
- browsercontrol-0.1.1/browsercontrol/tools/recording.py +221 -0
- browsercontrol-0.1.1/pyproject.toml +48 -0
- browsercontrol-0.1.1/uv.lock +1896 -0
|
@@ -0,0 +1,84 @@
|
|
|
1
|
+
# Contributing to BrowserControl
|
|
2
|
+
|
|
3
|
+
We love your input! We want to make contributing to BrowserControl as easy and transparent as possible, whether it's:
|
|
4
|
+
|
|
5
|
+
- Reporting a bug
|
|
6
|
+
- Discussing the current state of the code
|
|
7
|
+
- Submitting a fix
|
|
8
|
+
- Proposing new features
|
|
9
|
+
- Becoming a maintainer
|
|
10
|
+
|
|
11
|
+
## π Quick Start for Contributors
|
|
12
|
+
|
|
13
|
+
1. **Fork the repo** and clone it locally
|
|
14
|
+
2. **Install `uv`** (our package manager):
|
|
15
|
+
```bash
|
|
16
|
+
curl -LsSf https://astral.sh/uv/install.sh | sh
|
|
17
|
+
```
|
|
18
|
+
3. **Install dependencies**:
|
|
19
|
+
```bash
|
|
20
|
+
uv sync
|
|
21
|
+
```
|
|
22
|
+
4. **Install Playwright browsers**:
|
|
23
|
+
```bash
|
|
24
|
+
uv run playwright install chromium
|
|
25
|
+
```
|
|
26
|
+
5. **Run the server in dev mode**:
|
|
27
|
+
```bash
|
|
28
|
+
uv run fastmcp dev browsercontrol/server.py
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## 🛠️ Development Workflow
|
|
32
|
+
|
|
33
|
+
We use [uv](https://github.com/astral-sh/uv) for dependency management and packaging. It's fast and reliable.
|
|
34
|
+
|
|
35
|
+
### Project Structure
|
|
36
|
+
- `browsercontrol/server.py`: Main MCP server definition
|
|
37
|
+
- `browsercontrol/browser.py`: Core logic (Playwright + Set of Marks)
|
|
38
|
+
- `browsercontrol/tools/`: Tool implementations split by category
|
|
39
|
+
|
|
40
|
+
### Making Changes
|
|
41
|
+
1. Create a branch for your feature: `git checkout -b feature/amazing-feature`
|
|
42
|
+
2. Implement your changes
|
|
43
|
+
3. Run tests (see below)
|
|
44
|
+
4. Commit your changes. We like [Conventional Commits](https://www.conventionalcommits.org/).
|
|
45
|
+
- `feat: add new scrolling tool`
|
|
46
|
+
- `fix: handle localhost connection refused`
|
|
47
|
+
- `docs: update troubleshooting guide`
|
|
48
|
+
|
|
49
|
+
## 🧪 Testing
|
|
50
|
+
|
|
51
|
+
We use `pytest`. Please ensure all tests pass before submitting a PR.
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
# Run all tests
|
|
55
|
+
uv run pytest
|
|
56
|
+
|
|
57
|
+
# Run specific test file
|
|
58
|
+
uv run pytest tests/test_navigation.py
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
If you add a new tool or feature, please add a corresponding test case covering:
|
|
62
|
+
- Happy path (it works)
|
|
63
|
+
- Error handling (it fails gracefully)
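As a rough illustration of the two cases, here is a sketch of a test module. `clamp_scroll_amount` is a hypothetical helper invented for this example, not a function from the package; the tests are plain `test_` functions with bare asserts, which `pytest` will collect.

```python
def clamp_scroll_amount(amount: int, maximum: int = 2000) -> int:
    """Hypothetical helper: validate a scroll distance in pixels."""
    if amount <= 0:
        raise ValueError("amount must be a positive number of pixels")
    return min(amount, maximum)


def test_clamp_scroll_amount_happy_path():
    # Happy path: valid input passes through, oversized input is capped.
    assert clamp_scroll_amount(500) == 500
    assert clamp_scroll_amount(5000) == 2000


def test_clamp_scroll_amount_rejects_non_positive():
    # Error handling: bad input fails with a clear message instead of scrolling.
    try:
        clamp_scroll_amount(0)
    except ValueError as exc:
        assert "positive" in str(exc)
    else:
        raise AssertionError("expected ValueError")
```

In a real test file, `pytest.raises(ValueError)` is the more idiomatic way to write the second case.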
|
|
64
|
+
|
|
65
|
+
## π Pull Request Process
|
|
66
|
+
|
|
67
|
+
1. Update the README.md with details of any changes to the interface, including new environment variables, exposed ports, useful file locations, and container parameters.
|
|
68
|
+
2. Increase the version numbers in any example files and the README.md to the new version that this Pull Request would represent.
|
|
69
|
+
3. You may merge the Pull Request once you have the sign-off of two other developers. If you do not have permission to merge, ask the second reviewer to merge it for you.
|
|
70
|
+
|
|
71
|
+
## π Reporting Bugs
|
|
72
|
+
|
|
73
|
+
Bugs are tracked as GitHub issues. When filing an issue, please explain the problem and include additional details to help maintainers reproduce the problem:
|
|
74
|
+
|
|
75
|
+
- Use a clear and descriptive title
|
|
76
|
+
- Describe the exact steps which reproduce the problem
|
|
77
|
+
- Provide specific examples to demonstrate the steps
|
|
78
|
+
- Describe the behavior you observed after following the steps
|
|
79
|
+
- Explain which behavior you expected to see instead and why
|
|
80
|
+
- Include screenshots/logs if possible
|
|
81
|
+
|
|
82
|
+
## π License
|
|
83
|
+
|
|
84
|
+
By contributing, you agree that your contributions will be licensed under the project's MIT License.
|
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Aditya Sasidhar
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
|
@@ -0,0 +1,569 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: browsercontrol
|
|
3
|
+
Version: 0.1.1
|
|
4
|
+
Summary: MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
|
|
5
|
+
Project-URL: Homepage, https://github.com/adityasasidhar/browsercontrol
|
|
6
|
+
Project-URL: Repository, https://github.com/adityasasidhar/browsercontrol
|
|
7
|
+
Author: Aditya Sasidhar
|
|
8
|
+
License: MIT
|
|
9
|
+
License-File: LICENSE
|
|
10
|
+
Keywords: agent,ai,automation,browser,llm,mcp,playwright
|
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
|
12
|
+
Classifier: Intended Audience :: Developers
|
|
13
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
14
|
+
Classifier: Programming Language :: Python :: 3
|
|
15
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
16
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
17
|
+
Classifier: Programming Language :: Python :: 3.13
|
|
18
|
+
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
|
|
19
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
|
20
|
+
Requires-Python: >=3.11
|
|
21
|
+
Requires-Dist: fastmcp>=2.14.2
|
|
22
|
+
Requires-Dist: markdownify>=0.14.1
|
|
23
|
+
Requires-Dist: pillow>=11.0.0
|
|
24
|
+
Requires-Dist: playwright>=1.49.0
|
|
25
|
+
Description-Content-Type: text/markdown
|
|
26
|
+
|
|
27
|
+
<p align="center">
|
|
28
|
+
<img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="120">
|
|
29
|
+
</p>
|
|
30
|
+
|
|
31
|
+
<h1 align="center">π BrowserControl</h1>
|
|
32
|
+
|
|
33
|
+
<p align="center">
|
|
34
|
+
<strong>Give your AI agent real browser superpowers.</strong>
|
|
35
|
+
</p>
|
|
36
|
+
|
|
37
|
+
<p align="center">
|
|
38
|
+
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
|
|
39
|
+
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
|
|
40
|
+
<a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-Compatible-purple.svg" alt="MCP"></a>
|
|
41
|
+
<a href="https://github.com/adityasasidhar/browsercontrol"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
|
|
42
|
+
</p>
|
|
43
|
+
|
|
44
|
+
<p align="center">
|
|
45
|
+
<a href="#-quick-start">Quick Start</a> •
|
|
46
|
+
<a href="#-features">Features</a> •
|
|
47
|
+
<a href="#-available-tools">Tools</a> •
|
|
48
|
+
<a href="#%EF%B8%8F-configuration">Configuration</a> •
|
|
49
|
+
<a href="#-examples">Examples</a>
|
|
50
|
+
</p>
|
|
51
|
+
|
|
52
|
+
---
|
|
53
|
+
|
|
54
|
+
Ever wished Claude, Gemini, or your custom AI agent could actually browse the web? Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?
|
|
55
|
+
|
|
56
|
+
**BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach** inspired by Google's AntiGravity IDE.
|
|
57
|
+
|
|
58
|
+
## ✨ What Makes This Different
|
|
59
|
+
|
|
60
|
+
| Traditional Web Access | BrowserControl |
|
|
61
|
+
|------------------------|----------------|
|
|
62
|
+
| Fetch static HTML | See the **rendered page** |
|
|
63
|
+
| Parse complex DOM | Point at **numbered elements** |
|
|
64
|
+
| Guess at selectors | Just say **"click 5"** |
|
|
65
|
+
| No JavaScript support | Full **dynamic content** |
|
|
66
|
+
| No login persistence | **Persistent sessions** |
|
|
67
|
+
| No debugging tools | **Console, Network, Errors** |
|
|
68
|
+
|
|
69
|
+
### 🎯 The Secret: Set of Marks (SoM)
|
|
70
|
+
|
|
71
|
+
Every screenshot comes annotated with **numbered red boxes** on interactive elements:
|
|
72
|
+
|
|
73
|
+
```
|
|
74
|
+
Found 15 interactive elements:
|
|
75
|
+
[1] button - Sign In
|
|
76
|
+
[2] input - Search...
|
|
77
|
+
[3] a - Products
|
|
78
|
+
[4] a - Pricing
|
|
79
|
+
[5] button - Get Started
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**
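On the agent side, the element list can be turned into a lookup table before choosing an ID. The sketch below assumes the summary lines look exactly like the sample above (`[id] tag - label`); the real server output may differ, and `MarkedElement` and `parse_element_summary` are hypothetical names, not part of the package.

```python
import re
from typing import NamedTuple


class MarkedElement(NamedTuple):
    element_id: int
    tag: str
    label: str


# Matches lines such as "[1] button - Sign In" (format assumed from the sample).
_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+-\s+(.*)$")


def parse_element_summary(summary: str) -> dict[int, MarkedElement]:
    """Index the numbered lines of a summary by element ID; other lines are skipped."""
    elements: dict[int, MarkedElement] = {}
    for line in summary.splitlines():
        match = _LINE.match(line.strip())
        if match:
            element_id = int(match.group(1))
            elements[element_id] = MarkedElement(
                element_id, match.group(2), match.group(3).strip()
            )
    return elements


summary = """Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products"""

elements = parse_element_summary(summary)
print(elements[1].label)
```

With the table in hand, picking the "Sign In" button is a dictionary lookup followed by `click(1)`.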
|
|
83
|
+
|
|
84
|
+
---
|
|
85
|
+
|
|
86
|
+
## π Why BrowserControl Beats Every Alternative
|
|
87
|
+
|
|
88
|
+
### Head-to-Head Comparison
|
|
89
|
+
|
|
90
|
+
| Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|
|
91
|
+
|---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
|
|
92
|
+
| **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
|
|
93
|
+
| **No Extra AI Calls** | ✅ Zero | ❌ Parses tree | ❌ GPT-4V per action | ❌ Vision model | ❌ Query model |
|
|
94
|
+
| **Developer Tools** | ✅ 6 tools | ❌ None | ❌ None | ❌ None | ❌ None |
|
|
95
|
+
| **Session Recording** | ✅ Built-in | ❌ Manual | ❌ None | ❌ None | ❌ None |
|
|
96
|
+
| **Persistent Sessions** | ✅ Automatic | ⚠️ Manual setup | ❌ None | ❌ None | ❌ None |
|
|
97
|
+
| **MCP Native** | ✅ FastMCP | ✅ Official | ❌ Python SDK | ⚠️ Custom | ❌ REST API |
|
|
98
|
+
| **Install Complexity** | ✅ `pip install` | ⚠️ npx + config | ❌ Docker + setup | ⚠️ Docker | ❌ Cloud signup |
|
|
99
|
+
| **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
|
|
100
|
+
| **Cost per Action** | ✅ $0 | ✅ $0 | ❌ ~$0.01-0.05 | ❌ ~$0.01-0.05 | ❌ API fees |
|
|
101
|
+
| **Offline/Local** | ✅ 100% local | ✅ Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | ❌ Cloud only |
|
|
102
|
+
|
|
103
|
+
### 🎯 Key Advantages
|
|
104
|
+
|
|
105
|
+
#### 1. **Token Efficiency = Faster + Cheaper**
|
|
106
|
+
|
|
107
|
+
```
|
|
108
|
+
Other tools send: BrowserControl sends:
|
|
109
|
+
───────────────── ───────────────────
|
|
110
|
+
Full DOM tree "click(5)"
|
|
111
|
+
(5,000+ tokens) (3 tokens)
|
|
112
|
+
or
|
|
113
|
+
Base64 screenshot Element ID + summary
|
|
114
|
+
(10,000+ tokens) (100 tokens)
|
|
115
|
+
```
|
|
116
|
+
|
|
117
|
+
**Result**: 50-100x fewer tokens per action = faster responses, lower costs.
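The 50-100x figure follows directly from the estimates in the comparison above. A quick check, using the README's own rough token counts (these are estimates, not measurements, and the variable names are illustrative):

```python
# Token counts are the README's own rough estimates, not measurements.
OTHER_TOOLS = {"full DOM tree": 5_000, "base64 screenshot": 10_000}
BROWSERCONTROL_SUMMARY = 100  # element ID + summary

ratios = {
    name: tokens / BROWSERCONTROL_SUMMARY for name, tokens in OTHER_TOOLS.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.0f}x more tokens than the SoM summary")
```

Actual savings depend on the page, the tokenizer, and how the client counts image input.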
|
|
118
|
+
|
|
119
|
+
#### 2. **No Extra AI Calls Required**
|
|
120
|
+
|
|
121
|
+
| Tool | AI Calls per Click |
|
|
122
|
+
|------|-------------------|
|
|
123
|
+
| **BrowserControl** | 0 (just `click(5)`) |
|
|
124
|
+
| Stagehand | 1-2 (vision + action) |
|
|
125
|
+
| Browser-Use | 1-2 (vision + planning) |
|
|
126
|
+
| AgentQL | 1 (query interpretation) |
|
|
127
|
+
|
|
128
|
+
**Result**: No vision API costs, no rate limits, works offline.
|
|
129
|
+
|
|
130
|
+
#### 3. **Developer Tools No One Else Has**
|
|
131
|
+
|
|
132
|
+
```python
|
|
133
|
+
# Only BrowserControl can do this:
|
|
134
|
+
get_console_logs() # See browser errors
|
|
135
|
+
get_network_requests() # Monitor API calls
|
|
136
|
+
get_page_errors() # Catch JS exceptions
|
|
137
|
+
run_in_console(code) # Debug in real-time
|
|
138
|
+
inspect_element(5) # Get computed styles
|
|
139
|
+
get_page_performance() # Core Web Vitals
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
**Other tools**: Navigate, click, type... that's it.
|
|
143
|
+
|
|
144
|
+
#### 4. **Session Recording Built-In**
|
|
145
|
+
|
|
146
|
+
```
|
|
147
|
+
start_recording() → Browse around → stop_recording()
|
|
148
|
+
↓
|
|
149
|
+
📹 session_20260108.zip
|
|
150
|
+
(View with Playwright trace viewer)
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
**Other tools**: No recording. Debug from memory.
|
|
154
|
+
|
|
155
|
+
#### 5. **True Persistence**
|
|
156
|
+
|
|
157
|
+
| What Persists | BrowserControl | Others |
|
|
158
|
+
|---------------|:--------------:|:------:|
|
|
159
|
+
| Cookies | ✅ | ❌ |
|
|
160
|
+
| localStorage | ✅ | ❌ |
|
|
161
|
+
| Session tokens | ✅ | ❌ |
|
|
162
|
+
| Login state | ✅ | ❌ |
|
|
163
|
+
| Browser history | ✅ | ❌ |
|
|
164
|
+
|
|
165
|
+
**Result**: Log in once, stay logged in across sessions.
|
|
166
|
+
|
|
167
|
+
#### 6. **Simpler Mental Model**
|
|
168
|
+
|
|
169
|
+
```
|
|
170
|
+
❌ Other tools:
|
|
171
|
+
"Find the button with class 'btn-primary' that contains text 'Submit'
|
|
172
|
+
and is a descendant of form#contact-form..."
|
|
173
|
+
|
|
174
|
+
✅ BrowserControl:
|
|
175
|
+
"click(7)"
|
|
176
|
+
```
|
|
177
|
+
|
|
178
|
+
### π Real-World Performance
|
|
179
|
+
|
|
180
|
+
| Scenario | BrowserControl | Vision-Based Tools |
|
|
181
|
+
|----------|:--------------:|:------------------:|
|
|
182
|
+
| Click a button | ~50ms | ~2-5 seconds |
|
|
183
|
+
| Fill a form (5 fields) | ~500ms | ~15-30 seconds |
|
|
184
|
+
| Navigate + act | ~1 second | ~5-10 seconds |
|
|
185
|
+
| Debug console errors | ✅ Instant | ❌ Not possible |
|
|
186
|
+
|
|
187
|
+
### 💰 Cost Comparison (1000 actions/month)
|
|
188
|
+
|
|
189
|
+
| Tool | Monthly Cost |
|
|
190
|
+
|------|-------------|
|
|
191
|
+
| **BrowserControl** | **$0** (fully local) |
|
|
192
|
+
| Stagehand (GPT-4V) | ~$30-50 |
|
|
193
|
+
| Browser-Use (Claude Vision) | ~$20-40 |
|
|
194
|
+
| AgentQL | ~$50+ (API fees) |
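For a sense of where such monthly figures come from, the per-action estimate quoted in the head-to-head table (~$0.01-0.05) can be scaled to 1000 actions. This is back-of-the-envelope arithmetic on the README's own estimates; `monthly_cost_range` is an illustrative name.

```python
def monthly_cost_range(
    actions: int, per_action: tuple[float, float]
) -> tuple[float, float]:
    """Scale a (low, high) per-action cost estimate to a monthly total."""
    low, high = per_action
    return (actions * low, actions * high)


low, high = monthly_cost_range(1000, (0.01, 0.05))
print(f"${low:.0f}-${high:.0f} per month")
```

The vision-based rows in the table fall inside this range; real bills depend on the model, image size, and retries.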
|
|
195
|
+
|
|
196
|
+
---
|
|
197
|
+
|
|
198
|
+
## π Quick Start
|
|
199
|
+
|
|
200
|
+
### Installation
|
|
201
|
+
|
|
202
|
+
```bash
|
|
203
|
+
# Install with pip
|
|
204
|
+
pip install browsercontrol
|
|
205
|
+
|
|
206
|
+
# Or with uv (recommended)
|
|
207
|
+
uv add browsercontrol
|
|
208
|
+
|
|
209
|
+
# That's it! Chromium is auto-installed on first run
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
### Run the Server
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
# Using the CLI
|
|
216
|
+
browsercontrol
|
|
217
|
+
|
|
218
|
+
# Or as a module
|
|
219
|
+
python -m browsercontrol
|
|
220
|
+
|
|
221
|
+
# Or with FastMCP
|
|
222
|
+
fastmcp run browsercontrol.server:mcp
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
### Connect to Claude Desktop
|
|
226
|
+
|
|
227
|
+
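Add to your Claude Desktop config file (`~/.config/Claude/claude_desktop_config.json` on Linux, `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):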
|
|
228
|
+
|
|
229
|
+
```json
|
|
230
|
+
{
|
|
231
|
+
"mcpServers": {
|
|
232
|
+
"browsercontrol": {
|
|
233
|
+
"command": "browsercontrol"
|
|
234
|
+
}
|
|
235
|
+
}
|
|
236
|
+
}
|
|
237
|
+
```
|
|
238
|
+
|
|
239
|
+
Then just ask Claude:
|
|
240
|
+
|
|
241
|
+
> *"Go to GitHub and star the browsercontrol repo"*
|
|
242
|
+
|
|
243
|
+
Claude will navigate, find the star button, and click it, showing you screenshots along the way!
|
|
244
|
+
|
|
245
|
+
---
|
|
246
|
+
|
|
247
|
+
## 🎯 Features
|
|
248
|
+
|
|
249
|
+
### 1. Set of Marks (SoM) - Vision-First Interaction
|
|
250
|
+
|
|
251
|
+
Every action returns an annotated screenshot with numbered elements. Your AI agent can:
|
|
252
|
+
- **See** the page exactly as a human would
|
|
253
|
+
- **Identify** clickable elements by number
|
|
254
|
+
- **Act** with simple commands like `click(5)`
|
|
255
|
+
|
|
256
|
+
### 2. 🔧 Developer Tools
|
|
257
|
+
|
|
258
|
+
Built-in debugging tools for web development:
|
|
259
|
+
|
|
260
|
+
| Tool | Description |
|
|
261
|
+
|------|-------------|
|
|
262
|
+
| `get_console_logs()` | Capture browser console (errors, warnings, logs) |
|
|
263
|
+
| `get_network_requests()` | Monitor API calls, status codes, timing |
|
|
264
|
+
| `get_page_errors()` | See JavaScript exceptions and crashes |
|
|
265
|
+
| `run_in_console(code)` | Execute JS in browser console |
|
|
266
|
+
| `inspect_element(id)` | Get computed styles, dimensions, properties |
|
|
267
|
+
| `get_page_performance()` | Page load time, Core Web Vitals, memory |
|
|
268
|
+
|
|
269
|
+
### 3. 🎬 Session Recording
|
|
270
|
+
|
|
271
|
+
Record browser sessions for debugging and documentation:
|
|
272
|
+
|
|
273
|
+
| Tool | Description |
|
|
274
|
+
|------|-------------|
|
|
275
|
+
| `start_recording()` | Begin recording the session |
|
|
276
|
+
| `stop_recording()` | Save recording (Playwright trace format) |
|
|
277
|
+
| `take_snapshot()` | Save screenshot + HTML + URL |
|
|
278
|
+
| `list_recordings()` | View all saved sessions |
|
|
279
|
+
|
|
280
|
+
View recordings with:
|
|
281
|
+
```bash
|
|
282
|
+
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
|
|
283
|
+
```
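To find the most recent trace to open, a short script can list the saved archives. This sketch assumes recordings are `.zip` files stored directly under `~/.browsercontrol/recordings`, as the command above suggests; `list_trace_files` is a hypothetical helper and not the server's `list_recordings()` tool.

```python
from pathlib import Path


def list_trace_files(recordings_dir: Path) -> list[Path]:
    """Return .zip traces in the directory, newest first; a missing directory yields []."""
    if not recordings_dir.is_dir():
        return []
    traces = [p for p in recordings_dir.iterdir() if p.suffix == ".zip"]
    return sorted(traces, key=lambda p: p.stat().st_mtime, reverse=True)


# Assumption: default location taken from the show-trace command above.
recordings_dir = Path("~/.browsercontrol/recordings").expanduser()
for trace in list_trace_files(recordings_dir):
    print(f"npx playwright show-trace {trace}")
```

Each printed line is a ready-to-run viewer command for one recording.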
|
|
284
|
+
|
|
285
|
+
### 4. 💾 Persistent Sessions
|
|
286
|
+
|
|
287
|
+
- Cookies, localStorage, and session data persist across restarts
|
|
288
|
+
- Stay logged into websites
|
|
289
|
+
- Maintain shopping carts, preferences, etc.
|
|
290
|
+
|
|
291
|
+
---
|
|
292
|
+
|
|
293
|
+
## 🛠️ Available Tools
|
|
294
|
+
|
|
295
|
+
### Navigation
|
|
296
|
+
| Tool | Description |
|
|
297
|
+
|------|-------------|
|
|
298
|
+
| `navigate_to(url)` | Go to a URL |
|
|
299
|
+
| `go_back()` | Navigate back |
|
|
300
|
+
| `go_forward()` | Navigate forward |
|
|
301
|
+
| `refresh_page()` | Reload the page |
|
|
302
|
+
| `scroll(direction, amount)` | Scroll the page |
|
|
303
|
+
|
|
304
|
+
### Interaction
|
|
305
|
+
| Tool | Description |
|
|
306
|
+
|------|-------------|
|
|
307
|
+
| `click(element_id)` | Click element by number |
|
|
308
|
+
| `click_at(x, y)` | Click at coordinates |
|
|
309
|
+
| `type_text(element_id, text)` | Type into input |
|
|
310
|
+
| `press_key(key)` | Press keyboard key (Enter, Tab, etc.) |
|
|
311
|
+
| `hover(element_id)` | Hover over element |
|
|
312
|
+
| `scroll_to_element(element_id)` | Scroll element into view |
|
|
313
|
+
| `wait(seconds)` | Wait for loading |
|
|
314
|
+
|
|
315
|
+
### Forms
|
|
316
|
+
| Tool | Description |
|
|
317
|
+
|------|-------------|
|
|
318
|
+
| `select_option(element_id, option)` | Select dropdown option |
|
|
319
|
+
| `check_checkbox(element_id)` | Toggle checkbox |
|
|
320
|
+
|
|
321
|
+
### Content
|
|
322
|
+
| Tool | Description |
|
|
323
|
+
|------|-------------|
|
|
324
|
+
| `get_page_content()` | Get page as markdown |
|
|
325
|
+
| `get_text(element_id)` | Get element text |
|
|
326
|
+
| `get_page_info()` | Get URL and title |
|
|
327
|
+
| `run_javascript(script)` | Execute JavaScript |
|
|
328
|
+
| `screenshot(annotate, full_page)` | Take screenshot |
|
|
329
|
+
|
|
330
|
+
### Developer Tools
|
|
331
|
+
| Tool | Description |
|
|
332
|
+
|------|-------------|
|
|
333
|
+
| `get_console_logs()` | Browser console output |
|
|
334
|
+
| `get_network_requests()` | API calls and responses |
|
|
335
|
+
| `get_page_errors()` | JavaScript errors |
|
|
336
|
+
| `run_in_console(code)` | Execute JS in console |
|
|
337
|
+
| `inspect_element(id)` | Element styles/properties |
|
|
338
|
+
| `get_page_performance()` | Load times, Web Vitals |
|
|
339
|
+
|
|
340
|
+
### Recording
|
|
341
|
+
| Tool | Description |
|
|
342
|
+
|------|-------------|
|
|
343
|
+
| `start_recording()` | Begin session recording |
|
|
344
|
+
| `stop_recording()` | Save recording |
|
|
345
|
+
| `take_snapshot()` | Save screenshot + HTML |
|
|
346
|
+
| `list_recordings()` | View saved sessions |
|
|
347
|
+
|
|
348
|
+
---
|
|
349
|
+
|
|
350
|
+
## ⚙️ Configuration
|
|
351
|
+
|
|
352
|
+
Configure via environment variables:
|
|
353
|
+
|
|
354
|
+
| Variable | Default | Description |
|
|
355
|
+
|----------|---------|-------------|
|
|
356
|
+
| `BROWSER_HEADLESS` | `true` | Run without visible window |
|
|
357
|
+
| `BROWSER_VIEWPORT_WIDTH` | `1280` | Viewport width in pixels |
|
|
358
|
+
| `BROWSER_VIEWPORT_HEIGHT` | `720` | Viewport height in pixels |
|
|
359
|
+
| `BROWSER_TIMEOUT` | `30000` | Navigation timeout (ms) |
|
|
360
|
+
| `BROWSER_USER_DATA_DIR` | `~/.browsercontrol/user_data` | Browser profile path |
|
|
361
|
+
| `BROWSER_EXTENSION_PATH` | - | Path to browser extension |
|
|
362
|
+
| `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
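For reference, the table can be read as a small settings loader. The sketch below is not the package's `config.py`; `BrowserSettings` and `load_settings` are hypothetical names, and the accepted spellings for boolean values are an assumption. Only the variable names and defaults come from the table.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BrowserSettings:
    headless: bool
    viewport_width: int
    viewport_height: int
    timeout_ms: int
    user_data_dir: Path
    extension_path: Optional[Path]
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> BrowserSettings:
    """Read the variables from the table, falling back to the documented defaults."""
    extension = env.get("BROWSER_EXTENSION_PATH")
    return BrowserSettings(
        # Assumption: "true"/"1"/"yes" (any case) mean headless; the real parser may differ.
        headless=env.get("BROWSER_HEADLESS", "true").strip().lower() in {"true", "1", "yes"},
        viewport_width=int(env.get("BROWSER_VIEWPORT_WIDTH", "1280")),
        viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", "720")),
        timeout_ms=int(env.get("BROWSER_TIMEOUT", "30000")),
        user_data_dir=Path(
            env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
        ).expanduser(),
        extension_path=Path(extension).expanduser() if extension else None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


settings = load_settings({"BROWSER_HEADLESS": "false", "BROWSER_VIEWPORT_WIDTH": "375"})
print(settings.headless, settings.viewport_width, settings.viewport_height)
```

Passing the environment as a mapping keeps the loader easy to test without touching the real process environment.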
|
|
363
|
+
|
|
364
|
+
### Examples
|
|
365
|
+
|
|
366
|
+
```bash
|
|
367
|
+
# Run with visible browser (for debugging)
|
|
368
|
+
BROWSER_HEADLESS=false browsercontrol
|
|
369
|
+
|
|
370
|
+
# Custom viewport for mobile testing
|
|
371
|
+
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
|
|
372
|
+
|
|
373
|
+
# Verbose logging
|
|
374
|
+
LOG_LEVEL=DEBUG browsercontrol
|
|
375
|
+
```
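When the server is launched by an MCP client rather than from a shell, the same variables can usually be supplied through the server entry's `env` field. The sketch below builds such an entry; it assumes the client follows the common `mcpServers` layout with `command` and `env` keys, and `server_entry` is a hypothetical helper.

```python
import json


def server_entry(command: str = "browsercontrol", **env: str) -> dict:
    """Build one mcpServers entry; environment variables are optional."""
    entry: dict = {"command": command}
    if env:
        entry["env"] = dict(env)
    return entry


config = {
    "mcpServers": {
        "browsercontrol": server_entry(BROWSER_HEADLESS="false", LOG_LEVEL="DEBUG")
    }
}
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the client's config file by hand.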
|
|
376
|
+
|
|
377
|
+
---
|
|
378
|
+
|
|
379
|
+
## π Examples
|
|
380
|
+
|
|
381
|
+
### Example 1: Web Research
|
|
382
|
+
|
|
383
|
+
```
|
|
384
|
+
You: "Go to Wikipedia and find out when Python was created"
|
|
385
|
+
|
|
386
|
+
Claude: *navigates to wikipedia.org*
|
|
387
|
+
*types "Python programming language" in search*
|
|
388
|
+
*clicks search result*
|
|
389
|
+
*reads the page*
|
|
390
|
+
|
|
391
|
+
"Python was created by Guido van Rossum and first released in 1991."
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
### Example 2: Debug a Web App
|
|
395
|
+
|
|
396
|
+
```
|
|
397
|
+
You: "Go to my app at localhost:3000 and check for any errors"
|
|
398
|
+
|
|
399
|
+
Claude: *navigates to localhost:3000*
|
|
400
|
+
*calls get_console_logs()*
|
|
401
|
+
|
|
402
|
+
"I found 2 errors:
|
|
403
|
+
[ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
|
|
404
|
+
[ERROR] Failed to load resource: 404 /api/users"
|
|
405
|
+
|
|
406
|
+
*calls get_network_requests()*
|
|
407
|
+
|
|
408
|
+
"The /api/users endpoint returned 404. Check your API route."
|
|
409
|
+
```
|
|
410
|
+
|
|
411
|
+
### Example 3: Automated Testing
|
|
412
|
+
|
|
413
|
+
```
|
|
414
|
+
You: "Test the login flow on my site. Start recording."
|
|
415
|
+
|
|
416
|
+
Claude: *calls start_recording()*
|
|
417
|
+
*navigates to site*
|
|
418
|
+
*types email in element 3*
|
|
419
|
+
*types password in element 4*
|
|
420
|
+
*clicks Login button (element 5)*
|
|
421
|
+
*verifies successful redirect*
|
|
422
|
+
*calls stop_recording()*
|
|
423
|
+
|
|
424
|
+
"Login test passed! Recording saved to session_20260108.zip"
|
|
425
|
+
```
|
|
426
|
+
|
|
427
|
+
### Example 4: Form Automation
|
|
428
|
+
|
|
429
|
+
```
|
|
430
|
+
You: "Fill out the contact form on example.com"
|
|
431
|
+
|
|
432
|
+
Claude: *navigates to example.com/contact*
|
|
433
|
+
*types name in element 2*
|
|
434
|
+
*types email in element 3*
|
|
435
|
+
*types message in element 4*
|
|
436
|
+
*clicks Submit (element 5)*
|
|
437
|
+
|
|
438
|
+
"Form submitted successfully!"
|
|
439
|
+
```
|
|
440
|
+
|
|
441
|
+
---
|
|
442
|
+
|
|
443
|
+
## 🏗️ Architecture
|
|
444
|
+
|
|
445
|
+
```
|
|
446
|
+
┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│    AI Agent     │────▶│  BrowserControl  │────▶│   Browser   │
│ (Claude/Gemini) │◀────│    MCP Server    │◀────│  (Chromium) │
└─────────────────┘     └──────────────────┘     └─────────────┘
         │                       │                      │
         │ "click(5)"            │ mouse.click()        │
         ├───────────────────────┼──────────────────────┤
         │ [annotated            │ [screenshot +        │
         │  screenshot]          │  element map]        │
|
|
455
|
+
```
|
|
456
|
+
|
|
457
|
+
### How It Works
|
|
458
|
+
|
|
459
|
+
1. **AI sends command**: `click(5)`
|
|
460
|
+
2. **Server finds element**: Looks up element #5 from the last screenshot
|
|
461
|
+
3. **Browser acts**: Clicks at the element's coordinates
|
|
462
|
+
4. **Capture state**: Takes new screenshot, detects elements
|
|
463
|
+
5. **Annotate**: Draws numbered boxes on interactive elements
|
|
464
|
+
6. **Return to AI**: Sends annotated image + element list
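Steps 2 and 3 can be pictured as a lookup followed by a click at the center of the element's bounding box. The sketch below is an illustration under stated assumptions, not the code in `browser.py`: `ElementBox` and `resolve_click_point` are hypothetical names, and clicking the box center is an assumed strategy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementBox:
    """Bounding box of one marked element, in viewport pixels (hypothetical type)."""

    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def resolve_click_point(
    element_map: dict[int, ElementBox], element_id: int
) -> tuple[float, float]:
    """Look up the element recorded with the last screenshot and return where to click."""
    try:
        box = element_map[element_id]
    except KeyError:
        raise ValueError(
            f"Unknown element ID {element_id}; take a new screenshot to refresh the map"
        ) from None
    return box.center()


element_map = {5: ElementBox(x=100, y=40, width=80, height=20)}
print(resolve_click_point(element_map, 5))
```

Because IDs are only meaningful for the most recent screenshot, an unknown ID is reported as an error rather than guessed at.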
|
|
465
|
+
|
|
466
|
+
---
|
|
467
|
+
|
|
468
|
+
## 📦 Project Structure
|
|
469
|
+
|
|
470
|
+
```
|
|
471
|
+
browsercontrol/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── server.py            # MCP server setup
├── browser.py           # BrowserManager with SoM
├── config.py            # Environment configuration
└── tools/
    ├── navigation.py    # Navigation tools
    ├── interaction.py   # Click, type, hover tools
    ├── forms.py         # Form handling tools
    ├── content.py       # Content extraction tools
    ├── devtools.py      # Developer tools
    └── recording.py     # Session recording tools
|
|
484
|
+
```
|
|
485
|
+
|
|
486
|
+
---
|
|
487
|
+
|
|
488
|
+
## 🔧 Troubleshooting
|
|
489
|
+
|
|
490
|
+
### "Missing X server" Error
|
|
491
|
+
|
|
492
|
+
Set `BROWSER_HEADLESS=true` or run with xvfb:
|
|
493
|
+
```bash
|
|
494
|
+
xvfb-run browsercontrol
|
|
495
|
+
```
|
|
496
|
+
|
|
497
|
+
### Browser Not Starting
|
|
498
|
+
|
|
499
|
+
Chromium auto-installs on first run. If it fails, install manually:
|
|
500
|
+
```bash
|
|
501
|
+
python -m playwright install chromium
|
|
502
|
+
```
|
|
503
|
+
|
|
504
|
+
### Session Not Persisting
|
|
505
|
+
|
|
506
|
+
Check that `BROWSER_USER_DATA_DIR` is writable:
|
|
507
|
+
```bash
|
|
508
|
+
ls -la ~/.browsercontrol/
|
|
509
|
+
```
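The same check can be scripted. This sketch reports whether the profile directory exists and is writable by the current user; `profile_dir_status` is a hypothetical helper, and the default path is taken from the configuration table.

```python
import os
from pathlib import Path


def profile_dir_status(path: Path) -> str:
    """Classify a profile directory as missing, not a directory, writable, or read-only."""
    path = path.expanduser()
    if not path.exists():
        return "missing"
    if not path.is_dir():
        return "not a directory"
    # Creating files inside a directory needs both write and search permission.
    return "writable" if os.access(path, os.W_OK | os.X_OK) else "read-only"


default_dir = Path(os.environ.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data"))
print(f"{default_dir}: {profile_dir_status(default_dir)}")
```

A result of "missing" is normal before the first run, assuming the directory is created on startup.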
|
|
510
|
+
|
|
511
|
+
### Connection Refused
|
|
512
|
+
|
|
513
|
+
Ensure no other instance is running:
|
|
514
|
+
```bash
|
|
515
|
+
pkill -f browsercontrol
|
|
516
|
+
browsercontrol
|
|
517
|
+
```
|
|
518
|
+
|
|
519
|
+
---
|
|
520
|
+
|
|
521
|
+
## 🤝 Contributing
|
|
522
|
+
|
|
523
|
+
Contributions are welcome! Some ideas:
|
|
524
|
+
|
|
525
|
+
- [ ] Multi-tab support
|
|
526
|
+
- [ ] Firefox/WebKit support
|
|
527
|
+
- [ ] DOM diffing (detect changes)
|
|
528
|
+
- [ ] Accessibility audit
|
|
529
|
+
- [ ] Mobile emulation presets
|
|
530
|
+
- [ ] Cookie import/export
|
|
531
|
+
|
|
532
|
+
```bash
|
|
533
|
+
# Clone and install
|
|
534
|
+
git clone https://github.com/adityasasidhar/browsercontrol
|
|
535
|
+
cd browsercontrol
|
|
536
|
+
uv sync
|
|
537
|
+
|
|
538
|
+
# Run tests
|
|
539
|
+
uv run pytest
|
|
540
|
+
|
|
541
|
+
# Run in development
|
|
542
|
+
uv run fastmcp dev browsercontrol/server.py
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
---
|
|
546
|
+
|
|
547
|
+
## π License
|
|
548
|
+
|
|
549
|
+
MIT License - Use it however you want.
|
|
550
|
+
|
|
551
|
+
---
|
|
552
|
+
|
|
553
|
+
## π Acknowledgments
|
|
554
|
+
|
|
555
|
+
- Inspired by the browser control capabilities in **Google's AntiGravity IDE**
|
|
556
|
+
- Built with [FastMCP](https://gofastmcp.com) and [Playwright](https://playwright.dev)
|
|
557
|
+
- Thanks to the MCP community for making AI-tool integration accessible
|
|
558
|
+
|
|
559
|
+
---
|
|
560
|
+
|
|
561
|
+
<p align="center">
|
|
562
|
+
<strong>Built with ❤️ for the AI agent community.</strong>
|
|
563
|
+
</p>
|
|
564
|
+
|
|
565
|
+
<p align="center">
|
|
566
|
+
<a href="https://github.com/adityasasidhar/browsercontrol">⭐ Star on GitHub</a> •
|
|
567
|
+
<a href="https://github.com/adityasasidhar/browsercontrol/issues">Report Bug</a> •
|
|
568
|
+
<a href="https://github.com/adityasasidhar/browsercontrol/issues">Request Feature</a>
|
|
569
|
+
</p>
|