browsercontrol 0.1.1__tar.gz

@@ -0,0 +1,18 @@
1
+ # IDE
2
+ .idea/
3
+ .vscode/
4
+ *.iml
5
+
6
+ # Python
7
+ */__pycache__/*
8
+ *.py[cod]
9
+ *$py.class
10
+ *.so
11
+ .Python
12
+ *.egg-info/
13
+ dist/
14
+ build/
15
+
16
+ # Virtual environments
17
+ .venv/
18
+
@@ -0,0 +1,84 @@
1
+ # Contributing to BrowserControl
2
+
3
+ We love your input! We want to make contributing to BrowserControl as easy and transparent as possible, whether it's:
4
+
5
+ - Reporting a bug
6
+ - Discussing the current state of the code
7
+ - Submitting a fix
8
+ - Proposing new features
9
+ - Becoming a maintainer
10
+
11
+ ## 🚀 Quick Start for Contributors
12
+
13
+ 1. **Fork the repo** and clone it locally
14
+ 2. **Install `uv`** (our package manager):
15
+ ```bash
16
+ curl -LsSf https://astral.sh/uv/install.sh | sh
17
+ ```
18
+ 3. **Install dependencies**:
19
+ ```bash
20
+ uv sync
21
+ ```
22
+ 4. **Install Playwright browsers**:
23
+ ```bash
24
+ uv run playwright install chromium
25
+ ```
26
+ 5. **Run the server in dev mode**:
27
+ ```bash
28
+ uv run fastmcp dev browsercontrol/server.py
29
+ ```
30
+
31
+ ## 🛠️ Development Workflow
32
+
33
+ We use [uv](https://github.com/astral-sh/uv) for dependency management and packaging. It's fast and reliable.
34
+
35
+ ### Project Structure
36
+ - `browsercontrol/server.py`: Main MCP server definition
37
+ - `browsercontrol/browser.py`: Core logic (Playwright + Set of Marks)
38
+ - `browsercontrol/tools/`: Tool implementations split by category
39
+
40
+ ### Making Changes
41
+ 1. Create a branch for your feature: `git checkout -b feature/amazing-feature`
42
+ 2. Implement your changes
43
+ 3. Run tests (see below)
44
+ 4. Commit your changes. We like [Conventional Commits](https://www.conventionalcommits.org/).
45
+ - `feat: add new scrolling tool`
46
+ - `fix: handle localhost connection refused`
47
+ - `docs: update troubleshooting guide`
48
+
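As a rough illustration of the commit format above, here is a minimal sketch of a subject-line check. The helper `is_conventional` and the list of accepted types are assumptions made for this example; the project does not ship such a check, and the full Conventional Commits specification allows more than this simplified pattern covers.

```python
import re

# Hypothetical helper, not part of BrowserControl. It checks only the basic
# "type(optional-scope): description" shape of a commit subject line.
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)"
    r"(\([a-z0-9_-]+\))?!?: \S.*$"
)


def is_conventional(subject: str) -> bool:
    """Return True if the subject line matches the simplified pattern."""
    return COMMIT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    for subject in ("feat: add new scrolling tool", "Added a scrolling tool"):
        print(subject, "->", is_conventional(subject))
```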
49
+ ## 🧪 Testing
50
+
51
+ We use `pytest`. Please ensure all tests pass before submitting a PR.
52
+
53
+ ```bash
54
+ # Run all tests
55
+ uv run pytest
56
+
57
+ # Run specific test file
58
+ uv run pytest tests/test_navigation.py
59
+ ```
60
+
61
+ If you add a new tool or feature, please add a corresponding test case covering:
62
+ - Happy path (it works)
63
+ - Error handling (it fails gracefully)
64
+
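A minimal sketch of what such a pair of tests might look like. The function `element_text` is a hypothetical stand-in written for this example; the real tools live under `browsercontrol/tools/` and their signatures are not shown here. The tests use plain `assert` statements, which pytest collects from any `test_*` function.

```python
# Hypothetical stand-in for a tool under test; not the project's actual API.
def element_text(elements: dict[int, str], element_id: int) -> str:
    """Return the text of a numbered element, or a readable error message."""
    if element_id not in elements:
        return f"Error: element {element_id} not found. Take a new screenshot to refresh IDs."
    return elements[element_id]


def test_element_text_happy_path():
    # Happy path: a known ID returns its text.
    assert element_text({1: "Sign In"}, 1) == "Sign In"


def test_element_text_unknown_id_fails_gracefully():
    # Error handling: an unknown ID yields a message instead of an exception.
    result = element_text({1: "Sign In"}, 99)
    assert result.startswith("Error:")
    assert "99" in result
```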
65
+ ## 📝 Pull Request Process
66
+
67
+ 1. Update the README.md with details of changes to the interface, including new environment variables, exposed ports, useful file locations, and container parameters.
68
+ 2. Increase the version numbers in any example files and the README.md to the new version that this Pull Request would represent.
69
+ 3. You may merge the Pull Request once you have the sign-off of two other developers. If you do not have permission to merge, ask the second reviewer to merge it for you.
70
+
71
+ ## 🐛 Reporting Bugs
72
+
73
+ Bugs are tracked as GitHub issues. When filing an issue, please explain the problem and include additional details to help maintainers reproduce the problem:
74
+
75
+ - Use a clear and descriptive title
76
+ - Describe the exact steps which reproduce the problem
77
+ - Provide specific examples to demonstrate the steps
78
+ - Describe the behavior you observed after following the steps
79
+ - Explain which behavior you expected to see instead and why
80
+ - Include screenshots/logs if possible
81
+
82
+ ## 📄 License
83
+
84
+ By contributing, you agree that your contributions will be licensed under the project's MIT License.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Aditya Sasidhar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,569 @@
1
+ Metadata-Version: 2.4
2
+ Name: browsercontrol
3
+ Version: 0.1.1
4
+ Summary: MCP server for browser automation with Set of Marks (SoM) - AI agents can see and interact with web pages using numbered element IDs
5
+ Project-URL: Homepage, https://github.com/adityasasidhar/browsercontrol
6
+ Project-URL: Repository, https://github.com/adityasasidhar/browsercontrol
7
+ Author: Aditya Sasidhar
8
+ License: MIT
9
+ License-File: LICENSE
10
+ Keywords: agent,ai,automation,browser,llm,mcp,playwright
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Programming Language :: Python :: 3.13
18
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
19
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
20
+ Requires-Python: >=3.11
21
+ Requires-Dist: fastmcp>=2.14.2
22
+ Requires-Dist: markdownify>=0.14.1
23
+ Requires-Dist: pillow>=11.0.0
24
+ Requires-Dist: playwright>=1.49.0
25
+ Description-Content-Type: text/markdown
26
+
27
+ <p align="center">
28
+ <img src="https://raw.githubusercontent.com/adityasasidhar/browsercontrol/main/assets/logo.png" alt="BrowserControl" width="120">
29
+ </p>
30
+
31
+ <h1 align="center">🌐 BrowserControl</h1>
32
+
33
+ <p align="center">
34
+ <strong>Give your AI agent real browser superpowers.</strong>
35
+ </p>
36
+
37
+ <p align="center">
38
+ <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+"></a>
39
+ <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-green.svg" alt="License: MIT"></a>
40
+ <a href="https://modelcontextprotocol.io/"><img src="https://img.shields.io/badge/MCP-Compatible-purple.svg" alt="MCP"></a>
41
+ <a href="https://github.com/adityasasidhar/browsercontrol"><img src="https://img.shields.io/github/stars/adityasasidhar/browsercontrol?style=social" alt="GitHub Stars"></a>
42
+ </p>
43
+
44
+ <p align="center">
45
+ <a href="#-quick-start">Quick Start</a> •
46
+ <a href="#-features">Features</a> •
47
+ <a href="#-available-tools">Tools</a> •
48
+ <a href="#%EF%B8%8F-configuration">Configuration</a> •
49
+ <a href="#-examples">Examples</a>
50
+ </p>
51
+
52
+ ---
53
+
54
+ Ever wished Claude, Gemini, or your custom AI agent could actually browse the web? Not just fetch URLs, but truly **see**, **click**, **type**, and **interact** with any website like a human?
55
+
56
+ **BrowserControl** is an MCP server that gives your AI agent full browser access with a **vision-first approach** inspired by Google's AntiGravity IDE.
57
+
58
+ ## ✨ What Makes This Different
59
+
60
+ | Traditional Web Access | BrowserControl |
61
+ |------------------------|----------------|
62
+ | Fetch static HTML | See the **rendered page** |
63
+ | Parse complex DOM | Point at **numbered elements** |
64
+ | Guess at selectors | Just say **"click 5"** |
65
+ | No JavaScript support | Full **dynamic content** |
66
+ | No login persistence | **Persistent sessions** |
67
+ | No debugging tools | **Console, Network, Errors** |
68
+
69
+ ### 🎯 The Secret: Set of Marks (SoM)
70
+
71
+ Every screenshot comes annotated with **numbered red boxes** on interactive elements:
72
+
73
+ ```
74
+ Found 15 interactive elements:
75
+ [1] button - Sign In
76
+ [2] input - Search...
77
+ [3] a - Products
78
+ [4] a - Pricing
79
+ [5] button - Get Started
80
+ ```
81
+
82
+ Your agent sees the numbers and simply calls `click(1)` to sign in. **No CSS selectors. No XPath. No guessing.**
83
+
84
+ ---
85
+
86
+ ## 🏆 Why BrowserControl Beats Every Alternative
87
+
88
+ ### Head-to-Head Comparison
89
+
90
+ | Feature | **BrowserControl** | Playwright MCP | Stagehand | Browser-Use | AgentQL |
91
+ |---------|:------------------:|:--------------:|:---------:|:-----------:|:-------:|
92
+ | **Vision-First (SoM)** | ✅ Numbered boxes | ❌ Text tree | ⚠️ AI vision | ⚠️ AI vision | ❌ Selectors |
93
+ | **No Extra AI Calls** | ✅ Zero | ❌ Parses tree | ❌ GPT-4V per action | ❌ Vision model | ❌ Query model |
94
+ | **Developer Tools** | ✅ 6 tools | ❌ None | ❌ None | ❌ None | ❌ None |
95
+ | **Session Recording** | ✅ Built-in | ❌ Manual | ❌ None | ❌ None | ❌ None |
96
+ | **Persistent Sessions** | ✅ Automatic | ⚠️ Manual setup | ❌ None | ❌ None | ❌ None |
97
+ | **MCP Native** | ✅ FastMCP | ✅ Official | ❌ Python SDK | ⚠️ Custom | ❌ REST API |
98
+ | **Install Complexity** | ✅ `pip install` | ⚠️ npx + config | ❌ Docker + setup | ⚠️ Docker | ❌ Cloud signup |
99
+ | **Token Efficiency** | ✅ Tiny IDs | ⚠️ Large tree | ❌ Full images | ❌ Full images | ⚠️ Query results |
100
+ | **Cost per Action** | ✅ $0 | ✅ $0 | ❌ ~$0.01-0.05 | ❌ ~$0.01-0.05 | ❌ API fees |
101
+ | **Offline/Local** | ✅ 100% local | ✅ Local | ⚠️ Needs LLM API | ⚠️ Needs LLM API | ❌ Cloud only |
102
+
103
+ ### 🎯 Key Advantages
104
+
105
+ #### 1. **Token Efficiency = Faster + Cheaper**
106
+
107
+ ```
108
+ Other tools send: BrowserControl sends:
109
+ ─────────────────── ─────────────────────
110
+ Full DOM tree "click(5)"
111
+ (5,000+ tokens) (3 tokens)
112
+ or
113
+ Base64 screenshot Element ID + summary
114
+ (10,000+ tokens) (100 tokens)
115
+ ```
116
+
117
+ **Result**: 50-100x fewer tokens per action = faster responses, lower costs.
118
+
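The 50-100x figure follows from comparing the ~100-token "Element ID + summary" payload with the two larger payloads in the sketch above. A small worked check, using the README's own illustrative estimates (they are not measurements, and the constant names are made up for this example):

```python
# Token counts copied from the comparison above; illustrative estimates only.
DOM_TREE_TOKENS = 5_000
SCREENSHOT_TOKENS = 10_000
SUMMARY_TOKENS = 100  # "Element ID + summary"


def reduction_factor(baseline_tokens: int, compact_tokens: int) -> float:
    """How many times smaller the compact payload is than the baseline."""
    return baseline_tokens / compact_tokens


if __name__ == "__main__":
    print(reduction_factor(DOM_TREE_TOKENS, SUMMARY_TOKENS))    # 50.0
    print(reduction_factor(SCREENSHOT_TOKENS, SUMMARY_TOKENS))  # 100.0
```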
119
+ #### 2. **No Extra AI Calls Required**
120
+
121
+ | Tool | AI Calls per Click |
122
+ |------|-------------------|
123
+ | **BrowserControl** | 0 (just `click(5)`) |
124
+ | Stagehand | 1-2 (vision + action) |
125
+ | Browser-Use | 1-2 (vision + planning) |
126
+ | AgentQL | 1 (query interpretation) |
127
+
128
+ **Result**: No vision API costs, no rate limits, works offline.
129
+
130
+ #### 3. **Developer Tools No One Else Has**
131
+
132
+ ```python
133
+ # Only BrowserControl can do this:
134
+ get_console_logs() # See browser errors
135
+ get_network_requests() # Monitor API calls
136
+ get_page_errors() # Catch JS exceptions
137
+ run_in_console(code) # Debug in real-time
138
+ inspect_element(5) # Get computed styles
139
+ get_page_performance() # Core Web Vitals
140
+ ```
141
+
142
+ **Other tools**: Navigate, click, type... that's it.
143
+
144
+ #### 4. **Session Recording Built-In**
145
+
146
+ ```
147
+ start_recording() → Browse around → stop_recording()
148
+ ↓
149
+ 📹 session_20260108.zip
150
+ (View with Playwright trace viewer)
151
+ ```
152
+
153
+ **Other tools**: No recording. Debug from memory.
154
+
155
+ #### 5. **True Persistence**
156
+
157
+ | What Persists | BrowserControl | Others |
158
+ |---------------|:--------------:|:------:|
159
+ | Cookies | ✅ | ❌ |
160
+ | localStorage | ✅ | ❌ |
161
+ | Session tokens | ✅ | ❌ |
162
+ | Login state | ✅ | ❌ |
163
+ | Browser history | ✅ | ❌ |
164
+
165
+ **Result**: Log in once, stay logged in across sessions.
166
+
167
+ #### 6. **Simpler Mental Model**
168
+
169
+ ```
170
+ ❌ Other tools:
171
+ "Find the button with class 'btn-primary' that contains text 'Submit'
172
+ and is a descendant of form#contact-form..."
173
+
174
+ ✅ BrowserControl:
175
+ "click(7)"
176
+ ```
177
+
178
+ ### 📊 Real-World Performance
179
+
180
+ | Scenario | BrowserControl | Vision-Based Tools |
181
+ |----------|:--------------:|:------------------:|
182
+ | Click a button | ~50ms | ~2-5 seconds |
183
+ | Fill a form (5 fields) | ~500ms | ~15-30 seconds |
184
+ | Navigate + act | ~1 second | ~5-10 seconds |
185
+ | Debug console errors | ✅ Instant | ❌ Not possible |
186
+
187
+ ### 💰 Cost Comparison (1000 actions/month)
188
+
189
+ | Tool | Monthly Cost |
190
+ |------|-------------|
191
+ | **BrowserControl** | **$0** (fully local) |
192
+ | Stagehand (GPT-4V) | ~$30-50 |
193
+ | Browser-Use (Claude Vision) | ~$20-40 |
194
+ | AgentQL | ~$50+ (API fees) |
195
+
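As a sanity check on these monthly figures, the per-action range quoted in the comparison table (~$0.01-0.05) can be multiplied out for 1000 actions. This is a back-of-the-envelope sketch using the README's estimates, not measured billing data; the function name is made up for this example.

```python
def monthly_cost_range(actions: int, low_cents: int, high_cents: int) -> tuple[float, float]:
    """Return the (low, high) monthly cost in dollars for a per-action cost given in cents."""
    # Integer cents avoid floating-point rounding while multiplying.
    return (actions * low_cents / 100, actions * high_cents / 100)


if __name__ == "__main__":
    # 1000 actions at 1 to 5 cents each
    print(monthly_cost_range(1000, 1, 5))  # (10.0, 50.0)
```

The ~$20-50 estimates listed for the vision-based tools sit inside this $10-50 range.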
196
+ ---
197
+
198
+ ## 🚀 Quick Start
199
+
200
+ ### Installation
201
+
202
+ ```bash
203
+ # Install with pip
204
+ pip install browsercontrol
205
+
206
+ # Or with uv (recommended)
207
+ uv add browsercontrol
208
+
209
+ # That's it! Chromium is auto-installed on first run
210
+ ```
211
+
212
+ ### Run the Server
213
+
214
+ ```bash
215
+ # Using the CLI
216
+ browsercontrol
217
+
218
+ # Or as a module
219
+ python -m browsercontrol
220
+
221
+ # Or with FastMCP
222
+ fastmcp run browsercontrol.server:mcp
223
+ ```
224
+
225
+ ### Connect to Claude Desktop
226
+
227
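+ Add the following to Claude Desktop's config file, `~/.config/Claude/claude_desktop_config.json` on Linux (on macOS the file is `~/Library/Application Support/Claude/claude_desktop_config.json`):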
228
+
229
+ ```json
230
+ {
231
+ "mcpServers": {
232
+ "browsercontrol": {
233
+ "command": "browsercontrol"
234
+ }
235
+ }
236
+ }
237
+ ```
238
+
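If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming the file is valid JSON; `register_browsercontrol` is a hypothetical helper written for this example and is not shipped with the package.

```python
import json
from pathlib import Path


def register_browsercontrol(config_path: Path) -> dict:
    """Add the browsercontrol entry to an MCP client config, keeping existing entries."""
    # Start from the existing config if there is one (assumed to be valid JSON).
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["browsercontrol"] = {"command": "browsercontrol"}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


if __name__ == "__main__":
    path = Path("~/.config/Claude/claude_desktop_config.json").expanduser()
    print(json.dumps(register_browsercontrol(path), indent=2))
```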
239
+ Then just ask Claude:
240
+
241
+ > *"Go to GitHub and star the browsercontrol repo"*
242
+
243
+ Claude will navigate, find the star button, and click it, showing you screenshots along the way!
244
+
245
+ ---
246
+
247
+ ## 🎯 Features
248
+
249
+ ### 1. Set of Marks (SoM) - Vision-First Interaction
250
+
251
+ Every action returns an annotated screenshot with numbered elements. Your AI agent can:
252
+ - **See** the page exactly as a human would
253
+ - **Identify** clickable elements by number
254
+ - **Act** with simple commands like `click(5)`
255
+
256
+ ### 2. 🔧 Developer Tools
257
+
258
+ Built-in debugging tools for web development:
259
+
260
+ | Tool | Description |
261
+ |------|-------------|
262
+ | `get_console_logs()` | Capture browser console (errors, warnings, logs) |
263
+ | `get_network_requests()` | Monitor API calls, status codes, timing |
264
+ | `get_page_errors()` | See JavaScript exceptions and crashes |
265
+ | `run_in_console(code)` | Execute JS in browser console |
266
+ | `inspect_element(id)` | Get computed styles, dimensions, properties |
267
+ | `get_page_performance()` | Page load time, Core Web Vitals, memory |
268
+
269
+ ### 3. 🎬 Session Recording
270
+
271
+ Record browser sessions for debugging and documentation:
272
+
273
+ | Tool | Description |
274
+ |------|-------------|
275
+ | `start_recording()` | Begin recording the session |
276
+ | `stop_recording()` | Save recording (Playwright trace format) |
277
+ | `take_snapshot()` | Save screenshot + HTML + URL |
278
+ | `list_recordings()` | View all saved sessions |
279
+
280
+ View recordings with:
281
+ ```bash
282
+ npx playwright show-trace ~/.browsercontrol/recordings/session.zip
283
+ ```
284
+
285
+ ### 4. 💾 Persistent Sessions
286
+
287
+ - Cookies, localStorage, and session data persist across restarts
288
+ - Stay logged into websites
289
+ - Maintain shopping carts, preferences, etc.
290
+
291
+ ---
292
+
293
+ ## 🛠️ Available Tools
294
+
295
+ ### Navigation
296
+ | Tool | Description |
297
+ |------|-------------|
298
+ | `navigate_to(url)` | Go to a URL |
299
+ | `go_back()` | Navigate back |
300
+ | `go_forward()` | Navigate forward |
301
+ | `refresh_page()` | Reload the page |
302
+ | `scroll(direction, amount)` | Scroll the page |
303
+
304
+ ### Interaction
305
+ | Tool | Description |
306
+ |------|-------------|
307
+ | `click(element_id)` | Click element by number |
308
+ | `click_at(x, y)` | Click at coordinates |
309
+ | `type_text(element_id, text)` | Type into input |
310
+ | `press_key(key)` | Press keyboard key (Enter, Tab, etc.) |
311
+ | `hover(element_id)` | Hover over element |
312
+ | `scroll_to_element(element_id)` | Scroll element into view |
313
+ | `wait(seconds)` | Wait for loading |
314
+
315
+ ### Forms
316
+ | Tool | Description |
317
+ |------|-------------|
318
+ | `select_option(element_id, option)` | Select dropdown option |
319
+ | `check_checkbox(element_id)` | Toggle checkbox |
320
+
321
+ ### Content
322
+ | Tool | Description |
323
+ |------|-------------|
324
+ | `get_page_content()` | Get page as markdown |
325
+ | `get_text(element_id)` | Get element text |
326
+ | `get_page_info()` | Get URL and title |
327
+ | `run_javascript(script)` | Execute JavaScript |
328
+ | `screenshot(annotate, full_page)` | Take screenshot |
329
+
330
+ ### Developer Tools
331
+ | Tool | Description |
332
+ |------|-------------|
333
+ | `get_console_logs()` | Browser console output |
334
+ | `get_network_requests()` | API calls and responses |
335
+ | `get_page_errors()` | JavaScript errors |
336
+ | `run_in_console(code)` | Execute JS in console |
337
+ | `inspect_element(id)` | Element styles/properties |
338
+ | `get_page_performance()` | Load times, Web Vitals |
339
+
340
+ ### Recording
341
+ | Tool | Description |
342
+ |------|-------------|
343
+ | `start_recording()` | Begin session recording |
344
+ | `stop_recording()` | Save recording |
345
+ | `take_snapshot()` | Save screenshot + HTML |
346
+ | `list_recordings()` | View saved sessions |
347
+
348
+ ---
349
+
350
+ ## ⚙️ Configuration
351
+
352
+ Configure via environment variables:
353
+
354
+ | Variable | Default | Description |
355
+ |----------|---------|-------------|
356
+ | `BROWSER_HEADLESS` | `true` | Run without visible window |
357
+ | `BROWSER_VIEWPORT_WIDTH` | `1280` | Viewport width in pixels |
358
+ | `BROWSER_VIEWPORT_HEIGHT` | `720` | Viewport height in pixels |
359
+ | `BROWSER_TIMEOUT` | `30000` | Navigation timeout (ms) |
360
+ | `BROWSER_USER_DATA_DIR` | `~/.browsercontrol/user_data` | Browser profile path |
361
+ | `BROWSER_EXTENSION_PATH` | - | Path to browser extension |
362
+ | `LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR) |
363
+
364
+ ### Examples
365
+
366
+ ```bash
367
+ # Run with visible browser (for debugging)
368
+ BROWSER_HEADLESS=false browsercontrol
369
+
370
+ # Custom viewport for mobile testing
371
+ BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
372
+
373
+ # Verbose logging
374
+ LOG_LEVEL=DEBUG browsercontrol
375
+ ```
376
+
377
+ ---
378
+
379
+ ## 📚 Examples
380
+
381
+ ### Example 1: Web Research
382
+
383
+ ```
384
+ You: "Go to Wikipedia and find out when Python was created"
385
+
386
+ Claude: *navigates to wikipedia.org*
387
+ *types "Python programming language" in search*
388
+ *clicks search result*
389
+ *reads the page*
390
+
391
+ "Python was created by Guido van Rossum and first released in 1991."
392
+ ```
393
+
394
+ ### Example 2: Debug a Web App
395
+
396
+ ```
397
+ You: "Go to my app at localhost:3000 and check for any errors"
398
+
399
+ Claude: *navigates to localhost:3000*
400
+ *calls get_console_logs()*
401
+
402
+ "I found 2 errors:
403
+ [ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
404
+ [ERROR] Failed to load resource: 404 /api/users"
405
+
406
+ *calls get_network_requests()*
407
+
408
+ "The /api/users endpoint returned 404. Check your API route."
409
+ ```
410
+
411
+ ### Example 3: Automated Testing
412
+
413
+ ```
414
+ You: "Test the login flow on my site. Start recording."
415
+
416
+ Claude: *calls start_recording()*
417
+ *navigates to site*
418
+ *types email in element 3*
419
+ *types password in element 4*
420
+ *clicks Login button (element 5)*
421
+ *verifies successful redirect*
422
+ *calls stop_recording()*
423
+
424
+ "Login test passed! Recording saved to session_20260108.zip"
425
+ ```
426
+
427
+ ### Example 4: Form Automation
428
+
429
+ ```
430
+ You: "Fill out the contact form on example.com"
431
+
432
+ Claude: *navigates to example.com/contact*
433
+ *types name in element 2*
434
+ *types email in element 3*
435
+ *types message in element 4*
436
+ *clicks Submit (element 5)*
437
+
438
+ "Form submitted successfully!"
439
+ ```
440
+
441
+ ---
442
+
443
+ ## 🏗️ Architecture
444
+
445
+ ```
446
+ ┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
447
+ │   AI Agent      │────▶│  BrowserControl  │────▶│   Browser   │
448
+ │ (Claude/Gemini) │◀────│   MCP Server     │◀────│  (Chromium) │
449
+ └─────────────────┘     └──────────────────┘     └─────────────┘
450
+         │                        │                      │
451
+         │    "click(5)"          │   mouse.click()      │
452
+         │◀───────────────────────│◀─────────────────────│
453
+         │   [annotated           │   [screenshot +      │
454
+         │    screenshot]         │    element map]      │
455
+ ```
456
+
457
+ ### How It Works
458
+
459
+ 1. **AI sends command**: `click(5)`
460
+ 2. **Server finds element**: Looks up element #5 from the last screenshot
461
+ 3. **Browser acts**: Clicks at the element's coordinates
462
+ 4. **Capture state**: Takes new screenshot, detects elements
463
+ 5. **Annotate**: Draws numbered boxes on interactive elements
464
+ 6. **Return to AI**: Sends annotated image + element list
465
+
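Steps 2, 3, and 6 above can be sketched in a few lines: keep a map from element number to bounding box, resolve an ID to the centre of its box, and render the element list as text. This is an illustration under assumptions, not the code in `browser.py`; the `MarkedElement` class and both helper functions are hypothetical names. In a Playwright-based server the resulting point would be handed to something like `page.mouse.click(x, y)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkedElement:
    """One numbered box from the last annotated screenshot (hypothetical shape)."""
    element_id: int
    tag: str
    label: str
    x: float       # left edge in viewport pixels
    y: float       # top edge in viewport pixels
    width: float
    height: float


def click_point(elements: dict[int, MarkedElement], element_id: int) -> tuple[float, float]:
    """Resolve an element number to the centre of its bounding box."""
    element = elements.get(element_id)
    if element is None:
        raise KeyError(f"element {element_id} is not in the last screenshot")
    return (element.x + element.width / 2, element.y + element.height / 2)


def describe(elements: dict[int, MarkedElement]) -> str:
    """Render the element list in the style shown earlier in this README."""
    lines = [f"Found {len(elements)} interactive elements:"]
    for element in sorted(elements.values(), key=lambda e: e.element_id):
        lines.append(f"[{element.element_id}] {element.tag} - {element.label}")
    return "\n".join(lines)


if __name__ == "__main__":
    marks = {5: MarkedElement(5, "button", "Get Started", 100, 200, 80, 40)}
    print(describe(marks))
    print(click_point(marks, 5))  # (140.0, 220.0)
```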
466
+ ---
467
+
468
+ ## 📦 Project Structure
469
+
470
+ ```
471
+ browsercontrol/
472
+ ├── __init__.py          # Package exports
473
+ ├── __main__.py          # CLI entry point
474
+ ├── server.py            # MCP server setup
475
+ ├── browser.py           # BrowserManager with SoM
476
+ ├── config.py            # Environment configuration
477
+ └── tools/
478
+     ├── navigation.py    # Navigation tools
479
+     ├── interaction.py   # Click, type, hover tools
480
+     ├── forms.py         # Form handling tools
481
+     ├── content.py       # Content extraction tools
482
+     ├── devtools.py      # Developer tools
483
+     └── recording.py     # Session recording tools
484
+ ```
485
+
486
+ ---
487
+
488
+ ## 🔧 Troubleshooting
489
+
490
+ ### "Missing X server" Error
491
+
492
+ Set `BROWSER_HEADLESS=true` or run with xvfb:
493
+ ```bash
494
+ xvfb-run browsercontrol
495
+ ```
496
+
497
+ ### Browser Not Starting
498
+
499
+ Chromium auto-installs on first run. If it fails, install manually:
500
+ ```bash
501
+ python -m playwright install chromium
502
+ ```
503
+
504
+ ### Session Not Persisting
505
+
506
+ Check that `BROWSER_USER_DATA_DIR` is writable:
507
+ ```bash
508
+ ls -la ~/.browsercontrol/
509
+ ```
510
+
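The same check can be done programmatically. The sketch below tests whether a profile directory, or its nearest existing parent if the directory has not been created yet, is writable by the current user. `profile_dir_is_writable` is a hypothetical helper for illustration and is not part of the package.

```python
import os
from pathlib import Path


def profile_dir_is_writable(path: str) -> bool:
    """Return True if the directory (or its nearest existing ancestor) is writable."""
    probe = Path(path).expanduser()
    # A missing directory can still be created if an existing ancestor is writable.
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return os.access(probe, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = os.environ.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
    print(target, "writable:", profile_dir_is_writable(target))
```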
511
+ ### Connection Refused
512
+
513
+ Ensure no other instance is running:
514
+ ```bash
515
+ pkill -f browsercontrol
516
+ browsercontrol
517
+ ```
518
+
519
+ ---
520
+
521
+ ## 🤝 Contributing
522
+
523
+ Contributions are welcome! Some ideas:
524
+
525
+ - [ ] Multi-tab support
526
+ - [ ] Firefox/WebKit support
527
+ - [ ] DOM diffing (detect changes)
528
+ - [ ] Accessibility audit
529
+ - [ ] Mobile emulation presets
530
+ - [ ] Cookie import/export
531
+
532
+ ```bash
533
+ # Clone and install
534
+ git clone https://github.com/adityasasidhar/browsercontrol
535
+ cd browsercontrol
536
+ uv sync
537
+
538
+ # Run tests
539
+ uv run pytest
540
+
541
+ # Run in development
542
+ uv run fastmcp dev browsercontrol/server.py
543
+ ```
544
+
545
+ ---
546
+
547
+ ## 📄 License
548
+
549
+ MIT License - Use it however you want.
550
+
551
+ ---
552
+
553
+ ## 🙏 Acknowledgments
554
+
555
+ - Inspired by the browser control capabilities in **Google's AntiGravity IDE**
556
+ - Built with [FastMCP](https://gofastmcp.com) and [Playwright](https://playwright.dev)
557
+ - Thanks to the MCP community for making AI-tool integration accessible
558
+
559
+ ---
560
+
561
+ <p align="center">
562
+ <strong>Built with ❤️ for the AI agent community.</strong>
563
+ </p>
564
+
565
+ <p align="center">
566
+ <a href="https://github.com/adityasasidhar/browsercontrol">⭐ Star on GitHub</a> •
567
+ <a href="https://github.com/adityasasidhar/browsercontrol/issues">Report Bug</a> •
568
+ <a href="https://github.com/adityasasidhar/browsercontrol/issues">Request Feature</a>
569
+ </p>