windows-mcp 0.5.2__tar.gz → 0.5.3__tar.gz
- {windows_mcp-0.5.2 → windows_mcp-0.5.3}/.gitignore +1 -1
- windows_mcp-0.5.3/.python-version +1 -0
- windows_mcp-0.5.3/CONTRIBUTING.md +384 -0
- {windows_mcp-0.5.2 → windows_mcp-0.5.3}/PKG-INFO +54 -3
- {windows_mcp-0.5.2 → windows_mcp-0.5.3}/README.md +53 -1
- windows_mcp-0.5.3/SECURITY.md +304 -0
- windows_mcp-0.5.3/assets/demo1.mov +0 -0
- windows_mcp-0.5.3/assets/demo2.mov +0 -0
- windows_mcp-0.5.3/assets/logo.png +0 -0
- windows_mcp-0.5.3/assets/screenshots/screenshot_1.png +0 -0
- windows_mcp-0.5.3/assets/screenshots/screenshot_2.png +0 -0
- windows_mcp-0.5.3/assets/screenshots/screenshot_3.png +0 -0
- windows_mcp-0.5.3/manifest.json +99 -0
- windows_mcp-0.5.3/notebook.ipynb +187 -0
- {windows_mcp-0.5.2 → windows_mcp-0.5.3}/pyproject.toml +40 -42
- windows_mcp-0.5.3/server.json +23 -0
- windows_mcp-0.5.2/main.py → windows_mcp-0.5.3/src/windows_mcp/__main__.py +38 -17
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/service.py +9 -6
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/views.py +1 -1
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/service.py +89 -36
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/views.py +8 -0
- windows_mcp-0.5.3/uv.lock +1483 -0
- {windows_mcp-0.5.2 → windows_mcp-0.5.3}/LICENSE.md +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/__init__.py +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/__init__.py +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/config.py +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/__init__.py +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/config.py +0 -0
- {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/utils.py +0 -0
@@ -0,0 +1 @@
+3.13
@@ -0,0 +1,384 @@
+# Contributing to Windows-MCP
+
+Thank you for your interest in contributing to Windows-MCP! We welcome contributions from the community to help make this project better. This document provides guidelines and instructions for contributing.
+
+## Table of Contents
+
+- [Code of Conduct](#code-of-conduct)
+- [Getting Started](#getting-started)
+  - [Prerequisites](#prerequisites)
+  - [Development Environment Setup](#development-environment-setup)
+- [Development Workflow](#development-workflow)
+  - [Branching Strategy](#branching-strategy)
+  - [Making Changes](#making-changes)
+  - [Commit Messages](#commit-messages)
+  - [Code Style](#code-style)
+- [Testing](#testing)
+  - [Running Tests](#running-tests)
+  - [Adding Tests](#adding-tests)
+- [Pull Requests](#pull-requests)
+  - [Before Submitting](#before-submitting)
+  - [Pull Request Process](#pull-request-process)
+  - [Review Process](#review-process)
+- [Documentation](#documentation)
+- [Reporting Issues](#reporting-issues)
+- [Security Vulnerabilities](#security-vulnerabilities)
+- [Getting Help](#getting-help)
+
+## Code of Conduct
+
+By participating in this project, you agree to maintain a respectful and inclusive environment. We expect all contributors to:
+
+- Be respectful and considerate in communication
+- Welcome newcomers and help them get started
+- Accept constructive criticism gracefully
+- Focus on what's best for the community and project
+
+## Getting Started
+
+### Prerequisites
+
+Before you begin, ensure you have:
+
+- **Windows OS**: Windows 7, 8, 8.1, 10, or 11
+- **Python 3.13+**: [Download Python](https://www.python.org/downloads/)
+- **UV Package Manager**: Install with `pip install uv` or see [UV documentation](https://github.com/astral-sh/uv)
+- **Git**: [Download Git](https://git-scm.com/downloads)
+- **A GitHub account**: [Sign up here](https://github.com/join)
+
+### Development Environment Setup
+
+1. **Fork the Repository**
+
+   Click the "Fork" button on the [Windows-MCP repository](https://github.com/CursorTouch/Windows-MCP) to create your own copy.
+
+2. **Clone Your Fork**
+
+   ```bash
+   git clone https://github.com/YOUR_USERNAME/Windows-MCP.git
+   cd Windows-MCP
+   ```
+
+3. **Add Upstream Remote**
+
+   ```bash
+   git remote add upstream https://github.com/CursorTouch/Windows-MCP.git
+   ```
+
+4. **Install Dependencies**
+
+   ```bash
+   uv sync
+   ```
+
+5. **Verify Installation**
+
+   ```bash
+   uv run main.py --help
+   ```
+
+## Development Workflow
+
+### Branching Strategy
+
+- **`main`** branch contains the latest stable code
+- Create feature branches from `main` using descriptive names:
+  - Features: `feature/add-new-tool`
+  - Bug fixes: `fix/click-tool-coordinates`
+  - Documentation: `docs/update-readme`
+  - Refactoring: `refactor/desktop-service`
+
+### Making Changes
+
+1. **Create a New Branch**
+
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+
+2. **Make Your Changes**
+
+   - Write clean, readable code
+   - Follow the existing code structure
+   - Add comments for complex logic
+   - Update documentation as needed
+
+3. **Test Your Changes**
+
+   - Test manually in a safe environment (VM recommended)
+   - Add automated tests if applicable
+   - Ensure existing functionality isn't broken
+
+4. **Commit Your Changes**
+
+   ```bash
+   git add .
+   git commit -m "Add feature: description of your changes"
+   ```
+
+### Commit Messages
+
+While we don't enforce a strict commit message format, please make your commits informative:
+
+**Good examples:**
+- `Add support for multi-monitor setups in State-Tool`
+- `Fix Click-Tool coordinate offset on high DPI displays`
+- `Update README with Perplexity Desktop installation steps`
+- `Refactor Desktop class to improve error handling`
+
+**Avoid:**
+- `fix bug`
+- `update`
+- `changes`
+
+### Code Style
+
+We use **[Ruff](https://github.com/astral-sh/ruff)** for code formatting and linting.
+
+**Key Guidelines:**
+- **Line length**: 100 characters maximum
+- **Quotes**: Use double quotes for strings
+- **Naming conventions**: Follow PEP 8
+  - `snake_case` for functions and variables
+  - `PascalCase` for classes
+  - `UPPER_CASE` for constants
+- **Type hints**: Add type annotations to function signatures
+- **Docstrings**: Use Google-style docstrings for all public functions and classes
+
+**Example:**
+
+```python
+def click_tool(
+    loc: list[int],
+    button: Literal['left', 'right', 'middle'] = 'left',
+    clicks: int = 1
+) -> str:
+    """Click on UI elements at specific coordinates.
+
+    Args:
+        loc: List of [x, y] coordinates to click
+        button: Mouse button to use (left, right, or middle)
+        clicks: Number of clicks (1=single, 2=double, 3=triple)
+
+    Returns:
+        Confirmation message describing the action performed
+
+    Raises:
+        ValueError: If loc doesn't contain exactly 2 integers
+    """
+    if len(loc) != 2:
+        raise ValueError("Location must be a list of exactly 2 integers [x, y]")
+    # Implementation...
+```
+
+**Format Code:**
+
+```bash
+ruff format .
+```
+
+**Run Linter:**
+
+```bash
+ruff check .
+```
+
+## Testing
+
+### Running Tests
+
+If the project has tests (check the `tests/` directory):
+
+```bash
+pytest
+```
+
+Run specific test files:
+
+```bash
+pytest tests/test_desktop.py
+```
+
+Run with coverage:
+
+```bash
+pytest --cov=src tests/
+```
+
+### Adding Tests
+
+When adding new features:
+
+1. **Create test files** in the `tests/` directory matching the module structure
+2. **Write unit tests** for individual functions
+3. **Write integration tests** for tool workflows
+4. **Use fixtures** for common test setup
+5. **Mock external dependencies** (Windows API calls, file system operations)
+
+**Example Test:**
+
+```python
+import pytest
+from src.desktop.service import Desktop
+
+def test_click_tool_validates_coordinates():
+    """Test that click_tool raises ValueError for invalid coordinates."""
+    with pytest.raises(ValueError, match="exactly 2 integers"):
+        click_tool([100])  # Missing y coordinate
+```
+
+## Pull Requests
+
+### Before Submitting
+
+- [ ] Code follows the project's style guidelines
+- [ ] All tests pass (if applicable)
+- [ ] Documentation is updated (README, docstrings, etc.)
+- [ ] Commit messages are clear and descriptive
+- [ ] Changes are tested in a safe environment (VM recommended)
+- [ ] No sensitive information (API keys, passwords) is included
+
+### Pull Request Process
+
+1. **Update Your Branch**
+
+   ```bash
+   git fetch upstream
+   git rebase upstream/main
+   ```
+
+2. **Push to Your Fork**
+
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+
+3. **Create Pull Request**
+
+   - Go to the [Windows-MCP repository](https://github.com/CursorTouch/Windows-MCP)
+   - Click "New Pull Request"
+   - Select your fork and branch
+   - Fill out the PR template with:
+     - **Description**: What does this PR do?
+     - **Motivation**: Why is this change needed?
+     - **Testing**: How was this tested?
+     - **Screenshots**: If applicable (UI changes, new features)
+     - **Related Issues**: Link any related issues
+
+4. **Respond to Feedback**
+
+   - Address reviewer comments promptly
+   - Make requested changes in new commits
+   - Push updates to the same branch
+
+### Review Process
+
+- Maintainers will review your PR within a few days
+- You may be asked to make changes or provide clarification
+- Once approved, a maintainer will merge your PR
+- Your contribution will be acknowledged in release notes
+
+## Documentation
+
+Good documentation is crucial! When contributing:
+
+### Code Documentation
+
+- **Docstrings**: Add to all public functions, classes, and methods
+- **Comments**: Explain complex logic or non-obvious decisions
+- **Type hints**: Help users and tools understand your code
+
+### User Documentation
+
+Update relevant documentation files:
+
+- **README.md**: For user-facing features or installation changes
+- **SECURITY.md**: For security-related changes
+- **CONTRIBUTING.md**: For development process changes
+
+### Tool Documentation
+
+When adding or modifying tools:
+
+1. Update the tool's `description` parameter in `main.py`
+2. Add appropriate `ToolAnnotations`
+3. Update the tools list in `README.md`
+4. Update `manifest.json` if needed
+
+## Reporting Issues
+
+Found a bug or have a feature request? Please open an issue!
+
+### Bug Reports
+
+Include:
+- **Description**: Clear description of the bug
+- **Steps to Reproduce**: Detailed steps to recreate the issue
+- **Expected Behavior**: What should happen
+- **Actual Behavior**: What actually happens
+- **Environment**: Windows version, Python version, MCP client
+- **Screenshots/Logs**: If applicable
+
+### Feature Requests
+
+Include:
+- **Description**: What feature do you want?
+- **Use Case**: Why is this feature needed?
+- **Proposed Solution**: How might this be implemented?
+- **Alternatives**: Other approaches you've considered
+
+## Security Vulnerabilities
+
+**DO NOT** report security vulnerabilities through public GitHub issues.
+
+Instead, please:
+1. Email the maintainers at [jeogeoalukka@gmail.com](mailto:jeogeoalukka@gmail.com)
+2. Or use [GitHub Security Advisories](https://github.com/CursorTouch/Windows-MCP/security/advisories)
+
+See our [Security Policy](SECURITY.md) for more details.
+
+## Getting Help
+
+Need help with your contribution?
+
+- **Discord**: Join our [Discord Community](https://discord.com/invite/Aue9Yj2VzS)
+- **Twitter/X**: Follow [@CursorTouch](https://x.com/CursorTouch)
+- **GitHub Discussions**: Ask questions in [Discussions](https://github.com/CursorTouch/Windows-MCP/discussions)
+- **Issues**: Open an issue for technical questions
+
+## Types of Contributions
+
+We welcome many types of contributions:
+
+### Code Contributions
+
+- **New Tools**: Add new MCP tools for Windows automation
+- **Bug Fixes**: Fix issues in existing tools
+- **Performance Improvements**: Optimize code for speed or efficiency
+- **Refactoring**: Improve code structure and maintainability
+
+### Non-Code Contributions
+
+- **Documentation**: Improve README, guides, or docstrings
+- **Testing**: Add test cases or improve test coverage
+- **Bug Reports**: Report issues with detailed information
+- **Feature Requests**: Suggest new features or improvements
+- **Community Support**: Help others in Discord or Discussions
+- **Translations**: Help translate documentation (future)
+
+## Recognition
+
+Contributors are recognized in:
+- GitHub contributors page
+- Release notes for significant contributions
+- Special mentions for major features or fixes
+
+## License
+
+By contributing to Windows-MCP, you agree that your contributions will be licensed under the [MIT License](LICENSE.md).
+
+---
+
+Thank you for contributing to Windows-MCP! Your efforts help make this project better for everyone. 🙏
+
+Made with ❤️ by the CursorTouch community
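Note on the Example Test in the CONTRIBUTING.md hunk above: it imports `Desktop` but calls `click_tool`, which it never imports, so it would likely fail as written (with a `NameError`, provided the import itself succeeds). Below is a minimal self-contained sketch of the same check. The `click_tool` body is a stand-in that reuses only the validation step from the Code Style example; it is not the project's real implementation, and the module that holds the real function in the 0.5.3 layout is not shown in this diff. Plain `try`/`except` replaces `pytest.raises` so the sketch runs without extra packages.

```python
from typing import Literal


def click_tool(
    loc: list[int],
    button: Literal["left", "right", "middle"] = "left",
    clicks: int = 1,
) -> str:
    """Stand-in for the documented tool: validates input, performs no click."""
    if len(loc) != 2:
        raise ValueError("Location must be a list of exactly 2 integers [x, y]")
    x, y = loc
    return f"Would {button}-click {clicks} time(s) at ({x}, {y})"


def test_click_tool_validates_coordinates() -> None:
    """click_tool should reject a location without a y coordinate."""
    try:
        click_tool([100])  # Missing y coordinate
    except ValueError as exc:
        assert "exactly 2 integers" in str(exc)
    else:
        raise AssertionError("expected ValueError for a one-element location")


test_click_tool_validates_coordinates()
```

In a real test module the stand-in would be replaced by an import of the actual tool function, and `pytest.raises(ValueError, match="exactly 2 integers")` would express the same assertion more compactly.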
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: windows-mcp
-Version: 0.5.2
+Version: 0.5.3
 Summary: Lightweight MCP Server for interacting with Windows Operating System.
 Project-URL: homepage, https://github.com/CursorTouch
 Author-email: Jeomon George <jeogeoalukka@gmail.com>
@@ -44,7 +44,6 @@ Requires-Dist: python-levenshtein>=0.27.1
 Requires-Dist: pywinauto>=0.6.9
 Requires-Dist: requests>=2.32.3
 Requires-Dist: tabulate>=0.9.0
-Requires-Dist: twine>=6.2.0
 Requires-Dist: uiautomation>=2.0.24
 Description-Content-Type: text/markdown
 
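The hunk above drops `twine` from the `Requires-Dist` entries. PKG-INFO uses RFC 822 style headers, so one way to confirm such a change is to parse both versions with the standard library's `email.parser` and compare the requirement names. The sketch below assumes two abbreviated stand-in strings built from the lines visible in this hunk, not the full metadata files; the helper name `requirement_names` is hypothetical.

```python
import re
from email.parser import Parser

# Abbreviated stand-ins for the two PKG-INFO files (only lines visible in the hunk).
OLD_PKG_INFO = """\
Metadata-Version: 2.4
Name: windows-mcp
Requires-Dist: pywinauto>=0.6.9
Requires-Dist: requests>=2.32.3
Requires-Dist: tabulate>=0.9.0
Requires-Dist: twine>=6.2.0
Requires-Dist: uiautomation>=2.0.24
"""

NEW_PKG_INFO = OLD_PKG_INFO.replace("Requires-Dist: twine>=6.2.0\n", "")


def requirement_names(pkg_info_text: str) -> set[str]:
    """Return the lower-cased distribution names listed under Requires-Dist."""
    message = Parser().parsestr(pkg_info_text)
    requirements = message.get_all("Requires-Dist") or []
    # Cut each requirement at the first character that cannot be part of a name.
    return {
        re.split(r"[^A-Za-z0-9._-]", req.strip(), maxsplit=1)[0].lower()
        for req in requirements
    }


removed = requirement_names(OLD_PKG_INFO) - requirement_names(NEW_PKG_INFO)
print(sorted(removed))  # ['twine']
```

For an installed distribution, `importlib.metadata.requires("windows-mcp")` returns the same requirement strings without reading the file by hand.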
@@ -114,6 +113,58 @@ mcp-name: io.github.CursorTouch/Windows-MCP
 - **Real-Time Interaction**
   Typical latency between actions (e.g., from one mouse click to the next) ranges from **0.7 to 2.5 secs**, and may slightly vary based on the number of active applications and system load, also the inferencing speed of the llm.
 
+- **DOM Mode for Browser Automation**
+  Special `use_dom=True` mode for State-Tool that focuses exclusively on web page content, filtering out browser UI elements for cleaner, more efficient web automation.
+
+## 🌐 DOM Mode for Browser Automation
+
+Windows-MCP includes a powerful **DOM Mode** feature that enhances browser automation by focusing on web page content rather than browser UI elements.
+
+### What is DOM Mode?
+
+When `use_dom=True` is set in the State-Tool, the MCP server:
+- **Filters out browser UI**: Removes address bars, tabs, toolbars, and other browser chrome elements
+- **Returns only web content**: Provides interactive elements (links, buttons, forms) from the actual web page
+- **Reduces token usage**: Cleaner output means fewer tokens sent to the LLM
+- **Improves accuracy**: LLM focuses only on relevant web page elements
+
+### When to Use DOM Mode
+
+✅ **Use `use_dom=True` when:**
+- Automating web applications or websites
+- Scraping web content
+- Filling out web forms
+- Clicking links or buttons on web pages
+- Testing web interfaces
+- You want to ignore browser UI and focus on page content
+
+❌ **Use `use_dom=False` (default) when:**
+- Interacting with browser controls (address bar, tabs, bookmarks)
+- Working with desktop applications
+- Need to see all UI elements including browser chrome
+- Managing browser settings or extensions
+
+### Example Usage
+
+```python
+# Get web page content only (no browser UI)
+state_tool(use_vision=False, use_dom=True)
+
+# Get full desktop state including browser UI
+state_tool(use_vision=False, use_dom=False)
+
+# Get web page content with screenshot
+state_tool(use_vision=True, use_dom=True)
+```
+
+### Benefits
+
+1. **Token Efficiency**: Reduces the amount of data sent to LLM by filtering irrelevant browser UI
+2. **Better Focus**: LLM concentrates on actionable web page elements
+3. **Cleaner Output**: Only relevant interactive elements from the DOM are returned
+4. **Faster Processing**: Less data means faster LLM inference
+5. **Cost Savings**: Fewer tokens = lower API costs for cloud LLMs
+
 ## 🛠️Installation
 
 ### Prerequisites
@@ -317,7 +368,7 @@ MCP Client can access the following tools to interact with Windows:
 - `Move-Tool`: Move mouse pointer.
 - `Shortcut-Tool`: Press keyboard shortcuts (`Ctrl+c`, `Alt+Tab`, etc).
 - `Wait-Tool`: Pause for a defined duration.
-- `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop
+- `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop. Supports `use_dom=True` for browser content extraction (web page elements only) and `use_vision=True` for including screenshots.
 - `App-Tool`: To launch an application from the start menu, resize or move the window and switch between apps.
 - `Shell-Tool`: To execute PowerShell commands.
 - `Scrape-Tool`: To scrape the entire webpage for information.
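The DOM Mode text added above describes `use_dom=True` as keeping web page elements and dropping browser chrome. The idea can be illustrated with a small filter over a flat element list. Everything in this sketch is hypothetical: the `Element` dataclass, its `in_web_content` flag, and `filter_elements` are illustrative names, and the actual logic in `tree/service.py` is not visible in this part of the diff and may work quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical flattened UI element; not the project's real data model."""

    name: str
    control_type: str
    in_web_content: bool  # True if the element sits inside the page, not the browser chrome


def filter_elements(elements: list[Element], use_dom: bool = False) -> list[Element]:
    """Return only page elements when use_dom is True, otherwise everything."""
    if not use_dom:
        return list(elements)
    return [el for el in elements if el.in_web_content]


snapshot = [
    Element("Address and search bar", "Edit", in_web_content=False),
    Element("New Tab", "Button", in_web_content=False),
    Element("Sign in", "Hyperlink", in_web_content=True),
    Element("Search", "Edit", in_web_content=True),
]

print([el.name for el in filter_elements(snapshot, use_dom=True)])
print(len(filter_elements(snapshot, use_dom=False)))
```

With this sample snapshot, the DOM-mode call keeps the two page elements and the default call keeps all four, which is the source of the token savings the README claims.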
@@ -64,6 +64,58 @@ mcp-name: io.github.CursorTouch/Windows-MCP
 - **Real-Time Interaction**
   Typical latency between actions (e.g., from one mouse click to the next) ranges from **0.7 to 2.5 secs**, and may slightly vary based on the number of active applications and system load, also the inferencing speed of the llm.
 
+- **DOM Mode for Browser Automation**
+  Special `use_dom=True` mode for State-Tool that focuses exclusively on web page content, filtering out browser UI elements for cleaner, more efficient web automation.
+
+## 🌐 DOM Mode for Browser Automation
+
+Windows-MCP includes a powerful **DOM Mode** feature that enhances browser automation by focusing on web page content rather than browser UI elements.
+
+### What is DOM Mode?
+
+When `use_dom=True` is set in the State-Tool, the MCP server:
+- **Filters out browser UI**: Removes address bars, tabs, toolbars, and other browser chrome elements
+- **Returns only web content**: Provides interactive elements (links, buttons, forms) from the actual web page
+- **Reduces token usage**: Cleaner output means fewer tokens sent to the LLM
+- **Improves accuracy**: LLM focuses only on relevant web page elements
+
+### When to Use DOM Mode
+
+✅ **Use `use_dom=True` when:**
+- Automating web applications or websites
+- Scraping web content
+- Filling out web forms
+- Clicking links or buttons on web pages
+- Testing web interfaces
+- You want to ignore browser UI and focus on page content
+
+❌ **Use `use_dom=False` (default) when:**
+- Interacting with browser controls (address bar, tabs, bookmarks)
+- Working with desktop applications
+- Need to see all UI elements including browser chrome
+- Managing browser settings or extensions
+
+### Example Usage
+
+```python
+# Get web page content only (no browser UI)
+state_tool(use_vision=False, use_dom=True)
+
+# Get full desktop state including browser UI
+state_tool(use_vision=False, use_dom=False)
+
+# Get web page content with screenshot
+state_tool(use_vision=True, use_dom=True)
+```
+
+### Benefits
+
+1. **Token Efficiency**: Reduces the amount of data sent to LLM by filtering irrelevant browser UI
+2. **Better Focus**: LLM concentrates on actionable web page elements
+3. **Cleaner Output**: Only relevant interactive elements from the DOM are returned
+4. **Faster Processing**: Less data means faster LLM inference
+5. **Cost Savings**: Fewer tokens = lower API costs for cloud LLMs
+
 ## 🛠️Installation
 
 ### Prerequisites
@@ -267,7 +319,7 @@ MCP Client can access the following tools to interact with Windows:
 - `Move-Tool`: Move mouse pointer.
 - `Shortcut-Tool`: Press keyboard shortcuts (`Ctrl+c`, `Alt+Tab`, etc).
 - `Wait-Tool`: Pause for a defined duration.
-- `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop
+- `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop. Supports `use_dom=True` for browser content extraction (web page elements only) and `use_vision=True` for including screenshots.
 - `App-Tool`: To launch an application from the start menu, resize or move the window and switch between apps.
 - `Shell-Tool`: To execute PowerShell commands.
 - `Scrape-Tool`: To scrape the entire webpage for information.