windows-mcp 0.5.2__tar.gz → 0.5.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (29)
  1. {windows_mcp-0.5.2 → windows_mcp-0.5.3}/.gitignore +1 -1
  2. windows_mcp-0.5.3/.python-version +1 -0
  3. windows_mcp-0.5.3/CONTRIBUTING.md +384 -0
  4. {windows_mcp-0.5.2 → windows_mcp-0.5.3}/PKG-INFO +54 -3
  5. {windows_mcp-0.5.2 → windows_mcp-0.5.3}/README.md +53 -1
  6. windows_mcp-0.5.3/SECURITY.md +304 -0
  7. windows_mcp-0.5.3/assets/demo1.mov +0 -0
  8. windows_mcp-0.5.3/assets/demo2.mov +0 -0
  9. windows_mcp-0.5.3/assets/logo.png +0 -0
  10. windows_mcp-0.5.3/assets/screenshots/screenshot_1.png +0 -0
  11. windows_mcp-0.5.3/assets/screenshots/screenshot_2.png +0 -0
  12. windows_mcp-0.5.3/assets/screenshots/screenshot_3.png +0 -0
  13. windows_mcp-0.5.3/manifest.json +99 -0
  14. windows_mcp-0.5.3/notebook.ipynb +187 -0
  15. {windows_mcp-0.5.2 → windows_mcp-0.5.3}/pyproject.toml +40 -42
  16. windows_mcp-0.5.3/server.json +23 -0
  17. windows_mcp-0.5.2/main.py → windows_mcp-0.5.3/src/windows_mcp/__main__.py +38 -17
  18. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/service.py +9 -6
  19. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/views.py +1 -1
  20. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/service.py +89 -36
  21. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/views.py +8 -0
  22. windows_mcp-0.5.3/uv.lock +1483 -0
  23. {windows_mcp-0.5.2 → windows_mcp-0.5.3}/LICENSE.md +0 -0
  24. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/__init__.py +0 -0
  25. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/__init__.py +0 -0
  26. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/desktop/config.py +0 -0
  27. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/__init__.py +0 -0
  28. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/config.py +0 -0
  29. {windows_mcp-0.5.2/src → windows_mcp-0.5.3/src/windows_mcp}/tree/utils.py +0 -0
@@ -165,5 +165,5 @@ cython_debug/
165
165
  .mcpregistry_github_token
166
166
  .mcpregistry_registry_token
167
167
  sandbox
168
- *.ipynb
168
+ # *.ipynb
169
169
  *.mcpb
@@ -0,0 +1 @@
1
+ 3.13
@@ -0,0 +1,384 @@
1
+ # Contributing to Windows-MCP
2
+
3
+ Thank you for your interest in contributing to Windows-MCP! We welcome contributions from the community to help make this project better. This document provides guidelines and instructions for contributing.
4
+
5
+ ## Table of Contents
6
+
7
+ - [Code of Conduct](#code-of-conduct)
8
+ - [Getting Started](#getting-started)
9
+ - [Prerequisites](#prerequisites)
10
+ - [Development Environment Setup](#development-environment-setup)
11
+ - [Development Workflow](#development-workflow)
12
+ - [Branching Strategy](#branching-strategy)
13
+ - [Making Changes](#making-changes)
14
+ - [Commit Messages](#commit-messages)
15
+ - [Code Style](#code-style)
16
+ - [Testing](#testing)
17
+ - [Running Tests](#running-tests)
18
+ - [Adding Tests](#adding-tests)
19
+ - [Pull Requests](#pull-requests)
20
+ - [Before Submitting](#before-submitting)
21
+ - [Pull Request Process](#pull-request-process)
22
+ - [Review Process](#review-process)
23
+ - [Documentation](#documentation)
24
+ - [Reporting Issues](#reporting-issues)
25
+ - [Security Vulnerabilities](#security-vulnerabilities)
26
+ - [Getting Help](#getting-help)
27
+
28
+ ## Code of Conduct
29
+
30
+ By participating in this project, you agree to maintain a respectful and inclusive environment. We expect all contributors to:
31
+
32
+ - Be respectful and considerate in communication
33
+ - Welcome newcomers and help them get started
34
+ - Accept constructive criticism gracefully
35
+ - Focus on what's best for the community and project
36
+
37
+ ## Getting Started
38
+
39
+ ### Prerequisites
40
+
41
+ Before you begin, ensure you have:
42
+
43
+ - **Windows OS**: Windows 7, 8, 8.1, 10, or 11
44
+ - **Python 3.13+**: [Download Python](https://www.python.org/downloads/)
45
+ - **UV Package Manager**: Install with `pip install uv` or see [UV documentation](https://github.com/astral-sh/uv)
46
+ - **Git**: [Download Git](https://git-scm.com/downloads)
47
+ - **A GitHub account**: [Sign up here](https://github.com/join)
48
+
49
+ ### Development Environment Setup
50
+
51
+ 1. **Fork the Repository**
52
+
53
+ Click the "Fork" button on the [Windows-MCP repository](https://github.com/CursorTouch/Windows-MCP) to create your own copy.
54
+
55
+ 2. **Clone Your Fork**
56
+
57
+ ```bash
58
+ git clone https://github.com/YOUR_USERNAME/Windows-MCP.git
59
+ cd Windows-MCP
60
+ ```
61
+
62
+ 3. **Add Upstream Remote**
63
+
64
+ ```bash
65
+ git remote add upstream https://github.com/CursorTouch/Windows-MCP.git
66
+ ```
67
+
68
+ 4. **Install Dependencies**
69
+
70
+ ```bash
71
+ uv sync
72
+ ```
73
+
74
+ 5. **Verify Installation**
75
+
76
+ ```bash
77
+ uv run python -m windows_mcp --help
78
+ ```
79
+
80
+ ## Development Workflow
81
+
82
+ ### Branching Strategy
83
+
84
+ - **`main`** branch contains the latest stable code
85
+ - Create feature branches from `main` using descriptive names:
86
+ - Features: `feature/add-new-tool`
87
+ - Bug fixes: `fix/click-tool-coordinates`
88
+ - Documentation: `docs/update-readme`
89
+ - Refactoring: `refactor/desktop-service`
90
+
91
+ ### Making Changes
92
+
93
+ 1. **Create a New Branch**
94
+
95
+ ```bash
96
+ git checkout -b feature/your-feature-name
97
+ ```
98
+
99
+ 2. **Make Your Changes**
100
+
101
+ - Write clean, readable code
102
+ - Follow the existing code structure
103
+ - Add comments for complex logic
104
+ - Update documentation as needed
105
+
106
+ 3. **Test Your Changes**
107
+
108
+ - Test manually in a safe environment (VM recommended)
109
+ - Add automated tests if applicable
110
+ - Ensure existing functionality isn't broken
111
+
112
+ 4. **Commit Your Changes**
113
+
114
+ ```bash
115
+ git add .
116
+ git commit -m "Add feature: description of your changes"
117
+ ```
118
+
119
+ ### Commit Messages
120
+
121
+ While we don't enforce a strict commit message format, please make your commits informative:
122
+
123
+ **Good examples:**
124
+ - `Add support for multi-monitor setups in State-Tool`
125
+ - `Fix Click-Tool coordinate offset on high DPI displays`
126
+ - `Update README with Perplexity Desktop installation steps`
127
+ - `Refactor Desktop class to improve error handling`
128
+
129
+ **Avoid:**
130
+ - `fix bug`
131
+ - `update`
132
+ - `changes`
133
+
134
+ ### Code Style
135
+
136
+ We use **[Ruff](https://github.com/astral-sh/ruff)** for code formatting and linting.
137
+
138
+ **Key Guidelines:**
139
+ - **Line length**: 100 characters maximum
140
+ - **Quotes**: Use double quotes for strings
141
+ - **Naming conventions**: Follow PEP 8
142
+ - `snake_case` for functions and variables
143
+ - `PascalCase` for classes
144
+ - `UPPER_CASE` for constants
145
+ - **Type hints**: Add type annotations to function signatures
146
+ - **Docstrings**: Use Google-style docstrings for all public functions and classes
147
+
148
+ **Example:**
149
+
150
+ ```python
151
+ def click_tool(
152
+     loc: list[int],
153
+     button: Literal["left", "right", "middle"] = "left",
154
+     clicks: int = 1,
155
+ ) -> str:
156
+     """Click on UI elements at specific coordinates.
157
+
158
+     Args:
159
+         loc: List of [x, y] coordinates to click
160
+         button: Mouse button to use (left, right, or middle)
161
+         clicks: Number of clicks (1=single, 2=double, 3=triple)
162
+
163
+     Returns:
164
+         Confirmation message describing the action performed
165
+
166
+     Raises:
167
+         ValueError: If loc doesn't contain exactly 2 integers
168
+     """
169
+     if len(loc) != 2:
170
+         raise ValueError("Location must be a list of exactly 2 integers [x, y]")
171
+     # Implementation...
172
+ ```
173
+
174
+ **Format Code:**
175
+
176
+ ```bash
177
+ ruff format .
178
+ ```
179
+
180
+ **Run Linter:**
181
+
182
+ ```bash
183
+ ruff check .
184
+ ```
185
+
186
+ ## Testing
187
+
188
+ ### Running Tests
189
+
190
+ If the project has tests (check the `tests/` directory):
191
+
192
+ ```bash
193
+ pytest
194
+ ```
195
+
196
+ Run specific test files:
197
+
198
+ ```bash
199
+ pytest tests/test_desktop.py
200
+ ```
201
+
202
+ Run with coverage:
203
+
204
+ ```bash
205
+ pytest --cov=src tests/
206
+ ```
207
+
208
+ ### Adding Tests
209
+
210
+ When adding new features:
211
+
212
+ 1. **Create test files** in the `tests/` directory matching the module structure
213
+ 2. **Write unit tests** for individual functions
214
+ 3. **Write integration tests** for tool workflows
215
+ 4. **Use fixtures** for common test setup
216
+ 5. **Mock external dependencies** (Windows API calls, file system operations)
217
+
218
+ **Example Test:**
219
+
220
+ ```python
221
+ import pytest
222
+ from windows_mcp.__main__ import click_tool  # adjust to wherever click_tool is defined
223
+
224
+ def test_click_tool_validates_coordinates():
225
+     """Test that click_tool raises ValueError for invalid coordinates."""
226
+     with pytest.raises(ValueError, match="exactly 2 integers"):
227
+         click_tool([100])  # Missing y coordinate
228
+ ```
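The guidelines above recommend mocking external dependencies such as Windows API calls. A minimal sketch of that pattern using only the standard library's `unittest.mock` is shown below; pytest collects these plain `test_*` functions without extra plugins. The `type_tool` function and the `click`/`type_text` methods on the `desktop` object are hypothetical stand-ins, not the project's actual tool or `Desktop` API.

```python
from unittest.mock import MagicMock, call


def type_tool(desktop, loc: list[int], text: str) -> str:
    """Hypothetical tool: click a text field, then type into it.

    Args:
        desktop: Object wrapping the real Windows automation calls.
        loc: [x, y] coordinates of the field.
        text: Text to type.

    Returns:
        Confirmation message describing the action performed.
    """
    if len(loc) != 2:
        raise ValueError("Location must be a list of exactly 2 integers [x, y]")
    x, y = loc
    desktop.click(x, y)  # on Windows this would drive the real mouse
    desktop.type_text(text)
    return f"Typed {text!r} at ({x}, {y})."


def test_type_tool_clicks_then_types():
    # MagicMock records calls instead of performing them, so no real input is sent.
    desktop = MagicMock()
    result = type_tool(desktop, [10, 20], "hello")
    # method_calls preserves call order, so this also checks click happens before typing.
    assert desktop.method_calls == [call.click(10, 20), call.type_text("hello")]
    assert result == "Typed 'hello' at (10, 20)."


def test_type_tool_rejects_bad_location():
    desktop = MagicMock()
    try:
        type_tool(desktop, [10], "hello")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a one-element location")
    # Validation must fail before any UI interaction happens.
    desktop.click.assert_not_called()
```

Passing the desktop object in as a parameter (rather than creating it inside the tool) is what makes the mock swap this simple.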
229
+
230
+ ## Pull Requests
231
+
232
+ ### Before Submitting
233
+
234
+ - [ ] Code follows the project's style guidelines
235
+ - [ ] All tests pass (if applicable)
236
+ - [ ] Documentation is updated (README, docstrings, etc.)
237
+ - [ ] Commit messages are clear and descriptive
238
+ - [ ] Changes are tested in a safe environment (VM recommended)
239
+ - [ ] No sensitive information (API keys, passwords) is included
240
+
241
+ ### Pull Request Process
242
+
243
+ 1. **Update Your Branch**
244
+
245
+ ```bash
246
+ git fetch upstream
247
+ git rebase upstream/main
248
+ ```
249
+
250
+ 2. **Push to Your Fork**
251
+
252
+ ```bash
253
+ git push origin feature/your-feature-name
254
+ ```
255
+
256
+ 3. **Create Pull Request**
257
+
258
+ - Go to the [Windows-MCP repository](https://github.com/CursorTouch/Windows-MCP)
259
+ - Click "New Pull Request"
260
+ - Select your fork and branch
261
+ - Fill out the PR template with:
262
+ - **Description**: What does this PR do?
263
+ - **Motivation**: Why is this change needed?
264
+ - **Testing**: How was this tested?
265
+ - **Screenshots**: If applicable (UI changes, new features)
266
+ - **Related Issues**: Link any related issues
267
+
268
+ 4. **Respond to Feedback**
269
+
270
+ - Address reviewer comments promptly
271
+ - Make requested changes in new commits
272
+ - Push updates to the same branch
273
+
274
+ ### Review Process
275
+
276
+ - Maintainers will review your PR within a few days
277
+ - You may be asked to make changes or provide clarification
278
+ - Once approved, a maintainer will merge your PR
279
+ - Your contribution will be acknowledged in release notes
280
+
281
+ ## Documentation
282
+
283
+ Good documentation is crucial! When contributing:
284
+
285
+ ### Code Documentation
286
+
287
+ - **Docstrings**: Add to all public functions, classes, and methods
288
+ - **Comments**: Explain complex logic or non-obvious decisions
289
+ - **Type hints**: Help users and tools understand your code
290
+
291
+ ### User Documentation
292
+
293
+ Update relevant documentation files:
294
+
295
+ - **README.md**: For user-facing features or installation changes
296
+ - **SECURITY.md**: For security-related changes
297
+ - **CONTRIBUTING.md**: For development process changes
298
+
299
+ ### Tool Documentation
300
+
301
+ When adding or modifying tools:
302
+
303
+ 1. Update the tool's `description` parameter in `src/windows_mcp/__main__.py`
304
+ 2. Add appropriate `ToolAnnotations`
305
+ 3. Update the tools list in `README.md`
306
+ 4. Update `manifest.json` if needed
307
+
308
+ ## Reporting Issues
309
+
310
+ Found a bug or have a feature request? Please open an issue!
311
+
312
+ ### Bug Reports
313
+
314
+ Include:
315
+ - **Description**: Clear description of the bug
316
+ - **Steps to Reproduce**: Detailed steps to recreate the issue
317
+ - **Expected Behavior**: What should happen
318
+ - **Actual Behavior**: What actually happens
319
+ - **Environment**: Windows version, Python version, MCP client
320
+ - **Screenshots/Logs**: If applicable
321
+
322
+ ### Feature Requests
323
+
324
+ Include:
325
+ - **Description**: What feature do you want?
326
+ - **Use Case**: Why is this feature needed?
327
+ - **Proposed Solution**: How might this be implemented?
328
+ - **Alternatives**: Other approaches you've considered
329
+
330
+ ## Security Vulnerabilities
331
+
332
+ **DO NOT** report security vulnerabilities through public GitHub issues.
333
+
334
+ Instead, please:
335
+ 1. Email the maintainers at [jeogeoalukka@gmail.com](mailto:jeogeoalukka@gmail.com)
336
+ 2. Or use [GitHub Security Advisories](https://github.com/CursorTouch/Windows-MCP/security/advisories)
337
+
338
+ See our [Security Policy](SECURITY.md) for more details.
339
+
340
+ ## Getting Help
341
+
342
+ Need help with your contribution?
343
+
344
+ - **Discord**: Join our [Discord Community](https://discord.com/invite/Aue9Yj2VzS)
345
+ - **Twitter/X**: Follow [@CursorTouch](https://x.com/CursorTouch)
346
+ - **GitHub Discussions**: Ask questions in [Discussions](https://github.com/CursorTouch/Windows-MCP/discussions)
347
+ - **Issues**: Open an issue for technical questions
348
+
349
+ ## Types of Contributions
350
+
351
+ We welcome many types of contributions:
352
+
353
+ ### Code Contributions
354
+
355
+ - **New Tools**: Add new MCP tools for Windows automation
356
+ - **Bug Fixes**: Fix issues in existing tools
357
+ - **Performance Improvements**: Optimize code for speed or efficiency
358
+ - **Refactoring**: Improve code structure and maintainability
359
+
360
+ ### Non-Code Contributions
361
+
362
+ - **Documentation**: Improve README, guides, or docstrings
363
+ - **Testing**: Add test cases or improve test coverage
364
+ - **Bug Reports**: Report issues with detailed information
365
+ - **Feature Requests**: Suggest new features or improvements
366
+ - **Community Support**: Help others in Discord or Discussions
367
+ - **Translations**: Help translate documentation (future)
368
+
369
+ ## Recognition
370
+
371
+ Contributors are recognized in:
372
+ - GitHub contributors page
373
+ - Release notes for significant contributions
374
+ - Special mentions for major features or fixes
375
+
376
+ ## License
377
+
378
+ By contributing to Windows-MCP, you agree that your contributions will be licensed under the [MIT License](LICENSE.md).
379
+
380
+ ---
381
+
382
+ Thank you for contributing to Windows-MCP! Your efforts help make this project better for everyone. 🙏
383
+
384
+ Made with ❤️ by the CursorTouch community
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: windows-mcp
3
- Version: 0.5.2
3
+ Version: 0.5.3
4
4
  Summary: Lightweight MCP Server for interacting with Windows Operating System.
5
5
  Project-URL: homepage, https://github.com/CursorTouch
6
6
  Author-email: Jeomon George <jeogeoalukka@gmail.com>
@@ -44,7 +44,6 @@ Requires-Dist: python-levenshtein>=0.27.1
44
44
  Requires-Dist: pywinauto>=0.6.9
45
45
  Requires-Dist: requests>=2.32.3
46
46
  Requires-Dist: tabulate>=0.9.0
47
- Requires-Dist: twine>=6.2.0
48
47
  Requires-Dist: uiautomation>=2.0.24
49
48
  Description-Content-Type: text/markdown
50
49
 
@@ -114,6 +113,58 @@ mcp-name: io.github.CursorTouch/Windows-MCP
114
113
  - **Real-Time Interaction**
115
114
  Typical latency between actions (e.g., from one mouse click to the next) ranges from **0.7 to 2.5 secs** and may vary slightly depending on the number of active applications, system load, and the inference speed of the LLM.
116
115
 
116
+ - **DOM Mode for Browser Automation**
117
+ Special `use_dom=True` mode for State-Tool that focuses exclusively on web page content, filtering out browser UI elements for cleaner, more efficient web automation.
118
+
119
+ ## 🌐 DOM Mode for Browser Automation
120
+
121
+ Windows-MCP includes a powerful **DOM Mode** feature that enhances browser automation by focusing on web page content rather than browser UI elements.
122
+
123
+ ### What is DOM Mode?
124
+
125
+ When `use_dom=True` is set in the State-Tool, the MCP server:
126
+ - **Filters out browser UI**: Removes address bars, tabs, toolbars, and other browser chrome elements
127
+ - **Returns only web content**: Provides interactive elements (links, buttons, forms) from the actual web page
128
+ - **Reduces token usage**: Cleaner output means fewer tokens sent to the LLM
129
+ - **Improves accuracy**: LLM focuses only on relevant web page elements
130
+
131
+ ### When to Use DOM Mode
132
+
133
+ ✅ **Use `use_dom=True` when:**
134
+ - Automating web applications or websites
135
+ - Scraping web content
136
+ - Filling out web forms
137
+ - Clicking links or buttons on web pages
138
+ - Testing web interfaces
139
+ - You want to ignore browser UI and focus on page content
140
+
141
+ ❌ **Use `use_dom=False` (default) when:**
142
+ - Interacting with browser controls (address bar, tabs, bookmarks)
143
+ - Working with desktop applications
144
+ - You need to see all UI elements, including browser chrome
145
+ - Managing browser settings or extensions
146
+
147
+ ### Example Usage
148
+
149
+ ```python
150
+ # Get web page content only (no browser UI)
151
+ state_tool(use_vision=False, use_dom=True)
152
+
153
+ # Get full desktop state including browser UI
154
+ state_tool(use_vision=False, use_dom=False)
155
+
156
+ # Get web page content with screenshot
157
+ state_tool(use_vision=True, use_dom=True)
158
+ ```
159
+
160
+ ### Benefits
161
+
162
+ 1. **Token Efficiency**: Reduces the amount of data sent to LLM by filtering irrelevant browser UI
163
+ 2. **Better Focus**: LLM concentrates on actionable web page elements
164
+ 3. **Cleaner Output**: Only relevant interactive elements from the DOM are returned
165
+ 4. **Faster Processing**: Less data means faster LLM inference
166
+ 5. **Cost Savings**: Fewer tokens = lower API costs for cloud LLMs
167
+
117
168
  ## 🛠️Installation
118
169
 
119
170
  ### Prerequisites
@@ -317,7 +368,7 @@ MCP Client can access the following tools to interact with Windows:
317
368
  - `Move-Tool`: Move mouse pointer.
318
369
  - `Shortcut-Tool`: Press keyboard shortcuts (`Ctrl+c`, `Alt+Tab`, etc).
319
370
  - `Wait-Tool`: Pause for a defined duration.
320
- - `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop..
371
+ - `State-Tool`: Combined snapshot of the default language, browser, active apps, and interactive, textual, and scrollable elements, optionally with a screenshot of the desktop. Supports `use_dom=True` for browser content extraction (web page elements only) and `use_vision=True` for including a screenshot.
321
372
  - `App-Tool`: To launch an application from the start menu, resize or move the window and switch between apps.
322
373
  - `Shell-Tool`: To execute PowerShell commands.
323
374
  - `Scrape-Tool`: To scrape the entire webpage for information.
@@ -64,6 +64,58 @@ mcp-name: io.github.CursorTouch/Windows-MCP
64
64
  - **Real-Time Interaction**
65
65
  Typical latency between actions (e.g., from one mouse click to the next) ranges from **0.7 to 2.5 secs** and may vary slightly depending on the number of active applications, system load, and the inference speed of the LLM.
66
66
 
67
+ - **DOM Mode for Browser Automation**
68
+ Special `use_dom=True` mode for State-Tool that focuses exclusively on web page content, filtering out browser UI elements for cleaner, more efficient web automation.
69
+
70
+ ## 🌐 DOM Mode for Browser Automation
71
+
72
+ Windows-MCP includes a powerful **DOM Mode** feature that enhances browser automation by focusing on web page content rather than browser UI elements.
73
+
74
+ ### What is DOM Mode?
75
+
76
+ When `use_dom=True` is set in the State-Tool, the MCP server:
77
+ - **Filters out browser UI**: Removes address bars, tabs, toolbars, and other browser chrome elements
78
+ - **Returns only web content**: Provides interactive elements (links, buttons, forms) from the actual web page
79
+ - **Reduces token usage**: Cleaner output means fewer tokens sent to the LLM
80
+ - **Improves accuracy**: LLM focuses only on relevant web page elements
81
+
82
+ ### When to Use DOM Mode
83
+
84
+ ✅ **Use `use_dom=True` when:**
85
+ - Automating web applications or websites
86
+ - Scraping web content
87
+ - Filling out web forms
88
+ - Clicking links or buttons on web pages
89
+ - Testing web interfaces
90
+ - You want to ignore browser UI and focus on page content
91
+
92
+ ❌ **Use `use_dom=False` (default) when:**
93
+ - Interacting with browser controls (address bar, tabs, bookmarks)
94
+ - Working with desktop applications
95
+ - You need to see all UI elements, including browser chrome
96
+ - Managing browser settings or extensions
97
+
98
+ ### Example Usage
99
+
100
+ ```python
101
+ # Get web page content only (no browser UI)
102
+ state_tool(use_vision=False, use_dom=True)
103
+
104
+ # Get full desktop state including browser UI
105
+ state_tool(use_vision=False, use_dom=False)
106
+
107
+ # Get web page content with screenshot
108
+ state_tool(use_vision=True, use_dom=True)
109
+ ```
110
+
111
+ ### Benefits
112
+
113
+ 1. **Token Efficiency**: Reduces the amount of data sent to LLM by filtering irrelevant browser UI
114
+ 2. **Better Focus**: LLM concentrates on actionable web page elements
115
+ 3. **Cleaner Output**: Only relevant interactive elements from the DOM are returned
116
+ 4. **Faster Processing**: Less data means faster LLM inference
117
+ 5. **Cost Savings**: Fewer tokens = lower API costs for cloud LLMs
118
+
67
119
  ## 🛠️Installation
68
120
 
69
121
  ### Prerequisites
@@ -267,7 +319,7 @@ MCP Client can access the following tools to interact with Windows:
267
319
  - `Move-Tool`: Move mouse pointer.
268
320
  - `Shortcut-Tool`: Press keyboard shortcuts (`Ctrl+c`, `Alt+Tab`, etc).
269
321
  - `Wait-Tool`: Pause for a defined duration.
270
- - `State-Tool`: Combined snapshot of default language, browser, active apps and interactive, textual and scrollable elements along with screenshot of the desktop..
322
+ - `State-Tool`: Combined snapshot of the default language, browser, active apps, and interactive, textual, and scrollable elements, optionally with a screenshot of the desktop. Supports `use_dom=True` for browser content extraction (web page elements only) and `use_vision=True` for including a screenshot.
271
323
  - `App-Tool`: To launch an application from the start menu, resize or move the window and switch between apps.
272
324
  - `Shell-Tool`: To execute PowerShell commands.
273
325
  - `Scrape-Tool`: To scrape the entire webpage for information.