@rayven122/mcp-selenium 0.2.3

package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Angie Jones
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,259 @@
+ # MCP Selenium Server
+
+ A Model Context Protocol (MCP) server that lets AI agents automate real browsers
+ through Selenium WebDriver.
+
+ Use it when an agent needs to open a browser, navigate pages, click elements,
+ fill forms, upload files, handle alerts, manage cookies, capture diagnostics, or
+ inspect page structure without writing a separate Selenium script.
+
+ ## What It Provides
+
+ - Browser automation for Chrome, Firefox, Edge, Safari, and Edge in IE mode.
+ - 18 MCP tools for navigation, interactions, screenshots, cookies, windows,
+ frames, alerts, script execution, and diagnostics.
+ - 2 MCP resources for browser status and compact accessibility snapshots.
+ - Passive WebDriver BiDi capture for console logs, JavaScript errors, and
+ network activity when the browser and driver support it.
+
+ ## Setup
+
+ <details open>
+ <summary><strong>Goose (Desktop)</strong></summary>
+
+ Paste into your browser address bar:
+
+ ```
+ goose://extension?cmd=npx&arg=-y&arg=github%3Arayven122%2Fmcp-selenium&id=selenium-mcp&name=Selenium%20MCP&description=automates%20browser%20interactions
+ ```
+ </details>
+
+ <details>
+ <summary><strong>Goose (CLI)</strong></summary>
+
+ ```bash
+ goose session --with-extension "npx -y github:rayven122/mcp-selenium"
+ ```
+ </details>
+
+ <details>
+ <summary><strong>Claude Code</strong></summary>
+
+ ```bash
+ claude mcp add selenium -- npx -y github:rayven122/mcp-selenium
+ ```
+ </details>
+
+ <details>
+ <summary><strong>Cursor / Windsurf / other MCP clients</strong></summary>
+
+ ```json
+ {
+   "mcpServers": {
+     "selenium": {
+       "command": "npx",
+       "args": ["-y", "github:rayven122/mcp-selenium"]
+     }
+   }
+ }
+ ```
+ </details>
+
+ ## Requirements
+
+ - Node.js and npm.
+ - At least one supported browser installed.
+ - The matching browser driver available to Selenium if your environment does
+ not provide one automatically.
+
+ For local tests, Chrome and `chromedriver` must be on your `PATH`.
+
+ ## Example Usage
+
+ After adding the server to your MCP client, ask your AI agent something like:
+
+ > Open Chrome, go to github.com/angiejones, and take a screenshot.
+
+ The agent can then call `start_browser`, `navigate`, and `take_screenshot`
+ through MCP. For most page inspection tasks, agents should prefer the
+ `accessibility://current` resource because it is smaller and easier to reason
+ about than full HTML or screenshots.
+
+ ## Supported Browsers
+
+ | Browser | `start_browser` value | Headless support | Notes |
+ |---------|------------------------|------------------|-------|
+ | Chrome | `chrome` | Yes | Uses `--headless=new` when `options.headless` is true. |
+ | Firefox | `firefox` | Yes | Uses Firefox headless mode when requested. |
+ | Edge | `edge` | Yes | Uses `--headless=new` when `options.headless` is true. |
+ | Safari | `safari` | No | macOS only. Requires Safari remote automation. |
+ | Edge in IE mode | `edge-ie` | No | Windows only. Only exposed in the `start_browser` schema on Windows. Requires IEDriverServer and IE mode setup. |
+
+ ### Safari Setup
+
+ Run this once on macOS:
+
+ ```bash
+ sudo safaridriver --enable
+ ```
+
+ Then enable "Allow Remote Automation" in Safari under Settings > Developer.
+
+ ### Edge IE Mode Setup
+
+ Edge IE mode is for legacy sites that must run through the Internet Explorer
+ engine inside Microsoft Edge. It requires:
+
+ - Windows.
+ - Microsoft Edge.
+ - IEDriverServer, preferably 32-bit, from the
+ [Selenium downloads](https://www.selenium.dev/downloads/) on your `PATH`.
+ - IE mode enabled in Edge by policy or registry, with target sites configured
+ for Internet Explorer mode.
+
+ Example:
+
+ ```json
+ {
+   "browser": "edge-ie",
+   "options": {
+     "ieIgnoreZoomSetting": true
+   }
+ }
+ ```
+
+ Optional Edge IE mode options include `edgePath` and `ieIgnoreZoomSetting`.
+
+ ## Tools
+
+ Locator-based tools use the same locator strategies:
+
+ | Strategy | Description |
+ |----------|-------------|
+ | `id` | Find by element ID. |
+ | `css` | Find by CSS selector. |
+ | `xpath` | Find by XPath expression. |
+ | `name` | Find by `name` attribute. |
+ | `tag` | Find by tag name. |
+ | `class` | Find by class name. |
+
+ Most locator-based tools accept an optional `timeout` in milliseconds. The
+ default is `10000` unless noted otherwise.
+
+ | Tool | Purpose | Key parameters |
+ |------|---------|----------------|
+ | `start_browser` | Launch a browser session. | `browser`, optional `options` |
+ | `navigate` | Navigate to a URL. | `url` |
+ | `interact` | Click, double-click, right-click, or hover over an element. | `action`, `by`, `value`, optional `timeout` |
+ | `send_keys` | Clear an element, then type text into it. | `by`, `value`, `text`, optional `timeout` |
+ | `get_element_text` | Read visible text from an element. | `by`, `value`, optional `timeout` |
+ | `get_element_attribute` | Read an element attribute. | `by`, `value`, `attribute`, optional `timeout` |
+ | `press_key` | Press a keyboard key. | `key` |
+ | `upload_file` | Set a file input to an absolute file path. | `by`, `value`, `filePath`, optional `timeout` |
+ | `take_screenshot` | Capture the current page. | optional `outputPath` |
+ | `close_session` | Close the current browser session. | none |
+ | `execute_script` | Run JavaScript in the browser. | `script`, optional `args` |
+ | `window` | List, switch, switch to latest, or close windows and tabs. | `action`, optional `handle` |
+ | `frame` | Switch to a frame or back to the default page. | `action`, optional `by`, `value`, `index`, `timeout` |
+ | `alert` | Accept, dismiss, read, or type into browser dialogs. | `action`, optional `text`, `timeout` |
+ | `add_cookie` | Add a cookie for the current page domain. | `name`, `value`, optional cookie fields |
+ | `get_cookies` | Return all cookies or one cookie by name. | optional `name` |
+ | `delete_cookie` | Delete all cookies or one cookie by name. | optional `name` |
+ | `diagnostics` | Read BiDi console logs, JS errors, or network activity. | `type`, optional `clear` |
+
+ ### Tool Details
+
+ #### `start_browser`
+
+ `browser` must be one of `chrome`, `firefox`, `edge`, or `safari`. On Windows,
+ `edge-ie` is also available for Edge in Internet Explorer mode.
+
+ `options` can include:
+
+ ```json
+ {
+   "headless": true,
+   "arguments": ["--window-size=1280,720"],
+   "edgePath": "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe",
+   "ieIgnoreZoomSetting": true
+ }
+ ```
+
+ `edgePath` and `ieIgnoreZoomSetting` apply only to `edge-ie`.
+
+ For safety, browser arguments that weaken browser isolation or expose remote
+ debugging are blocked by default. In a trusted local environment, set
+ `MCP_SELENIUM_ALLOW_UNSAFE_BROWSER_ARGS=1` to pass those arguments through.
+
+ #### `navigate`
+
+ `javascript:` and `vbscript:` URLs are rejected. Use `execute_script` when you
+ intentionally need to run JavaScript in the active page.
+
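The README does not show how `navigate` rejects these schemes. A minimal sketch of one possible check, assuming the WHATWG `URL` parser built into Node.js; `hasBlockedScheme` and `BLOCKED_SCHEMES` are hypothetical names and are not taken from the package source:

```javascript
// Hypothetical helper, not the package's actual implementation.
const BLOCKED_SCHEMES = new Set(['javascript:', 'vbscript:']);

function hasBlockedScheme(rawUrl) {
  let parsed;
  try {
    // The URL parser trims surrounding whitespace and lowercases the scheme,
    // so " JavaScript:alert(1)" is normalized before the comparison.
    parsed = new URL(rawUrl);
  } catch {
    // Not an absolute URL; leave that error to the caller's own validation.
    return false;
  }
  return BLOCKED_SCHEMES.has(parsed.protocol);
}

console.log(hasBlockedScheme('javascript:alert(1)'));
console.log(hasBlockedScheme('https://example.com/'));
```

Parsing first, rather than matching on the raw string prefix, avoids missing variants that differ only in case or leading whitespace.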
193
+ #### `interact`
194
+
195
+ `action` must be one of `click`, `doubleclick`, `rightclick`, or `hover`.
196
+
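A sketch of how a client might assemble arguments for `interact` before sending the tool call. `buildInteractArgs` is a hypothetical helper; only the action names, the locator strategies, and the `10000` ms default come from the README:

```javascript
// Values copied from the README; the helper itself is illustrative only.
const ACTIONS = ['click', 'doubleclick', 'rightclick', 'hover'];
const STRATEGIES = ['id', 'css', 'xpath', 'name', 'tag', 'class'];

function buildInteractArgs(action, by, value, timeout = 10000) {
  if (!ACTIONS.includes(action)) {
    throw new Error(`Unsupported action: ${action}`);
  }
  if (!STRATEGIES.includes(by)) {
    throw new Error(`Unsupported locator strategy: ${by}`);
  }
  // The returned object mirrors the documented key parameters of `interact`.
  return { action, by, value, timeout };
}

console.log(buildInteractArgs('click', 'css', 'button[type="submit"]'));
```

Checking the values on the client side is optional; the server presumably validates them against its own schema as well.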
+ #### `take_screenshot`
+
+ When `outputPath` is provided, the path must end in `.png` and resolve inside
+ the server's current working directory. Set `MCP_SELENIUM_SCREENSHOT_DIR` to use
+ a different trusted screenshot output directory.
+
+ #### `window`
+
+ `action` must be one of `list`, `switch`, `switch_latest`, or `close`.
+ `handle` is required for `switch`.
+
+ #### `frame`
+
+ `action` must be `switch` or `default`. For `switch`, provide either a locator
+ (`by` and `value`) or an `index`.
+
+ #### `alert`
+
+ `action` must be one of `accept`, `dismiss`, `get_text`, or `send_text`.
+ `text` is required for `send_text`. The default timeout is `5000` ms.
+
+ #### `diagnostics`
+
+ `type` must be one of `console`, `errors`, or `network`. Set `clear` to `true`
+ to empty that diagnostics buffer after reading it.
+
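The `clear` flag implies read-then-empty semantics. A minimal sketch of that behaviour, assuming each diagnostics type is held in a plain in-memory array; `readDiagnostics` and the `buffers` object are hypothetical and do not reflect the server's internals:

```javascript
// Hypothetical buffers, one per documented diagnostics type.
const buffers = { console: [], errors: [], network: [] };

function readDiagnostics(type, clear = false) {
  if (!Object.prototype.hasOwnProperty.call(buffers, type)) {
    throw new Error(`Unknown diagnostics type: ${type}`);
  }
  // Copy first so the caller keeps the entries even if the buffer is emptied.
  const entries = buffers[type].slice();
  if (clear) buffers[type].length = 0;
  return entries;
}

buffers.console.push({ level: 'warn', text: 'deprecated API' });
console.log(readDiagnostics('console', true).length); // 1
console.log(readDiagnostics('console').length); // 0
```

Reading without `clear` leaves the buffer untouched, so repeated reads return the same entries until one of them clears it.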
+ ## Resources
+
+ MCP resources provide read-only data that clients can access without calling a
+ tool.
+
+ | Resource | MIME type | Requires browser | Description |
+ |----------|-----------|------------------|-------------|
+ | `browser-status://current` | `text/plain` | No | Current active session ID, or `no active session`. |
+ | `accessibility://current` | `application/json` | Yes | Compact accessibility tree of interactive elements and text content. |
+
+ ## Development
+
+ ```bash
+ git clone https://github.com/rayven122/mcp-selenium.git
+ cd mcp-selenium
+ npm install
+ npm test
+ ```
+
+ Tests use Node's built-in test runner and talk to the real MCP server over
+ stdio. They require Chrome and `chromedriver` on your `PATH`.
+
+ This fork is distributed via GitHub (not published to npm). The Setup section
+ above runs it directly with `npx -y github:rayven122/mcp-selenium`.
+
+ ### Run from a local clone
+
+ For a pinned local copy (recommended when running on a fixed Windows host for
+ Edge IE mode), point your MCP client at the server entry directly:
+
+ ```bash
+ node /absolute/path/to/mcp-selenium/src/lib/server.js
+ ```
+
+ ## License
+
+ MIT
@@ -0,0 +1,29 @@
+ #!/usr/bin/env node
+
+ import { spawn } from 'node:child_process';
+ import { dirname, resolve } from 'node:path';
+ import { fileURLToPath } from 'node:url';
+
+ const __filename = fileURLToPath(import.meta.url);
+ const __dirname = dirname(__filename);
+
+ const serverPath = resolve(__dirname, '../src/lib/server.js');
+
+ // Start the server
+ const child = spawn('node', [serverPath], {
+   stdio: 'inherit',
+ });
+
+ child.on('error', (error) => {
+   console.error(`Error starting server: ${error.message}`);
+   process.exit(1);
+ });
+
+ // Handle process termination
+ process.on('SIGTERM', () => {
+   child.kill('SIGTERM');
+ });
+
+ process.on('SIGINT', () => {
+   child.kill('SIGINT');
+ });
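The launcher above registers no `exit` handler on the child, so the wrapper's own exit status does not appear to reflect how the server ended. A sketch of one way to forward it, written against any object with an `on` method so it can be exercised without spawning a process; `forwardExit` is a hypothetical helper, and using exit code 1 for signal termination is an assumption rather than a convention taken from the package:

```javascript
// Hypothetical helper: copies the child's exit status onto the parent process.
function forwardExit(child, proc) {
  child.on('exit', (code, signal) => {
    // `code` is null when the child was terminated by a signal.
    proc.exitCode = signal ? 1 : (code ?? 0);
  });
}

// Stand-ins for the spawned child and for `process`, used only for this demo.
const fakeChild = { on(event, handler) { this.handler = handler; } };
const fakeProc = {};
forwardExit(fakeChild, fakeProc);
fakeChild.handler(3, null);
console.log(fakeProc.exitCode); // 3
```

In the launcher itself the call would be `forwardExit(child, process)`, placed after the `spawn` call.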
package/package.json ADDED
@@ -0,0 +1,48 @@
+ {
+   "name": "@rayven122/mcp-selenium",
+   "version": "0.2.3",
+   "description": "Selenium WebDriver MCP Server (rayven122 fork — adds Edge IE mode)",
+   "type": "module",
+   "main": "src/lib/server.js",
+   "bin": {
+     "mcp-selenium": "bin/mcp-selenium.js"
+   },
+   "files": [
+     "bin",
+     "src",
+     "README.md",
+     "LICENSE",
+     "smithery.yaml"
+   ],
+   "engines": {
+     "node": ">=18"
+   },
+   "repository": {
+     "type": "git",
+     "url": "git+https://github.com/rayven122/mcp-selenium.git"
+   },
+   "publishConfig": {
+     "access": "public"
+   },
+   "scripts": {
+     "audit": "npm audit --audit-level=high",
+     "check": "biome check .",
+     "ci": "npm run check && npm run audit && npm test && npm run pack:dry-run",
+     "coverage": "node --test --test-concurrency=1 --experimental-test-coverage --test-coverage-include='src/**/*.js'",
+     "format": "biome format --write .",
+     "format:check": "biome format .",
+     "lint": "biome lint .",
+     "pack:dry-run": "npm pack --dry-run",
+     "test": "node --test --test-concurrency=1"
+   },
+   "keywords": [],
+   "author": "",
+   "license": "ISC",
+   "dependencies": {
+     "@modelcontextprotocol/sdk": "^1.26.0",
+     "selenium-webdriver": "^4.18.1"
+   },
+   "devDependencies": {
+     "@biomejs/biome": "2.5.1"
+   }
+ }
package/smithery.yaml ADDED
@@ -0,0 +1,13 @@
+ # Smithery configuration file: https://smithery.ai/docs/config#smitheryyaml
+
+ startCommand:
+   type: stdio
+   configSchema:
+     # JSON Schema defining the configuration options for the MCP.
+     type: object
+     required: []
+     properties: {}
+   commandFunction:
+     # A function that produces the CLI command to start the MCP on stdio.
+     |-
+     (config) => ({command:'node', args:['src/lib/server.js'], env:{}})
@@ -0,0 +1,69 @@
+ // Browser-side script — walks DOM to build accessibility tree.
+ // Uses `var` intentionally: this is executed via WebDriver's executeScript in arbitrary
+ // browser contexts, so we avoid `const`/`let` for maximum compatibility.
+ var ROLE_MAP = {
+   A: 'link', BUTTON: 'button', INPUT: 'textbox', SELECT: 'combobox',
+   OPTION: 'option', TEXTAREA: 'textbox', IMG: 'img', TABLE: 'table',
+   THEAD: 'rowgroup', TBODY: 'rowgroup', TR: 'row', TH: 'columnheader',
+   TD: 'cell', UL: 'list', OL: 'list', LI: 'listitem', NAV: 'navigation',
+   MAIN: 'main', HEADER: 'banner', FOOTER: 'contentinfo', ASIDE: 'complementary',
+   FORM: 'form', SECTION: 'region', H1: 'heading', H2: 'heading',
+   H3: 'heading', H4: 'heading', H5: 'heading', H6: 'heading',
+   DIALOG: 'dialog', DETAILS: 'group', SUMMARY: 'button',
+   FIELDSET: 'group', LEGEND: 'legend', LABEL: 'label',
+   PROGRESS: 'progressbar', METER: 'meter'
+ };
+ var INPUT_ROLES = {
+   checkbox: 'checkbox', radio: 'radio', button: 'button',
+   submit: 'button', reset: 'button', range: 'slider',
+   search: 'searchbox', email: 'textbox', url: 'textbox',
+   tel: 'textbox', number: 'spinbutton'
+ };
+ var SKIP = { SCRIPT:1, STYLE:1, NOSCRIPT:1, TEMPLATE:1, SVG:1 };
+
+ function walk(el) {
+   if (!el) return null;
+   if (el.nodeType === 3) {
+     var t = el.textContent.trim();
+     return t ? { role: 'text', name: t.substring(0, 200) } : null;
+   }
+   if (el.nodeType !== 1 || SKIP[el.tagName]) return null;
+   // Note: we check the HTML hidden attribute and aria-hidden, but intentionally
+   // skip getComputedStyle checks for display:none / visibility:hidden — calling
+   // getComputedStyle on every node forces style recalculation and is too expensive
+   // for large DOMs. If you need CSS-hidden filtering, add it here at the cost of
+   // performance: var cs = window.getComputedStyle(el); if (cs.display === 'none' || cs.visibility === 'hidden') return null;
+   if (el.hidden || el.getAttribute('aria-hidden') === 'true') return null;
+
+   var tag = el.tagName;
+   var role = el.getAttribute('role') || (tag === 'INPUT' ? INPUT_ROLES[el.type] : null) || ROLE_MAP[tag] || null;
+   var name = el.getAttribute('aria-label') || el.getAttribute('alt') || el.getAttribute('title')
+     || el.getAttribute('placeholder') || el.getAttribute('name') || null;
+   var node = {};
+   if (role) node.role = role;
+   if (name) node.name = name;
+   if (el.id) node.id = el.id;
+   if (/^H[1-6]$/.test(tag)) node.level = parseInt(tag[1], 10);
+   if (el.href) node.href = el.href;
+   if (el.disabled) node.disabled = true;
+   if (el.checked) node.checked = true;
+   if (el.required) node.required = true;
+   if (el.value && (tag === 'INPUT' || tag === 'TEXTAREA' || tag === 'SELECT')) node.value = el.value.substring(0, 200);
+
+   var kids = [];
+   for (var i = 0; i < el.childNodes.length; i++) {
+     var c = walk(el.childNodes[i]);
+     if (c) kids.push(c);
+   }
+
+   // Collapse: text-only node with no role gets merged up
+   if (!role && !name && !el.id && kids.length === 1 && kids[0].role === 'text') return kids[0];
+   // Skip empty containers with no semantic meaning
+   if (!role && !name && !el.id && kids.length === 0) return null;
+
+   if (kids.length > 0) node.children = kids;
+   // If the node has nothing useful, skip it
+   if (!node.role && !node.name && !node.id && !node.children) return null;
+   return node;
+ }
+ return walk(document.body);
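The tree returned by `walk` nests nodes under `children`. A sketch of how a consumer on the Node.js side might flatten it to list actionable elements; `collectByRole` and the sample tree are hypothetical, with only the node shape (`role`, `name`, `id`, `children`) taken from the script above:

```javascript
// Hypothetical consumer-side helper; it runs in Node.js, not in the browser.
function collectByRole(node, roles, found = []) {
  if (!node) return found;
  if (node.role && roles.includes(node.role)) {
    found.push({ role: node.role, name: node.name ?? null, id: node.id ?? null });
  }
  // Depth-first traversal preserves document order in the result.
  for (const child of node.children ?? []) {
    collectByRole(child, roles, found);
  }
  return found;
}

// Illustrative snapshot shaped like the output of `walk` for a small login form.
const sample = {
  role: 'form',
  id: 'login',
  children: [
    { role: 'textbox', name: 'email', id: 'email', required: true },
    { role: 'button', children: [{ role: 'text', name: 'Sign in' }] },
  ],
};

console.log(collectByRole(sample, ['textbox', 'button']));
```

Filtering by role keeps the result small, in line with the compact snapshot the README describes for the `accessibility://current` resource.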