@wdio/mcp 1.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Vince Graics

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,434 @@
# WebDriverIO MCP Server

A Model Context Protocol (MCP) server that lets Claude Desktop interact with web browsers and mobile applications
using WebDriverIO. It automates Chrome, iOS apps, and Android apps through a single, unified interface.

## Installation

### Setup

**Option 1: Configure Claude Desktop or Claude Code (Recommended)**

Add the following entry to your MCP client's server configuration (for Claude Desktop, this is `claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "wdio-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@wdio/mcp"
      ]
    }
  }
}
```

**Option 2: Global Installation**

```bash
npm i -g @wdio/mcp
```

Then point the MCP configuration at the installed binary:

```json
{
  "mcpServers": {
    "wdio-mcp": {
      "command": "wdio-mcp"
    }
  }
}
```

> **Note:** The npm package is `@wdio/mcp`, but the executable binary is `wdio-mcp`.

**Restart Claude Desktop**

⚠️ You may need to fully restart Claude Desktop for the configuration change to take effect. On Windows, use Task
Manager to make sure it has completely exited before restarting.

📖 **Need help?** Read the [official MCP configuration guide](https://modelcontextprotocol.io/quickstart/user)

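If your MCP settings file already defines other servers, add the `wdio-mcp` entry alongside them instead of replacing the file. A minimal sketch of that merge in plain JavaScript, assuming the settings have already been parsed from JSON; `addWdioMcpServer` is a hypothetical helper, not part of `@wdio/mcp`.

```javascript
// Hypothetical helper: merge the wdio-mcp entry into an existing MCP settings object
// without dropping any other configured servers.
function addWdioMcpServer(settings, { global = false } = {}) {
  // Option 2 uses the globally installed binary; Option 1 runs the package via npx
  const entry = global
    ? { command: 'wdio-mcp' }
    : { command: 'npx', args: ['-y', '@wdio/mcp'] };
  return {
    ...settings,
    mcpServers: { ...(settings.mcpServers ?? {}), 'wdio-mcp': entry },
  };
}

// Usage: an existing configuration keeps its other server
const existing = { mcpServers: { other: { command: 'other-server' } } };
console.log(JSON.stringify(addWdioMcpServer(existing), null, 2));
```

Writing the result back with `JSON.stringify(settings, null, 2)` keeps the settings file readable.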
### Prerequisites for Mobile App Automation

- **Appium Server**: Install globally with `npm install -g appium`
- **Platform Drivers**:
  - iOS: `appium driver install xcuitest` (requires Xcode on macOS)
  - Android: `appium driver install uiautomator2` (requires the Android SDK, e.g. via Android Studio)
- **Devices/Emulators**:
  - iOS Simulator (macOS) or physical device
  - Android Emulator or physical device
- **For iOS Real Devices**: You'll need the device's UDID (Unique Device Identifier)
  - **Find UDID on macOS**: Connect device → Open Finder → Select device → Click the details line under the device
    name until the UDID appears
  - **Find UDID on Windows**: Connect device → iTunes or Apple Devices app → Click device icon → Click "Serial Number"
    to reveal UDID
  - **Xcode method**: Window → Devices and Simulators → Select device → UDID shown as "Identifier"

Start the Appium server before using mobile features:

```bash
appium
# Server runs at http://127.0.0.1:4723 by default
```

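Before starting a mobile session, it can help to confirm that Appium is actually listening. The sketch below queries Appium's `GET /status` endpoint at the default address shown above; it assumes Node.js 18 or later (for the global `fetch`), and the function names are illustrative.

```javascript
// Sketch: check that an Appium server is reachable before starting a mobile session.
// Assumes Node.js 18+ (global fetch) and the default address from this README.
function parseAppiumStatus(body) {
  // Appium answers GET /status with { value: { ready, build: { version } } }
  const value = body && body.value ? body.value : {};
  return {
    ready: value.ready === true,
    version: value.build && value.build.version ? value.build.version : 'unknown',
  };
}

async function checkAppium(baseUrl = 'http://127.0.0.1:4723') {
  try {
    const res = await fetch(new URL('/status', baseUrl));
    if (!res.ok) return { ready: false, version: 'unknown' };
    return parseAppiumStatus(await res.json());
  } catch {
    // Connection refused or similar: the server is not running
    return { ready: false, version: 'unknown' };
  }
}
```

`checkAppium().then(console.log)` resolves to `{ ready: false, version: 'unknown' }` when nothing is listening on port 4723.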
## Features

### Browser Automation

- **Session Management**: Start and close Chrome browser sessions in headless or headed mode
- **Navigation & Interaction**: Navigate to URLs, click elements, fill forms, and retrieve content
- **Page Analysis**: Get visible elements and accessibility trees, and take screenshots
- **Cookie Management**: Get, set, and delete cookies
- **Scrolling**: Smooth scrolling with configurable distances

### Mobile App Automation (iOS/Android)

- **Native App Testing**: Test iOS (.app/.ipa) and Android (.apk) apps via Appium
- **Touch Gestures**: Tap, swipe, long-press, drag-and-drop
- **App Lifecycle**: Launch, background, terminate, check app state
- **Context Switching**: Switch between native and webview contexts in hybrid apps
- **Device Control**: Rotate, lock/unlock, geolocation, keyboard control, notifications
- **Cross-Platform Selectors**: Accessibility IDs, XPath, UiAutomator (Android), Predicates (iOS)

## Available Tools

### Session Management

| Tool | Description |
|---------------------|--------------------------------------------------------------------------------------------|
| `start_browser` | Start a Chrome browser session (headless/headed, custom dimensions) |
| `start_app_session` | Start an iOS or Android app session via Appium (supports state preservation via `noReset`) |
| `close_session` | Close or detach from the current browser or app session (supports detach mode) |

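You normally don't call these tools yourself: the MCP client (Claude Desktop or Claude Code) sends them as JSON-RPC 2.0 `tools/call` requests, as defined by the MCP specification. The sketch below builds such requests for a short browser session. Argument names come from the examples in this README where available; the `url` argument of `navigate` is an assumption.

```javascript
// Sketch of the JSON-RPC 2.0 messages an MCP client sends to invoke these tools.
// The `url` argument name for `navigate` is assumed, not documented here.
let nextId = 1;

function toolCall(name, args = {}) {
  return { jsonrpc: '2.0', id: nextId++, method: 'tools/call', params: { name, arguments: args } };
}

// A minimal browser session: start headless, open a page, then close it
const session = [
  toolCall('start_browser', { headless: true }),
  toolCall('navigate', { url: 'https://example.com' }),
  toolCall('close_session'),
];

for (const msg of session) console.log(JSON.stringify(msg));
```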
### Navigation & Page Interaction (Web & Mobile)

| Tool | Description |
|------------------------|--------------------------------------------------------------------------------------------|
| `navigate` | Navigate to a URL |
| `get_visible_elements` | Get visible, interactable elements on the page. Supports `inViewportOnly` (default: true) to return only elements inside the viewport, and `includeContainers` (default: false) to include layout containers on mobile |
| `get_accessibility` | Get the accessibility tree with semantic element information |
| `scroll_down` | Scroll down by the specified number of pixels |
| `scroll_up` | Scroll up by the specified number of pixels |
| `take_screenshot` | Capture a screenshot |

### Element Interaction (Web & Mobile)

| Tool | Description |
|--------------------|-----------------------------------------------------------------|
| `find_element` | Find an element using CSS selectors, XPath, or mobile selectors |
| `click_element` | Click an element |
| `click_via_text` | Click an element by its text content |
| `set_value` | Type text into input fields |
| `get_element_text` | Get the text content of an element |
| `is_displayed` | Check whether an element is displayed |

### Cookie Management (Web)

| Tool | Description |
|------------------|--------------------------------------------------------|
| `get_cookies` | Get all cookies or a specific cookie by name |
| `set_cookie` | Set a cookie with name, value, and optional attributes |
| `delete_cookies` | Delete all cookies or a specific cookie |

### Mobile Gestures (iOS/Android)

| Tool | Description |
|-----------------|-------------------------------------------|
| `tap_element` | Tap an element by selector or coordinates |
| `swipe` | Swipe in a direction (up/down/left/right) |
| `long_press` | Long-press an element or coordinates |
| `drag_and_drop` | Drag from one location to another |

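This README does not specify how `swipe` turns a direction into screen coordinates. One plausible approach, sketched below as a hypothetical helper: treat the direction as the direction the finger moves, and span the middle 60% of the screen, using a screen size such as the one `get_device_info` reports.

```javascript
// Hypothetical sketch: turn a swipe direction into start/end points on screen.
// Assumes "up" means the finger moves up (content scrolls down) and that the
// gesture spans the middle `span` fraction of the screen; the real tool may differ.
function swipeCoordinates(direction, { width, height }, span = 0.6) {
  const cx = Math.round(width / 2);
  const cy = Math.round(height / 2);
  const dx = Math.round((width * span) / 2);
  const dy = Math.round((height * span) / 2);
  switch (direction) {
    case 'up':    return { start: { x: cx, y: cy + dy }, end: { x: cx, y: cy - dy } };
    case 'down':  return { start: { x: cx, y: cy - dy }, end: { x: cx, y: cy + dy } };
    case 'left':  return { start: { x: cx + dx, y: cy }, end: { x: cx - dx, y: cy } };
    case 'right': return { start: { x: cx - dx, y: cy }, end: { x: cx + dx, y: cy } };
    default: throw new Error(`Unknown direction: ${direction}`);
  }
}

console.log(swipeCoordinates('up', { width: 1080, height: 2400 }));
```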
### App Lifecycle (iOS/Android)

| Tool | Description |
|-----------------|--------------------------------------------------------------|
| `get_app_state` | Check app state (installed, running, background, foreground) |
| `activate_app` | Bring the app to the foreground |
| `terminate_app` | Terminate a running app |

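Appium's `queryAppState` reports app state as an integer from 0 to 4. This README does not say whether `get_app_state` returns that raw code or a text label, so the mapping below is a sketch for interpreting raw codes, with a hypothetical `shouldActivate` check for deciding when `activate_app` is worth calling.

```javascript
// Appium reports app state as an integer 0-4 (queryAppState).
// Whether get_app_state exposes the raw code is an assumption.
const APP_STATES = {
  0: 'not installed',
  1: 'not running',
  2: 'running in background (suspended)',
  3: 'running in background',
  4: 'running in foreground',
};

function describeAppState(code) {
  return APP_STATES[code] ?? `unknown state (${code})`;
}

// Hypothetical check: the app is installed but not in the foreground
function shouldActivate(code) {
  return code >= 1 && code <= 3;
}

console.log(describeAppState(3), shouldActivate(3));
```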
### Context Switching (Hybrid Apps)

| Tool | Description |
|-----------------------|-------------------------------------------------|
| `get_contexts` | List available contexts (NATIVE_APP, WEBVIEW_*) |
| `get_current_context` | Show the currently active context |
| `switch_context` | Switch between native and webview contexts |

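Webview context names differ by platform and app (for example `WEBVIEW_com.example.app` on Android, or typically a numeric suffix on iOS), so it is safer to pick one from the `get_contexts` result than to hard-code it. A hypothetical helper, assuming the tool returns the context names as an array of strings:

```javascript
// Hypothetical helper: pick a webview context from a get_contexts result.
// The optional hint (e.g. part of the app package name) narrows the choice.
function pickWebviewContext(contexts, hint = '') {
  const webviews = contexts.filter((c) => c.startsWith('WEBVIEW'));
  if (webviews.length === 0) return null; // no (debuggable) webview available yet
  return webviews.find((c) => c.includes(hint)) ?? webviews[0];
}

const contexts = ['NATIVE_APP', 'WEBVIEW_chrome', 'WEBVIEW_com.example.app'];
console.log(pickWebviewContext(contexts, 'com.example')); // WEBVIEW_com.example.app
```

If no webview is listed right after opening a web screen, the page may still be loading; calling `get_contexts` again a moment later is usually enough.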
### Device Control (iOS/Android)

| Tool | Description |
|---------------------------------------|-----------------------------------------------------------|
| `get_device_info` | Get device platform, version, and screen size |
| `rotate_device` | Rotate to portrait or landscape orientation |
| `get_orientation` | Get the current device orientation |
| `lock_device` / `unlock_device` | Lock or unlock the device screen |
| `is_device_locked` | Check whether the device is locked |
| `shake_device` | Shake the device (iOS only) |
| `send_keys` | Send keyboard input (Android only) |
| `press_key_code` | Press an Android key code (BACK=4, HOME=3, etc.) |
| `hide_keyboard` / `is_keyboard_shown` | Hide the on-screen keyboard or check whether it is shown |
| `open_notifications` | Open the notifications panel (Android only) |
| `get_geolocation` / `set_geolocation` | Get or set the device GPS location |

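`press_key_code` takes Android `KeyEvent` codes. The object below lists a few common values from `android.view.KeyEvent`; the helper that wraps them in a `tools/call` request is hypothetical, and its `keyCode` argument name is an assumption since this README does not document it.

```javascript
// A few Android KeyEvent codes (values from android.view.KeyEvent)
const KEYCODES = {
  HOME: 3,
  BACK: 4,
  VOLUME_UP: 24,
  VOLUME_DOWN: 25,
  POWER: 26,
  ENTER: 66,
  DEL: 67,
  MENU: 82,
  APP_SWITCH: 187,
};

// Hypothetical: build the MCP tools/call request (the `keyCode` argument name is assumed)
function pressKey(name, id = 1) {
  const keyCode = KEYCODES[name];
  if (keyCode === undefined) throw new Error(`Unknown key: ${name}`);
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name: 'press_key_code', arguments: { keyCode } } };
}

console.log(JSON.stringify(pressKey('BACK')));
```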
## Usage Examples

### Real-World Test Cases

**Example 1: Testing Demo Android App (Book Scanning)**

```
Test the Demo Android app at C:\Users\demo-liveApiGbRegionNonMinifiedRelease-3018788.apk on emulator-5554:
1. Start the app with auto-grant permissions
2. Get visible elements on the onboarding screen
3. Tap "Skip" to bypass onboarding
4. Verify the main screen loads
5. Take a screenshot
```

**Example 2: Testing World of Books E-commerce Site**

```
You are a testing expert who wants to assess the basic workflows of worldofbooks.com:
- Open World of Books (accept all cookies)
- Get visible elements to see the navigation structure
- Search for a fiction book
- Choose one and check whether both new and used copies are offered
- Report your findings at the end
```

### Browser Automation

**Basic web testing prompt:**

```
You are a testing expert who wants to assess the basic workflows of a web application:
- Open World of Books (accept all cookies)
- Search for a fiction book
- Choose one and check whether both new and used copies are offered
- Report your findings at the end
```

**Browser configuration options:**

```javascript
// Tool invocations written in call notation; the MCP client sends the arguments as JSON

// Default settings (headed mode, 1280x1080)
start_browser()

// Headless mode
start_browser({headless: true})

// Custom dimensions
start_browser({windowWidth: 1920, windowHeight: 1080})

// Headless with custom dimensions
start_browser({headless: true, windowWidth: 1920, windowHeight: 1080})
```

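As a sketch of what these options could translate to, the helper below applies the defaults stated above (headed, 1280x1080) and produces Chrome switches inside `goog:chromeOptions`. `--window-size` and `--headless=new` are standard Chrome switches, but whether `@wdio/mcp` uses exactly this mapping is an assumption, and both function names are hypothetical.

```javascript
// Sketch: resolve start_browser options against the README defaults
// and translate them into Chrome command-line switches (mapping assumed).
function resolveBrowserOptions({ headless = false, windowWidth = 1280, windowHeight = 1080 } = {}) {
  const valid = [windowWidth, windowHeight].every((n) => Number.isInteger(n) && n > 0);
  if (!valid) throw new Error('windowWidth and windowHeight must be positive integers');
  const args = [`--window-size=${windowWidth},${windowHeight}`];
  if (headless) args.push('--headless=new'); // Chrome's current headless mode
  return { headless, windowWidth, windowHeight, args };
}

// WebDriver capabilities for such a Chrome session (illustrative)
function chromeCapabilities(options) {
  const { args } = resolveBrowserOptions(options);
  return { browserName: 'chrome', 'goog:chromeOptions': { args } };
}

console.log(JSON.stringify(chromeCapabilities({ headless: true, windowWidth: 1920, windowHeight: 1080 })));
```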
### Mobile App Automation

**Testing an iOS app on a simulator:**

```
Test my iOS app located at /path/to/MyApp.app on the iPhone 15 Pro simulator:
1. Start the app session
2. Tap the login button
3. Enter "testuser" in the username field
4. Take a screenshot of the home screen
5. Close the session
```

**Preserving app state between sessions:**

```
Test my Android app without resetting data:
1. Start the app session with noReset: true and fullReset: false
2. The app launches with its existing login state and user data preserved
3. Run the test scenarios
4. Close the session (the app remains installed with its data intact)
```

**Testing an iOS app on a real device:**

```
Test my iOS app on my physical iPhone:
1. Start the app session with:
   - platform: iOS
   - appPath: /path/to/MyApp.ipa
   - deviceName: My iPhone
   - udid: 00008030-001234567890ABCD (your device's UDID)
   - platformVersion: 17.0
2. Run your test scenario
3. Close the session
```

**Testing an Android app:**

```
Test my Android app /path/to/app.apk on the Pixel_6_API_34 emulator:
1. Start the app with auto-grant permissions
2. Get visible elements (use inViewportOnly: false to see all elements)
3. Swipe up to scroll
4. Tap the "Settings" button using text matching
5. Verify the settings screen is displayed
```

**Advanced element detection:**

```
Test my app and debug layout issues:
1. Start the app session
2. Get visible elements with includeContainers: true to see the layout hierarchy
3. Analyze the ViewGroup, FrameLayout, and ScrollView containers
4. Use inViewportOnly: false to find off-screen elements that need scrolling
```

**Hybrid app testing (switching contexts):**

```
Test my hybrid app:
1. Start the Android app session
2. Tap the "Open Web" button in the native context
3. List the available contexts
4. Switch to the WEBVIEW context
5. Click the login button using a CSS selector
6. Switch back to the NATIVE_APP context
7. Verify we're back on the home screen
```

## Important Notes

⚠️ **Session Management:**

- Only one session (browser OR app) can be active at a time
- Always close sessions when done to free system resources
- To switch between browser and mobile, close the current session first
- Use `close_session({ detach: true })` to disconnect without terminating the session on the Appium server
- **State preservation** is controlled with the `noReset` and `fullReset` parameters during session creation
- Sessions created with `noReset: true` or without `appPath` automatically detach on close

⚠️ **Task Planning:**

- Break complex automation into smaller, focused operations
- Extensive automation can use up Claude's message limits quickly

⚠️ **Mobile Automation:**

- The Appium server must be running before you start mobile sessions
- Ensure emulators/simulators are running and devices are connected
- iOS automation requires macOS with Xcode installed
- **iOS Real Devices**: Testing on physical iOS devices requires the device's UDID (40 hexadecimal characters on
  older devices; 25 characters, as in `00008030-001234567890ABCD`, on newer ones). See the Prerequisites section for
  how to find your UDID

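A quick format check can catch a mistyped UDID before a session fails. Physical iOS devices use one of two UDID formats: 40 hexadecimal characters on older models, or 25 characters (8 hex digits, a hyphen, 16 hex digits) on newer models, like the `00008030-...` example used earlier. The sketch below accepts those two forms and rejects anything else, including simulator UUIDs; `udidFormat` is a hypothetical helper.

```javascript
// Sketch: sanity-check a physical-device UDID before passing it to start_app_session.
const LEGACY_UDID = /^[0-9a-f]{40}$/i;         // older devices: 40 hex characters
const MODERN_UDID = /^[0-9a-f]{8}-[0-9a-f]{16}$/i; // newer devices: 8 hex, hyphen, 16 hex

function udidFormat(udid) {
  const value = udid.trim();
  if (LEGACY_UDID.test(value)) return 'legacy';
  if (MODERN_UDID.test(value)) return 'modern';
  return null; // not a recognizable physical-device UDID (simulator UUIDs land here)
}

console.log(udidFormat('00008030-001234567890ABCD')); // modern
```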
## Selector Syntax Quick Reference

**Web (CSS/XPath):**

- CSS: `button.my-class`, `#element-id`
- XPath: `//button[@class='my-class']`
- Text: `button=Exact text`, `a*=Contains text`

**Mobile (Cross-Platform):**

- Accessibility ID: `~loginButton` (works on both iOS and Android)
- Android UiAutomator: `android=new UiSelector().text("Login")`
- iOS Predicate: `-ios predicate string:label == "Login" AND visible == 1`
- XPath (Android example): `//android.widget.Button[@text="Login"]`

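To make the prefixes above concrete, here is a simplified classifier that maps each form to the WebDriver/Appium locator strategy it corresponds to. It is an illustration only, not WebdriverIO's actual selector parser, and it covers just the forms listed in this reference; `selectorStrategy` is a hypothetical name.

```javascript
// Simplified sketch: map the selector forms above to locator strategies.
// Not WebdriverIO's real parser; only the forms in this reference are handled.
function selectorStrategy(selector) {
  if (selector.startsWith('~')) return 'accessibility id';
  if (selector.startsWith('android=')) return '-android uiautomator';
  if (selector.startsWith('-ios predicate string:')) return '-ios predicate string';
  if (selector.startsWith('/') || selector.startsWith('(')) return 'xpath';
  // "tag=text" and "tag*=text" text selectors, resolved by WebdriverIO itself
  if (/^[a-z][a-z0-9-]*\*?=/i.test(selector)) return 'text';
  return 'css selector';
}

const examples = [
  'button.my-class', '#element-id', "//button[@class='my-class']", 'a*=Contains text',
  '~loginButton', 'android=new UiSelector().text("Login")', '-ios predicate string:label == "Login"',
];
for (const s of examples) console.log(`${s} -> ${selectorStrategy(s)}`);
```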
## Advanced Features

### App State Preservation

**State Preservation with noReset/fullReset:**
Control app state when creating new sessions using the `noReset` and `fullReset` parameters:

| noReset | fullReset | Behavior |
|---------|-----------|-------------------------------------------------------|
| `true` | `false` | Preserve state: App stays installed, data preserved |
| `false` | `false` | Clear app data but keep app installed (default) |
| `false` | `true` | Full reset: Uninstall and reinstall app (clean slate) |

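The table can be read as a small decision function. A sketch, assuming the defaults shown in the table; the combination `noReset: true` with `fullReset: true` is not covered by the table, so this hypothetical helper rejects it rather than guess.

```javascript
// Sketch: map noReset/fullReset to the behaviors in the table (defaults: both false).
function resetBehavior({ noReset = false, fullReset = false } = {}) {
  if (noReset && fullReset) {
    // Not defined by the table, so refuse instead of guessing
    throw new Error('noReset and fullReset cannot both be true');
  }
  if (noReset) return 'preserve: app stays installed, data preserved';
  if (fullReset) return 'full reset: uninstall and reinstall app';
  return 'clear app data, keep app installed';
}

console.log(resetBehavior({ noReset: true })); // preserve: app stays installed, data preserved
```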
**Example with state preservation:**

```javascript
// Tool invocation written in call notation: preserve login state between test runs
start_app_session({
  platform: 'Android',
  appPath: '/path/to/app.apk',
  deviceName: 'emulator-5554',
  noReset: true,     // Don't reset app state
  fullReset: false,  // Don't uninstall
  autoGrantPermissions: true
})
// The app launches with existing user data, login tokens, and preferences intact
```

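Under the hood, parameters like these become Appium capabilities. The sketch below shows one plausible mapping using standard Appium capability names (`appium:app`, `appium:udid`, `appium:noReset`, and so on); the function itself is hypothetical, and the real server may set additional or different capabilities.

```javascript
// Hypothetical sketch: translate start_app_session parameters into Appium capabilities.
function appiumCapabilities({ platform, appPath, deviceName, udid, platformVersion,
  noReset = false, fullReset = false }) {
  const isAndroid = platform.toLowerCase() === 'android';
  const caps = {
    platformName: isAndroid ? 'Android' : 'iOS',
    'appium:automationName': isAndroid ? 'UiAutomator2' : 'XCUITest',
    'appium:deviceName': deviceName,
    'appium:noReset': noReset,
    'appium:fullReset': fullReset,
  };
  // Optional values are only sent when provided
  if (appPath) caps['appium:app'] = appPath;
  if (udid) caps['appium:udid'] = udid;
  if (platformVersion) caps['appium:platformVersion'] = platformVersion;
  return caps;
}

console.log(appiumCapabilities({
  platform: 'Android', appPath: '/path/to/app.apk', deviceName: 'emulator-5554', noReset: true,
}));
```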
**Detach from Sessions:**
The `close_session` tool supports a `detach` parameter that disconnects from the session without terminating it on the
Appium server:

```javascript
// Detach without ending the session on the Appium server
close_session({detach: true})

// Standard session termination (closes the app and removes the session)
close_session({detach: false}) // or just close_session()
```

Sessions created with `noReset: true` or without `appPath` automatically detach on close.

Detaching is particularly useful when:

- Preserving app state so you can continue testing manually
- Debugging multi-step workflows (the session stays alive between tool invocations)
- Testing scenarios where the app should remain installed and in its current state

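The close behavior described above can be summarized as a small rule for app sessions. This sketch assumes the triggers are independent, so passing `detach: false` does not override an automatic detach; this README does not say which one wins, and `closeAction` is a hypothetical name.

```javascript
// Sketch of close_session behavior for app sessions: an explicit detach,
// noReset: true, or a session created without appPath all lead to a detach.
// Assumption: detach: false cannot override the automatic detach.
function closeAction(session, { detach = false } = {}) {
  const autoDetach = session.noReset === true || !session.appPath;
  return detach || autoDetach ? 'detach' : 'delete session';
}

console.log(closeAction({ appPath: '/path/to/app.apk', noReset: false })); // delete session
console.log(closeAction({ appPath: '/path/to/app.apk', noReset: true }));  // detach
```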
### Smart Element Detection

- **Platform-specific element classification**: Automatically distinguishes interactable elements from layout containers
  - Android: Button, EditText, CheckBox vs. ViewGroup, FrameLayout, ScrollView
  - iOS: Button, TextField, Switch vs. View, StackView, CollectionView
- **Multiple locator strategies**: Each element provides an accessibility ID, resource ID, text, XPath, and
  platform-specific selectors
- **Viewport filtering**: Return only the elements inside the viewport, or all elements including off-screen ones
- **Layout debugging**: Optionally include container elements to understand the UI hierarchy

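A sketch of how classification and viewport filtering could work together, using the class names listed above and the defaults documented for `inViewportOnly` (true) and `includeContainers` (false). Android's UiAutomator2 page source reports bounds as `[x1,y1][x2,y2]`; treating an element as "in the viewport" when it overlaps the screen is an assumption, and all function names are hypothetical.

```javascript
// Sketch: classify elements by class name and filter them by viewport overlap.
const INTERACTABLE = new Set(['Button', 'EditText', 'CheckBox', 'TextField', 'Switch']);
const CONTAINERS = new Set(['ViewGroup', 'FrameLayout', 'ScrollView', 'View', 'StackView', 'CollectionView']);

function classify(className) {
  // 'android.widget.Button' -> 'Button', 'XCUIElementTypeSwitch' -> 'Switch'
  const short = className.split('.').pop().replace(/^XCUIElementType/, '');
  if (INTERACTABLE.has(short)) return 'interactable';
  if (CONTAINERS.has(short)) return 'container';
  return 'other';
}

function parseBounds(bounds) {
  const m = /^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/.exec(bounds);
  if (!m) return null;
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  return { x1, y1, x2, y2 };
}

// Assumption: "in viewport" means the element overlaps the screen at all
function inViewport(bounds, screen) {
  const b = parseBounds(bounds);
  return b !== null && b.x1 < screen.width && b.x2 > 0 && b.y1 < screen.height && b.y2 > 0;
}

function filterElements(elements, screen, { inViewportOnly = true, includeContainers = false } = {}) {
  return elements.filter((el) => {
    const kind = classify(el.className);
    const wanted = kind === 'interactable' || (includeContainers && kind === 'container');
    return wanted && (!inViewportOnly || inViewport(el.bounds, screen));
  });
}

const screen = { width: 1080, height: 2400 };
console.log(classify('android.widget.Button'), inViewport('[0,2210][1080,2400]', screen)); // interactable true
console.log(classify('android.widget.FrameLayout'), inViewport('[0,2500][1080,2700]', screen)); // container false
```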
### Automatic Permission & Alert Handling

Both iOS and Android sessions support automatic handling of system permissions and alerts:

- `autoGrantPermissions` (default: true): Automatically grants app permissions (camera, location, etc.)
- `autoAcceptAlerts` (default: true): Automatically accepts system alerts and dialogs
- `autoDismissAlerts` (optional): Set to true to dismiss alerts instead of accepting them

This removes the need to handle permission pop-ups manually during automated tests.

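In Appium terms, `autoGrantPermissions` is a capability of the UiAutomator2 (Android) driver, while `autoAcceptAlerts` and `autoDismissAlerts` are documented for the XCUITest (iOS) driver. The sketch below resolves the three options with the defaults listed above and lets dismissal win when it is requested; how `@wdio/mcp` actually forwards these options is an assumption, and `alertCapabilities` is a hypothetical name.

```javascript
// Sketch: resolve permission/alert options with the README defaults.
function alertCapabilities({ autoGrantPermissions = true, autoAcceptAlerts = true, autoDismissAlerts = false } = {}) {
  // Dismissing takes precedence, so the two alert options never both end up true
  const dismiss = autoDismissAlerts === true;
  return {
    'appium:autoGrantPermissions': autoGrantPermissions,
    'appium:autoAcceptAlerts': dismiss ? false : autoAcceptAlerts,
    'appium:autoDismissAlerts': dismiss,
  };
}

console.log(alertCapabilities({ autoDismissAlerts: true }));
```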
## Technical Details

- **Built with:** TypeScript, WebDriverIO, Appium
- **Browser Support:** Chrome (headed/headless, automated driver management)
- **Mobile Support:** iOS (XCUITest) and Android (UiAutomator2/Espresso)
- **Protocol:** Model Context Protocol (MCP) for Claude Desktop integration
- **Session Model:** Single active session (browser or mobile app)
- **Data Format:** TOON (Token-Oriented Object Notation) for compact, token-efficient responses to the LLM
- **Element Detection:** Parsing of the XML page source, filtering of interactable elements, and generation of
  multiple locators per element

## Troubleshooting

**Browser automation not working?**

- Ensure Chrome is installed
- Try restarting Claude Desktop completely
- Check that no other WebDriver instances are running

**Mobile automation not working?**

- Verify that the Appium server is running (start it with `appium`)
- Check that the device or emulator is running: `adb devices` (Android) or Xcode's Devices and Simulators window (iOS)
- Ensure the correct platform drivers are installed
- Verify that the app path is correct and accessible

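`adb devices` lists each connected device with a state, and only devices in the `device` state are usable; `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet. A small parser sketch for that output; the sample text and the `parseAdbDevices` name are illustrative.

```javascript
// Sketch: interpret `adb devices` output to see which devices are ready for Appium.
function parseAdbDevices(output) {
  return output
    .split('\n')
    .map((line) => line.trim()) // also strips Windows '\r'
    .filter((line) => line && !line.startsWith('List of devices') && !line.startsWith('*'))
    .map((line) => {
      const [serial, state] = line.split(/\s+/);
      return { serial, state, ready: state === 'device' };
    });
}

// Illustrative sample output: one ready emulator, one device awaiting authorization
const sample = 'List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n';
console.log(parseAdbDevices(sample));
```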
**Found issues or have suggestions?** Please share your feedback!
@@ -0,0 +1 @@
#!/usr/bin/env node