@mobilenext/mobile-mcp 0.0.12 β†’ 0.0.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,6 +1,6 @@
- ## Mobile Next - MCP server for Mobile Automation
+ # Mobile Next - MCP server for Mobile Development and Automation | iOS, Android, Simulator, Emulator, and physical devices

- This is a [Model Context Protocol (MCP) server](https://github.com/modelcontextprotocol) that enables scalable mobile automation through a platform-agnostic interface, eliminating the need for distinct iOS or Android knowledge.
+ This is a [Model Context Protocol (MCP) server](https://github.com/modelcontextprotocol) that enables scalable mobile automation and development through a platform-agnostic interface, eliminating the need for distinct iOS or Android knowledge. You can run it on emulators, simulators, and physical devices (iOS and Android).
  This server allows Agents and LLMs to interact with native iOS/Android applications and devices through structured accessibility snapshots or coordinate-based taps based on screenshots.

  https://github.com/user-attachments/assets/c4e89c4f-cc71-4424-8184-bdbc8c638fa1
@@ -27,7 +27,8 @@ https://github.com/user-attachments/assets/c4e89c4f-cc71-4424-8184-bdbc8c638fa1
  Join us on our journey as we continuously enhance Mobile MCP!
  Check out our detailed roadmap to see upcoming features, improvements, and milestones. Your feedback is invaluable in shaping the future of mobile automation.

- 👉 [Explore the Roadmap](https://github.com/orgs/mobile-next/projects/1)
+ 👉 [Explore the Roadmap](https://github.com/orgs/mobile-next/projects/3)
+

  ### Main use cases

@@ -47,7 +48,7 @@ How we help to scale mobile automation:
  - 📊 **Deterministic tool application**: Reduces ambiguity found in purely screenshot-based approaches by relying on structured data whenever possible.
  - 📺 **Extract structured data**: Enables you to extract structured data from anything visible on screen.

- ## Mobile MCP Architecture
+ ## 🏗️ Mobile MCP Architecture

  <p align="center">
    <a href="https://raw.githubusercontent.com/mobile-next/mobile-next-assets/refs/heads/main/mobile-mcp-arch-1.png">
@@ -56,10 +57,14 @@ How we help to scale mobile automation:
  </p>


+ ## 📚 Wiki page
+
+ More details are in our [wiki page](https://github.com/mobile-next/mobile-mcp/wiki), covering setup, configuration and debugging questions.
+

  ## Installation and configuration

- [Detailed guide for Claude Desktop](https://modelcontextprotocol.io/quickstart/user)
+ Set up our MCP with Cursor, Claude, VS Code, GitHub Copilot:

  ```json
  {
@@ -81,6 +86,64 @@ claude mcp add mobile -- npx -y @mobilenext/mobile-mcp@latest

  [Read more in our wiki](https://github.com/mobile-next/mobile-mcp/wiki)! 🚀

+
+ ### 🛠️ How to Use 📝
+
+ After adding the MCP server to your IDE/client, you can instruct your AI assistant to use the available tools.
+ For example, in Cursor's agent mode, you could use the prompts below to quickly validate, test and iterate on UI interactions, read information from the screen, and go through complex workflows.
+ Be descriptive and straight to the point.
+
+ ### ✨ Example Prompts
+
+ #### Workflows
+
+ You can specify detailed workflows in a single prompt, verify business logic, and set up automations. You can go crazy:
+
+ **Search for a video, comment, like and share it.**
+ ```
+ Find the video called "Beginner Recipe for Tonkotsu Ramen" by Way of Ramen, click on like video, after liking write a comment "this was delicious, will make it next Friday", share the video with the first contact in your WhatsApp list.
+ ```
+
+ **Download a successful step counter app, register, set up a workout and 5-star the app**
+ ```
+ Find and download a free "Pomodoro" app that has more than 1k stars.
+ Launch the app, register with my email, after registration find how to start a pomodoro timer.
+ When the pomodoro timer has started, go back to the app store and rate the app 5 stars,
+ and leave a comment on how useful the app is.
+ ```
+
+ **Search in Substack, read, highlight, comment and save an article**
+ ```
+ Open the Substack website, search for "Latest trends in AI automation 2025", open the first article,
+ highlight the section titled "Emerging AI trends", save the article to the reading list for later review,
+ and comment a summary of a random paragraph.
+ ```
+
+ **Reserve a workout class, set timer**
+ ```
+ Open ClassPass, search for yoga classes tomorrow morning within 2 miles,
+ book the highest-rated class at 7 AM, confirm the reservation,
+ and set up a timer for the booked slot on the phone.
+ ```
+
+ **Find a local event, set up a calendar event**
+ ```
+ Open Eventbrite, search for AI startup meetup events happening this weekend in "Austin, TX",
+ select the most popular one, register and RSVP yes to the event, and set up a calendar event as a reminder.
+ ```
+
+ **Check the weather forecast and send a WhatsApp/Telegram/Slack message**
+ ```
+ Open the Weather app, check tomorrow's weather forecast for "Berlin", and send the summary
+ via WhatsApp/Telegram/Slack to contact "Lauren Trown", thumbs up their response.
+ ```
+
+ **Schedule a meeting in Zoom and share the invite via email**
+ ```
+ Open the Zoom app, schedule a meeting titled "AI Hackathon" for tomorrow at 10 AM with a duration of 1 hour,
+ copy the invitation link, and send it via Gmail to the contact "team@example.com".
+ ```
+
  ## Prerequisites

  What you will need to connect MCP with your agent and mobile devices:
@@ -111,96 +174,6 @@ On iOS, you'll need Xcode and to run the Simulator before using Mobile MCP with
  - `xcrun simctl list`
  - `xcrun simctl boot "iPhone 16"`

- # Mobile Commands and interaction tools
-
- The commands and tools support both accessibility-based locators (preferred) and coordinate-based inputs, giving you flexibility when accessibility/automation IDs are missing for reliable and seemless automation.
-
- ## mobile_list_apps
- - **Description:** List all the installed apps on the device
- - **Parameters:**
-   - `bundleId` (string): The application's unique bundle/package identifier like: com.google.android.keep or com.apple.mobilenotes )
-
- ## mobile_launch_app
- - **Description:** Launches the specified app on the device/emulator
- - **Parameters:**
-   - `bundleId` (string): The application's unique bundle/package identifier like: com.google.android.keep or com.apple.mobilenotes )
-
- ## mobile_terminate_app
- - **Description:** Terminates a running application
- - **Parameters:**
-   - `packageName` (string): Based on the application's bundle/package identifier calls am force stop or kills the app based on pid.
-
- ## mobile_get_screen_size
- - **Description:** Get the screen size of the mobile device in pixels
- - **Parameters:** None
-
- ## mobile_click_on_screen_at_coordinates
- - **Description:** Taps on specified screen coordinates based on coordinates.
- - **Parameters:**
-   - `x` (number): X-coordinate
-   - `y` (number): Y-coordinate
-
- ## mobile_list_elements_on_screen
- - **Description:** List elements on screen and their coordinates, with display text or accessibility label.
- - **Parameters:** None
-
- ## mobile_element_tap
- - **Description:** Taps on a UI element identified by accessibility locator
- - **Parameters:**
-   - `element` (string): Human-readable element description (e.g., "Login button")
-   - `ref` (string): Accessibility/automation ID or reference from a snapshot
-
- ## mobile_tap
- - **Description:** Taps on specified screen coordinates
- - **Parameters:**
-   - `x` (number): X-coordinate
-   - `y` (number): Y-coordinate
-
- ## mobile_press_button
- - **Description:** Press a button on device (home, back, volume, enter, power button.)
- - **Parameters:** None
-
- ## mobile_open_url
- - **Description:** Open a URL in browser on device
- - **Parameters:**
-   - `url` (string): The URL to be opened (e.g., "https://example.com").
-
- ## mobile_type_text
- - **Description:** Types text into a focused UI element (e.g., TextField, SearchField)
- - **Parameters:**
-   - `text` (string): Text to type
-   - `submit` (boolean): Whether to press Enter/Return after typing
-
- ## mobile_element_swipe
- - **Description:** Performs a swipe gesture from one UI element to another
- - **Parameters:**
-   - `startElement` (string): Human-readable description of the start element
-   - `startRef` (string): Accessibility/automation ID of the start element
-   - `endElement` (string): Human-readable description of the end element
-   - `endRef` (string): Accessibility/automation ID of the end element
-
- ## mobile_swipe
- - **Description:** Performs a swipe gesture between two sets of screen coordinates
- - **Parameters:**
-   - `startX` (number): Start X-coordinate
-   - `startY` (number): Start Y-coordinate
-   - `endX` (number): End X-coordinate
-   - `endY` (number): End Y-coordinate
-
- ## mobile_press_key
- - **Description:** Presses hardware keys or triggers special events (e.g., back button on Android)
- - **Parameters:**
-   - `key` (string): Key identifier (e.g., HOME, BACK, VOLUME_UP, etc.)
-
- ## mobile_take_screenshot
- - **Description:** Captures a screenshot of the current device screen
- - **Parameters:** None
-
- ## mobile_get_source
- - **Description:** Fetches the current device UI structure (accessibility snapshot) (xml format)
- - **Parameters:** None
-
-
  # Thanks to all contributors ❤️

  ### We appreciate everyone who has helped improve this project.
package/lib/android.js CHANGED
@@ -36,7 +36,7 @@ var __importDefault = (this && this.__importDefault) || function (mod) {
  return (mod && mod.__esModule) ? mod : { "default": mod };
  };
  Object.defineProperty(exports, "__esModule", { value: true });
- exports.getConnectedDevices = exports.AndroidRobot = void 0;
+ exports.AndroidDeviceManager = exports.AndroidRobot = void 0;
  const path_1 = __importDefault(require("path"));
  const child_process_1 = require("child_process");
  const xml = __importStar(require("fast-xml-parser"));
@@ -54,6 +54,11 @@ const BUTTON_MAP = {
  "VOLUME_UP": "KEYCODE_VOLUME_UP",
  "VOLUME_DOWN": "KEYCODE_VOLUME_DOWN",
  "ENTER": "KEYCODE_ENTER",
+ "DPAD_CENTER": "KEYCODE_DPAD_CENTER",
+ "DPAD_UP": "KEYCODE_DPAD_UP",
+ "DPAD_DOWN": "KEYCODE_DPAD_DOWN",
+ "DPAD_LEFT": "KEYCODE_DPAD_LEFT",
+ "DPAD_RIGHT": "KEYCODE_DPAD_RIGHT",
  };
  const TIMEOUT = 30000;
  const MAX_BUFFER_SIZE = 1024 * 1024 * 4;
@@ -68,6 +73,14 @@ class AndroidRobot {
  timeout: TIMEOUT,
  });
  }
+ getSystemFeatures() {
+ return this.adb("shell", "pm", "list", "features")
+ .toString()
+ .split("\n")
+ .map(line => line.trim())
+ .filter(line => line.startsWith("feature:"))
+ .map(line => line.substring("feature:".length));
+ }
  async getScreenSize() {
  const screenSize = this.adb("shell", "wm", "size")
  .toString()
@@ -96,6 +109,14 @@ class AndroidRobot {
  async launchApp(packageName) {
  this.adb("shell", "monkey", "-p", packageName, "-c", "android.intent.category.LAUNCHER", "1");
  }
+ async listRunningProcesses() {
+ return this.adb("shell", "ps", "-e")
+ .toString()
+ .split("\n")
+ .map(line => line.trim())
+ .filter(line => line.startsWith("u")) // non-system processes
+ .map(line => line.split(/\s+/)[8]); // get process name
+ }
  async swipe(direction) {
  const screenSize = await this.getScreenSize();
  const centerX = screenSize.width >> 1;
@@ -122,16 +143,6 @@ class AndroidRobot {
  }
  collectElements(node) {
  const elements = [];
- const getScreenElementRect = (element) => {
- const bounds = String(element.bounds);
- const [, left, top, right, bottom] = bounds.match(/^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/)?.map(Number) || [];
- return {
- x: left,
- y: top,
- width: right - left,
- height: bottom - top,
- };
- };
  if (node.node) {
  if (Array.isArray(node.node)) {
  for (const childNode of node.node) {
@@ -145,10 +156,14 @@ class AndroidRobot {
  if (node.text || node["content-desc"] || node.hint) {
  const element = {
  type: node.class || "text",
- name: node.text,
+ text: node.text,
  label: node["content-desc"] || node.hint || "",
- rect: getScreenElementRect(node),
+ rect: this.getScreenElementRect(node),
  };
+ if (node.focused === "true") {
+ // only provide it if it's true, otherwise don't confuse llm
+ element.focused = true;
+ }
  if (element.rect.width > 0 && element.rect.height > 0) {
  elements.push(element);
  }
@@ -156,12 +171,7 @@ class AndroidRobot {
  return elements;
  }
  async getElementsOnScreen() {
- const dump = this.adb("exec-out", "uiautomator", "dump", "/dev/tty");
- const parser = new xml.XMLParser({
- ignoreAttributes: false,
- attributeNamePrefix: ""
- });
- const parsedXml = parser.parse(dump);
+ const parsedXml = await this.getUiAutomatorXml();
  const hierarchy = parsedXml.hierarchy;
  const elements = this.collectElements(hierarchy.node);
  return elements;
@@ -186,14 +196,80 @@ class AndroidRobot {
  async tap(x, y) {
  this.adb("shell", "input", "tap", `${x}`, `${y}`);
  }
+ async setOrientation(orientation) {
+ // Android uses numbers for orientation:
+ // 0 - Portrait
+ // 1 - Landscape
+ const orientationValue = orientation === "portrait" ? 0 : 1;
+ // Set orientation using content provider
+ this.adb("shell", "content", "insert", "--uri", "content://settings/system", "--bind", "name:s:user_rotation", "--bind", `value:i:${orientationValue}`);
+ // Force the orientation change
+ this.adb("shell", "settings", "put", "system", "accelerometer_rotation", "0");
+ }
+ async getOrientation() {
+ const rotation = this.adb("shell", "settings", "get", "system", "user_rotation").toString().trim();
+ return rotation === "0" ? "portrait" : "landscape";
+ }
+ async getUiAutomatorDump() {
+ for (let tries = 0; tries < 10; tries++) {
+ const dump = this.adb("exec-out", "uiautomator", "dump", "/dev/tty").toString();
+ // note: we're not catching other errors here. maybe we should check for <?xml
+ if (dump.includes("null root node returned by UiTestAutomationBridge")) {
+ // uncomment for debugging
+ // const screenshot = await this.getScreenshot();
+ // console.error("Failed to get UIAutomator XML. Here's a screenshot: " + screenshot.toString("base64"));
+ continue;
+ }
+ return dump;
+ }
+ throw new robot_1.ActionableError("Failed to get UIAutomator XML");
+ }
+ async getUiAutomatorXml() {
+ const dump = await this.getUiAutomatorDump();
+ const parser = new xml.XMLParser({
+ ignoreAttributes: false,
+ attributeNamePrefix: ""
+ });
+ return parser.parse(dump);
+ }
+ getScreenElementRect(node) {
+ const bounds = String(node.bounds);
+ const [, left, top, right, bottom] = bounds.match(/^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/)?.map(Number) || [];
+ return {
+ x: left,
+ y: top,
+ width: right - left,
+ height: bottom - top,
+ };
+ }
  }
  exports.AndroidRobot = AndroidRobot;
- const getConnectedDevices = () => {
- return (0, child_process_1.execFileSync)(getAdbPath(), ["devices"])
- .toString()
- .split("\n")
- .filter(line => !line.startsWith("List of devices attached"))
- .filter(line => line.trim() !== "")
- .map(line => line.split("\t")[0]);
- };
- exports.getConnectedDevices = getConnectedDevices;
+ class AndroidDeviceManager {
+ getDeviceType(name) {
+ const device = new AndroidRobot(name);
+ const features = device.getSystemFeatures();
+ if (features.includes("android.software.leanback") || features.includes("android.hardware.type.television")) {
+ return "tv";
+ }
+ return "mobile";
+ }
+ getConnectedDevices() {
+ try {
+ const names = (0, child_process_1.execFileSync)(getAdbPath(), ["devices"])
+ .toString()
+ .split("\n")
+ .filter(line => !line.startsWith("List of devices attached"))
+ .filter(line => line.trim() !== "")
+ .map(line => line.split("\t")[0]);
+ return names.map(name => ({
+ deviceId: name,
+ deviceType: this.getDeviceType(name),
+ }));
+ }
+ catch (error) {
+ console.error("Could not execute adb command, maybe ANDROID_HOME is not set?");
+ return [];
+ }
+ }
+ }
+ exports.AndroidDeviceManager = AndroidDeviceManager;
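
> Editor's note: the `getScreenElementRect` helper promoted to a method above parses uiautomator's `bounds` attribute. Its regex can be sanity-checked in isolation against a canned bounds string — a sketch, no device required:

```javascript
// Parse a uiautomator bounds string "[left,top][right,bottom]" into a rect,
// mirroring AndroidRobot.getScreenElementRect in the diff above.
const bounds = "[100,200][500,260]";
const [, left, top, right, bottom] =
	bounds.match(/^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/)?.map(Number) || [];
const rect = { x: left, y: top, width: right - left, height: bottom - top };
console.log(rect); // { x: 100, y: 200, width: 400, height: 60 }
```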
package/lib/image-utils.js ADDED
@@ -0,0 +1,64 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.isImageMagickInstalled = exports.Image = exports.ImageTransformer = void 0;
+ const child_process_1 = require("child_process");
+ const DEFAULT_JPEG_QUALITY = 75;
+ class ImageTransformer {
+ buffer;
+ newWidth = 0;
+ newFormat = "png";
+ jpegOptions = { quality: DEFAULT_JPEG_QUALITY };
+ constructor(buffer) {
+ this.buffer = buffer;
+ }
+ resize(width) {
+ this.newWidth = width;
+ return this;
+ }
+ jpeg(options) {
+ this.newFormat = "jpg";
+ this.jpegOptions = options;
+ return this;
+ }
+ png() {
+ this.newFormat = "png";
+ return this;
+ }
+ toBuffer() {
+ const proc = (0, child_process_1.spawnSync)("magick", ["-", "-resize", `${this.newWidth}x`, "-quality", `${this.jpegOptions.quality}`, `${this.newFormat}:-`], {
+ maxBuffer: 8 * 1024 * 1024,
+ input: this.buffer
+ });
+ return proc.stdout;
+ }
+ }
+ exports.ImageTransformer = ImageTransformer;
+ class Image {
+ buffer;
+ constructor(buffer) {
+ this.buffer = buffer;
+ }
+ static fromBuffer(buffer) {
+ return new Image(buffer);
+ }
+ resize(width) {
+ return new ImageTransformer(this.buffer).resize(width);
+ }
+ jpeg(options) {
+ return new ImageTransformer(this.buffer).jpeg(options);
+ }
+ }
+ exports.Image = Image;
+ const isImageMagickInstalled = () => {
+ try {
+ return (0, child_process_1.execFileSync)("magick", ["--version"])
+ .toString()
+ .split("\n")
+ .filter(line => line.includes("Version: ImageMagick"))
+ .length > 0;
+ }
+ catch (error) {
+ return false;
+ }
+ };
+ exports.isImageMagickInstalled = isImageMagickInstalled;
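
> Editor's note: `isImageMagickInstalled` shells out to `magick --version` and filters the output. The string check itself can be exercised with canned output — a sketch; the version line below is made up:

```javascript
// Canned `magick --version` output (hypothetical version string).
const sample = "Version: ImageMagick 7.1.1-29 Q16-HDRI x86_64\nCopyright: (C) 1999 ImageMagick Studio LLC\n";
// Same filter used by isImageMagickInstalled in the added file above.
const installed = sample
	.split("\n")
	.filter(line => line.includes("Version: ImageMagick"))
	.length > 0;
console.log(installed); // true
```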
package/lib/ios.js CHANGED
@@ -132,6 +132,14 @@ class IosRobot {
  (0, fs_1.unlinkSync)(tmpFilename);
  return buffer;
  }
+ async setOrientation(orientation) {
+ const wda = await this.wda();
+ await wda.setOrientation(orientation);
+ }
+ async getOrientation() {
+ const wda = await this.wda();
+ return await wda.getOrientation();
+ }
  }
  exports.IosRobot = IosRobot;
  class IosManager {
@@ -145,6 +153,11 @@ class IosManager {
  return false;
  }
  }
+ async getDeviceName(deviceId) {
+ const output = (0, child_process_1.execFileSync)(getGoIosPath(), ["info", "--udid", deviceId]).toString();
+ const json = JSON.parse(output);
+ return json.DeviceName;
+ }
  async listDevices() {
  if (!(await this.isGoIosInstalled())) {
  console.error("go-ios is not installed, no physical iOS devices can be detected");
@@ -152,7 +165,11 @@ class IosManager {
  }
  const output = (0, child_process_1.execFileSync)(getGoIosPath(), ["list"]).toString();
  const json = JSON.parse(output);
- return json.deviceList;
+ const devices = json.deviceList.map(async (device) => ({
+ deviceId: device,
+ deviceName: await this.getDeviceName(device),
+ }));
+ return Promise.all(devices);
  }
  }
  exports.IosManager = IosManager;
package/lib/iphone-simulator.js CHANGED
@@ -15,7 +15,7 @@ class Simctl {
  async wda() {
  const wda = new webdriver_agent_1.WebDriverAgent("localhost", WDA_PORT);
  if (!(await wda.isRunning())) {
- throw new robot_1.ActionableError("WebDriverAgent is not running on device (tunnel okay, port forwarding okay), please see https://github.com/mobile-next/mobile-mcp/wiki/");
+ throw new robot_1.ActionableError("WebDriverAgent is not running on simulator, please see https://github.com/mobile-next/mobile-mcp/wiki/");
  }
  return wda;
  }
@@ -100,21 +100,40 @@ class Simctl {
  const wda = await this.wda();
  return wda.getElementsOnScreen();
  }
+ async setOrientation(orientation) {
+ const wda = await this.wda();
+ return wda.setOrientation(orientation);
+ }
+ async getOrientation() {
+ const wda = await this.wda();
+ return wda.getOrientation();
+ }
  }
  exports.Simctl = Simctl;
  class SimctlManager {
  listSimulators() {
- const text = (0, child_process_1.execFileSync)("xcrun", ["simctl", "list", "devices", "-j"]).toString();
- const json = JSON.parse(text);
- return Object.values(json.devices).flatMap(device => {
- return device.map(d => {
- return {
- name: d.name,
- uuid: d.udid,
- state: d.state,
- };
+ // detect if this is a mac
+ if (process.platform !== "darwin") {
+ // don't even try to run xcrun
+ return [];
+ }
+ try {
+ const text = (0, child_process_1.execFileSync)("xcrun", ["simctl", "list", "devices", "-j"]).toString();
+ const json = JSON.parse(text);
+ return Object.values(json.devices).flatMap(device => {
+ return device.map(d => {
+ return {
+ name: d.name,
+ uuid: d.udid,
+ state: d.state,
+ };
+ });
  });
- });
+ }
+ catch (error) {
+ console.error("Error listing simulators", error);
+ return [];
+ }
  }
  listBootedSimulators() {
  return this.listSimulators()
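
> Editor's note: `listSimulators` flattens the JSON emitted by `xcrun simctl list devices -j`. The shape of that transform can be seen with a small canned payload — a sketch; the runtime key and device names are illustrative, not from the diff:

```javascript
// Canned `xcrun simctl list devices -j` payload (illustrative values).
const text = JSON.stringify({
	devices: {
		"com.apple.CoreSimulator.SimRuntime.iOS-18-0": [
			{ name: "iPhone 16", udid: "AAAA-1111", state: "Booted" },
			{ name: "iPad Air", udid: "BBBB-2222", state: "Shutdown" },
		],
	},
});
// Same flatMap/map used by SimctlManager.listSimulators above.
const json = JSON.parse(text);
const simulators = Object.values(json.devices).flatMap(runtime =>
	runtime.map(d => ({ name: d.name, uuid: d.udid, state: d.state })));
console.log(simulators);
```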
package/lib/png.js ADDED
@@ -0,0 +1,19 @@
+ "use strict";
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.PNG = void 0;
+ class PNG {
+ buffer;
+ constructor(buffer) {
+ this.buffer = buffer;
+ }
+ getDimensions() {
+ const pngSignature = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10]);
+ if (!this.buffer.subarray(0, 8).equals(pngSignature)) {
+ throw new Error("Not a valid PNG file");
+ }
+ const width = this.buffer.readUInt32BE(16);
+ const height = this.buffer.readUInt32BE(20);
+ return { width, height };
+ }
+ }
+ exports.PNG = PNG;
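
> Editor's note: the new `PNG.getDimensions` reads width and height straight from the IHDR chunk at fixed byte offsets 16 and 20. A handcrafted header shows why those offsets hold — a sketch; only the first 24 bytes of a real PNG are modeled:

```javascript
// Build the first bytes of a PNG: the 8-byte signature, then the start of
// the IHDR chunk (4-byte length, 4-byte "IHDR" type, width/height as
// big-endian u32) - the same layout PNG.getDimensions relies on.
const signature = Buffer.from([137, 80, 78, 71, 13, 10, 26, 10]);
const ihdr = Buffer.alloc(16);
ihdr.writeUInt32BE(13, 0);   // IHDR data length
ihdr.write("IHDR", 4);       // chunk type (file bytes 12-15)
ihdr.writeUInt32BE(320, 8);  // width  -> file offset 16
ihdr.writeUInt32BE(240, 12); // height -> file offset 20
const buffer = Buffer.concat([signature, ihdr]);
// Same reads as PNG.getDimensions above.
console.log({ width: buffer.readUInt32BE(16), height: buffer.readUInt32BE(20) });
// { width: 320, height: 240 }
```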
package/lib/server.js CHANGED
@@ -1,17 +1,15 @@
  "use strict";
- var __importDefault = (this && this.__importDefault) || function (mod) {
- return (mod && mod.__esModule) ? mod : { "default": mod };
- };
  Object.defineProperty(exports, "__esModule", { value: true });
  exports.createMcpServer = void 0;
  const mcp_js_1 = require("@modelcontextprotocol/sdk/server/mcp.js");
  const zod_1 = require("zod");
- const sharp_1 = __importDefault(require("sharp"));
  const logger_1 = require("./logger");
  const android_1 = require("./android");
  const robot_1 = require("./robot");
  const iphone_simulator_1 = require("./iphone-simulator");
  const ios_1 = require("./ios");
+ const png_1 = require("./png");
+ const image_utils_1 = require("./image-utils");
  const getAgentVersion = () => {
  const json = require("../package.json");
  return json.version;
@@ -62,11 +60,28 @@ const createMcpServer = () => {
  };
  tool("mobile_list_available_devices", "List all available devices. This includes both physical devices and simulators. If there is more than one device returned, you need to let the user select one of them.", {}, async ({}) => {
  const iosManager = new ios_1.IosManager();
- const devices = await simulatorManager.listBootedSimulators();
+ const androidManager = new android_1.AndroidDeviceManager();
+ const devices = simulatorManager.listBootedSimulators();
  const simulatorNames = devices.map(d => d.name);
- const androidDevices = (0, android_1.getConnectedDevices)();
+ const androidDevices = androidManager.getConnectedDevices();
  const iosDevices = await iosManager.listDevices();
- return `Found these iOS simulators: [${simulatorNames.join(".")}], iOS devices: [${iosDevices.join(",")}] and Android devices: [${androidDevices.join(",")}]`;
+ const iosDeviceNames = iosDevices.map(d => d.deviceId);
+ const androidTvDevices = androidDevices.filter(d => d.deviceType === "tv").map(d => d.deviceId);
+ const androidMobileDevices = androidDevices.filter(d => d.deviceType === "mobile").map(d => d.deviceId);
+ const resp = ["Found these devices:"];
+ if (simulatorNames.length > 0) {
+ resp.push(`iOS simulators: [${simulatorNames.join(".")}]`);
+ }
+ if (iosDevices.length > 0) {
+ resp.push(`iOS devices: [${iosDeviceNames.join(",")}]`);
+ }
+ if (androidMobileDevices.length > 0) {
+ resp.push(`Android devices: [${androidMobileDevices.join(",")}]`);
+ }
+ if (androidTvDevices.length > 0) {
+ resp.push(`Android TV devices: [${androidTvDevices.join(",")}]`);
+ }
+ return resp.join("\n");
  });
  tool("mobile_use_device", "Select a device to use. This can be a simulator or an Android device. Use the list_available_devices tool to get a list of available devices.", {
  device: zod_1.z.string().describe("The name of the device to select"),
@@ -83,7 +98,7 @@ const createMcpServer = () => {
  robot = new android_1.AndroidRobot(device);
  break;
  }
- return `Selected device: ${device} (${deviceType})`;
+ return `Selected device: ${device}`;
  });
  tool("mobile_list_apps", "List all the installed apps on the device", {}, async ({}) => {
  requireRobot();
@@ -121,17 +136,25 @@ const createMcpServer = () => {
  requireRobot();
  const elements = await robot.getElementsOnScreen();
  const result = elements.map(element => {
- const x = Number((element.rect.x + element.rect.width / 2)).toFixed(3);
- const y = Number((element.rect.y + element.rect.height / 2)).toFixed(3);
- return {
- text: element.label,
+ const x = Number((element.rect.x + element.rect.width / 2)).toFixed(1);
+ const y = Number((element.rect.y + element.rect.height / 2)).toFixed(1);
+ const out = {
+ type: element.type,
+ text: element.text,
+ label: element.label,
+ name: element.name,
+ value: element.value,
  coordinates: { x, y }
  };
+ if (element.focused) {
+ out.focused = true;
+ }
+ return out;
  });
  return `Found these elements on screen: ${JSON.stringify(result)}`;
  });
  tool("mobile_press_button", "Press a button on device", {
- button: zod_1.z.string().describe("The button to press. Supported buttons: BACK (android only), HOME, VOLUME_UP, VOLUME_DOWN, ENTER"),
+ button: zod_1.z.string().describe("The button to press. Supported buttons: BACK (android only), HOME, VOLUME_UP, VOLUME_DOWN, ENTER, DPAD_CENTER (android tv only), DPAD_UP (android tv only), DPAD_DOWN (android tv only), DPAD_LEFT (android tv only), DPAD_RIGHT (android tv only)"),
  }, async ({ button }) => {
  requireRobot();
  await robot.pressButton(button);
@@ -165,24 +188,29 @@ const createMcpServer = () => {
  server.tool("mobile_take_screenshot", "Take a screenshot of the mobile device. Use this to understand what's on screen, if you need to press an element that is available through view hierarchy then you must list elements on screen instead. Do not cache this result.", {}, async ({}) => {
  requireRobot();
  try {
- const screenshot = await robot.getScreenshot();
- // Scale down the screenshot by 50%
- const image = (0, sharp_1.default)(screenshot);
- const metadata = await image.metadata();
- if (!metadata.width) {
- throw new Error("Failed to get screenshot metadata");
+ let screenshot = await robot.getScreenshot();
+ let mimeType = "image/png";
+ // validate we received a png, will throw exception otherwise
+ const image = new png_1.PNG(screenshot);
+ const pngSize = image.getDimensions();
+ if (pngSize.width <= 0 || pngSize.height <= 0) {
+ throw new robot_1.ActionableError("Screenshot is invalid. Please try again.");
+ }
+ if ((0, image_utils_1.isImageMagickInstalled)()) {
+ (0, logger_1.trace)("ImageMagick is installed, resizing screenshot");
+ const image = image_utils_1.Image.fromBuffer(screenshot);
+ const beforeSize = screenshot.length;
+ screenshot = image.resize(Math.floor(pngSize.width / 2))
+ .jpeg({ quality: 75 })
+ .toBuffer();
+ const afterSize = screenshot.length;
+ (0, logger_1.trace)(`Screenshot resized from ${beforeSize} bytes to ${afterSize} bytes`);
+ mimeType = "image/jpeg";
  }
- const resizedScreenshot = await image
- .resize(Math.floor(metadata.width / 2))
- .jpeg({ quality: 75 })
- .toBuffer();
- // debug:
- // writeFileSync('/tmp/screenshot.png', screenshot);
- // writeFileSync('/tmp/screenshot-scaled.jpg', resizedScreenshot);
- const screenshot64 = resizedScreenshot.toString("base64");
+ const screenshot64 = screenshot.toString("base64");
  (0, logger_1.trace)(`Screenshot taken: ${screenshot.length} bytes`);
  return {
- content: [{ type: "image", data: screenshot64, mimeType: "image/jpeg" }]
+ content: [{ type: "image", data: screenshot64, mimeType }]
  };
  }
  catch (err) {
@@ -193,6 +221,18 @@ const createMcpServer = () => {
  };
  }
  });
+ tool("mobile_set_orientation", "Change the screen orientation of the device", {
+ orientation: zod_1.z.enum(["portrait", "landscape"]).describe("The desired orientation"),
+ }, async ({ orientation }) => {
+ requireRobot();
+ await robot.setOrientation(orientation);
+ return `Changed device orientation to ${orientation}`;
+ });
+ tool("mobile_get_orientation", "Get the current screen orientation of the device", {}, async () => {
+ requireRobot();
+ const orientation = await robot.getOrientation();
+ return `Current device orientation is ${orientation}`;
+ });
  return server;
  };
  exports.createMcpServer = createMcpServer;
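
> Editor's note: `mobile_list_elements_on_screen` now reports one-decimal center coordinates instead of three. The rounding is just `toFixed(1)` over the rect midpoint — a sketch with a made-up rect:

```javascript
// Center-point computation used when listing elements (sample rect).
const element = { rect: { x: 100, y: 200, width: 401, height: 60 } };
const x = Number(element.rect.x + element.rect.width / 2).toFixed(1);
const y = Number(element.rect.y + element.rect.height / 2).toFixed(1);
console.log({ x, y }); // { x: '300.5', y: '230.0' }
```

Note that `toFixed` returns strings, so the coordinates in the tool response are strings, not numbers.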
package/lib/webdriver-agent.js CHANGED
@@ -207,5 +207,25 @@ class WebDriverAgent {
  });
  });
  }
+ async setOrientation(orientation) {
+ await this.withinSession(async (sessionUrl) => {
+ const url = `${sessionUrl}/orientation`;
+ await fetch(url, {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify({
+ orientation: orientation.toUpperCase()
+ })
+ });
+ });
+ }
+ async getOrientation() {
+ return this.withinSession(async (sessionUrl) => {
+ const url = `${sessionUrl}/orientation`;
+ const response = await fetch(url);
+ const json = await response.json();
+ return json.value.toLowerCase();
+ });
+ }
  }
  exports.WebDriverAgent = WebDriverAgent;
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@mobilenext/mobile-mcp",
- "version": "0.0.12",
+ "version": "0.0.14",
  "description": "Mobile MCP",
  "repository": {
  "type": "git",
@@ -24,8 +24,6 @@
  "dependencies": {
  "@modelcontextprotocol/sdk": "^1.6.1",
  "fast-xml-parser": "^5.0.9",
- "nyc": "^17.1.0",
- "sharp": "^0.33.5",
  "zod-to-json-schema": "^3.24.4"
  },
  "devDependencies": {
@@ -42,6 +40,7 @@
  "eslint-plugin-import": "^2.31.0",
  "eslint-plugin-notice": "^1.0.0",
  "husky": "^9.1.7",
+ "nyc": "^17.1.0",
  "mocha": "^11.1.0",
  "ts-node": "^10.9.2",
  "typescript": "^5.8.2"