mobai-mcp 1.0.0 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/dist/index.js +149 -8
  2. package/package.json +1 -1
package/dist/index.js CHANGED
@@ -836,6 +836,36 @@ const API_REFERENCE = `# MobAI API Reference
  \`\`\`json
  {"error": "message", "code": "ERROR_CODE"}
  \`\`\`
+
+ ## DSL Action Reference
+
+ ### type Action
+ - **predicate**: Required if keyboard not already open (auto-taps the element first)
+ - **dismiss_keyboard**: Default \`false\` (keyboard stays open after typing)
+ - **clear_first**: Optional, clears field before typing
+
+ \`\`\`json
+ {"action": "type", "text": "hello", "predicate": {"type": "input"}}
+ \`\`\`
+
+ ### press_key Action
+ - **key**: Keyboard key to press (return, tab, delete, escape, etc.)
+ - **context**: Optional, "web" for web context (supports enter, tab, delete, escape)
+
+ \`\`\`json
+ {"action": "press_key", "key": "return"}
+ {"action": "press_key", "key": "tab", "context": "web"}
+ \`\`\`
+
+ ### select_web_context Action
+ - **url_contains**: Filter by URL substring
+ - **title_contains**: Filter by page title substring
+
+ \`\`\`json
+ {"action": "select_web_context"}
+ {"action": "select_web_context", "url_contains": "example.com"}
+ {"action": "select_web_context", "title_contains": "Login"}
+ \`\`\`
  `;
  const DSL_GUIDE = `# MobAI DSL Guide
 
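The new `type` reference states two rules. `predicate` is required only when the keyboard is not already open, and `dismiss_keyboard` defaults to `false`. Below is a minimal client-side sketch that applies those two rules before a step is sent. It assumes the caller tracks whether the keyboard is open. `normalizeTypeStep` and the `keyboardOpen` flag are hypothetical and are not part of mobai-mcp.

```javascript
// Hypothetical helper (not part of mobai-mcp): applies the documented
// rules for the DSL `type` action before a step is sent.
function normalizeTypeStep(step, keyboardOpen) {
  if (step.action !== "type") {
    throw new Error(`expected a "type" step, got "${step.action}"`);
  }
  if (typeof step.text !== "string") {
    throw new Error('"type" requires a string "text" field');
  }
  // Documented rule: predicate is required if the keyboard is not already
  // open (the runner then auto-taps the matched element first).
  if (!keyboardOpen && step.predicate === undefined) {
    throw new Error('"type" needs a "predicate" when the keyboard is not open');
  }
  // Documented default: dismiss_keyboard is false; an explicit value wins.
  return { dismiss_keyboard: false, ...step };
}

// Keyboard closed, predicate supplied: accepted, default filled in.
console.log(normalizeTypeStep({ action: "type", text: "hello", predicate: { type: "input" } }, false));
```

The spread places the step's own fields after the default, so `dismiss_keyboard: true` set by the caller is preserved.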
@@ -860,7 +890,8 @@ The DSL (Domain Specific Language) enables batch execution of multiple automatio
  |--------|-------------|------------|
  | observe | Get UI tree/screenshot | context, include (ui_tree, screenshot, installed_apps) |
  | tap | Tap element | predicate or coords |
- | type | Type text | text, predicate, clear_first |
+ | type | Type text | text, predicate (if keyboard not open), dismiss_keyboard (default: false) |
+ | press_key | Press keyboard key | key (return, tab, delete, etc.), context (optional: "web") |
  | toggle | Set switch state | predicate, state ("on"/"off") |
  | swipe | Swipe gesture | direction, distance, duration_ms |
  | scroll | Scroll in container | direction, predicate (container), to_element |
@@ -871,6 +902,7 @@ The DSL (Domain Specific Language) enables batch execution of multiple automatio
  | assert_not_exists | Verify element gone | predicate |
  | delay | Wait fixed time | duration_ms |
  | if_exists | Conditional | predicate, then, else |
+ | select_web_context | Select browser/WebView | url_contains, title_contains (optional filters) |
 
  ## Predicates
 
@@ -894,9 +926,11 @@ Match elements by:
 
  ### Type Text
  \`\`\`json
- {"action": "type", "text": "Hello", "clear_first": true}
+ {"action": "type", "text": "Hello", "predicate": {"type": "input"}}
  \`\`\`
 
+ Note: \`predicate\` is required if keyboard is not already open. Use \`dismiss_keyboard: true\` to close keyboard after typing.
+
  ### Toggle Switch
  \`\`\`json
  {"action": "toggle", "predicate": {"type": "switch", "text_contains": "WiFi"}, "state": "on"}
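Several examples in these hunks identify elements with predicate objects such as `{"type": "input"}`, `{"type": "input", "label": "Username"}` and `{"type": "switch", "text_contains": "WiFi"}`. The diff does not define how predicates are matched, so the following is only a sketch of one plausible reading. It assumes `type` and `label` are compared exactly, `text_contains` is a substring test, and `index` (used in the 1.0.0 login example) is a zero-based position among the matches. The flat node shape and `matchPredicate` are hypothetical.

```javascript
// Hypothetical matcher (not mobai-mcp's implementation): one plausible
// reading of predicate fields over a flat list of UI nodes.
function matchPredicate(nodes, predicate) {
  const matches = nodes.filter((node) =>
    (predicate.type === undefined || node.type === predicate.type) &&
    (predicate.label === undefined || node.label === predicate.label) &&
    (predicate.text_contains === undefined ||
      (node.text ?? "").includes(predicate.text_contains))
  );
  // Assumption: `index` picks among matches, zero-based; default is the first.
  const i = predicate.index ?? 0;
  return matches[i] ?? null;
}

const nodes = [
  { type: "input", label: "Username", text: "" },
  { type: "input", label: "Password", text: "" },
  { type: "switch", label: "WiFi", text: "WiFi" },
];
console.log(matchPredicate(nodes, { type: "switch", text_contains: "WiFi" }));
```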
@@ -926,13 +960,71 @@ const NATIVE_RUNNER_GUIDE = `# Native App Automation Guide
 
  Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
 
+ ## Script Writing Guidelines
+
+ The DSL's purpose is to **minimize LLM calls** by encoding assumptions into comprehensive scripts. Write scripts that handle common scenarios without needing to re-observe.
+
+ ### Example: Handle Cookie Banner
+ \`\`\`json
+ {
+ "action": "if_exists",
+ "predicate": {"text_contains": "Accept Cookies"},
+ "then": [{"action": "tap", "predicate": {"text_contains": "Accept"}}]
+ }
+ \`\`\`
+
+ ### Common Knowledge (use without observing)
+ - Safari has an address bar at the top
+ - Settings app has Wi-Fi, Bluetooth, General sections
+ - Alert dialogs have "OK", "Cancel", "Allow", "Don't Allow" buttons
+ - iOS keyboard has "Done", "Return", "Search" keys
+
+ ### Script Writing Rules
+ - **Use open_app** - Always start scripts with open_app to ensure correct app
+ - **UI tree provided upfront** - You receive the initial UI tree, use it to plan the script
+ - **Use if_exists for popups** - Handle cookie banners, permission dialogs, notifications
+ - **observe only for assert_screen_changed** - Use observe to establish baseline, then assert_screen_changed to verify navigation
+
+ ## IMPORTANT: Browser Native UI
+
+ When automating browsers (Safari, Chrome), use **Native Runner** for the browser's own UI:
+ - Address bar / URL bar
+ - Tab bar and tab management
+ - Navigation buttons (back, forward, refresh)
+ - Bookmarks bar
+ - Browser menus and settings
+
+ These are native OS elements, NOT web content. Only use Web Runner for the actual webpage content inside the browser.
+
  ## Workflow
 
  1. **Observe UI** - Get the accessibility tree
  2. **Match Elements** - Use predicates to find elements
- 3. **Execute Actions** - Tap, type, swipe, etc.
+ 3. **Execute Actions** - Tap, type, swipe, press_key, etc.
  4. **Verify Results** - Check UI state changed
 
+ ## Type Action
+
+ The \`type\` action requires either:
+ 1. Keyboard already open (from previous tap on input), OR
+ 2. A predicate to identify and tap the input field
+
+ **dismiss_keyboard** default is \`false\` (keyboard stays open after typing).
+
+ ### Pattern 1: Tap then Type
+ \`\`\`json
+ [
+ {"action": "tap", "predicate": {"type": "input"}},
+ {"action": "type", "text": "username"},
+ {"action": "press_key", "key": "tab"}
+ ]
+ \`\`\`
+
+ ### Pattern 2: Type with Predicate
+ \`\`\`json
+ {"action": "type", "text": "username", "predicate": {"type": "input", "label": "Username"}}
+ \`\`\`
+
  ## Common Patterns
 
  ### Open App and Navigate
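The script-writing guidelines above push popup handling and field entry into the script itself, so that the LLM does not have to re-observe. The sketch below uses two hypothetical step builders, `dismissIfPresent` and `tapThenType`, which are not part of mobai-mcp. They emit the exact shapes from the cookie-banner example and Pattern 1. The `open_app` step that the rules ask for is left out because its fields do not appear in this diff.

```javascript
// Hypothetical builders (not part of mobai-mcp) emitting the step shapes
// shown in the Native App Automation Guide.

// if_exists guard, mirroring the "Handle Cookie Banner" example.
function dismissIfPresent(triggerText, buttonText) {
  return {
    action: "if_exists",
    predicate: { text_contains: triggerText },
    then: [{ action: "tap", predicate: { text_contains: buttonText } }],
  };
}

// "Pattern 1: Tap then Type": focus the field, type, then Tab onward.
function tapThenType(predicate, text) {
  return [
    { action: "tap", predicate },
    { action: "type", text },
    { action: "press_key", key: "tab" },
  ];
}

// Popup guard first, so the script tolerates a banner without re-observing.
const steps = [
  dismissIfPresent("Accept Cookies", "Accept"),
  ...tapThenType({ type: "input", label: "Username" }, "username"),
];
console.log(JSON.stringify(steps, null, 2));
```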
@@ -954,9 +1046,10 @@ Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
  "version": "0.2",
  "steps": [
  {"action": "tap", "predicate": {"type": "input"}},
- {"action": "type", "text": "username", "clear_first": true},
- {"action": "tap", "predicate": {"type": "input", "index": 1}},
- {"action": "type", "text": "password", "clear_first": true}
+ {"action": "type", "text": "username"},
+ {"action": "press_key", "key": "tab"},
+ {"action": "type", "text": "password"},
+ {"action": "press_key", "key": "return"}
  ]
  }
  \`\`\`
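The rewritten login example drops the second tap and `clear_first` in favor of keyboard navigation. The hypothetical `loginScript` builder below (not part of mobai-mcp) reproduces the 1.0.1 script. Like the example itself, it assumes the app moves focus from the username field to the password field on Tab and submits the form on Return.

```javascript
// Hypothetical builder (not part of mobai-mcp) for the 1.0.1 login example:
// one tap focuses the first field, then press_key moves through the form.
function loginScript(username, password) {
  return {
    version: "0.2",
    steps: [
      { action: "tap", predicate: { type: "input" } }, // focus username field
      { action: "type", text: username },
      { action: "press_key", key: "tab" },             // advance to password
      { action: "type", text: password },
      { action: "press_key", key: "return" },          // submit the form
    ],
  };
}

console.log(JSON.stringify(loginScript("alice", "s3cret"), null, 2));
```

Compared with the 1.0.0 version, a single tap now replaces the index-based second tap, which follows the Tips section's advice against hardcoded indices.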
@@ -986,16 +1079,45 @@ Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
  }
  \`\`\`
 
+ ## Quick Reference
+
+ | Action | Description | Key Fields |
+ |--------|-------------|------------|
+ | tap | Tap element | predicate or coords |
+ | type | Type text | text, predicate (if keyboard not open), dismiss_keyboard (default: false) |
+ | press_key | Press keyboard key | key (return, tab, delete, etc.) |
+ | swipe | Swipe gesture | direction, distance |
+ | scroll | Scroll container | direction, to_element |
+
  ## Tips
 
  - **Always observe first** - Get UI tree before interacting
  - **Use predicates** - More robust than hardcoded indices
  - **Add delays after navigation** - Apps need time to render
  - **Use retry strategy** - Transient failures are common
+ - **Use press_key for form navigation** - Tab between fields, Return to submit
  `;
  const WEB_RUNNER_GUIDE = `# Web Automation Guide
 
- Use this for automating browsers (Safari, Chrome) and WebViews on mobile devices.
+ **Try native-runner first for simple taps/types.** Only use Web Runner when you need DOM manipulation, CSS selectors, or JavaScript execution.
+
+ ## When to Use Web Runner
+
+ ✅ **USE Web Runner for:**
+ - Native runner returns NO_MATCH for web elements
+ - CSS selector-based element targeting
+ - JavaScript execution in page context
+ - DOM manipulation and inspection
+ - Complex form interactions requiring DOM access
+
+ ❌ **DO NOT use Web Runner for:**
+ - Browser address bar / URL bar → use Native Runner
+ - Browser tab bar → use Native Runner
+ - Browser navigation buttons (back, forward, refresh) → use Native Runner
+ - Browser menus and settings → use Native Runner
+ - Any UI outside the webpage or webview content area → use Native Runner
+
+ The browser's own UI (address bar, tabs, navigation) are **native OS elements**, not web content.
 
  ## Platform Support
 
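The new "When to Use Web Runner" section amounts to a routing rule: browser UI always goes to Native Runner, and page content tries Native Runner first. The sketch below writes that rule as a hypothetical `chooseRunner` function. The region names and the `needsDom`/`nativeResult` fields are illustrative assumptions. Only the `NO_MATCH` code comes from the guide.

```javascript
// Hypothetical routing helper (not part of mobai-mcp) encoding the
// "When to Use Web Runner" rules from the 1.0.1 Web Automation Guide.
const BROWSER_UI_REGIONS = new Set([
  "address_bar", "tab_bar", "navigation_button", "bookmarks_bar", "browser_menu",
]);

function chooseRunner({ region, needsDom = false, nativeResult = null }) {
  // The browser's own UI is native OS UI, even if DOM access is requested.
  if (BROWSER_UI_REGIONS.has(region)) return "native";
  // Page content needing DOM access, CSS selectors, or JS goes to Web Runner.
  if (needsDom) return "web";
  // Otherwise try Native Runner first and fall back only on NO_MATCH.
  return nativeResult === "NO_MATCH" ? "web" : "native";
}

console.log(chooseRunner({ region: "page_content", nativeResult: "NO_MATCH" }));
```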
@@ -1009,7 +1131,26 @@ Use this for automating browsers (Safari, Chrome) and WebViews on mobile devices
  1. **Select web context** - Connect to browser
  2. **Navigate** - Go to URL
  3. **Get DOM** - Inspect page structure
- 4. **Interact** - Click, type using CSS selectors
+ 4. **Interact** - Click, type, press_key using CSS selectors
+
+ ## select_web_context Options
+
+ \`\`\`json
+ {"action": "select_web_context"}
+ {"action": "select_web_context", "url_contains": "example.com"}
+ {"action": "select_web_context", "title_contains": "Login"}
+ \`\`\`
+
+ Use \`url_contains\` or \`title_contains\` to select a specific tab/WebView when multiple are available.
+
+ ## press_key (Web Context)
+
+ Press keyboard keys in web context. Supported keys: \`enter\`, \`tab\`, \`delete\`, \`escape\`
+
+ \`\`\`json
+ {"action": "press_key", "context": "web", "key": "enter"}
+ {"action": "press_key", "context": "web", "key": "tab"}
+ \`\`\`
 
  ## Common Patterns
 
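The web-context additions describe two optional `select_web_context` filters and a fixed key set for web `press_key`. The sketch below shows how those filters might narrow a list of open tabs/WebViews. It assumes each candidate exposes `url` and `title` strings, that both filters must match when both are given, and that the first match wins. MobAI's actual selection logic is not shown in this diff and may differ. `selectWebContext` and `webPressKey` are hypothetical, and the key set is the one the guide lists.

```javascript
// Hypothetical sketch (not mobai-mcp's implementation) of how the optional
// select_web_context filters could narrow a list of open tabs/WebViews.
function selectWebContext(contexts, { url_contains, title_contains } = {}) {
  const match = contexts.find((ctx) =>
    (url_contains === undefined || ctx.url.includes(url_contains)) &&
    (title_contains === undefined || ctx.title.includes(title_contains))
  );
  return match ?? null; // assumption: no match yields null
}

// Keys the 1.0.1 guide lists as supported for press_key in web context.
const WEB_KEYS = new Set(["enter", "tab", "delete", "escape"]);

function webPressKey(key) {
  if (!WEB_KEYS.has(key)) {
    throw new Error(`unsupported web key "${key}"`);
  }
  return { action: "press_key", context: "web", key };
}

const contexts = [
  { url: "https://example.com/home", title: "Home" },
  { url: "https://example.com/login", title: "Login" },
];
console.log(selectWebContext(contexts, { title_contains: "Login" }), webPressKey("enter"));
```

The key check rejects `return`, which the native `press_key` examples use. The web context documents `enter` instead.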
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "mobai-mcp",
- "version": "1.0.0",
+ "version": "1.0.1",
  "description": "MCP server for MobAI - AI-powered mobile device automation",
  "type": "module",
  "main": "dist/index.js",