mobai-mcp 1.0.0 → 1.0.1
- package/dist/index.js +149 -8
- package/package.json +1 -1
package/dist/index.js
CHANGED
@@ -836,6 +836,36 @@ const API_REFERENCE = `# MobAI API Reference
 \`\`\`json
 {"error": "message", "code": "ERROR_CODE"}
 \`\`\`
+
+## DSL Action Reference
+
+### type Action
+- **predicate**: Required if keyboard not already open (auto-taps the element first)
+- **dismiss_keyboard**: Default \`false\` (keyboard stays open after typing)
+- **clear_first**: Optional, clears field before typing
+
+\`\`\`json
+{"action": "type", "text": "hello", "predicate": {"type": "input"}}
+\`\`\`
+
+### press_key Action
+- **key**: Keyboard key to press (return, tab, delete, escape, etc.)
+- **context**: Optional, "web" for web context (supports enter, tab, delete, escape)
+
+\`\`\`json
+{"action": "press_key", "key": "return"}
+{"action": "press_key", "key": "tab", "context": "web"}
+\`\`\`
+
+### select_web_context Action
+- **url_contains**: Filter by URL substring
+- **title_contains**: Filter by page title substring
+
+\`\`\`json
+{"action": "select_web_context"}
+{"action": "select_web_context", "url_contains": "example.com"}
+{"action": "select_web_context", "title_contains": "Login"}
+\`\`\`
 `;
 const DSL_GUIDE = `# MobAI DSL Guide
 
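The actions documented in this hunk are plain JSON objects, so they can be assembled by small helpers. A minimal sketch, assuming hypothetical helper names `typeAction` and `selectWebContext` that are not part of mobai-mcp; only the field names (`predicate`, `dismiss_keyboard`, `clear_first`, `url_contains`, `title_contains`) come from the reference text in the hunk:

```javascript
// Hypothetical helpers (not part of mobai-mcp) that build DSL action objects.

// Build a `type` action. Pass `predicate` when the keyboard is not already open.
function typeAction(text, { predicate, dismissKeyboard, clearFirst } = {}) {
  const action = { action: "type", text };
  if (predicate !== undefined) action.predicate = predicate;
  if (dismissKeyboard !== undefined) action.dismiss_keyboard = dismissKeyboard;
  if (clearFirst !== undefined) action.clear_first = clearFirst;
  return action;
}

// Build a `select_web_context` action with optional substring filters.
function selectWebContext({ urlContains, titleContains } = {}) {
  const action = { action: "select_web_context" };
  if (urlContains !== undefined) action.url_contains = urlContains;
  if (titleContains !== undefined) action.title_contains = titleContains;
  return action;
}

console.log(JSON.stringify(typeAction("hello", { predicate: { type: "input" } })));
console.log(JSON.stringify(selectWebContext({ urlContains: "example.com" })));
```

Optional fields are left out rather than sent as `undefined`, so the serialized objects have the same shape as the documented examples.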
@@ -860,7 +890,8 @@ The DSL (Domain Specific Language) enables batch execution of multiple automatio
 |--------|-------------|------------|
 | observe | Get UI tree/screenshot | context, include (ui_tree, screenshot, installed_apps) |
 | tap | Tap element | predicate or coords |
-| type | Type text | text, predicate,
+| type | Type text | text, predicate (if keyboard not open), dismiss_keyboard (default: false) |
+| press_key | Press keyboard key | key (return, tab, delete, etc.), context (optional: "web") |
 | toggle | Set switch state | predicate, state ("on"/"off") |
 | swipe | Swipe gesture | direction, distance, duration_ms |
 | scroll | Scroll in container | direction, predicate (container), to_element |
@@ -871,6 +902,7 @@ The DSL (Domain Specific Language) enables batch execution of multiple automatio
 | assert_not_exists | Verify element gone | predicate |
 | delay | Wait fixed time | duration_ms |
 | if_exists | Conditional | predicate, then, else |
+| select_web_context | Select browser/WebView | url_contains, title_contains (optional filters) |
 
 ## Predicates
 
@@ -894,9 +926,11 @@ Match elements by:
 
 ### Type Text
 \`\`\`json
-{"action": "type", "text": "Hello", "
+{"action": "type", "text": "Hello", "predicate": {"type": "input"}}
 \`\`\`
 
+Note: \`predicate\` is required if keyboard is not already open. Use \`dismiss_keyboard: true\` to close keyboard after typing.
+
 ### Toggle Switch
 \`\`\`json
 {"action": "toggle", "predicate": {"type": "switch", "text_contains": "WiFi"}, "state": "on"}
@@ -926,13 +960,71 @@ const NATIVE_RUNNER_GUIDE = `# Native App Automation Guide
 
 Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
 
+## Script Writing Guidelines
+
+The DSL's purpose is to **minimize LLM calls** by encoding assumptions into comprehensive scripts. Write scripts that handle common scenarios without needing to re-observe.
+
+### Example: Handle Cookie Banner
+\`\`\`json
+{
+"action": "if_exists",
+"predicate": {"text_contains": "Accept Cookies"},
+"then": [{"action": "tap", "predicate": {"text_contains": "Accept"}}]
+}
+\`\`\`
+
+### Common Knowledge (use without observing)
+- Safari has an address bar at the top
+- Settings app has Wi-Fi, Bluetooth, General sections
+- Alert dialogs have "OK", "Cancel", "Allow", "Don't Allow" buttons
+- iOS keyboard has "Done", "Return", "Search" keys
+
+### Script Writing Rules
+- **Use open_app** - Always start scripts with open_app to ensure correct app
+- **UI tree provided upfront** - You receive the initial UI tree, use it to plan the script
+- **Use if_exists for popups** - Handle cookie banners, permission dialogs, notifications
+- **observe only for assert_screen_changed** - Use observe to establish baseline, then assert_screen_changed to verify navigation
+
+## IMPORTANT: Browser Native UI
+
+When automating browsers (Safari, Chrome), use **Native Runner** for the browser's own UI:
+- Address bar / URL bar
+- Tab bar and tab management
+- Navigation buttons (back, forward, refresh)
+- Bookmarks bar
+- Browser menus and settings
+
+These are native OS elements, NOT web content. Only use Web Runner for the actual webpage content inside the browser.
+
 ## Workflow
 
 1. **Observe UI** - Get the accessibility tree
 2. **Match Elements** - Use predicates to find elements
-3. **Execute Actions** - Tap, type, swipe, etc.
+3. **Execute Actions** - Tap, type, swipe, press_key, etc.
 4. **Verify Results** - Check UI state changed
 
+## Type Action
+
+The \`type\` action requires either:
+1. Keyboard already open (from previous tap on input), OR
+2. A predicate to identify and tap the input field
+
+**dismiss_keyboard** default is \`false\` (keyboard stays open after typing).
+
+### Pattern 1: Tap then Type
+\`\`\`json
+[
+{"action": "tap", "predicate": {"type": "input"}},
+{"action": "type", "text": "username"},
+{"action": "press_key", "key": "tab"}
+]
+\`\`\`
+
+### Pattern 2: Type with Predicate
+\`\`\`json
+{"action": "type", "text": "username", "predicate": {"type": "input", "label": "Username"}}
+\`\`\`
+
 ## Common Patterns
 
 ### Open App and Navigate
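The rule added in this hunk (a `type` step needs either an open keyboard or a `predicate`) can be checked before a script is sent. The sketch below is a hypothetical linter, not mobai-mcp code. It assumes the keyboard opens after a tap on a `{"type": "input"}` predicate or after a `type` step that carries a predicate, and that it closes only when `dismiss_keyboard` is `true`; the real runner may track keyboard state differently.

```javascript
// Hypothetical linter (not part of mobai-mcp).
// Returns the indices of `type` steps that have neither an open keyboard
// (under the assumptions above) nor a predicate of their own.
function findTypeStepsMissingPredicate(steps, keyboardOpen = false) {
  const problems = [];
  steps.forEach((step, index) => {
    if (step.action === "tap" && step.predicate && step.predicate.type === "input") {
      // Assumption: tapping an input element opens the keyboard.
      keyboardOpen = true;
    } else if (step.action === "type") {
      if (!keyboardOpen && !step.predicate) {
        problems.push(index); // nothing would open the keyboard for this step
        return; // leave the tracked keyboard state unchanged
      }
      // The keyboard stays open unless dismiss_keyboard is explicitly true.
      keyboardOpen = step.dismiss_keyboard !== true;
    }
  });
  return problems;
}

const pattern1 = [
  { action: "tap", predicate: { type: "input" } },
  { action: "type", text: "username" },
  { action: "press_key", key: "tab" },
];
console.log(findTypeStepsMissingPredicate(pattern1));
console.log(findTypeStepsMissingPredicate([{ action: "type", text: "orphan" }]));
```

Both documented patterns pass this check: Pattern 1 opens the keyboard with a tap, and Pattern 2 supplies a predicate on the `type` step itself.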
@@ -954,9 +1046,10 @@ Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
 "version": "0.2",
 "steps": [
 {"action": "tap", "predicate": {"type": "input"}},
-{"action": "type", "text": "username"
-{"action": "
-{"action": "type", "text": "password",
+{"action": "type", "text": "username"},
+{"action": "press_key", "key": "tab"},
+{"action": "type", "text": "password"},
+{"action": "press_key", "key": "return"}
 ]
 }
 \`\`\`
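The updated login example in this hunk can be generated rather than hand-written. A minimal sketch, assuming a hypothetical `buildLoginScript` helper that is not part of mobai-mcp; it emits only the `version` and `steps` fields visible in the hunk, and any fields outside the shown context are not reproduced:

```javascript
// Hypothetical builder (not part of mobai-mcp) for the login flow in this hunk.
function buildLoginScript(username, password) {
  return {
    version: "0.2",
    steps: [
      { action: "tap", predicate: { type: "input" } }, // tap the first input so typing can start
      { action: "type", text: username },              // keyboard already open, no predicate needed
      { action: "press_key", key: "tab" },             // move to the next field
      { action: "type", text: password },
      { action: "press_key", key: "return" },          // submit the form
    ],
  };
}

console.log(JSON.stringify(buildLoginScript("username", "password"), null, 2));
```

Only the first step carries a predicate; the later `type` steps rely on the keyboard staying open, which matches the documented `dismiss_keyboard` default of `false`.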
@@ -986,16 +1079,45 @@ Use this for automating native mobile apps (Settings, Mail, Instagram, etc.).
 }
 \`\`\`
 
+## Quick Reference
+
+| Action | Description | Key Fields |
+|--------|-------------|------------|
+| tap | Tap element | predicate or coords |
+| type | Type text | text, predicate (if keyboard not open), dismiss_keyboard (default: false) |
+| press_key | Press keyboard key | key (return, tab, delete, etc.) |
+| swipe | Swipe gesture | direction, distance |
+| scroll | Scroll container | direction, to_element |
+
 ## Tips
 
 - **Always observe first** - Get UI tree before interacting
 - **Use predicates** - More robust than hardcoded indices
 - **Add delays after navigation** - Apps need time to render
 - **Use retry strategy** - Transient failures are common
+- **Use press_key for form navigation** - Tab between fields, Return to submit
 `;
 const WEB_RUNNER_GUIDE = `# Web Automation Guide
 
-
+**Try native-runner first for simple taps/types.** Only use Web Runner when you need DOM manipulation, CSS selectors, or JavaScript execution.
+
+## When to Use Web Runner
+
+✅ **USE Web Runner for:**
+- Native runner returns NO_MATCH for web elements
+- CSS selector-based element targeting
+- JavaScript execution in page context
+- DOM manipulation and inspection
+- Complex form interactions requiring DOM access
+
+❌ **DO NOT use Web Runner for:**
+- Browser address bar / URL bar → use Native Runner
+- Browser tab bar → use Native Runner
+- Browser navigation buttons (back, forward, refresh) → use Native Runner
+- Browser menus and settings → use Native Runner
+- Any UI outside the webpage or webview content area → use Native Runner
+
+The browser's own UI (address bar, tabs, navigation) are **native OS elements**, not web content.
 
 ## Platform Support
 
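The routing guidance in this hunk can be read as a small decision function. The sketch below is illustrative only: `chooseRunner`, the `target` names, and the `needs` labels are hypothetical and do not exist in mobai-mcp. Only the rule order is taken from the text: browser chrome goes to Native Runner, DOM/CSS/JavaScript needs or a `NO_MATCH` from the native runner go to Web Runner, and everything else tries native first.

```javascript
// Hypothetical router (not part of mobai-mcp); target and need names are made up.
const NATIVE_TARGETS = new Set([
  "address_bar",
  "tab_bar",
  "navigation_buttons",
  "browser_menu",
]);
const WEB_NEEDS = new Set(["css_selector", "javascript", "dom"]);

function chooseRunner({ target, needs = [], nativeResult } = {}) {
  // Browser chrome is native UI, whatever else the step asks for.
  if (NATIVE_TARGETS.has(target)) return "native";
  // DOM access, CSS selectors, or JavaScript execution require the Web Runner.
  if (needs.some((need) => WEB_NEEDS.has(need))) return "web";
  // Fall back to the Web Runner once the native runner reported NO_MATCH.
  if (nativeResult === "NO_MATCH") return "web";
  // Default: try the native runner first for simple taps and types.
  return "native";
}

console.log(chooseRunner({ target: "address_bar" }));
console.log(chooseRunner({ target: "page", needs: ["css_selector"] }));
```

Checking the browser-chrome targets first reflects the hunk's emphasis that those elements are never web content, even when the step also mentions selectors.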
@@ -1009,7 +1131,26 @@ Use this for automating browsers (Safari, Chrome) and WebViews on mobile devices
 1. **Select web context** - Connect to browser
 2. **Navigate** - Go to URL
 3. **Get DOM** - Inspect page structure
-4. **Interact** - Click, type using CSS selectors
+4. **Interact** - Click, type, press_key using CSS selectors
+
+## select_web_context Options
+
+\`\`\`json
+{"action": "select_web_context"}
+{"action": "select_web_context", "url_contains": "example.com"}
+{"action": "select_web_context", "title_contains": "Login"}
+\`\`\`
+
+Use \`url_contains\` or \`title_contains\` to select a specific tab/WebView when multiple are available.
+
+## press_key (Web Context)
+
+Press keyboard keys in web context. Supported keys: \`enter\`, \`tab\`, \`delete\`, \`escape\`
+
+\`\`\`json
+{"action": "press_key", "context": "web", "key": "enter"}
+{"action": "press_key", "context": "web", "key": "tab"}
+\`\`\`
 
 ## Common Patterns
 
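Since the web context documents a fixed key list, a script generator could reject unsupported keys early. A minimal sketch, assuming a hypothetical `pressKeyAction` helper that is not part of mobai-mcp, and assuming the documented list (`enter`, `tab`, `delete`, `escape`) is exhaustive for `"context": "web"`, which the source does not state outright:

```javascript
// Hypothetical validator (not part of mobai-mcp).
// Assumption: the keys listed for the web context are the only ones accepted there.
const WEB_KEYS = new Set(["enter", "tab", "delete", "escape"]);

function pressKeyAction(key, context) {
  if (context === "web") {
    if (!WEB_KEYS.has(key)) {
      throw new Error(`Unsupported key for web context: ${key}`);
    }
    return { action: "press_key", context: "web", key };
  }
  // Native context: the key is passed through unchecked in this sketch.
  return { action: "press_key", key };
}

console.log(JSON.stringify(pressKeyAction("enter", "web")));
console.log(JSON.stringify(pressKeyAction("return")));
```

Note that the native examples use `return` while the web examples use `enter`, so a generator that targets both contexts has to pick the key name per context.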