mobile-debug-mcp 0.24.1 → 0.24.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

# Baseline Spec v0

## 1. System Overview

The MCP surface is defined in `src/server/tool-definitions.ts` and dispatched in `src/server/tool-handlers.ts`. Tools are grouped in code by module rather than by an explicit runtime taxonomy: **manage**, **observe**, **interact**, **network/classification**, and **system**.

Agents invoke tools by name through `handleToolCall(name, args)`. Most handlers return a **single text content block containing JSON** via `wrapResponse(...)`. The exceptions visible in the code are:

| Tool | MCP content shape |
| --- | --- |
| most tools | one text block with JSON |
| `get_logs` | two text blocks: metadata JSON, then logs JSON |
| `capture_screenshot` | one text block with JSON metadata, then one or more image blocks |
| `build_and_install` | one NDJSON text block, then one JSON text block |
| uncaught handler error | one plain-text error string, not wrapped JSON |
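
A client has to branch on these shapes before it can read any result. The sketch below assumes MCP content blocks of the form `{ type: 'text', text }` and `{ type: 'image', data, mimeType }`; the helper `readToolResult` and its return type are hypothetical and not part of the server.

```ts
// Hypothetical client-side reader for the content shapes in the table above.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; data: string; mimeType: string };
type ContentBlock = TextBlock | ImageBlock;

type ToolRead =
  | { kind: "json"; value: unknown; extra: unknown[]; images: ImageBlock[] }
  | { kind: "error"; message: string };

function readToolResult(tool: string, content: ContentBlock[]): ToolRead {
  const texts = content.filter((b): b is TextBlock => b.type === "text");
  const images = content.filter((b): b is ImageBlock => b.type === "image");
  if (texts.length === 0) return { kind: "error", message: `${tool}: no text block` };
  try {
    if (tool === "build_and_install" && texts.length >= 2) {
      // First block is NDJSON: one JSON progress event per non-empty line.
      const events = texts[0].text
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line): unknown => JSON.parse(line));
      return { kind: "json", value: JSON.parse(texts[1].text), extra: events, images };
    }
    // Most tools: one JSON block. get_logs adds a second JSON block (the logs).
    const value: unknown = JSON.parse(texts[0].text);
    const extra = texts.slice(1).map((b): unknown => JSON.parse(b.text));
    return { kind: "json", value, extra, images };
  } catch {
    // Uncaught handler errors arrive as plain text, so JSON.parse fails.
    return { kind: "error", message: texts[0].text };
  }
}
```

Detecting the plain-text error case through a failed `JSON.parse` is a heuristic that a client only needs because uncaught errors are not wrapped.
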

The observable execution flow for state-mutating action tools at the MCP boundary is:

1. resolve device/platform
2. call `ToolsNetwork.notifyActionStart()`
3. capture UI fingerprint before the action
4. execute the platform action
5. capture UI fingerprint after the action
6. wrap the result into an action envelope

This flow applies to `start_app`, `restart_app`, `tap`, `swipe`, `scroll_to_element`, `type_text`, and `press_back`. `tap_element` builds a similar envelope inside `src/interact/index.ts` rather than through the shared wrapper.
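
The six steps can be sketched as a generic wrapper. `ActionDeps`, `runAction`, the `DeviceInfo` fields, and the `action_id` format are hypothetical stand-ins for the server's internal helpers, and mapping a thrown error to `success: false` without a `failure_code` is a simplification.

```ts
// Hypothetical sketch of the shared action wrapper; not the server's code.
interface DeviceInfo { id: string; platform: "android" | "ios" }

interface ActionDeps {
  resolveDevice(platform?: "android" | "ios", deviceId?: string): Promise<DeviceInfo>;
  notifyActionStart(): void; // stands in for ToolsNetwork.notifyActionStart()
  fingerprint(device: DeviceInfo): Promise<string | null>;
}

interface ActionEnvelope {
  action_id: string;
  timestamp: string;
  action_type: string;
  target: { selector: Record<string, unknown>; resolved: Record<string, unknown> | null };
  success: boolean;
  ui_fingerprint_before: string | null;
  ui_fingerprint_after: string | null;
}

let actionCounter = 0;

async function runAction(
  deps: ActionDeps,
  actionType: string,
  selector: Record<string, unknown>,
  execute: (device: DeviceInfo) => Promise<void>,
  platform?: "android" | "ios",
  deviceId?: string,
): Promise<ActionEnvelope> {
  const device = await deps.resolveDevice(platform, deviceId); // 1. resolve device/platform
  deps.notifyActionStart(); // 2. open a new network window
  const before = await deps.fingerprint(device); // 3. fingerprint before
  let success = true;
  try {
    await execute(device); // 4. platform action
  } catch {
    success = false; // execution failed; failure_code inference is omitted here
  }
  const after = await deps.fingerprint(device); // 5. fingerprint after
  return { // 6. wrap into an action envelope
    action_id: `${actionType}-${++actionCounter}`,
    timestamp: new Date().toISOString(),
    action_type: actionType,
    target: { selector, resolved: null },
    success,
    ui_fingerprint_before: before,
    ui_fingerprint_after: after,
  };
}
```

In this sketch the "after" fingerprint is captured even when the command throws, so success and failure envelopes have the same shape.
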

## 2. Tool Inventory

### Manage / lifecycle

| Tool | Purpose | Inputs | Outputs | Side effects |
| --- | --- | --- | --- | --- |
| `start_app` | Launch app on Android or iOS. | `{ platform: 'android'\|'ios', appId: string, deviceId?: string }` | `ActionExecutionResult` JSON with `device` and `details` (`launch_time_ms`, `device_id`, `output?`, `observed_app?`, `error?`). | Launches app, captures fingerprints, resets network window. |
| `terminate_app` | Stop app process. | `{ platform: 'android'\|'ios', appId: string, deviceId?: string }` | `{ terminated: boolean, device: DeviceInfo }` | Terminates app. |
| `restart_app` | Terminate then relaunch app. | `{ platform: 'android'\|'ios', appId: string, deviceId?: string }` | `ActionExecutionResult` JSON with `device` and restart `details` (`terminated_before_restart`, `terminate_error?`, `output?`, `observed_app?`, `error?`). | Stops and launches app, captures fingerprints, resets network window. |
| `reset_app_data` | Clear app storage / simulator container data. | `{ platform: 'android'\|'ios', appId: string, deviceId?: string }` | `{ reset: boolean, device: DeviceInfo }` | Clears app state. |
| `install_app` | Install built artifact or project output. | `{ platform: 'android'\|'ios', projectType: 'native'\|'kmp'\|'react-native'\|'flutter', appPath: string, deviceId?: string }` | `{ device: DeviceInfo, installed: boolean, output?: string, error?: string }` | Installs app; Android may push APK/AAB and run `pm install`; iOS may use `simctl` or `idb`. |
| `build_app` | Build project and return artifact path. | `{ platform: 'android'\|'ios', projectType: ..., projectPath: string, variant?: string }` | Build result JSON from the platform builder, including the artifact path on success or `error`. | Runs Gradle or Xcode build. |
| `build_and_install` | Build then install, streaming progress. | `{ platform: 'android'\|'ios', projectType: ..., projectPath: string, deviceId?: string, variant?: string }` | MCP response has NDJSON event block plus result JSON `{ success: boolean, artifactPath?: string, device?: DeviceInfo, output?: string, error?: string }`. | Builds, installs, emits progress events. |
| `list_devices` | Enumerate available devices. | `{ platform?: 'android'\|'ios', appId?: string }` | `{ devices: DeviceInfo[] }` (runtime objects may also include `appInstalled`/`booted`). | Reads device lists. |

### Observe / inspect

| Tool | Purpose | Inputs | Outputs | Side effects |
| --- | --- | --- | --- | --- |
| `get_logs` | Fetch recent device logs. | `{ platform: 'android'\|'ios', appId?: string, deviceId?: string, pid?: number, tag?: string, level?: string, contains?: string, since_seconds?: number, limit?: number, lines?: number }` | Two text blocks: metadata `{ device, result: { count, filtered, crashLines, source, meta } }`, then `{ logs: [...] }`. | Reads platform logs. |
| `capture_screenshot` | Capture current screenshot. | `{ platform: 'android'\|'ios', deviceId?: string }` | Text metadata block plus image block(s). | Captures screenshot; uses temp files. |
| `capture_debug_snapshot` | Bundle screenshot, UI tree, screen, fingerprint, and logs. | `{ reason?: string, includeLogs?: boolean, logLines?: number, platform?: 'android'\|'ios', appId?: string, deviceId?: string, sessionId?: string }` | Wrapped JSON snapshot object with device metadata, screenshot metadata, UI tree, fingerprint, current screen, and logs/errors. | Captures multiple observations. |
| `start_log_stream` | Start background structured log stream. | `{ platform?: 'android'\|'ios', packageName: string, level?: 'error'\|'warn'\|'info'\|'debug', deviceId?: string, sessionId?: string }` | `{ success: boolean, stream_started?: boolean, device_id?: string, pid?: number, error?: string }` | Starts long-lived log process, writes NDJSON file. |
| `read_log_stream` | Read accumulated streamed logs. | `{ sessionId?: string }` | `{ entries: any[], crash_summary?: { crash_detected: boolean, exception?: string, sample?: string } }` | Reads stream file; no new device action. |
| `stop_log_stream` | Stop background log stream. | `{ sessionId?: string }` | `{ success: boolean }` | Stops stream process and clears session entry. |
| `get_ui_tree` | Return current UI hierarchy. | `{ platform: 'android'\|'ios', deviceId?: string }` | `GetUITreeResponse` with `device`, `elements`, `resolution`, optional `error`. | Dumps UI hierarchy; Android writes/pulls XML; iOS queries via `idb`. |
| `get_current_screen` | Return visible Android activity. | `{ deviceId?: string }` | `GetCurrentScreenResponse` with `device`, `activity`, `package`, `shortActivity?`, `error?`. | Reads `dumpsys`; Android only. |
| `get_screen_fingerprint` | Compute stable screen fingerprint from UI tree and current screen. | `{ platform?: 'android'\|'ios', deviceId?: string }` | `{ fingerprint: string\|null, activity?: string, error?: string }` | Reads UI tree and, on Android, current screen. |

### Interact / wait / verify

| Tool | Purpose | Inputs | Outputs | Side effects |
| --- | --- | --- | --- | --- |
| `wait_for_screen_change` | Wait until fingerprint differs from provided previous fingerprint. | `{ platform?: 'android'\|'ios', previousFingerprint: string, timeoutMs?: number, pollIntervalMs?: number, deviceId?: string }` | `{ success: boolean, previousFingerprint, newFingerprint?\|lastFingerprint?, elapsedMs, observed_screen: { fingerprint, activity }, reason?: 'timeout' }` | Polls fingerprints. |
| `expect_screen` | Exact check against expected fingerprint or screen name. | `{ platform?: 'android'\|'ios', fingerprint?: string, screen?: string, deviceId?: string }` | `{ success, observed_screen, expected_screen, confidence, comparison: { basis, matched, reason } }` | Reads fingerprint/current screen. |
| `expect_element_visible` | Binary visible check for selector. | `{ selector: { text?: string, resource_id?: string, accessibility_id?: string, contains?: boolean }, element_id?: string, timeout_ms?: number, poll_interval_ms?: number, platform?: 'android'\|'ios', deviceId?: string }` | `{ success, selector, element_id, expected_condition: 'visible', element?, observed, reason, failure_code?, retryable? }` | Polls UI tree through `wait_for_ui`. |
| `wait_for_ui` | Deterministic UI wait and element resolution. | `{ selector?: { text?: string, resource_id?: string, accessibility_id?: string, contains?: boolean }, condition?: 'exists'\|'not_exists'\|'visible'\|'clickable', timeout_ms?: number, poll_interval_ms?: number, match?: { index?: number }, retry?: { max_attempts?: number, backoff_ms?: number }, platform?: 'android'\|'ios', deviceId?: string }` | Success: `{ status:'success', matched, element, metrics, requested, observed }`; failure: `{ status:'timeout', error:{code,message}, metrics, requested, observed }`. | Polls UI tree; resolves actionable ancestor for `clickable`. |
| `find_element` | Heuristic semantic element search. | `{ query: string, exact?: boolean, timeoutMs?: number, platform?: 'android'\|'ios', deviceId?: string }` | `{ found: true, element, score, confidence }` or `{ found: false, error }` | Polls UI tree; no mutation. |

### Action / mutation

| Tool | Purpose | Inputs | Outputs | Side effects |
| --- | --- | --- | --- | --- |
| `tap` | Tap coordinates. | `{ x: number, y: number, platform?: 'android'\|'ios', deviceId?: string }` | `ActionExecutionResult` | Taps screen; captures fingerprints; resets network window. |
| `tap_element` | Tap resolved UI element by `elementId`. | `{ elementId: string }` | Action-style JSON with `action_type: 'tap_element'`, target selector/resolved element, `success`, fingerprints, `failure_code?`, `retryable?`. | Reads cached element/UI context, validates element, taps it, resets network window. |
| `swipe` | Swipe coordinates. | `{ platform?: 'android'\|'ios', x1, y1, x2, y2, duration, deviceId?: string }` | `ActionExecutionResult` | Swipes screen; captures fingerprints; resets network window. |
| `scroll_to_element` | Repeatedly scroll until matching visible element is found. | `{ platform: 'android'\|'ios', selector: { text?: string, resourceId?: string, contentDesc?: string, className?: string }, direction?: 'down'\|'up', maxScrolls?: number, scrollAmount?: number, deviceId?: string }` | `ActionExecutionResult` | Repeated swipes plus UI tree checks; resets network window. |
| `type_text` | Type text into focused field. | `{ platform?: 'android', text: string, deviceId?: string }` | `ActionExecutionResult` | Android text input; captures fingerprints; resets network window. |
| `press_back` | Send Android Back key. | `{ platform?: 'android', deviceId?: string }` | `ActionExecutionResult` | Android back action; captures fingerprints; resets network window. |

### Classification / network / system

| Tool | Purpose | Inputs | Outputs | Side effects |
| --- | --- | --- | --- | --- |
| `classify_action_outcome` | Deterministic rule-based classifier over supplied signals. | `{ uiChanged: boolean, expectedElementVisible?: boolean, networkRequests?: { url?: string, status: 'success'\|'failure'\|'retryable' }[], hasLogErrors?: boolean }` | `{ outcome: 'success'\|'no_op'\|'backend_failure'\|'ui_failure'\|'unknown', reasoning: string, nextAction?: 'call_get_network_activity' }` | Pure computation. |
| `get_network_activity` | Return normalized request events since last action window. | `{}` | `{ requests: NetworkRequestSummary[], count: number }` | Reads logs, advances internal `lastConsumedTimestamp`. |
| `get_system_status` | Aggregate Android/iOS/Gradle readiness. | `{}` | `{ success, status: 'ready'\|'degraded'\|'blocked', adbAvailable, adbVersion, devices, deviceStates, logsAvailable, envValid, issues, appInstalled, iosAvailable, iosDevices, gradleJavaHome, gradleValid, gradleFilesChecked, gradleSuggestedFixes, summary }` | Reads toolchain/device state. |

## 3. Action Tools (Mutation Tools)

| Tool | Actual output shape | Success reporting | Failure structure | Retry logic |
| --- | --- | --- | --- | --- |
| `start_app` | `ActionExecutionResult` + `device` + `details` | `success` mirrors underlying launch success | `failure_code` inferred generically; raw launch `error` only appears in `details` | none |
| `terminate_app` | `{ terminated: boolean, device }` | `terminated === true` | no standardized failure code; boolean only at MCP layer | none |
| `restart_app` | `ActionExecutionResult` + `device` + restart `details` | `success` mirrors underlying restart success | `failure_code` inferred generically; terminate/start details kept in `details` | no retry; always does terminate then start |
| `reset_app_data` | `{ reset: boolean, device }` | `reset === true` | no standardized failure code | none |
| `install_app` | `{ device, installed, output?, error? }` | `installed === true` | unstructured `error` string; no action envelope | Android has internal fallback paths; iOS may fall back from `simctl` to `idb` |
| `build_and_install` | NDJSON event stream + `{ success, artifactPath?, device?, output?, error? }` | final `success === true` | unstructured `error`; build/install phases encoded in NDJSON | build and install internals may retry depending on platform helpers |
| `tap` | `ActionExecutionResult` | `success` means command executed | `failure_code`/`retryable` inferred from generic error text; raw error omitted | none |
| `tap_element` | action-style JSON built in `src/interact/index.ts` | `success` means element was resolved and tap dispatched | structured `failure_code` from `ActionFailureCode`; includes `retryable` | none |
| `swipe` | `ActionExecutionResult` | command executed | generic inferred `failure_code` | none |
| `scroll_to_element` | `ActionExecutionResult` | **different semantics**: success means target element became visible during scroll loop | `failure_code` inferred by scroll-specific string matching | internal loop up to `maxScrolls` |
| `type_text` | `ActionExecutionResult` | command executed | generic inferred `failure_code` | none |
| `press_back` | `ActionExecutionResult` | command executed | generic inferred `failure_code` | none |

**Observed inconsistency:** `start_app`/`restart_app` expose `device` and rich `details`; `tap`/`swipe`/`type_text`/`press_back` do not. `scroll_to_element` reports an outcome-oriented success, while the others mostly report execution success.
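
The generic inference in the table maps free-form error text to a code, after which the raw text is dropped. A minimal sketch follows; the regular expressions and `retryable` values are illustrative assumptions, because this baseline does not list the server's actual patterns.

```ts
// Codes as listed for the action envelope in section 7.
type ActionFailureCode =
  | "ELEMENT_NOT_FOUND"
  | "ELEMENT_NOT_INTERACTABLE"
  | "TIMEOUT"
  | "NAVIGATION_NO_CHANGE"
  | "AMBIGUOUS_TARGET"
  | "STALE_REFERENCE"
  | "UNKNOWN";

// Hypothetical pattern table: first match wins, so order matters.
const PATTERNS: Array<[RegExp, ActionFailureCode, boolean]> = [
  [/timed? ?out/i, "TIMEOUT", true],
  [/not found|no such element/i, "ELEMENT_NOT_FOUND", true],
  [/not (clickable|interactable|enabled)/i, "ELEMENT_NOT_INTERACTABLE", false],
  [/stale/i, "STALE_REFERENCE", true],
];

function inferFailure(
  errorText: string | undefined,
): { failure_code: ActionFailureCode; retryable: boolean } | null {
  if (errorText === undefined) return null; // no error, so no failure code
  for (const [pattern, code, retryable] of PATTERNS) {
    if (pattern.test(errorText)) return { failure_code: code, retryable };
  }
  return { failure_code: "UNKNOWN", retryable: false }; // nothing matched
}
```

Because matching runs on free text, a reworded platform message silently changes the inferred code; assigning codes directly, as `tap_element` does, avoids that fragility.
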

## 4. Observation and Wait Tools

### `wait_for_ui`

- **Role:** both waits and resolves.
- **Signals used:** only the current UI tree from `get_ui_tree`.
- **Behavior:** filters elements by selector, supports `match.index`, evaluates `exists` / `not_exists` / `visible` / `clickable`, and resolves an actionable ancestor for `clickable`.
- **Output:** descriptive, not binary. Returns `requested`, `observed`, `metrics`, and optionally `element`.
- **Success model:** `status: 'success'`; otherwise `status: 'timeout'` with structured `error`.
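
One polling iteration of `wait_for_ui` can be sketched as a pure function over a flat UI tree. The `UIElement` fields (including `parentIndex`) are hypothetical rather than the server's schema, and selector validation (`INVALID_SELECTOR`), retries, and timeouts are left out.

```ts
// Hypothetical flat UI tree; parentIndex points into the same array.
interface UIElement {
  text?: string;
  resourceId?: string;
  accessibilityId?: string;
  visible: boolean;
  clickable: boolean;
  parentIndex: number | null;
}
interface Selector { text?: string; resource_id?: string; accessibility_id?: string; contains?: boolean }
type Condition = "exists" | "not_exists" | "visible" | "clickable";

function fieldMatches(actual: string | undefined, wanted: string | undefined, contains: boolean): boolean {
  if (wanted === undefined) return true; // unspecified selector field does not filter
  if (actual === undefined) return false;
  return contains ? actual.includes(wanted) : actual === wanted;
}

function evaluate(
  tree: UIElement[], sel: Selector, condition: Condition, matchIndex = 0,
): { met: boolean; element: UIElement | null } {
  const contains = sel.contains ?? false;
  const matches = tree.filter((e) =>
    fieldMatches(e.text, sel.text, contains) &&
    fieldMatches(e.resourceId, sel.resource_id, contains) &&
    fieldMatches(e.accessibilityId, sel.accessibility_id, contains));
  const candidate: UIElement | null = matches[matchIndex] ?? null; // match.index
  switch (condition) {
    case "exists":
      return { met: candidate !== null, element: candidate };
    case "not_exists":
      return { met: candidate === null, element: null };
    case "visible":
      return { met: candidate !== null && candidate.visible, element: candidate };
    case "clickable": {
      // Walk up to the nearest clickable node (the element itself or an ancestor).
      let cur: UIElement | null = candidate;
      while (cur !== null && !cur.clickable) {
        cur = cur.parentIndex === null ? null : tree[cur.parentIndex] ?? null;
      }
      return { met: cur !== null, element: cur };
    }
  }
}
```

For `clickable`, the sketch returns the ancestor rather than the matched node, in line with the actionable-ancestor resolution described above.
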

### `wait_for_screen_change`

- **Role:** wait only.
- **Signals used:** screen fingerprints from `get_screen_fingerprint`.
- **Behavior:** polls until the fingerprint differs from `previousFingerprint`, then performs a confirmation read for stability.
- **Output:** binary `success` plus descriptive `observed_screen`, elapsed time, and either `newFingerprint` or `lastFingerprint`.
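
The change-plus-confirmation loop can be sketched as follows. The reader, `sleep`, and `now` are injected so the loop can be tested; taking the confirmation read one poll interval after the change is an assumption, and `observed_screen` is omitted.

```ts
// Hypothetical sketch of wait_for_screen_change's polling logic.
interface ScreenChangeResult {
  success: boolean;
  previousFingerprint: string;
  newFingerprint?: string;
  lastFingerprint?: string | null;
  elapsedMs: number;
  reason?: "timeout";
}

async function waitForScreenChange(
  previousFingerprint: string,
  readFingerprint: () => Promise<string | null>,
  opts: { timeoutMs: number; pollIntervalMs: number; sleep: (ms: number) => Promise<void>; now: () => number },
): Promise<ScreenChangeResult> {
  const start = opts.now();
  let last: string | null = null;
  while (opts.now() - start < opts.timeoutMs) {
    last = await readFingerprint();
    if (last !== null && last !== previousFingerprint) {
      // Confirmation read: the new fingerprint must still hold one interval later.
      await opts.sleep(opts.pollIntervalMs);
      const confirmed = await readFingerprint();
      if (confirmed === last) {
        return { success: true, previousFingerprint, newFingerprint: last, elapsedMs: opts.now() - start };
      }
      last = confirmed; // screen still settling; keep polling
    }
    await opts.sleep(opts.pollIntervalMs);
  }
  return { success: false, previousFingerprint, lastFingerprint: last, elapsedMs: opts.now() - start, reason: "timeout" };
}
```

Here `success` only means the fingerprint moved and held; it says nothing about whether the destination is correct.
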

### `find_element`

- **Role:** resolve only.
- **Signals used:** UI tree.
- **Behavior:** heuristic scoring over text/content/resource/class; if the best element is not interactable, it tries to resolve a clickable ancestor.
- **Output:** descriptive, scored result (`score`, `confidence`) or `{ found:false, error }`.

### `get_ui_tree`

- **Role:** inspect only.
- **Signals used:** platform accessibility/UI dump.
- **Output:** raw tree data with `elements`, `resolution`, and `device`.
- **Notes:** the Android and iOS handlers each retry internally, up to three attempts.

### `get_current_screen`

- **Role:** inspect only.
- **Signals used:** Android activity manager / window dumps.
- **Output:** current package/activity object.
- **Notes:** Android only.

### `get_screen_fingerprint`

- **Role:** inspect only.
- **Signals used:** UI tree plus current screen on Android.
- **Behavior:** normalizes a subset of visible, structurally significant elements and hashes them.
- **Output:** `{ fingerprint, activity?, error? }`.
- **Notes:** the iOS fingerprint omits activity from the hash payload.
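
A minimal fingerprint sketch: keep visible elements that carry text or a resource id, normalize them in tree order, and hash. The selection rule, the 32-bit FNV-1a hash (computed over UTF-16 code units rather than bytes), and returning `null` when nothing qualifies are assumptions, not the server's documented choices.

```ts
// Hypothetical element shape for fingerprinting.
interface FpElement { type: string; text?: string; resourceId?: string; visible: boolean }

// 32-bit FNV-1a over UTF-16 code units, as 8 lowercase hex digits.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function screenFingerprint(elements: FpElement[], activity?: string): string | null {
  const significant = elements
    .filter((e) => e.visible && (e.text !== undefined || e.resourceId !== undefined))
    .map((e) => `${e.type}|${e.resourceId ?? ""}|${(e.text ?? "").trim()}`);
  if (significant.length === 0) return null;
  // Android callers pass the activity; iOS callers omit it from the payload.
  const payload = JSON.stringify({ activity: activity ?? null, elements: significant });
  return fnv1a32(payload);
}
```

Dropping invisible elements keeps hidden nodes from changing the fingerprint, while a different `activity` does change it.
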

### Log/snapshot observation

- `get_logs` returns structured metadata plus raw/structured log entries.
- `start_log_stream` / `read_log_stream` / `stop_log_stream` manage background NDJSON log capture.
- `capture_screenshot` and `capture_debug_snapshot` provide point-in-time observation artifacts.

## 5. Existing Verification Mechanisms

| Mechanism | Success rule | Determinism | Ambiguity |
| --- | --- | --- | --- |
| `expect_screen` | exact fingerprint equality, else exact screen-name equality | binary and deterministic | if only `screen` is provided, Android may use either fingerprint-derived `activity` or `get_current_screen` label |
| `expect_element_visible` | delegated `wait_for_ui(condition:'visible')` reaches success | binary wrapper over deterministic wait | failure collapses to `TIMEOUT` or `UNKNOWN` |
| `wait_for_ui` used as verification | requested condition becomes true | deterministic per poll inputs | descriptive output, not a dedicated verification result |
| `wait_for_screen_change` | fingerprint changes and stays stable for one confirmation pass | deterministic | verifies change, not correctness of destination |
| `classify_action_outcome` | ordered rule evaluation over provided UI/network/log inputs | deterministic pure function | if `networkRequests` omitted, it returns `unknown` with `nextAction: 'call_get_network_activity'`; `hasLogErrors` does not change the enum outcome |
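
The exact-match rule for `expect_screen` can be sketched as a pure comparison. The `compareScreen` name, the `basis` values, and the reason strings are hypothetical; `confidence` and the Android label lookup are omitted.

```ts
// Hypothetical sketch of expect_screen's exact comparison rule.
interface ObservedScreen { fingerprint: string | null; screen: string | null }
interface ExpectScreenResult {
  success: boolean;
  comparison: { basis: "fingerprint" | "screen" | "none"; matched: boolean; reason: string };
}

function compareScreen(
  expected: { fingerprint?: string; screen?: string },
  observed: ObservedScreen,
): ExpectScreenResult {
  // 1. Exact fingerprint equality wins outright.
  if (expected.fingerprint !== undefined && observed.fingerprint === expected.fingerprint) {
    return { success: true, comparison: { basis: "fingerprint", matched: true, reason: "fingerprint matched exactly" } };
  }
  // 2. Otherwise fall back to exact screen-name equality.
  if (expected.screen !== undefined) {
    const matched = observed.screen === expected.screen;
    const reason = matched
      ? "screen name matched exactly"
      : `expected ${expected.screen}, observed ${observed.screen ?? "none"}`;
    return { success: matched, comparison: { basis: "screen", matched, reason } };
  }
  if (expected.fingerprint !== undefined) {
    return { success: false, comparison: { basis: "fingerprint", matched: false, reason: "fingerprint differs" } };
  }
  return { success: false, comparison: { basis: "none", matched: false, reason: "no fingerprint or screen provided" } };
}
```

There is no partial or case-insensitive matching, which is what keeps the result binary and deterministic.
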

## 6. Action Result Semantics

Across action tools, **success is not uniform**:

1. **Execution success:** `tap`, `swipe`, `type_text`, `press_back`, `start_app`, `restart_app`, and `tap_element` mainly report that the command ran or the tap was dispatched.
2. **Outcome success:** `scroll_to_element` reports success only if the target element was actually found during scrolling.
3. **Boolean operation success:** `install_app`, `terminate_app`, and `reset_app_data` use tool-specific booleans (`installed`, `terminated`, `reset`) instead of the action envelope.

Failure handling is **partly standardized**:

- action-envelope tools use `failure_code` and `retryable`
- manage tools often use plain booleans plus `error` strings
- some handlers drop underlying diagnostics before building the MCP response

## 7. Failure Handling

### Structured failure signals

| Source | Structured signals |
| --- | --- |
| action envelope | `ELEMENT_NOT_FOUND`, `ELEMENT_NOT_INTERACTABLE`, `TIMEOUT`, `NAVIGATION_NO_CHANGE`, `AMBIGUOUS_TARGET`, `STALE_REFERENCE`, `UNKNOWN` |
| `wait_for_ui` | `INVALID_SELECTOR`, `INVALID_CONDITION`, `PLATFORM_NOT_SUPPORTED`, `ELEMENT_NOT_FOUND`, `INTERNAL_ERROR` |
| `expect_element_visible` | `failure_code: 'TIMEOUT'\|'UNKNOWN'`, `retryable` |
| `classify_action_outcome` | `outcome: success\|no_op\|backend_failure\|ui_failure\|unknown` |
| `get_network_activity` | per-request `status: success\|failure\|retryable` |

### Unstructured failure signals

- plain `error` strings from `install_app`, `build_app`, `build_and_install`, `find_element`, `start_log_stream`, and many platform helpers
- boolean-only failures from `terminate_app` and `reset_app_data`
- top-level handler fallback: `Error executing tool <name>: ...` as plain text, not JSON

### Retry / recovery logic present in the implementation

| Area | Observed logic |
| --- | --- |
| `wait_for_ui` | `retry.max_attempts` and `retry.backoff_ms` |
| `scroll_to_element` | repeated swipes up to `maxScrolls` |
| Android `install_app` | retries `pm install` with `-t` on test-only failure; has push + shell fallback |
| iOS `install_app` | tries `simctl install`, may fall back to `idb` |
| `get_ui_tree` | platform handlers retry up to three times |
| `wait_for_screen_change` | one stability confirmation pass after a detected change |
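
Attempt-level retry of the kind `wait_for_ui` exposes (`retry.max_attempts`, `retry.backoff_ms`) can be sketched generically. A fixed, non-growing backoff and a default of one attempt when `max_attempts` is omitted are assumptions; `withRetry` is a hypothetical name.

```ts
// Hypothetical retry helper; sleep is injected so tests need no real timers.
async function withRetry<T>(
  attempt: () => Promise<T>,
  isSuccess: (result: T) => boolean,
  retry: { max_attempts?: number; backoff_ms?: number },
  sleep: (ms: number) => Promise<void>,
): Promise<{ result: T; attempts: number }> {
  const maxAttempts = Math.max(1, retry.max_attempts ?? 1);
  const backoff = retry.backoff_ms ?? 0;
  let result = await attempt();
  let attempts = 1;
  // Retry only while the attempt failed and the budget is not exhausted.
  while (!isSuccess(result) && attempts < maxAttempts) {
    await sleep(backoff);
    result = await attempt();
    attempts++;
  }
  return { result, attempts };
}
```

Returning the attempt count alongside the last result mirrors how `wait_for_ui` reports `metrics` next to its outcome.
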

## 8. Execution Patterns (Observed)

1. **Generic action wrapper**
   `notifyActionStart()` → fingerprint before → platform action → fingerprint after → action envelope.

2. **Resolved tap flow**
   `wait_for_ui` returns `element.elementId` → `tap_element` uses the cached element and the current UI tree to validate it → tap → fingerprints before/after.

3. **Visibility verification flow**
   `expect_element_visible` is implemented as `wait_for_ui(... condition:'visible' ...)` plus a narrower binary result.

4. **Screen verification flow**
   `wait_for_screen_change` and `expect_screen` both depend on `get_screen_fingerprint`; `expect_screen` may additionally call `get_current_screen` on Android when matching by screen name.

5. **Network correlation flow**
   Action tools that call `notifyActionStart()` open the time window used by `get_network_activity`; `classify_action_outcome` can then classify the outcome from the supplied request summaries.

6. **Snapshot/debug flow**
   `capture_debug_snapshot` aggregates screenshot, current screen, fingerprint, UI tree, and logs in one call.
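
The action-scoped network window in pattern 5 can be sketched as a small class. The names `notifyActionStart` and `lastConsumedTimestamp` come from this spec; the class itself, the event shape, and the strict `>` comparisons are hypothetical.

```ts
// Hypothetical model of the network window consumed by get_network_activity.
interface NetworkEvent { timestamp: number; url: string; status: "success" | "failure" | "retryable" }

class NetworkWindow {
  private windowStart = 0;
  private lastConsumedTimestamp = 0;
  private readonly readEvents: () => NetworkEvent[];

  constructor(readEvents: () => NetworkEvent[]) {
    this.readEvents = readEvents; // e.g. request events parsed from device logs
  }

  // Called by action tools: a new action opens a new window.
  notifyActionStart(now: number): void {
    this.windowStart = now;
  }

  // Returns events newer than both the window start and the last read, then advances.
  getNetworkActivity(): { requests: NetworkEvent[]; count: number } {
    const since = Math.max(this.windowStart, this.lastConsumedTimestamp);
    const requests = this.readEvents().filter((e) => e.timestamp > since);
    if (requests.length > 0) {
      this.lastConsumedTimestamp = Math.max(...requests.map((e) => e.timestamp));
    }
    return { requests, count: requests.length };
  }
}
```

A tool that never calls `notifyActionStart()` leaves the window where the previous action opened it, which is the coverage gap noted in §9.
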

## 9. Inconsistencies and Gaps

1. **Response envelope mismatch:** most tools return wrapped JSON, but `get_logs`, `capture_screenshot`, and `build_and_install` use multi-block responses.
2. **Unexpected-error shape mismatch:** uncaught handler failures become plain text strings, not structured JSON.
3. **Action result mismatch:** some mutation tools use `ActionExecutionResult`; `install_app`, `terminate_app`, `reset_app_data`, and `build_and_install` do not.
4. **Success semantics mismatch:** `scroll_to_element` success is outcome-based; most other action tools are execution-based.
5. **Detail richness mismatch:** `start_app` and `restart_app` include `device` and rich `details`; other action-envelope tools usually omit raw error/details.
6. **Failure-code derivation mismatch:** generic action wrappers infer `failure_code` by matching substrings in error text; `tap_element` assigns codes directly.
7. **Dropped diagnostics:** handler-level MCP responses omit some underlying `diagnostics`/`error` detail, especially for `terminate_app`, `reset_app_data`, and `get_logs`.
8. **`expect_element_visible` type/implementation mismatch:** the type allows `ELEMENT_NOT_FOUND`, but the implementation only emits `TIMEOUT` or `UNKNOWN`.
9. **Platform mismatch:** `get_current_screen`, `type_text`, and `press_back` are Android-only; the other tools support both platforms.
10. **Observation helper gap:** `waitForUICore` supports `ui`/`log`/`screen`/`idle` modes internally, but only the newer selector-based `wait_for_ui` is exposed as a tool.
11. **Network-window coverage gap:** only tools that call `notifyActionStart()` reset the network activity window; `install_app`, `terminate_app`, and `reset_app_data` do not.
12. **`classify_action_outcome` log input is effectively cosmetic:** `hasLogErrors` affects the reasoning text for `no_op` but never changes the enum outcome.
13. **`build_and_install` has dead autodetect code:** the handler requires `platform` and `projectType` but still contains unreachable fallback autodetection branches.
14. **Runtime object shape drift:** `list_devices` may return extra runtime fields like `appInstalled` and `booted` beyond the base `DeviceInfo` shape.

## 10. Minimal Canonical Model (Derived, Not Invented)

### Common action shape already present

```ts
// DeviceInfo is the existing device descriptor type from the server code.
type ActionEnvelope = {
  action_id: string;
  timestamp: string;
  action_type: string;
  target: {
    selector: Record<string, unknown>;
    resolved: Record<string, unknown> | null;
  };
  success: boolean;
  failure_code?: string;
  retryable?: boolean;
  ui_fingerprint_before: string | null;
  ui_fingerprint_after: string | null;
  device?: DeviceInfo;
  details?: Record<string, unknown>;
};
```

This shape is already used directly or closely approximated by:

- `start_app`
- `restart_app`
- `tap`
- `tap_element`
- `swipe`
- `scroll_to_element`
- `type_text`
- `press_back`

### Common observation/verification pattern already present

```ts
// Pattern only: field names vary per tool (e.g. `observed_screen`).
type ObservationPattern = {
  requested?: unknown; // or `expected`
  expected?: unknown;
  observed: unknown;
  success?: boolean; // or `status`
  status?: "success" | "timeout";
  metrics?: unknown;
  confidence?: unknown;
  comparison?: unknown;
  reason?: unknown;
};
```

Examples:

- `wait_for_ui` → `requested`, `observed`, `metrics`
- `expect_screen` → `expected_screen`, `observed_screen`, `comparison`
- `expect_element_visible` → `selector`, `observed`, `reason`
- `wait_for_screen_change` → previous vs observed/new fingerprint

### Common failure signals already present

- action failure codes from `ActionFailureCode`
- wait/expect codes (`INVALID_*`, `ELEMENT_NOT_FOUND`, `TIMEOUT`, `UNKNOWN`)
- network request statuses (`success`, `failure`, `retryable`)
- fallback unstructured `error` strings

### Common flow already present

- resolve device
- perform platform operation
- optionally capture fingerprints before/after
- return structured JSON, usually in one text block
- perform verification in separate tools rather than as part of most actions

# MCP Tooling Specification — Spec v1 (Refined)

## 1. Scope

This specification defines the runtime contract for MCP tools used to interact with mobile applications.

It standardizes:

- action execution semantics
- the verification model
- failure handling
- response shape constraints

This spec is incremental and aligned with the current implementation. It introduces no new tools and requires no architectural redesign.

## 2. Core Model

The system is built on a strict separation of roles:

- Action tools perform execution
- Verification tools determine outcome
- `wait_for_*` tools resolve and synchronize
- Observation tools inspect state

## 3. Execution Model

Canonical flow for verifiable interactions:

`RESOLVE -> ACT -> WAIT (optional) -> EXPECT`

This flow applies when outcome verification is required.

It does not apply to:

- pure inspection tools
- observation-only flows
- non-verifiable or exploratory actions

## 4. Action Tools

### 4.1 Definition

Action tools mutate application state.

They include `start_app`, `restart_app`, `tap`, `tap_element`, `swipe`, `scroll_to_element`, `type_text`, and `press_back`.

### 4.2 Required Semantics

- `success` MUST represent execution success only
- execution success means the platform command was dispatched without error
- `success` MUST NOT imply outcome success

### 4.3 Action Envelope

Action tools MUST return this structure:

```ts
// DeviceInfo is the existing device descriptor type from the server code.
type ActionEnvelope = {
  action_id: string;
  timestamp: string;
  action_type: string;
  target: {
    selector: object;
    resolved: object | null;
  };
  success: boolean;
  ui_fingerprint_before: string | null;
  ui_fingerprint_after: string | null;
  failure_code?: string;
  retryable?: boolean;
  device?: DeviceInfo;
  details?: object;
};
```

Rules:

- `success` is at the top level, not nested
- `target` contains only selection and resolution context
- fingerprints represent observed pre/post UI state on a best-effort basis
- `failure_code` is optional but MUST be used when a structured mapping exists
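
The §4.3 rules lend themselves to a runtime shape check, for example in tests. The sketch below is hypothetical: it restricts `failure_code` to the canonical codes in §7.1, and it cannot tell whether a tool used `failure_code` when a mapping existed.

```ts
// Hypothetical envelope checker; returns a list of rule violations.
const CANONICAL_CODES: ReadonlySet<string> = new Set([
  "ELEMENT_NOT_FOUND", "ELEMENT_NOT_INTERACTABLE", "TIMEOUT",
  "NAVIGATION_NO_CHANGE", "AMBIGUOUS_TARGET", "STALE_REFERENCE", "UNKNOWN",
]);

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isFingerprint(v: unknown): boolean {
  return v === null || typeof v === "string";
}

function envelopeProblems(v: unknown): string[] {
  if (!isRecord(v)) return ["envelope is not an object"];
  const problems: string[] = [];
  for (const key of ["action_id", "timestamp", "action_type"]) {
    if (typeof v[key] !== "string") problems.push(`${key} must be a string`);
  }
  if (typeof v.success !== "boolean") problems.push("success must be a top-level boolean");
  const target = v.target;
  if (!isRecord(target) || !isRecord(target.selector) || !(target.resolved === null || isRecord(target.resolved))) {
    problems.push("target must be { selector: object, resolved: object | null }");
  }
  if (!isFingerprint(v.ui_fingerprint_before) || !isFingerprint(v.ui_fingerprint_after)) {
    problems.push("fingerprints must be string | null");
  }
  const code = v.failure_code;
  if (code !== undefined && (typeof code !== "string" || !CANONICAL_CODES.has(code))) {
    problems.push(`unknown failure_code: ${String(code)}`);
  }
  if (v.retryable !== undefined && typeof v.retryable !== "boolean") problems.push("retryable must be a boolean");
  return problems;
}
```

Returning a list of problems rather than a boolean makes the check useful in test output, where several rules may fail at once.
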

### 4.4 Allowed Deviations

Explicit temporary exceptions:

- `install_app`, `terminate_app`, and `reset_app_data` do not use this envelope
- `scroll_to_element` may temporarily retain outcome-based success semantics
- partial `failure_code` coverage is allowed
- detail richness may vary across tools

## 5. Verification Tools

### 5.1 Definition

Verification tools determine whether the intended outcome occurred.

Primary:

- `expect_screen`
- `expect_element_visible`

### 5.2 Required Semantics

- MUST return `success` as a boolean
- `success` MUST represent outcome truth
- MUST be binary and deterministic

Optional fields do not affect `success`: `observed`, `expected`, `comparison`, `reason`, `confidence`.

### 5.3 Authoritative Role

Verification tools are the only authoritative source of outcome truth.

The `success` field of an action tool MUST NOT be used to infer outcome success.

### 5.4 Applicability Rules

An `expect_*` tool is applicable when:

- the expected destination screen is known -> `expect_screen`
- the expected UI element state is known -> `expect_element_visible`
- the outcome is explicitly defined or testable

Rules:

- `wait_for_*` MAY be used before `expect_*` for synchronization
- `wait_for_*` MUST NOT replace `expect_*` when an applicable `expect_*` tool exists
- when no applicable `expect_*` tool exists, the EXPECT step MAY be skipped
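
The applicability rules can be read as a small planning step that picks the `expect_*` calls for a known expectation. `planVerification` and the `Expectation` type are hypothetical; the argument names follow the `expect_screen` and `expect_element_visible` inputs listed in the baseline inventory.

```ts
// Hypothetical planner for the section 5.4 applicability rules.
type Selector = { text?: string; resource_id?: string; accessibility_id?: string };
interface Expectation { screen?: string; fingerprint?: string; element?: Selector }
type ExpectCall =
  | { tool: "expect_screen"; args: { screen?: string; fingerprint?: string } }
  | { tool: "expect_element_visible"; args: { selector: Selector } };

function planVerification(exp: Expectation): ExpectCall[] {
  const calls: ExpectCall[] = [];
  // Known destination screen -> expect_screen.
  if (exp.screen !== undefined || exp.fingerprint !== undefined) {
    calls.push({ tool: "expect_screen", args: { screen: exp.screen, fingerprint: exp.fingerprint } });
  }
  // Known element state -> expect_element_visible.
  if (exp.element !== undefined) {
    calls.push({ tool: "expect_element_visible", args: { selector: exp.element } });
  }
  return calls; // empty: no applicable expect_* tool, so EXPECT may be skipped
}
```

A `wait_for_*` call may precede the planned calls for synchronization, but it is never a substitute for them.
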

## 6. `wait_for_*` Tools

### 6.1 Definition

`wait_for_*` tools provide deterministic resolution and synchronization.

Examples:

- `wait_for_ui`
- `wait_for_screen_change`

### 6.2 Rules

- MAY resolve UI elements
- MAY synchronize UI/system state
- MUST NOT be treated as final verification when `expect_*` is applicable

### 6.3 Semantics

- `success` indicates that the condition was met or resolution succeeded
- `success` does NOT indicate outcome correctness

## 7. Failure Semantics

### 7.1 Canonical Codes

- `ELEMENT_NOT_FOUND`
- `ELEMENT_NOT_INTERACTABLE`
- `TIMEOUT`
- `NAVIGATION_NO_CHANGE`
- `AMBIGUOUS_TARGET`
- `STALE_REFERENCE`
- `UNKNOWN`

### 7.2 Rules

- `failure_code` MUST be used when a structured mapping exists
- `failure_code` MUST NOT be replaced by string errors
- string errors MAY exist for diagnostics only
- tools are not required to emit every code
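
A minimal sketch of the §7.2 rules: the canonical code carries the meaning, and any string is kept only as a diagnostic. `toFailure`, the `diagnostics` field name, and collapsing unrecognized codes to `UNKNOWN` are assumptions.

```ts
// Canonical codes from section 7.1.
const FAILURE_CODES = [
  "ELEMENT_NOT_FOUND", "ELEMENT_NOT_INTERACTABLE", "TIMEOUT",
  "NAVIGATION_NO_CHANGE", "AMBIGUOUS_TARGET", "STALE_REFERENCE", "UNKNOWN",
] as const;
type FailureCode = (typeof FAILURE_CODES)[number];

function isFailureCode(v: string): v is FailureCode {
  return (FAILURE_CODES as readonly string[]).includes(v);
}

// Hypothetical failure payload: code first, free text only as diagnostics.
interface Failure { failure_code: FailureCode; diagnostics?: string }

function toFailure(code: string | undefined, diagnostic?: string): Failure {
  const failure_code: FailureCode = code !== undefined && isFailureCode(code) ? code : "UNKNOWN";
  return diagnostic === undefined ? { failure_code } : { failure_code, diagnostics: diagnostic };
}
```

Agents can then branch on `failure_code` alone and treat `diagnostics` as opaque text for humans.
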

### 7.3 Scope

Applies to:

- action tools
- verification tools
- `wait_for_ui`-style tools

## 8. Response Shape

### 8.1 Default

All responses MUST be a single JSON text block.

### 8.2 Allowed Exceptions

Multi-block responses are allowed only for:

- `get_logs`
- `capture_screenshot`
- `build_and_install`

### 8.3 Errors

All handler/runtime errors MUST be JSON-wrapped.

String-only errors are not allowed, including fallback handler errors.

Note: string diagnostics may still appear inside structured JSON payloads where a tool explicitly defines them.
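
The one "must change now" item (see §12) can be sketched as a wrapper around the dispatcher. `isError` is the standard MCP tool-result flag for tool-level failures; the payload fields (`success`, `tool`, `failure_code`, `error`) and `withJsonErrors` are a hypothetical choice, not a defined schema.

```ts
// Hypothetical wrapper that turns uncaught handler errors into JSON.
interface CallToolResult { content: { type: "text"; text: string }[]; isError?: boolean }
type Handler = (name: string, args: Record<string, unknown>) => Promise<CallToolResult>;

function withJsonErrors(handler: Handler): Handler {
  return async (name, args) => {
    try {
      return await handler(name, args);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Same single-text-block shape as normal responses, but structured.
      const payload = { success: false, tool: name, failure_code: "UNKNOWN", error: message };
      return { content: [{ type: "text", text: JSON.stringify(payload) }], isError: true };
    }
  };
}
```

Wrapping at the dispatcher covers every tool at once, including those whose own handlers never catch.
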

## 9. Classification

Tool: `classify_action_outcome`

Rules:

- MAY use UI, network, and log signals
- MUST be deterministic
- MUST NOT replace `expect_*` tools
- MUST be treated as a supplementary signal only

It is not a verification mechanism.
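
A deterministic rule chain with the documented inputs and outputs of `classify_action_outcome` is sketched below. Only two behaviors come from the baseline: omitted `networkRequests` yields `unknown` with `nextAction: 'call_get_network_activity'`, and `hasLogErrors` changes only the `no_op` reasoning text. The remaining rule order, and ignoring `retryable` requests, are assumptions.

```ts
// Hypothetical ordered rule chain; the first matching rule wins.
interface ClassifyInput {
  uiChanged: boolean;
  expectedElementVisible?: boolean;
  networkRequests?: { url?: string; status: "success" | "failure" | "retryable" }[];
  hasLogErrors?: boolean;
}
type Outcome = "success" | "no_op" | "backend_failure" | "ui_failure" | "unknown";
interface ClassifyOutput { outcome: Outcome; reasoning: string; nextAction?: "call_get_network_activity" }

function classify(input: ClassifyInput): ClassifyOutput {
  if (input.networkRequests === undefined) {
    return { outcome: "unknown", reasoning: "network signals missing", nextAction: "call_get_network_activity" };
  }
  if (input.networkRequests.some((r) => r.status === "failure")) {
    return { outcome: "backend_failure", reasoning: "a network request failed" };
  }
  if (!input.uiChanged) {
    // Log errors only annotate the reasoning; the outcome stays no_op.
    const note = input.hasLogErrors ? " (log errors present)" : "";
    return { outcome: "no_op", reasoning: `UI did not change${note}` };
  }
  if (input.expectedElementVisible === false) {
    return { outcome: "ui_failure", reasoning: "UI changed but the expected element is not visible" };
  }
  return { outcome: "success", reasoning: "UI changed and no request failed" };
}
```

Whatever it returns, the result only informs decisions such as whether to inspect network activity; it never stands in for `expect_*`.
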

## 10. Execution Patterns

Canonical pattern:

`wait_for_ui -> tap_element -> wait_for_screen_change (optional) -> expect_screen`

Interpretation:

- `tap_element.success` = executed
- `wait_for_screen_change.success` = UI changed
- `expect_screen.success` = correct outcome verified
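
From the client side, the canonical pattern can be sketched with a hypothetical `callTool` transport that returns each tool's parsed JSON. Passing `tap_element`'s `ui_fingerprint_before` to `wait_for_screen_change` and the 5000 ms timeouts are illustrative choices.

```ts
// Hypothetical transport: calls a tool and returns its parsed JSON result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<Record<string, any>>;

interface PatternReport { executed: boolean; changed: boolean | null; verified: boolean }

async function tapAndVerify(callTool: CallTool, text: string, expectedScreen: string): Promise<PatternReport> {
  // RESOLVE: wait for a clickable element matching the text.
  const wait = await callTool("wait_for_ui", { selector: { text }, condition: "clickable", timeout_ms: 5000 });
  if (wait.status !== "success") return { executed: false, changed: null, verified: false };

  // ACT: success here only means the tap was dispatched.
  const tap = await callTool("tap_element", { elementId: wait.element.elementId });
  if (tap.success !== true) return { executed: false, changed: null, verified: false };

  // WAIT (optional): a change says nothing about where the app went.
  const before = tap.ui_fingerprint_before;
  const change = typeof before === "string"
    ? await callTool("wait_for_screen_change", { previousFingerprint: before, timeoutMs: 5000 })
    : null;

  // EXPECT: the only authoritative outcome check.
  const expect = await callTool("expect_screen", { screen: expectedScreen });
  return {
    executed: true,
    changed: change === null ? null : change.success === true,
    verified: expect.success === true,
  };
}
```

Reporting the three flags separately keeps "executed", "changed", and "verified" from collapsing into a single `success`.
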

## 11. Known Deviations

Explicitly allowed:

- `install_app`, `terminate_app`, and `reset_app_data` not using the envelope
- `build_and_install` streaming NDJSON
- platform-specific tools
- partial failure-code coverage
- `scroll_to_element` outcome-based success (temporary exception)
- extended runtime fields in `list_devices`

## 12. Migration Rules

Must change now:

- uncaught errors must be JSON-wrapped

Should align when touched:

- `tap`, `swipe`, `type_text`, `press_back`
- `start_app`, `restart_app`
- `scroll_to_element`
- `wait_for_ui`

No change required:

- `tap_element`
- `expect_screen`
- `expect_element_visible`
- `wait_for_screen_change`

## 13. Guiding Principles

- Actions execute
- Verification proves
- Waiting synchronizes
- Classification assists

## Final Definition

Action success equals execution success.
Outcome success equals verification success.

Verification tools are authoritative when the expected outcome is defined.