mobile-debug-mcp 0.25.1 → 0.26.1

Files changed (35)
  1. package/dist/interact/classify.js +48 -11
  2. package/dist/interact/index.js +113 -0
  3. package/dist/observe/android.js +10 -1
  4. package/dist/observe/index.js +19 -1
  5. package/dist/observe/ios.js +15 -1
  6. package/dist/observe/snapshot-metadata.js +88 -0
  7. package/dist/server/tool-definitions.js +49 -14
  8. package/dist/server/tool-handlers.js +12 -0
  9. package/dist/server-core.js +1 -1
  10. package/docs/CHANGELOG.md +9 -0
  11. package/docs/ROADMAP.md +66 -38
  12. package/docs/rfcs/003-wait-and-synchronization-reliability.md +296 -0
  13. package/docs/rfcs/004-action-verification-routing.md +342 -0
  14. package/docs/specs/mcp-tooling-spec-v1.md +11 -3
  15. package/docs/tools/interact.md +31 -8
  16. package/docs/tools/observe.md +4 -2
  17. package/package.json +1 -1
  18. package/skills/rfc-review/SKILL.md +52 -0
  19. package/skills/rfc-review/references/rfc-review-checklist.md +12 -0
  20. package/skills/rfc-review/references/rfc-review-template.md +28 -0
  21. package/src/interact/classify.ts +53 -13
  22. package/src/interact/index.ts +151 -0
  23. package/src/observe/android.ts +11 -1
  24. package/src/observe/index.ts +26 -1
  25. package/src/observe/ios.ts +28 -13
  26. package/src/observe/snapshot-metadata.ts +107 -0
  27. package/src/server/tool-definitions.ts +49 -14
  28. package/src/server/tool-handlers.ts +13 -0
  29. package/src/server-core.ts +1 -1
  30. package/src/types.ts +23 -0
  31. package/test/unit/interact/classify_action_outcome.test.ts +44 -25
  32. package/test/unit/interact/wait_for_ui_change.test.ts +76 -0
  33. package/test/unit/server/contract.test.ts +8 -6
  34. package/test/unit/server/response_shapes.test.ts +37 -3
  35. package/docs/rfcs/003-wait-and-synchronization-reliability +0 -232
package/docs/ROADMAP.md CHANGED
@@ -1,6 +1,6 @@
1
- # Mobile Debug MCP Prioritized Roadmap
1
+ # Mobile Debug MCP Roadmap
2
2
 
3
- ## Prioritization Criteria
3
+ ## Planning Principles
4
4
 
5
5
  Ordered by:
6
6
 
@@ -26,33 +26,45 @@ Higher task success with fewer retries.
26
26
 
27
27
  ---
28
28
 
29
- # Completed
29
+ # Roadmap Status Overview
30
30
 
31
- These priorities are done and kept here for history:
31
+ ## Completed Foundations
32
32
 
33
- - Priority 1 Stronger State Verification
34
- - Priority 2 — Richer Element Identity
33
+ | Capability | Status | Notes |
34
+ |-----------|--------|-------|
35
+ | Stronger State Verification | Complete | Foundational verification layer shipped |
36
+ | Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
37
+
38
+ ## Current Focus
39
+
40
+ - Wait and Synchronization Reliability
41
+
42
+ ## Upcoming Work
43
+
44
+ - Long Press Gesture
45
+ - Better Compose / Custom Control Semantics
35
46
 
36
- Completion notes:
47
+ ## Later Horizon
37
48
 
38
- - State-aware verification is now implemented and wired through the tool surface.
39
- - Platform-native element metadata and selector-confidence hints are now part of the runtime contract.
49
+ - Pinch to Zoom
50
+ - Action Trace Correlation
40
51
 
41
52
  ---
42
53
 
43
- # Priority 1 — Stronger State Verification
54
+ # Stronger State Verification
44
55
 
45
56
  ## Why first
46
57
  Highest leverage improvement.
47
58
 
48
- **Status:** Completed
59
+ **Status:** Completed
60
+ **Priority:** P1
49
61
 
50
62
  Most failures are not “can’t act,” they’re:
51
63
  - uncertain state
52
64
  - weak verification
53
65
  - retry loops caused by inference
54
66
 
55
- ## Deliver
67
+ ## Scope
56
68
  - Direct readable control values
57
69
  - Expanded `expect_*` verification
58
70
  - Move from inference to state introspection
@@ -60,7 +72,7 @@ Most failures are not “can’t act,” they’re:
60
72
  ## Expected Impact
61
73
  Very high.
62
74
 
63
- ## Done Criteria
75
+ ## Exit Criteria
64
76
  - Control state readable for core widgets (toggle, slider, input, dropdown)
65
77
  - New expect_* state verifiers implemented
66
78
  - Agents can verify state without visual inference in representative flows
@@ -79,19 +91,20 @@ Blocks or strengthens:
79
91
 
80
92
  ---
81
93
 
82
- # Priority 2 — Richer Element Identity
94
+ # Richer Element Identity
83
95
 
84
96
  ## Why second
85
97
  Directly reduces selector brittleness.
86
98
 
87
- **Status:** Completed
99
+ **Status:** Completed
100
+ **Priority:** P2
88
101
 
89
102
  Improves:
90
103
  - targeting stability
91
104
  - repeatability
92
105
  - agent confidence
93
106
 
94
- ## Deliver
107
+ ## Scope
95
108
  - Stable IDs / test tags prioritization
96
109
  - Selector confidence metadata
97
110
  - Preferred selector hierarchy
@@ -99,7 +112,7 @@ Improves:
99
112
  ## Expected Impact
100
113
  Very high.
101
114
 
102
- ## Done Criteria
115
+ ## Exit Criteria
103
116
  - Stable selector preference order implemented
104
117
  - Test tags/resource IDs surfaced where available
105
118
  - Selector confidence metadata available
@@ -118,18 +131,21 @@ Blocks or strengthens:
118
131
 
119
132
  ---
120
133
 
121
- # Priority 3 — Wait and Synchronization Reliability
134
+ # Wait and Synchronization Reliability
122
135
 
123
136
  ## Why third
124
137
  Reliable async synchronization is foundational for agent success and should precede gesture expansion.
125
138
 
139
+ **Status:** Spec Ready
140
+ **Priority:** P3
141
+
126
142
  Addresses failures where agents:
127
143
  - skip UI waits after actions
128
144
  - rely on network/log signals too early
129
145
  - struggle with in-place UI updates
130
146
  - misread stale UI snapshots
131
147
 
132
- ## Deliver
148
+ ## Scope
133
149
  - UI-first synchronization policy guidance
134
150
  - wait_for_ui_change (hierarchy diff based waiting)
135
151
  - Structured loading state detection
@@ -139,7 +155,7 @@ Addresses failures where agents:
139
155
  ## Expected Impact
140
156
  Very high.
141
157
 
142
- ## Done Criteria
158
+ ## Exit Criteria
143
159
  - wait_for_ui_change implemented
144
160
  - Loading state detection available for representative controls
145
161
  - Snapshot revision or staleness metadata exposed
@@ -163,11 +179,14 @@ Blocks or strengthens:
163
179
 
164
180
  ---
165
181
 
166
- # Priority 4 — Long Press Gesture
182
+ # Long Press Gesture
167
183
 
168
184
  ## Why fourth
169
185
  High utility, relatively low complexity.
170
186
 
187
+ **Status:** Planned
188
+ **Priority:** P4
189
+
171
190
  Unlocks many currently awkward interactions:
172
191
 
173
192
  - context menus
@@ -177,7 +196,7 @@ Unlocks many currently awkward interactions:
177
196
 
178
197
  Broad usefulness.
179
198
 
180
- ## Deliver
199
+ ## Scope
181
200
  New tool:
182
201
 
183
202
  ```json
@@ -191,7 +210,7 @@ Verification alignment:
191
210
  ## Expected Impact
192
211
  High.
193
212
 
194
- ## Done Criteria
213
+ ## Exit Criteria
195
214
  - long_press tool implemented across supported platforms
196
215
  - Duration defaults and overrides supported
197
216
  - Verification patterns for long press outcomes defined
@@ -211,18 +230,21 @@ Strengthens:
211
230
 
212
231
  ---
213
232
 
214
- # Priority 5 — Better Compose / Custom Control Semantics
233
+ # Better Compose / Custom Control Semantics
215
234
 
216
235
  ## Why fifth
217
236
  Important, but strengthened by priorities 1–4 first.
218
237
 
238
+ **Status:** Planned
239
+ **Priority:** P5
240
+
219
241
  Semantics become more useful once:
220
242
  - identity is stronger
221
243
  - verification is stronger
222
244
  - gestures are richer
223
245
  - synchronization is more reliable
224
246
 
225
- ## Deliver
247
+ ## Scope
226
248
  - Composite control traits
227
249
  - Control role enrichment (adjustable, expandable, selectable_group)
228
250
  - Interaction contracts metadata
@@ -233,7 +255,7 @@ Semantics become more useful once:
233
255
  ## Expected Impact
234
256
  High.
235
257
 
236
- ## Done Criteria
258
+ ## Exit Criteria
237
259
  - Semantic traits implemented for major custom control classes
238
260
  - Interaction contracts surfaced in snapshot model
239
261
  - Confidence model defined for derived semantics
@@ -253,11 +275,14 @@ Depends on:
253
275
 
254
276
  ---
255
277
 
256
- # Priority 6 — Pinch to Zoom
278
+ # Pinch to Zoom
257
279
 
258
280
  ## Why sixth
259
281
  Valuable, but narrower than long press.
260
282
 
283
+ **Status:** Planned
284
+ **Priority:** P6
285
+
261
286
  Applies mainly to:
262
287
  - maps
263
288
  - images
@@ -266,7 +291,7 @@ Applies mainly to:
266
291
 
267
292
  Useful, but less universal.
268
293
 
269
- ## Deliver
294
+ ## Scope
270
295
 
271
296
  ```json
272
297
  pinch_to_zoom(target, scale, center?)
@@ -279,7 +304,7 @@ Verification:
279
304
  ## Expected Impact
280
305
  Medium-high.
281
306
 
282
- ## Done Criteria
307
+ ## Exit Criteria
283
308
  - pinch_to_zoom implemented
284
309
  - Zoom in/out flows supported
285
310
  - Verification primitives for viewport or zoom state available
@@ -297,22 +322,25 @@ Depends on:
297
322
 
298
323
  ---
299
324
 
300
- # Priority 7 — Action Trace Correlation
325
+ # Action Trace Correlation
301
326
 
302
327
  ## Why seventh
303
328
  Very valuable for debugging,
304
329
  but less critical than improving control success first.
305
330
 
331
+ **Status:** Planned
332
+ **Priority:** P7
333
+
306
334
  Improves diagnosis more than task completion.
307
335
 
308
- ## Deliver
336
+ ## Scope
309
337
  - Action correlation metadata
310
338
  - UI/network/log linkage
311
339
 
312
340
  ## Expected Impact
313
341
  Medium-high.
314
342
 
315
- ## Done Criteria
343
+ ## Exit Criteria
316
344
  - Action correlation model defined
317
345
  - UI/network/log linkage captured for representative actions
318
346
  - Correlation metadata exposed to agents
@@ -331,7 +359,7 @@ Depends on:
331
359
 
332
360
  ---
333
361
 
334
- # Delivery Waves
362
+ # Roadmap Sequence
335
363
 
336
364
  ## Dependency Summary
337
365
  Foundational sequence:
@@ -351,7 +379,7 @@ Layer 3 (Interaction Expansion)
351
379
  Layer 4 (Observability)
352
380
  - Priority 7 depends on 1,2,3
353
381
 
354
- ## Wave 1 (Immediate)
382
+ ## Wave 1 (Current Focus)
355
383
  - Stronger State Verification
356
384
  - Richer Element Identity
357
385
  - Wait and Synchronization Reliability
@@ -361,7 +389,7 @@ Make core loop more reliable.
361
389
 
362
390
  ---
363
391
 
364
- ## Wave 2
392
+ ## Wave 2 (Expansion)
365
393
  - Long Press
366
394
  - Better Compose Semantics
367
395
 
@@ -370,7 +398,7 @@ Expand interaction capability.
370
398
 
371
399
  ---
372
400
 
373
- ## Wave 3
401
+ ## Wave 3 (Advanced)
374
402
  - Pinch to Zoom
375
403
  - Action Trace Correlation
376
404
 
@@ -379,7 +407,7 @@ Advanced gestures + observability.
379
407
 
380
408
  ---
381
409
 
382
- # Priority Stack Summary
410
+ # Capability Sequence
383
411
 
384
412
  Execution Order:
385
413
  1. Stronger State Verification
@@ -397,7 +425,7 @@ Rationale:
397
425
 
398
426
  ---
399
427
 
400
- ## Explicitly Deferred
428
+ ## Future Considerations
401
429
  Still out of scope:
402
430
 
403
431
  - Recovery planning logic
@@ -0,0 +1,296 @@
1
+ # RFC-003: Wait and Synchronization Reliability
2
+
3
+ Priority: 3
4
+ Depends on: RFC-001 (Stronger State Verification), RFC-002 (Platform-Native Element Metadata and Resolution Hints)
5
+
6
+ ---
7
+
8
+ # 1. Problem
9
+
10
+ Agents can often identify the right element (RFC-002) and verify the right state (RFC-001), but still fail because they act before the UI has reached the intended post-action state.
11
+
12
+ This causes:
13
+
14
+ - retries caused by racing the UI
15
+ - false failures from stale snapshots
16
+ - overuse of network/log verification when UI evidence should suffice
17
+ - flakiness in asynchronous and in-place update flows
18
+ - unreliable behaviour in Compose-heavy or thin accessibility trees
19
+
20
+ Current system limitations:
21
+
22
+ - wait_for_ui is underused after actions involving async state changes
23
+ - current waits focus on expected elements appearing, not general UI transition detection
24
+ - snapshot staleness is not explicitly surfaced
25
+ - loading state transitions are inconsistently observable
26
+
27
+ ---
28
+
29
+ # 2. Goals
30
+
31
+ This RFC introduces:
32
+
33
+ 1. UI-first synchronization policy after actions
34
+ 2. Snapshot staleness and revision metadata
35
+ 3. UI-change based waiting for in-place updates
36
+ 4. Structured loading-state detection
37
+ 5. Compose-aware synchronization hints
38
+
39
+ Success goals:
40
+
41
+ - reduce retries caused by premature actions
42
+ - increase successful post-action verification
43
+ - reduce unnecessary fallbacks to logs/network checks
44
+ - improve reliability in asynchronous UI flows
45
+
46
+ ---
47
+
48
+ # 3. Non-Goals
49
+
50
+ This RFC does not:
51
+
52
+ - redefine state verification semantics (RFC-001)
53
+ - redefine element identity contracts (RFC-002)
54
+ - add new interaction primitives (long press, pinch, etc.)
55
+ - replace network or log verification where no UI outcome exists
56
+
57
+ ---
58
+
59
+ # 4. Proposed Model
60
+
61
+ ## 4.1 UI-First Synchronization Contract (v1)
62
+
63
+ Default post-action flow SHOULD be:
64
+
65
+ ```text
66
+ action
67
+ → wait_for_ui(expected outcome)
68
+ → verify state
69
+ → only fall back to network/logs when no UI outcome exists or wait fails
70
+ ```
71
+
72
+ Tool-level contract:
73
+
74
+ - After actions expected to cause visible UI changes, agents SHOULD invoke wait_for_ui or wait_for_ui_change before verification.
75
+ - wait_for_ui SHOULD be used when an expected element or explicit outcome is known.
76
+ - wait_for_ui_change SHOULD be used for in-place mutations where a specific element target is not known.
77
+ - wait_for_screen_change SHOULD remain preferred for full navigation transitions when available.
78
+
79
+ Rules:
80
+
81
+ - UI evidence MUST be preferred over network or log evidence when a UI outcome is expected.
82
+ - Actions that trigger navigation, async mutation, or visible state changes SHOULD be followed by a wait.
83
+ - Network/log checks are fallback signals, not primary synchronization mechanisms.
84
+ - This synchronization order is normative tool behavior for agents, not advisory prose.
85
+
86
+ ---
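The tool-selection rules in section 4.1 can be read as a small decision function. The sketch below is illustrative only: `ExpectedEffect`, `SyncStep`, and `chooseSyncStep` are hypothetical names that do not appear in the package, and the order of the checks (no UI outcome, then navigation, then a known target, then in-place change) is one reasonable reading of the rules rather than the shipped behavior.

```typescript
// Hypothetical description of what an action is expected to do.
// These field names are assumptions made for illustration.
interface ExpectedEffect {
  hasUiOutcome: boolean;    // false when the action has no visible result
  isNavigation: boolean;    // full screen or view transition
  expectedElement?: string; // selector of an element expected to appear
}

type SyncStep =
  | "wait_for_screen_change"
  | "wait_for_ui"
  | "wait_for_ui_change"
  | "network_or_log_check";

// Pick the synchronization step to run before verification.
function chooseSyncStep(effect: ExpectedEffect): SyncStep {
  // Network/log evidence is only primary when no UI outcome exists.
  if (!effect.hasUiOutcome) return "network_or_log_check";
  // Navigation-level transitions keep using wait_for_screen_change.
  if (effect.isNavigation) return "wait_for_screen_change";
  // A known target or explicit outcome maps to wait_for_ui.
  if (effect.expectedElement !== undefined) return "wait_for_ui";
  // In-place mutation with no specific target.
  return "wait_for_ui_change";
}
```

Keeping the decision in one pure function makes the ordering easy to audit against the RFC text; a failed wait would still fall back to network or log checks, which this sketch does not model.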
87
+
88
+ ## 4.2 Snapshot Revision Contract
89
+
90
+ All snapshot responses MUST include revision metadata.
91
+
92
+ Emission scope:
93
+
94
+ - snapshot_revision and captured_at_ms MUST be emitted on snapshot responses.
95
+ - get_ui_tree responses SHOULD emit the same fields when backed by the same snapshot generation layer.
96
+ - If both surfaces exist, revision values MUST be consistent across them when derived from the same underlying snapshot.
97
+
98
+ Required snapshot envelope:
99
+
100
+ ```json
101
+ {
102
+ "snapshot_revision": 184,
103
+ "captured_at_ms": 1714452012301
104
+ }
105
+ ```
106
+
107
+ Field requirements:
108
+
109
+ - snapshot_revision REQUIRED on every snapshot response.
110
+ - captured_at_ms REQUIRED on every snapshot response.
111
+
112
+ Source of truth:
113
+
114
+ - snapshot_revision originates in the snapshot generation layer.
115
+ - It MUST increment when a meaningful hierarchy delta is detected.
116
+ - Cosmetic-only changes MUST NOT increment revision.
117
+
118
+ Meaningful deltas include:
119
+
120
+ - node added or removed
121
+ - visible text mutation
122
+ - control state change
123
+ - list content mutation
124
+ - navigation or view transition
125
+
126
+ Cosmetic churn examples (must not increment):
127
+
128
+ - cursor blink
129
+ - focus-only changes
130
+ - animation-only transitions
131
+ - timestamp or unrelated ephemeral text changes
132
+
133
+ Rules:
134
+
135
+ - Agents SHOULD use revision changes as synchronization signals.
136
+ - Stale revisions SHOULD trigger reacquisition before verification.
137
+ - This extends the snapshot response contract defined by RFC-002.
138
+
139
+ - Snapshot responses are the normative required emission surface; get_ui_tree emission is recommended for consistency.
140
+ - snapshot_revision MUST be monotonically increasing within a session.
141
+
142
+ ---
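To make the revision rules in section 4.2 concrete, here is a minimal sketch of a counter that increments only on meaningful deltas and never decreases. `RevisionTracker`, `DeltaKind`, and the delta labels are hypothetical names derived from the bullet lists above, and starting the counter at 0 is an assumption; the actual `snapshot-metadata` module in this release may work differently.

```typescript
// Hypothetical delta labels, derived from the RFC's two bullet lists.
type DeltaKind =
  | "node_added" | "node_removed" | "text_mutation"
  | "state_change" | "list_mutation" | "navigation"
  | "cursor_blink" | "focus_only" | "animation_only" | "ephemeral_text";

// Deltas that the RFC says MUST increment the revision.
const MEANINGFUL_DELTAS: ReadonlySet<DeltaKind> = new Set<DeltaKind>([
  "node_added", "node_removed", "text_mutation",
  "state_change", "list_mutation", "navigation",
]);

interface SnapshotEnvelope {
  snapshot_revision: number;
  captured_at_ms: number;
}

class RevisionTracker {
  private revision = 0; // assumed starting value
  private readonly now: () => number;

  // The clock is injectable so the behavior can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Record the deltas seen since the previous snapshot and return the envelope.
  record(deltas: readonly DeltaKind[]): SnapshotEnvelope {
    if (deltas.some((d) => MEANINGFUL_DELTAS.has(d))) {
      this.revision += 1; // only ever increases within a session
    }
    return { snapshot_revision: this.revision, captured_at_ms: this.now() };
  }
}
```

Because cosmetic churn leaves the counter untouched, an agent comparing two envelopes can treat an unchanged `snapshot_revision` as "nothing meaningful happened" even when `captured_at_ms` differs.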
143
+
144
+ ## 4.3 wait_for_ui_change API
145
+
146
+ Concrete API contract:
147
+
148
+ ```ts
149
+ wait_for_ui_change({
150
+ expected_change?: "hierarchy_diff" | "text_change" | "state_change",
151
+ timeout_ms?: number,
152
+ stability_window_ms?: number
153
+ }) => {
154
+ success: boolean,
155
+ observed_change: "hierarchy_diff" | "text_change" | "state_change" | null,
156
+ snapshot_revision?: number,
157
+ timeout: boolean
158
+ }
159
+ ```
160
+
161
+ Relationship to other wait primitives:
162
+
163
+ - wait_for_screen_change remains the preferred primitive for navigation-level transitions.
164
+ - wait_for_ui_change is the preferred primitive for non-navigation UI mutations and in-place updates.
165
+ - wait_for_ui_change is additive to wait_for_screen_change, not a replacement for it.
166
+
167
+ Rules:
168
+
169
+ - stability_window_ms represents time a detected change must remain stable before success.
170
+ - Meaningful delta semantics are inherited from Section 4.2.
171
+ - wait_for_ui_change complements wait_for_ui; it does not replace it.
172
+
173
+ - Agents SHOULD prefer wait_for_screen_change for navigation and wait_for_ui_change for non-navigation changes.
174
+
175
+ ---
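The interaction between `timeout_ms` and `stability_window_ms` in section 4.3 is easier to see with a worked example. The sketch below evaluates a pre-recorded list of polls rather than polling a device, so it stays deterministic. `Sample`, `evaluateWait`, the baseline-revision parameter, and the default values (5000 ms timeout, 0 ms stability window) are assumptions for illustration, not the package's implementation.

```typescript
type ChangeKind = "hierarchy_diff" | "text_change" | "state_change";

// Result shape as given in the RFC's API contract.
interface WaitResult {
  success: boolean;
  observed_change: ChangeKind | null;
  snapshot_revision?: number;
  timeout: boolean;
}

// Hypothetical record of one poll of the snapshot layer.
interface Sample {
  at_ms: number;             // time elapsed since the wait started
  snapshot_revision: number;
  change: ChangeKind | null; // kind of meaningful delta since the previous poll
}

interface WaitOptions {
  expected_change?: ChangeKind;
  timeout_ms?: number;
  stability_window_ms?: number;
}

function evaluateWait(
  baselineRevision: number,
  samples: readonly Sample[],
  opts: WaitOptions = {},
): WaitResult {
  const timeoutMs = opts.timeout_ms ?? 5000;         // assumed default
  const stabilityMs = opts.stability_window_ms ?? 0; // assumed default
  let lastRevision = baselineRevision;
  let candidate: { kind: ChangeKind; since: number } | null = null;

  for (const s of samples) {
    if (s.at_ms > timeoutMs) break;
    if (s.snapshot_revision !== lastRevision) {
      // The UI moved again, so the stability clock restarts.
      lastRevision = s.snapshot_revision;
      const wanted =
        opts.expected_change === undefined || s.change === opts.expected_change;
      candidate =
        wanted && s.change !== null ? { kind: s.change, since: s.at_ms } : null;
    }
    // Succeed once the change has stayed put for the whole stability window.
    if (candidate !== null && s.at_ms - candidate.since >= stabilityMs) {
      return {
        success: true,
        observed_change: candidate.kind,
        snapshot_revision: lastRevision,
        timeout: false,
      };
    }
  }
  // Running out of samples is treated as a timeout in this sketch.
  return {
    success: false,
    observed_change: null,
    snapshot_revision: lastRevision,
    timeout: true,
  };
}
```

With a 250 ms stability window, a change seen at 100 ms only counts as success on the first poll at or after 350 ms that still reports the same revision; any further revision bump in between resets the clock, which is how the window guards against the churn described in section 5.4.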
176
+
177
+ ## 4.4 Structured Loading-State Contract
178
+
179
+ Loading signals are OPTIONAL overall, but when a detectable loading signal exists they SHOULD be surfaced on snapshot responses and UI tree responses, and if emitted they MUST conform to the contract below.
180
+
181
+ Required shape:
182
+
183
+ ```json
184
+ {
185
+ "loading_state": {
186
+ "active": true,
187
+ "signal": "progress_indicator",
188
+ "source": "snapshot"
189
+ }
190
+ }
191
+ ```
192
+
193
+ Required fields:
194
+
195
+ - active
196
+ - signal
197
+ - source
198
+
199
+ Rules:
200
+
201
+ - Loading signals are synchronization hints only.
202
+ - Loading completion MUST NOT alone be treated as success.
203
+ - If emitted, the shape above MUST be used.
204
+ - Absence of loading_state is valid when no reliable loading signal is detectable; malformed or partial loading_state emission is not valid.
205
+
206
+ ---
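Section 4.4 distinguishes three cases: `loading_state` absent (valid), present and complete (valid), and present but partial or malformed (invalid). A consumer could check this with a small type guard, sketched below. The field types (`boolean` for `active`, `string` for `signal` and `source`) are inferred from the JSON example and are an assumption, as are the names `isValidLoadingState` and `classifyLoadingState`.

```typescript
// Shape inferred from the RFC's JSON example; field types are assumptions.
interface LoadingState {
  active: boolean;
  signal: string;
  source: string;
}

// True only when all three required fields are present with the assumed types.
function isValidLoadingState(value: unknown): value is LoadingState {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["active"] === "boolean" &&
    typeof v["signal"] === "string" &&
    typeof v["source"] === "string"
  );
}

// Classify a snapshot response according to the three cases in section 4.4.
function classifyLoadingState(
  snapshot: Record<string, unknown>,
): "absent" | "valid" | "malformed" {
  if (!("loading_state" in snapshot)) return "absent";
  return isValidLoadingState(snapshot["loading_state"]) ? "valid" : "malformed";
}
```

Even a `"valid"` result with `active: false` is only a synchronization hint; per the rules above, loading completion alone is not evidence that the action succeeded.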
207
+
208
+ ## 4.5 Compose-Aware Synchronization Hints
209
+
210
+ For Compose or thin accessibility structures:
211
+
212
+ Systems SHOULD support:
213
+
214
+ - merged semantic node changes as wait signals
215
+ - text mutations within existing nodes
216
+ - in-place recomposition awareness
217
+
218
+ These are synchronization hints layered on top of standard wait behaviour.
219
+
220
+ ---
221
+
222
+ # 5. Failure Modes
223
+
224
+ ## 5.1 Premature Action Progression
225
+
226
+ If an action is followed immediately by verification without waiting:
227
+
228
+ - system SHOULD bias toward suggesting wait_for_ui
229
+ - retries SHOULD prefer synchronization correction before repeated action execution
230
+
231
+ ---
232
+
233
+ ## 5.2 Stale Snapshot Reads
234
+
235
+ If verification uses an old snapshot:
236
+
237
+ - revision metadata SHOULD expose staleness
238
+ - agents SHOULD reacquire snapshot before retrying verification
239
+
240
+ ---
241
+
242
+ ## 5.3 No Visible UI Outcome
243
+
244
+ If no UI outcome is expected:
245
+
246
+ - network/log verification MAY be primary evidence
247
+ - UI-first policy does not apply rigidly
248
+
249
+ ---
250
+
251
+ ## 5.4 False Positive UI Change Detection
252
+
253
+ If unrelated UI churn triggers early wait completion:
254
+
255
+ - systems SHOULD reject cosmetic-only changes using Section 4.2 rules
256
+ - agents SHOULD prefer stability windows before considering waits satisfied
257
+
258
+ ---
259
+
260
+ # 6. Acceptance Criteria
261
+
262
+ RFC-003 specification is complete when:
263
+
264
+ - Snapshot Revision Contract is fully defined and mandatory.
265
+ - wait_for_ui_change API contract is fully defined.
266
+ - Loading-State Contract required schema is defined.
267
+ - Synchronization tool-selection rules are explicitly specified.
268
+ - False-positive change handling is specified.
269
+
270
+ Implementation readiness success is measured when:
271
+
272
+ - snapshot revisions reduce stale-read retries
273
+ - synchronization retries decrease
274
+ - post-action verification success increases
275
+
276
+ ---
277
+
278
+ # 7. Success Metrics
279
+
280
+ - Fewer retries caused by timing/synchronization errors
281
+ - Higher post-action verification success rate
282
+ - Reduced unnecessary fallback to network/log evidence
283
+ - Improved stability in asynchronous and Compose-heavy flows
284
+
285
+ ---
286
+
287
+ # 8. Deferred To Later RFCs
288
+
289
+ - Advanced subscriptions / notify-when-element-appears APIs
290
+ - Full action-to-ui trace correlation (Priority 7)
291
+ - Gesture-trigger-specific synchronization logic
292
+ - Element appearance subscription / notify-when-ready APIs
293
+
294
+ ---
295
+
296
+ This RFC standardises temporal reliability and synchronization signals layered on top of state verification and element identity guarantees from RFC-001 and RFC-002.