mobile-debug-mcp 0.24.8 → 0.25.0
- package/README.md +1 -1
- package/dist/interact/index.js +213 -3
- package/dist/observe/ios.js +56 -2
- package/dist/server/common.js +2 -1
- package/dist/server/tool-definitions.js +55 -0
- package/dist/server/tool-handlers.js +17 -0
- package/dist/server-core.js +1 -1
- package/dist/utils/android/utils.js +67 -1
- package/docs/CHANGELOG.md +3 -0
- package/docs/ROADMAP.md +388 -0
- package/docs/rfcs/001-state-verification.md +452 -0
- package/docs/specs/mcp-tooling-spec-v1.md +4 -0
- package/docs/tools/interact.md +25 -0
- package/docs/tools/observe.md +2 -1
- package/package.json +1 -1
- package/src/interact/index.ts +240 -3
- package/src/observe/ios.ts +62 -3
- package/src/server/common.ts +2 -1
- package/src/server/tool-definitions.ts +55 -0
- package/src/server/tool-handlers.ts +18 -0
- package/src/server-core.ts +1 -1
- package/src/types.ts +41 -0
- package/src/utils/android/utils.ts +78 -14
- package/test/unit/observe/state_extraction.test.ts +43 -0
- package/test/unit/server/response_shapes.test.ts +40 -2
package/docs/ROADMAP.md
ADDED
@@ -0,0 +1,388 @@
# Mobile Debug MCP Prioritized Roadmap

## Prioritization Criteria

Ordered by:

1. Impact on agent reliability
2. Reduction in retries / brittleness
3. Breadth of app coverage improved
4. Implementation complexity vs payoff

## Program-Level Success Metrics
Track roadmap impact across releases using:

- Retry reduction rate (% fewer action retries per task)
- Element match success rate (% successful element targeting)
- Verification success rate (% `expect_*` checks passing first attempt)
- Wait success rate for asynchronous UI flows
- Custom control interaction success rate
- Gesture success rate
- Mean time to root cause during debugging
- Overall agent task completion rate

Primary KPI:
Higher task success with fewer retries.
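
One way the retry reduction rate and the task completion rate could be computed from per-task records is sketched below. The `TaskRun` shape and all function names are assumptions made for illustration; the roadmap does not define how these metrics are collected.

```typescript
// Hypothetical record of one agent task run; not a type exported by mobile-debug-mcp.
interface TaskRun {
  taskId: string;
  retries: number; // action retries observed during the task
  succeeded: boolean;
}

// Mean number of retries per task (0 for an empty sample).
function meanRetries(runs: TaskRun[]): number {
  if (runs.length === 0) return 0;
  return runs.reduce((sum, r) => sum + r.retries, 0) / runs.length;
}

// Percentage reduction in mean retries per task from baseline to candidate.
// Positive means fewer retries; returns 0 when the baseline has no retries.
function retryReductionRate(baseline: TaskRun[], candidate: TaskRun[]): number {
  const before = meanRetries(baseline);
  if (before === 0) return 0;
  return ((before - meanRetries(candidate)) / before) * 100;
}

// Share of tasks that completed successfully, as a percentage.
function taskCompletionRate(runs: TaskRun[]): number {
  if (runs.length === 0) return 0;
  return (runs.filter((r) => r.succeeded).length / runs.length) * 100;
}
```

With a baseline averaging 4 retries per task and a candidate release averaging 2, `retryReductionRate` reports 50.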

---

# Priority 1 — Stronger State Verification

## Why first
Highest leverage improvement.

Most failures are not “can’t act,” they’re:
- uncertain state
- weak verification
- retry loops caused by inference

## Deliver
- Direct readable control values
- Expanded `expect_*` verification
- Move from inference to state introspection

## Expected Impact
Very high.

## Done Criteria
- Control state readable for core widgets (toggle, slider, input, dropdown)
- New `expect_*` state verifiers implemented
- Agents can verify state without visual inference in representative flows
- Documentation and snapshot response shape updated
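
As a rough illustration of verifying by state introspection rather than inference, the sketch below compares a control's readable value against an expectation for the four core widgets named above. `ControlState`, `expectState`, and the slider tolerance default are hypothetical and are not taken from the package.

```typescript
// Hypothetical readable state for the core widgets; not the package's snapshot shape.
type ControlState =
  | { kind: "toggle"; checked: boolean }
  | { kind: "slider"; value: number }
  | { kind: "input"; text: string }
  | { kind: "dropdown"; selected: string | null };

interface VerificationResult {
  ok: boolean;
  reason?: string;
}

// True when both states describe the same widget kind and the same value.
// `tolerance` only matters for sliders, whose values are often fractional.
function matches(actual: ControlState, expected: ControlState, tolerance: number): boolean {
  switch (actual.kind) {
    case "toggle":
      return expected.kind === "toggle" && actual.checked === expected.checked;
    case "slider":
      return expected.kind === "slider" && Math.abs(actual.value - expected.value) <= tolerance;
    case "input":
      return expected.kind === "input" && actual.text === expected.text;
    case "dropdown":
      return expected.kind === "dropdown" && actual.selected === expected.selected;
  }
}

// Verify introspected state; an unreadable control fails explicitly instead of
// falling back to visual inference.
function expectState(
  actual: ControlState | undefined,
  expected: ControlState,
  tolerance = 0.001,
): VerificationResult {
  if (actual === undefined) return { ok: false, reason: "state not readable" };
  if (matches(actual, expected, tolerance)) return { ok: true };
  return {
    ok: false,
    reason: "expected " + JSON.stringify(expected) + ", got " + JSON.stringify(actual),
  };
}
```

Returning a `reason` alongside `ok` is a design choice in this sketch: it lets an agent distinguish "wrong value" from "value not readable" without retrying blindly.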

## Success Metrics
- 30%+ retry reduction on stateful tasks
- Higher first-pass verification success
- Reduced false positive verifications

## Dependencies
Blocks or strengthens:
- Priority 3 — Wait and Synchronization Reliability
- Priority 5 — Better Compose / Custom Control Semantics
- Priority 6 — Pinch to Zoom verification
- Priority 7 — Action Trace Correlation

---

# Priority 2 — Richer Element Identity

## Why second
Directly reduces selector brittleness.

Improves:
- targeting stability
- repeatability
- agent confidence

## Deliver
- Stable IDs / test tags prioritization
- Selector confidence metadata
- Preferred selector hierarchy

## Expected Impact
Very high.

## Done Criteria
- Stable selector preference order implemented
- Test tags/resource IDs surfaced where available
- Selector confidence metadata available
- Structural fallback selectors defined
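
A preferred selector hierarchy with confidence metadata might be sketched as follows. The field names, the exact preference order, and the confidence scores are all assumptions; the roadmap only states that stable IDs and test tags should be prioritized and that structural selectors act as a fallback.

```typescript
// Hypothetical element attributes; field names are illustrative only.
interface ElementInfo {
  testTag?: string;
  resourceId?: string;
  accessibilityLabel?: string;
  text?: string;
  structuralPath: string; // e.g. an index path through the hierarchy
}

type SelectorKind = "testTag" | "resourceId" | "accessibilityLabel" | "text" | "structuralPath";

interface RankedSelector {
  kind: SelectorKind;
  value: string;
  confidence: number; // assumed 0..1 scale, higher is more stable
}

// Assumed preference order and scores, most stable first.
const PREFERENCE: ReadonlyArray<{ kind: SelectorKind; confidence: number }> = [
  { kind: "testTag", confidence: 0.95 },
  { kind: "resourceId", confidence: 0.9 },
  { kind: "accessibilityLabel", confidence: 0.7 },
  { kind: "text", confidence: 0.5 },
  { kind: "structuralPath", confidence: 0.3 },
];

// List every selector available for an element, ordered by preference.
function rankSelectors(el: ElementInfo): RankedSelector[] {
  const out: RankedSelector[] = [];
  for (const { kind, confidence } of PREFERENCE) {
    const value = el[kind];
    if (value !== undefined && value !== "") out.push({ kind, value, confidence });
  }
  return out;
}

// The most stable selector available, or undefined if the element has none.
function preferredSelector(el: ElementInfo): RankedSelector | undefined {
  return rankSelectors(el)[0];
}
```

An element that exposes only a structural path still resolves, but with the lowest confidence, which gives an agent a signal that the target may drift between runs.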

## Success Metrics
- Higher element match rate
- Reduced selector drift failures
- Lower retargeting retries

## Dependencies
Blocks or strengthens:
- Priority 3 — Wait and Synchronization Reliability
- Priority 4 — Long Press targeting reliability
- Priority 5 — Better Compose / Custom Control Semantics
- Priority 6 — Pinch to Zoom targeting
- Priority 7 — Action Trace Correlation

---

# Priority 3 — Wait and Synchronization Reliability

## Why third
Reliable async synchronization is foundational for agent success and should precede gesture expansion.

Addresses failures where agents:
- skip UI waits after actions
- rely on network/log signals too early
- struggle with in-place UI updates
- misread stale UI snapshots

## Deliver
- UI-first synchronization policy guidance
- `wait_for_ui_change` (hierarchy diff based waiting)
- Structured loading state detection
- Snapshot revision / staleness metadata
- Compose-aware wait robustness improvements

## Expected Impact
Very high.

## Done Criteria
- `wait_for_ui_change` implemented
- Loading state detection available for representative controls
- Snapshot revision or staleness metadata exposed
- UI-first sync guidance added to spec guardrails
- In-place update waits validated on benchmark flows
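
Hierarchy-diff-based waiting and staleness metadata could look roughly like the sketch below. The `UiNode` and `Snapshot` shapes, the fingerprinting scheme, and the default timings are assumptions. A real implementation would be asynchronous; here the loop is kept synchronous with injected `capture`, `now`, and `sleep` functions so it can be exercised deterministically.

```typescript
// Hypothetical snapshot node; not the package's actual hierarchy shape.
interface UiNode {
  id: string;
  text?: string;
  children: UiNode[];
}

interface Snapshot {
  revision: number; // assumed monotonically increasing counter
  capturedAt: number; // epoch milliseconds
  root: UiNode;
}

// Injected dependencies, so the polling loop has no hidden I/O or timers.
interface WaitDeps {
  capture: () => Snapshot; // takes a fresh snapshot
  now: () => number; // current time in milliseconds
  sleep: (ms: number) => void; // blocks, or advances a fake clock, for ms
}

// Flatten a hierarchy into a string so two snapshots can be compared cheaply.
function fingerprint(node: UiNode): string {
  const kids = node.children.map((child) => fingerprint(child)).join(",");
  return node.id + ":" + (node.text === undefined ? "" : node.text) + "[" + kids + "]";
}

// In-place updates (same nodes, new text) change the fingerprint too.
function hasUiChanged(before: Snapshot, after: Snapshot): boolean {
  return fingerprint(before.root) !== fingerprint(after.root);
}

// A snapshot older than maxAgeMs should be re-captured before acting on it.
function isStale(snapshot: Snapshot, now: number, maxAgeMs: number): boolean {
  return now - snapshot.capturedAt > maxAgeMs;
}

// Poll until the hierarchy differs from the baseline or the timeout elapses.
function waitForUiChange(
  baseline: Snapshot,
  deps: WaitDeps,
  timeoutMs = 5000,
  pollMs = 200,
): { changed: boolean; snapshot: Snapshot } {
  const deadline = deps.now() + timeoutMs;
  let latest = baseline;
  while (deps.now() < deadline) {
    latest = deps.capture();
    if (hasUiChanged(baseline, latest)) return { changed: true, snapshot: latest };
    deps.sleep(pollMs);
  }
  return { changed: false, snapshot: latest };
}
```

Comparing against a baseline captured before the action, rather than against a fixed expected element, is what lets this style of wait notice in-place updates.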

## Success Metrics
- Reduced missed async UI transitions
- Fewer retries caused by premature actions
- Higher wait success rate for dynamic UI flows
- Lower fallback usage to network/log checks

## Dependencies
Depends on:
- Priority 1 — Stronger State Verification
- Priority 2 — Richer Element Identity

Blocks or strengthens:
- Priority 5 — Better Compose / Custom Control Semantics
- Priority 7 — Action Trace Correlation

---

# Priority 4 — Long Press Gesture

## Why fourth
High utility, relatively low complexity.

Unlocks many currently awkward interactions:

- context menus
- hidden actions
- reorder handles
- press-and-hold controls

Broad usefulness.

## Deliver
New tool:

```text
long_press(element_id, duration_ms?)
```

Verification alignment:
- `expect_context_menu`
- `expect_press_effect`

## Expected Impact
High.

## Done Criteria
- `long_press` tool implemented across supported platforms
- Duration defaults and overrides supported
- Verification patterns for long press outcomes defined
- Included in action capability model
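
To make the "duration defaults and overrides" criterion concrete, here is a minimal sketch of a `long_press` definition and argument parser. It assumes tool definitions follow the usual MCP shape of a name, a description, and a JSON Schema `inputSchema`; the 500 ms default, the description text, and `parseLongPressArgs` are hypothetical and not taken from the package.

```typescript
// Hypothetical tool definition in the usual MCP shape (name + JSON Schema input).
const longPressTool = {
  name: "long_press",
  description: "Press and hold an element for a given duration.",
  inputSchema: {
    type: "object",
    properties: {
      element_id: { type: "string" },
      duration_ms: { type: "number", exclusiveMinimum: 0 },
    },
    required: ["element_id"],
  },
} as const;

interface LongPressArgs {
  element_id: string;
  duration_ms: number;
}

// Assumed default; the roadmap does not specify a value.
const DEFAULT_LONG_PRESS_MS = 500;

// Validate raw tool arguments and apply the default duration when none is given.
function parseLongPressArgs(raw: { element_id?: unknown; duration_ms?: unknown }): LongPressArgs {
  if (typeof raw.element_id !== "string" || raw.element_id.length === 0) {
    throw new Error("element_id is required");
  }
  if (raw.duration_ms === undefined) {
    return { element_id: raw.element_id, duration_ms: DEFAULT_LONG_PRESS_MS };
  }
  if (typeof raw.duration_ms !== "number" || !isFinite(raw.duration_ms) || raw.duration_ms <= 0) {
    throw new Error("duration_ms must be a positive number");
  }
  // Round to whole milliseconds before handing off to a platform driver.
  return { element_id: raw.element_id, duration_ms: Math.round(raw.duration_ms) };
}
```

Calling `parseLongPressArgs({ element_id: "menu_row" })` yields a 500 ms press under the assumed default, while an explicit `duration_ms` overrides it.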

## Success Metrics
- Increased hidden/control-surface interaction coverage
- Reduced dead-end interaction failures
- Long press task success rate tracked

## Dependencies
Depends on:
- Priority 2 — Richer Element Identity

Strengthens:
- Priority 5 — Better Compose / Custom Control Semantics (interaction contracts)

---

# Priority 5 — Better Compose / Custom Control Semantics

## Why fifth
Important, but strengthened by priorities 1–4 first.

Semantics become more useful once:
- identity is stronger
- verification is stronger
- gestures are richer
- synchronization is more reliable

## Deliver
- Composite control traits
- Control role enrichment (adjustable, expandable, selectable_group)
- Interaction contracts metadata
- Custom widget gesture affordance hints
- Semantic confidence annotations
- Compose-aware selectors for waits (merged semantics and element relationships)

## Expected Impact
High.

## Done Criteria
- Semantic traits implemented for major custom control classes
- Interaction contracts surfaced in snapshot model
- Confidence model defined for derived semantics
- Custom control manipulation success validated in benchmark flows

## Success Metrics
- Higher custom control interaction success rate
- Fewer retries on non-standard widgets
- Reduced semantic ambiguity failures

## Dependencies
Depends on:
- Priority 1 — Stronger State Verification
- Priority 2 — Richer Element Identity
- Priority 3 — Wait and Synchronization Reliability
- Priority 4 — Long Press

---

# Priority 6 — Pinch to Zoom

## Why sixth
Valuable, but narrower than long press.

Applies mainly to:
- maps
- images
- canvases
- zoomable custom surfaces

Useful, but less universal.

## Deliver

```text
pinch_to_zoom(target, scale, center?)
```

Verification:
- `expect_zoom_level`
- `expect_viewport_change`
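
The `scale` and optional `center` parameters could be turned into two finger paths along the lines of the sketch below. The `Bounds`-based representation of `target`, the horizontal finger axis, and `planPinch` itself are assumptions. How a given app maps finger spread to zoom level varies, so the result would still need to be checked with a verifier such as `expect_zoom_level`.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical on-screen rectangle of the target element, in pixels.
interface Bounds {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

interface FingerPath {
  start: Point;
  end: Point;
}

function boundsCenter(b: Bounds): Point {
  return { x: (b.left + b.right) / 2, y: (b.top + b.bottom) / 2 };
}

// Plan a symmetric two-finger pinch along the horizontal axis through `center`.
// scale > 1 spreads the fingers (zoom in); 0 < scale < 1 brings them together (zoom out).
function planPinch(
  target: Bounds,
  scale: number,
  center: Point = boundsCenter(target),
): [FingerPath, FingerPath] {
  if (!(scale > 0) || scale === 1) throw new Error("scale must be positive and not equal to 1");
  if (center.y < target.top || center.y > target.bottom) {
    throw new Error("center must lie inside the target");
  }
  // Largest horizontal radius that keeps both fingers inside the target.
  const maxRadius = Math.min(center.x - target.left, target.right - center.x);
  if (maxRadius <= 0) throw new Error("center must lie inside the target");
  // Pick radii whose ratio equals `scale` without exceeding maxRadius.
  const startRadius = scale > 1 ? maxRadius / scale : maxRadius;
  const endRadius = startRadius * scale;
  const at = (dx: number): Point => ({ x: center.x + dx, y: center.y });
  return [
    { start: at(-startRadius), end: at(-endRadius) },
    { start: at(startRadius), end: at(endRadius) },
  ];
}
```

For a 200×100 target and `scale` 2, the fingers start 50 px either side of the center and end at the left and right edges.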

## Expected Impact
Medium-high.

## Done Criteria
- `pinch_to_zoom` implemented
- Zoom in/out flows supported
- Verification primitives for viewport or zoom state available
- Gesture integrated into action model

## Success Metrics
- Successful execution across zoomable surfaces
- Reduced failures on map/image workflows
- Gesture success rate tracked

## Dependencies
Depends on:
- Priority 1 — Stronger State Verification
- Priority 2 — Richer Element Identity

---

# Priority 7 — Action Trace Correlation

## Why seventh
Very valuable for debugging, but less critical than improving control success first.

Improves diagnosis more than task completion.

## Deliver
- Action correlation metadata
- UI/network/log linkage

## Expected Impact
Medium-high.

## Done Criteria
- Action correlation model defined
- UI/network/log linkage captured for representative actions
- Correlation metadata exposed to agents
- Debugging workflows validated with trace linkage
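
One possible correlation model links each UI, network, or log event to the most recent action that started shortly before it. The record shapes, the time-window heuristic, and the 2000 ms default below are assumptions for illustration; the roadmap leaves the actual model undefined.

```typescript
type TraceSource = "ui" | "network" | "log";

// Hypothetical record of one tool invocation.
interface ActionRecord {
  actionId: string;
  tool: string;
  startedAt: number; // epoch milliseconds
}

interface TraceEvent {
  source: TraceSource;
  timestamp: number; // epoch milliseconds
  detail: string;
}

interface CorrelatedAction extends ActionRecord {
  events: TraceEvent[];
}

// Attach each event to the latest action that started at or before it,
// provided the event falls within windowMs of that action's start.
// Events outside every window are left unattributed.
function correlate(
  actions: ActionRecord[],
  events: TraceEvent[],
  windowMs = 2000,
): CorrelatedAction[] {
  const sorted = actions.slice().sort((a, b) => a.startedAt - b.startedAt);
  const result: CorrelatedAction[] = sorted.map((a) => ({
    actionId: a.actionId,
    tool: a.tool,
    startedAt: a.startedAt,
    events: [] as TraceEvent[],
  }));
  for (const ev of events) {
    let owner: CorrelatedAction | undefined;
    for (const candidate of result) {
      if (candidate.startedAt <= ev.timestamp) owner = candidate;
      else break;
    }
    if (owner !== undefined && ev.timestamp - owner.startedAt <= windowMs) {
      owner.events.push(ev);
    }
  }
  return result;
}
```

A purely time-based window can misattribute events when actions overlap; propagating an explicit action ID through requests and log lines would be more precise where the app allows it.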

## Success Metrics
- Lower time-to-root-cause
- Faster diagnosis of partial failures
- Improved action causality attribution

## Dependencies
Depends on:
- Priority 1 — Stronger State Verification
- Priority 2 — Richer Element Identity
- Priority 3 — Wait and Synchronization Reliability

---

# Delivery Waves

## Dependency Summary
Foundational sequence:

Layer 1 (Foundations)
- Priority 1
- Priority 2

Layer 2 (Synchronization)
- Priority 3 depends on 1,2

Layer 3 (Interaction Expansion)
- Priority 4 depends on 2
- Priority 5 depends on 1,2,3,4
- Priority 6 depends on 1,2

Layer 4 (Observability)
- Priority 7 depends on 1,2,3
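
The dependency summary can be checked mechanically. The sketch below encodes the dependencies exactly as listed and derives an execution order with a simple topological sort that always picks the lowest-numbered ready priority; the function names are illustrative and not part of the package.

```typescript
// Dependencies as listed in the summary: priority -> priorities it depends on.
const DEPENDS_ON: Record<number, number[]> = {
  1: [],
  2: [],
  3: [1, 2],
  4: [2],
  5: [1, 2, 3, 4],
  6: [1, 2],
  7: [1, 2, 3],
};

// Topological sort that picks the lowest-numbered priority whose
// dependencies are all complete; throws if the graph contains a cycle.
function executionOrder(deps: Record<number, number[]>): number[] {
  const pending = Object.keys(deps).map(Number).sort((a, b) => a - b);
  const done: number[] = [];
  while (pending.length > 0) {
    const next = pending.filter((p) =>
      (deps[p] || []).every((d) => done.indexOf(d) !== -1),
    )[0];
    if (next === undefined) throw new Error("dependency cycle detected");
    done.push(next);
    pending.splice(pending.indexOf(next), 1);
  }
  return done;
}

// True when every priority in `order` appears after all of its dependencies.
function respectsDependencies(order: number[], deps: Record<number, number[]>): boolean {
  const seen: Record<number, boolean> = {};
  for (const p of order) {
    for (const d of deps[p] || []) {
      if (!seen[d]) return false;
    }
    seen[p] = true;
  }
  return true;
}
```

Under these dependencies, `executionOrder(DEPENDS_ON)` yields priorities 1 through 7 in numeric order, so the numbered priority sequence never schedules a priority ahead of something it depends on.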

## Wave 1 (Immediate)
- Stronger State Verification
- Richer Element Identity
- Wait and Synchronization Reliability

Focus:
Make the core loop more reliable.

---

## Wave 2
- Long Press
- Better Compose / Custom Control Semantics

Focus:
Expand interaction capability.

---

## Wave 3
- Pinch to Zoom
- Action Trace Correlation

Focus:
Advanced gestures + observability.

---

# Priority Stack Summary

Execution Order:
1. Stronger State Verification
2. Richer Element Identity
3. Wait and Synchronization Reliability
4. Long Press
5. Better Compose / Custom Control Semantics
6. Pinch to Zoom
7. Action Trace Correlation

Rationale:
- Priorities 1–3 harden control, verification, and synchronization.
- Priorities 4–6 expand interaction capability.
- Priority 7 adds observability once control reliability matures.

---

## Explicitly Deferred
Still out of scope:

- Recovery planning logic
- Autonomous retry strategy
- MCP-level agent orchestration
- Autonomous recovery hinting (future consideration only)