mobile-debug-mcp 0.26.1 → 0.26.3

package/docs/ROADMAP.md CHANGED
@@ -4,11 +4,23 @@
4
4
 
5
5
  Ordered by:
6
6
 
7
+
7
8
  1. Impact on agent reliability
8
9
  2. Reduction in retries / brittleness
9
10
  3. Breadth of app coverage improved
10
11
  4. Implementation complexity vs payoff
11
12
 
13
+ ## Capability Status Definitions
14
+
15
+ - **Completed**
16
+ Capability implemented and considered part of the baseline platform.
17
+
18
+ - **Spec Ready**
19
+ Capability design or RFC is mature and implementation-ready, but not yet delivered.
20
+
21
+ - **Planned**
22
+ Capability is prioritized on the roadmap, but detailed specification and/or implementation work remains ahead.
23
+
12
24
  ## Program-Level Success Metrics
13
25
  Track roadmap impact across releases using:
14
26
 
@@ -28,21 +40,22 @@ Higher task success with fewer retries.
28
40
 
29
41
  # Roadmap Status Overview
30
42
 
31
- ## Completed Foundations
43
+ ## Completed Capabilities
32
44
 
33
- | Capability | Status | Notes |
34
- |-----------|--------|-------|
35
- | Stronger State Verification | Complete | Foundational verification layer shipped |
36
- | Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
45
+ - Stronger State Verification Complete (Foundational verification layer shipped)
46
+ - Richer Element Identity — Complete (Identity and selector confidence foundations shipped)
37
47
 
38
48
  ## Current Focus
39
49
 
40
50
  - Wait and Synchronization Reliability
51
+ - Actionability Resolution
41
52
 
42
53
  ## Upcoming Work
43
54
 
44
- - Long Press Gesture
55
+ - Adjustable Control Support
45
56
  - Better Compose / Custom Control Semantics
57
+ - Signal-Oriented Diagnostic Filtering
58
+ - Long Press Gesture
46
59
 
47
60
  ## Later Horizon
48
61
 
@@ -53,11 +66,10 @@ Higher task success with fewer retries.
53
66
 
54
67
  # Stronger State Verification
55
68
 
56
- ## Why first
69
+ ## Rationale
57
70
  Highest leverage improvement.
58
71
 
59
- **Status:** Completed
60
- **Priority:** P1
72
+ **Status:** Completed
61
73
 
62
74
  Most failures are not “can’t act,” they’re:
63
75
  - uncertain state
@@ -85,19 +97,18 @@ Very high.
85
97
 
86
98
  ## Dependencies
87
99
  Blocks or strengthens:
88
- - Priority 5 — Better Compose / Custom Control Semantics
89
- - Priority 6 — Pinch to Zoom verification
90
- - Priority 7 — Action Trace Correlation
100
+ - Better Compose / Custom Control Semantics
101
+ - Pinch to Zoom
102
+ - Action Trace Correlation
91
103
 
92
104
  ---
93
105
 
94
106
  # Richer Element Identity
95
107
 
96
- ## Why second
108
+ ## Rationale
97
109
  Directly reduces selector brittleness.
98
110
 
99
- **Status:** Completed
100
- **Priority:** P2
111
+ **Status:** Completed
101
112
 
102
113
  Improves:
103
114
  - targeting stability
@@ -125,19 +136,18 @@ Very high.
125
136
 
126
137
  ## Dependencies
127
138
  Blocks or strengthens:
128
- - Priority 4 — Long Press targeting reliability
129
- - Priority 5 — Better Compose / Custom Control Semantics
130
- - Priority 6 — Pinch to Zoom targeting
139
+ - Long Press Gesture
140
+ - Better Compose / Custom Control Semantics
141
+ - Pinch to Zoom
131
142
 
132
143
  ---
133
144
 
134
145
  # Wait and Synchronization Reliability
135
146
 
136
- ## Why third
147
+ ## Rationale
137
148
  Reliable async synchronization is foundational for agent success and should precede gesture expansion.
138
149
 
139
- **Status:** Spec Ready
140
- **Priority:** P3
150
+ **Status:** Spec Ready
141
151
 
142
152
  Addresses failures where agents:
143
153
  - skip UI waits after actions
@@ -150,6 +160,7 @@ Addresses failures where agents:
150
160
  - wait_for_ui_change (hierarchy diff based waiting)
151
161
  - Structured loading state detection
152
162
  - Snapshot revision / staleness metadata
163
+ - Focused snapshot views / incremental snapshot diffs
153
164
  - Compose-aware wait robustness improvements
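The hierarchy-diff waiting and snapshot revision items in this scope can be sketched roughly as follows. This is a minimal illustration only: the `Snapshot` shape, the node signature format, and the `waitForUiChange` helper are assumptions made for the example, since the real `wait_for_ui_change` signature is not shown in this diff.

```typescript
// Hypothetical snapshot shape: a revision counter plus one signature per node.
interface Snapshot {
  revision: number;
  nodes: string[]; // e.g. "id|text|enabled"; this format is an assumption
}

interface UiChange {
  changed: boolean;
  added: string[];
  removed: string[];
  revision: number;
}

// Diff two snapshots by node signature.
function diffSnapshots(before: Snapshot, after: Snapshot): UiChange {
  const beforeSet = new Set(before.nodes);
  const afterSet = new Set(after.nodes);
  const added = after.nodes.filter((n) => !beforeSet.has(n));
  const removed = before.nodes.filter((n) => !afterSet.has(n));
  return {
    changed: added.length > 0 || removed.length > 0,
    added,
    removed,
    revision: after.revision,
  };
}

// Poll until the hierarchy differs from the baseline or the poll budget runs out.
// A real implementation would wait between polls; that is omitted here.
function waitForUiChange(
  baseline: Snapshot,
  takeSnapshot: () => Snapshot,
  maxPolls: number
): UiChange {
  let last: UiChange = { changed: false, added: [], removed: [], revision: baseline.revision };
  for (let i = 0; i < maxPolls; i++) {
    last = diffSnapshots(baseline, takeSnapshot());
    if (last.changed) return last;
  }
  return last;
}
```

Returning the revision with the diff lets a caller tell whether a later decision was made against a stale snapshot.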
154
165
 
155
166
  ## Expected Impact
@@ -159,6 +170,7 @@ Very high.
159
170
  - wait_for_ui_change implemented
160
171
  - Loading state detection available for representative controls
161
172
  - Snapshot revision or staleness metadata exposed
173
+ - Focused or diff-oriented snapshots validated in benchmark flows
162
174
  - UI-first sync guidance added to spec guardrails
163
175
  - In-place update waits validated on benchmark flows
164
176
 
@@ -170,22 +182,158 @@ Very high.
170
182
 
171
183
  ## Dependencies
172
184
  Depends on:
173
- - Priority 1 — Stronger State Verification
174
- - Priority 2 — Richer Element Identity
185
+ - Stronger State Verification
186
+ - Richer Element Identity
187
+
188
+ Blocks or strengthens:
189
+ - Better Compose / Custom Control Semantics
190
+ - Action Trace Correlation
191
+
192
+ ---
193
+
194
+ # Actionability Resolution
195
+
196
+ ## Rationale
197
+ Reduces failures caused by interacting with discoverable but non-actionable UI nodes.
198
+
199
+ **Status:** Planned
200
+
201
+ Addresses cases where:
202
+ - visible text is not the true click target
203
+ - child nodes differ from actionable containers
204
+ - affordance exists but handler ownership is ambiguous
205
+
206
+ ## Scope
207
+ - Actionable container resolution
208
+ - Executable-target preference rules
209
+ - Actionability confidence metadata
210
+ - Post-action state verification integration
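One way to picture actionable container resolution with confidence metadata is a walk from the matched node up to the nearest executable ancestor. The sketch below assumes a hypothetical `UiNode` shape and an arbitrary confidence penalty per hop; neither comes from the package.

```typescript
// Hypothetical UI node shape; field names are assumptions for illustration.
interface UiNode {
  id: string;
  clickable: boolean;
  enabled: boolean;
  parent?: UiNode;
}

interface ResolvedTarget {
  node: UiNode;
  confidence: number; // 1.0 when the matched node itself is actionable
  hops: number; // ancestors walked to reach the actionable node
}

// Walk up from the matched node (often a text leaf) to the nearest
// clickable, enabled ancestor, lowering confidence with each hop.
function resolveActionableTarget(matched: UiNode, maxHops = 3): ResolvedTarget | null {
  let current: UiNode | undefined = matched;
  for (let hops = 0; current !== undefined && hops <= maxHops; hops++) {
    if (current.clickable && current.enabled) {
      return { node: current, confidence: 1 - hops * 0.2, hops };
    }
    current = current.parent;
  }
  return null; // nothing actionable within range
}
```

Preferring the container over the leaf addresses the case where visible text is not the true click target.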
211
+
212
+ ## Expected Impact
213
+ High.
214
+
215
+ ## Exit Criteria
216
+ - Actionable target resolution implemented
217
+ - Preference rules defined for executable containers over leaf nodes
218
+ - Actionability confidence surfaced
219
+ - Benchmark flows show reduced false taps and submit ambiguity
220
+
221
+ ## Success Metrics
222
+ - Reduced mis-targeted action failures
223
+ - Lower retarget retries
224
+ - Higher first-attempt action success
225
+
226
+ ## Dependencies
227
+ Depends on:
228
+ - Stronger State Verification
229
+ - Richer Element Identity
230
+ - Wait and Synchronization Reliability
231
+
232
+ Blocks or strengthens:
233
+ - Adjustable Control Support
234
+ - Better Compose / Custom Control Semantics
235
+
236
+ ---
237
+
238
+ # Adjustable Control Support
239
+
240
+ ## Rationale
241
+ High-leverage improvement for sliders and parameterized controls.
242
+
243
+ **Status:** Planned
244
+
245
+ Addresses friction around:
246
+ - coordinate-calibrated slider interaction
247
+ - snapping and quantized controls
248
+ - weak state confirmation after adjustment
249
+
250
+ ## Scope
251
+ New semantic control support:
252
+
253
+ ```text
254
+ set_slider_value(target, value, tolerance?)
255
+ ```
256
+
257
+ Includes:
258
+ - semantic adjustable control manipulation
259
+ - read-back verification loop
260
+ - tolerance-aware value setting
261
+ - fallback coordinate calibration only when needed
262
+
263
+ ## Expected Impact
264
+ High.
265
+
266
+ ## Exit Criteria
267
+ - Adjustable control primitive implemented
268
+ - Verification loop reads and confirms resulting values
269
+ - Tolerance model defined
270
+ - Benchmark slider/custom control flows validated
271
+
272
+ ## Success Metrics
273
+ - Higher custom control interaction success rate
274
+ - Fewer retries adjusting controls
275
+ - Reduced coordinate-guessing failures
276
+
277
+ ## Dependencies
278
+ Depends on:
279
+ - Stronger State Verification
280
+ - Richer Element Identity
281
+ - Actionability Resolution
175
282
 
176
283
  Blocks or strengthens:
177
- - Priority 5 — Better Compose / Custom Control Semantics
178
- - Priority 7 — Action Trace Correlation
284
+ - Better Compose / Custom Control Semantics
285
+ - Pinch to Zoom
286
+
287
+ ---
288
+
289
+ # Signal-Oriented Diagnostic Filtering
290
+
291
+ ## Rationale
292
+ Improves observability by separating causal signals from diagnostic noise.
293
+
294
+ **Status:** Planned
295
+
296
+ Addresses friction from:
297
+ - noisy log streams
298
+ - weak signal extraction
299
+ - difficult action-to-signal attribution
300
+
301
+ ## Scope
302
+ - Structured diagnostic classification
303
+ - Noise filtering heuristics
304
+ - Signal relevance scoring
305
+ - App vs system event tagging
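Relevance scoring and app vs system tagging could look something like the sketch below. The event shape, the weights, and the 2000 ms window are all assumptions chosen for illustration, not values defined by the roadmap.

```typescript
// Hypothetical log event shape; the fields are assumptions, not the package's schema.
interface DiagnosticEvent {
  timestampMs: number;
  source: "app" | "system";
  level: "debug" | "info" | "warn" | "error";
  message: string;
}

interface ScoredEvent extends DiagnosticEvent {
  relevance: number;
}

const levelWeight: Record<DiagnosticEvent["level"], number> = {
  debug: 0,
  info: 0.1,
  warn: 0.3,
  error: 0.5,
};

// Score each event by origin, severity, and closeness in time to the action.
function scoreEvents(events: DiagnosticEvent[], actionAtMs: number, windowMs = 2000): ScoredEvent[] {
  return events.map((e) => {
    const distance = Math.abs(e.timestampMs - actionAtMs);
    const proximity = distance <= windowMs ? 0.3 * (1 - distance / windowMs) : 0;
    const origin = e.source === "app" ? 0.2 : 0;
    return { ...e, relevance: origin + levelWeight[e.level] + proximity };
  });
}

// Keep only events at or above the threshold, most relevant first.
function filterSignals(events: DiagnosticEvent[], actionAtMs: number, minRelevance = 0.5): ScoredEvent[] {
  return scoreEvents(events, actionAtMs)
    .filter((e) => e.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance);
}
```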
306
+
307
+ ## Expected Impact
308
+ High.
309
+
310
+ ## Exit Criteria
311
+ - Diagnostic signal classification model defined
312
+ - Noise filtering available in representative flows
313
+ - Relevant action-linked signals surfaced separately from background noise
314
+ - Debug workflows validated with filtered signals
315
+
316
+ ## Success Metrics
317
+ - Lower time-to-root-cause
318
+ - Faster identification of relevant action signals
319
+ - Reduced diagnostic ambiguity
320
+
321
+ ## Dependencies
322
+ Depends on:
323
+ - Stronger State Verification
324
+ - Wait and Synchronization Reliability
325
+
326
+ Strengthens:
327
+ - Action Trace Correlation
179
328
 
180
329
  ---
181
330
 
182
331
  # Long Press Gesture
183
332
 
184
- ## Why fourth
333
+ ## Rationale
185
334
  High utility, relatively low complexity.
186
335
 
187
- **Status:** Planned
188
- **Priority:** P4
336
+ **Status:** Planned
189
337
 
190
338
  Unlocks many currently awkward interactions:
191
339
 
@@ -223,26 +371,26 @@ High.
223
371
 
224
372
  ## Dependencies
225
373
  Depends on:
226
- - Priority 2 — Richer Element Identity
374
+ - Richer Element Identity
227
375
 
228
376
  Strengthens:
229
- - Priority 5 semantics interaction contracts
377
+ - Better Compose / Custom Control Semantics
230
378
 
231
379
  ---
232
380
 
233
381
  # Better Compose / Custom Control Semantics
234
382
 
235
- ## Why fifth
236
- Important, but strengthened by priorities 1–4 first.
383
+ ## Rationale
384
+ Raised in priority after agent feedback exposed custom control semantics as a core reliability gap rather than a later optimization.
237
385
 
238
- **Status:** Planned
239
- **Priority:** P5
386
+ **Status:** Spec Ready
240
387
 
241
388
  Semantics become more useful once:
242
389
  - identity is stronger
243
390
  - verification is stronger
244
391
  - gestures are richer
245
392
  - synchronization is more reliable
393
+ - action execution is more precise
246
394
 
247
395
  ## Scope
248
396
  - Composite control traits
@@ -268,20 +416,21 @@ High.
268
416
 
269
417
  ## Dependencies
270
418
  Depends on:
271
- - Priority 1 — Stronger State Verification
272
- - Priority 2 — Richer Element Identity
273
- - Priority 3 — Wait and Synchronization Reliability
274
- - Priority 4 — Long Press
419
+ - Stronger State Verification
420
+ - Richer Element Identity
421
+ - Wait and Synchronization Reliability
422
+ - Actionability Resolution
423
+ - Adjustable Control Support
424
+ - Long Press Gesture
275
425
 
276
426
  ---
277
427
 
278
428
  # Pinch to Zoom
279
429
 
280
- ## Why sixth
430
+ ## Rationale
281
431
  Valuable, but narrower than long press.
282
432
 
283
- **Status:** Planned
284
- **Priority:** P6
433
+ **Status:** Planned
285
434
 
286
435
  Applies mainly to:
287
436
  - maps
@@ -317,19 +466,18 @@ Medium-high.
317
466
 
318
467
  ## Dependencies
319
468
  Depends on:
320
- - Priority 1 — Stronger State Verification
321
- - Priority 2 — Richer Element Identity
469
+ - Stronger State Verification
470
+ - Richer Element Identity
322
471
 
323
472
  ---
324
473
 
325
474
  # Action Trace Correlation
326
475
 
327
- ## Why seventh
476
+ ## Rationale
328
477
  Very valuable for debugging,
329
478
  but less critical than improving control success first.
330
479
 
331
- **Status:** Planned
332
- **Priority:** P7
480
+ **Status:** Planned
333
481
 
334
482
  Improves diagnosis more than task completion.
335
483
 
@@ -353,75 +501,93 @@ Medium-high.
353
501
 
354
502
  ## Dependencies
355
503
  Depends on:
356
- - Priority 1 — Stronger State Verification
357
- - Priority 2 — Richer Element Identity
358
- - Priority 3 — Wait and Synchronization Reliability
504
+ - Stronger State Verification
505
+ - Richer Element Identity
506
+ - Wait and Synchronization Reliability
359
507
 
360
508
  ---
361
509
 
362
510
  # Roadmap Sequence
363
511
 
364
512
  ## Dependency Summary
365
- Foundational sequence:
366
513
 
367
- Layer 1 (Foundations)
368
- - Priority 1
369
- - Priority 2
514
+ Foundation
515
+ - Stronger State Verification
516
+ - Richer Element Identity
517
+
518
+ Synchronization & Actionability
519
+ - Wait and Synchronization Reliability
520
+ - Actionability Resolution
370
521
 
371
- Layer 2 (Synchronization)
372
- - Priority 3 depends on 1,2
522
+ Control Precision & Observability
523
+ - Adjustable Control Support
524
+ - Signal-Oriented Diagnostic Filtering
373
525
 
374
- Layer 3 (Interaction Expansion)
375
- - Priority 4 depends on 2
376
- - Priority 5 depends on 1,2,3,4
377
- - Priority 6 depends on 1,2
526
+ Interaction Expansion
527
+ - Long Press Gesture
528
+ - Better Compose / Custom Control Semantics
529
+ - Pinch to Zoom
378
530
 
379
- Layer 4 (Observability)
380
- - Priority 7 depends on 1,2,3
531
+ Deep Observability
532
+ - Action Trace Correlation
381
533
 
382
534
  ## Wave 1 (Current Focus)
383
535
  - Stronger State Verification
384
536
  - Richer Element Identity
385
537
  - Wait and Synchronization Reliability
538
+ - Actionability Resolution
386
539
 
387
540
  Focus:
388
541
  Make core loop more reliable.
389
542
 
390
543
  ---
391
544
 
392
- ## Wave 2 (Expansion)
393
- - Long Press
394
- - Better Compose Semantics
545
+ ## Wave 2 (Control Precision + Diagnostics)
546
+ - Adjustable Control Support
547
+ - Better Compose / Custom Control Semantics
548
+ - Signal-Oriented Diagnostic Filtering
549
+
550
+ Focus:
551
+ Improve control precision, custom control semantics, and signal observability.
552
+
553
+ ---
554
+
555
+ ## Wave 3 (Interaction Expansion)
556
+ - Long Press Gesture
395
557
 
396
558
  Focus:
397
- Expand interaction capability.
559
+ Expand interaction capability after core control reliability is improved.
398
560
 
399
561
  ---
400
562
 
401
- ## Wave 3 (Advanced)
563
+ ## Wave 4 (Advanced Gestures + Deep Observability)
402
564
  - Pinch to Zoom
403
565
  - Action Trace Correlation
404
566
 
405
567
  Focus:
406
- Advanced gestures + observability.
568
+ Advanced gestures + deep observability.
407
569
 
408
570
  ---
409
571
 
410
- # Capability Sequence
572
+ # Roadmap Ordering
411
573
 
412
- Execution Order:
574
+ Roadmap Ordering:
413
575
  1. Stronger State Verification
414
576
  2. Richer Element Identity
415
577
  3. Wait and Synchronization Reliability
416
- 4. Long Press
417
- 5. Better Compose / Custom Control Semantics
418
- 6. Pinch to Zoom
419
- 7. Action Trace Correlation
578
+ 4. Actionability Resolution
579
+ 5. Adjustable Control Support
580
+ 6. Better Compose / Custom Control Semantics
581
+ 7. Signal-Oriented Diagnostic Filtering
582
+ 8. Long Press Gesture
583
+ 9. Pinch to Zoom
584
+ 10. Action Trace Correlation
420
585
 
421
586
  Rationale:
422
- - Priorities 1–3 harden control, verification, and synchronization.
423
- - Priorities 4–6 expand interaction capability.
424
- - Priority 7 adds observability once control reliability matures.
587
+ - Early roadmap items harden state, targeting, synchronization, and action execution.
588
+ - Mid roadmap items improve control precision and signal observability.
589
+ - Later interaction-focused items expand interaction coverage.
590
+ - Final observability work deepens debugging through action trace correlation.
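The ordering can be checked mechanically against the per-capability "Depends on" lists. The sketch below copies the ordering and dependency lists as they appear in this diff; the `Dependency` type and `findOrderingViolations` helper are illustrative names, not part of the package.

```typescript
interface Dependency {
  item: string;
  needs: string[];
}

const SSV = "Stronger State Verification";
const REI = "Richer Element Identity";
const WAIT = "Wait and Synchronization Reliability";
const ACT = "Actionability Resolution";
const ADJ = "Adjustable Control Support";
const SEM = "Better Compose / Custom Control Semantics";
const SIG = "Signal-Oriented Diagnostic Filtering";
const LP = "Long Press Gesture";
const PINCH = "Pinch to Zoom";
const TRACE = "Action Trace Correlation";

// Roadmap ordering, items 1 to 10.
const ordering: string[] = [SSV, REI, WAIT, ACT, ADJ, SEM, SIG, LP, PINCH, TRACE];

// "Depends on" lists from each capability section.
const dependencies: Dependency[] = [
  { item: WAIT, needs: [SSV, REI] },
  { item: ACT, needs: [SSV, REI, WAIT] },
  { item: ADJ, needs: [SSV, REI, ACT] },
  { item: SIG, needs: [SSV, WAIT] },
  { item: LP, needs: [REI] },
  { item: SEM, needs: [SSV, REI, WAIT, ACT, ADJ, LP] },
  { item: PINCH, needs: [SSV, REI] },
  { item: TRACE, needs: [SSV, REI, WAIT] },
];

// Report every dependency that is missing from the ordering or ordered
// after the item that needs it.
function findOrderingViolations(order: string[], deps: Dependency[]): string[] {
  const violations: string[] = [];
  for (const dep of deps) {
    const itemPos = order.indexOf(dep.item);
    for (const need of dep.needs) {
      const needPos = order.indexOf(need);
      if (needPos === -1 || needPos > itemPos) {
        violations.push(`${dep.item} -> ${need}`);
      }
    }
  }
  return violations;
}

console.log(findOrderingViolations(ordering, dependencies));
```

Run against these lists, the check reports a single pair: Better Compose / Custom Control Semantics lists Long Press Gesture as a dependency but is ordered ahead of it.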
425
591
 
426
592
  ---
427
593
 
@@ -0,0 +1,216 @@
1
+ # RFC 005 — Unified Action Execution and Verification Model
2
+
3
+ ## 1. Summary
4
+
5
+ This RFC defines a unified execution and verification model for all agent-driven UI actions.
6
+
7
+ It standardises:
8
+ - how actions are resolved
9
+ - how they are executed
10
+ - how outcomes are verified
11
+ - how failures are classified
12
+ - how observability signals are emitted
13
+
14
+ The goal is to eliminate inconsistent per-feature execution logic and establish a single deterministic lifecycle for all UI interactions.
15
+
16
+ ---
17
+
18
+ ## 2. Problem Statement
19
+
20
+ Current execution paths are fragmented across interaction types:
21
+
22
+ - Tap / click actions rely on implicit success assumptions
23
+ - Control adjustments (sliders, inputs) use ad-hoc verification logic
24
+ - Gesture actions lack consistent post-execution validation
25
+ - Action success is often inferred from indirect UI changes or logs
26
+
27
+ This leads to:
28
+ - ambiguous success states
29
+ - inconsistent retries
30
+ - weak failure classification
31
+ - poor observability signal quality
32
+
33
+ ---
34
+
35
+ ## 3. Design Goals
36
+
37
+ The model must:
38
+
39
+ - Provide a single lifecycle for all actions
40
+ - Separate target resolution from execution
41
+ - Require explicit verification of state change
42
+ - Standardise failure classification
43
+ - Integrate with observability systems cleanly
44
+ - Support both simple and parameterised actions
45
+
46
+ ---
47
+
48
+ ## 4. Action Lifecycle
49
+
50
+ Every action MUST enter the following states in order, terminating in either Verified or Failed:
51
+
52
+ 1. Resolved
53
+ - A target has been identified via Actionability Resolution
54
+ - The target is executable (not just visible)
55
+
56
+ 2. Dispatched
57
+ - The action has been issued to the runtime layer
58
+
59
+ 3. Pending Verification
60
+ - Waiting for expected UI or state change
61
+
62
+ 4. Verified
63
+ - Expected outcome confirmed
64
+
65
+ 5. Failed
66
+ - Verification did not succeed within constraints
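The lifecycle above can be read as a small state machine. The sketch below is one possible encoding: the state identifiers are invented for the example, and allowing `failed` from every non-terminal state is an assumption, since the RFC lists the states but not the permitted edges.

```typescript
type LifecycleState =
  | "resolved"
  | "dispatched"
  | "pending_verification"
  | "verified"
  | "failed";

// Allowed transitions (assumed; the RFC does not enumerate them).
const transitions: Record<LifecycleState, LifecycleState[]> = {
  resolved: ["dispatched", "failed"],
  dispatched: ["pending_verification", "failed"],
  pending_verification: ["verified", "failed"],
  verified: [], // terminal
  failed: [], // terminal
};

class ActionLifecycle {
  private current: LifecycleState = "resolved";
  readonly history: LifecycleState[] = ["resolved"];

  get state(): LifecycleState {
    return this.current;
  }

  // Move to the next state, rejecting skipped or backwards transitions.
  advance(next: LifecycleState): void {
    if (transitions[this.current].indexOf(next) === -1) {
      throw new Error(`Illegal transition: ${this.current} -> ${next}`);
    }
    this.current = next;
    this.history.push(next);
  }
}
```

Because `verified` can only be reached from `pending_verification`, an action cannot be marked successful without going through a verification step.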
67
+
68
+ ---
69
+
70
+ ## 5. Action Types
71
+
72
+ All actions are categorised into canonical types:
73
+
74
+ - Navigation
75
+ - Input
76
+ - Selection
77
+ - Gesture
78
+ - Control Adjustment
79
+
80
+ Each type may have type-specific execution adapters but MUST conform to the same lifecycle.
81
+
82
+ ---
83
+
84
+ ## 6. Execution Contract
85
+
86
+ All actions MUST define:
87
+
88
+ ### 6.1 Target
89
+ A resolved executable entity (not a UI label or text node)
90
+
91
+ ### 6.2 Intent
92
+ The intended effect of the action
93
+
94
+ ### 6.3 Expected State Delta
95
+ What must change in the UI or application state
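As a rough illustration, the three contract fields could be carried in a single typed record. `ActionContract`, `contractProblems`, and the checkbox example are hypothetical names introduced here; the RFC defers the concrete schema to the binding layer.

```typescript
// Hypothetical contract shape covering Target, Intent, and Expected State Delta.
interface ActionContract<S> {
  target: { id: string; executable: boolean };
  intent: string;
  expectedDelta: (before: S, after: S) => boolean;
}

// List the reasons a contract is incomplete; an empty list means it is usable.
function contractProblems<S>(contract: ActionContract<S>): string[] {
  const problems: string[] = [];
  if (!contract.target.executable) problems.push("target is not executable");
  if (contract.intent.trim() === "") problems.push("intent is empty");
  return problems;
}

// Example: checking a consent box must flip `checked` from false to true.
interface CheckboxState {
  checked: boolean;
}

const checkConsent: ActionContract<CheckboxState> = {
  target: { id: "consent_checkbox", executable: true },
  intent: "check the consent box",
  expectedDelta: (before, after) => !before.checked && after.checked,
};
```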
96
+
97
+ ---
98
+
99
+ ## 7. Verification Model
100
+
101
+ Verification MUST be explicit and deterministic.
102
+
103
+ ### 7.1 Verification Sources
104
+ At least one must be used:
105
+
106
+ - UI state diff
107
+ - element property change
108
+ - navigation change
109
+ - value update (for controls)
110
+
111
+ ### 7.2 Timeout Behaviour
112
+ - Each action defines a verification window
113
+ - Failure occurs if no valid state delta is observed in time
114
+
115
+ ### 7.3 No Implicit Success
116
+ Actions MUST NOT be considered successful without explicit verification.
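A minimal sketch of the verification rules follows, assuming hypothetical names throughout (`Check`, `verifyAction`, and the source identifiers). To stay synchronous, the window is consumed by a simulated poll interval; a real implementation would wait between polls.

```typescript
type VerificationSource =
  | "ui_state_diff"
  | "element_property_change"
  | "navigation_change"
  | "value_update";

interface Check {
  source: VerificationSource;
  observed: () => boolean; // true once the expected state delta is visible
}

type VerificationResult =
  | { status: "verified"; source: VerificationSource; elapsedMs: number }
  | { status: "failed"; reason: "verification_timeout"; elapsedMs: number };

function verifyAction(checks: Check[], windowMs: number, pollIntervalMs: number): VerificationResult {
  // 7.1: at least one verification source must be used.
  if (checks.length === 0) throw new Error("At least one verification source is required");
  if (pollIntervalMs <= 0) throw new Error("pollIntervalMs must be positive");

  let elapsedMs = 0;
  while (elapsedMs <= windowMs) {
    for (const check of checks) {
      if (check.observed()) {
        return { status: "verified", source: check.source, elapsedMs };
      }
    }
    elapsedMs += pollIntervalMs; // a real implementation would wait here
  }
  // 7.2 and 7.3: no delta inside the window is a failure, never implicit success.
  return { status: "failed", reason: "verification_timeout", elapsedMs: windowMs };
}
```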
117
+
118
+ ---
119
+
120
+ ## 8. Actionability Integration
121
+
122
+ This model depends on Actionability Resolution:
123
+
124
+ - Only resolved executable targets may be executed
125
+ - Visible but non-actionable nodes are invalid targets
126
+ - Execution is blocked if confidence is below threshold
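These three rules amount to a gate in front of dispatch. In the sketch below the `ResolutionResult` shape and the default threshold of 0.7 are assumptions; the RFC does not specify a threshold value.

```typescript
// Hypothetical output of Actionability Resolution.
interface ResolutionResult {
  targetId: string;
  actionable: boolean;
  confidence: number; // 0..1
}

type GateDecision =
  | { allowed: true }
  | { allowed: false; reason: "not_actionable" | "low_confidence" };

// Decide whether a resolved target may be executed.
function gateExecution(result: ResolutionResult, threshold = 0.7): GateDecision {
  if (!result.actionable) return { allowed: false, reason: "not_actionable" };
  if (result.confidence < threshold) return { allowed: false, reason: "low_confidence" };
  return { allowed: true };
}
```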
127
+
128
+ ---
129
+
130
+ ## 9. Control Adjustment Model
131
+
132
+ Control actions (sliders, inputs) are treated as parameterised actions:
133
+
134
+ Example:
135
+
136
+ set_slider_value(target, value, tolerance)
137
+
138
+ Must include:
139
+ - pre-state value
140
+ - post-state verification
141
+ - tolerance-aware validation
142
+
143
+ Fallback to coordinate-based interaction is allowed only if semantic control resolution fails.
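The read, set, read-back loop described here can be sketched as below. `SliderDriver` is a hypothetical stand-in for the resolved `target`, and the function is an illustration of the pattern rather than the package's `set_slider_value` implementation.

```typescript
// Hypothetical driver for one resolved slider; method names are assumptions.
interface SliderDriver {
  read(): number;
  setSemantic(value: number): boolean; // false when semantic resolution fails
  setByCoordinates(value: number): void;
}

interface SliderResult {
  preValue: number;
  postValue: number;
  verified: boolean;
  usedFallback: boolean;
}

function setSliderValue(driver: SliderDriver, value: number, tolerance = 0): SliderResult {
  const preValue = driver.read(); // pre-state value

  // Prefer the semantic path; use coordinates only if it fails.
  let usedFallback = false;
  if (!driver.setSemantic(value)) {
    driver.setByCoordinates(value);
    usedFallback = true;
  }

  // Post-state verification with tolerance-aware comparison.
  const postValue = driver.read();
  const verified = Math.abs(postValue - value) <= tolerance;
  return { preValue, postValue, verified, usedFallback };
}
```

The tolerance matters for snapping or quantized controls, where the control may settle on the nearest step instead of the requested value.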
144
+
145
+ ---
146
+
147
+ ## 10. Observability Hooks
148
+
149
+ Each action emits structured signals:
150
+
151
+ - action_id
152
+ - target_id
153
+ - action_type
154
+ - lifecycle_state transitions
155
+ - verification result
156
+ - failure reason (if applicable)
157
+
158
+ These signals feed:
159
+ - Signal-Oriented Diagnostic Filtering
160
+ - Action Trace Correlation
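For illustration, the listed fields could travel in a single record emitted on each lifecycle transition. The field names below follow the list above where it gives them; the `ActionType` literals, the optional fields, and the `SignalRecorder` class are assumptions.

```typescript
// Literal values are assumed; the RFC defers to the runtime `action_type` contract.
type ActionType = "navigation" | "input" | "selection" | "gesture" | "control_adjustment";

interface ActionSignal {
  action_id: string;
  target_id: string;
  action_type: ActionType;
  lifecycle_state: string;
  verification_result?: "verified" | "failed";
  failure_reason?: string;
}

// In-memory sink standing in for whatever consumes the signals downstream.
class SignalRecorder {
  private readonly signals: ActionSignal[] = [];

  emit(signal: ActionSignal): void {
    this.signals.push(signal);
  }

  // Ordered lifecycle states recorded for one action.
  trace(actionId: string): string[] {
    return this.signals
      .filter((s) => s.action_id === actionId)
      .map((s) => s.lifecycle_state);
  }
}
```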
161
+
162
+ ---
163
+
164
+ ## 11. Failure Classification
165
+
166
+ Failures MUST be categorised:
167
+
168
+ - Target resolution failure
169
+ - Dispatch failure
170
+ - Verification timeout
171
+ - Unexpected state delta
172
+ - No state change observed
173
+
174
+ This enables consistent debugging and telemetry.
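A classifier over these categories might look like the sketch below. The `Outcome` fields are hypothetical, and the rule separating "verification timeout" from "no state change observed" is an assumption, since the RFC names both categories without defining the boundary.

```typescript
type FailureClass =
  | "target_resolution_failure"
  | "dispatch_failure"
  | "verification_timeout"
  | "unexpected_state_delta"
  | "no_state_change";

// Hypothetical summary of what happened to one action.
interface Outcome {
  resolved: boolean;
  dispatched: boolean;
  stateChanged: boolean;
  deltaMatchesExpectation: boolean;
  timedOut: boolean;
}

// Return the failure class, or null when the action was verified.
function classifyFailure(outcome: Outcome): FailureClass | null {
  if (!outcome.resolved) return "target_resolution_failure";
  if (!outcome.dispatched) return "dispatch_failure";
  if (outcome.stateChanged) {
    return outcome.deltaMatchesExpectation ? null : "unexpected_state_delta";
  }
  // Assumed rule: nothing changed and the window expired is a timeout;
  // nothing changed and verification ended early is "no state change".
  return outcome.timedOut ? "verification_timeout" : "no_state_change";
}
```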
175
+
176
+ ---
177
+
178
+ ## 12. Relationship to Existing Roadmap
179
+
180
+ This RFC provides the foundation for:
181
+
182
+ - Actionability Resolution (#4)
183
+ - Adjustable Control Support (#5)
184
+ - Signal-Oriented Diagnostic Filtering (#7)
185
+
186
+ It defines the shared execution substrate those capabilities plug into.
187
+
188
+ ---
189
+
190
+ ## 13. Scope Boundary
191
+
192
+ This RFC defines the execution model and lifecycle semantics for agent-driven UI actions.
193
+
194
+ - Action types referenced in this RFC correspond to the existing runtime `action_type` contract and do not redefine or extend the underlying taxonomy
195
+ - Lifecycle signals described in this RFC are emitted by the runtime execution layer (defined in RFC 006), not by this specification directly
196
+
197
+ It does NOT define:
198
+ - runtime instrumentation details
199
+ - how lifecycle states are emitted in code
200
+ - mapping to specific source modules (e.g. src/server, src/interact)
201
+ - tool schema implementation details
202
+ - mapping between semantic action categories and runtime implementation modules (this is defined in RFC 006)
203
+
204
+ Those concerns are delegated to a separate binding-layer RFC which defines how this model is implemented in the current system.
205
+
206
+ ---
207
+
208
+ ## 14. Summary
209
+
210
+ This model enforces a single, verifiable lifecycle for all UI actions.
211
+
212
+ It ensures:
213
+ - deterministic execution
214
+ - explicit verification
215
+ - consistent failure handling
216
+ - unified observability