mobile-debug-mcp 0.26.1 → 0.26.2

package/docs/ROADMAP.md CHANGED
@@ -4,11 +4,23 @@
4
4
 
5
5
  Ordered by:
6
6
 
7
+
7
8
  1. Impact on agent reliability
8
9
  2. Reduction in retries / brittleness
9
10
  3. Breadth of app coverage improved
10
11
  4. Implementation complexity vs payoff
11
12
 
13
+ ## Capability Status Definitions
14
+
15
+ - **Completed**
16
+ Capability implemented and considered part of the baseline platform.
17
+
18
+ - **Spec Ready**
19
+ Capability design or RFC is mature and implementation-ready, but not yet delivered.
20
+
21
+ - **Planned**
22
+ Capability is prioritized on the roadmap, but detailed specification and/or implementation work remains ahead.
23
+
12
24
  ## Program-Level Success Metrics
13
25
  Track roadmap impact across releases using:
14
26
 
@@ -28,19 +40,20 @@ Higher task success with fewer retries.
28
40
 
29
41
  # Roadmap Status Overview
30
42
 
31
- ## Completed Foundations
43
+ ## Completed Capabilities
32
44
 
33
- | Capability | Status | Notes |
34
- |-----------|--------|-------|
35
- | Stronger State Verification | Complete | Foundational verification layer shipped |
36
- | Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
45
+ - Stronger State Verification: Complete (Foundational verification layer shipped)
46
+ - Richer Element Identity: Complete (Identity and selector confidence foundations shipped)
37
47
 
38
48
  ## Current Focus
39
49
 
40
50
  - Wait and Synchronization Reliability
51
+ - Actionability Resolution
41
52
 
42
53
  ## Upcoming Work
43
54
 
55
+ - Adjustable Control Support
56
+ - Signal-Oriented Diagnostic Filtering
44
57
  - Long Press Gesture
45
58
  - Better Compose / Custom Control Semantics
46
59
 
@@ -53,11 +66,10 @@ Higher task success with fewer retries.
53
66
 
54
67
  # Stronger State Verification
55
68
 
56
- ## Why first
69
+ ## Rationale
57
70
  Highest leverage improvement.
58
71
 
59
- **Status:** Completed
60
- **Priority:** P1
72
+ **Status:** Completed
61
73
 
62
74
  Most failures are not “can’t act,” they’re:
63
75
  - uncertain state
@@ -85,19 +97,18 @@ Very high.
85
97
 
86
98
  ## Dependencies
87
99
  Blocks or strengthens:
88
- - Priority 5 — Better Compose / Custom Control Semantics
89
- - Priority 6 — Pinch to Zoom verification
90
- - Priority 7 — Action Trace Correlation
100
+ - Better Compose / Custom Control Semantics
101
+ - Pinch to Zoom
102
+ - Action Trace Correlation
91
103
 
92
104
  ---
93
105
 
94
106
  # Richer Element Identity
95
107
 
96
- ## Why second
108
+ ## Rationale
97
109
  Directly reduces selector brittleness.
98
110
 
99
- **Status:** Completed
100
- **Priority:** P2
111
+ **Status:** Completed
101
112
 
102
113
  Improves:
103
114
  - targeting stability
@@ -125,19 +136,18 @@ Very high.
125
136
 
126
137
  ## Dependencies
127
138
  Blocks or strengthens:
128
- - Priority 4 — Long Press targeting reliability
129
- - Priority 5 — Better Compose / Custom Control Semantics
130
- - Priority 6 — Pinch to Zoom targeting
139
+ - Long Press Gesture
140
+ - Better Compose / Custom Control Semantics
141
+ - Pinch to Zoom
131
142
 
132
143
  ---
133
144
 
134
145
  # Wait and Synchronization Reliability
135
146
 
136
- ## Why third
147
+ ## Rationale
137
148
  Reliable async synchronization is foundational for agent success and should precede gesture expansion.
138
149
 
139
- **Status:** Spec Ready
140
- **Priority:** P3
150
+ **Status:** Spec Ready
141
151
 
142
152
  Addresses failures where agents:
143
153
  - skip UI waits after actions
@@ -170,22 +180,158 @@ Very high.
170
180
 
171
181
  ## Dependencies
172
182
  Depends on:
173
- - Priority 1 — Stronger State Verification
174
- - Priority 2 — Richer Element Identity
183
+ - Stronger State Verification
184
+ - Richer Element Identity
185
+
186
+ Blocks or strengthens:
187
+ - Better Compose / Custom Control Semantics
188
+ - Action Trace Correlation
189
+
190
+ ---
191
+
192
+ # Actionability Resolution
193
+
194
+ ## Rationale
195
+ Reduces failures caused by interacting with discoverable but non-actionable UI nodes.
196
+
197
+ **Status:** Planned
198
+
199
+ Addresses cases where:
200
+ - visible text is not the true click target
201
+ - child nodes differ from actionable containers
202
+ - affordance exists but handler ownership is ambiguous
203
+
204
+ ## Scope
205
+ - Actionable container resolution
206
+ - Executable-target preference rules
207
+ - Actionability confidence metadata
208
+ - Post-action state verification integration
209
+
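Editorial note: the "actionable container resolution" item can be illustrated with a minimal sketch. It assumes a hypothetical `Node` type with `clickable` and `parent` fields; neither name comes from the package. Starting from the node that matched the visible text, the resolver walks up the tree until it finds an ancestor that owns the click handler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical UI tree node; the field names are illustrative only."""
    name: str
    clickable: bool = False
    parent: Optional["Node"] = None


def resolve_actionable(node: Node, max_depth: int = 5) -> Optional[Node]:
    """Return the nearest clickable node at or above `node`, or None."""
    current: Optional[Node] = node
    for _ in range(max_depth + 1):
        if current is None:
            return None
        if current.clickable:
            return current
        current = current.parent
    return None


# A text label nested inside a clickable row: the row is the real target.
row = Node("settings_row", clickable=True)
label = Node("Settings", parent=row)
target = resolve_actionable(label)
```

The `max_depth` bound is an assumption; it keeps a stray label from resolving to a distant, unrelated container.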
210
+ ## Expected Impact
211
+ High.
212
+
213
+ ## Exit Criteria
214
+ - Actionable target resolution implemented
215
+ - Preference rules defined for executable containers over leaf nodes
216
+ - Actionability confidence surfaced
217
+ - Benchmark flows show reduced false taps and submit ambiguity
218
+
219
+ ## Success Metrics
220
+ - Reduced mis-targeted action failures
221
+ - Lower retarget retries
222
+ - Higher first-attempt action success
223
+
224
+ ## Dependencies
225
+ Depends on:
226
+ - Stronger State Verification
227
+ - Richer Element Identity
228
+ - Wait and Synchronization Reliability
229
+
230
+ Blocks or strengthens:
231
+ - Adjustable Control Support
232
+ - Better Compose / Custom Control Semantics
233
+
234
+ ---
235
+
236
+ # Adjustable Control Support
237
+
238
+ ## Rationale
239
+ High leverage improvement for sliders and parameterized controls.
240
+
241
+ **Status:** Planned
242
+
243
+ Addresses friction around:
244
+ - coordinate-calibrated slider interaction
245
+ - snapping and quantized controls
246
+ - weak state confirmation after adjustment
247
+
248
+ ## Scope
249
+ New semantic control support:
250
+
251
+ ```text
252
+ set_slider_value(target, value, tolerance?)
253
+ ```
254
+
255
+ Includes:
256
+ - semantic adjustable control manipulation
257
+ - read-back verification loop
258
+ - tolerance-aware value setting
259
+ - fallback coordinate calibration only when needed
260
+
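Editorial note: a minimal sketch of the read-back verification loop and tolerance-aware setting listed above. The `FakeSlider` class, its `read`/`write` methods, and the `max_attempts` parameter are assumptions made for illustration; the roadmap only names the signature `set_slider_value(target, value, tolerance?)`.

```python
class FakeSlider:
    """Stand-in for a quantized control that snaps to multiples of `step`."""

    def __init__(self, step: float) -> None:
        self.step = step
        self.value = 0.0

    def write(self, requested: float) -> None:
        # Snap to the nearest step, as a quantized slider might.
        self.value = round(requested / self.step) * self.step

    def read(self) -> float:
        return self.value


def set_slider_value(target, value: float, tolerance: float = 0.0,
                     max_attempts: int = 3) -> bool:
    """Write the value, read it back, and accept it only within tolerance."""
    for _ in range(max_attempts):
        target.write(value)
        if abs(target.read() - value) <= tolerance:
            return True
    return False


slider = FakeSlider(step=5.0)
ok_loose = set_slider_value(slider, 42.0, tolerance=2.5)   # snaps to 40.0
ok_strict = set_slider_value(slider, 42.0, tolerance=0.5)  # 40.0 is too far
```

With a step of 5, a request for 42 lands on 40, so the call succeeds only when the caller tolerates an error of at least 2.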
261
+ ## Expected Impact
262
+ High.
263
+
264
+ ## Exit Criteria
265
+ - Adjustable control primitive implemented
266
+ - Verification loop reads and confirms resulting values
267
+ - Tolerance model defined
268
+ - Benchmark slider/custom control flows validated
269
+
270
+ ## Success Metrics
271
+ - Higher custom control interaction success rate
272
+ - Fewer retries adjusting controls
273
+ - Reduced coordinate-guessing failures
274
+
275
+ ## Dependencies
276
+ Depends on:
277
+ - Stronger State Verification
278
+ - Richer Element Identity
279
+ - Actionability Resolution
175
280
 
176
281
  Blocks or strengthens:
177
- - Priority 5 — Better Compose / Custom Control Semantics
178
- - Priority 7 — Action Trace Correlation
282
+ - Better Compose / Custom Control Semantics
283
+ - Pinch to Zoom
284
+
285
+ ---
286
+
287
+ # Signal-Oriented Diagnostic Filtering
288
+
289
+ ## Rationale
290
+ Improves observability by separating causal signals from diagnostic noise.
291
+
292
+ **Status:** Planned
293
+
294
+ Addresses friction from:
295
+ - noisy log streams
296
+ - weak signal extraction
297
+ - difficult action-to-signal attribution
298
+
299
+ ## Scope
300
+ - Structured diagnostic classification
301
+ - Noise filtering heuristics
302
+ - Signal relevance scoring
303
+ - App vs system event tagging
304
+
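Editorial note: one way relevance scoring and app-versus-system tagging could combine is sketched below. The `LogEvent` shape, the two-second window, the source weights, and the threshold are all assumptions, not values from the roadmap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogEvent:
    timestamp: float  # seconds
    source: str       # "app" or "system" (hypothetical tags)
    message: str


def relevance(event: LogEvent, action_time: float, window: float = 2.0) -> float:
    """Score in [0, 1]: highest for app events shortly after the action."""
    delta = event.timestamp - action_time
    if delta < 0 or delta > window:
        return 0.0
    proximity = 1.0 - delta / window
    weight = 1.0 if event.source == "app" else 0.2
    return proximity * weight


def filter_signals(events: List[LogEvent], action_time: float,
                   threshold: float = 0.25) -> List[LogEvent]:
    """Keep only events whose relevance meets the threshold."""
    return [e for e in events if relevance(e, action_time) >= threshold]


events = [
    LogEvent(10.1, "app", "login request sent"),
    LogEvent(10.2, "system", "gc pause"),
    LogEvent(14.0, "app", "periodic sync"),
]
kept = filter_signals(events, action_time=10.0)
```

Here the system event is down-weighted below the threshold and the late app event falls outside the window, so only the login request survives.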
305
+ ## Expected Impact
306
+ High.
307
+
308
+ ## Exit Criteria
309
+ - Diagnostic signal classification model defined
310
+ - Noise filtering available in representative flows
311
+ - Relevant action-linked signals surfaced separately from background noise
312
+ - Debug workflows validated with filtered signals
313
+
314
+ ## Success Metrics
315
+ - Lower time-to-root-cause
316
+ - Faster identification of relevant action signals
317
+ - Reduced diagnostic ambiguity
318
+
319
+ ## Dependencies
320
+ Depends on:
321
+ - Stronger State Verification
322
+ - Wait and Synchronization Reliability
323
+
324
+ Strengthens:
325
+ - Action Trace Correlation
179
326
 
180
327
  ---
181
328
 
182
329
  # Long Press Gesture
183
330
 
184
- ## Why fourth
331
+ ## Rationale
185
332
  High utility, relatively low complexity.
186
333
 
187
- **Status:** Planned
188
- **Priority:** P4
334
+ **Status:** Planned
189
335
 
190
336
  Unlocks many currently awkward interactions:
191
337
 
@@ -223,26 +369,26 @@ High.
223
369
 
224
370
  ## Dependencies
225
371
  Depends on:
226
- - Priority 2 — Richer Element Identity
372
+ - Richer Element Identity
227
373
 
228
374
  Strengthens:
229
- - Priority 5 semantics interaction contracts
375
+ - Better Compose / Custom Control Semantics
230
376
 
231
377
  ---
232
378
 
233
379
  # Better Compose / Custom Control Semantics
234
380
 
235
- ## Why fifth
236
- Important, but strengthened by priorities 1–4 first.
381
+ ## Rationale
382
+ Important, but it benefits from earlier capabilities landing first.
237
383
 
238
- **Status:** Planned
239
- **Priority:** P5
384
+ **Status:** Planned
240
385
 
241
386
  Semantics become more useful once:
242
387
  - identity is stronger
243
388
  - verification is stronger
244
389
  - gestures are richer
245
390
  - synchronization is more reliable
391
+ - action execution is more precise
246
392
 
247
393
  ## Scope
248
394
  - Composite control traits
@@ -268,20 +414,22 @@ High.
268
414
 
269
415
  ## Dependencies
270
416
  Depends on:
271
- - Priority 1 — Stronger State Verification
272
- - Priority 2 — Richer Element Identity
273
- - Priority 3 — Wait and Synchronization Reliability
274
- - Priority 4 — Long Press
417
+ - Stronger State Verification
418
+ - Richer Element Identity
419
+ - Wait and Synchronization Reliability
420
+ - Actionability Resolution
421
+ - Adjustable Control Support
422
+ - Signal-Oriented Diagnostic Filtering
423
+ - Long Press Gesture
275
424
 
276
425
  ---
277
426
 
278
427
  # Pinch to Zoom
279
428
 
280
- ## Why sixth
429
+ ## Rationale
281
430
  Valuable, but narrower than long press.
282
431
 
283
- **Status:** Planned
284
- **Priority:** P6
432
+ **Status:** Planned
285
433
 
286
434
  Applies mainly to:
287
435
  - maps
@@ -317,19 +465,18 @@ Medium-high.
317
465
 
318
466
  ## Dependencies
319
467
  Depends on:
320
- - Priority 1 — Stronger State Verification
321
- - Priority 2 — Richer Element Identity
468
+ - Stronger State Verification
469
+ - Richer Element Identity
322
470
 
323
471
  ---
324
472
 
325
473
  # Action Trace Correlation
326
474
 
327
- ## Why seventh
475
+ ## Rationale
328
476
  Very valuable for debugging,
329
477
  but less critical than improving control success first.
330
478
 
331
- **Status:** Planned
332
- **Priority:** P7
479
+ **Status:** Planned
333
480
 
334
481
  Improves diagnosis more than task completion.
335
482
 
@@ -353,75 +500,93 @@ Medium-high.
353
500
 
354
501
  ## Dependencies
355
502
  Depends on:
356
- - Priority 1 — Stronger State Verification
357
- - Priority 2 — Richer Element Identity
358
- - Priority 3 — Wait and Synchronization Reliability
503
+ - Stronger State Verification
504
+ - Richer Element Identity
505
+ - Wait and Synchronization Reliability
359
506
 
360
507
  ---
361
508
 
362
509
  # Roadmap Sequence
363
510
 
364
511
  ## Dependency Summary
365
- Foundational sequence:
366
512
 
367
- Layer 1 (Foundations)
368
- - Priority 1
369
- - Priority 2
513
+ Foundation
514
+ - Stronger State Verification
515
+ - Richer Element Identity
370
516
 
371
- Layer 2 (Synchronization)
372
- - Priority 3 depends on 1,2
517
+ Synchronization & Actionability
518
+ - Wait and Synchronization Reliability
519
+ - Actionability Resolution
373
520
 
374
- Layer 3 (Interaction Expansion)
375
- - Priority 4 depends on 2
376
- - Priority 5 depends on 1,2,3,4
377
- - Priority 6 depends on 1,2
521
+ Control Precision & Observability
522
+ - Adjustable Control Support
523
+ - Signal-Oriented Diagnostic Filtering
378
524
 
379
- Layer 4 (Observability)
380
- - Priority 7 depends on 1,2,3
525
+ Interaction Expansion
526
+ - Long Press Gesture
527
+ - Better Compose / Custom Control Semantics
528
+ - Pinch to Zoom
529
+
530
+ Deep Observability
531
+ - Action Trace Correlation
381
532
 
382
533
  ## Wave 1 (Current Focus)
383
534
  - Stronger State Verification
384
535
  - Richer Element Identity
385
536
  - Wait and Synchronization Reliability
537
+ - Actionability Resolution
386
538
 
387
539
  Focus:
388
540
  Make the core loop more reliable.
389
541
 
390
542
  ---
391
543
 
392
- ## Wave 2 (Expansion)
393
- - Long Press
394
- - Better Compose Semantics
544
+ ## Wave 2 (Control Precision + Diagnostics)
545
+ - Adjustable Control Support
546
+ - Signal-Oriented Diagnostic Filtering
547
+
548
+ Focus:
549
+ Improve control precision and signal observability.
550
+
551
+ ---
552
+
553
+ ## Wave 3 (Interaction Expansion)
554
+ - Long Press Gesture
555
+ - Better Compose / Custom Control Semantics
395
556
 
396
557
  Focus:
397
558
  Expand interaction capability.
398
559
 
399
560
  ---
400
561
 
401
- ## Wave 3 (Advanced)
562
+ ## Wave 4 (Advanced Gestures + Deep Observability)
402
563
  - Pinch to Zoom
403
564
  - Action Trace Correlation
404
565
 
405
566
  Focus:
406
- Advanced gestures + observability.
567
+ Advanced gestures + deep observability.
407
568
 
408
569
  ---
409
570
 
410
- # Capability Sequence
571
+ # Roadmap Ordering
411
572
 
412
- Execution Order:
573
+ Roadmap Ordering:
413
574
  1. Stronger State Verification
414
575
  2. Richer Element Identity
415
576
  3. Wait and Synchronization Reliability
416
- 4. Long Press
417
- 5. Better Compose / Custom Control Semantics
418
- 6. Pinch to Zoom
419
- 7. Action Trace Correlation
577
+ 4. Actionability Resolution
578
+ 5. Adjustable Control Support
579
+ 6. Signal-Oriented Diagnostic Filtering
580
+ 7. Long Press Gesture
581
+ 8. Better Compose / Custom Control Semantics
582
+ 9. Pinch to Zoom
583
+ 10. Action Trace Correlation
420
584
 
421
585
  Rationale:
422
- - Priorities 1–3 harden control, verification, and synchronization.
423
- - Priorities 4–6 expand interaction capability.
424
- - Priority 7 adds observability once control reliability matures.
586
+ - Early roadmap items harden state, targeting, synchronization, and action execution.
587
+ - Mid roadmap items improve control precision and signal observability.
588
+ - Later items expand interaction coverage.
589
+ - Final observability work deepens debugging insight.
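Editorial note: the numbered ordering can be checked against the "Depends on" lists in the new version of the roadmap. The sketch below transcribes those lists into a dictionary and reports any item that appears before one of its dependencies; the helper name `violations` is an illustrative choice.

```python
SSV = "Stronger State Verification"
REI = "Richer Element Identity"
WAIT = "Wait and Synchronization Reliability"
ACT = "Actionability Resolution"
ADJ = "Adjustable Control Support"
SIG = "Signal-Oriented Diagnostic Filtering"
LONG = "Long Press Gesture"
COMPOSE = "Better Compose / Custom Control Semantics"
PINCH = "Pinch to Zoom"
TRACE = "Action Trace Correlation"

# "Depends on" lists as stated in each capability section.
DEPENDS_ON = {
    SSV: [],
    REI: [],
    WAIT: [SSV, REI],
    ACT: [SSV, REI, WAIT],
    ADJ: [SSV, REI, ACT],
    SIG: [SSV, WAIT],
    LONG: [REI],
    COMPOSE: [SSV, REI, WAIT, ACT, ADJ, SIG, LONG],
    PINCH: [SSV, REI],
    TRACE: [SSV, REI, WAIT],
}

ROADMAP_ORDER = [SSV, REI, WAIT, ACT, ADJ, SIG, LONG, COMPOSE, PINCH, TRACE]


def violations(order, depends_on):
    """Return (item, dependency) pairs where the dependency is not earlier."""
    position = {name: index for index, name in enumerate(order)}
    return [
        (item, dep)
        for item, deps in depends_on.items()
        for dep in deps
        if position[dep] >= position[item]
    ]


problems = violations(ROADMAP_ORDER, DEPENDS_ON)
```

An empty `problems` list means the published ordering is a valid topological order of the stated dependencies.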
425
590
 
426
591
  ---
427
592
 
@@ -0,0 +1,216 @@
1
+ # RFC 005 — Unified Action Execution and Verification Model
2
+
3
+ ## 1. Summary
4
+
5
+ This RFC defines a unified execution and verification model for all agent-driven UI actions.
6
+
7
+ It standardises:
8
+ - how actions are resolved
9
+ - how they are executed
10
+ - how outcomes are verified
11
+ - how failures are classified
12
+ - how observability signals are emitted
13
+
14
+ The goal is to eliminate inconsistent per-feature execution logic and establish a single deterministic lifecycle for all UI interactions.
15
+
16
+ ---
17
+
18
+ ## 2. Problem Statement
19
+
20
+ Current execution paths are fragmented across interaction types:
21
+
22
+ - Tap / click actions rely on implicit success assumptions
23
+ - Control adjustments (sliders, inputs) use ad-hoc verification logic
24
+ - Gesture actions lack consistent post-execution validation
25
+ - Action success is often inferred from indirect UI changes or logs
26
+
27
+ This leads to:
28
+ - ambiguous success states
29
+ - inconsistent retries
30
+ - weak failure classification
31
+ - poor observability signal quality
32
+
33
+ ---
34
+
35
+ ## 3. Design Goals
36
+
37
+ The model must:
38
+
39
+ - Provide a single lifecycle for all actions
40
+ - Separate target resolution from execution
41
+ - Require explicit verification of state change
42
+ - Standardise failure classification
43
+ - Integrate with observability systems cleanly
44
+ - Support both simple and parameterised actions
45
+
46
+ ---
47
+
48
+ ## 4. Action Lifecycle
49
+
50
+ Every action MUST pass through the following states:
51
+
52
+ 1. Resolved
53
+ - A target has been identified via Actionability Resolution
54
+ - The target is executable (not just visible)
55
+
56
+ 2. Dispatched
57
+ - The action has been issued to the runtime layer
58
+
59
+ 3. Pending Verification
60
+ - Waiting for expected UI or state change
61
+
62
+ 4. Verified
63
+ - Expected outcome confirmed
64
+
65
+ 5. Failed
66
+ - Verification did not succeed within constraints
67
+
68
+ ---
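Editorial note: the lifecycle above can be read as a small state machine. The sketch below is one interpretation, not the RFC's implementation; in particular, the transition table (including a direct Dispatched to Failed edge for dispatch errors) is an assumption.

```python
from enum import Enum


class State(Enum):
    RESOLVED = "resolved"
    DISPATCHED = "dispatched"
    PENDING_VERIFICATION = "pending_verification"
    VERIFIED = "verified"
    FAILED = "failed"


# Assumed transition table; VERIFIED and FAILED are terminal.
ALLOWED = {
    State.RESOLVED: {State.DISPATCHED},
    State.DISPATCHED: {State.PENDING_VERIFICATION, State.FAILED},
    State.PENDING_VERIFICATION: {State.VERIFIED, State.FAILED},
    State.VERIFIED: set(),
    State.FAILED: set(),
}


class Action:
    """Tracks one action through the lifecycle and rejects illegal jumps."""

    def __init__(self) -> None:
        self.state = State.RESOLVED
        self.history = [State.RESOLVED]

    def advance(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)


action = Action()
for step in (State.DISPATCHED, State.PENDING_VERIFICATION, State.VERIFIED):
    action.advance(step)
```

Rejecting a jump such as Resolved to Verified is what enforces the "no implicit success" rule at the type level.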
69
+
70
+ ## 5. Action Types
71
+
72
+ All actions are categorised into canonical types:
73
+
74
+ - Navigation
75
+ - Input
76
+ - Selection
77
+ - Gesture
78
+ - Control Adjustment
79
+
80
+ Each type may have type-specific execution adapters but MUST conform to the same lifecycle.
81
+
82
+ ---
83
+
84
+ ## 6. Execution Contract
85
+
86
+ All actions MUST define:
87
+
88
+ ### 6.1 Target
89
+ A resolved executable entity (not a UI label or text node)
90
+
91
+ ### 6.2 Intent
92
+ The intended effect of the action
93
+
94
+ ### 6.3 Expected State Delta
95
+ What must change in the UI or application state
96
+
97
+ ---
98
+
99
+ ## 7. Verification Model
100
+
101
+ Verification MUST be explicit and deterministic.
102
+
103
+ ### 7.1 Verification Sources
104
+ At least one must be used:
105
+
106
+ - UI state diff
107
+ - element property change
108
+ - navigation change
109
+ - value update (for controls)
110
+
111
+ ### 7.2 Timeout Behaviour
112
+ - Each action defines a verification window
113
+ - Failure occurs if no valid state delta is observed in time
114
+
115
+ ### 7.3 No Implicit Success
116
+ Actions MUST NOT be considered successful without explicit verification.
117
+
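Editorial note: a verification window can be sketched as a bounded polling loop. The `observe` callback and the polling interval are assumptions; the RFC only requires that failure occurs when no valid state delta is observed in time.

```python
import time
from typing import Callable


def verify(observe: Callable[[], bool], window_s: float,
           poll_s: float = 0.01) -> bool:
    """Return True once `observe` reports the expected delta, False on timeout."""
    deadline = time.monotonic() + window_s
    while True:
        if observe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


polls = {"count": 0}


def screen_changed() -> bool:
    # Simulated verification source: the change shows up on the third poll.
    polls["count"] += 1
    return polls["count"] >= 3


verified = verify(screen_changed, window_s=1.0)
timed_out = verify(lambda: False, window_s=0.05)
```

`time.monotonic` is used so the deadline is unaffected by wall-clock adjustments.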
118
+ ---
119
+
120
+ ## 8. Actionability Integration
121
+
122
+ This model depends on Actionability Resolution:
123
+
124
+ - Only resolved executable targets may be executed
125
+ - Visible but non-actionable nodes are invalid targets
126
+ - Execution is blocked if confidence is below threshold
127
+
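Editorial note: the three rules above amount to a gate in front of dispatch. A minimal sketch follows, assuming a hypothetical confidence score in [0, 1] and a threshold of 0.8; the RFC does not specify either.

```python
from typing import Optional


def block_reason(executable: bool, confidence: float,
                 threshold: float = 0.8) -> Optional[str]:
    """Return why execution is blocked, or None if the target may be executed."""
    if not executable:
        return "target is visible but not actionable"
    if confidence < threshold:
        return f"confidence {confidence:.2f} below threshold {threshold:.2f}"
    return None


allowed = block_reason(executable=True, confidence=0.93)
blocked = block_reason(executable=True, confidence=0.41)
```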
128
+ ---
129
+
130
+ ## 9. Control Adjustment Model
131
+
132
+ Control actions (sliders, inputs) are treated as parameterised actions:
133
+
134
+ Example:
135
+
136
+ `set_slider_value(target, value, tolerance)`
137
+
138
+ Must include:
139
+ - pre-state value
140
+ - post-state verification
141
+ - tolerance-aware validation
142
+
143
+ Fallback to coordinate-based interaction is allowed only if semantic control resolution fails.
144
+
145
+ ---
146
+
147
+ ## 10. Observability Hooks
148
+
149
+ Each action emits structured signals:
150
+
151
+ - action_id
152
+ - target_id
153
+ - action_type
154
+ - lifecycle_state transitions
155
+ - verification result
156
+ - failure reason (if applicable)
157
+
158
+ These signals feed:
159
+ - Signal-Oriented Diagnostic Filtering
160
+ - Action Trace Correlation
161
+
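Editorial note: the signal fields listed above could be serialised as one JSON record per lifecycle transition. This is a sketch under assumptions: the key names `lifecycle_state`, `verification_result`, and `failure_reason`, and the JSON encoding itself, are illustrative rather than the runtime contract, which section 13 defers to RFC 006.

```python
import json
from typing import Optional


def lifecycle_signal(action_id: str, target_id: str, action_type: str,
                     lifecycle_state: str,
                     verification_result: Optional[bool] = None,
                     failure_reason: Optional[str] = None) -> str:
    """Build one structured signal for a single lifecycle transition."""
    record = {
        "action_id": action_id,
        "target_id": target_id,
        "action_type": action_type,
        "lifecycle_state": lifecycle_state,
        "verification_result": verification_result,
        "failure_reason": failure_reason,
    }
    return json.dumps(record, sort_keys=True)


line = lifecycle_signal("a-1", "submit_button", "Navigation", "verified",
                        verification_result=True)
decoded = json.loads(line)
```

Emitting every field on every record, with `null` where it does not apply, keeps downstream filtering and trace correlation free of missing-key checks.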
162
+ ---
163
+
164
+ ## 11. Failure Classification
165
+
166
+ Failures MUST be categorised:
167
+
168
+ - Target resolution failure
169
+ - Dispatch failure
170
+ - Verification timeout
171
+ - Unexpected state delta
172
+ - No state change observed
173
+
174
+ This enables consistent debugging and telemetry.
175
+
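Editorial note: the five categories can be sketched as an enum plus a classifier. The RFC does not say how "Verification timeout" differs from "No state change observed"; this sketch assumes the former means the window expired while the UI was still unsettled, and the latter means the UI settled without any change. The `ui_settled` flag and the string-valued deltas are hypothetical.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    TARGET_RESOLUTION = "target_resolution_failure"
    DISPATCH = "dispatch_failure"
    VERIFICATION_TIMEOUT = "verification_timeout"
    UNEXPECTED_STATE_DELTA = "unexpected_state_delta"
    NO_STATE_CHANGE = "no_state_change_observed"


def classify(resolved: bool, dispatched: bool, expected: str,
             observed: Optional[str], ui_settled: bool) -> Optional[Failure]:
    """Return the failure category, or None when the action was verified."""
    if not resolved:
        return Failure.TARGET_RESOLUTION
    if not dispatched:
        return Failure.DISPATCH
    if observed == expected:
        return None
    if observed is not None:
        return Failure.UNEXPECTED_STATE_DELTA
    # Nothing changed: distinguish a settled UI from an expired window.
    return Failure.NO_STATE_CHANGE if ui_settled else Failure.VERIFICATION_TIMEOUT


wrong_screen = classify(True, True, "home_screen", "error_dialog", True)
nothing_happened = classify(True, True, "home_screen", None, True)
```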
176
+ ---
177
+
178
+ ## 12. Relationship to Existing Roadmap
179
+
180
+ This RFC provides the foundation for:
181
+
182
+ - Actionability Resolution (#4)
183
+ - Adjustable Control Support (#5)
184
+ - Signal-Oriented Diagnostic Filtering (#6)
185
+
186
+ It defines the shared execution substrate those capabilities plug into.
187
+
188
+ ---
189
+
190
+ ## 13. Scope Boundary
191
+
192
+ This RFC defines the execution model and lifecycle semantics for agent-driven UI actions.
193
+
194
+ - Action types referenced in this RFC correspond to the existing runtime `action_type` contract and do not redefine or extend the underlying taxonomy
195
+ - Lifecycle signals described in this RFC are emitted by the runtime execution layer (defined in RFC 006), not by this specification directly
196
+
197
+ It does NOT define:
198
+ - runtime instrumentation details
199
+ - how lifecycle states are emitted in code
200
+ - mapping to specific source modules (e.g. src/server, src/interact)
201
+ - tool schema implementation details
202
+ - mapping between semantic action categories and runtime implementation modules (this is defined in RFC 006)
203
+
204
+ Those concerns are delegated to a separate binding-layer RFC, which defines how this model is implemented in the current system.
205
+
206
+ ---
207
+
208
+ ## 14. Conclusion
209
+
210
+ This model enforces a single, verifiable lifecycle for all UI actions.
211
+
212
+ It ensures:
213
+ - deterministic execution
214
+ - explicit verification
215
+ - consistent failure handling
216
+ - unified observability