mobile-debug-mcp 0.26.3 → 0.26.5

@@ -0,0 +1,238 @@
1
+ # RFC 009 — Semantic Control Modeling for Custom and Composite Controls
2
+
3
+ ## 1. Summary
4
+
5
+ This RFC defines a semantic control model for identifying, exposing, and interacting with custom and composite controls that are poorly represented through raw accessibility or platform UI trees.
6
+
7
+ It introduces semantic enrichment for controls such as:
8
+
9
+ - sliders
10
+ - steppers
11
+ - segmented controls
12
+ - dropdowns
13
+ - Compose/SwiftUI custom widgets
14
+ - composite gesture-driven controls
15
+
16
+ The goal is to improve target resolution, control interaction, and verification reliability for controls whose actionable semantics are not fully captured by raw snapshots.
17
+
18
+ ---
19
+
20
+ ## 2. Problem Statement
21
+
22
+ Current interaction logic works well when platform semantics are explicit.
23
+
24
+ It is weaker when controls appear as:
25
+
26
+ - generic container views
27
+ - unlabeled clickable wrappers
28
+ - nested composite controls
29
+ - custom Compose/SwiftUI components with weak accessibility exposure
30
+
31
+ Observed problems include:
32
+
33
+ - controls resolving as parent containers rather than actionable targets
34
+ - missing slider-like controls in snapshots
35
+ - weak distinction between discrete vs continuous controls
36
+ - inability to infer supported interactions from control structure
37
+ - unreliable verification of control state
38
+
39
+ These problems make automation brittle and force coordinate fallback.
40
+
41
+ ---
42
+
43
+ ## 3. Goals
44
+
45
+ This RFC introduces a semantic layer that MUST:
46
+
47
+ - infer higher-level control semantics from raw UI structures
48
+ - enrich snapshots with semantic control metadata
49
+ - improve actionable target selection (RFC 007)
50
+ - improve adjustable control handling (RFC 008)
51
+ - improve verification for semantic control state
52
+ - reduce coordinate fallback usage
53
+
54
+ ---
55
+
56
+ ## 4. Non-Goals
57
+
58
+ This RFC does NOT define:
59
+
60
+ - replacement of raw accessibility trees
61
+ - ML-based semantic inference
62
+ - probabilistic control classification
63
+ - new gesture primitives
64
+ - autonomous planning behavior
65
+
66
+ Semantic modeling is deterministic enrichment layered over raw signals.
67
+
68
+ ---
69
+
70
+ ## 5. Runtime Surfaces
71
+
72
+ This RFC applies to existing runtime surfaces:
73
+
74
+ - findElementHandler
75
+ - _resolveActionableAncestor
76
+ - _buildResolvedElement
77
+ - tapElementHandler
78
+ - scrollToElementHandler
79
+
80
+ Semantic modeling augments these surfaces; it does not replace them.
81
+
82
+ ---
83
+
84
+ ## 6. Semantic Control Model
85
+
86
+ Controls MAY progressively expose semantic metadata such as:
87
+
88
+ ```ts
89
+ interface SemanticControl {
90
+   semantic_role:
91
+     | "slider"
92
+     | "stepper"
93
+     | "dropdown"
94
+     | "segmented_control"
95
+     | "custom_adjustable"
96
+     | "composite_control";
97
+
98
+   supported_actions: string[];
99
+
100
+   adjustable: boolean;
101
+
102
+   state_shape:
103
+     | "continuous"
104
+     | "discrete"
105
+     | "semantic";
106
+ }
107
+ ```
108
+
109
+ The control roles above represent an expected semantic model, not a claim that all such control classes are equally surfaced in the current runtime.
110
+
111
+ Current runtime support may initially expose simpler semantic signals such as:
112
+ - role hints
113
+ - semantic labels
114
+ - value_range metadata
115
+ - selector confidence or related resolution signals
116
+
117
+ Richer control roles are progressive extensions over time.
118
+
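Because semantic metadata may arrive partially populated, a consumer might validate a candidate object at runtime before treating it as a full `SemanticControl`. The following is a minimal sketch under that assumption; the `isSemanticControl` helper is hypothetical and not part of the package.

```ts
type SemanticRole =
  | "slider"
  | "stepper"
  | "dropdown"
  | "segmented_control"
  | "custom_adjustable"
  | "composite_control";

type StateShape = "continuous" | "discrete" | "semantic";

interface SemanticControl {
  semantic_role: SemanticRole;
  supported_actions: string[];
  adjustable: boolean;
  state_shape: StateShape;
}

const ROLES: readonly string[] = [
  "slider",
  "stepper",
  "dropdown",
  "segmented_control",
  "custom_adjustable",
  "composite_control",
];
const SHAPES: readonly string[] = ["continuous", "discrete", "semantic"];

// Hypothetical runtime guard: true only when every field of the model is
// present and well-typed. Partial metadata is rejected rather than guessed.
function isSemanticControl(value: unknown): value is SemanticControl {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.semantic_role === "string" &&
    ROLES.includes(v.semantic_role) &&
    Array.isArray(v.supported_actions) &&
    v.supported_actions.every((a) => typeof a === "string") &&
    typeof v.adjustable === "boolean" &&
    typeof v.state_shape === "string" &&
    SHAPES.includes(v.state_shape)
  );
}
```

A caller that receives only role hints or `value_range` metadata would fail this guard and could continue with the raw element data.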
119
+ ---
120
+
121
+ ## 7. Semantic Inference Rules
122
+
123
+ Inference MAY use signals such as:
124
+
125
+ - accessibility role hints
126
+ - value_range metadata
127
+ - child composition patterns
128
+ - repeated selectable child structures
129
+ - platform traits (adjustable, selected, expanded)
130
+ - known control heuristics
131
+
132
+ Inference MUST be deterministic and explainable.
133
+
134
+ Raw signals always win on conflict.
135
+
136
+ Semantic inference confidence, where present, is advisory only and MUST NOT be treated as executable truth.
137
+
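One way to keep inference deterministic and explainable is a fixed, ordered rule list in which each result carries the reason it was produced. The sketch below assumes a hypothetical `RawNode` shape, illustrative raw role names, and a hypothetical `"selectable"` trait; none of these come from the package.

```ts
type SemanticRole =
  | "slider"
  | "stepper"
  | "dropdown"
  | "segmented_control"
  | "custom_adjustable"
  | "composite_control";

// Hypothetical raw node shape; field names are assumptions for illustration.
interface RawNode {
  role?: string;
  value_range?: { min: number; max: number };
  traits?: string[];
  children?: RawNode[];
}

interface Inference {
  semantic_role?: SemanticRole;
  reasons: string[]; // human-readable explanation of the rule that fired
}

// Illustrative mapping from explicit raw roles to semantic roles.
const RAW_ROLE_TO_SEMANTIC = new Map<string, SemanticRole>([
  ["slider", "slider"],
  ["stepper", "stepper"],
  ["dropdown", "dropdown"],
]);

function inferSemanticRole(node: RawNode): Inference {
  // Rule 1: an explicit raw role wins over every heuristic below.
  const fromRaw =
    node.role === undefined ? undefined : RAW_ROLE_TO_SEMANTIC.get(node.role);
  if (fromRaw !== undefined) {
    return { semantic_role: fromRaw, reasons: [`raw role "${node.role}" is authoritative`] };
  }
  // Rule 2: value_range metadata suggests a slider-like control.
  if (node.value_range !== undefined) {
    return { semantic_role: "slider", reasons: ["value_range metadata present"] };
  }
  // Rule 3: an adjustable platform trait without a known role.
  if ((node.traits ?? []).includes("adjustable")) {
    return { semantic_role: "custom_adjustable", reasons: ['platform trait "adjustable"'] };
  }
  // Rule 4: repeated selectable children look like a segmented control.
  const children = node.children ?? [];
  if (children.length >= 2 && children.every((c) => (c.traits ?? []).includes("selectable"))) {
    return { semantic_role: "segmented_control", reasons: ["repeated selectable children"] };
  }
  // No rule matched: omit the role instead of guessing.
  return { reasons: ["no rule matched; semantic_role omitted"] };
}
```

Because the rules run in a fixed order with no randomness or learned weights, the same raw node always yields the same role and the same explanation.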
138
+ ---
139
+
140
+ ## 8. Resolution Integration (RFC 007)
141
+
142
+ Semantic metadata SHOULD improve target resolution by:
143
+
144
+ - preferring actionable child controls over generic containers
145
+ - promoting semantically actionable descendants
146
+ - disambiguating among multiple candidate matches
147
+
148
+ Semantic signals are advisory enrichment, not executable truth.
149
+
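As an illustration of promoting actionable descendants, the sketch below replaces a generic container match with its single actionable descendant and keeps the original match when the choice is ambiguous. The `UiNode` shape and both function names are hypothetical; the package's `_resolveActionableAncestor` may work differently.

```ts
// Hypothetical element shape used only for this illustration.
interface UiNode {
  id: string;
  clickable: boolean;
  semantic_role?: string;
  children: UiNode[];
}

function isActionable(node: UiNode): boolean {
  return node.clickable || node.semantic_role !== undefined;
}

// Collect the nearest actionable descendants, not descending past one.
function collectActionable(node: UiNode, out: UiNode[] = []): UiNode[] {
  for (const child of node.children) {
    if (isActionable(child)) out.push(child);
    else collectActionable(child, out);
  }
  return out;
}

// Prefer a single actionable descendant over a generic container.
// With zero or several candidates the original match is kept, because the
// semantic signal is advisory and must not force a guess.
function promoteActionableTarget(match: UiNode): UiNode {
  if (isActionable(match)) return match;
  const [only, ...rest] = collectActionable(match);
  return only !== undefined && rest.length === 0 ? only : match;
}
```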
150
+ ---
151
+
152
+ ## 9. Adjustable Control Integration (RFC 008)
153
+
154
+ Where adjustable=true:
155
+
156
+ Semantic metadata MAY expose:
157
+
158
+ - supported adjustment mode
159
+ - discrete vs continuous state model
160
+ - expected verification strategy
161
+
162
+ This improves convergence for value-setting workflows.
163
+
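To show how a state model could select a verification strategy, the sketch below compares continuous values within a tolerance and discrete or semantic values exactly. The `stateMatches` helper is hypothetical, and treating `"semantic"` state as an exact comparison is an assumption, since this RFC does not define it further.

```ts
type StateShape = "continuous" | "discrete" | "semantic";

// Hypothetical comparison helper keyed on state_shape.
function stateMatches(
  shape: StateShape,
  target: number | string,
  actual: number | string,
  tolerance = 0,
): boolean {
  switch (shape) {
    case "continuous":
      // Continuous values are compared numerically within a tolerance.
      return (
        typeof target === "number" &&
        typeof actual === "number" &&
        Math.abs(actual - target) <= tolerance
      );
    case "discrete":
    case "semantic":
      // Assumption: discrete steps and semantic labels must match exactly.
      return target === actual;
  }
}
```

For example, a target of 30 with a tolerance of 0.5 accepts a read-back of 30.4 for a continuous control, while a discrete control rejects any value other than the exact target.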
164
+ ---
165
+
166
+ ## 10. Verification Integration
167
+
168
+ Verification MAY use semantic control metadata to improve:
169
+
170
+ - value-state verification
171
+ - discrete selection verification
172
+ - semantic-state checks
173
+
174
+ Formal verification remains governed by RFC 005.
175
+
176
+ ---
177
+
178
+ ## 11. Output Contract (Progressive Extension)
179
+
180
+ The current runtime may expose partial semantic outputs.
181
+
182
+ Expected progressive shape (future extension model):
183
+
184
+ ```ts
185
+ interface SemanticResolutionMetadata {
186
+   semantic_role?: string;
186
+   supported_actions?: string[];
187
+   adjustable?: boolean;
188
+   state_shape?: string;
189
+   confidence?: "low" | "medium" | "high";
191
+ }
192
+ ```
193
+
194
+ These fields are progressive enrichment and MUST NOT be assumed universally present.
195
+
196
+ Implementations MAY expose only a subset of this model initially. Presence of a richer semantic role does not imply universal runtime support for all control classes.
197
+
198
+ ---
199
+
200
+ ## 12. Failure Modes
201
+
202
+ Semantic modeling MAY fail due to:
203
+
204
+ - insufficient raw signals
205
+ - ambiguous composite structures
206
+ - conflicting heuristics
207
+
208
+ When semantic inference confidence is insufficient:
209
+
210
+ - raw resolution flow MUST continue
211
+ - semantic fields MAY be omitted
212
+ - semantic guessing SHOULD NOT be forced
213
+
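The fallback behavior above could look roughly like the following sketch. It assumes a hypothetical `RawResolution` shape, an `attachSemantics` helper, and a configurable minimum confidence; only the metadata field names come from the output contract in section 11.

```ts
type Confidence = "low" | "medium" | "high";

interface SemanticResolutionMetadata {
  semantic_role?: string;
  supported_actions?: string[];
  adjustable?: boolean;
  state_shape?: string;
  confidence?: Confidence;
}

// Hypothetical raw resolution result; the real descriptor has more fields.
interface RawResolution {
  element_id: string;
}

type Resolution = RawResolution & SemanticResolutionMetadata;

const RANK: Record<Confidence, number> = { low: 0, medium: 1, high: 2 };

// Attach semantic fields only when inference succeeded with enough
// confidence; otherwise return the raw resolution unchanged.
function attachSemantics(
  raw: RawResolution,
  infer: () => SemanticResolutionMetadata | undefined,
  minimum: Confidence = "medium",
): Resolution {
  let meta: SemanticResolutionMetadata | undefined;
  try {
    meta = infer();
  } catch {
    meta = undefined; // an inference failure must not stop raw resolution
  }
  if (
    meta === undefined ||
    meta.confidence === undefined ||
    RANK[meta.confidence] < RANK[minimum]
  ) {
    return { ...raw }; // semantic fields omitted
  }
  return { ...meta, ...raw }; // raw fields win on any key collision
}
```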
214
+ ---
215
+
216
+ ## 13. Success Metrics
217
+
218
+ - fewer coordinate fallbacks
219
+ - improved control discovery
220
+ - improved actionable-target precision
221
+ - improved slider/custom-control automation success
222
+ - reduced semantic mismatch failures (RFC 010)
223
+
224
+ ---
225
+
226
+ ## 14. Relationship to Other RFCs
227
+
228
+ - RFC 005: verification correctness model
229
+ - RFC 006: runtime action execution
230
+ - RFC 007: target resolution
231
+ - RFC 008: adjustable control support
232
+ - RFC 010: recovery uses semantic mismatch failures defined here
233
+
234
+ ---
235
+
236
+ ## 15. Summary
237
+
238
+ This RFC adds deterministic semantic control enrichment for custom and composite controls, improving resolution, interaction reliability, and verification while remaining layered over existing runtime signals.
@@ -52,7 +52,7 @@ For backend/API activity, `wait_for_screen_change` is not the right verification
52
52
  Action tools mutate application state.
53
53
 
54
54
  Includes:
55
- `start_app`, `restart_app`, `tap`, `tap_element`, `swipe`, `scroll_to_element`, `type_text`, `press_back`
55
+ `start_app`, `restart_app`, `tap`, `tap_element`, `swipe`, `scroll_to_element`, `type_text`, `press_back`, `adjust_control`
56
56
 
57
57
  ### 4.2 Required Semantics
58
58
 
@@ -244,6 +244,7 @@ Raw layer contents include:
244
244
  - UI hierarchy or accessibility tree
245
245
  - normalized readable element state where exposed by the platform
246
246
  - platform-native identity hints such as stable identifiers, roles, and test tags
247
+ - semantic control metadata when derivable from the raw tree, including `semantic_role`, `supported_actions`, `adjustable`, and `state_shape`
247
248
  - snapshot metadata such as `snapshot_revision` and `captured_at_ms`
248
249
  - `loading_state` when a reliable loading signal is detectable
249
250
  - screenshot when available
@@ -292,6 +293,27 @@ Semantic output MUST NOT replace classification or verification.
292
293
 
293
294
  Classification remains a supplementary, post-action interpretation mechanism.
294
295
 
296
+ ### 9.4 Semantic Control Metadata
297
+
298
+ When present, semantic control metadata MAY include:
299
+
300
+ ```ts
301
+ {
302
+   semantic_role?: 'slider' | 'stepper' | 'dropdown' | 'segmented_control' | 'custom_adjustable' | 'composite_control' | null,
303
+   supported_actions?: string[] | null,
304
+   adjustable?: boolean | null,
305
+   state_shape?: 'continuous' | 'discrete' | 'semantic' | null
306
+ }
307
+ ```
308
+
309
+ Rules:
310
+
311
+ - semantic control metadata is derived and best-effort
312
+ - raw platform roles and state remain authoritative on conflict
313
+ - `adjustable` MAY be inferred from platform traits when no known role matches
314
+ - `state_shape` MUST respect known control roles before value-based heuristics
315
+ - `supported_actions` are hints only and MUST NOT be treated as guaranteed executable actions
316
+
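The rule that `state_shape` respects known control roles before value-based heuristics can be sketched as a two-step lookup. The role-to-shape table, the optional `step` field on the range, and the `deriveStateShape` name are all assumptions made for illustration.

```ts
type SemanticRole =
  | "slider"
  | "stepper"
  | "dropdown"
  | "segmented_control"
  | "custom_adjustable"
  | "composite_control";

type StateShape = "continuous" | "discrete" | "semantic";

// Hypothetical range shape; `step` is an assumed optional field.
interface ValueRange {
  min: number;
  max: number;
  step?: number;
}

// Assumed mapping from known roles to their state shape.
const ROLE_SHAPES: Partial<Record<SemanticRole, StateShape>> = {
  slider: "continuous",
  stepper: "discrete",
  dropdown: "discrete",
  segmented_control: "discrete",
};

function deriveStateShape(
  role: SemanticRole | null | undefined,
  range?: ValueRange | null,
): StateShape | null {
  // Step 1: a known role decides the shape, regardless of the value data.
  const byRole = role ? ROLE_SHAPES[role] : undefined;
  if (byRole !== undefined) return byRole;
  // Step 2: fall back to a value-based heuristic when a range is available.
  if (range) {
    return range.step !== undefined && range.step > 0 ? "discrete" : "continuous";
  }
  // Nothing derivable: report null rather than guessing.
  return null;
}
```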
295
317
  ## 10. Classification
296
318
 
297
319
  Tool: `classify_action_outcome`
@@ -172,6 +172,27 @@ Guidance:
172
172
 
173
173
  ---
174
174
 
175
+ ## adjust_control
176
+
177
+ Purpose:
178
+
179
+ - adjust a numeric control value with bounded verification
180
+
181
+ Notes:
182
+
183
+ - initial support is for slider-like controls that expose `value_range` or readable numeric value state
184
+ - `expect_state` is the verification surface used to read back the resulting value
185
+ - direct target placement is preferred; drag fallback is treated as degraded mode
186
+ - the tool returns `target_state`, `actual_state`, `within_tolerance`, `converged`, `attempts`, and `adjustment_mode`
187
+
188
+ Input example:
189
+
190
+ ```json
191
+ { "selector": { "text": "Duration" }, "property": "value", "targetValue": 30, "tolerance": 0.5, "platform": "android", "deviceId": "emulator-5554" }
192
+ ```
193
+
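The bounded verification described in the notes can be pictured as a set-then-read-back loop. The sketch below is a synchronous simplification: the field types, the `maxAttempts` parameter, the meaning given to `converged`, and the `"direct"` placeholder for `adjustment_mode` are assumptions, and the real tool's internals may differ.

```ts
// Result fields are named after the tool output; their types are assumed.
interface AdjustResult {
  target_state: number;
  actual_state: number;
  within_tolerance: boolean;
  converged: boolean;
  attempts: number;
  adjustment_mode: string;
}

// Hypothetical loop: set the value, read it back, and stop once the
// read-back is within tolerance or the attempt budget is exhausted.
function adjustWithVerification(
  readValue: () => number,
  setValue: (value: number) => void,
  targetValue: number,
  tolerance: number,
  maxAttempts = 3,
): AdjustResult {
  let attempts = 0;
  let actual = readValue();
  while (Math.abs(actual - targetValue) > tolerance && attempts < maxAttempts) {
    setValue(targetValue);
    attempts += 1;
    actual = readValue(); // verification reads the resulting state
  }
  const within = Math.abs(actual - targetValue) <= tolerance;
  return {
    target_state: targetValue,
    actual_state: actual,
    within_tolerance: within,
    converged: within, // assumption: converged means the loop ended in tolerance
    attempts,
    adjustment_mode: "direct", // placeholder; real mode values are not shown in the docs
  };
}
```

With the input example above (`targetValue` 30, `tolerance` 0.5), a control that only moves part of the way on each attempt would be retried until its read-back value lands between 29.5 and 30.5 or the attempt budget runs out.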
194
+ ---
195
+
175
196
  ## find_element
176
197
 
177
198
  Locate a UI element on the current screen using semantic matching and return an actionable element descriptor.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "mobile-debug-mcp",
3
- "version": "0.26.3",
3
+ "version": "0.26.5",
4
4
  "description": "MCP server for mobile app debugging (Android + iOS), with focus on security and reliability",
5
5
  "type": "module",
6
6
  "bin": {