mobile-debug-mcp 0.26.3 → 0.26.5
- package/dist/interact/index.js +496 -7
- package/dist/observe/ios.js +48 -4
- package/dist/server/tool-definitions.js +56 -0
- package/dist/server/tool-handlers.js +25 -0
- package/dist/server-core.js +1 -1
- package/dist/utils/android/utils.js +37 -5
- package/docs/CHANGELOG.md +6 -0
- package/docs/ROADMAP.md +69 -14
- package/docs/rfcs/008-adjustable-control-support-and-semantic-value-manipulation.md +273 -0
- package/docs/rfcs/009-semantic-control-modeling-for-custom-and-composite-controls.md +238 -0
- package/docs/specs/mcp-tooling-spec-v1.md +23 -1
- package/docs/tools/interact.md +21 -0
- package/package.json +1 -1
- package/src/interact/index.ts +625 -8
- package/src/observe/ios.ts +43 -4
- package/src/server/tool-definitions.ts +56 -0
- package/src/server/tool-handlers.ts +26 -0
- package/src/server-core.ts +1 -1
- package/src/types.ts +21 -0
- package/src/utils/android/utils.ts +32 -5
- package/test/unit/interact/adjust_control.test.ts +365 -0
- package/test/unit/observe/find_element.test.ts +46 -0
- package/test/unit/observe/state_extraction.test.ts +89 -2
- package/test/unit/server/contract.test.ts +8 -0
- package/test/unit/server/response_shapes.test.ts +39 -0
@@ -0,0 +1,238 @@
+# RFC 009 — Semantic Control Modeling for Custom and Composite Controls
+
+## 1. Summary
+
+This RFC defines a semantic control model for identifying, exposing, and interacting with custom and composite controls that are poorly represented through raw accessibility or platform UI trees.
+
+It introduces semantic enrichment for controls such as:
+
+- sliders
+- steppers
+- segmented controls
+- dropdowns
+- Compose/SwiftUI custom widgets
+- composite gesture-driven controls
+
+The goal is to improve target resolution, control interaction, and verification reliability for controls whose actionable semantics are not fully captured by raw snapshots.
+
+---
+
+## 2. Problem Statement
+
+Current interaction logic works well when platform semantics are explicit.
+
+It is weaker when controls appear as:
+
+- generic container views
+- unlabeled clickable wrappers
+- nested composite controls
+- custom Compose/SwiftUI components with weak accessibility exposure
+
+Observed problems include:
+
+- controls resolving as parent containers rather than actionable targets
+- missing slider-like controls in snapshots
+- weak distinction between discrete vs continuous controls
+- inability to infer supported interactions from control structure
+- unreliable verification of control state
+
+This causes brittle automation and coordinate fallback behavior.
+
+---
+
+## 3. Goals
+
+This RFC introduces a semantic layer that MUST:
+
+- infer higher-level control semantics from raw UI structures
+- enrich snapshots with semantic control metadata
+- improve actionable target selection (RFC 007)
+- improve adjustable control handling (RFC 008)
+- improve verification for semantic control state
+- reduce coordinate fallback usage
+
+---
+
+## 4. Non-Goals
+
+This RFC does NOT define:
+
+- replacement of raw accessibility trees
+- ML-based semantic inference
+- probabilistic control classification
+- new gesture primitives
+- autonomous planning behavior
+
+Semantic modeling is deterministic enrichment layered over raw signals.
+
+---
+
+## 5. Runtime Surfaces
+
+This RFC applies to existing runtime surfaces:
+
+- findElementHandler
+- _resolveActionableAncestor
+- _buildResolvedElement
+- tapElementHandler
+- scrollToElementHandler
+
+Semantic modeling augments these surfaces; it does not replace them.
+
+---
+
+## 6. Semantic Control Model
+
+Controls MAY progressively expose semantic metadata such as:
+
+```ts
+interface SemanticControl {
+  semantic_role:
+    | "slider"
+    | "stepper"
+    | "dropdown"
+    | "segmented_control"
+    | "custom_adjustable"
+    | "composite_control";
+
+  supported_actions: string[];
+
+  adjustable: boolean;
+
+  state_shape:
+    | "continuous"
+    | "discrete"
+    | "semantic";
+}
+```
+
+The control roles above represent an expected semantic model, not a claim that all such control classes are equally surfaced in the current runtime.
+
+Current runtime support may initially expose simpler semantic signals such as:
+- role hints
+- semantic labels
+- value_range metadata
+- selector confidence or related resolution signals
+
+Richer control roles are progressive extensions over time.
+
+---
+
+## 7. Semantic Inference Rules
+
+Inference MAY use signals such as:
+
+- accessibility role hints
+- value_range metadata
+- child composition patterns
+- repeated selectable child structures
+- platform traits (adjustable, selected, expanded)
+- known control heuristics
+
+Inference MUST be deterministic and explainable.
+
+Raw signals always win on conflict.
+
+Semantic inference confidence, where present, is advisory only and MUST NOT be treated as executable truth.
+
+---
+
+## 8. Resolution Integration (RFC 007)
+
+Semantic metadata SHOULD improve target resolution by:
+
+- preferring actionable child controls over generic containers
+- promoting semantically actionable descendants
+- disambiguating among multiple candidate matches
+
+Semantic signals are advisory enrichment, not executable truth.
+
+---
+
+## 9. Adjustable Control Integration (RFC 008)
+
+Where adjustable=true:
+
+Semantic metadata MAY expose:
+
+- supported adjustment mode
+- discrete vs continuous state model
+- expected verification strategy
+
+This improves convergence for value-setting workflows.
+
+---
+
+## 10. Verification Integration
+
+Verification MAY use semantic control metadata to improve:
+
+- value-state verification
+- discrete selection verification
+- semantic-state checks
+
+Formal verification still remains governed by RFC 005.
+
+---
+
+## 11. Output Contract (Progressive Extension)
+
+Current runtime may expose partial semantic outputs.
+
+Expected progressive shape (future extension model):
+
+```ts
+interface SemanticResolutionMetadata {
+  semantic_role?: string;
+  supported_actions?: string[];
+  adjustable?: boolean;
+  state_shape?: string;
+  confidence?: "low" | "medium" | "high";
+}
+```
+
+These fields are progressive enrichment and MUST NOT be assumed universally present.
+
+Implementations MAY expose only a subset of this model initially. Presence of a richer semantic role does not imply universal runtime support for all control classes.
+
+---
+
+## 12. Failure Modes
+
+Semantic modeling MAY fail due to:
+
+- insufficient raw signals
+- ambiguous composite structures
+- conflicting heuristics
+
+When semantic inference confidence is insufficient:
+
+- raw resolution flow MUST continue
+- semantic fields MAY be omitted
+- no semantic guessing should be forced
+
+---
+
+## 13. Success Metrics
+
+- fewer coordinate fallbacks
+- improved control discovery
+- improved actionable-target precision
+- improved slider/custom-control automation success
+- reduced semantic mismatch failures (RFC 010)
+
+---
+
+## 14. Relationship to Other RFCs
+
+RFC 005 — verification correctness model
+RFC 006 — runtime action execution
+RFC 007 — target resolution
+RFC 008 — adjustable control support
+RFC 010 — recovery uses semantic mismatch failures defined here
+
+---
+
+## 15. Summary
+
+This RFC adds deterministic semantic control enrichment for custom and composite controls, improving resolution, interaction reliability, and verification while remaining layered over existing runtime signals.
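Sections 6 and 7 of RFC 009 above require inference to be deterministic and explainable, with raw signals winning on conflict and semantic fields omitted when signals are insufficient. The following is a minimal sketch of what such a rule chain could look like. It is not taken from the package source: `RawNode`, `Inference`, `roleFromRawHint`, `inferSemantics`, and the accepted role spellings are all hypothetical names chosen for illustration.

```typescript
// Hypothetical raw node shape; these field names are assumptions for illustration.
interface RawNode {
  role?: string;                              // accessibility role hint from the platform
  traits?: string[];                          // platform traits, e.g. "adjustable"
  value_range?: { min: number; max: number }; // numeric range, when exposed
}

type SemanticRole =
  | "slider"
  | "stepper"
  | "dropdown"
  | "segmented_control"
  | "custom_adjustable"
  | "composite_control";

interface Inference {
  semantic_role?: SemanticRole;
  adjustable?: boolean;
  reason: string; // records which rule fired, so the result stays explainable
}

// Map an explicit raw role hint to a semantic role (assumed spellings).
function roleFromRawHint(hint: string | undefined): SemanticRole | undefined {
  switch ((hint ?? "").toLowerCase()) {
    case "slider": return "slider";
    case "stepper": return "stepper";
    case "dropdown": return "dropdown";
    default: return undefined;
  }
}

function inferSemantics(node: RawNode): Inference {
  const hasAdjustableTrait = (node.traits ?? []).indexOf("adjustable") !== -1;
  const adjustable = hasAdjustableTrait || node.value_range !== undefined;

  // Rule 1: an explicit raw role wins over every heuristic below.
  const explicit = roleFromRawHint(node.role);
  if (explicit) {
    return { semantic_role: explicit, adjustable, reason: "raw role hint" };
  }

  // Rule 2: no known role, but the raw tree exposes adjustable signals.
  if (adjustable) {
    return { semantic_role: "custom_adjustable", adjustable: true, reason: "adjustable trait or value_range" };
  }

  // Rule 3: insufficient signals, so semantic fields are omitted rather than guessed.
  return { reason: "insufficient raw signals" };
}
```

Because the rules run in a fixed order and each result carries a `reason`, the same raw node always yields the same output, and a caller can report why a role was assigned.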
@@ -52,7 +52,7 @@ For backend/API activity, `wait_for_screen_change` is not the right verification
 Action tools mutate application state.
 
 Includes:
-`start_app`, `restart_app`, `tap`, `tap_element`, `swipe`, `scroll_to_element`, `type_text`, `press_back`
+`start_app`, `restart_app`, `tap`, `tap_element`, `swipe`, `scroll_to_element`, `type_text`, `press_back`, `adjust_control`
 
 ### 4.2 Required Semantics
 
@@ -244,6 +244,7 @@ Raw layer contents include:
 - UI hierarchy or accessibility tree
 - normalized readable element state where exposed by the platform
 - platform-native identity hints such as stable identifiers, roles, and test tags
+- semantic control metadata when derivable from the raw tree, including `semantic_role`, `supported_actions`, `adjustable`, and `state_shape`
 - snapshot metadata such as `snapshot_revision` and `captured_at_ms`
 - `loading_state` when a reliable loading signal is detectable
 - screenshot when available
@@ -292,6 +293,27 @@ Semantic output MUST NOT replace classification or verification.
 
 Classification remains a supplementary, post-action interpretation mechanism.
 
+### 9.4 Semantic Control Metadata
+
+When present, semantic control metadata MAY include:
+
+```ts
+{
+  semantic_role?: 'slider' | 'stepper' | 'dropdown' | 'segmented_control' | 'custom_adjustable' | 'composite_control' | null,
+  supported_actions?: string[] | null,
+  adjustable?: boolean | null,
+  state_shape?: 'continuous' | 'discrete' | 'semantic' | null
+}
+```
+
+Rules:
+
+- semantic control metadata is derived and best-effort
+- raw platform roles and state remain authoritative on conflict
+- `adjustable` MAY be inferred from platform traits when no known role matches
+- `state_shape` MUST respect known control roles before value-based heuristics
+- `supported_actions` are hints only and MUST NOT be treated as guaranteed executable actions
+
 ## 10. Classification
 
 Tool: `classify_action_outcome`
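The section 9.4 rule that `state_shape` MUST respect known control roles before value-based heuristics can be read as a two-step precedence. The sketch below illustrates that ordering only. The role-to-shape table, the optional `step` field, and the function names are assumptions made for the example; the spec does not enumerate them.

```typescript
type SemanticRole =
  | "slider"
  | "stepper"
  | "dropdown"
  | "segmented_control"
  | "custom_adjustable"
  | "composite_control";

type StateShape = "continuous" | "discrete" | "semantic";

// Hypothetical range shape; `step` is an assumed field, not part of the spec.
interface ValueRange {
  min: number;
  max: number;
  step?: number;
}

// Assumed role-to-shape table. Roles without a fixed shape return null.
function shapeForRole(role: SemanticRole): StateShape | null {
  switch (role) {
    case "slider":
      return "continuous";
    case "stepper":
    case "dropdown":
    case "segmented_control":
      return "discrete";
    default:
      return null; // custom_adjustable and composite_control: no shape assumed
  }
}

function deriveStateShape(
  role: SemanticRole | null | undefined,
  range?: ValueRange | null
): StateShape | null {
  // Step 1: a known role decides the shape, regardless of value metadata.
  if (role) {
    const byRole = shapeForRole(role);
    if (byRole) return byRole;
  }
  // Step 2: only then fall back to a value-based heuristic.
  if (range && range.max > range.min) {
    const stepped = range.step !== undefined && range.step > 0;
    return stepped ? "discrete" : "continuous";
  }
  // Nothing reliable to go on, so the field stays null (best-effort metadata).
  return null;
}
```

With this ordering, a slider that happens to report a `step` is still treated as continuous, while an unlabeled control with the same range falls through to the heuristic.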
package/docs/tools/interact.md
CHANGED
@@ -172,6 +172,27 @@ Guidance:
 
 ---
 
+## adjust_control
+
+Purpose:
+
+- adjust a numeric control value with bounded verification
+
+Notes:
+
+- initial support is for slider-like controls that expose `value_range` or readable numeric value state
+- `expect_state` is the verification surface used to read back the resulting value
+- direct target placement is preferred; drag fallback is treated as degraded mode
+- the tool returns `target_state`, `actual_state`, `within_tolerance`, `converged`, `attempts`, and `adjustment_mode`
+
+Input example:
+
+```json
+{ "selector": { "text": "Duration" }, "property": "value", "targetValue": 30, "tolerance": 0.5, "platform": "android", "deviceId": "emulator-5554" }
+```
+
+---
+
 ## find_element
 
 Locate a UI element on the current screen using semantic matching and return an actionable element descriptor.
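The `adjust_control` notes above describe bounded verification, a preference for direct placement, and a degraded drag fallback. The sketch below shows one way those three ideas could fit together in a loop. It is an illustration under stated assumptions, not the tool's implementation: the `ControlDriver` interface, the numeric state types, the `"direct"` and `"drag_fallback"` labels, and the default of three attempts are all hypothetical. Only the result field names come from the documentation.

```typescript
// Result fields follow the documented names; their types here are assumptions.
interface AdjustResult {
  target_state: number;
  actual_state: number | null;
  within_tolerance: boolean;
  converged: boolean;
  attempts: number;
  adjustment_mode: "direct" | "drag_fallback"; // assumed labels
}

// Hypothetical device abstraction used only for this sketch.
interface ControlDriver {
  place(value: number): boolean; // direct placement; returns false if unsupported
  drag(value: number): void;     // degraded fallback gesture
  read(): number | null;         // read back the current value, if readable
}

function adjustControl(
  driver: ControlDriver,
  targetValue: number,
  tolerance: number,
  maxAttempts = 3
): AdjustResult {
  let mode: AdjustResult["adjustment_mode"] = "direct";
  let actual: number | null = null;
  let attempts = 0;

  // Bounded loop: never retry more than maxAttempts times.
  while (attempts < maxAttempts) {
    attempts++;
    // Prefer direct placement; switch to drag once it is reported unsupported.
    if (mode === "direct" && !driver.place(targetValue)) {
      mode = "drag_fallback";
    }
    if (mode === "drag_fallback") {
      driver.drag(targetValue);
    }
    // Verify by reading the value back and comparing within tolerance.
    actual = driver.read();
    if (actual !== null && Math.abs(actual - targetValue) <= tolerance) {
      return {
        target_state: targetValue,
        actual_state: actual,
        within_tolerance: true,
        converged: true,
        attempts,
        adjustment_mode: mode,
      };
    }
  }

  return {
    target_state: targetValue,
    actual_state: actual,
    within_tolerance: false,
    converged: false,
    attempts,
    adjustment_mode: mode,
  };
}
```

Injecting the driver keeps the loop testable without a device, and reporting `adjustment_mode` lets a caller see when the degraded path was taken.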