mobile-debug-mcp 0.25.0 → 0.25.1

@@ -0,0 +1,400 @@
1
+ # RFC-002: Platform-Native Element Metadata and Resolution Hints
2
+
3
+ Priority: 2
4
+ Depends on: RFC-001 (Stronger State Verification)
5
+
6
+ ---
7
+
8
+ # 1. Problem
9
+
10
+ Agents currently rely on brittle or inconsistent selectors when identifying UI elements.
11
+
12
+ This leads to:
13
+
14
+ - selector drift across UI updates
15
+ - failure to target correct elements in dynamic layouts
16
+ - retry loops due to ambiguous element matching
17
+ - inability to distinguish visually similar components
18
+
19
+ Current system limitations:
20
+
21
+ - weak or inconsistent element identifiers
22
+ - over-reliance on hierarchy position or text inference
23
+ - insufficient metadata for stable targeting
24
+
25
+ This RFC assumes stable identity may be derived from underlying platform accessibility or testing hooks. It does not assume that a universal or guaranteed stable_id exists across all platforms; instead, it defines a best-effort model based on platform-native identifiers and developer-provided metadata, supplemented by resolution hints.
28
+
29
+ ---
30
+
31
+ # 2. Goals
32
+
33
+ This RFC introduces:
34
+
35
+ 1. Platform-native element identity metadata for UI targeting
36
+ 2. Hierarchy-independent element references
37
+ 3. Selector confidence metadata for reliability
38
+ 4. Structured fallback resolution strategy
39
+
40
+ Success goals:
41
+
42
+ - Increase element match success rate
43
+ - Reduce selector-related retries
44
+ - Improve robustness across UI updates and Compose-heavy layouts
45
+
46
+ ---
47
+
48
+ # 3. Non-Goals
49
+
50
+ This RFC does not:
51
+
52
+ - Modify state verification logic (RFC-001)
53
+ - Introduce gesture handling (future RFCs)
54
+ - Define synchronization/waiting behaviour (RFC-003)
55
+ - Add new interaction primitives beyond identification and selection
56
+
57
+ ---
58
+
59
+ # 4. Proposed Model
60
+
61
+ ## 4.1 Stable Element Identity
62
+
63
+ Each UI element SHOULD expose a stable identifier when available.
64
+
65
+ Preferred model:
66
+
67
+ ```json
68
+ {
69
+ "element_id": "wifi_toggle",
70
+ "stable_id": "settings_wifi_toggle",
71
+ "role": "switch"
72
+ }
73
+ ```
74
+
75
+ Rules:
76
+
77
+ - stable_id MUST be derived from platform-native or developer-provided identifiers when available (see Section 4.1.1)
78
+ - stable_id MAY remain consistent across UI renders where supported by the platform
79
+ - stable_id is a preferred targeting key when present, but not guaranteed to exist
80
+ - element_id is session-scoped and may change between snapshots
81
+
82
+ ---
83
+
84
+ ### 4.1.1 Stable ID Origin
85
+
86
+ stable_id MUST be derived from platform-provided or framework-provided identifiers when available.
87
+
88
+ Acceptable sources include:
89
+
90
+ - Android: resource-id, content-desc (when stable and explicitly set)
91
+ - iOS: accessibilityIdentifier
92
+ - Web: data-testid or equivalent testing attributes
93
+ - Compose: semantics properties or developer-assigned test tags
94
+
95
+ Rules:
96
+
97
+ - stable_id MUST NOT be heuristically generated from visual text alone
98
+ - stable_id extraction SHOULD prefer developer-defined identifiers over framework-inferred values
99
+ - If no reliable source exists, stable_id MUST be omitted (not fabricated)
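A minimal sketch of the Section 4.1.1 rules, assuming a hypothetical `RawNode` shape emitted by a platform adapter (the field names are illustrative, not the project's actual types). It prefers developer-defined identifiers, accepts Android `content-desc` only when the adapter has flagged it as explicitly set, and returns `undefined` instead of fabricating an ID from visible text.

```typescript
// Hypothetical raw node shape produced by a platform adapter (illustrative only).
interface RawNode {
  platform: "android" | "ios" | "web" | "compose";
  resourceId?: string;              // Android resource-id
  contentDesc?: string;             // Android content-desc
  contentDescExplicit?: boolean;    // assumed adapter flag: content-desc was set deliberately
  accessibilityIdentifier?: string; // iOS
  dataTestId?: string;              // Web data-testid
  testTag?: string;                 // Compose test tag
  text?: string;                    // visible text: never used as a stable_id source
}

// Returns a trimmed, non-empty identifier, or undefined.
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Derive stable_id from platform-native identifiers only; omit it when none exist.
function deriveStableId(node: RawNode): string | undefined {
  if (node.platform === "android") {
    // resource-id first; content-desc only when explicitly set.
    return nonEmpty(node.resourceId) ??
      (node.contentDescExplicit ? nonEmpty(node.contentDesc) : undefined);
  }
  if (node.platform === "ios") return nonEmpty(node.accessibilityIdentifier);
  if (node.platform === "web") return nonEmpty(node.dataTestId);
  // Compose: developer-assigned test tag only.
  return nonEmpty(node.testTag);
}
```

Note that `text` is carried on the node but deliberately never consulted, which is how the "MUST NOT be heuristically generated from visual text alone" rule shows up in code.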
100
+
101
+ ---
102
+
103
+ ### 4.1.2 Stable ID Collision Handling
104
+
105
+ If multiple elements share the same stable_id:
106
+
107
+ - the system MUST treat this as a collision state
108
+ - all matching elements MUST be returned
109
+ - the agent MUST disambiguate using role, label, or hierarchy context
110
+
111
+ Rules:
112
+
113
+ - collisions MUST NOT be silently resolved by system-level heuristics
114
+ - stable_id uniqueness is a best-effort constraint, not a guarantee
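The collision rules can be sketched as a lookup that never picks a winner: it returns every element carrying the requested stable_id together with an explicit status, and leaves disambiguation to the agent. The `IdentifiedElement` and `StableIdLookup` types and both function names are hypothetical.

```typescript
interface IdentifiedElement {
  element_id: string;
  stable_id?: string;
  role: string;
  label?: string;
}

// "collision" is explicit so a multi-match can never be mistaken for a unique match.
type LookupStatus = "not_found" | "unique" | "collision";

interface StableIdLookup {
  status: LookupStatus;
  matches: IdentifiedElement[];
}

// Return all elements sharing the stable_id; no system-level tie-breaking.
function lookupStableId(elements: IdentifiedElement[], stableId: string): StableIdLookup {
  const matches = elements.filter((e) => e.stable_id === stableId);
  const status: LookupStatus =
    matches.length === 0 ? "not_found" : matches.length === 1 ? "unique" : "collision";
  return { status, matches };
}

// Agent-side disambiguation by role and (optionally) label.
function disambiguate(result: StableIdLookup, role: string, label?: string): IdentifiedElement[] {
  return result.matches.filter(
    (e) => e.role === role && (label === undefined || e.label === label),
  );
}
```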
115
+
116
+ ---
117
+
118
+ ## 4.2 Selector Confidence Model
119
+
120
+ Each element MAY include confidence metadata for selection reliability.
121
+
122
+ ```json
123
+ {
124
+ "selector": "Text('WiFi')",
125
+ "confidence": 0.92
126
+ }
127
+ ```
128
+
129
+ Rules:
130
+
131
+ - Confidence reflects likelihood of correct element match
132
+ - Low confidence SHOULD trigger fallback resolution
133
+ - Confidence MUST NOT be treated as deterministic truth
134
+
135
+ ---
136
+
137
+ ### 4.2.1 Confidence API Exposure
138
+
139
+ Confidence metadata MUST be exposed as part of the element selector object.
140
+
141
+ Preferred shape:
142
+
143
+ ```json
144
+ {
145
+ "selector": {
146
+ "value": "Text('WiFi')",
147
+ "confidence": { "score": 0.92, "reason": "unique_text_match" }
148
+ }
149
+ }
151
+ ```
152
+
153
+ Rules:
154
+
155
+ - confidence.score MUST be a number in the inclusive range 0 to 1
156
+ - confidence.reason SHOULD indicate the primary matching heuristic
157
+ - confidence MUST be attached to selector metadata, not state
158
+
159
+ This structure is expected to be present in both snapshot metadata and any downstream selector debugging output produced by the resolution engine.
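A hedged sketch of a runtime check for the rules above, assuming confidence objects arrive as untyped JSON from the resolution engine. `isSelectorConfidence` is a hypothetical helper name; it treats both bounds as inclusive.

```typescript
interface SelectorConfidence {
  score: number; // 0.0 - 1.0
  reason: string;
}

// Type guard: accepts only objects with a finite score in [0, 1] and a string reason.
function isSelectorConfidence(value: unknown): value is SelectorConfidence {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { score?: unknown; reason?: unknown };
  return (
    typeof candidate.score === "number" &&
    Number.isFinite(candidate.score) &&
    candidate.score >= 0 &&
    candidate.score <= 1 &&
    typeof candidate.reason === "string"
  );
}
```

Under this check, the bare-number form shown in the Section 4.2 illustration (`"confidence": 0.92`) would be rejected, because confidence is required to be an object carrying both `score` and `reason`.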
160
+
161
+ ---
162
+
163
+ ## 4.3 Fallback Resolution Strategy
164
+
165
+ Resolution order MUST be:
166
+
167
+ 1. stable_id (if unique or disambiguated via collision handling)
168
+ 2. platform-native metadata match (role + label + test_tag)
169
+ 3. selector + confidence scoring
170
+ 4. structural hierarchy fallback
171
+ 5. text inference (last resort)
172
+
173
+ Agents MUST attempt higher-priority (earlier) strategies before falling back to later ones.
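The resolution order can be read as a cascade that stops at the first tier producing any match. A minimal sketch, assuming a hypothetical `TargetQuery` shape: only the stable_id, metadata, and text tiers are implemented here, while the selector-scoring and hierarchy tiers are passed in by the caller because the RFC does not define their algorithms. A multi-element result is returned unchanged so the caller can apply collision handling (Section 4.1.2) instead of the cascade guessing.

```typescript
interface Candidate {
  element_id: string;
  stable_id?: string;
  role: string;
  label?: string;
  test_tag?: string;
  text?: string;
}

interface TargetQuery { stable_id?: string; role?: string; label?: string; test_tag?: string; text?: string; }

type StrategyName = "stable_id" | "native_metadata" | "selector_confidence" | "hierarchy" | "text_inference";
type Strategy = (elements: Candidate[], query: TargetQuery) => Candidate[];

interface Resolution { strategy: StrategyName; matches: Candidate[]; }

// Tier 1: exact stable_id match (several results signal a collision).
const byStableId: Strategy = (els, q) =>
  q.stable_id === undefined ? [] : els.filter((e) => e.stable_id === q.stable_id);

// Tier 2: every metadata field present in the query must match.
const byNativeMetadata: Strategy = (els, q) => {
  if (q.role === undefined && q.label === undefined && q.test_tag === undefined) return [];
  return els.filter(
    (e) =>
      (q.role === undefined || e.role === q.role) &&
      (q.label === undefined || e.label === q.label) &&
      (q.test_tag === undefined || e.test_tag === q.test_tag),
  );
};

// Tier 5: text inference, last resort.
const byText: Strategy = (els, q) =>
  q.text === undefined ? [] : els.filter((e) => e.text === q.text || e.label === q.text);

// Walk the tiers in Section 4.3 order; tiers 3 and 4 are injected by the caller.
function resolveTarget(
  elements: Candidate[],
  query: TargetQuery,
  bySelector: Strategy = () => [],
  byHierarchy: Strategy = () => [],
): Resolution | undefined {
  const order: Array<[StrategyName, Strategy]> = [
    ["stable_id", byStableId],
    ["native_metadata", byNativeMetadata],
    ["selector_confidence", bySelector],
    ["hierarchy", byHierarchy],
    ["text_inference", byText],
  ];
  for (const [strategy, run] of order) {
    const matches = run(elements, query);
    if (matches.length > 0) return { strategy, matches };
  }
  return undefined;
}
```

Reporting which tier produced the match also gives the confidence layer a natural `reason` value when a lower tier had to be used.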
174
+
175
+ ---
176
+
177
+ ## 4.4 Element Metadata Model (Platform-Aware)
178
+
179
+ For Compose and similar UI systems, elements MUST expose structured metadata rather than inferred semantic paths.
180
+
181
+ Preferred model:
182
+
183
+ ```json
184
+ {
185
+ "role": "button",
186
+ "label": "Save",
187
+ "text": "Save",
188
+ "test_tag": "settings_save_button",
189
+ "semantic": {
190
+ "is_clickable": true,
191
+ "is_container": false
192
+ }
193
+ }
194
+ ```
195
+
196
+ Rules:
197
+
198
+ - semantic_path MUST NOT be used as a required field
199
+ - platform-native metadata (test_tag / accessibility id) is preferred
200
+ - hierarchy information MAY be included but is not authoritative
201
+
202
+ ---
203
+
204
+ ## 4.5 Snapshot Response Contract (v1)
205
+
206
+ This RFC defines the expected structure of element metadata returned by the snapshot observation tool.
207
+
208
+ Each element in a snapshot MUST conform to the following shape:
209
+
210
+ ```json
211
+ {
212
+ "element_id": "string (session-scoped)",
213
+ "stable_id": "string (optional)",
214
+ "role": "string",
215
+ "label": "string (optional)",
216
+ "text": "string (optional)",
217
+ "test_tag": "string (optional)",
218
+ "selector": {
219
+ "value": "string",
220
+ "confidence": {
221
+ "score": "number (0.0-1.0)",
222
+ "reason": "string"
223
+ }
224
+ }
225
+ }
226
+ ```
227
+
228
+ Rules:
229
+
230
+ - element_id MUST be present and session-scoped
231
+ - stable_id MAY be present when provided by platform or developer metadata
232
+ - selector.confidence MUST be attached when selector is present
233
+ - test_tag SHOULD be preferred over inferred identifiers where available
234
+
235
+ Note:
236
+ This schema replaces ambiguous "snapshot response" references in prior sections and defines the canonical output contract for element identity and resolution metadata.
237
+
238
+ This contract defines the boundary between platform-derived metadata and resolution-engine-generated metadata, and is the single source of truth for all element identity fields used by downstream agents.
239
+
240
+ ---
241
+ ## 4.6 API Surface Mapping
242
+
243
+ This section defines where each field in the Snapshot Response Contract is produced within the system.
244
+
245
+ ### 4.6.1 Snapshot Tool Responsibility
246
+
247
+ The snapshot observation tool (e.g. `observe_ui_snapshot`) is responsible for returning the raw UI tree enriched with platform-derived metadata.
248
+
249
+ It MUST return elements conforming to the Snapshot Response Contract (Section 4.5).
250
+
251
+ In the current codebase, this maps to the `observe_ui_snapshot` pipeline (or equivalent snapshot generation function), which MUST return data conforming to the SnapshotResponse TypeScript contract defined in Section 4.6.4.
252
+
253
+ ### 4.6.2 Field Origin Mapping
254
+
255
+ Each field in the element model has a defined source of truth:
256
+
257
+ - element_id:
258
+ - Origin: Snapshot session layer
259
+ - Responsibility: Generated per snapshot traversal
260
+ - Scope: Session-scoped only
261
+
262
+ - stable_id:
263
+ - Origin: Platform adapter layer (Android/iOS/Web/Compose)
264
+ - Responsibility: Extracted from platform-native identifiers
265
+ - Constraint: MUST NOT be generated by heuristics alone
266
+
267
+ - role:
268
+ - Origin: Accessibility tree / platform UI framework
269
+ - Responsibility: Semantic role mapping from native UI system
270
+
271
+ - label / text:
272
+ - Origin: Platform accessibility node
273
+ - Responsibility: Visible or accessible text content extraction
274
+
275
+ - test_tag:
276
+ - Origin: Developer-defined metadata (when available)
277
+ - Responsibility: Explicit testing identifiers (e.g. accessibilityIdentifier, data-testid)
278
+
279
+ - selector:
280
+ - Origin: Resolution engine (post-processing layer)
281
+ - Responsibility: Generated match expression for agent targeting
282
+
283
+ - selector.confidence:
284
+ - Origin: Resolution engine
285
+ - Responsibility: Heuristic confidence scoring of selector correctness
286
+
287
+ ### 4.6.3 Layer Separation Rule
288
+
289
+ The system MUST maintain strict separation between:
290
+
291
+ - Platform extraction layer (stable_id, role, label, test_tag)
292
+ - Resolution layer (selector, confidence)
293
+ - Session layer (element_id)
294
+
295
+ No layer is permitted to overwrite another layer's source of truth.
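One way to honour the separation rule is to have each layer emit only the fields it owns and to assemble the element exactly once, copying each field from its owning layer. A minimal sketch, assuming hypothetical per-layer output types; the `Readonly` type and `Object.freeze` are illustrative guards, not requirements of this RFC.

```typescript
// Fields owned by each layer.
interface PlatformFields { stable_id?: string; role: string; label?: string; text?: string; test_tag?: string; }
interface ResolutionFields { selector?: { value: string; confidence: { score: number; reason: string } }; }
interface SessionFields { element_id: string; }

type AssembledElement = Readonly<SessionFields & PlatformFields & ResolutionFields>;

// Each field is copied only from its owning layer, so a stray key emitted by another
// layer (e.g. the resolution layer returning a "role") cannot overwrite platform data.
// Keys left undefined here are dropped by JSON.stringify, so stable_id is omitted, not faked.
function assembleElement(
  session: SessionFields,
  platform: PlatformFields,
  resolution: ResolutionFields,
): AssembledElement {
  return Object.freeze({
    element_id: session.element_id,
    stable_id: platform.stable_id,
    role: platform.role,
    label: platform.label,
    text: platform.text,
    test_tag: platform.test_tag,
    selector: resolution.selector,
  });
}
```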
296
+
297
+ ---
298
+
299
+ ### 4.6.4 TypeScript Contract (Implementation Binding)
300
+
301
+ This section defines the concrete TypeScript-level contract used by the codebase for snapshot and element resolution.
302
+
303
+ These types represent the implementation binding for the Snapshot Response Contract (Section 4.5).
304
+
305
+ ```ts
306
+ export interface SelectorConfidence {
307
+ score: number; // 0.0 - 1.0
308
+ reason: string;
309
+ }
310
+
311
+ export interface ElementSelector {
312
+ value: string;
313
+ confidence: SelectorConfidence;
314
+ }
315
+
316
+ export interface ElementSnapshot {
317
+ element_id: string; // session-scoped
318
+ stable_id?: string;
319
+ role: string;
320
+ label?: string;
321
+ text?: string;
322
+ test_tag?: string;
323
+ selector?: ElementSelector;
324
+ }
325
+
326
+ export interface SnapshotResponse {
327
+ elements: ElementSnapshot[];
328
+ }
329
+ ```
330
+
331
+ Notes:
332
+
333
+ - This interface MUST align with the runtime snapshot implementation.
334
+ - This is the canonical mapping between RFC definition and codebase types.
335
+ - Any deviation in implementation MUST be reflected in a future RFC revision.
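Because these interfaces are erased at runtime, an adapter that builds snapshots from untyped device output may want a conformance check against the Section 4.5 rules. A hedged sketch, with the interfaces repeated so the block is self-contained; `checkSnapshotResponse` is a hypothetical name, and treating a blank `stable_id` as a violation is an assumption drawn from the "omit, do not fabricate" rule.

```typescript
interface SelectorConfidence { score: number; reason: string; }
interface ElementSelector { value: string; confidence: SelectorConfidence; }
interface ElementSnapshot {
  element_id: string; // session-scoped
  stable_id?: string;
  role: string;
  label?: string;
  text?: string;
  test_tag?: string;
  selector?: ElementSelector;
}
interface SnapshotResponse { elements: ElementSnapshot[]; }

// Collect contract violations; an empty array means the response conforms.
function checkSnapshotResponse(response: SnapshotResponse): string[] {
  const problems: string[] = [];
  response.elements.forEach((el, index) => {
    const where = `elements[${index}]`;
    if (!el.element_id) problems.push(`${where}: element_id is required`);
    if (!el.role) problems.push(`${where}: role is required`);
    if (el.stable_id !== undefined && el.stable_id.trim() === "") {
      problems.push(`${where}: stable_id must be omitted rather than blank`);
    }
    if (el.selector) {
      const c = el.selector.confidence;
      // The negated range test also rejects NaN.
      if (!c || !(c.score >= 0 && c.score <= 1)) {
        problems.push(`${where}: selector.confidence.score must be within [0, 1]`);
      }
    }
  });
  return problems;
}
```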
336
+
337
+ ---
338
+
339
+ # 5. Failure Modes
340
+
341
+ ## 5.1 Ambiguous match
342
+
343
+ If multiple elements match a selector:
344
+
345
+ - The snapshot MUST include all matching candidates in the underlying element tree or debug snapshot.
346
+ - Current action APIs (e.g. find_element / tap / wait_for_ui) MAY return a single best-effort match for compatibility.
347
+ - When ambiguity exists, systems SHOULD expose candidate alternatives via snapshot inspection or debug instrumentation.
348
+ - Future extensions MAY introduce explicit multi-candidate resolution APIs, but are not required for RFC-002 compliance.
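For action APIs running in compatibility mode, these rules can be sketched as returning one best-effort target while keeping the alternatives available for snapshot or debug output. The ranking key is assumed to be `selector.confidence.score`; the `BestEffortMatch` shape and the tie handling are hypothetical. Reporting an explicit `ambiguous` flag keeps the choice from being silent, in the spirit of Section 4.1.2.

```typescript
interface ScoredCandidate {
  element_id: string;
  selector?: { value: string; confidence: { score: number; reason: string } };
}

interface BestEffortMatch {
  best: ScoredCandidate;
  alternatives: ScoredCandidate[]; // exposed for snapshot inspection / debug instrumentation
  ambiguous: boolean;              // true when more than one candidate matched
}

// Pick the highest-confidence candidate without discarding the others.
// Candidates without a selector rank as score 0; ties keep input order (Array.sort is stable).
function pickBestEffort(candidates: ScoredCandidate[]): BestEffortMatch | undefined {
  if (candidates.length === 0) return undefined;
  const ranked = [...candidates].sort(
    (a, b) => (b.selector?.confidence.score ?? 0) - (a.selector?.confidence.score ?? 0),
  );
  return { best: ranked[0], alternatives: ranked.slice(1), ambiguous: ranked.length > 1 };
}
```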
349
+
350
+ ---
351
+
352
+ ## 5.2 Missing stable identity
353
+
354
+ If stable_id is unavailable:
355
+
356
+ - the fallback resolution order (Section 4.3) MUST be used
357
+ - selector confidence SHOULD reflect reduced certainty and include reason="no_stable_id"
358
+ - retries MAY be triggered
359
+
360
+ ---
361
+
362
+ ## 5.3 Layout drift
363
+
364
+ If UI structure changes:
365
+
366
+ - stable_id remains valid if preserved
367
+ - structural selectors may degrade
368
+ - confidence SHOULD reflect uncertainty
369
+
370
+ ---
371
+
372
+ # 6. Acceptance Criteria
373
+
374
+ RFC-002 is complete when:
375
+
376
+ - platform-native identity metadata (stable_id, test_tag, role) is present where available
377
+ - selector confidence metadata is present and conforms to Snapshot Response Contract (Section 4.5)
378
+ - fallback resolution strategy is implemented
379
+ - element match success rate improves on benchmark flows
380
+ - selector-related retries are reduced
381
+
382
+ ---
383
+
384
+ # 7. Success Metrics
385
+
386
+ - Higher element resolution match rate using platform-native metadata + confidence hints
387
+ - Reduced selector retries
388
+ - Lower failure rate on UI updates
389
+ - Improved stability in Compose UI trees
390
+
391
+ ---
392
+
393
+ # 8. Out of Scope
394
+
395
+ - State verification (RFC-001)
396
+ - Wait/synchronization (RFC-003)
397
+ - Gestures (future RFCs)
398
+ - Action tracing
399
+
400
+ This RFC is scoped as a metadata and resolution hint layer. It does not guarantee stable identity across all platforms, but standardises how identity signals are exposed and consumed.
@@ -0,0 +1,232 @@
1
+
2
+
3
+ # RFC-003: Wait and Synchronization Reliability
4
+
5
+ Priority: 3
6
+ Depends on: RFC-001 (Stronger State Verification), RFC-002 (Platform-Native Element Metadata and Resolution Hints)
7
+
8
+ ---
9
+
10
+ # 1. Problem
11
+
12
+ Agents can often identify the right element (RFC-002) and verify the right state (RFC-001), but still fail because they act before the UI has reached the intended post-action state.
13
+
14
+ This causes:
15
+
16
+ - retries caused by racing the UI
17
+ - false failures from stale snapshots
18
+ - overuse of network/log verification when UI evidence should suffice
19
+ - flakiness in asynchronous and in-place update flows
20
+ - unreliable behaviour in Compose-heavy or thin accessibility trees
21
+
22
+ Current system limitations:
23
+
24
+ - wait_for_ui is underused after actions involving async state changes
25
+ - current waits focus on expected elements appearing, not general UI transition detection
26
+ - snapshot staleness is not explicitly surfaced
27
+ - loading state transitions are inconsistently observable
28
+
29
+ ---
30
+
31
+ # 2. Goals
32
+
33
+ This RFC introduces:
34
+
35
+ 1. UI-first synchronization policy after actions
36
+ 2. Snapshot staleness and revision metadata
37
+ 3. UI-change based waiting for in-place updates
38
+ 4. Structured loading-state detection
39
+ 5. Compose-aware synchronization hints
40
+
41
+ Success goals:
42
+
43
+ - reduce retries caused by premature actions
44
+ - increase successful post-action verification
45
+ - reduce unnecessary fallbacks to logs/network checks
46
+ - improve reliability in asynchronous UI flows
47
+
48
+ ---
49
+
50
+ # 3. Non-Goals
51
+
52
+ This RFC does not:
53
+
54
+ - redefine state verification semantics (RFC-001)
55
+ - redefine element identity contracts (RFC-002)
56
+ - add new interaction primitives (long press, pinch, etc.)
57
+ - replace network or log verification where no UI outcome exists
58
+
59
+ ---
60
+
61
+ # 4. Proposed Model
62
+
63
+ ## 4.1 UI-First Synchronization Policy
64
+
65
+ Default post-action flow SHOULD be:
66
+
67
+ ```text
68
+ action
69
+ → wait_for_ui(expected outcome)
70
+ → verify state
71
+ → only fall back to network/logs when no UI outcome exists or wait fails
72
+ ```
73
+
74
+ Rules:
75
+
76
+ - UI evidence MUST be preferred over network or log evidence when a UI outcome is expected.
77
+ - Actions that trigger navigation, async mutation, or visible state changes SHOULD be followed by a wait.
78
+ - Network/log checks are fallback signals, not primary synchronization mechanisms.
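The post-action flow above can be sketched as a small orchestrator that consults network or log evidence only when no UI outcome is expected or the UI wait fails. All four callbacks are hypothetical stand-ins for the project's tools (for example, `waitForUi` might wrap `wait_for_ui`), and their signatures are assumptions.

```typescript
type Evidence = "ui" | "fallback";

interface PostActionSteps {
  act: () => Promise<void>;
  // Resolves true if the expected UI outcome appeared before timeout.
  // Omitted when the action has no visible UI outcome.
  waitForUi?: () => Promise<boolean>;
  verifyUi: () => Promise<boolean>;
  verifyFallback: () => Promise<boolean>; // network / log evidence
}

interface PostActionResult { ok: boolean; evidence: Evidence; }

// action → wait_for_ui → verify state; fallback only if no UI outcome exists or the wait fails.
async function runUiFirst(steps: PostActionSteps): Promise<PostActionResult> {
  await steps.act();
  if (steps.waitForUi === undefined) {
    // No visible outcome expected (Section 5.3): fallback evidence may be primary.
    return { ok: await steps.verifyFallback(), evidence: "fallback" };
  }
  if (await steps.waitForUi()) {
    // UI outcome appeared: verify from UI evidence and do not consult logs/network.
    return { ok: await steps.verifyUi(), evidence: "ui" };
  }
  return { ok: await steps.verifyFallback(), evidence: "fallback" };
}
```

Returning which evidence source was used makes unnecessary fallbacks measurable, which is one of this RFC's success metrics.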
79
+
80
+ ---
81
+
82
+ ## 4.2 Snapshot Revision / Staleness Metadata
83
+
84
+ Snapshot responses MUST expose revision metadata.
85
+
86
+ Preferred shape:
87
+
88
+ ```json
89
+ {
90
+ "snapshot_revision": 184,
91
+ "captured_at_ms": 1714452012301
92
+ }
93
+ ```
94
+
95
+ Rules:
96
+
97
+ - snapshot_revision MUST increment when the hierarchy meaningfully changes.
98
+ - captured_at_ms MUST reflect the snapshot capture time in milliseconds.
99
+ - Agents SHOULD use revision changes as synchronization signals.
100
+ - Agents SHOULD treat verification based on a stale (superseded) revision as suspect.
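A minimal sketch of how a snapshot producer might maintain snapshot_revision and how an agent might test for staleness. Both policies are assumptions: "meaningful change" is approximated by a structural signature over stable_id, role, text, and state (session-scoped element_id is excluded because it may change between snapshots), and a snapshot counts as stale when a newer revision is known or it was captured before the agent's most recent action.

```typescript
interface TrackedElement {
  element_id: string;
  stable_id?: string;
  role: string;
  text?: string;
  state?: Record<string, unknown>;
}
interface RevisionedSnapshot { snapshot_revision: number; captured_at_ms: number; elements: TrackedElement[]; }

// Structural signature; bounds and other cosmetic fields are deliberately ignored.
function signatureOf(elements: TrackedElement[]): string {
  return JSON.stringify(
    elements.map((e) => [e.stable_id ?? null, e.role, e.text ?? null, e.state ?? null]),
  );
}

// Issues snapshots whose revision increments only when the signature changes.
class RevisionCounter {
  private revision = 0;
  private lastSignature: string | undefined;

  capture(elements: TrackedElement[], nowMs: number = Date.now()): RevisionedSnapshot {
    const signature = signatureOf(elements);
    if (signature !== this.lastSignature) {
      this.revision += 1;
      this.lastSignature = signature;
    }
    return { snapshot_revision: this.revision, captured_at_ms: nowMs, elements };
  }
}

// Suspect if a newer revision is known, or if captured before the last action.
function isStale(snapshot: RevisionedSnapshot, latestRevision: number, lastActionAtMs: number): boolean {
  return snapshot.snapshot_revision < latestRevision || snapshot.captured_at_ms < lastActionAtMs;
}
```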
101
+
102
+ ---
103
+
104
+ ## 4.3 wait_for_ui_change Primitive
105
+
106
+ A UI-diff based wait primitive SHOULD support waiting on hierarchy changes, not only explicit expected elements.
107
+
108
+ Conceptual contract:
109
+
110
+ ```ts
111
+ wait_for_ui_change({
112
+ expected_change?: "hierarchy_diff" | "text_change" | "state_change",
113
+ timeout_ms?: number
114
+ })
115
+ ```
116
+
117
+ Use cases:
118
+
119
+ - in-place content refresh
120
+ - async partial rerender
121
+ - list mutations
122
+ - Compose recomposition-driven updates
123
+
124
+ Rules:
125
+
126
+ - wait_for_ui_change SHOULD detect meaningful UI deltas, not cosmetic noise.
127
+ - It MAY use snapshot revisions as one signal.
128
+ - It complements wait_for_ui; it does not replace it.
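A hedged sketch of the conceptual contract as a polling loop that compares snapshots until a change of the requested kind appears or the timeout expires. The `takeSnapshot` callback, the `poll_ms` option, the 250 ms and 5000 ms defaults, and the result shape are all assumptions; the RFC defines only `expected_change` and `timeout_ms`. An unchanged snapshot_revision is used as a cheap "nothing meaningful changed" signal, which the rules permit.

```typescript
type ChangeKind = "hierarchy_diff" | "text_change" | "state_change";

interface PolledElement { stable_id?: string; role: string; text?: string; state?: Record<string, unknown>; }
interface PolledSnapshot { snapshot_revision: number; elements: PolledElement[]; }
interface WaitForUiChangeResult { changed: boolean; snapshot: PolledSnapshot; elapsed_ms: number; }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Project only the aspect being waited on, so unrelated (cosmetic) differences are ignored.
function project(snapshot: PolledSnapshot, kind: ChangeKind): string {
  const pick = (e: PolledElement): unknown =>
    kind === "text_change" ? e.text ?? null
      : kind === "state_change" ? e.state ?? null
      : [e.stable_id ?? null, e.role];
  return JSON.stringify(snapshot.elements.map(pick));
}

async function waitForUiChange(
  takeSnapshot: () => Promise<PolledSnapshot>,
  opts: { expected_change?: ChangeKind; timeout_ms?: number; poll_ms?: number } = {},
): Promise<WaitForUiChangeResult> {
  const kind = opts.expected_change ?? "hierarchy_diff";
  const timeoutMs = opts.timeout_ms ?? 5000;
  const pollMs = opts.poll_ms ?? 250;
  const start = Date.now();
  const baseline = await takeSnapshot();
  const baselineView = project(baseline, kind);
  let latest = baseline;
  while (Date.now() - start < timeoutMs) {
    await sleep(pollMs);
    latest = await takeSnapshot();
    // Same revision: the producer saw no meaningful change, so skip the diff.
    if (latest.snapshot_revision === baseline.snapshot_revision) continue;
    if (project(latest, kind) !== baselineView) {
      return { changed: true, snapshot: latest, elapsed_ms: Date.now() - start };
    }
  }
  return { changed: false, snapshot: latest, elapsed_ms: Date.now() - start };
}
```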
129
+
130
+ ---
131
+
132
+ ## 4.4 Structured Loading-State Detection
133
+
134
+ Systems SHOULD surface structured loading signals when detectable.
135
+
136
+ Examples:
137
+
138
+ - progress indicator present/absent
139
+ - disabled submit button becomes enabled
140
+ - loading spinner removed
141
+
142
+ Preferred model:
143
+
144
+ ```json
145
+ {
146
+ "loading_state": {
147
+ "active": true,
148
+ "signal": "progress_indicator"
149
+ }
150
+ }
151
+ ```
152
+
153
+ Rules:
154
+
155
+ - Loading start and loading completion SHOULD be detectable when possible.
156
+ - Loading signals MAY be used as synchronization hints, but not as sole success criteria.
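A minimal sketch of deriving the preferred `loading_state` shape from snapshot elements. The detection heuristics (a `progress_indicator` role, a disabled submit button identified by the hypothetical `submitLabel` parameter) and the `LoadingCandidate` shape are assumptions for illustration; a real adapter would first map platform classes such as Android's `ProgressBar` onto roles like these.

```typescript
type LoadingSignal = "progress_indicator" | "disabled_submit";

interface LoadingCandidate {
  role: string; // e.g. "progress_indicator", "button"
  label?: string;
  state?: { enabled?: boolean };
}

interface LoadingState { active: boolean; signal?: LoadingSignal; }

// Returns the first loading signal found. active=false means no signal is detectable,
// which is a synchronization hint, not proof that the operation succeeded.
function detectLoadingState(elements: LoadingCandidate[], submitLabel = "Submit"): LoadingState {
  if (elements.some((e) => e.role === "progress_indicator")) {
    return { active: true, signal: "progress_indicator" };
  }
  const submit = elements.find((e) => e.role === "button" && e.label === submitLabel);
  if (submit?.state?.enabled === false) {
    return { active: true, signal: "disabled_submit" };
  }
  return { active: false };
}

// Completion hint: loading was active in the earlier snapshot and is not in the later one.
function loadingCompleted(before: LoadingState, after: LoadingState): boolean {
  return before.active && !after.active;
}
```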
157
+
158
+ ---
159
+
160
+ ## 4.5 Compose-Aware Synchronization Hints
161
+
162
+ For Compose or thin accessibility structures:
163
+
164
+ Systems SHOULD support:
165
+
166
+ - merged semantic node changes as wait signals
167
+ - text mutations within existing nodes
168
+ - in-place recomposition awareness
169
+
170
+ These are synchronization hints layered on top of standard wait behaviour.
171
+
172
+ ---
173
+
174
+ # 5. Failure Modes
175
+
176
+ ## 5.1 Premature Action Progression
177
+
178
+ If an action is followed immediately by verification without waiting:
179
+
180
+ - the system SHOULD bias toward suggesting wait_for_ui
181
+ - retries SHOULD prefer synchronization correction before repeated action execution
182
+
183
+ ---
184
+
185
+ ## 5.2 Stale Snapshot Reads
186
+
187
+ If verification uses an old snapshot:
188
+
189
+ - revision metadata SHOULD expose staleness
190
+ - agents SHOULD reacquire a snapshot before retrying verification
191
+
192
+ ---
193
+
194
+ ## 5.3 No Visible UI Outcome
195
+
196
+ If no UI outcome is expected:
197
+
198
+ - network/log verification MAY be primary evidence
199
+ - the UI-first policy does not apply rigidly
200
+
201
+ ---
202
+
203
+ # 6. Acceptance Criteria
204
+
205
+ RFC-003 is complete when:
206
+
207
+ - UI-first synchronization policy is enforced in agent guidance
208
+ - snapshot revision metadata is exposed in snapshot responses
209
+ - wait_for_ui_change contract is implemented or stubbed
210
+ - structured loading-state hints are surfaced where detectable
211
+ - retries caused by premature action progression are reduced
212
+
213
+ ---
214
+
215
+ # 7. Success Metrics
216
+
217
+ - Fewer retries caused by timing/synchronization errors
218
+ - Higher post-action verification success rate
219
+ - Reduced unnecessary fallback to network/log evidence
220
+ - Improved stability in asynchronous and Compose-heavy flows
221
+
222
+ ---
223
+
224
+ # 8. Deferred To Later RFCs
225
+
226
+ - Advanced subscriptions / notify-when-element-appears APIs
227
+ - Full action-to-UI trace correlation (Priority 7)
228
+ - Gesture-trigger-specific synchronization logic
229
+
230
+ ---
231
+
232
+ This RFC standardises temporal reliability and synchronization signals layered on top of state verification and element identity guarantees from RFC-001 and RFC-002.
@@ -238,6 +238,7 @@ Raw layer contents include:
238
238
 
239
239
  - UI hierarchy or accessibility tree
240
240
  - normalized readable element state where exposed by the platform
241
+ - platform-native identity hints such as stable identifiers, roles, and test tags
241
242
  - screenshot when available
242
243
  - element-level attributes
243
244
  - logs and fingerprint/activity observations
@@ -83,12 +83,13 @@ Input:
83
83
  Response (example):
84
84
 
85
85
  ```json
86
- { "device": { "platform": "android", "id": "emulator-5554" }, "screen": "", "resolution": { "width": 1080, "height": 2400 }, "elements": [ { "text": "Sign in", "type": "android.widget.Button", "resourceId": "com.example:id/signin", "clickable": true, "bounds": [0,0,100,50], "state": { "enabled": true } } ] }
86
+ { "device": { "platform": "android", "id": "emulator-5554" }, "screen": "", "resolution": { "width": 1080, "height": 2400 }, "elements": [ { "text": "Sign in", "type": "android.widget.Button", "resourceId": "com.example:id/signin", "clickable": true, "bounds": [0,0,100,50], "state": { "enabled": true }, "stable_id": "com.example:id/signin", "role": "button", "test_tag": "com.example:id/signin", "selector": { "value": "com.example:id/signin", "confidence": { "score": 1, "reason": "resource_id" } }, "semantic": { "is_clickable": true, "is_container": false } } ] }
87
87
  ```
88
88
 
89
89
  Notes:
90
90
  - Useful for inspection, selector development, and fallback debugging.
91
91
  - Elements may include a normalized `state` object when the platform exposes readable state such as checked, selected, focused, expanded, text input, or slider values.
92
+ - Elements may also include platform-native identity hints such as `stable_id`, `role`, `test_tag`, `selector`, and `semantic`.
92
93
  - Prefer `wait_for_ui` for deterministic element resolution in interactive flows.
93
94
 
94
95
  ---
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "mobile-debug-mcp",
3
- "version": "0.25.0",
3
+ "version": "0.25.1",
4
4
  "description": "MCP server for mobile app debugging (Android + iOS), with focus on security and reliability",
5
5
  "type": "module",
6
6
  "bin": {