mobile-debug-mcp 0.26.1 → 0.26.3

@@ -0,0 +1,230 @@
1
+ # RFC 006 — Runtime Action Instrumentation & Binding Layer
2
+
3
+ ## 1. Summary
4
+
5
+ This RFC defines how the execution model in RFC 005 is mapped onto the current runtime behaviour of the system.
6
+
7
+ It does not assume a new instrumentation system exists. Instead, it describes how lifecycle semantics are derived from existing execution flows, logs, module behaviour, and lightweight runtime metadata attached to action envelopes.
8
+
9
+ It specifies:
10
+ - how existing `action_type` values are interpreted under RFC 005 semantics
11
+ - how lifecycle states are inferred from current runtime execution
12
+ - how `src/server` and `src/interact` currently participate in execution
13
+ - how legacy and platform actions are incorporated into the model
14
+
15
+ This RFC is a runtime binding and normalisation layer over existing implementation behaviour.
16
+
17
+ ---
18
+
19
+ ## 2. Problem Statement
20
+
21
+ RFC 005 defines a unified execution lifecycle:
22
+ - Resolved
23
+ - Dispatched
24
+ - Pending Verification
25
+ - Verified
26
+ - Failed
27
+
28
+ However, the current system already contains:
29
+ - a concrete `action_type` execution model
30
+ - execution logic split across `src/server` and `src/interact`
31
+ - platform-specific actions (`tap_element`, `type_text`, `press_back`, `start_app`, `restart_app`, `scroll_to_element`)
32
+ - distributed logging and partial instrumentation within modules
33
+
34
+ There is no central instrumentation system and no explicit lifecycle emitter.
35
+ Instead, lifecycle meaning is inferred from runtime behaviour and the `lifecycle_state` / `source_module` fields now attached to action envelopes.
36
+
37
+ This results in:
38
+ - implicit execution state transitions
39
+ - distributed observability signals
40
+ - non-uniform traceability across actions
41
+
42
+ ---
43
+
44
+ ## 3. Design Goals
45
+
46
+ This layer MUST:
47
+
48
+ - Map existing runtime behaviour to RFC 005 lifecycle semantics
49
+ - Use existing `action_type` values as the authoritative execution taxonomy
50
+ - Derive lifecycle states from observable runtime transitions
51
+ - Reflect actual module responsibilities (not idealised separation)
52
+ - Work with existing logging and execution hooks
53
+ - Preserve compatibility with all current action implementations
54
+
55
+ ---
56
+
57
+ ## 4. Runtime Execution Flow (Observed)
58
+
59
+ Current observed execution flow:
60
+
61
+ UI Request
62
+ → src/server (routing + validation)
63
+ → src/interact (execution + platform dispatch)
64
+ → platform layer
65
+ → response handling + logs
66
+ → optional state verification (where available)
67
+
68
+ Lifecycle states are derived from this flow rather than explicitly emitted.
69
+
70
+ ---
71
+
72
+ ## 5. Action Type Mapping (Current Runtime)
73
+
74
+ This RFC maps existing `action_type` values to RFC 005 semantics.
75
+
76
+ | action_type | RFC 005 Semantic Interpretation |
77
+ |------------|---------------------------------|
78
+ | tap | Selection |
79
+ | tap_element | Selection |
80
+ | type_text | Input |
81
+ | press_back | Navigation |
82
+ | start_app | System Action |
83
+ | restart_app | System Action |
84
+ | scroll_to_element | Navigation |
85
+
86
+ This table reflects the current runtime contract.
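The mapping above can be expressed as a small lookup. This is a minimal sketch, not code from the package: the names `SemanticCategory`, `ACTION_SEMANTICS`, and `semanticCategoryFor` are hypothetical, and only the table values come from the RFC.

```typescript
// Semantic categories named in the mapping table.
type SemanticCategory = "Selection" | "Input" | "Navigation" | "System Action";

// Hypothetical lookup table; the entries mirror the table rows one-to-one.
const ACTION_SEMANTICS: ReadonlyMap<string, SemanticCategory> = new Map<string, SemanticCategory>([
  ["tap", "Selection"],
  ["tap_element", "Selection"],
  ["type_text", "Input"],
  ["press_back", "Navigation"],
  ["start_app", "System Action"],
  ["restart_app", "System Action"],
  ["scroll_to_element", "Navigation"],
]);

// Returns undefined for action types the table does not cover,
// leaving the caller to decide how to treat unmapped actions.
function semanticCategoryFor(actionType: string): SemanticCategory | undefined {
  return ACTION_SEMANTICS.get(actionType);
}
```

A `Map` is used rather than a plain object so that inherited keys such as `toString` never resolve to a value.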
87
+
88
+ ---
89
+
90
+ ## 6. Lifecycle State Derivation
91
+
92
+ Lifecycle states are NOT explicitly emitted. They are inferred as follows:
93
+
94
+ ### 6.1 Resolved
95
+ Inferred when:
96
+ - src/server accepts request
97
+ - action is validated and normalised
98
+ - action_id is assigned (or equivalent identifier exists)
99
+
100
+ ---
101
+
102
+ ### 6.2 Dispatched
103
+ Inferred when:
104
+ - control passes from src/server to src/interact
105
+ - execution call is issued to platform layer
106
+
107
+ ---
108
+
109
+ ### 6.3 Pending Verification
110
+ Inferred when:
111
+ - platform execution returns a result
112
+ - before any UI/state evaluation occurs
113
+
114
+ ---
115
+
116
+ ### 6.4 Verified / Failed
117
+ Inferred when:
118
+ - post-execution evaluation is performed (if available)
119
+
120
+ Rules:
121
+ - Verified = expected outcome observed in UI/state/log signals
122
+ - Failed = timeout, error, or mismatch in expected outcome
123
+
124
+ Where no formal verification exists, the outcome is derived from the best available signals (logs, UI diff, or absence of error).
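The derivation rules in 6.1 to 6.4 can be sketched as a single function over observed signals. This is an illustration under stated assumptions: the `ObservedSignals` shape and `inferLifecycleState` are hypothetical, the lowercase state names other than `pending_verification` and `failed` are assumed, and the sketch treats any observed error as `failed`.

```typescript
type LifecycleState =
  | "resolved"
  | "dispatched"
  | "pending_verification"
  | "verified"
  | "failed";

// Hypothetical summary of the signals observed for one action.
interface ObservedSignals {
  acceptedByServer: boolean;     // request validated, normalised, identifier assigned
  dispatchedToPlatform: boolean; // src/interact issued the platform call
  platformReturned: boolean;     // platform execution returned a result
  errorObserved: boolean;        // timeout or error signal seen
  outcome?: "matched" | "mismatched"; // post-execution evaluation, if one ran
}

// Returns the furthest lifecycle state supported by the signals,
// or undefined when the action was never accepted.
function inferLifecycleState(s: ObservedSignals): LifecycleState | undefined {
  if (!s.acceptedByServer) return undefined;
  if (s.errorObserved || s.outcome === "mismatched") return "failed";
  if (s.outcome === "matched") return "verified";
  if (s.platformReturned) return "pending_verification";
  if (s.dispatchedToPlatform) return "dispatched";
  return "resolved";
}
```

In this sketch an action with no evaluation stays at `pending_verification`; promoting it to `verified` on absence of error alone would be a separate, best-effort policy choice.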
125
+
126
+ ---
127
+
128
+ ## 7. Instrumentation Reality
129
+
130
+ There is no central instrumentation layer in the current system.
131
+
132
+ Instead:
133
+ - src/server emits partial logs during routing and validation
134
+ - src/interact emits execution logs and platform responses
135
+ - platform adapters may emit additional debugging information
136
+ - action envelopes now carry lightweight lifecycle metadata for post-dispatch state and source ownership
137
+
138
+ Lifecycle traceability is therefore assembled from distributed signals rather than a unified event system.
139
+
140
+ ---
141
+
142
+ ## 8. Module Responsibilities (Observed Behaviour)
143
+
144
+ ### src/server
145
+ - receives action requests
146
+ - performs validation and normalisation
147
+ - assigns identifiers where applicable
148
+ - routes actions to src/interact
149
+ - emits partial logs for request lifecycle
150
+
151
+ ---
152
+
153
+ ### src/interact
154
+ - executes platform-specific actions
155
+ - handles retries and fallback behaviours
156
+ - emits execution logs
157
+ - returns execution results
158
+ - may perform lightweight post-processing
159
+
160
+ ---
161
+
162
+ ## 9. Verification Reality
163
+
164
+ Verification is not a uniform system-wide layer.
165
+
166
+ It may occur via:
167
+ - UI state comparison (where available)
168
+ - log-based confirmation
169
+ - absence of error signals
170
+ - platform feedback
171
+
172
+ Verification outcomes are deterministic where reliable state signals or explicit evaluation paths are available, and best-effort where no formal verifier exists.
173
+
174
+ ---
175
+
176
+ ## 10. Legacy and Special Actions
177
+
178
+ Actions such as:
179
+ - scroll_to_element
180
+ - start_app
181
+ - restart_app
182
+ - press_back
183
+
184
+ are fully supported in the runtime.
185
+
186
+ These actions:
187
+ - may bypass full lifecycle observability
188
+ - may not have explicit verification paths
189
+ - are interpreted using best-effort semantic mapping
190
+
191
+ ---
192
+
193
+ ## 11. Observability Model
194
+
195
+ Observability is currently distributed across:
196
+ - src/server logs
197
+ - src/interact logs
198
+ - platform debug output
199
+ - action envelope metadata
200
+
201
+ There is no unified event schema.
202
+
203
+ Lifecycle reconstruction requires correlation of:
204
+ - action_type
205
+ - timestamps
206
+ - execution boundaries
207
+ - error signals
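Correlating those four fields might look like the following sketch. The `SignalRecord` shape and `timelineFor` are hypothetical; the RFC states that no unified event schema exists, so a real implementation would first need to normalise each source into some common record like this one.

```typescript
// Hypothetical normalised record; the runtime has no unified schema today.
interface SignalRecord {
  source: "server" | "interact" | "platform" | "envelope";
  timestamp: string; // ISO 8601
  actionType: string;
  isError: boolean;
}

// Collects the records for one action type that fall inside an execution
// window (the "execution boundaries") and orders them by timestamp.
function timelineFor(
  records: SignalRecord[],
  actionType: string,
  startIso: string,
  endIso: string,
): SignalRecord[] {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  return records
    .filter((r) => r.actionType === actionType)
    .filter((r) => {
      const t = Date.parse(r.timestamp);
      return t >= start && t <= end;
    })
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```

Because `filter` returns a new array, the sort does not reorder the caller's input.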
208
+
209
+ ---
210
+
211
+ ## 12. Relationship to RFC 005
212
+
213
+ RFC 005 defines the ideal execution lifecycle semantics.
214
+
215
+ RFC 006 defines how those semantics are interpreted from the existing runtime system.
216
+
217
+ Together:
218
+ - RFC 005 = conceptual correctness model
219
+ - RFC 006 = runtime behavioural mapping layer
220
+
221
+ ---
222
+
223
+ ## 13. Summary
224
+
225
+ This RFC ensures:
226
+ - lifecycle semantics can be derived from current runtime behaviour
227
+ - existing action_type contract is preserved as source of truth
228
+ - no assumption of new instrumentation infrastructure is required
229
+ - real module responsibilities are accurately represented
230
+ - observability is understood as distributed rather than centralised
@@ -0,0 +1,277 @@
1
+ # RFC 007 — Actionability Resolution and Executable Target Selection
2
+
3
+ ## 1. Summary
4
+
5
+ This RFC defines how the system resolves which discovered UI element should receive an action before dispatch.
6
+
7
+ It addresses ambiguity between:
8
+ - visible elements vs actionable elements
9
+ - leaf nodes vs clickable containers
10
+ - semantic targets vs coordinate fallbacks
11
+ - multiple candidate targets with uncertain executability
12
+
13
+ Goal:
14
+ Improve first-attempt action correctness by resolving the best executable target prior to action dispatch.
15
+
16
+ This RFC defines the `Resolved` stage semantics referenced in RFC 005 and operationalized by RFC 006.
17
+ It is grounded in the existing element-resolution flow and extends current resolution behavior rather than assuming a wholly new resolver architecture.
18
+
19
+ ---
20
+
21
+ ## 2. Problem Statement
22
+
23
+ Current interaction failures often arise before execution.
24
+
25
+ The agent may discover the intended UI concept, but not the correct executable target.
26
+
27
+ Examples:
28
+ - tapping label text instead of clickable container
29
+ - sliders not surfacing semantic handles
30
+ - generic Compose containers hiding true affordances
31
+ - multiple matching targets without ranking logic
32
+
33
+ Observed failure modes:
34
+ - false taps
35
+ - submit ambiguity
36
+ - coordinate guessing
37
+ - retry loops
38
+ - brittle fallback behavior
39
+
40
+ This is a target-resolution problem, not an execution problem.
41
+
42
+ ---
43
+
44
+ ## 3. Design Goals
45
+
46
+ Resolution MUST:
47
+ - Prefer executable targets over merely visible matches
48
+ - Reduce ambiguous target selection
49
+ - Support confidence-based ranking
50
+ - Build on existing runtime resolution surfaces before introducing new resolution metadata
51
+ - Use structural and semantic resolution signals
52
+ - Minimize coordinate fallback usage
53
+ - Integrate with verification expectations from RFC 005
54
+
55
+ ---
56
+
57
+ ## 4. Actionability Model
58
+
59
+ Candidate targets are evaluated using actionability signals.
60
+
61
+ ### Structural signals
62
+ - clickable
63
+ - enabled
64
+ - focusable
65
+ - bounds
66
+ - parent action ownership
67
+
68
+ ### Semantic signals
69
+ - control role
70
+ - label association
71
+ - affordance hints
72
+ - selectable or adjustable semantics
73
+
74
+ ### Interaction signals
75
+ - reliable target patterns
76
+ - control-specific heuristics
77
+ - gesture compatibility
78
+
79
+ ---
80
+
81
+ ## 4.1 Current Runtime Resolution Surfaces
82
+
83
+ This RFC builds on current runtime resolution paths, including:
84
+ - `findElementHandler` for candidate discovery
85
+ - `_resolveActionableAncestor` for executable ancestor promotion
86
+ - `tapElementHandler` for resolved element dispatch
87
+ - `scrollToElementHandler` for scroll-mediated target acquisition
88
+
89
+ These existing handlers are the current implementation substrate for the Resolved stage.
90
+ This RFC extends and systematizes those behaviors; it does not assume replacement of those paths.
91
+
92
+ ---
93
+
94
+ ## 5. Target Candidate Ranking
95
+
96
+ When multiple targets match, candidates are ranked.
97
+
98
+ Illustrative confidence model:
99
+
100
+ resolution_confidence =
101
+ interactability_score
102
+ + semantic_match_score
103
+ + structural_reliability_score
104
+
105
+ Highest-confidence executable target is preferred.
106
+
107
+ The confidence model is illustrative; only the rule precedence is normative. Implementations may use simpler heuristics provided they preserve the resolution ordering guarantees. Any scoring mechanism is implementation-defined and might not be surfaced externally.
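A minimal sketch of the additive model, assuming each score is already available as a number. The `Candidate` shape, `resolutionConfidence`, and `rankCandidates` are hypothetical names; the RFC leaves the scoring mechanism implementation-defined.

```typescript
// Hypothetical candidate shape; the score inputs are assumed to be precomputed.
interface Candidate {
  id: string;
  executable: boolean;
  interactabilityScore: number;
  semanticMatchScore: number;
  structuralReliabilityScore: number;
}

// Sum of the three component scores, as in the illustrative model.
function resolutionConfidence(c: Candidate): number {
  return c.interactabilityScore + c.semanticMatchScore + c.structuralReliabilityScore;
}

// Keeps only executable candidates and orders them from highest to lowest
// confidence. Ties keep their original relative order (stable sort).
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.executable)
    .sort((a, b) => resolutionConfidence(b) - resolutionConfidence(a));
}
```

The first element of the result is the preferred target; the remaining elements can serve as alternates for fallback reasoning.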
108
+
109
+ ---
110
+
111
+ ## 6. Resolution Rules
112
+
113
+ ### Rule A — Prefer actionable containers over passive leaf nodes
114
+
115
+ Prefer:
116
+ - clickable container
117
+
118
+ Over:
119
+ - passive child text nodes
120
+
121
+ Example:
122
+ Prefer button container over "Generate Session" label node.
123
+
124
+ ---
125
+
126
+ ### Rule B — Prefer semantic controls over coordinate fallbacks
127
+
128
+ Use semantic control targets whenever possible.
129
+
130
+ Use coordinate fallback only when:
131
+ - no semantic target exists
132
+ - adjustable control semantics are absent
133
+ - fallback confidence is acceptable
135
+ ---
136
+
137
+ ### Rule C — Prefer explicit affordance ownership
138
+
139
+ If child and parent differ:
140
+ prefer the node owning the action handler.
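Rules A and C both amount to promoting a matched node to the node that actually owns the action. The sketch below illustrates that idea only; it is not the package's `_resolveActionableAncestor`, and the `UiNode` shape, the function name, and the `maxDepth` limit are assumptions.

```typescript
// Hypothetical view of a UI tree node with the structural signals it needs.
interface UiNode {
  id: string;
  clickable: boolean;
  enabled: boolean;
  parent?: UiNode;
}

// Walks from the matched node towards the root and returns the nearest node
// that is both clickable and enabled. If none is found within maxDepth
// ancestors, the original node is returned unchanged.
function promoteToActionableAncestor(node: UiNode, maxDepth = 5): UiNode {
  let current: UiNode | undefined = node;
  for (let depth = 0; current !== undefined && depth <= maxDepth; depth++) {
    if (current.clickable && current.enabled) return current;
    current = current.parent;
  }
  return node;
}
```

For a passive "Generate Session" label inside a clickable button container, this returns the container rather than the label.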
141
+
142
+ ---
143
+
144
+ ## 7. Ambiguity Handling
145
+
146
+ When multiple plausible targets remain:
147
+
148
+ System SHOULD:
149
+ - rank candidates
150
+ - expose confidence
151
+ - preserve alternates for fallback reasoning
152
+
153
+ Low-confidence targets may trigger:
154
+ - guarded execution
155
+ - alternate resolution attempt
156
+ - explicit recovery path
157
+
158
+ ---
159
+
160
+ ## 8. Adjustable Control Resolution
161
+
162
+ Special handling for:
163
+ - sliders
164
+ - steppers
165
+ - drag controls
166
+
167
+ Support:
168
+ - adjustable-role recognition
169
+ - control-bound discovery
170
+ - value-aware interaction targeting
171
+
172
+ This RFC defines target resolution.
173
+ Value-setting behavior remains governed by Adjustable Control Support.
174
+
175
+ ---
176
+
177
+ ## 9. Compose / Custom Control Resolution
178
+
179
+ Support derived actionability for:
180
+ - merged Compose semantics
181
+ - composite controls
182
+ - inferred interaction contracts
183
+
184
+ This RFC depends on and strengthens Better Compose / Custom Control Semantics.
185
+
186
+ ---
187
+
188
+ ## 10. Resolution Output Model (Current + Future Extension)
189
+
190
+ This model is non-normative and represents a progressive enrichment direction rather than a required runtime contract.
191
+
192
+ Resolution may evolve toward the following enriched output shape. Current runtime implementations may expose only resolved-target output plus limited supporting metadata.
193
+
194
+ At minimum, current implementations are expected to produce a resolved target. Confidence, alternates, fallback metadata, and reason codes may be introduced incrementally.
195
+
196
+ Illustrative future-complete shape:
197
+
198
+ {
199
+ "resolved_target": "...",
200
+ "confidence": 0.92,
201
+ "fallback_available": true,
202
+ "resolution_reason": "clickable_parent_preferred"
203
+ }
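One way to type the progressive shape is to require only the resolved target and mark everything else optional. This is a sketch under assumptions: the `ResolutionOutput` interface and `describeResolution` helper are hypothetical, and `resolved_target` is typed as a string only because the illustrative shape shows a string placeholder.

```typescript
// Only resolved_target is required; the other fields may be introduced incrementally.
interface ResolutionOutput {
  resolved_target: string;
  confidence?: number;
  fallback_available?: boolean;
  resolution_reason?: string;
}

// Renders whichever fields are present, so consumers work with both the
// minimal and the enriched output.
function describeResolution(o: ResolutionOutput): string {
  const parts = [`target=${o.resolved_target}`];
  if (o.confidence !== undefined) parts.push(`confidence=${o.confidence}`);
  if (o.fallback_available !== undefined) parts.push(`fallback=${o.fallback_available}`);
  if (o.resolution_reason !== undefined) parts.push(`reason=${o.resolution_reason}`);
  return parts.join(" ");
}
```

Checking for `undefined` rather than truthiness keeps a confidence of `0` or a `fallback_available` of `false` visible in the output.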
204
+
205
+ ---
206
+
207
+ ## 11. Verification Integration
208
+
209
+ Resolution is incomplete without verification expectations.
210
+
211
+ Resolved output should be derived directly from the existing element-resolution flow before adding richer metadata layers.
212
+
213
+ The resolved target should carry its expected post-action signal.
214
+
215
+ Examples:
216
+ - navigation transition expected
217
+ - menu expected
218
+ - control value change expected
219
+
220
+ This feeds RFC 005 verification.
221
+
222
+ ---
223
+
224
+ ## 12. Success Metrics
225
+
226
+ Track:
227
+ - reduced false-tap failures
228
+ - lower retarget retries
229
+ - higher first-attempt action success
230
+ - reduced coordinate fallback usage
231
+ - improved custom control interaction success
232
+
233
+ ---
234
+
235
+ ## 13. Dependencies
236
+
237
+ Depends on:
238
+ - Stronger State Verification
239
+ - Richer Element Identity
240
+ - Wait and Synchronization Reliability
241
+
242
+ Strengthens:
243
+ - Adjustable Control Support
244
+ - Better Compose / Custom Control Semantics
245
+
246
+ ---
247
+
248
+ ## 14. Relationship to Other RFCs
249
+
250
+ RFC 005
251
+ Defines what Resolved means in lifecycle semantics.
252
+
253
+ RFC 006
254
+ Defines how runtime interprets action execution.
255
+
256
+ RFC 007
257
+ Defines how a target becomes Resolved.
258
+ Specifically, it formalizes the current discovery → actionable ancestor resolution → dispatch preparation flow already present in runtime handlers.
259
+
260
+ Together:
261
+ - RFC 005 — action correctness
262
+ - RFC 006 — runtime execution binding
263
+ - RFC 007 — executable target resolution
264
+
265
+ ---
266
+
267
+ ## 15. Summary
268
+
269
+ This RFC reduces failures caused by acting on the wrong thing, even when the right thing was discovered.
270
+
271
+ It improves:
272
+ - action precision
273
+ - control reliability
274
+ - Compose interaction robustness
275
+ - agent success with fewer retries
276
+
277
+ It addresses one of the largest remaining sources of interaction brittleness.
@@ -69,6 +69,8 @@ MUST be returned in this structure:
69
69
  action_id: string,
70
70
  timestamp: string,
71
71
  action_type: string,
72
+ lifecycle_state?: 'pending_verification' | 'failed',
73
+ source_module?: 'server' | 'interact',
72
74
  target: {
73
75
  selector: object,
74
76
  resolved: object | null
@@ -87,6 +89,8 @@ Rules:
87
89
 
88
90
  - `success` is at the top level, not nested
89
91
  - `target` contains only selection and resolution context
92
+ - `lifecycle_state` reflects the post-dispatch runtime state
93
+ - `source_module` identifies where the envelope was produced
90
94
  - fingerprints represent observed pre/post UI state on a best-effort basis
91
95
  - `failure_code` is optional but MUST be used when a structured mapping exists
92
96
 
@@ -36,6 +36,8 @@ Example response:
36
36
  "action_id": "tap_1710000000000_1",
37
37
  "timestamp": "2026-04-23T08:00:00.000Z",
38
38
  "action_type": "tap",
39
+ "lifecycle_state": "pending_verification",
40
+ "source_module": "server",
39
41
  "target": { "selector": { "x": 100, "y": 200 }, "resolved": null },
40
42
  "success": true,
41
43
  "ui_fingerprint_before": "fp_before",
@@ -197,7 +199,14 @@ Output:
197
199
  "telemetry": { "matchedIndex": 3, "matchedInteractable": true }
198
200
  },
199
201
  "score": 1.0,
200
- "confidence": 1.0
202
+ "confidence": 1.0,
203
+ "resolution": {
204
+ "confidence": 1.0,
205
+ "reason": "exact_text_match",
206
+ "fallback_available": false,
207
+ "matched_count": 1,
208
+ "alternates": []
209
+ }
201
210
  }
202
211
  ```
203
212
 
@@ -205,6 +214,7 @@ Notes:
205
214
 
206
215
  - Best used when no precise selector is available yet.
207
216
  - `tapCoordinates` are suitable for `tap` calls.
217
+ - `resolution` explains why the element was selected and may include fallback alternates when the runtime had to promote a parent or nearby control.
208
218
  - Prefer `wait_for_ui` when you already know a deterministic selector and want a stable `elementId`.
209
219
 
210
220
  ---
@@ -333,6 +343,8 @@ Success response:
333
343
  "action_id": "tap_element_1710000000000_1",
334
344
  "timestamp": "2026-04-23T08:00:00.000Z",
335
345
  "action_type": "tap_element",
346
+ "lifecycle_state": "pending_verification",
347
+ "source_module": "interact",
336
348
  "target": {
337
349
  "selector": { "elementId": "el_123" },
338
350
  "resolved": {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "mobile-debug-mcp",
3
- "version": "0.26.1",
3
+ "version": "0.26.3",
4
4
  "description": "MCP server for mobile app debugging (Android + iOS), with focus on security and reliability",
5
5
  "type": "module",
6
6
  "bin": {