mobile-debug-mcp 0.26.1 → 0.26.3
- package/AGENTS.md +3 -0
- package/dist/interact/index.js +169 -102
- package/dist/server/common.js +14 -1
- package/dist/server/tool-definitions.js +22 -4
- package/dist/server/tool-handlers.js +7 -0
- package/dist/server-core.js +1 -1
- package/docs/CHANGELOG.md +6 -0
- package/docs/ROADMAP.md +242 -76
- package/docs/rfcs/005-unified-action-execution-and-verification-model.md +216 -0
- package/docs/rfcs/006-runtime-action-instrumentation-and-binding-layer.md +230 -0
- package/docs/rfcs/007-actionability-resolution-and-executable-target-selection.md +277 -0
- package/docs/specs/mcp-tooling-spec-v1.md +4 -0
- package/docs/tools/interact.md +13 -1
- package/package.json +1 -1
- package/src/interact/index.ts +203 -107
- package/src/server/common.ts +22 -1
- package/src/server/tool-definitions.ts +22 -4
- package/src/server/tool-handlers.ts +7 -0
- package/src/server-core.ts +1 -1
- package/src/types.ts +75 -0
- package/test/unit/observe/find_element.test.ts +5 -0
- package/test/unit/server/response_shapes.test.ts +8 -0
|
@@ -0,0 +1,230 @@
|
|
|
1
|
+
# RFC 006 — Runtime Action Instrumentation & Binding Layer
|
|
2
|
+
|
|
3
|
+
## 1. Summary
|
|
4
|
+
|
|
5
|
+
This RFC defines how the execution model in RFC 005 is mapped onto the current runtime behaviour of the system.
|
|
6
|
+
|
|
7
|
+
It does not assume a new instrumentation system exists. Instead, it describes how lifecycle semantics are derived from existing execution flows, logs, module behaviour, and lightweight runtime metadata attached to action envelopes.
|
|
8
|
+
|
|
9
|
+
It specifies:
|
|
10
|
+
- how existing `action_type` values are interpreted under RFC 005 semantics
|
|
11
|
+
- how lifecycle states are inferred from current runtime execution
|
|
12
|
+
- how `src/server` and `src/interact` currently participate in execution
|
|
13
|
+
- how legacy and platform actions are incorporated into the model
|
|
14
|
+
|
|
15
|
+
This RFC is a runtime binding and normalisation layer over existing implementation behaviour.
|
|
16
|
+
|
|
17
|
+
---
|
|
18
|
+
|
|
19
|
+
## 2. Problem Statement
|
|
20
|
+
|
|
21
|
+
RFC 005 defines a unified execution lifecycle:
|
|
22
|
+
- Resolved
|
|
23
|
+
- Dispatched
|
|
24
|
+
- Pending Verification
|
|
25
|
+
- Verified
|
|
26
|
+
- Failed
|
|
27
|
+
|
|
28
|
+
However, the current system already contains:
|
|
29
|
+
- a concrete `action_type` execution model
|
|
30
|
+
- execution logic split across `src/server` and `src/interact`
|
|
31
|
+
- platform-specific actions (`tap_element`, `type_text`, `press_back`, `start_app`, `restart_app`, `scroll_to_element`)
|
|
32
|
+
- distributed logging and partial instrumentation within modules
|
|
33
|
+
|
|
34
|
+
There is no central instrumentation system and no explicit lifecycle emitter.
|
|
35
|
+
Instead, lifecycle meaning is inferred from runtime behaviour and the `lifecycle_state` / `source_module` fields now attached to action envelopes.
|
|
36
|
+
|
|
37
|
+
This results in:
|
|
38
|
+
- implicit execution state transitions
|
|
39
|
+
- distributed observability signals
|
|
40
|
+
- non-uniform traceability across actions
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
## 3. Design Goals
|
|
45
|
+
|
|
46
|
+
This layer MUST:
|
|
47
|
+
|
|
48
|
+
- Map existing runtime behaviour to RFC 005 lifecycle semantics
|
|
49
|
+
- Use existing `action_type` values as the authoritative execution taxonomy
|
|
50
|
+
- Derive lifecycle states from observable runtime transitions
|
|
51
|
+
- Reflect actual module responsibilities (not idealised separation)
|
|
52
|
+
- Work with existing logging and execution hooks
|
|
53
|
+
- Preserve compatibility with all current action implementations
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 4. Runtime Execution Flow (Observed)
|
|
58
|
+
|
|
59
|
+
Current observed execution flow:
|
|
60
|
+
|
|
61
|
+
UI Request
→ src/server (routing + validation)
→ src/interact (execution + platform dispatch)
→ platform layer
→ response handling + logs
→ optional state verification (where available)
|
|
67
|
+
|
|
68
|
+
Lifecycle states are derived from this flow rather than explicitly emitted.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## 5. Action Type Mapping (Current Runtime)
|
|
73
|
+
|
|
74
|
+
This RFC maps existing `action_type` values to RFC 005 semantics.
|
|
75
|
+
|
|
76
|
+
| action_type | RFC 005 Semantic Interpretation |
|-------------|---------------------------------|
| tap | Selection |
| tap_element | Selection |
| type_text | Input |
| press_back | Navigation |
| start_app | System Action |
| restart_app | System Action |
| scroll_to_element | Navigation |
|
|
85
|
+
|
|
86
|
+
This table reflects the current runtime contract.
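The mapping above could be written down as a lookup table. This is a minimal sketch, not code from the package: the names `ActionType`, `SemanticCategory`, `ACTION_SEMANTICS`, and `semanticCategoryFor` are hypothetical, and only the `action_type` values and their categories are taken from the table.

```typescript
// Hypothetical names; only the action_type values and categories come from the table.
type ActionType =
  | "tap"
  | "tap_element"
  | "type_text"
  | "press_back"
  | "start_app"
  | "restart_app"
  | "scroll_to_element";

type SemanticCategory = "Selection" | "Input" | "Navigation" | "System Action";

const ACTION_SEMANTICS: Record<ActionType, SemanticCategory> = {
  tap: "Selection",
  tap_element: "Selection",
  type_text: "Input",
  press_back: "Navigation",
  start_app: "System Action",
  restart_app: "System Action",
  scroll_to_element: "Navigation",
};

// Returns undefined for action types the table does not cover.
function semanticCategoryFor(actionType: string): SemanticCategory | undefined {
  return Object.prototype.hasOwnProperty.call(ACTION_SEMANTICS, actionType)
    ? ACTION_SEMANTICS[actionType as ActionType]
    : undefined;
}
```

Typing the table as a `Record<ActionType, SemanticCategory>` means the compiler flags any `action_type` that is added to the union without a category.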
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## 6. Lifecycle State Derivation
|
|
91
|
+
|
|
92
|
+
Lifecycle states are NOT explicitly emitted. They are inferred as follows:
|
|
93
|
+
|
|
94
|
+
### 6.1 Resolved
|
|
95
|
+
Inferred when:
|
|
96
|
+
- src/server accepts request
|
|
97
|
+
- action is validated and normalized
|
|
98
|
+
- action_id is assigned (or equivalent identifier exists)
|
|
99
|
+
|
|
100
|
+
---
|
|
101
|
+
|
|
102
|
+
### 6.2 Dispatched
|
|
103
|
+
Inferred when:
|
|
104
|
+
- control passes from src/server to src/interact
|
|
105
|
+
- execution call is issued to platform layer
|
|
106
|
+
|
|
107
|
+
---
|
|
108
|
+
|
|
109
|
+
### 6.3 Pending Verification
|
|
110
|
+
Inferred when:
|
|
111
|
+
- platform execution returns a result
|
|
112
|
+
- before any UI/state evaluation occurs
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
### 6.4 Verified / Failed
|
|
117
|
+
Inferred when:
|
|
118
|
+
- post-execution evaluation is performed (if available)
|
|
119
|
+
|
|
120
|
+
Rules:
|
|
121
|
+
- Verified = expected outcome observed in UI/state/log signals
|
|
122
|
+
- Failed = timeout, error, or mismatch in expected outcome
|
|
123
|
+
|
|
124
|
+
Where no formal verification exists, outcome is derived from best available signals (logs, UI diff, or absence of error).
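The derivation rules in this section could be sketched as a single function over observed signals. This is an illustration under stated assumptions: the `ObservedSignals` shape and the `inferLifecycleState` helper are hypothetical and do not exist in the package, and the best-effort case (no formal verifier) is modelled simply by leaving the state at `pending_verification`.

```typescript
// Hypothetical types; the RFC does not define a signal shape.
type LifecycleState =
  | "resolved"
  | "dispatched"
  | "pending_verification"
  | "verified"
  | "failed";

interface ObservedSignals {
  validated: boolean;            // src/server accepted and normalized the request
  dispatchedToPlatform: boolean; // src/interact issued the platform call
  platformReturned: boolean;     // the platform call returned a result
  errorOrTimeout: boolean;       // an error or timeout was observed at any point
  // undefined until a post-execution evaluation has run
  expectedOutcomeObserved?: boolean;
}

// Infers the furthest lifecycle state the signals support.
function inferLifecycleState(s: ObservedSignals): LifecycleState | undefined {
  if (s.errorOrTimeout) return "failed";
  if (!s.validated) return undefined; // nothing to infer yet
  if (!s.dispatchedToPlatform) return "resolved";
  if (!s.platformReturned) return "dispatched";
  if (s.expectedOutcomeObserved === undefined) return "pending_verification";
  return s.expectedOutcomeObserved ? "verified" : "failed";
}
```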
|
|
125
|
+
|
|
126
|
+
---
|
|
127
|
+
|
|
128
|
+
## 7. Instrumentation Reality
|
|
129
|
+
|
|
130
|
+
There is no central instrumentation layer in the current system.
|
|
131
|
+
|
|
132
|
+
Instead:
|
|
133
|
+
- src/server emits partial logs during routing and validation
|
|
134
|
+
- src/interact emits execution logs and platform responses
|
|
135
|
+
- platform adapters may emit additional debugging information
|
|
136
|
+
- action envelopes now carry lightweight lifecycle metadata for post-dispatch state and source ownership
|
|
137
|
+
|
|
138
|
+
Lifecycle traceability is therefore assembled from distributed signals rather than a unified event system.
|
|
139
|
+
|
|
140
|
+
---
|
|
141
|
+
|
|
142
|
+
## 8. Module Responsibilities (Observed Behaviour)
|
|
143
|
+
|
|
144
|
+
### src/server
|
|
145
|
+
- receives action requests
|
|
146
|
+
- performs validation and normalization
|
|
147
|
+
- assigns identifiers where applicable
|
|
148
|
+
- routes actions to src/interact
|
|
149
|
+
- emits partial logs for request lifecycle
|
|
150
|
+
|
|
151
|
+
---
|
|
152
|
+
|
|
153
|
+
### src/interact
|
|
154
|
+
- executes platform-specific actions
|
|
155
|
+
- handles retries and fallback behaviours
|
|
156
|
+
- emits execution logs
|
|
157
|
+
- returns execution results
|
|
158
|
+
- may perform lightweight post-processing
|
|
159
|
+
|
|
160
|
+
---
|
|
161
|
+
|
|
162
|
+
## 9. Verification Reality
|
|
163
|
+
|
|
164
|
+
Verification is not a uniform system-wide layer.
|
|
165
|
+
|
|
166
|
+
It may occur via:
|
|
167
|
+
- UI state comparison (where available)
|
|
168
|
+
- log-based confirmation
|
|
169
|
+
- absence of error signals
|
|
170
|
+
- platform feedback
|
|
171
|
+
|
|
172
|
+
Verification outcomes are deterministic where reliable state signals or explicit evaluation paths are available, and best-effort where no formal verifier exists.
|
|
173
|
+
|
|
174
|
+
---
|
|
175
|
+
|
|
176
|
+
## 10. Legacy and Special Actions
|
|
177
|
+
|
|
178
|
+
Actions such as:
|
|
179
|
+
- scroll_to_element
|
|
180
|
+
- start_app
|
|
181
|
+
- restart_app
|
|
182
|
+
- press_back
|
|
183
|
+
|
|
184
|
+
are fully supported in the runtime.
|
|
185
|
+
|
|
186
|
+
These actions:
|
|
187
|
+
- may bypass full lifecycle observability
|
|
188
|
+
- may not have explicit verification paths
|
|
189
|
+
- are interpreted using best-effort semantic mapping
|
|
190
|
+
|
|
191
|
+
---
|
|
192
|
+
|
|
193
|
+
## 11. Observability Model
|
|
194
|
+
|
|
195
|
+
Observability is currently distributed across:
|
|
196
|
+
- src/server logs
|
|
197
|
+
- src/interact logs
|
|
198
|
+
- platform debug output
|
|
199
|
+
- action envelope metadata
|
|
200
|
+
|
|
201
|
+
There is no unified event schema.
|
|
202
|
+
|
|
203
|
+
Lifecycle reconstruction requires correlation of:
|
|
204
|
+
- action_type
|
|
205
|
+
- timestamps
|
|
206
|
+
- execution boundaries
|
|
207
|
+
- error signals
|
|
208
|
+
|
|
209
|
+
---
|
|
210
|
+
|
|
211
|
+
## 12. Relationship to RFC 005
|
|
212
|
+
|
|
213
|
+
RFC 005 defines the ideal execution lifecycle semantics.
|
|
214
|
+
|
|
215
|
+
RFC 006 defines how those semantics are interpreted from the existing runtime system.
|
|
216
|
+
|
|
217
|
+
Together:
|
|
218
|
+
- RFC 005 = conceptual correctness model
|
|
219
|
+
- RFC 006 = runtime behavioural mapping layer
|
|
220
|
+
|
|
221
|
+
---
|
|
222
|
+
|
|
223
|
+
## 13. Summary
|
|
224
|
+
|
|
225
|
+
This RFC ensures:
|
|
226
|
+
- lifecycle semantics can be derived from current runtime behaviour
|
|
227
|
+
- the existing `action_type` contract is preserved as the source of truth
|
|
228
|
+
- no new instrumentation infrastructure is assumed
|
|
229
|
+
- real module responsibilities are accurately represented
|
|
230
|
+
- observability is understood as distributed rather than centralised
|
|
@@ -0,0 +1,277 @@
|
|
|
1
|
+
# RFC 007 — Actionability Resolution and Executable Target Selection
|
|
2
|
+
|
|
3
|
+
## 1. Summary
|
|
4
|
+
|
|
5
|
+
This RFC defines how the system resolves which discovered UI element should receive an action before dispatch.
|
|
6
|
+
|
|
7
|
+
It addresses ambiguity between:
|
|
8
|
+
- visible elements vs actionable elements
|
|
9
|
+
- leaf nodes vs clickable containers
|
|
10
|
+
- semantic targets vs coordinate fallbacks
|
|
11
|
+
- multiple candidate targets with uncertain executability
|
|
12
|
+
|
|
13
|
+
Goal:
|
|
14
|
+
Improve first-attempt action correctness by resolving the best executable target prior to action dispatch.
|
|
15
|
+
|
|
16
|
+
This RFC defines the `Resolved` stage semantics referenced in RFC 005 and operationalized by RFC 006.
|
|
17
|
+
It is grounded in the existing element-resolution flow and extends current resolution behavior rather than assuming a wholly new resolver architecture.
|
|
18
|
+
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## 2. Problem Statement
|
|
22
|
+
|
|
23
|
+
Current interaction failures often arise before execution.
|
|
24
|
+
|
|
25
|
+
The agent may discover the intended UI concept, but not the correct executable target.
|
|
26
|
+
|
|
27
|
+
Examples:
|
|
28
|
+
- tapping label text instead of clickable container
|
|
29
|
+
- sliders not surfacing semantic handles
|
|
30
|
+
- generic Compose containers hiding true affordances
|
|
31
|
+
- multiple matching targets without ranking logic
|
|
32
|
+
|
|
33
|
+
Observed failure modes:
|
|
34
|
+
- false taps
|
|
35
|
+
- submit ambiguity
|
|
36
|
+
- coordinate guessing
|
|
37
|
+
- retry loops
|
|
38
|
+
- brittle fallback behavior
|
|
39
|
+
|
|
40
|
+
This is a target-resolution problem, not an execution problem.
|
|
41
|
+
|
|
42
|
+
---
|
|
43
|
+
|
|
44
|
+
## 3. Design Goals
|
|
45
|
+
|
|
46
|
+
Resolution MUST:
|
|
47
|
+
- Prefer executable targets over merely visible matches
|
|
48
|
+
- Reduce ambiguous target selection
|
|
49
|
+
- Support confidence-based ranking
|
|
50
|
+
- Build on existing runtime resolution surfaces before introducing new resolution metadata
|
|
51
|
+
- Use structural and semantic resolution signals
|
|
52
|
+
- Minimize coordinate fallback usage
|
|
53
|
+
- Integrate with verification expectations from RFC 005
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## 4. Actionability Model
|
|
58
|
+
|
|
59
|
+
Candidate targets are evaluated using actionability signals.
|
|
60
|
+
|
|
61
|
+
### Structural signals
|
|
62
|
+
- clickable
|
|
63
|
+
- enabled
|
|
64
|
+
- focusable
|
|
65
|
+
- bounds
|
|
66
|
+
- parent action ownership
|
|
67
|
+
|
|
68
|
+
### Semantic signals
|
|
69
|
+
- control role
|
|
70
|
+
- label association
|
|
71
|
+
- affordance hints
|
|
72
|
+
- selectable or adjustable semantics
|
|
73
|
+
|
|
74
|
+
### Interaction signals
|
|
75
|
+
- reliable target patterns
|
|
76
|
+
- control-specific heuristics
|
|
77
|
+
- gesture compatibility
|
|
78
|
+
|
|
79
|
+
---
|
|
80
|
+
|
|
81
|
+
### 4.1 Current Runtime Resolution Surfaces
|
|
82
|
+
|
|
83
|
+
This RFC builds on current runtime resolution paths, including:
|
|
84
|
+
- `findElementHandler` for candidate discovery
|
|
85
|
+
- `_resolveActionableAncestor` for executable ancestor promotion
|
|
86
|
+
- `tapElementHandler` for resolved element dispatch
|
|
87
|
+
- `scrollToElementHandler` for scroll-mediated target acquisition
|
|
88
|
+
|
|
89
|
+
These existing handlers are the current implementation substrate for the Resolved stage.
|
|
90
|
+
This RFC extends and systematizes those behaviors; it does not assume replacement of those paths.
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## 5. Target Candidate Ranking
|
|
95
|
+
|
|
96
|
+
When multiple targets match, candidates are ranked.
|
|
97
|
+
|
|
98
|
+
Illustrative confidence model:
|
|
99
|
+
|
|
100
|
+
resolution_confidence =
  interactability_score
  + semantic_match_score
  + structural_reliability_score
|
|
104
|
+
|
|
105
|
+
Highest-confidence executable target is preferred.
|
|
106
|
+
|
|
107
|
+
The confidence model is illustrative; it is normative only at the rule-precedence level. Implementations may use simpler heuristics as long as they preserve the resolution ordering guarantees. Any scoring mechanism is implementation-defined and need not be surfaced externally.
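As a sketch of the illustrative confidence model above, candidates could be ranked by the plain sum of the three scores. The `Candidate` shape, the unweighted sum, and the function names are assumptions for illustration; the RFC leaves the actual scoring implementation-defined.

```typescript
// Hypothetical candidate shape; score fields mirror the three terms of the model.
interface Candidate {
  id: string;
  interactabilityScore: number;
  semanticMatchScore: number;
  structuralReliabilityScore: number;
}

// Unweighted sum of the three scores (an assumption; weights could differ).
function resolutionConfidence(c: Candidate): number {
  return (
    c.interactabilityScore +
    c.semanticMatchScore +
    c.structuralReliabilityScore
  );
}

// Returns a new array, highest confidence first. Array.prototype.sort is
// stable, so candidates with equal confidence keep their input order.
function rankCandidates(candidates: readonly Candidate[]): Candidate[] {
  return [...candidates].sort(
    (a, b) => resolutionConfidence(b) - resolutionConfidence(a),
  );
}
```

The first element of the ranked array is the preferred target; the remaining elements can be kept as alternates for fallback reasoning.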
|
|
108
|
+
|
|
109
|
+
---
|
|
110
|
+
|
|
111
|
+
## 6. Resolution Rules
|
|
112
|
+
|
|
113
|
+
### Rule A — Prefer actionable containers over passive leaf nodes
|
|
114
|
+
|
|
115
|
+
Prefer:
|
|
116
|
+
- clickable container
|
|
117
|
+
|
|
118
|
+
Over:
|
|
119
|
+
- passive child text nodes
|
|
120
|
+
|
|
121
|
+
Example:
|
|
122
|
+
Prefer button container over "Generate Session" label node.
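Rule A could be sketched as a walk up the element tree from the matched node. This is a hedged illustration: the `UiNode` shape, the `promoteToActionableAncestor` name, and the depth limit are hypothetical. Section 4.1 names `_resolveActionableAncestor` as the existing promotion path, but its signature is not shown here, so this sketch does not claim to mirror it.

```typescript
// Hypothetical node shape with only the fields this sketch needs.
interface UiNode {
  id: string;
  clickable: boolean;
  enabled: boolean;
  parent?: UiNode;
}

// Checks the matched node and then up to maxDepth ancestors, returning the
// nearest node that is both clickable and enabled. Falls back to the matched
// node when nothing on that path qualifies.
function promoteToActionableAncestor(matched: UiNode, maxDepth = 5): UiNode {
  let current: UiNode | undefined = matched;
  for (let depth = 0; current !== undefined && depth <= maxDepth; depth++) {
    if (current.clickable && current.enabled) return current;
    current = current.parent;
  }
  return matched;
}
```

For the example in this rule, a passive "Generate Session" label whose parent is a clickable button container would resolve to the container.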
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
126
|
+
### Rule B — Prefer semantic controls over coordinate fallbacks
|
|
127
|
+
|
|
128
|
+
Use semantic control targets whenever possible.
|
|
129
|
+
|
|
130
|
+
Use coordinate fallback only when:
|
|
131
|
+
- no semantic target exists
|
|
132
|
+
- adjustable control semantics are absent
|
|
133
|
+
- fallback confidence is acceptable
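The conditions of Rule B could be expressed as a guard evaluated before any coordinate tap. This sketch reads the three conditions as jointly required, which is an assumption; the `FallbackContext` shape, the 0..1 confidence scale, and the `0.5` threshold are all hypothetical.

```typescript
// Hypothetical context describing what resolution found for a target.
interface FallbackContext {
  semanticTargetExists: boolean;
  adjustableSemanticsPresent: boolean;
  fallbackConfidence: number; // assumed to be on a 0..1 scale
}

// Hypothetical threshold for "acceptable" fallback confidence.
const MIN_FALLBACK_CONFIDENCE = 0.5;

// True only when all three Rule B conditions hold together.
function mayUseCoordinateFallback(
  ctx: FallbackContext,
  minConfidence: number = MIN_FALLBACK_CONFIDENCE,
): boolean {
  return (
    !ctx.semanticTargetExists &&
    !ctx.adjustableSemanticsPresent &&
    ctx.fallbackConfidence >= minConfidence
  );
}
```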
|
|
134
|
+
|
|
135
|
+
---
|
|
136
|
+
|
|
137
|
+
### Rule C — Prefer explicit affordance ownership
|
|
138
|
+
|
|
139
|
+
If the child and parent nodes differ, prefer the node that owns the action handler.
|
|
141
|
+
|
|
142
|
+
---
|
|
143
|
+
|
|
144
|
+
## 7. Ambiguity Handling
|
|
145
|
+
|
|
146
|
+
When multiple plausible targets remain:
|
|
147
|
+
|
|
148
|
+
System SHOULD:
|
|
149
|
+
- rank candidates
|
|
150
|
+
- expose confidence
|
|
151
|
+
- preserve alternates for fallback reasoning
|
|
152
|
+
|
|
153
|
+
Low-confidence targets may trigger:
|
|
154
|
+
- guarded execution
|
|
155
|
+
- alternate resolution attempt
|
|
156
|
+
- explicit recovery path
|
|
157
|
+
|
|
158
|
+
---
|
|
159
|
+
|
|
160
|
+
## 8. Adjustable Control Resolution
|
|
161
|
+
|
|
162
|
+
Special handling for:
|
|
163
|
+
- sliders
|
|
164
|
+
- steppers
|
|
165
|
+
- drag controls
|
|
166
|
+
|
|
167
|
+
Support:
|
|
168
|
+
- adjustable-role recognition
|
|
169
|
+
- control-bound discovery
|
|
170
|
+
- value-aware interaction targeting
|
|
171
|
+
|
|
172
|
+
This RFC defines target resolution.
|
|
173
|
+
Value-setting behavior remains governed by Adjustable Control Support.
|
|
174
|
+
|
|
175
|
+
---
|
|
176
|
+
|
|
177
|
+
## 9. Compose / Custom Control Resolution
|
|
178
|
+
|
|
179
|
+
Support derived actionability for:
|
|
180
|
+
- merged Compose semantics
|
|
181
|
+
- composite controls
|
|
182
|
+
- inferred interaction contracts
|
|
183
|
+
|
|
184
|
+
This RFC depends on and strengthens Better Compose / Custom Control Semantics.
|
|
185
|
+
|
|
186
|
+
---
|
|
187
|
+
|
|
188
|
+
## 10. Resolution Output Model (Current + Future Extension)
|
|
189
|
+
|
|
190
|
+
This model is non-normative and represents a progressive enrichment direction rather than a required runtime contract.
|
|
191
|
+
|
|
192
|
+
Resolution may evolve toward the following enriched output shape. Current runtime implementations may expose only resolved-target output plus limited supporting metadata.
|
|
193
|
+
|
|
194
|
+
At minimum, current implementations are expected to produce a resolved target. Confidence, alternates, fallback metadata, and reason codes may be introduced incrementally.
|
|
195
|
+
|
|
196
|
+
Illustrative future-complete shape:
|
|
197
|
+
|
|
198
|
+
{
  "resolved_target": "...",
  "confidence": 0.92,
  "fallback_available": true,
  "resolution_reason": "clickable_parent_preferred"
}
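Because only the resolved target is expected today and the other fields may arrive incrementally, a consumer could model the shape with optional fields. This is a sketch under assumptions: the `ResolutionOutput` interface and `describeResolution` helper are hypothetical, and typing `resolved_target` as a string is a guess, since the example elides its value.

```typescript
// Hypothetical consumer-side type. Only resolved_target is required, matching
// the "at minimum" expectation; the string type is an assumption.
interface ResolutionOutput {
  resolved_target: string;
  confidence?: number;
  fallback_available?: boolean;
  resolution_reason?: string;
}

// Builds a log-friendly summary that tolerates missing optional fields.
function describeResolution(out: ResolutionOutput): string {
  const parts: string[] = [`target=${out.resolved_target}`];
  if (out.confidence !== undefined) {
    parts.push(`confidence=${out.confidence.toFixed(2)}`);
  }
  if (out.fallback_available !== undefined) {
    parts.push(`fallback=${out.fallback_available}`);
  }
  if (out.resolution_reason !== undefined) {
    parts.push(`reason=${out.resolution_reason}`);
  }
  return parts.join(" ");
}
```

Keeping the enriched fields optional lets the same consumer code work against a runtime that exposes only the resolved target and against one that later adds confidence or reason codes.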
|
|
204
|
+
|
|
205
|
+
---
|
|
206
|
+
|
|
207
|
+
## 11. Verification Integration
|
|
208
|
+
|
|
209
|
+
Resolution is incomplete without verification expectations.
|
|
210
|
+
|
|
211
|
+
Resolved output should be derived directly from the existing element-resolution flow before adding richer metadata layers.
|
|
212
|
+
|
|
213
|
+
The resolved target should carry the expected post-action signal.
|
|
214
|
+
|
|
215
|
+
Examples:
|
|
216
|
+
- navigation transition expected
|
|
217
|
+
- menu expected
|
|
218
|
+
- control value change expected
|
|
219
|
+
|
|
220
|
+
This feeds RFC 005 verification.
|
|
221
|
+
|
|
222
|
+
---
|
|
223
|
+
|
|
224
|
+
## 12. Success Metrics
|
|
225
|
+
|
|
226
|
+
Track:
|
|
227
|
+
- reduced false-tap failures
|
|
228
|
+
- lower retarget retries
|
|
229
|
+
- higher first-attempt action success
|
|
230
|
+
- reduced coordinate fallback usage
|
|
231
|
+
- improved custom control interaction success
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## 13. Dependencies
|
|
236
|
+
|
|
237
|
+
Depends on:
|
|
238
|
+
- Stronger State Verification
|
|
239
|
+
- Richer Element Identity
|
|
240
|
+
- Wait and Synchronization Reliability
|
|
241
|
+
|
|
242
|
+
Strengthens:
|
|
243
|
+
- Adjustable Control Support
|
|
244
|
+
- Better Compose / Custom Control Semantics
|
|
245
|
+
|
|
246
|
+
---
|
|
247
|
+
|
|
248
|
+
## 14. Relationship to Other RFCs
|
|
249
|
+
|
|
250
|
+
RFC 005
|
|
251
|
+
Defines what Resolved means in lifecycle semantics.
|
|
252
|
+
|
|
253
|
+
RFC 006
|
|
254
|
+
Defines how runtime interprets action execution.
|
|
255
|
+
|
|
256
|
+
RFC 007
|
|
257
|
+
Defines how a target becomes Resolved.
|
|
258
|
+
Specifically, it formalizes the current discovery → actionable ancestor resolution → dispatch preparation flow already present in runtime handlers.
|
|
259
|
+
|
|
260
|
+
Together:
|
|
261
|
+
- RFC 005 — action correctness
|
|
262
|
+
- RFC 006 — runtime execution binding
|
|
263
|
+
- RFC 007 — executable target resolution
|
|
264
|
+
|
|
265
|
+
---
|
|
266
|
+
|
|
267
|
+
## 15. Summary
|
|
268
|
+
|
|
269
|
+
This RFC reduces failures caused by acting on the wrong thing, even when the right thing was discovered.
|
|
270
|
+
|
|
271
|
+
It improves:
|
|
272
|
+
- action precision
|
|
273
|
+
- control reliability
|
|
274
|
+
- Compose interaction robustness
|
|
275
|
+
- agent success with fewer retries
|
|
276
|
+
|
|
277
|
+
It addresses one of the largest remaining sources of interaction brittleness.
|
|
@@ -69,6 +69,8 @@ MUST be returned in this structure:
|
|
|
69
69
|
action_id: string,
|
|
70
70
|
timestamp: string,
|
|
71
71
|
action_type: string,
|
|
72
|
+
lifecycle_state?: 'pending_verification' | 'failed',
|
|
73
|
+
source_module?: 'server' | 'interact',
|
|
72
74
|
target: {
|
|
73
75
|
selector: object,
|
|
74
76
|
resolved: object | null
|
|
@@ -87,6 +89,8 @@ Rules:
|
|
|
87
89
|
|
|
88
90
|
- `success` is at the top level, not nested
|
|
89
91
|
- `target` contains only selection and resolution context
|
|
92
|
+
- `lifecycle_state` reflects the post-dispatch runtime state
|
|
93
|
+
- `source_module` identifies where the envelope was produced
|
|
90
94
|
- fingerprints represent observed pre/post UI state on a best-effort basis
|
|
91
95
|
- `failure_code` is optional but MUST be used when a structured mapping exists
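A client could read the two new optional fields as follows. This is a partial sketch: the `ActionEnvelope` interface covers only the fields visible in this hunk plus the top-level `success` flag (typed as a boolean here by assumption), and the `needsVerification` helper is hypothetical.

```typescript
// Partial, hypothetical client-side view of the envelope. Fingerprints and
// failure_code are omitted for brevity.
interface ActionEnvelope {
  action_id: string;
  timestamp: string;
  action_type: string;
  lifecycle_state?: "pending_verification" | "failed";
  source_module?: "server" | "interact";
  target: { selector: object; resolved: object | null };
  success: boolean;
}

// True when the action was dispatched successfully but its outcome has not
// been verified yet, so the caller may want to follow up with a state check.
function needsVerification(envelope: ActionEnvelope): boolean {
  return envelope.success && envelope.lifecycle_state === "pending_verification";
}
```

Because both fields are optional, envelopes from callers that predate them still satisfy the type, and `needsVerification` returns `false` for them.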
|
|
92
96
|
|
package/docs/tools/interact.md
CHANGED
|
@@ -36,6 +36,8 @@ Example response:
|
|
|
36
36
|
"action_id": "tap_1710000000000_1",
|
|
37
37
|
"timestamp": "2026-04-23T08:00:00.000Z",
|
|
38
38
|
"action_type": "tap",
|
|
39
|
+
"lifecycle_state": "pending_verification",
|
|
40
|
+
"source_module": "server",
|
|
39
41
|
"target": { "selector": { "x": 100, "y": 200 }, "resolved": null },
|
|
40
42
|
"success": true,
|
|
41
43
|
"ui_fingerprint_before": "fp_before",
|
|
@@ -197,7 +199,14 @@ Output:
|
|
|
197
199
|
"telemetry": { "matchedIndex": 3, "matchedInteractable": true }
|
|
198
200
|
},
|
|
199
201
|
"score": 1.0,
|
|
200
|
-
"confidence": 1.0
|
|
202
|
+
"confidence": 1.0,
|
|
203
|
+
"resolution": {
|
|
204
|
+
"confidence": 1.0,
|
|
205
|
+
"reason": "exact_text_match",
|
|
206
|
+
"fallback_available": false,
|
|
207
|
+
"matched_count": 1,
|
|
208
|
+
"alternates": []
|
|
209
|
+
}
|
|
201
210
|
}
|
|
202
211
|
```
|
|
203
212
|
|
|
@@ -205,6 +214,7 @@ Notes:
|
|
|
205
214
|
|
|
206
215
|
- Best used when no precise selector is available yet.
|
|
207
216
|
- `tapCoordinates` are suitable for `tap` calls.
|
|
217
|
+
- `resolution` explains why the element was selected and may include fallback alternates when the runtime had to promote a parent or nearby control.
|
|
208
218
|
- Prefer `wait_for_ui` when you already know a deterministic selector and want a stable `elementId`.
|
|
209
219
|
|
|
210
220
|
---
|
|
@@ -333,6 +343,8 @@ Success response:
|
|
|
333
343
|
"action_id": "tap_element_1710000000000_1",
|
|
334
344
|
"timestamp": "2026-04-23T08:00:00.000Z",
|
|
335
345
|
"action_type": "tap_element",
|
|
346
|
+
"lifecycle_state": "pending_verification",
|
|
347
|
+
"source_module": "interact",
|
|
336
348
|
"target": {
|
|
337
349
|
"selector": { "elementId": "el_123" },
|
|
338
350
|
"resolved": {
|