mobile-debug-mcp 0.26.1 → 0.26.2
- package/dist/interact/index.js +26 -33
- package/dist/server/common.js +14 -1
- package/dist/server/tool-definitions.js +19 -3
- package/dist/server/tool-handlers.js +7 -0
- package/dist/server-core.js +1 -1
- package/docs/CHANGELOG.md +3 -0
- package/docs/ROADMAP.md +239 -74
- package/docs/rfcs/005-unified-action-execution-and-verification-model.md +216 -0
- package/docs/rfcs/006-runtime-action-instrumentation-and-binding-layer.md +230 -0
- package/docs/specs/mcp-tooling-spec-v1.md +4 -0
- package/docs/tools/interact.md +4 -0
- package/package.json +1 -1
- package/src/interact/index.ts +27 -35
- package/src/server/common.ts +22 -1
- package/src/server/tool-definitions.ts +19 -3
- package/src/server/tool-handlers.ts +7 -0
- package/src/server-core.ts +1 -1
- package/src/types.ts +2 -0
- package/test/unit/server/response_shapes.test.ts +8 -0
|
@@ -0,0 +1,230 @@
+# RFC 006 — Runtime Action Instrumentation & Binding Layer
+
+## 1. Summary
+
+This RFC defines how the execution model in RFC 005 is mapped onto the current runtime behaviour of the system.
+
+It does not assume a new instrumentation system exists. Instead, it describes how lifecycle semantics are derived from existing execution flows, logs, module behaviour, and lightweight runtime metadata attached to action envelopes.
+
+It specifies:
+- how existing `action_type` values are interpreted under RFC 005 semantics
+- how lifecycle states are inferred from current runtime execution
+- how `src/server` and `src/interact` currently participate in execution
+- how legacy and platform actions are incorporated into the model
+
+This RFC is a runtime binding and normalisation layer over existing implementation behaviour.
+
+---
+
+## 2. Problem Statement
+
+RFC 005 defines a unified execution lifecycle:
+- Resolved
+- Dispatched
+- Pending Verification
+- Verified
+- Failed
+
+However, the current system already contains:
+- a concrete `action_type` execution model
+- execution logic split across `src/server` and `src/interact`
+- platform-specific actions (tap_element, type_text, press_back, start_app, restart_app, scroll_to_element)
+- distributed logging and partial instrumentation within modules
+
+There is no central instrumentation system and no explicit lifecycle emitter.
+Instead, lifecycle meaning is inferred from runtime behaviour and the `lifecycle_state` / `source_module` fields now attached to action envelopes.
+
+This results in:
+- implicit execution state transitions
+- distributed observability signals
+- non-uniform traceability across actions
+
+---
+
+## 3. Design Goals
+
+This layer MUST:
+
+- Map existing runtime behaviour to RFC 005 lifecycle semantics
+- Use existing `action_type` values as the authoritative execution taxonomy
+- Derive lifecycle states from observable runtime transitions
+- Reflect actual module responsibilities (not idealised separation)
+- Work with existing logging and execution hooks
+- Preserve compatibility with all current action implementations
+
+---
+
+## 4. Runtime Execution Flow (Observed)
+
+Current observed execution flow:
+
+UI Request
+→ src/server (routing + validation)
+→ src/interact (execution + platform dispatch)
+→ platform layer
+→ response handling + logs
+→ optional state verification (where available)
+
+Lifecycle states are derived from this flow rather than explicitly emitted.
+
+---
+
+## 5. Action Type Mapping (Current Runtime)
+
+This RFC maps existing `action_type` values to RFC 005 semantics.
+
+| action_type | RFC 005 Semantic Interpretation |
+|------------|---------------------------------|
+| tap | Selection |
+| tap_element | Selection |
+| type_text | Input |
+| press_back | Navigation |
+| start_app | System Action |
+| restart_app | System Action |
+| scroll_to_element | Navigation |
+
+This table reflects the current runtime contract.
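A minimal sketch of how the mapping table above could be held as a lookup in TypeScript. The names `SemanticCategory`, `ACTION_SEMANTICS`, and `interpretActionType` are hypothetical and do not appear in the package; only the seven `action_type` values and their interpretations come from the table. An `action_type` the table does not list (for example `swipe`) comes back as `undefined` rather than being guessed.

```typescript
// Hypothetical names for illustration; only the table rows come from RFC 006.
type SemanticCategory = 'Selection' | 'Input' | 'Navigation' | 'System Action';

const ACTION_SEMANTICS: Readonly<Record<string, SemanticCategory>> = {
  tap: 'Selection',
  tap_element: 'Selection',
  type_text: 'Input',
  press_back: 'Navigation',
  start_app: 'System Action',
  restart_app: 'System Action',
  scroll_to_element: 'Navigation',
};

// Returns the RFC 005 interpretation, or undefined for unlisted action types.
function interpretActionType(actionType: string): SemanticCategory | undefined {
  // Own-property check so inherited keys such as "toString" are not matched.
  return Object.prototype.hasOwnProperty.call(ACTION_SEMANTICS, actionType)
    ? ACTION_SEMANTICS[actionType]
    : undefined;
}
```

Returning `undefined` for unlisted values keeps the table as the only source of truth, which matches the RFC's stated goal of treating `action_type` as the authoritative taxonomy.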
+
+---
+
+## 6. Lifecycle State Derivation
+
+Lifecycle states are NOT explicitly emitted. They are inferred as follows:
+
+### 6.1 Resolved
+Inferred when:
+- src/server accepts request
+- action is validated and normalized
+- action_id is assigned (or equivalent identifier exists)
+
+---
+
+### 6.2 Dispatched
+Inferred when:
+- control passes from src/server to src/interact
+- execution call is issued to platform layer
+
+---
+
+### 6.3 Pending Verification
+Inferred when:
+- platform execution returns a result
+- before any UI/state evaluation occurs
+
+---
+
+### 6.4 Verified / Failed
+Inferred when:
+- post-execution evaluation is performed (if available)
+
+Rules:
+- Verified = expected outcome observed in UI/state/log signals
+- Failed = timeout, error, or mismatch in expected outcome
+
+Where no formal verification exists, outcome is derived from best available signals (logs, UI diff, or absence of error).
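The inference rules in sections 6.1 to 6.4 can be read as a small decision function. The sketch below is one possible reading, not code from the package: `ObservedSignals` and `inferLifecycleState` are hypothetical names, and the boolean signals are assumed stand-ins for whatever logs, return values, or UI diffs a caller actually has.

```typescript
type LifecycleState =
  | 'resolved'
  | 'dispatched'
  | 'pending_verification'
  | 'verified'
  | 'failed';

// Hypothetical signal bundle; each flag mirrors one "Inferred when" bullet.
interface ObservedSignals {
  acceptedByServer: boolean;   // 6.1: request accepted, validated, identified
  handedToInteract: boolean;   // 6.2: control passed to src/interact
  platformReturned: boolean;   // 6.3: platform execution returned a result
  platformError: boolean;      // 6.4: an error signal was observed
  verification?: 'match' | 'mismatch' | 'timeout'; // 6.4: optional evaluation
}

function inferLifecycleState(s: ObservedSignals): LifecycleState | undefined {
  if (!s.acceptedByServer) return undefined; // nothing to infer yet
  // Failed = timeout, error, or mismatch in expected outcome.
  if (s.platformError || s.verification === 'mismatch' || s.verification === 'timeout') {
    return 'failed';
  }
  if (s.verification === 'match') return 'verified';
  if (s.platformReturned) return 'pending_verification';
  if (s.handedToInteract) return 'dispatched';
  return 'resolved';
}
```

Checking the failure signals first means an error observed at any stage wins over a later, more optimistic signal, which is the conservative choice when verification is best-effort.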
+
+---
+
+## 7. Instrumentation Reality
+
+There is no central instrumentation layer in the current system.
+
+Instead:
+- src/server emits partial logs during routing and validation
+- src/interact emits execution logs and platform responses
+- platform adapters may emit additional debugging information
+- action envelopes now carry lightweight lifecycle metadata for post-dispatch state and source ownership
+
+Lifecycle traceability is therefore assembled from distributed signals rather than a unified event system.
+
+---
+
+## 8. Module Responsibilities (Observed Behaviour)
+
+### src/server
+- receives action requests
+- performs validation and normalization
+- assigns identifiers where applicable
+- routes actions to src/interact
+- emits partial logs for request lifecycle
+
+---
+
+### src/interact
+- executes platform-specific actions
+- handles retries and fallback behaviours
+- emits execution logs
+- returns execution results
+- may perform lightweight post-processing
+
+---
+
+## 9. Verification Reality
+
+Verification is not a uniform system-wide layer.
+
+It may occur via:
+- UI state comparison (where available)
+- log-based confirmation
+- absence of error signals
+- platform feedback
+
+Verification outcomes are best-effort only where no formal verifier exists, and deterministic where reliable state signals or explicit evaluation paths are available.
+
+---
+
+## 10. Legacy and Special Actions
+
+Actions such as:
+- scroll_to_element
+- start_app
+- restart_app
+- press_back
+
+are fully supported in the runtime.
+
+These actions:
+- may bypass full lifecycle observability
+- may not have explicit verification paths
+- are interpreted using best-effort semantic mapping
+
+---
+
+## 11. Observability Model
+
+Observability is currently distributed across:
+- src/server logs
+- src/interact logs
+- platform debug output
+- action envelope metadata
+
+There is no unified event schema.
+
+Lifecycle reconstruction requires correlation of:
+- action_type
+- timestamps
+- execution boundaries
+- error signals
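To make the correlation step concrete, here is a hedged sketch that gathers records from the four sources listed above into a single ordered timeline for one action. `LogRecord`, `ReconstructedTimeline`, and `reconstructTimeline` are hypothetical names, and the record shape is an assumption: the RFC states there is no unified event schema, so a real implementation would first have to normalise each source into something like this.

```typescript
// Assumed normalised record; the package does not define this shape.
interface LogRecord {
  source: 'server' | 'interact' | 'platform' | 'envelope';
  actionType: string;
  timestamp: string; // ISO 8601
  isError: boolean;
}

interface ReconstructedTimeline {
  records: LogRecord[]; // matching records, oldest first
  sawError: boolean;    // true if any matching record carried an error signal
}

// Correlates by action_type, then by an execution boundary [startIso, endIso].
function reconstructTimeline(
  records: LogRecord[],
  actionType: string,
  startIso: string,
  endIso: string,
): ReconstructedTimeline {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  const matching = records
    .filter((r) => r.actionType === actionType)
    .filter((r) => {
      const t = Date.parse(r.timestamp);
      return t >= start && t <= end;
    })
    // filter() already produced a copy, so sorting does not mutate the input.
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  return { records: matching, sawError: matching.some((r) => r.isError) };
}
```

Correlating on `action_type` plus a time window is deliberately loose; where an `action_id` is available on every record it would be a tighter key.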
+
+---
+
+## 12. Relationship to RFC 005
+
+RFC 005 defines the ideal execution lifecycle semantics.
+
+RFC 006 defines how those semantics are interpreted from the existing runtime system.
+
+Together:
+- RFC 005 = conceptual correctness model
+- RFC 006 = runtime behavioural mapping layer
+
+---
+
+## 13. Summary
+
+This RFC ensures:
+- lifecycle semantics can be derived from current runtime behaviour
+- existing action_type contract is preserved as source of truth
+- no assumption of new instrumentation infrastructure is required
+- real module responsibilities are accurately represented
+- observability is understood as distributed rather than centralised
|
|
@@ -69,6 +69,8 @@ MUST be returned in this structure:
 action_id: string,
 timestamp: string,
 action_type: string,
+lifecycle_state?: 'pending_verification' | 'failed',
+source_module?: 'server' | 'interact',
 target: {
   selector: object,
   resolved: object | null
@@ -87,6 +89,8 @@ Rules:
 
 - `success` is at the top level, not nested
 - `target` contains only selection and resolution context
+- `lifecycle_state` reflects the post-dispatch runtime state
+- `source_module` identifies where the envelope was produced
 - fingerprints represent observed pre/post UI state on a best-effort basis
 - `failure_code` is optional but MUST be used when a structured mapping exists
 
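Because `lifecycle_state` and `source_module` are both optional in the structure above, a consumer may want to accept envelopes that omit them while rejecting values outside the two documented unions. The helper below is a sketch of that check; `hasValidLifecycleMetadata` is a hypothetical name, not part of the package.

```typescript
// Hypothetical consumer-side guard; the allowed values come from the spec hunk.
function hasValidLifecycleMetadata(envelope: Record<string, unknown>): boolean {
  const lifecycleState = envelope['lifecycle_state'];
  const sourceModule = envelope['source_module'];

  // Both fields are optional, so undefined is accepted.
  const lifecycleOk =
    lifecycleState === undefined ||
    lifecycleState === 'pending_verification' ||
    lifecycleState === 'failed';
  const sourceOk =
    sourceModule === undefined ||
    sourceModule === 'server' ||
    sourceModule === 'interact';

  return lifecycleOk && sourceOk;
}
```

Note that the spec allows only `pending_verification` and `failed` here; the other RFC 005 states (Resolved, Dispatched, Verified) are not valid envelope values in this version.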
package/docs/tools/interact.md
CHANGED
|
@@ -36,6 +36,8 @@ Example response:
 "action_id": "tap_1710000000000_1",
 "timestamp": "2026-04-23T08:00:00.000Z",
 "action_type": "tap",
+"lifecycle_state": "pending_verification",
+"source_module": "server",
 "target": { "selector": { "x": 100, "y": 200 }, "resolved": null },
 "success": true,
 "ui_fingerprint_before": "fp_before",
@@ -333,6 +335,8 @@ Success response:
 "action_id": "tap_element_1710000000000_1",
 "timestamp": "2026-04-23T08:00:00.000Z",
 "action_type": "tap_element",
+"lifecycle_state": "pending_verification",
+"source_module": "interact",
 "target": {
   "selector": { "elementId": "el_123" },
   "resolved": {
package/package.json
CHANGED
package/src/interact/index.ts
CHANGED
|
@@ -6,7 +6,7 @@ export { AndroidInteract, iOSInteract };
 import { resolveTargetDevice } from '../utils/resolve-device.js'
 import { ToolsObserve } from '../observe/index.js'
 import { computeSnapshotSignature } from '../observe/snapshot-metadata.js'
-import {
+import { buildActionExecutionResult } from '../server/common.js'
 import type {
   ActionFailureCode,
   ActionTargetResolved,
@@ -291,27 +291,25 @@ export class ToolsInteract {
 }
 
 private static _actionFailure(
-  actionId: string,
-  timestamp: string,
   actionType: string,
   selector: Record<string, unknown> | null,
   resolved: ActionTargetResolved | null,
   failureCode: ActionFailureCode,
   retryable: boolean,
   uiFingerprintBefore: string | null,
-  uiFingerprintAfter?: string | null
+  uiFingerprintAfter?: string | null,
+  sourceModule: 'server' | 'interact' = 'interact'
 ): TapElementResponse {
-  return {
-
-
-
-    target: { selector, resolved },
+  return buildActionExecutionResult({
+    actionType,
+    selector,
+    resolved,
     success: false,
-
-
-
-
-  }
+    uiFingerprintBefore,
+    uiFingerprintAfter: uiFingerprintAfter ?? null,
+    failure: { failureCode, retryable },
+    sourceModule
+  })
 }
 
 static _resetResolvedUiElementsForTests() {
@@ -472,14 +470,11 @@ export class ToolsInteract {
 }
 
 static async tapElementHandler({ elementId }: { elementId: string }): Promise<TapElementResponse> {
-  const timestampMs = Date.now()
-  const timestamp = new Date(timestampMs).toISOString()
   const actionType = 'tap_element'
-  const actionId = nextActionId(actionType, timestampMs)
   const selector = { elementId }
   const resolved = ToolsInteract._resolvedUiElements.get(elementId)
   if (!resolved) {
-    return ToolsInteract._actionFailure(
+    return ToolsInteract._actionFailure(actionType, selector, null, 'STALE_REFERENCE', true, null)
   }
 
   const fingerprintBefore = await ToolsInteract._captureFingerprint(resolved.platform, resolved.deviceId)
@@ -491,22 +486,22 @@ export class ToolsInteract {
 const currentMatch = ToolsInteract._findCurrentResolvedElement(elements, treePlatform, treeDeviceId, resolved)
 
 if (!currentMatch) {
-  return ToolsInteract._actionFailure(
+  return ToolsInteract._actionFailure(actionType, selector, null, 'STALE_REFERENCE', true, fingerprintBefore)
 }
 
 const resolvedTarget = ToolsInteract._resolvedTargetFromElement(resolved.elementId, currentMatch.el, currentMatch.index)
 
 if (!ToolsInteract._isVisibleElement(currentMatch.el)) {
-  return ToolsInteract._actionFailure(
+  return ToolsInteract._actionFailure(actionType, selector, resolvedTarget, 'ELEMENT_NOT_INTERACTABLE', true, fingerprintBefore)
 }
 
 if (currentMatch.el.enabled === false) {
-  return ToolsInteract._actionFailure(
+  return ToolsInteract._actionFailure(actionType, selector, resolvedTarget, 'ELEMENT_NOT_INTERACTABLE', true, fingerprintBefore)
 }
 
 const bounds = ToolsInteract._normalizeBounds(currentMatch.el.bounds) ?? resolved.bounds
 if (!bounds || bounds[2] <= bounds[0] || bounds[3] <= bounds[1]) {
-  return ToolsInteract._actionFailure(
+  return ToolsInteract._actionFailure(actionType, selector, resolvedTarget, 'ELEMENT_NOT_INTERACTABLE', true, fingerprintBefore)
 }
 
 const x = Math.floor((bounds[0] + bounds[2]) / 2)
@@ -515,23 +510,20 @@ export class ToolsInteract {
 
 if (!tapResult.success) {
   const fingerprintAfterFailure = await ToolsInteract._captureFingerprint(resolved.platform, resolved.deviceId)
-  return ToolsInteract._actionFailure(
+  return ToolsInteract._actionFailure(actionType, selector, resolvedTarget, 'UNKNOWN', false, fingerprintBefore, fingerprintAfterFailure)
 }
 
 const fingerprintAfter = await ToolsInteract._captureFingerprint(resolved.platform, resolved.deviceId)
-return {
-
-
-
-
-  target: {
-    selector,
-    resolved: resolvedTarget
-  },
+return buildActionExecutionResult({
+  actionType,
+  device: tree?.device,
+  selector,
+  resolved: resolvedTarget,
   success: true,
-
-
-
+  uiFingerprintBefore: fingerprintBefore,
+  uiFingerprintAfter: fingerprintAfter,
+  sourceModule: 'interact'
+})
 }
 
 static async swipeHandler({ platform = 'android', x1, y1, x2, y2, duration, deviceId }: { platform?: 'android' | 'ios', x1: number, y1: number, x2: number, y2: number, duration: number, deviceId?: string }) {
package/src/server/common.ts
CHANGED
|
@@ -112,6 +112,23 @@ export function inferScrollFailure(message: string | undefined): { failureCode:
   return { failureCode: 'UNKNOWN', retryable: false }
 }
 
+const ACTION_LIFECYCLE_STATE_BY_OUTCOME = {
+  success: 'pending_verification',
+  failure: 'failed'
+} as const
+
+export function determineActionLifecycleState({
+  success,
+  failure
+}: {
+  success: boolean
+  failure?: { failureCode: ActionFailureCode; retryable: boolean }
+}): NonNullable<ActionExecutionResult['lifecycle_state']> {
+  if (failure) return ACTION_LIFECYCLE_STATE_BY_OUTCOME.failure
+  if (success) return ACTION_LIFECYCLE_STATE_BY_OUTCOME.success
+  return ACTION_LIFECYCLE_STATE_BY_OUTCOME.success
+}
+
 export function buildActionExecutionResult({
   actionType,
   device,
@@ -121,7 +138,8 @@ export function buildActionExecutionResult({
   uiFingerprintBefore,
   uiFingerprintAfter,
   failure,
-  details
+  details,
+  sourceModule
 }: {
   actionType: string
   device?: ActionExecutionResult['device']
@@ -132,6 +150,7 @@ export function buildActionExecutionResult({
   uiFingerprintAfter: string | null
   failure?: { failureCode: ActionFailureCode; retryable: boolean }
   details?: Record<string, unknown>
+  sourceModule: 'server' | 'interact'
 }): ActionExecutionResult {
   const timestampMs = Date.now()
   const timestamp = new Date(timestampMs).toISOString()
@@ -139,6 +158,8 @@ export function buildActionExecutionResult({
 action_id: nextActionId(actionType, timestampMs),
 timestamp,
 action_type: actionType,
+lifecycle_state: determineActionLifecycleState({ success, failure }),
+source_module: sourceModule,
 ...(device ? { device } : {}),
 target: {
   selector,
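The behaviour of `determineActionLifecycleState` is easiest to see by running it on the three input combinations it can receive. The block below is a standalone copy for illustration only: `ActionFailureCode` is narrowed to `string` and the return type is written out as a local `LifecycleState` alias, both assumptions made so the sketch compiles without the package's `types.ts`.

```typescript
// Local stand-ins (assumptions) for types the real module imports.
type LifecycleState = 'pending_verification' | 'failed';
interface Failure {
  failureCode: string; // the package uses ActionFailureCode here
  retryable: boolean;
}

const ACTION_LIFECYCLE_STATE_BY_OUTCOME = {
  success: 'pending_verification',
  failure: 'failed',
} as const;

// Same branch order as the hunk in common.ts.
function determineActionLifecycleState({
  success,
  failure,
}: {
  success: boolean;
  failure?: Failure;
}): LifecycleState {
  if (failure) return ACTION_LIFECYCLE_STATE_BY_OUTCOME.failure;
  if (success) return ACTION_LIFECYCLE_STATE_BY_OUTCOME.success;
  return ACTION_LIFECYCLE_STATE_BY_OUTCOME.success;
}

const dispatched = determineActionLifecycleState({ success: true });
const explicitFailure = determineActionLifecycleState({
  success: false,
  failure: { failureCode: 'UNKNOWN', retryable: false },
});
const unsuccessfulWithoutFailure = determineActionLifecycleState({ success: false });

console.log(dispatched, explicitFailure, unsuccessfulWithoutFailure);
```

As the hunk is written, `failed` is returned only when a `failure` object is supplied. A call with `success: false` and no `failure` falls through to `pending_verification`, so callers that want `failed` on an unsuccessful dispatch appear to need to pass `failure` explicitly.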
@@ -11,7 +11,9 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
--
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
+- target.selector = { appId }
 - success = true when launch was dispatched successfully
 - failure_code/retryable when launch dispatch fails
 - ui_fingerprint_before/ui_fingerprint_after when available
@@ -84,7 +86,9 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
--
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
+- target.selector = { appId }
 - success = true when the restart command completed
 - failure_code/retryable when restart dispatch fails
 - ui_fingerprint_before/ui_fingerprint_after when available
@@ -617,7 +621,9 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
--
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
+- target.selector = { x, y }
 - success = true when the tap was dispatched
 - failure_code/retryable when dispatch fails
 - ui_fingerprint_before/ui_fingerprint_after when available
@@ -673,6 +679,8 @@ Output Structure:
 - action_id: unique timestamp-based action identifier
 - timestamp: ISO 8601 timestamp for the action attempt
 - action_type: "tap_element"
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
 - target.selector: original target handle ({ elementId })
 - target.resolved: minimal resolved element info used for the tap
 - success: true when the tap was dispatched
@@ -725,6 +733,8 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
 - target.selector = { x1, y1, x2, y2, duration }
 - success = true when the swipe was dispatched
 - failure_code/retryable when dispatch fails
@@ -777,6 +787,8 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
 - target.selector = original selector
 - target.resolved = minimal resolved element info when found
 - success = true when scrolling produced a visible target element
@@ -831,6 +843,8 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
 - target.selector = { text }
 - success = true when text input was dispatched
 - failure_code/retryable when dispatch fails
@@ -880,6 +894,8 @@ Inputs:
 
 Output Structure:
 - action_id, timestamp (ISO 8601), action_type
+- lifecycle_state: post-dispatch lifecycle state (pending_verification or failed)
+- source_module: runtime source of the action envelope
 - target.selector = { key: "back" }
 - success = true when the back action was dispatched
 - failure_code/retryable when dispatch fails
@@ -47,6 +47,7 @@ async function handleStartApp(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint(platform, deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'start_app',
+  sourceModule: 'server',
   device: res.device,
   selector: { appId },
   success: !!res.appStarted,
@@ -82,6 +83,7 @@ async function handleRestartApp(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint(platform, deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'restart_app',
+  sourceModule: 'server',
   device: res.device,
   selector: { appId },
   success: !!res.appRestarted,
@@ -319,6 +321,7 @@ async function handleTap(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint(platform, deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'tap',
+  sourceModule: 'server',
   selector: { x, y },
   success: !!res.success,
   uiFingerprintBefore,
@@ -348,6 +351,7 @@ async function handleSwipe(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint(platform, deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'swipe',
+  sourceModule: 'server',
   selector: { x1, y1, x2, y2, duration },
   success: !!res.success,
   uiFingerprintBefore,
@@ -369,6 +373,7 @@ async function handleScrollToElement(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint(platform, deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'scroll_to_element',
+  sourceModule: 'server',
   selector: selector ?? null,
   resolved: res?.success && res?.element ? {
     elementId: null,
@@ -395,6 +400,7 @@ async function handleTypeText(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint('android', deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'type_text',
+  sourceModule: 'server',
   selector: { text },
   success: !!res.success,
   uiFingerprintBefore,
@@ -411,6 +417,7 @@ async function handlePressBack(args: ToolCallArgs) {
 const uiFingerprintAfter = await captureActionFingerprint('android', deviceId)
 return wrapResponse(buildActionExecutionResult({
   actionType: 'press_back',
+  sourceModule: 'server',
   selector: { key: 'back' },
   success: !!res.success,
   uiFingerprintBefore,
package/src/server-core.ts
CHANGED
package/src/types.ts
CHANGED
|
@@ -258,6 +258,8 @@ export interface ActionExecutionResult {
 action_id: string;
 timestamp: string;
 action_type: string;
+lifecycle_state?: 'pending_verification' | 'failed';
+source_module?: 'server' | 'interact';
 device?: DeviceInfo;
 target: {
   selector: Record<string, unknown> | null;
@@ -61,6 +61,8 @@ async function run() {
 action_id: 'tap_element_1',
 timestamp: '2026-04-23T08:00:00.000Z',
 action_type: 'tap_element',
+lifecycle_state: 'pending_verification',
+source_module: 'interact',
 target: {
   selector: { elementId: 'el_ready' },
   resolved: { elementId: 'el_ready', text: 'Ready', resource_id: null, accessibility_id: null, class: 'Button', bounds: [0, 0, 10, 10], index: 0 }
@@ -74,6 +76,8 @@ async function run() {
 const tapElementPayload = JSON.parse((tapElementResponse as any).content[0].text)
 assert.strictEqual(tapElementPayload.success, true)
 assert.strictEqual(tapElementPayload.action_type, 'tap_element')
+assert.strictEqual(tapElementPayload.lifecycle_state, 'pending_verification')
+assert.strictEqual(tapElementPayload.source_module, 'interact')
 assert.match(tapElementPayload.timestamp, /^\d{4}-\d{2}-\d{2}T/)
 assert.strictEqual(tapElementPayload.target.resolved.elementId, 'el_ready')
 assert.strictEqual(tapElementPayload.ui_fingerprint_before, 'fp_before')
@@ -84,6 +88,8 @@ async function run() {
 const tapPayload = JSON.parse((tapResponse as any).content[0].text)
 assert.strictEqual(tapPayload.success, true)
 assert.strictEqual(tapPayload.action_type, 'tap')
+assert.strictEqual(tapPayload.lifecycle_state, 'pending_verification')
+assert.strictEqual(tapPayload.source_module, 'server')
 assert.match(tapPayload.timestamp, /^\d{4}-\d{2}-\d{2}T/)
 assert.deepStrictEqual(tapPayload.target.selector, { x: 1, y: 2 })
 assert.strictEqual(tapPayload.ui_fingerprint_before, 'fp_mock')
@@ -107,6 +113,8 @@ async function run() {
 const startAppPayload = JSON.parse((startAppResponse as any).content[0].text)
 assert.strictEqual(startAppPayload.success, true)
 assert.strictEqual(startAppPayload.action_type, 'start_app')
+assert.strictEqual(startAppPayload.lifecycle_state, 'pending_verification')
+assert.strictEqual(startAppPayload.source_module, 'server')
 assert.match(startAppPayload.timestamp, /^\d{4}-\d{2}-\d{2}T/)
 assert.strictEqual(startAppPayload.device.id, 'emulator-5554')
 assert.deepStrictEqual(startAppPayload.target.selector, { appId: 'com.example.app' })