mobile-debug-mcp 0.26.1 → 0.26.3
- package/AGENTS.md +3 -0
- package/dist/interact/index.js +169 -102
- package/dist/server/common.js +14 -1
- package/dist/server/tool-definitions.js +22 -4
- package/dist/server/tool-handlers.js +7 -0
- package/dist/server-core.js +1 -1
- package/docs/CHANGELOG.md +6 -0
- package/docs/ROADMAP.md +242 -76
- package/docs/rfcs/005-unified-action-execution-and-verification-model.md +216 -0
- package/docs/rfcs/006-runtime-action-instrumentation-and-binding-layer.md +230 -0
- package/docs/rfcs/007-actionability-resolution-and-executable-target-selection.md +277 -0
- package/docs/specs/mcp-tooling-spec-v1.md +4 -0
- package/docs/tools/interact.md +13 -1
- package/package.json +1 -1
- package/src/interact/index.ts +203 -107
- package/src/server/common.ts +22 -1
- package/src/server/tool-definitions.ts +22 -4
- package/src/server/tool-handlers.ts +7 -0
- package/src/server-core.ts +1 -1
- package/src/types.ts +75 -0
- package/test/unit/observe/find_element.test.ts +5 -0
- package/test/unit/server/response_shapes.test.ts +8 -0
package/docs/ROADMAP.md
CHANGED
|
@@ -4,11 +4,23 @@
|
|
|
4
4
|
|
|
5
5
|
Ordered by:
|
|
6
6
|
|
|
7
|
+
|
|
7
8
|
1. Impact on agent reliability
|
|
8
9
|
2. Reduction in retries / brittleness
|
|
9
10
|
3. Breadth of app coverage improved
|
|
10
11
|
4. Implementation complexity vs payoff
|
|
11
12
|
|
|
13
|
+
## Capability Status Definitions
|
|
14
|
+
|
|
15
|
+
- **Completed**
|
|
16
|
+
Capability implemented and considered part of the baseline platform.
|
|
17
|
+
|
|
18
|
+
- **Spec Ready**
|
|
19
|
+
Capability design or RFC is mature and implementation-ready, but not yet delivered.
|
|
20
|
+
|
|
21
|
+
- **Planned**
|
|
22
|
+
Capability is prioritized on the roadmap, but detailed specification and/or implementation work remains ahead.
|
|
23
|
+
|
|
12
24
|
## Program-Level Success Metrics
|
|
13
25
|
Track roadmap impact across releases using:
|
|
14
26
|
|
|
@@ -28,21 +40,22 @@ Higher task success with fewer retries.
|
|
|
28
40
|
|
|
29
41
|
# Roadmap Status Overview
|
|
30
42
|
|
|
31
|
-
## Completed
|
|
43
|
+
## Completed Capabilities
|
|
32
44
|
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
| Stronger State Verification | Complete | Foundational verification layer shipped |
|
|
36
|
-
| Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
|
|
45
|
+
- Stronger State Verification — Complete (Foundational verification layer shipped)
|
|
46
|
+
- Richer Element Identity — Complete (Identity and selector confidence foundations shipped)
|
|
37
47
|
|
|
38
48
|
## Current Focus
|
|
39
49
|
|
|
40
50
|
- Wait and Synchronization Reliability
|
|
51
|
+
- Actionability Resolution
|
|
41
52
|
|
|
42
53
|
## Upcoming Work
|
|
43
54
|
|
|
44
|
-
-
|
|
55
|
+
- Adjustable Control Support
|
|
45
56
|
- Better Compose / Custom Control Semantics
|
|
57
|
+
- Signal-Oriented Diagnostic Filtering
|
|
58
|
+
- Long Press Gesture
|
|
46
59
|
|
|
47
60
|
## Later Horizon
|
|
48
61
|
|
|
@@ -53,11 +66,10 @@ Higher task success with fewer retries.
|
|
|
53
66
|
|
|
54
67
|
# Stronger State Verification
|
|
55
68
|
|
|
56
|
-
##
|
|
69
|
+
## Rationale
|
|
57
70
|
Highest leverage improvement.
|
|
58
71
|
|
|
59
|
-
**Status:** Completed
|
|
60
|
-
**Priority:** P1
|
|
72
|
+
**Status:** Completed
|
|
61
73
|
|
|
62
74
|
Most failures are not “can’t act,” they’re:
|
|
63
75
|
- uncertain state
|
|
@@ -85,19 +97,18 @@ Very high.
|
|
|
85
97
|
|
|
86
98
|
## Dependencies
|
|
87
99
|
Blocks or strengthens:
|
|
88
|
-
-
|
|
89
|
-
-
|
|
90
|
-
-
|
|
100
|
+
- Better Compose / Custom Control Semantics
|
|
101
|
+
- Pinch to Zoom
|
|
102
|
+
- Action Trace Correlation
|
|
91
103
|
|
|
92
104
|
---
|
|
93
105
|
|
|
94
106
|
# Richer Element Identity
|
|
95
107
|
|
|
96
|
-
##
|
|
108
|
+
## Rationale
|
|
97
109
|
Directly reduces selector brittleness.
|
|
98
110
|
|
|
99
|
-
**Status:** Completed
|
|
100
|
-
**Priority:** P2
|
|
111
|
+
**Status:** Completed
|
|
101
112
|
|
|
102
113
|
Improves:
|
|
103
114
|
- targeting stability
|
|
@@ -125,19 +136,18 @@ Very high.
|
|
|
125
136
|
|
|
126
137
|
## Dependencies
|
|
127
138
|
Blocks or strengthens:
|
|
128
|
-
-
|
|
129
|
-
-
|
|
130
|
-
-
|
|
139
|
+
- Long Press Gesture
|
|
140
|
+
- Better Compose / Custom Control Semantics
|
|
141
|
+
- Pinch to Zoom
|
|
131
142
|
|
|
132
143
|
---
|
|
133
144
|
|
|
134
145
|
# Wait and Synchronization Reliability
|
|
135
146
|
|
|
136
|
-
##
|
|
147
|
+
## Rationale
|
|
137
148
|
Reliable async synchronization is foundational for agent success and should precede gesture expansion.
|
|
138
149
|
|
|
139
|
-
**Status:** Spec Ready
|
|
140
|
-
**Priority:** P3
|
|
150
|
+
**Status:** Spec Ready
|
|
141
151
|
|
|
142
152
|
Addresses failures where agents:
|
|
143
153
|
- skip UI waits after actions
|
|
@@ -150,6 +160,7 @@ Addresses failures where agents:
|
|
|
150
160
|
- wait_for_ui_change (hierarchy diff based waiting)
|
|
151
161
|
- Structured loading state detection
|
|
152
162
|
- Snapshot revision / staleness metadata
|
|
163
|
+
- Focused snapshot views / incremental snapshot diffs
|
|
153
164
|
- Compose-aware wait robustness improvements
|
|
154
165
|
|
|
155
166
|
## Expected Impact
|
|
@@ -159,6 +170,7 @@ Very high.
|
|
|
159
170
|
- wait_for_ui_change implemented
|
|
160
171
|
- Loading state detection available for representative controls
|
|
161
172
|
- Snapshot revision or staleness metadata exposed
|
|
173
|
+
- Focused or diff-oriented snapshots validated in benchmark flows
|
|
162
174
|
- UI-first sync guidance added to spec guardrails
|
|
163
175
|
- In-place update waits validated on benchmark flows
|
|
164
176
|
|
|
@@ -170,22 +182,158 @@ Very high.
|
|
|
170
182
|
|
|
171
183
|
## Dependencies
|
|
172
184
|
Depends on:
|
|
173
|
-
-
|
|
174
|
-
-
|
|
185
|
+
- Stronger State Verification
|
|
186
|
+
- Richer Element Identity
|
|
187
|
+
|
|
188
|
+
Blocks or strengthens:
|
|
189
|
+
- Better Compose / Custom Control Semantics
|
|
190
|
+
- Action Trace Correlation
|
|
191
|
+
|
|
192
|
+
---
|
|
193
|
+
|
|
194
|
+
# Actionability Resolution
|
|
195
|
+
|
|
196
|
+
## Rationale
|
|
197
|
+
Reduces failures caused by interacting with discoverable but non-actionable UI nodes.
|
|
198
|
+
|
|
199
|
+
**Status:** Planned
|
|
200
|
+
|
|
201
|
+
Addresses cases where:
|
|
202
|
+
- visible text is not the true click target
|
|
203
|
+
- child nodes differ from actionable containers
|
|
204
|
+
- affordance exists but handler ownership is ambiguous
|
|
205
|
+
|
|
206
|
+
## Scope
|
|
207
|
+
- Actionable container resolution
|
|
208
|
+
- Executable-target preference rules
|
|
209
|
+
- Actionability confidence metadata
|
|
210
|
+
- Post-action state verification integration
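One way to picture actionable container resolution and the preference for executable containers over leaf nodes is the sketch below. It is only an illustration: the `UiNode` shape, the `resolveActionableTarget` helper, the hop limit, and the confidence formula are all hypothetical and are not taken from the package.

```typescript
// Hypothetical node shape; the package's real hierarchy types may differ.
interface UiNode {
  id: string;
  text?: string;
  clickable: boolean;
  enabled: boolean;
  parent?: UiNode;
}

interface ResolvedTarget {
  node: UiNode;
  hops: number;       // how many ancestors were climbed from the matched node
  confidence: number; // assumed heuristic: 1 for a direct hit, lower per hop
}

// Walk from the matched node toward the root and return the nearest
// enabled, clickable node, preferring it over a text-only leaf.
function resolveActionableTarget(matched: UiNode, maxHops: number = 3): ResolvedTarget | null {
  let current: UiNode | undefined = matched;
  for (let hops = 0; current !== undefined && hops <= maxHops; hops++) {
    if (current.clickable && current.enabled) {
      return { node: current, hops: hops, confidence: 1 / (hops + 1) };
    }
    current = current.parent;
  }
  return null; // discoverable, but nothing executable within reach
}

// Example: the visible "Submit" label is not the true click target.
const submitRow: UiNode = { id: "submit_row", clickable: true, enabled: true };
const submitLabel: UiNode = {
  id: "submit_label",
  text: "Submit",
  clickable: false,
  enabled: true,
  parent: submitRow,
};
const submitTarget = resolveActionableTarget(submitLabel);
```

Here `submitTarget` points at the row rather than the label, and the reduced confidence records that the match was indirect.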
|
|
211
|
+
|
|
212
|
+
## Expected Impact
|
|
213
|
+
High.
|
|
214
|
+
|
|
215
|
+
## Exit Criteria
|
|
216
|
+
- Actionable target resolution implemented
|
|
217
|
+
- Preference rules defined for executable containers over leaf nodes
|
|
218
|
+
- Actionability confidence surfaced
|
|
219
|
+
- Benchmark flows show reduced false taps and submit ambiguity
|
|
220
|
+
|
|
221
|
+
## Success Metrics
|
|
222
|
+
- Reduced mis-targeted action failures
|
|
223
|
+
- Lower retarget retries
|
|
224
|
+
- Higher first-attempt action success
|
|
225
|
+
|
|
226
|
+
## Dependencies
|
|
227
|
+
Depends on:
|
|
228
|
+
- Stronger State Verification
|
|
229
|
+
- Richer Element Identity
|
|
230
|
+
- Wait and Synchronization Reliability
|
|
231
|
+
|
|
232
|
+
Blocks or strengthens:
|
|
233
|
+
- Adjustable Control Support
|
|
234
|
+
- Better Compose / Custom Control Semantics
|
|
235
|
+
|
|
236
|
+
---
|
|
237
|
+
|
|
238
|
+
# Adjustable Control Support
|
|
239
|
+
|
|
240
|
+
## Rationale
|
|
241
|
+
High leverage improvement for sliders and parameterized controls.
|
|
242
|
+
|
|
243
|
+
**Status:** Planned
|
|
244
|
+
|
|
245
|
+
Addresses friction around:
|
|
246
|
+
- coordinate-calibrated slider interaction
|
|
247
|
+
- snapping and quantized controls
|
|
248
|
+
- weak state confirmation after adjustment
|
|
249
|
+
|
|
250
|
+
## Scope
|
|
251
|
+
New semantic control support:
|
|
252
|
+
|
|
253
|
+
```json
|
|
254
|
+
set_slider_value(target, value, tolerance?)
|
|
255
|
+
```
|
|
256
|
+
|
|
257
|
+
Includes:
|
|
258
|
+
- semantic adjustable control manipulation
|
|
259
|
+
- read-back verification loop
|
|
260
|
+
- tolerance-aware value setting
|
|
261
|
+
- fallback coordinate calibration only when needed
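The read-back loop and tolerance check listed above might look roughly like the following. This is a sketch under stated assumptions: `SliderDriver`, `setSliderValue`, `makeSnappingSlider`, and the retry limit are hypothetical names for illustration, not the package's implementation of `set_slider_value`.

```typescript
// Hypothetical driver abstraction over a single adjustable control.
interface SliderDriver {
  read(): number;                      // value currently reported by the control
  setSemantic(value: number): boolean; // false when no semantic setter is available
  dragTo(value: number): void;         // coordinate-based fallback
}

interface SliderResult {
  ok: boolean;
  requested: number;
  actual: number;
  attempts: number;
  usedFallback: boolean;
}

function setSliderValue(
  driver: SliderDriver,
  value: number,
  tolerance: number = 0,
  maxAttempts: number = 3,
): SliderResult {
  let usedFallback = false;
  let attempts = 0;
  let actual = driver.read(); // pre-state value
  while (Math.abs(actual - value) > tolerance && attempts < maxAttempts) {
    attempts++;
    // Prefer the semantic path; fall back to coordinates only when it is unavailable.
    if (!driver.setSemantic(value)) {
      usedFallback = true;
      driver.dragTo(value);
    }
    actual = driver.read(); // read-back verification
  }
  return {
    ok: Math.abs(actual - value) <= tolerance,
    requested: value,
    actual: actual,
    attempts: attempts,
    usedFallback: usedFallback,
  };
}

// Fake control that snaps to multiples of `step`, standing in for a quantized slider.
function makeSnappingSlider(step: number, initial: number): SliderDriver {
  let current = initial;
  return {
    read: () => current,
    setSemantic: (v: number) => {
      current = Math.round(v / step) * step;
      return true;
    },
    dragTo: (v: number) => {
      current = Math.round(v / step) * step;
    },
  };
}

const withinTolerance = setSliderValue(makeSnappingSlider(5, 0), 42, 2);
const outsideTolerance = setSliderValue(makeSnappingSlider(5, 0), 42, 1);
```

With a step of 5, a request for 42 settles on 40. That passes with a tolerance of 2 and is reported as a failure with a tolerance of 1 once the retries are exhausted, which is the point of making the tolerance explicit for snapping controls.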
|
|
262
|
+
|
|
263
|
+
## Expected Impact
|
|
264
|
+
High.
|
|
265
|
+
|
|
266
|
+
## Exit Criteria
|
|
267
|
+
- Adjustable control primitive implemented
|
|
268
|
+
- Verification loop reads and confirms resulting values
|
|
269
|
+
- Tolerance model defined
|
|
270
|
+
- Benchmark slider/custom control flows validated
|
|
271
|
+
|
|
272
|
+
## Success Metrics
|
|
273
|
+
- Higher custom control interaction success rate
|
|
274
|
+
- Fewer retries adjusting controls
|
|
275
|
+
- Reduced coordinate-guessing failures
|
|
276
|
+
|
|
277
|
+
## Dependencies
|
|
278
|
+
Depends on:
|
|
279
|
+
- Stronger State Verification
|
|
280
|
+
- Richer Element Identity
|
|
281
|
+
- Actionability Resolution
|
|
175
282
|
|
|
176
283
|
Blocks or strengthens:
|
|
177
|
-
-
|
|
178
|
-
-
|
|
284
|
+
- Better Compose / Custom Control Semantics
|
|
285
|
+
- Pinch to Zoom
|
|
286
|
+
|
|
287
|
+
---
|
|
288
|
+
|
|
289
|
+
# Signal-Oriented Diagnostic Filtering
|
|
290
|
+
|
|
291
|
+
## Rationale
|
|
292
|
+
Improves observability by separating causal signals from diagnostic noise.
|
|
293
|
+
|
|
294
|
+
**Status:** Planned
|
|
295
|
+
|
|
296
|
+
Addresses friction from:
|
|
297
|
+
- noisy log streams
|
|
298
|
+
- weak signal extraction
|
|
299
|
+
- difficult action-to-signal attribution
|
|
300
|
+
|
|
301
|
+
## Scope
|
|
302
|
+
- Structured diagnostic classification
|
|
303
|
+
- Noise filtering heuristics
|
|
304
|
+
- Signal relevance scoring
|
|
305
|
+
- App vs system event tagging
|
|
306
|
+
|
|
307
|
+
## Expected Impact
|
|
308
|
+
High.
|
|
309
|
+
|
|
310
|
+
## Exit Criteria
|
|
311
|
+
- Diagnostic signal classification model defined
|
|
312
|
+
- Noise filtering available in representative flows
|
|
313
|
+
- Relevant action-linked signals surfaced separately from background noise
|
|
314
|
+
- Debug workflows validated with filtered signals
|
|
315
|
+
|
|
316
|
+
## Success Metrics
|
|
317
|
+
- Lower time-to-root-cause
|
|
318
|
+
- Faster identification of relevant action signals
|
|
319
|
+
- Reduced diagnostic ambiguity
|
|
320
|
+
|
|
321
|
+
## Dependencies
|
|
322
|
+
Depends on:
|
|
323
|
+
- Stronger State Verification
|
|
324
|
+
- Wait and Synchronization Reliability
|
|
325
|
+
|
|
326
|
+
Strengthens:
|
|
327
|
+
- Action Trace Correlation
|
|
179
328
|
|
|
180
329
|
---
|
|
181
330
|
|
|
182
331
|
# Long Press Gesture
|
|
183
332
|
|
|
184
|
-
##
|
|
333
|
+
## Rationale
|
|
185
334
|
High utility, relatively low complexity.
|
|
186
335
|
|
|
187
|
-
**Status:** Planned
|
|
188
|
-
**Priority:** P4
|
|
336
|
+
**Status:** Planned
|
|
189
337
|
|
|
190
338
|
Unlocks many currently awkward interactions:
|
|
191
339
|
|
|
@@ -223,26 +371,26 @@ High.
|
|
|
223
371
|
|
|
224
372
|
## Dependencies
|
|
225
373
|
Depends on:
|
|
226
|
-
-
|
|
374
|
+
- Richer Element Identity
|
|
227
375
|
|
|
228
376
|
Strengthens:
|
|
229
|
-
-
|
|
377
|
+
- Better Compose / Custom Control Semantics
|
|
230
378
|
|
|
231
379
|
---
|
|
232
380
|
|
|
233
381
|
# Better Compose / Custom Control Semantics
|
|
234
382
|
|
|
235
|
-
##
|
|
236
|
-
|
|
383
|
+
## Rationale
|
|
384
|
+
Higher priority after agent feedback exposed custom control semantics as a core reliability gap, not a later optimization.
|
|
237
385
|
|
|
238
|
-
**Status:**
|
|
239
|
-
**Priority:** P5
|
|
386
|
+
**Status:** Spec Ready
|
|
240
387
|
|
|
241
388
|
Semantics become more useful once:
|
|
242
389
|
- identity is stronger
|
|
243
390
|
- verification is stronger
|
|
244
391
|
- gestures are richer
|
|
245
392
|
- synchronization is more reliable
|
|
393
|
+
- action execution is more precise
|
|
246
394
|
|
|
247
395
|
## Scope
|
|
248
396
|
- Composite control traits
|
|
@@ -268,20 +416,21 @@ High.
|
|
|
268
416
|
|
|
269
417
|
## Dependencies
|
|
270
418
|
Depends on:
|
|
271
|
-
-
|
|
272
|
-
-
|
|
273
|
-
-
|
|
274
|
-
-
|
|
419
|
+
- Stronger State Verification
|
|
420
|
+
- Richer Element Identity
|
|
421
|
+
- Wait and Synchronization Reliability
|
|
422
|
+
- Actionability Resolution
|
|
423
|
+
- Adjustable Control Support
|
|
424
|
+
- Long Press Gesture
|
|
275
425
|
|
|
276
426
|
---
|
|
277
427
|
|
|
278
428
|
# Pinch to Zoom
|
|
279
429
|
|
|
280
|
-
##
|
|
430
|
+
## Rationale
|
|
281
431
|
Valuable, but narrower than long press.
|
|
282
432
|
|
|
283
|
-
**Status:** Planned
|
|
284
|
-
**Priority:** P6
|
|
433
|
+
**Status:** Planned
|
|
285
434
|
|
|
286
435
|
Applies mainly to:
|
|
287
436
|
- maps
|
|
@@ -317,19 +466,18 @@ Medium-high.
|
|
|
317
466
|
|
|
318
467
|
## Dependencies
|
|
319
468
|
Depends on:
|
|
320
|
-
-
|
|
321
|
-
-
|
|
469
|
+
- Stronger State Verification
|
|
470
|
+
- Richer Element Identity
|
|
322
471
|
|
|
323
472
|
---
|
|
324
473
|
|
|
325
474
|
# Action Trace Correlation
|
|
326
475
|
|
|
327
|
-
##
|
|
476
|
+
## Rationale
|
|
328
477
|
Very valuable for debugging,
|
|
329
478
|
but less critical than improving control success first.
|
|
330
479
|
|
|
331
|
-
**Status:** Planned
|
|
332
|
-
**Priority:** P7
|
|
480
|
+
**Status:** Planned
|
|
333
481
|
|
|
334
482
|
Improves diagnosis more than task completion.
|
|
335
483
|
|
|
@@ -353,75 +501,93 @@ Medium-high.
|
|
|
353
501
|
|
|
354
502
|
## Dependencies
|
|
355
503
|
Depends on:
|
|
356
|
-
-
|
|
357
|
-
-
|
|
358
|
-
-
|
|
504
|
+
- Stronger State Verification
|
|
505
|
+
- Richer Element Identity
|
|
506
|
+
- Wait and Synchronization Reliability
|
|
359
507
|
|
|
360
508
|
---
|
|
361
509
|
|
|
362
510
|
# Roadmap Sequence
|
|
363
511
|
|
|
364
512
|
## Dependency Summary
|
|
365
|
-
Foundational sequence:
|
|
366
513
|
|
|
367
|
-
|
|
368
|
-
-
|
|
369
|
-
-
|
|
514
|
+
Foundation
|
|
515
|
+
- Stronger State Verification
|
|
516
|
+
- Richer Element Identity
|
|
517
|
+
|
|
518
|
+
Synchronization & Actionability
|
|
519
|
+
- Wait and Synchronization Reliability
|
|
520
|
+
- Actionability Resolution
|
|
370
521
|
|
|
371
|
-
|
|
372
|
-
-
|
|
522
|
+
Control Precision & Observability
|
|
523
|
+
- Adjustable Control Support
|
|
524
|
+
- Signal-Oriented Diagnostic Filtering
|
|
373
525
|
|
|
374
|
-
|
|
375
|
-
-
|
|
376
|
-
-
|
|
377
|
-
-
|
|
526
|
+
Interaction Expansion
|
|
527
|
+
- Long Press Gesture
|
|
528
|
+
- Better Compose / Custom Control Semantics
|
|
529
|
+
- Pinch to Zoom
|
|
378
530
|
|
|
379
|
-
|
|
380
|
-
-
|
|
531
|
+
Deep Observability
|
|
532
|
+
- Action Trace Correlation
|
|
381
533
|
|
|
382
534
|
## Wave 1 (Current Focus)
|
|
383
535
|
- Stronger State Verification
|
|
384
536
|
- Richer Element Identity
|
|
385
537
|
- Wait and Synchronization Reliability
|
|
538
|
+
- Actionability Resolution
|
|
386
539
|
|
|
387
540
|
Focus:
|
|
388
541
|
Make core loop more reliable.
|
|
389
542
|
|
|
390
543
|
---
|
|
391
544
|
|
|
392
|
-
## Wave 2 (
|
|
393
|
-
-
|
|
394
|
-
- Better Compose Semantics
|
|
545
|
+
## Wave 2 (Control Precision + Diagnostics)
|
|
546
|
+
- Adjustable Control Support
|
|
547
|
+
- Better Compose / Custom Control Semantics
|
|
548
|
+
- Signal-Oriented Diagnostic Filtering
|
|
549
|
+
|
|
550
|
+
Focus:
|
|
551
|
+
Improve control precision, custom control semantics, and signal observability.
|
|
552
|
+
|
|
553
|
+
---
|
|
554
|
+
|
|
555
|
+
## Wave 3 (Interaction Expansion)
|
|
556
|
+
- Long Press Gesture
|
|
395
557
|
|
|
396
558
|
Focus:
|
|
397
|
-
Expand interaction capability.
|
|
559
|
+
Expand interaction capability after core control reliability is improved.
|
|
398
560
|
|
|
399
561
|
---
|
|
400
562
|
|
|
401
|
-
## Wave
|
|
563
|
+
## Wave 4 (Advanced Gestures + Deep Observability)
|
|
402
564
|
- Pinch to Zoom
|
|
403
565
|
- Action Trace Correlation
|
|
404
566
|
|
|
405
567
|
Focus:
|
|
406
|
-
Advanced gestures + observability.
|
|
568
|
+
Advanced gestures + deep observability.
|
|
407
569
|
|
|
408
570
|
---
|
|
409
571
|
|
|
410
|
-
#
|
|
572
|
+
# Roadmap Ordering
|
|
411
573
|
|
|
412
|
-
|
|
574
|
+
Roadmap Ordering:
|
|
413
575
|
1. Stronger State Verification
|
|
414
576
|
2. Richer Element Identity
|
|
415
577
|
3. Wait and Synchronization Reliability
|
|
416
|
-
4.
|
|
417
|
-
5.
|
|
418
|
-
6.
|
|
419
|
-
7.
|
|
578
|
+
4. Actionability Resolution
|
|
579
|
+
5. Adjustable Control Support
|
|
580
|
+
6. Better Compose / Custom Control Semantics
|
|
581
|
+
7. Signal-Oriented Diagnostic Filtering
|
|
582
|
+
8. Long Press Gesture
|
|
583
|
+
9. Pinch to Zoom
|
|
584
|
+
10. Action Trace Correlation
|
|
420
585
|
|
|
421
586
|
Rationale:
|
|
422
|
-
-
|
|
423
|
-
-
|
|
424
|
-
-
|
|
587
|
+
- Early roadmap items harden state, targeting, synchronization, action execution.
|
|
588
|
+
- Mid roadmap items improve control precision and signal observability.
|
|
589
|
+
- Later interaction-focused items expand interaction coverage.
|
|
590
|
+
- Final observability work deepens debugging observability.
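A small consistency check can compare the numbered ordering against the "Depends on" lists in the capability sections. The sketch below transcribes both from this document; the `findOrderingViolations` helper is hypothetical and exists only for illustration.

```typescript
// Numbered ordering as listed under "Roadmap Ordering".
const roadmapOrder: string[] = [
  "Stronger State Verification",
  "Richer Element Identity",
  "Wait and Synchronization Reliability",
  "Actionability Resolution",
  "Adjustable Control Support",
  "Better Compose / Custom Control Semantics",
  "Signal-Oriented Diagnostic Filtering",
  "Long Press Gesture",
  "Pinch to Zoom",
  "Action Trace Correlation",
];

// "Depends on" lists transcribed from each capability section.
const dependsOn: Record<string, string[]> = {
  "Wait and Synchronization Reliability": ["Stronger State Verification", "Richer Element Identity"],
  "Actionability Resolution": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
  ],
  "Adjustable Control Support": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Actionability Resolution",
  ],
  "Signal-Oriented Diagnostic Filtering": [
    "Stronger State Verification",
    "Wait and Synchronization Reliability",
  ],
  "Long Press Gesture": ["Richer Element Identity"],
  "Better Compose / Custom Control Semantics": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
    "Actionability Resolution",
    "Adjustable Control Support",
    "Long Press Gesture",
  ],
  "Pinch to Zoom": ["Stronger State Verification", "Richer Element Identity"],
  "Action Trace Correlation": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
  ],
};

// Report every item that appears before one of its declared dependencies.
function findOrderingViolations(order: string[], deps: Record<string, string[]>): string[] {
  const violations: string[] = [];
  for (let i = 0; i < order.length; i++) {
    const required = deps[order[i]] || [];
    for (let j = 0; j < required.length; j++) {
      const depIndex = order.indexOf(required[j]);
      if (depIndex === -1 || depIndex > i) {
        violations.push(order[i] + " is listed before its dependency " + required[j]);
      }
    }
  }
  return violations;
}

const orderingViolations = findOrderingViolations(roadmapOrder, dependsOn);
```

With the lists as transcribed here, the check reports a single entry: Better Compose / Custom Control Semantics is ordered ahead of Long Press Gesture, which it declares as a dependency.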
|
|
425
591
|
|
|
426
592
|
---
|
|
427
593
|
|
|
@@ -0,0 +1,216 @@
|
|
|
1
|
+
# RFC 005 — Unified Action Execution and Verification Model
|
|
2
|
+
|
|
3
|
+
## 1. Summary
|
|
4
|
+
|
|
5
|
+
This RFC defines a unified execution and verification model for all agent-driven UI actions.
|
|
6
|
+
|
|
7
|
+
It standardises:
|
|
8
|
+
- how actions are resolved
|
|
9
|
+
- how they are executed
|
|
10
|
+
- how outcomes are verified
|
|
11
|
+
- how failures are classified
|
|
12
|
+
- how observability signals are emitted
|
|
13
|
+
|
|
14
|
+
The goal is to eliminate inconsistent per-feature execution logic and establish a single deterministic lifecycle for all UI interactions.
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
## 2. Problem Statement
|
|
19
|
+
|
|
20
|
+
Current execution paths are fragmented across interaction types:
|
|
21
|
+
|
|
22
|
+
- Tap / click actions rely on implicit success assumptions
|
|
23
|
+
- Control adjustments (sliders, inputs) use ad-hoc verification logic
|
|
24
|
+
- Gesture actions lack consistent post-execution validation
|
|
25
|
+
- Action success is often inferred from indirect UI changes or logs
|
|
26
|
+
|
|
27
|
+
This leads to:
|
|
28
|
+
- ambiguous success states
|
|
29
|
+
- inconsistent retries
|
|
30
|
+
- weak failure classification
|
|
31
|
+
- poor observability signal quality
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## 3. Design Goals
|
|
36
|
+
|
|
37
|
+
The model must:
|
|
38
|
+
|
|
39
|
+
- Provide a single lifecycle for all actions
|
|
40
|
+
- Separate target resolution from execution
|
|
41
|
+
- Require explicit verification of state change
|
|
42
|
+
- Standardise failure classification
|
|
43
|
+
- Integrate with observability systems cleanly
|
|
44
|
+
- Support both simple and parameterised actions
|
|
45
|
+
|
|
46
|
+
---
|
|
47
|
+
|
|
48
|
+
## 4. Action Lifecycle
|
|
49
|
+
|
|
50
|
+
Every action MUST pass through the following states:
|
|
51
|
+
|
|
52
|
+
1. Resolved
|
|
53
|
+
- A target has been identified via Actionability Resolution
|
|
54
|
+
- The target is executable (not just visible)
|
|
55
|
+
|
|
56
|
+
2. Dispatched
|
|
57
|
+
- The action has been issued to the runtime layer
|
|
58
|
+
|
|
59
|
+
3. Pending Verification
|
|
60
|
+
- Waiting for expected UI or state change
|
|
61
|
+
|
|
62
|
+
4. Verified
|
|
63
|
+
- Expected outcome confirmed
|
|
64
|
+
|
|
65
|
+
5. Failed
|
|
66
|
+
- Verification did not succeed within constraints
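The five states above can be read as a small state machine. The sketch below is one possible encoding, not the runtime's: the string spellings are illustrative, and the transition table assumes that an action can fail from any non-terminal state (for example a dispatch failure), which the RFC text does not spell out.

```typescript
// State names follow section 4; the string spellings are assumptions.
type LifecycleState =
  | "resolved"
  | "dispatched"
  | "pending_verification"
  | "verified"
  | "failed";

// Assumed transition table: forward progress, or failure from any non-terminal state.
const ALLOWED_TRANSITIONS: Record<LifecycleState, LifecycleState[]> = {
  resolved: ["dispatched", "failed"],
  dispatched: ["pending_verification", "failed"],
  pending_verification: ["verified", "failed"],
  verified: [],
  failed: [],
};

// Move to the next state, rejecting anything the table does not allow.
function transition(from: LifecycleState, to: LifecycleState): LifecycleState {
  if (ALLOWED_TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error("illegal lifecycle transition: " + from + " -> " + to);
  }
  return to;
}

// Terminal states have no outgoing transitions.
function isTerminal(state: LifecycleState): boolean {
  return ALLOWED_TRANSITIONS[state].length === 0;
}
```

Encoding the table as data keeps "verified" unreachable without passing through "pending_verification", which is what rules out implicit success.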
|
|
67
|
+
|
|
68
|
+
---
|
|
69
|
+
|
|
70
|
+
## 5. Action Types
|
|
71
|
+
|
|
72
|
+
All actions are categorised into canonical types:
|
|
73
|
+
|
|
74
|
+
- Navigation
|
|
75
|
+
- Input
|
|
76
|
+
- Selection
|
|
77
|
+
- Gesture
|
|
78
|
+
- Control Adjustment
|
|
79
|
+
|
|
80
|
+
Each type may have type-specific execution adapters but MUST conform to the same lifecycle.
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## 6. Execution Contract
|
|
85
|
+
|
|
86
|
+
All actions MUST define:
|
|
87
|
+
|
|
88
|
+
### 6.1 Target
|
|
89
|
+
A resolved executable entity (not a UI label or text node)
|
|
90
|
+
|
|
91
|
+
### 6.2 Intent
|
|
92
|
+
The intended effect of the action
|
|
93
|
+
|
|
94
|
+
### 6.3 Expected State Delta
|
|
95
|
+
What must change in the UI or application state
|
|
96
|
+
|
|
97
|
+
---
|
|
98
|
+
|
|
99
|
+
## 7. Verification Model
|
|
100
|
+
|
|
101
|
+
Verification MUST be explicit and deterministic.
|
|
102
|
+
|
|
103
|
+
### 7.1 Verification Sources
|
|
104
|
+
At least one must be used:
|
|
105
|
+
|
|
106
|
+
- UI state diff
|
|
107
|
+
- element property change
|
|
108
|
+
- navigation change
|
|
109
|
+
- value update (for controls)
|
|
110
|
+
|
|
111
|
+
### 7.2 Timeout Behaviour
|
|
112
|
+
- Each action defines a verification window
|
|
113
|
+
- Failure occurs if no valid state delta is observed in time
|
|
114
|
+
|
|
115
|
+
### 7.3 No Implicit Success
|
|
116
|
+
Actions MUST NOT be considered successful without explicit verification.
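As a rough illustration of the verification window and the no-implicit-success rule, the sketch below evaluates a list of timestamped observations against an expected state delta. The `Observation` type, the `verifyWithinWindow` helper, and the `verification_timeout` reason string are hypothetical; a real runtime would collect observations live rather than receive them as an array.

```typescript
// A UI or state snapshot taken at a known time (milliseconds on any monotonic clock).
interface Observation<S> {
  atMs: number;
  state: S;
}

type VerificationResult =
  | { status: "verified"; atMs: number }
  | { status: "failed"; reason: "verification_timeout" };

// Succeeds only if an observation inside the window matches the expected delta.
function verifyWithinWindow<S>(
  dispatchedAtMs: number,
  windowMs: number,
  observations: Observation<S>[],
  matchesExpectedDelta: (state: S) => boolean,
): VerificationResult {
  const deadline = dispatchedAtMs + windowMs;
  for (let i = 0; i < observations.length; i++) {
    const obs = observations[i];
    if (obs.atMs < dispatchedAtMs || obs.atMs > deadline) {
      continue; // outside the verification window
    }
    if (matchesExpectedDelta(obs.state)) {
      return { status: "verified", atMs: obs.atMs };
    }
  }
  // No matching observation in time: never assume success.
  return { status: "failed", reason: "verification_timeout" };
}

// Example: a navigation action expected to land on a "Home" screen.
const screenObservations: Observation<string>[] = [
  { atMs: 100, state: "Login" },
  { atMs: 400, state: "Home" },
];
const isHome = (screen: string) => screen === "Home";
const verifiedInTime = verifyWithinWindow(0, 500, screenObservations, isHome);
const windowTooShort = verifyWithinWindow(0, 300, screenObservations, isHome);
```

An empty observation list also yields a failure, so a dispatch with no evidence behind it is never reported as a success.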
|
|
117
|
+
|
|
118
|
+
---
|
|
119
|
+
|
|
120
|
+
## 8. Actionability Integration
|
|
121
|
+
|
|
122
|
+
This model depends on Actionability Resolution:
|
|
123
|
+
|
|
124
|
+
- Only resolved executable targets may be executed
|
|
125
|
+
- Visible but non-actionable nodes are invalid targets
|
|
126
|
+
- Execution is blocked if confidence is below threshold
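The three rules above amount to a gate in front of dispatch. A minimal sketch follows, assuming a hypothetical `ResolutionCandidate` shape and a caller-supplied threshold; the RFC does not state a threshold value, so none is assumed here.

```typescript
// Hypothetical output of Actionability Resolution for one candidate target.
interface ResolutionCandidate {
  targetId: string;
  executable: boolean; // false for nodes that are visible but not actionable
  confidence: number;  // assumed to be in the range 0..1
}

type GateDecision =
  | { allowed: true; targetId: string }
  | { allowed: false; reason: "not_executable" | "low_confidence" };

// Decide whether a candidate may proceed to dispatch.
function gateExecution(candidate: ResolutionCandidate, threshold: number): GateDecision {
  if (!candidate.executable) {
    return { allowed: false, reason: "not_executable" };
  }
  if (candidate.confidence < threshold) {
    return { allowed: false, reason: "low_confidence" };
  }
  return { allowed: true, targetId: candidate.targetId };
}
```

Returning a reason instead of a bare boolean lets a blocked action be reported as a target resolution failure rather than silently skipped.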
|
|
127
|
+
|
|
128
|
+
---
|
|
129
|
+
|
|
130
|
+
## 9. Control Adjustment Model
|
|
131
|
+
|
|
132
|
+
Control actions (sliders, inputs) are treated as parameterised actions:
|
|
133
|
+
|
|
134
|
+
Example:
|
|
135
|
+
|
|
136
|
+
set_slider_value(target, value, tolerance)
|
|
137
|
+
|
|
138
|
+
Must include:
|
|
139
|
+
- pre-state value
|
|
140
|
+
- post-state verification
|
|
141
|
+
- tolerance-aware validation
|
|
142
|
+
|
|
143
|
+
Fallback to coordinate-based interaction is allowed only if semantic control resolution fails.
|
|
144
|
+
|
|
145
|
+
---
|
|
146
|
+
|
|
147
|
+
## 10. Observability Hooks
|
|
148
|
+
|
|
149
|
+
Each action emits structured signals:
|
|
150
|
+
|
|
151
|
+
- action_id
|
|
152
|
+
- target_id
|
|
153
|
+
- action_type
|
|
154
|
+
- lifecycle_state transitions
|
|
155
|
+
- verification result
|
|
156
|
+
- failure reason (if applicable)
|
|
157
|
+
|
|
158
|
+
These signals feed:
|
|
159
|
+
- Signal-Oriented Diagnostic Filtering
|
|
160
|
+
- Action Trace Correlation
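A structured signal carrying the fields listed above could be modelled as below. The field names `action_id`, `target_id`, `action_type`, and `lifecycle_state` come from the list; the `SignalLog` class, the optional-field typing, and the example values such as `"tap"` are assumptions made for illustration.

```typescript
// One emitted signal per lifecycle transition.
interface ActionSignal {
  action_id: string;
  target_id: string;
  action_type: string;
  lifecycle_state: string;
  verification_result?: "verified" | "failed";
  failure_reason?: string; // only present on failure
}

// Hypothetical in-memory sink; a real runtime would forward signals elsewhere.
class SignalLog {
  private signals: ActionSignal[] = [];

  emit(signal: ActionSignal): void {
    this.signals.push(signal);
  }

  // All signals for one action, in emission order.
  forAction(actionId: string): ActionSignal[] {
    return this.signals.filter((s) => s.action_id === actionId);
  }
}

const signalLog = new SignalLog();
signalLog.emit({ action_id: "a1", target_id: "submit_row", action_type: "tap", lifecycle_state: "resolved" });
signalLog.emit({ action_id: "a1", target_id: "submit_row", action_type: "tap", lifecycle_state: "dispatched" });
signalLog.emit({ action_id: "a2", target_id: "volume", action_type: "tap", lifecycle_state: "resolved" });
signalLog.emit({
  action_id: "a1",
  target_id: "submit_row",
  action_type: "tap",
  lifecycle_state: "verified",
  verification_result: "verified",
});
const submitTrace = signalLog.forAction("a1").map((s) => s.lifecycle_state);
```

Keying every signal by `action_id` is what would let downstream filtering and trace correlation separate one action's transitions from interleaved background events.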
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## 11. Failure Classification
|
|
165
|
+
|
|
166
|
+
Failures MUST be categorised:
|
|
167
|
+
|
|
168
|
+
- Target resolution failure
|
|
169
|
+
- Dispatch failure
|
|
170
|
+
- Verification timeout
|
|
171
|
+
- Unexpected state delta
|
|
172
|
+
- No state change observed
|
|
173
|
+
|
|
174
|
+
This enables consistent debugging and telemetry.
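One possible mapping from an action outcome to these categories is sketched below. The `ActionOutcome` fields and the precedence of the checks are assumptions. In particular, the sketch treats "verification timeout" as the case where no snapshot was obtained inside the window and "no state change observed" as the case where snapshots were obtained but showed no change, a distinction the RFC does not define.

```typescript
// Category names follow the list above; the identifier spellings are assumptions.
type FailureCategory =
  | "target_resolution_failure"
  | "dispatch_failure"
  | "verification_timeout"
  | "unexpected_state_delta"
  | "no_state_change";

// Hypothetical summary of what happened to a single action.
interface ActionOutcome {
  targetResolved: boolean;
  dispatched: boolean;
  snapshotsObserved: number; // snapshots captured inside the verification window
  stateChanged: boolean;
  matchedExpectedDelta: boolean;
}

// Returns null for a verified action, otherwise exactly one category.
function classifyFailure(outcome: ActionOutcome): FailureCategory | null {
  if (!outcome.targetResolved) return "target_resolution_failure";
  if (!outcome.dispatched) return "dispatch_failure";
  if (outcome.snapshotsObserved === 0) return "verification_timeout";
  if (!outcome.stateChanged) return "no_state_change";
  if (!outcome.matchedExpectedDelta) return "unexpected_state_delta";
  return null;
}
```

Checking the stages in lifecycle order means each failure is attributed to the earliest stage that went wrong.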
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## 12. Relationship to Existing Roadmap
|
|
179
|
+
|
|
180
|
+
This RFC provides the foundation for:
|
|
181
|
+
|
|
182
|
+
- Actionability Resolution (#4)
|
|
183
|
+
- Adjustable Control Support (#5)
|
|
184
|
+
- Signal-Oriented Diagnostic Filtering (#6)
|
|
185
|
+
|
|
186
|
+
It defines the shared execution substrate those capabilities plug into.
|
|
187
|
+
|
|
188
|
+
---
|
|
189
|
+
|
|
190
|
+
## 13. Scope Boundary
|
|
191
|
+
|
|
192
|
+
This RFC defines the execution model and lifecycle semantics for agent-driven UI actions.
|
|
193
|
+
|
|
194
|
+
- Action types referenced in this RFC correspond to the existing runtime `action_type` contract and do not redefine or extend the underlying taxonomy
|
|
195
|
+
- Lifecycle signals described in this RFC are emitted by the runtime execution layer (defined in RFC 006), not by this specification directly
|
|
196
|
+
|
|
197
|
+
It does NOT define:
|
|
198
|
+
- runtime instrumentation details
|
|
199
|
+
- how lifecycle states are emitted in code
|
|
200
|
+
- mapping to specific source modules (e.g. src/server, src/interact)
|
|
201
|
+
- tool schema implementation details
|
|
202
|
+
- mapping between semantic action categories and runtime implementation modules (this is defined in RFC 006)
|
|
203
|
+
|
|
204
|
+
Those concerns are delegated to a separate binding-layer RFC which defines how this model is implemented in the current system.
|
|
205
|
+
|
|
206
|
+
---
|
|
207
|
+
|
|
208
|
+
## 14. Summary
|
|
209
|
+
|
|
210
|
+
This model enforces a single, verifiable lifecycle for all UI actions.
|
|
211
|
+
|
|
212
|
+
It ensures:
|
|
213
|
+
- deterministic execution
|
|
214
|
+
- explicit verification
|
|
215
|
+
- consistent failure handling
|
|
216
|
+
- unified observability
|