npm - mobile-debug-mcp - Versions diffs - 0.26.1 → 0.26.3 - Mend

mobile-debug-mcp 0.26.1 → 0.26.3

Files changed (22)

package/AGENTS.md +3 -0
package/dist/interact/index.js +169 -102
package/dist/server/common.js +14 -1
package/dist/server/tool-definitions.js +22 -4
package/dist/server/tool-handlers.js +7 -0
package/dist/server-core.js +1 -1
package/docs/CHANGELOG.md +6 -0
package/docs/ROADMAP.md +242 -76
package/docs/rfcs/005-unified-action-execution-and-verification-model.md +216 -0
package/docs/rfcs/006-runtime-action-instrumentation-and-binding-layer.md +230 -0
package/docs/rfcs/007-actionability-resolution-and-executable-target-selection.md +277 -0
package/docs/specs/mcp-tooling-spec-v1.md +4 -0
package/docs/tools/interact.md +13 -1
package/package.json +1 -1
package/src/interact/index.ts +203 -107
package/src/server/common.ts +22 -1
package/src/server/tool-definitions.ts +22 -4
package/src/server/tool-handlers.ts +7 -0
package/src/server-core.ts +1 -1
package/src/types.ts +75 -0
package/test/unit/observe/find_element.test.ts +5 -0
package/test/unit/server/response_shapes.test.ts +8 -0

package/docs/ROADMAP.md CHANGED

@@ -4,11 +4,23 @@
 Ordered by:
 1. Impact on agent reliability
 2. Reduction in retries / brittleness
 3. Breadth of app coverage improved
 4. Implementation complexity vs payoff
+## Capability Status Definitions
+- **Completed**
+  Capability implemented and considered part of the baseline platform.
+- **Spec Ready**
+  Capability design or RFC is mature and implementation-ready, but not yet delivered.
+- **Planned**
+  Capability is prioritized on the roadmap, but detailed specification and/or implementation work remains ahead.
 ## Program-Level Success Metrics
 Track roadmap impact across releases using:
@@ -28,21 +40,22 @@ Higher task success with fewer retries.
 # Roadmap Status Overview
-## Completed Foundations
+## Completed Capabilities
-| Capability | Status | Notes |
-|-----------|--------|-------|
-| Stronger State Verification | Complete | Foundational verification layer shipped |
-| Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
+- Stronger State Verification — Complete (Foundational verification layer shipped)
+- Richer Element Identity — Complete (Identity and selector confidence foundations shipped)
 ## Current Focus
 - Wait and Synchronization Reliability
+- Actionability Resolution
 ## Upcoming Work
-- Long Press Gesture
+- Adjustable Control Support
 - Better Compose / Custom Control Semantics
+- Signal-Oriented Diagnostic Filtering
+- Long Press Gesture
 ## Later Horizon
@@ -53,11 +66,10 @@ Higher task success with fewer retries.
 # Stronger State Verification
-## Why first
+## Rationale
 Highest leverage improvement.
-**Status:** Completed
-**Priority:** P1
+**Status:** Completed
 Most failures are not “can’t act,” they’re:
 - uncertain state
@@ -85,19 +97,18 @@ Very high.
 ## Dependencies
 Blocks or strengthens:
-- Priority 5 — Better Compose / Custom Control Semantics
-- Priority 6 — Pinch to Zoom verification
-- Priority 7 — Action Trace Correlation
+- Better Compose / Custom Control Semantics
+- Pinch to Zoom
+- Action Trace Correlation
 ---
 # Richer Element Identity
-## Why second
+## Rationale
 Directly reduces selector brittleness.
-**Status:** Completed
-**Priority:** P2
+**Status:** Completed
 Improves:
 - targeting stability
@@ -125,19 +136,18 @@ Very high.
 ## Dependencies
 Blocks or strengthens:
-- Priority 4 — Long Press targeting reliability
-- Priority 5 — Better Compose / Custom Control Semantics
-- Priority 6 — Pinch to Zoom targeting
+- Long Press Gesture
+- Better Compose / Custom Control Semantics
+- Pinch to Zoom
 ---
 # Wait and Synchronization Reliability
-## Why third
+## Rationale
 Reliable async synchronization is foundational for agent success and should precede gesture expansion.
-**Status:** Spec Ready
-**Priority:** P3
+**Status:** Spec Ready
 Addresses failures where agents:
 - skip UI waits after actions
@@ -150,6 +160,7 @@ Addresses failures where agents:
 - wait_for_ui_change (hierarchy diff based waiting)
 - Structured loading state detection
 - Snapshot revision / staleness metadata
+- Focused snapshot views / incremental snapshot diffs
 - Compose-aware wait robustness improvements
 ## Expected Impact
@@ -159,6 +170,7 @@ Very high.
 - wait_for_ui_change implemented
 - Loading state detection available for representative controls
 - Snapshot revision or staleness metadata exposed
+- Focused or diff-oriented snapshots validated in benchmark flows
 - UI-first sync guidance added to spec guardrails
 - In-place update waits validated on benchmark flows
@@ -170,22 +182,158 @@ Very high.
 ## Dependencies
 Depends on:
-- Priority 1 — Stronger State Verification
-- Priority 2 — Richer Element Identity
+- Stronger State Verification
+- Richer Element Identity
+Blocks or strengthens:
+- Better Compose / Custom Control Semantics
+- Action Trace Correlation
+---
+# Actionability Resolution
+## Rationale
+Reduces failures caused by interacting with discoverable but non-actionable UI nodes.
+**Status:** Planned
+Addresses cases where:
+- visible text is not the true click target
+- child nodes differ from actionable containers
+- affordance exists but handler ownership is ambiguous
+## Scope
+- Actionable container resolution
+- Executable-target preference rules
+- Actionability confidence metadata
+- Post-action state verification integration
+## Expected Impact
+High.
+## Exit Criteria
+- Actionable target resolution implemented
+- Preference rules defined for executable containers over leaf nodes
+- Actionability confidence surfaced
+- Benchmark flows show reduced false taps and submit ambiguity
+## Success Metrics
+- Reduced mis-targeted action failures
+- Lower retarget retries
+- Higher first-attempt action success
+## Dependencies
+Depends on:
+- Stronger State Verification
+- Richer Element Identity
+- Wait and Synchronization Reliability
+Blocks or strengthens:
+- Adjustable Control Support
+- Better Compose / Custom Control Semantics
+---
+# Adjustable Control Support
+## Rationale
+High leverage improvement for sliders and parameterized controls.
+**Status:** Planned
+Addresses friction around:
+- coordinate-calibrated slider interaction
+- snapping and quantized controls
+- weak state confirmation after adjustment
+## Scope
+New semantic control support:
+```json
+set_slider_value(target, value, tolerance?)
+```
+Includes:
+- semantic adjustable control manipulation
+- read-back verification loop
+- tolerance-aware value setting
+- fallback coordinate calibration only when needed
+## Expected Impact
+High.
+## Exit Criteria
+- Adjustable control primitive implemented
+- Verification loop reads and confirms resulting values
+- Tolerance model defined
+- Benchmark slider/custom control flows validated
+## Success Metrics
+- Higher custom control interaction success rate
+- Fewer retries adjusting controls
+- Reduced coordinate-guessing failures
+## Dependencies
+Depends on:
+- Stronger State Verification
+- Richer Element Identity
+- Actionability Resolution
 Blocks or strengthens:
-- Priority 5 — Better Compose / Custom Control Semantics
-- Priority 7 — Action Trace Correlation
+- Better Compose / Custom Control Semantics
+- Pinch to Zoom
+---
+# Signal-Oriented Diagnostic Filtering
+## Rationale
+Improves observability by separating causal signals from diagnostic noise.
+**Status:** Planned
+Addresses friction from:
+- noisy log streams
+- weak signal extraction
+- difficult action-to-signal attribution
+## Scope
+- Structured diagnostic classification
+- Noise filtering heuristics
+- Signal relevance scoring
+- App vs system event tagging
+## Expected Impact
+High.
+## Exit Criteria
+- Diagnostic signal classification model defined
+- Noise filtering available in representative flows
+- Relevant action-linked signals surfaced separately from background noise
+- Debug workflows validated with filtered signals
+## Success Metrics
+- Lower time-to-root-cause
+- Faster identification of relevant action signals
+- Reduced diagnostic ambiguity
+## Dependencies
+Depends on:
+- Stronger State Verification
+- Wait and Synchronization Reliability
+Strengthens:
+- Action Trace Correlation
 ---
 # Long Press Gesture
-## Why fourth
+## Rationale
 High utility, relatively low complexity.
-**Status:** Planned
-**Priority:** P4
+**Status:** Planned
 Unlocks many currently awkward interactions:
@@ -223,26 +371,26 @@ High.
 ## Dependencies
 Depends on:
-- Priority 2 — Richer Element Identity
+- Richer Element Identity
 Strengthens:
-- Priority 5 semantics interaction contracts
+- Better Compose / Custom Control Semantics
 ---
 # Better Compose / Custom Control Semantics
-## Why fifth
-Important, but strengthened by priorities 1–4 first.
+## Rationale
+Higher priority after agent feedback exposed custom control semantics as a core reliability gap, not a later optimization.
-**Status:** Planned
-**Priority:** P5
+**Status:** Spec Ready
 Semantics become more useful once:
 - identity is stronger
 - verification is stronger
 - gestures are richer
 - synchronization is more reliable
+- action execution is more precise
 ## Scope
 - Composite control traits
@@ -268,20 +416,21 @@ High.
 ## Dependencies
 Depends on:
-- Priority 1 — Stronger State Verification
-- Priority 2 — Richer Element Identity
-- Priority 3 — Wait and Synchronization Reliability
-- Priority 4 — Long Press
+- Stronger State Verification
+- Richer Element Identity
+- Wait and Synchronization Reliability
+- Actionability Resolution
+- Adjustable Control Support
+- Long Press Gesture
 ---
 # Pinch to Zoom
-## Why sixth
+## Rationale
 Valuable, but narrower than long press.
-**Status:** Planned
-**Priority:** P6
+**Status:** Planned
 Applies mainly to:
 - maps
@@ -317,19 +466,18 @@ Medium-high.
 ## Dependencies
 Depends on:
-- Priority 1 — Stronger State Verification
-- Priority 2 — Richer Element Identity
+- Stronger State Verification
+- Richer Element Identity
 ---
 # Action Trace Correlation
-## Why seventh
+## Rationale
 Very valuable for debugging,
 but less critical than improving control success first.
-**Status:** Planned
-**Priority:** P7
+**Status:** Planned
 Improves diagnosis more than task completion.
@@ -353,75 +501,93 @@ Medium-high.
 ## Dependencies
 Depends on:
-- Priority 1 — Stronger State Verification
-- Priority 2 — Richer Element Identity
-- Priority 3 — Wait and Synchronization Reliability
+- Stronger State Verification
+- Richer Element Identity
+- Wait and Synchronization Reliability
 ---
 # Roadmap Sequence
 ## Dependency Summary
-Foundational sequence:
-Layer 1 (Foundations)
-- Priority 1
-- Priority 2
+Foundation
+- Stronger State Verification
+- Richer Element Identity
+Synchronization & Actionability
+- Wait and Synchronization Reliability
+- Actionability Resolution
-Layer 2 (Synchronization)
-- Priority 3 depends on 1,2
+Control Precision & Observability
+- Adjustable Control Support
+- Signal-Oriented Diagnostic Filtering
-Layer 3 (Interaction Expansion)
-- Priority 4 depends on 2
-- Priority 5 depends on 1,2,3,4
-- Priority 6 depends on 1,2
+Interaction Expansion
+- Long Press Gesture
+- Better Compose / Custom Control Semantics
+- Pinch to Zoom
-Layer 4 (Observability)
-- Priority 7 depends on 1,2,3
+Deep Observability
+- Action Trace Correlation
 ## Wave 1 (Current Focus)
 - Stronger State Verification
 - Richer Element Identity
 - Wait and Synchronization Reliability
+- Actionability Resolution
 Focus:
 Make core loop more reliable.
 ---
-## Wave 2 (Expansion)
-- Long Press
-- Better Compose Semantics
+## Wave 2 (Control Precision + Diagnostics)
+- Adjustable Control Support
+- Better Compose / Custom Control Semantics
+- Signal-Oriented Diagnostic Filtering
+Focus:
+Improve control precision, custom control semantics, and signal observability.
+---
+## Wave 3 (Interaction Expansion)
+- Long Press Gesture
 Focus:
-Expand interaction capability.
+Expand interaction capability after core control reliability is improved.
 ---
-## Wave 3 (Advanced)
+## Wave 4 (Advanced Gestures + Deep Observability)
 - Pinch to Zoom
 - Action Trace Correlation
 Focus:
-Advanced gestures + observability.
+Advanced gestures + deep observability.
 ---
-# Capability Sequence
+# Roadmap Ordering
-Execution Order:
+Roadmap Ordering:
 1. Stronger State Verification
 2. Richer Element Identity
 3. Wait and Synchronization Reliability
-4. Long Press
-5. Better Compose / Custom Control Semantics
-6. Pinch to Zoom
-7. Action Trace Correlation
+4. Actionability Resolution
+5. Adjustable Control Support
+6. Better Compose / Custom Control Semantics
+7. Signal-Oriented Diagnostic Filtering
+8. Long Press Gesture
+9. Pinch to Zoom
+10. Action Trace Correlation
 Rationale:
-- Priorities 1–3 harden control, verification, and synchronization.
-- Priorities 4–6 expand interaction capability.
-- Priority 7 adds observability once control reliability matures.
+- Early roadmap items harden state, targeting, synchronization, action execution.
+- Mid roadmap items improve control precision and signal observability.
+- Later interaction-focused items expand interaction coverage.
+- Final observability work deepens debugging observability.
 ---
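
Reviewer sketch (not part of the package): the new ROADMAP.md drops the numeric priorities and instead states dependencies by name, so it can help to check mechanically whether the ten-item "Roadmap Ordering" list is consistent with the "Depends on" lists. The TypeScript below is a minimal sketch under that assumption. The `dependsOn` map and `roadmapOrdering` array are transcribed by hand from the added lines above, and `findOrderingViolations` is a hypothetical helper, not something the package ships. It also assumes the ordering is meant to be read as an execution order, which the diff no longer states explicitly (the old label "Execution Order:" was renamed).

```typescript
// "Depends on" lists, transcribed from the new ROADMAP.md sections above.
const dependsOn: Record<string, string[]> = {
  "Stronger State Verification": [],
  "Richer Element Identity": [],
  "Wait and Synchronization Reliability": [
    "Stronger State Verification",
    "Richer Element Identity",
  ],
  "Actionability Resolution": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
  ],
  "Adjustable Control Support": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Actionability Resolution",
  ],
  "Better Compose / Custom Control Semantics": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
    "Actionability Resolution",
    "Adjustable Control Support",
    "Long Press Gesture",
  ],
  "Signal-Oriented Diagnostic Filtering": [
    "Stronger State Verification",
    "Wait and Synchronization Reliability",
  ],
  "Long Press Gesture": ["Richer Element Identity"],
  "Pinch to Zoom": ["Stronger State Verification", "Richer Element Identity"],
  "Action Trace Correlation": [
    "Stronger State Verification",
    "Richer Element Identity",
    "Wait and Synchronization Reliability",
  ],
};

// The numbered "Roadmap Ordering" list, items 1 through 10.
const roadmapOrdering: string[] = [
  "Stronger State Verification",
  "Richer Element Identity",
  "Wait and Synchronization Reliability",
  "Actionability Resolution",
  "Adjustable Control Support",
  "Better Compose / Custom Control Semantics",
  "Signal-Oriented Diagnostic Filtering",
  "Long Press Gesture",
  "Pinch to Zoom",
  "Action Trace Correlation",
];

// Returns every [item, dependency] pair where the dependency is missing
// from the ordering or is listed after the item that depends on it.
function findOrderingViolations(
  order: string[],
  deps: Record<string, string[]>
): Array<[string, string]> {
  const violations: Array<[string, string]> = [];
  order.forEach((item, itemIndex) => {
    for (const dep of deps[item] || []) {
      const depIndex = order.indexOf(dep);
      if (depIndex === -1 || depIndex > itemIndex) {
        violations.push([item, dep]);
      }
    }
  });
  return violations;
}

const violations = findOrderingViolations(roadmapOrdering, dependsOn);
console.log(violations);
```

With the lists as transcribed, the only pair reported is Better Compose / Custom Control Semantics (item 6) against its dependency Long Press Gesture (item 8). Whether that is an oversight or simply reflects that the list ranks priority rather than execution order is not something the diff settles.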

package/docs/rfcs/005-unified-action-execution-and-verification-model.md ADDED

@@ -0,0 +1,216 @@
+# RFC 005 — Unified Action Execution and Verification Model
+## 1. Summary
+This RFC defines a unified execution and verification model for all agent-driven UI actions.
+It standardises:
+- how actions are resolved
+- how they are executed
+- how outcomes are verified
+- how failures are classified
+- how observability signals are emitted
+The goal is to eliminate inconsistent per-feature execution logic and establish a single deterministic lifecycle for all UI interactions.
+---
+## 2. Problem Statement
+Current execution paths are fragmented across interaction types:
+- Tap / click actions rely on implicit success assumptions
+- Control adjustments (sliders, inputs) use ad-hoc verification logic
+- Gesture actions lack consistent post-execution validation
+- Action success is often inferred from indirect UI changes or logs
+This leads to:
+- ambiguous success states
+- inconsistent retries
+- weak failure classification
+- poor observability signal quality
+---
+## 3. Design Goals
+The model must:
+- Provide a single lifecycle for all actions
+- Separate target resolution from execution
+- Require explicit verification of state change
+- Standardise failure classification
+- Integrate with observability systems cleanly
+- Support both simple and parameterised actions
+---
+## 4. Action Lifecycle
+Every action MUST pass through the following states:
+1. Resolved
+   - A target has been identified via Actionability Resolution
+   - The target is executable (not just visible)
+2. Dispatched
+   - The action has been issued to the runtime layer
+3. Pending Verification
+   - Waiting for expected UI or state change
+4. Verified
+   - Expected outcome confirmed
+5. Failed
+   - Verification did not succeed within constraints
+---
+## 5. Action Types
+All actions are categorised into canonical types:
+- Navigation
+- Input
+- Selection
+- Gesture
+- Control Adjustment
+Each type may have type-specific execution adapters but MUST conform to the same lifecycle.
+---
+## 6. Execution Contract
+All actions MUST define:
+### 6.1 Target
+A resolved executable entity (not a UI label or text node)
+### 6.2 Intent
+The intended effect of the action
+### 6.3 Expected State Delta
+What must change in the UI or application state
+---
+## 7. Verification Model
+Verification MUST be explicit and deterministic.
+### 7.1 Verification Sources
+At least one must be used:
+- UI state diff
+- element property change
+- navigation change
+- value update (for controls)
+### 7.2 Timeout Behaviour
+- Each action defines a verification window
+- Failure occurs if no valid state delta is observed in time
+### 7.3 No Implicit Success
+Actions MUST NOT be considered successful without explicit verification.
+---
+## 8. Actionability Integration
+This model depends on Actionability Resolution:
+- Only resolved executable targets may be executed
+- Visible but non-actionable nodes are invalid targets
+- Execution is blocked if confidence is below threshold
+---
+## 9. Control Adjustment Model
+Control actions (sliders, inputs) are treated as parameterised actions:
+Example:
+set_slider_value(target, value, tolerance)
+Must include:
+- pre-state value
+- post-state verification
+- tolerance-aware validation
+Fallback to coordinate-based interaction is allowed only if semantic control resolution fails.
+---
+## 10. Observability Hooks
+Each action emits structured signals:
+- action_id
+- target_id
+- action_type
+- lifecycle_state transitions
+- verification result
+- failure reason (if applicable)
+These signals feed:
+- Signal-Oriented Diagnostic Filtering
+- Action Trace Correlation
+---
+## 11. Failure Classification
+Failures MUST be categorised:
+- Target resolution failure
+- Dispatch failure
+- Verification timeout
+- Unexpected state delta
+- No state change observed
+This enables consistent debugging and telemetry.
+---
+## 12. Relationship to Existing Roadmap
+This RFC provides the foundation for:
+- Actionability Resolution (#4)
+- Adjustable Control Support (#5)
+- Signal-Oriented Diagnostic Filtering (#6)
+It defines the shared execution substrate those capabilities plug into.
+---
+## 13. Scope Boundary
+This RFC defines the execution model and lifecycle semantics for agent-driven UI actions.
+- Action types referenced in this RFC correspond to the existing runtime `action_type` contract and do not redefine or extend the underlying taxonomy
+- Lifecycle signals described in this RFC are emitted by the runtime execution layer (defined in RFC 006), not by this specification directly
+It does NOT define:
+- runtime instrumentation details
+- how lifecycle states are emitted in code
+- mapping to specific source modules (e.g. src/server, src/interact)
+- tool schema implementation details
+- mapping between semantic action categories and runtime implementation modules (this is defined in RFC 006)
+Those concerns are delegated to a separate binding-layer RFC which defines how this model is implemented in the current system.
+---
+## 14. Summary
+This model enforces a single, verifiable lifecycle for all UI actions.
+It ensures:
+- deterministic execution
+- explicit verification
+- consistent failure handling
+- unified observability
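
Reviewer sketch (not part of the package): RFC 005 describes the lifecycle (section 4), the tolerance-aware control check (section 9) and the failure categories (section 11) in prose only, and explicitly defers the code-level binding to RFC 006. The TypeScript below is one possible reading, offered as a minimal sketch rather than the package's implementation. Every identifier in it (`LifecycleState`, `FailureReason`, `allowedTransitions`, `transition`, `classifySliderResult`) is hypothetical. Treating Verified and Failed as terminal states, and mapping "no reading within the verification window" to a verification timeout, are assumptions the RFC does not spell out.

```typescript
// Lifecycle states from section 4 (string spellings are assumptions).
type LifecycleState =
  | "resolved"
  | "dispatched"
  | "pending_verification"
  | "verified"
  | "failed";

// Failure categories from section 11 (string spellings are assumptions).
type FailureReason =
  | "target_resolution_failure"
  | "dispatch_failure"
  | "verification_timeout"
  | "unexpected_state_delta"
  | "no_state_change";

// Assumed transition table: states advance in order, any non-terminal
// state may fail, and "verified" / "failed" are terminal.
const allowedTransitions: Record<LifecycleState, LifecycleState[]> = {
  resolved: ["dispatched", "failed"],
  dispatched: ["pending_verification", "failed"],
  pending_verification: ["verified", "failed"],
  verified: [],
  failed: [],
};

// Moves to the next state, rejecting shortcuts such as
// resolved -> verified ("No Implicit Success", section 7.3).
function transition(from: LifecycleState, to: LifecycleState): LifecycleState {
  if (allowedTransitions[from].indexOf(to) === -1) {
    throw new Error("illegal lifecycle transition: " + from + " -> " + to);
  }
  return to;
}

interface VerificationOutcome {
  state: LifecycleState;
  reason?: FailureReason;
}

// Tolerance-aware read-back check for a set_slider_value-style action.
// `post` is undefined when no value could be read back inside the
// verification window.
function classifySliderResult(
  requested: number,
  tolerance: number,
  pre: number,
  post: number | undefined
): VerificationOutcome {
  if (post === undefined) {
    return { state: "failed", reason: "verification_timeout" };
  }
  if (Math.abs(post - requested) <= tolerance) {
    return { state: "verified" };
  }
  if (post === pre) {
    return { state: "failed", reason: "no_state_change" };
  }
  return { state: "failed", reason: "unexpected_state_delta" };
}

// Example walk through the lifecycle for a slider moved from 0.2 toward 0.5.
let state: LifecycleState = "resolved";
state = transition(state, "dispatched");
state = transition(state, "pending_verification");
const outcome = classifySliderResult(0.5, 0.05, 0.2, 0.52);
state = transition(state, outcome.state);
console.log(state, outcome.reason);
```

One design choice worth flagging: the tolerance check runs before the "no change" check, so a slider that already sits within tolerance of the requested value is reported as verified even though nothing moved. A stricter reading of section 6.3 ("Expected State Delta") could instead require an observed change.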