npm - mobile-debug-mcp - Versions diffs - 0.25.1 → 0.26.1 - Mend

mobile-debug-mcp 0.25.1 → 0.26.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/dist/interact/classify.js +48 -11
package/dist/interact/index.js +113 -0
package/dist/observe/android.js +10 -1
package/dist/observe/index.js +19 -1
package/dist/observe/ios.js +15 -1
package/dist/observe/snapshot-metadata.js +88 -0
package/dist/server/tool-definitions.js +49 -14
package/dist/server/tool-handlers.js +12 -0
package/dist/server-core.js +1 -1
package/docs/CHANGELOG.md +9 -0
package/docs/ROADMAP.md +66 -38
package/docs/rfcs/003-wait-and-synchronization-reliability.md +296 -0
package/docs/rfcs/004-action-verification-routing.md +342 -0
package/docs/specs/mcp-tooling-spec-v1.md +11 -3
package/docs/tools/interact.md +31 -8
package/docs/tools/observe.md +4 -2
package/package.json +1 -1
package/skills/rfc-review/SKILL.md +52 -0
package/skills/rfc-review/references/rfc-review-checklist.md +12 -0
package/skills/rfc-review/references/rfc-review-template.md +28 -0
package/src/interact/classify.ts +53 -13
package/src/interact/index.ts +151 -0
package/src/observe/android.ts +11 -1
package/src/observe/index.ts +26 -1
package/src/observe/ios.ts +28 -13
package/src/observe/snapshot-metadata.ts +107 -0
package/src/server/tool-definitions.ts +49 -14
package/src/server/tool-handlers.ts +13 -0
package/src/server-core.ts +1 -1
package/src/types.ts +23 -0
package/test/unit/interact/classify_action_outcome.test.ts +44 -25
package/test/unit/interact/wait_for_ui_change.test.ts +76 -0
package/test/unit/server/contract.test.ts +8 -6
package/test/unit/server/response_shapes.test.ts +37 -3
package/docs/rfcs/003-wait-and-synchronization-reliability +0 -232

package/docs/ROADMAP.md CHANGED

@@ -1,6 +1,6 @@
-# Mobile Debug MCP Prioritized Roadmap
+# Mobile Debug MCP Roadmap
-## Prioritization Criteria
+## Planning Principles
 Ordered by:
@@ -26,33 +26,45 @@ Higher task success with fewer retries.
 ---
-# Completed
+# Roadmap Status Overview
-These priorities are done and kept here for history:
+## Completed Foundations
-- Priority 1 — Stronger State Verification
-- Priority 2 — Richer Element Identity
+| Capability | Status | Notes |
+|-----------|--------|-------|
+| Stronger State Verification | Complete | Foundational verification layer shipped |
+| Richer Element Identity | Complete | Identity and selector confidence foundations shipped |
+## Current Focus
+- Wait and Synchronization Reliability
+## Upcoming Work
+- Long Press Gesture
+- Better Compose / Custom Control Semantics
-Completion notes:
+## Later Horizon
-- State-aware verification is now implemented and wired through the tool surface.
-- Platform-native element metadata and selector-confidence hints are now part of the runtime contract.
+- Pinch to Zoom
+- Action Trace Correlation
 ---
-# Priority 1 — Stronger State Verification
+# Stronger State Verification
 ## Why first
 Highest leverage improvement.
-**Status:** Completed
+**Status:** Completed
+**Priority:** P1
 Most failures are not “can’t act,” they’re:
 - uncertain state
 - weak verification
 - retry loops caused by inference
-## Deliver
+## Scope
 - Direct readable control values
 - Expanded `expect_*` verification
 - Move from inference to state introspection
@@ -60,7 +72,7 @@ Most failures are not “can’t act,” they’re:
 ## Expected Impact
 Very high.
-## Done Criteria
+## Exit Criteria
 - Control state readable for core widgets (toggle, slider, input, dropdown)
 - New expect_* state verifiers implemented
 - Agents can verify state without visual inference in representative flows
@@ -79,19 +91,20 @@ Blocks or strengthens:
 ---
-# Priority 2 — Richer Element Identity
+# Richer Element Identity
 ## Why second
 Directly reduces selector brittleness.
-**Status:** Completed
+**Status:** Completed
+**Priority:** P2
 Improves:
 - targeting stability
 - repeatability
 - agent confidence
-## Deliver
+## Scope
 - Stable IDs / test tags prioritization
 - Selector confidence metadata
 - Preferred selector hierarchy
@@ -99,7 +112,7 @@ Improves:
 ## Expected Impact
 Very high.
-## Done Criteria
+## Exit Criteria
 - Stable selector preference order implemented
 - Test tags/resource IDs surfaced where available
 - Selector confidence metadata available
@@ -118,18 +131,21 @@ Blocks or strengthens:
 ---
-# Priority 3 — Wait and Synchronization Reliability
+# Wait and Synchronization Reliability
 ## Why third
 Reliable async synchronization is foundational for agent success and should precede gesture expansion.
+**Status:** Spec Ready
+**Priority:** P3
 Addresses failures where agents:
 - skip UI waits after actions
 - rely on network/log signals too early
 - struggle with in-place UI updates
 - misread stale UI snapshots
-## Deliver
+## Scope
 - UI-first synchronization policy guidance
 - wait_for_ui_change (hierarchy diff based waiting)
 - Structured loading state detection
@@ -139,7 +155,7 @@ Addresses failures where agents:
 ## Expected Impact
 Very high.
-## Done Criteria
+## Exit Criteria
 - wait_for_ui_change implemented
 - Loading state detection available for representative controls
 - Snapshot revision or staleness metadata exposed
@@ -163,11 +179,14 @@ Blocks or strengthens:
 ---
-# Priority 4 — Long Press Gesture
+# Long Press Gesture
 ## Why fourth
 High utility, relatively low complexity.
+**Status:** Planned
+**Priority:** P4
 Unlocks many currently awkward interactions:
 - context menus
@@ -177,7 +196,7 @@ Unlocks many currently awkward interactions:
 Broad usefulness.
-## Deliver
+## Scope
 New tool:
 ```json
@@ -191,7 +210,7 @@ Verification alignment:
 ## Expected Impact
 High.
-## Done Criteria
+## Exit Criteria
 - long_press tool implemented across supported platforms
 - Duration defaults and overrides supported
 - Verification patterns for long press outcomes defined
@@ -211,18 +230,21 @@ Strengthens:
 ---
-# Priority 5 — Better Compose / Custom Control Semantics
+# Better Compose / Custom Control Semantics
 ## Why fifth
 Important, but strengthened by priorities 1–4 first.
+**Status:** Planned
+**Priority:** P5
 Semantics become more useful once:
 - identity is stronger
 - verification is stronger
 - gestures are richer
 - synchronization is more reliable
-## Deliver
+## Scope
 - Composite control traits
 - Control role enrichment (adjustable, expandable, selectable_group)
 - Interaction contracts metadata
@@ -233,7 +255,7 @@ Semantics become more useful once:
 ## Expected Impact
 High.
-## Done Criteria
+## Exit Criteria
 - Semantic traits implemented for major custom control classes
 - Interaction contracts surfaced in snapshot model
 - Confidence model defined for derived semantics
@@ -253,11 +275,14 @@ Depends on:
 ---
-# Priority 6 — Pinch to Zoom
+# Pinch to Zoom
 ## Why sixth
 Valuable, but narrower than long press.
+**Status:** Planned
+**Priority:** P6
 Applies mainly to:
 - maps
 - images
@@ -266,7 +291,7 @@ Applies mainly to:
 Useful, but less universal.
-## Deliver
+## Scope
 ```json
 pinch_to_zoom(target, scale, center?)
@@ -279,7 +304,7 @@ Verification:
 ## Expected Impact
 Medium-high.
-## Done Criteria
+## Exit Criteria
 - pinch_to_zoom implemented
 - Zoom in/out flows supported
 - Verification primitives for viewport or zoom state available
@@ -297,22 +322,25 @@ Depends on:
 ---
-# Priority 7 — Action Trace Correlation
+# Action Trace Correlation
 ## Why seventh
 Very valuable for debugging,
 but less critical than improving control success first.
+**Status:** Planned
+**Priority:** P7
 Improves diagnosis more than task completion.
-## Deliver
+## Scope
 - Action correlation metadata
 - UI/network/log linkage
 ## Expected Impact
 Medium-high.
-## Done Criteria
+## Exit Criteria
 - Action correlation model defined
 - UI/network/log linkage captured for representative actions
 - Correlation metadata exposed to agents
@@ -331,7 +359,7 @@ Depends on:
 ---
-# Delivery Waves
+# Roadmap Sequence
 ## Dependency Summary
 Foundational sequence:
@@ -351,7 +379,7 @@ Layer 3 (Interaction Expansion)
 Layer 4 (Observability)
 - Priority 7 depends on 1,2,3
-## Wave 1 (Immediate)
+## Wave 1 (Current Focus)
 - Stronger State Verification
 - Richer Element Identity
 - Wait and Synchronization Reliability
@@ -361,7 +389,7 @@ Make core loop more reliable.
 ---
-## Wave 2
+## Wave 2 (Expansion)
 - Long Press
 - Better Compose Semantics
@@ -370,7 +398,7 @@ Expand interaction capability.
 ---
-## Wave 3
+## Wave 3 (Advanced)
 - Pinch to Zoom
 - Action Trace Correlation
@@ -379,7 +407,7 @@ Advanced gestures + observability.
 ---
-# Priority Stack Summary
+# Capability Sequence
 Execution Order:
 1. Stronger State Verification
@@ -397,7 +425,7 @@ Rationale:
 ---
-## Explicitly Deferred
+## Future Considerations
 Still out of scope:
 - Recovery planning logic

package/docs/rfcs/003-wait-and-synchronization-reliability.md ADDED

@@ -0,0 +1,296 @@
+# RFC-003: Wait and Synchronization Reliability
+Priority: 3
+Depends on: RFC-001 (Stronger State Verification), RFC-002 (Platform-Native Element Metadata and Resolution Hints)
+---
+# 1. Problem
+Agents can often identify the right element (RFC-002) and verify the right state (RFC-001), but still fail because they act before the UI has reached the intended post-action state.
+This causes:
+- retries caused by racing the UI
+- false failures from stale snapshots
+- overuse of network/log verification when UI evidence should suffice
+- flakiness in asynchronous and in-place update flows
+- unreliable behaviour in Compose-heavy or thin accessibility trees
+Current system limitations:
+- wait_for_ui is underused after actions involving async state changes
+- current waits focus on expected elements appearing, not general UI transition detection
+- snapshot staleness is not explicitly surfaced
+- loading state transitions are inconsistently observable
+---
+# 2. Goals
+This RFC introduces:
+1. UI-first synchronization policy after actions
+2. Snapshot staleness and revision metadata
+3. UI-change based waiting for in-place updates
+4. Structured loading-state detection
+5. Compose-aware synchronization hints
+Success goals:
+- reduce retries caused by premature actions
+- increase successful post-action verification
+- reduce unnecessary fallbacks to logs/network checks
+- improve reliability in asynchronous UI flows
+---
+# 3. Non-Goals
+This RFC does not:
+- redefine state verification semantics (RFC-001)
+- redefine element identity contracts (RFC-002)
+- add new interaction primitives (long press, pinch, etc.)
+- replace network or log verification where no UI outcome exists
+---
+# 4. Proposed Model
+## 4.1 UI-First Synchronization Contract (v1)
+Default post-action flow SHOULD be:
+```text
+action
+→ wait_for_ui(expected outcome)
+→ verify state
+→ only fall back to network/logs when no UI outcome exists or wait fails
+```
+Tool-level contract:
+- After actions expected to cause visible UI changes, agents SHOULD invoke wait_for_ui or wait_for_ui_change before verification.
+- wait_for_ui SHOULD be used when an expected element or explicit outcome is known.
+- wait_for_ui_change SHOULD be used for in-place mutations where a specific element target is not known.
+- wait_for_screen_change SHOULD remain preferred for full navigation transitions when available.
+Rules:
+- UI evidence MUST be preferred over network or log evidence when a UI outcome is expected.
+- Actions that trigger navigation, async mutation, or visible state changes SHOULD be followed by a wait.
+- Network/log checks are fallback signals, not primary synchronization mechanisms.
+- This synchronization order is normative tool behavior for agents, not advisory prose.
+---
+## 4.2 Snapshot Revision Contract
+All snapshot responses MUST include revision metadata.
+Emission scope:
+- snapshot_revision and captured_at_ms MUST be emitted on snapshot responses.
+- get_ui_tree responses SHOULD emit the same fields when backed by the same snapshot generation layer.
+- If both surfaces exist, revision values MUST be consistent across them when derived from the same underlying snapshot.
+Required snapshot envelope:
+```json
+{
+  "snapshot_revision": 184,
+  "captured_at_ms": 1714452012301
+}
+```
+Field requirements:
+- snapshot_revision REQUIRED on every snapshot response.
+- captured_at_ms REQUIRED on every snapshot response.
+Source of truth:
+- snapshot_revision originates in the snapshot generation layer.
+- It MUST increment when a meaningful hierarchy delta is detected.
+- Cosmetic-only changes MUST NOT increment revision.
+Meaningful deltas include:
+- node added or removed
+- visible text mutation
+- control state change
+- list content mutation
+- navigation or view transition
+Cosmetic churn examples (must not increment):
+- cursor blink
+- focus-only changes
+- animation-only transitions
+- timestamp or unrelated ephemeral text changes
+Rules:
+- Agents SHOULD use revision changes as synchronization signals.
+- Stale revisions SHOULD trigger reacquisition before verification.
+- This extends the snapshot response contract defined by RFC-002.
+- Snapshot responses are the normative required emission surface; get_ui_tree emission is recommended for consistency.
+- snapshot_revision MUST be monotonically increasing within a session.
+---
+## 4.3 wait_for_ui_change API
+Concrete API contract:
+```ts
+wait_for_ui_change({
+  expected_change?: "hierarchy_diff" | "text_change" | "state_change",
+  timeout_ms?: number,
+  stability_window_ms?: number
+}) => {
+  success: boolean,
+  observed_change: "hierarchy_diff" | "text_change" | "state_change" | null,
+  snapshot_revision?: number,
+  timeout: boolean
+}
+```
+Relationship to other wait primitives:
+- wait_for_screen_change remains the preferred primitive for navigation-level transitions.
+- wait_for_ui_change is the preferred primitive for non-navigation UI mutations and in-place updates.
+- wait_for_ui_change is additive to wait_for_screen_change, not a replacement for it.
+Rules:
+- stability_window_ms represents time a detected change must remain stable before success.
+- Meaningful delta semantics are inherited from Section 4.2.
+- wait_for_ui_change complements wait_for_ui; it does not replace it.
+- Agents SHOULD prefer wait_for_screen_change for navigation and wait_for_ui_change for non-navigation changes.
+---
+## 4.4 Structured Loading-State Contract
+Loading signals are OPTIONAL overall, but when a detectable loading signal exists they SHOULD be surfaced on snapshot responses and UI tree responses, and if emitted they MUST conform to the contract below.
+Required shape:
+```json
+{
+  "loading_state": {
+    "active": true,
+    "signal": "progress_indicator",
+    "source": "snapshot"
+  }
+}
+```
+Required fields:
+- active
+- signal
+- source
+Rules:
+- Loading signals are synchronization hints only.
+- Loading completion MUST NOT alone be treated as success.
+- If emitted, the shape above MUST be used.
+- Absence of loading_state is valid when no reliable loading signal is detectable; malformed or partial loading_state emission is not valid.
+---
+## 4.5 Compose-Aware Synchronization Hints
+For Compose or thin accessibility structures:
+Systems SHOULD support:
+- merged semantic node changes as wait signals
+- text mutations within existing nodes
+- in-place recomposition awareness
+These are synchronization hints layered on top of standard wait behaviour.
+---
+# 5. Failure Modes
+## 5.1 Premature Action Progression
+If an action is followed immediately by verification without waiting:
+- system SHOULD bias toward suggesting wait_for_ui
+- retries SHOULD prefer synchronization correction before repeated action execution
+---
+## 5.2 Stale Snapshot Reads
+If verification uses an old snapshot:
+- revision metadata SHOULD expose staleness
+- agents SHOULD reacquire snapshot before retrying verification
+---
+## 5.3 No Visible UI Outcome
+If no UI outcome is expected:
+- network/log verification MAY be primary evidence
+- UI-first policy does not apply rigidly
+---
+## 5.4 False Positive UI Change Detection
+If unrelated UI churn triggers early wait completion:
+- systems SHOULD reject cosmetic-only changes using Section 4.2 rules
+- agents SHOULD prefer stability windows before considering waits satisfied
+---
+# 6. Acceptance Criteria
+RFC-003 specification is complete when:
+- Snapshot Revision Contract is fully defined and mandatory.
+- wait_for_ui_change API contract is fully defined.
+- Loading-State Contract required schema is defined.
+- Synchronization tool-selection rules are explicitly specified.
+- False-positive change handling is specified.
+Implementation readiness success is measured when:
+- snapshot revisions reduce stale-read retries
+- synchronization retries decrease
+- post-action verification success increases
+---
+# 7. Success Metrics
+- Fewer retries caused by timing/synchronization errors
+- Higher post-action verification success rate
+- Reduced unnecessary fallback to network/log evidence
+- Improved stability in asynchronous and Compose-heavy flows
+---
+# 8. Deferred To Later RFCs
+- Advanced subscriptions / notify-when-element-appears APIs
+- Full action-to-ui trace correlation (Priority 7)
+- Gesture-trigger-specific synchronization logic
+- Element appearance subscription / notify-when-ready APIs
+---
+This RFC standardises temporal reliability and synchronization signals layered on top of state verification and element identity guarantees from RFC-001 and RFC-002.
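
Note on the RFC-003 hunk above (an illustration, not part of the package diff): Sections 4.2 and 4.3 could be sketched roughly as follows. Only the field names `snapshot_revision`, `captured_at_ms`, `timeout_ms`, `stability_window_ms` and the result shape come from the RFC text. The `UiNode` shape, the helper names `classifyDelta`, `captureSnapshot` and `waitForUiChange`, the default timings, and the use of pre-captured polls instead of live polling are all assumptions made for this sketch. It also treats every text change as meaningful, which is simpler than the RFC's rule that ephemeral text changes are cosmetic.

```typescript
// Hypothetical model of RFC-003 Sections 4.2 and 4.3; not the package's implementation.
type ChangeKind = "hierarchy_diff" | "text_change" | "state_change";

interface UiNode {
  id: string;        // assumed stable node identity
  text?: string;
  state?: string;    // e.g. "checked" / "unchecked"
  focused?: boolean; // focus-only changes are cosmetic per Section 4.2
}

interface Snapshot {
  snapshot_revision: number;
  captured_at_ms: number;
  nodes: UiNode[];
}

interface WaitResult {
  success: boolean;
  observed_change: ChangeKind | null;
  snapshot_revision?: number;
  timeout: boolean;
}

// Returns the first meaningful delta between two node lists, or null if only
// cosmetic fields (here: `focused`) differ.
function classifyDelta(prev: UiNode[], next: UiNode[]): ChangeKind | null {
  if (prev.length !== next.length) return "hierarchy_diff";
  const prevById = new Map(prev.map((n) => [n.id, n] as const));
  for (const n of next) {
    const p = prevById.get(n.id);
    if (p === undefined) return "hierarchy_diff"; // node added or replaced
    if (p.text !== n.text) return "text_change";
    if (p.state !== n.state) return "state_change";
  }
  return null;
}

// Bumps the revision only on a meaningful delta, so it never decreases.
function captureSnapshot(prev: Snapshot | null, nodes: UiNode[], nowMs: number): Snapshot {
  if (prev === null) return { snapshot_revision: 1, captured_at_ms: nowMs, nodes };
  const changed = classifyDelta(prev.nodes, nodes) !== null;
  return {
    snapshot_revision: changed ? prev.snapshot_revision + 1 : prev.snapshot_revision,
    captured_at_ms: nowMs,
    nodes,
  };
}

// Scans pre-captured polls (a stand-in for live polling) for a change that
// stays at the same revision for at least `stability_window_ms`.
function waitForUiChange(
  baseline: Snapshot,
  polls: Snapshot[],
  opts: { timeout_ms?: number; stability_window_ms?: number } = {},
): WaitResult {
  const timeoutMs = opts.timeout_ms ?? 5000;          // assumed default
  const stabilityMs = opts.stability_window_ms ?? 300; // assumed default
  let candidate: { revision: number; sinceMs: number } | null = null;

  for (const snap of polls) {
    if (snap.captured_at_ms - baseline.captured_at_ms > timeoutMs) break;
    const kind =
      snap.snapshot_revision > baseline.snapshot_revision
        ? classifyDelta(baseline.nodes, snap.nodes)
        : null;
    if (kind === null) {
      candidate = null; // no change yet, or the UI reverted to the baseline
      continue;
    }
    // Restart the stability timer whenever the revision moves again.
    const current: { revision: number; sinceMs: number } =
      candidate !== null && candidate.revision === snap.snapshot_revision
        ? candidate
        : { revision: snap.snapshot_revision, sinceMs: snap.captured_at_ms };
    candidate = current;
    if (snap.captured_at_ms - current.sinceMs >= stabilityMs) {
      return {
        success: true,
        observed_change: kind,
        snapshot_revision: snap.snapshot_revision,
        timeout: false,
      };
    }
  }
  // In this sketch, running out of polls is reported the same way as a timeout.
  return { success: false, observed_change: null, timeout: true };
}
```

In this model a focus-only poll leaves the revision unchanged, so an agent following the UI-first flow from Section 4.1 would keep waiting rather than verify against a stale snapshot. Classifying against the baseline rather than the previous poll is a deliberate choice: a change that reverts before the stability window elapses is not reported as success.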