npm - mobile-debug-mcp - Versions diffs - 0.24.8 → 0.25.1 - Mend

mobile-debug-mcp 0.24.8 → 0.25.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

package/README.md +1 -1
package/dist/interact/index.js +240 -4
package/dist/observe/ios.js +126 -3
package/dist/server/common.js +2 -1
package/dist/server/tool-definitions.js +55 -0
package/dist/server/tool-handlers.js +17 -0
package/dist/server-core.js +1 -1
package/dist/utils/android/utils.js +134 -3
package/docs/CHANGELOG.md +9 -0
package/docs/ROADMAP.md +406 -0
package/docs/rfcs/001-state-verification.md +452 -0
package/docs/rfcs/002-richer-element-identity +400 -0
package/docs/rfcs/003-wait-and-synchronization-reliability +232 -0
package/docs/specs/mcp-tooling-spec-v1.md +5 -0
package/docs/tools/interact.md +25 -0
package/docs/tools/observe.md +3 -1
package/package.json +1 -1
package/src/interact/index.ts +272 -4
package/src/observe/index.ts +6 -0
package/src/observe/ios.ts +129 -4
package/src/server/common.ts +2 -1
package/src/server/tool-definitions.ts +55 -0
package/src/server/tool-handlers.ts +18 -0
package/src/server-core.ts +1 -1
package/src/types.ts +67 -1
package/src/utils/android/utils.ts +126 -4
package/test/unit/observe/state_extraction.test.ts +90 -0
package/test/unit/server/response_shapes.test.ts +40 -2

package/docs/rfcs/002-richer-element-identity ADDED

@@ -0,0 +1,400 @@
+# RFC-002: Platform-Native Element Metadata and Resolution Hints
+Priority: 2
+Depends on: RFC-001 (Stronger State Verification)
+---
+# 1. Problem
+Agents currently rely on brittle or inconsistent selectors when identifying UI elements.
+This leads to:
+- selector drift across UI updates
+- failure to target correct elements in dynamic layouts
+- retry loops due to ambiguous element matching
+- inability to distinguish visually similar components
+Current system limitations:
+- weak or inconsistent element identifiers
+- over-reliance on hierarchy position or text inference
+- insufficient metadata for stable targeting
+This RFC assumes stable identity may be derived from underlying platform accessibility or testing hooks, but it does not assume that a universal or guaranteed stable_id exists across all platforms. Instead, it defines a best-effort model based on platform-native identifiers and developer-provided metadata, supplemented by resolution hints.
+---
+# 2. Goals
+This RFC introduces:
+1. Platform-native element identity metadata for UI targeting
+2. Hierarchy-independent element references
+3. Selector confidence metadata for reliability
+4. Structured fallback resolution strategy
+Success goals:
+- Increase element match success rate
+- Reduce selector-related retries
+- Improve robustness across UI updates and Compose-heavy layouts
+---
+# 3. Non-Goals
+This RFC does not:
+- Modify state verification logic (RFC-001)
+- Introduce gesture handling (future RFCs)
+- Define synchronization/waiting behaviour (RFC-003)
+- Add new interaction primitives beyond identification and selection
+---
+# 4. Proposed Model
+## 4.1 Stable Element Identity
+Each UI element SHOULD expose a stable identifier when available.
+Preferred model:
+```json
+{
+  "element_id": "wifi_toggle",
+  "stable_id": "settings_wifi_toggle",
+  "role": "switch"
+}
+```
+Rules:
+- stable_id SHOULD be derived from platform-native or developer-provided identifiers when available
+- stable_id can remain consistent across UI renders where the platform supports this, but consistency is not guaranteed
+- stable_id is a preferred targeting key when present, but not guaranteed to exist
+- element_id is session-scoped and may change between snapshots
+---
+### 4.1.1 Stable ID Origin
+stable_id MUST be derived from platform-provided or framework-provided identifiers when available.
+Acceptable sources include:
+- Android: resource-id, content-desc (when stable and explicitly set)
+- iOS: accessibilityIdentifier
+- Web: data-testid or equivalent testing attributes
+- Compose: semantics properties or developer-assigned test tags
+Rules:
+- stable_id MUST NOT be heuristically generated from visual text alone
+- stable_id SHOULD prefer developer-defined identifiers over inferred values
+- If no reliable source exists, stable_id MUST be omitted (not fabricated)
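The origin rules above could be sketched as a per-platform extraction function. This is a minimal illustration, not the package's adapter code: `RawNode` and its field names (`resourceId`, `contentDesc`, `contentDescExplicit`, `accessibilityIdentifier`, `dataTestId`, `testTag`) are hypothetical, and giving resource-id precedence over an explicitly set content-desc is an assumption. What it shows is that visible text is never consulted and the function returns `undefined` rather than inventing an identifier.

```typescript
// Hypothetical raw node shape; real platform adapters expose different fields.
type Platform = "android" | "ios" | "web" | "compose";

interface RawNode {
  platform: Platform;
  text?: string;                    // visible text: never used as a stable_id source
  resourceId?: string;              // Android resource-id
  contentDesc?: string;             // Android content-desc
  contentDescExplicit?: boolean;    // true only if the developer set it deliberately
  accessibilityIdentifier?: string; // iOS
  dataTestId?: string;              // Web data-testid
  testTag?: string;                 // Compose test tag
}

// Returns a platform-native identifier, or undefined when no reliable source
// exists (the id is omitted, never fabricated).
function deriveStableId(node: RawNode): string | undefined {
  const nonEmpty = (s?: string): string | undefined =>
    s !== undefined && s.trim() !== "" ? s : undefined;

  switch (node.platform) {
    case "android":
      return (
        nonEmpty(node.resourceId) ??
        (node.contentDescExplicit ? nonEmpty(node.contentDesc) : undefined)
      );
    case "ios":
      return nonEmpty(node.accessibilityIdentifier);
    case "web":
      return nonEmpty(node.dataTestId);
    case "compose":
      return nonEmpty(node.testTag);
  }
}
```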
+---
+### 4.1.2 Stable ID Collision Handling
+If multiple elements share the same stable_id:
+- system MUST treat this as a collision state
+- all matching elements MUST be returned
+- agent MUST disambiguate using role, label, or hierarchy context
+Rules:
+- collisions MUST NOT be silently resolved by system-level heuristics
+- stable_id uniqueness is a best-effort constraint, not a guarantee
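A sketch of the collision rule, assuming a hypothetical `Candidate` shape and `lookupByStableId` helper: the lookup returns every element sharing the `stable_id` instead of picking one, leaving disambiguation by role, label, or hierarchy to the agent.

```typescript
// Hypothetical element shape for this sketch.
interface Candidate {
  element_id: string; // session-scoped
  stable_id?: string;
  role: string;
  label?: string;
}

// No match, exactly one match, or a collision the agent must disambiguate.
type StableIdLookup =
  | { kind: "none" }
  | { kind: "unique"; element: Candidate }
  | { kind: "collision"; candidates: Candidate[] };

function lookupByStableId(elements: Candidate[], stableId: string): StableIdLookup {
  const matches = elements.filter((e) => e.stable_id === stableId);
  const [first, ...rest] = matches;
  if (first === undefined) return { kind: "none" };
  if (rest.length === 0) return { kind: "unique", element: first };
  // Return every match; no system-level heuristic picks a winner.
  return { kind: "collision", candidates: matches };
}
```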
+---
+## 4.2 Selector Confidence Model
+Each element's selector MAY carry confidence metadata describing selection reliability. A simplified form is shown here; Section 4.2.1 defines the exposed shape.
+```json
+{
+  "selector": "Text('WiFi')",
+  "confidence": 0.92
+}
+```
+Rules:
+- Confidence reflects likelihood of correct element match
+- Low confidence SHOULD trigger fallback resolution
+- Confidence MUST NOT be treated as deterministic truth
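One way an agent might act on these rules is a simple threshold check. The 0.75 cut-off, `decide`, and `SelectorDecision` are all hypothetical (RFC-002 does not specify a numeric threshold); the selector uses the object form of confidence from Section 4.2.1.

```typescript
interface SelectorConfidence {
  score: number; // 0.0 - 1.0
  reason: string;
}

interface ElementSelector {
  value: string;
  confidence: SelectorConfidence;
}

// Hypothetical cut-off; the RFC does not fix a numeric threshold.
const FALLBACK_THRESHOLD = 0.75;

type SelectorDecision =
  | { action: "use_selector"; selector: ElementSelector }
  | { action: "fallback"; why: string };

// Confidence is a hint: a low score triggers fallback resolution, while a
// high score still does not prove the match is correct.
function decide(selector: ElementSelector, threshold = FALLBACK_THRESHOLD): SelectorDecision {
  const { score, reason } = selector.confidence;
  if (!Number.isFinite(score) || score < 0 || score > 1) {
    return { action: "fallback", why: `invalid score ${score}` };
  }
  if (score < threshold) return { action: "fallback", why: `low confidence (${reason})` };
  return { action: "use_selector", selector };
}
```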
+---
+### 4.2.1 Confidence API Exposure
+Confidence metadata MUST be exposed as part of the element selector object.
+Preferred shape:
+```json
+{
+  "selector": "Text('WiFi')",
+  "confidence": {
+    "score": 0.92,
+    "reason": "unique_text_match"
+  }
+}
+```
+Rules:
+- confidence.score MUST be a float between 0 and 1
+- confidence.reason SHOULD indicate primary matching heuristic
+- confidence MUST be attached to selector metadata, not state
+This structure is expected to be present in both snapshot metadata and any downstream selector debugging output produced by the resolution engine.
+---
+## 4.3 Fallback Resolution Strategy
+Resolution order MUST be:
+1. stable_id (if unique or disambiguated via collision handling)
+2. platform-native metadata match (role + label + test_tag)
+3. selector + confidence scoring
+4. structural hierarchy fallback
+5. text inference (last resort)
+Agents MUST prefer higher-order resolution strategies before falling back.
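The resolution order can be expressed as an ordered list of strategies: a strategy that finds nothing falls through to the next one, and a strategy that finds several candidates surfaces them rather than guessing. This is a simplified sketch under stated assumptions: the `Target` query shape and the strategy names are hypothetical, and steps 3 (selector + confidence scoring) and 4 (structural hierarchy) are omitted because they need data this example does not model.

```typescript
interface UiElement {
  element_id: string;
  stable_id?: string;
  role: string;
  label?: string;
  text?: string;
  test_tag?: string;
}

// Hypothetical target description supplied by the agent.
interface Target {
  stable_id?: string;
  role?: string;
  label?: string;
  test_tag?: string;
  text?: string;
}

interface Strategy {
  name: string;
  // Returns undefined when the strategy does not apply to this target.
  match: (target: Target, elements: UiElement[]) => UiElement[] | undefined;
}

type Resolution =
  | { status: "resolved"; via: string; element: UiElement }
  | { status: "ambiguous"; via: string; candidates: UiElement[] }
  | { status: "unresolved" };

// Highest to lowest order; selector scoring and hierarchy steps are left out.
const STRATEGIES: Strategy[] = [
  {
    name: "stable_id",
    match: (t, els) =>
      t.stable_id === undefined ? undefined : els.filter((e) => e.stable_id === t.stable_id),
  },
  {
    name: "platform_metadata",
    match: (t, els) =>
      t.role === undefined && t.label === undefined && t.test_tag === undefined
        ? undefined
        : els.filter(
            (e) =>
              (t.role === undefined || e.role === t.role) &&
              (t.label === undefined || e.label === t.label) &&
              (t.test_tag === undefined || e.test_tag === t.test_tag)
          ),
  },
  {
    name: "text_inference", // last resort
    match: (t, els) => (t.text === undefined ? undefined : els.filter((e) => e.text === t.text)),
  },
];

function resolve(target: Target, elements: UiElement[]): Resolution {
  for (const s of STRATEGIES) {
    const found = s.match(target, elements);
    if (found === undefined || found.length === 0) continue; // fall back to next strategy
    const [first, ...rest] = found;
    if (first !== undefined && rest.length === 0) {
      return { status: "resolved", via: s.name, element: first };
    }
    // Several matches: surface them instead of guessing.
    return { status: "ambiguous", via: s.name, candidates: found };
  }
  return { status: "unresolved" };
}
```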
+---
+## 4.4 Element Metadata Model (Platform-Aware)
+For Compose and similar UI systems, elements MUST expose structured metadata rather than inferred semantic paths.
+Preferred model:
+```json
+{
+  "role": "button",
+  "label": "Save",
+  "text": "Save",
+  "test_tag": "settings_save_button",
+  "semantic": {
+    "is_clickable": true,
+    "is_container": false
+  }
+}
+```
+Rules:
+- semantic_path MUST NOT be used as a required field
+- platform-native metadata (test_tag / accessibility id) is preferred
+- hierarchy information MAY be included but is not authoritative
+---
+## 4.5 Snapshot Response Contract (v1)
+This RFC defines the expected structure of element metadata returned by the snapshot observation tool.
+Each element in a snapshot MUST conform to the following shape:
+```json
+{
+  "element_id": "string (session-scoped)",
+  "stable_id": "string (optional)",
+  "role": "string",
+  "label": "string (optional)",
+  "text": "string (optional)",
+  "test_tag": "string (optional)",
+  "selector": {
+    "value": "string",
+    "confidence": {
+      "score": "number (0.0-1.0)",
+      "reason": "string"
+    }
+  }
+}
+```
+Rules:
+- element_id MUST be present and session-scoped
+- stable_id MAY be present when provided by platform or developer metadata
+- selector.confidence MUST be attached when selector is present
+- test_tag SHOULD be preferred over inferred identifiers where available
+Note:
+This schema replaces ambiguous "snapshot response" references in prior sections and defines the canonical output contract for element identity and resolution metadata.
+This contract defines the boundary between platform-derived metadata and resolution-engine-generated metadata, and is the single source of truth for all element identity fields used by downstream agents.
+## 4.6 API Surface Mapping
+This section defines where each field in the Snapshot Response Contract is produced within the system.
+### 4.6.1 Snapshot Tool Responsibility
+The snapshot observation tool (e.g. `observe_ui_snapshot`) is responsible for returning the raw UI tree enriched with platform-derived metadata.
+It MUST return elements conforming to the Snapshot Response Contract (Section 4.5).
+In the current codebase, this maps to the `observe_ui_snapshot` pipeline (or equivalent snapshot generation function), which MUST return data conforming to the SnapshotResponse TypeScript contract defined in Section 4.6.4.
+### 4.6.2 Field Origin Mapping
+Each field in the element model has a defined source of truth:
+- element_id:
+  - Origin: Snapshot session layer
+  - Responsibility: Generated per snapshot traversal
+  - Scope: Session-scoped only
+- stable_id:
+  - Origin: Platform adapter layer (Android/iOS/Web/Compose)
+  - Responsibility: Extracted from platform-native identifiers
+  - Constraint: MUST NOT be generated by heuristics alone
+- role:
+  - Origin: Accessibility tree / platform UI framework
+  - Responsibility: Semantic role mapping from native UI system
+- label / text:
+  - Origin: Platform accessibility node
+  - Responsibility: Visible or accessible text content extraction
+- test_tag:
+  - Origin: Developer-defined metadata (when available)
+  - Responsibility: Explicit testing identifiers (e.g. accessibilityIdentifier, data-testid)
+- selector:
+  - Origin: Resolution engine (post-processing layer)
+  - Responsibility: Generated match expression for agent targeting
+- selector.confidence:
+  - Origin: Resolution engine
+  - Responsibility: Heuristic confidence scoring of selector correctness
+### 4.6.3 Layer Separation Rule
+The system MUST maintain strict separation between:
+- Platform extraction layer (stable_id, role, label, test_tag)
+- Resolution layer (selector, confidence)
+- Session layer (element_id)
+No layer is permitted to overwrite another layer's source of truth.
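The separation rule can be made explicit by giving each layer its own field type and refusing to merge outputs whose keys overlap. This sketch is illustrative only; `composeElement` and the per-layer interface names are hypothetical. Throwing on overlap is one possible policy; a real pipeline might instead log the conflict and keep the owning layer's value.

```typescript
// Fields owned by each layer (hypothetical names).
interface PlatformFields {
  stable_id?: string;
  role: string;
  label?: string;
  text?: string;
  test_tag?: string;
}

interface ResolutionFields {
  selector?: { value: string; confidence: { score: number; reason: string } };
}

interface SessionFields {
  element_id: string;
}

type ElementSnapshot = PlatformFields & ResolutionFields & SessionFields;

// Merges one output per layer. Throws if a layer emits a key owned by another
// layer, instead of letting a later spread silently overwrite it.
function composeElement(
  platform: PlatformFields,
  resolution: ResolutionFields,
  session: SessionFields
): ElementSnapshot {
  const layers: Record<string, object> = { platform, resolution, session };
  const owner = new Map<string, string>();
  for (const [layer, fields] of Object.entries(layers)) {
    for (const key of Object.keys(fields)) {
      const prev = owner.get(key);
      if (prev !== undefined) throw new Error(`"${key}" emitted by both ${prev} and ${layer}`);
      owner.set(key, layer);
    }
  }
  return { ...platform, ...resolution, ...session };
}
```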
+---
+### 4.6.4 TypeScript Contract (Implementation Binding)
+This section defines the concrete TypeScript-level contract used by the codebase for snapshot and element resolution.
+These types represent the implementation binding for the Snapshot Response Contract (Section 4.5).
+```ts
+export interface SelectorConfidence {
+  score: number; // 0.0 - 1.0
+  reason: string;
+}
+export interface ElementSelector {
+  value: string;
+  confidence: SelectorConfidence;
+}
+export interface ElementSnapshot {
+  element_id: string; // session-scoped
+  stable_id?: string;
+  role: string;
+  label?: string;
+  text?: string;
+  test_tag?: string;
+  selector?: ElementSelector;
+}
+export interface SnapshotResponse {
+  elements: ElementSnapshot[];
+}
+```
+Notes:
+- This interface MUST align with the runtime snapshot implementation.
+- This is the canonical mapping between RFC definition and codebase types.
+- Any deviation in implementation MUST be reflected in a future RFC revision.
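Because TypeScript interfaces are erased at runtime, conformance of a live snapshot payload has to be checked separately. A minimal sketch follows, reusing the interfaces above verbatim; `checkSnapshot` and `parseSnapshot` are hypothetical helpers, not existing exports of the package.

```typescript
interface SelectorConfidence {
  score: number; // 0.0 - 1.0
  reason: string;
}
interface ElementSelector {
  value: string;
  confidence: SelectorConfidence;
}
interface ElementSnapshot {
  element_id: string; // session-scoped
  stable_id?: string;
  role: string;
  label?: string;
  text?: string;
  test_tag?: string;
  selector?: ElementSelector;
}
interface SnapshotResponse {
  elements: ElementSnapshot[];
}

type Obj = Record<string, unknown>;
const isObj = (x: unknown): x is Obj => typeof x === "object" && x !== null;
const optString = (x: unknown): boolean => x === undefined || typeof x === "string";

// Returns a list of contract violations; an empty list means the payload conforms.
function checkSnapshot(payload: unknown): string[] {
  const elements = isObj(payload) ? payload.elements : undefined;
  if (!Array.isArray(elements)) return ["elements must be an array"];
  const errors: string[] = [];
  elements.forEach((el: unknown, i: number) => {
    if (!isObj(el)) {
      errors.push(`elements[${i}] is not an object`);
      return;
    }
    if (typeof el.element_id !== "string") errors.push(`elements[${i}].element_id missing`);
    if (typeof el.role !== "string") errors.push(`elements[${i}].role missing`);
    for (const k of ["stable_id", "label", "text", "test_tag"]) {
      if (!optString(el[k])) errors.push(`elements[${i}].${k} must be a string if present`);
    }
    const sel = el.selector;
    if (sel !== undefined) {
      const conf = isObj(sel) ? sel.confidence : undefined;
      const ok =
        isObj(sel) && typeof sel.value === "string" && isObj(conf) &&
        typeof conf.score === "number" && conf.score >= 0 && conf.score <= 1 &&
        typeof conf.reason === "string";
      if (!ok) errors.push(`elements[${i}].selector does not match ElementSelector`);
    }
  });
  return errors;
}

function parseSnapshot(payload: unknown): SnapshotResponse {
  const errors = checkSnapshot(payload);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return payload as SnapshotResponse;
}
```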
+---
+# 5. Failure Modes
+## 5.1 Ambiguous match
+If multiple elements match a selector:
+- The snapshot MUST include all matching candidates in the underlying element tree or debug snapshot.
+- Current action APIs (e.g. find_element / tap / wait_for_ui) MAY return a single best-effort match for compatibility.
+- When ambiguity exists, systems SHOULD expose candidate alternatives via snapshot inspection or debug instrumentation.
+- Future extensions MAY introduce explicit multi-candidate resolution APIs, but are not required for RFC-002 compliance.
+---
+## 5.2 Missing stable identity
+If stable_id is unavailable:
+- fallback hierarchy MUST be used
+- selector confidence SHOULD reflect reduced certainty and include reason="no_stable_id"
+- retries MAY be triggered
+---
+## 5.3 Layout drift
+If UI structure changes:
+- stable_id remains valid if preserved
+- structural selectors may degrade
+- confidence SHOULD reflect uncertainty
+---
+# 6. Acceptance Criteria
+RFC-002 is complete when:
+- platform-native identity metadata (stable_id, test_tag, role) is present where available
+- selector confidence metadata is present and conforms to Snapshot Response Contract (Section 4.5)
+- fallback resolution strategy is implemented
+- element match success rate improves on benchmark flows
+- selector-related retries are reduced
+---
+# 7. Success Metrics
+- Higher element resolution match rate using platform-native metadata + confidence hints
+- Reduced selector retries
+- Lower failure rate on UI updates
+- Improved stability in Compose UI trees
+---
+# 8. Out of Scope
+- State verification (RFC-001)
+- Wait/synchronization (RFC-003)
+- Gestures (future RFCs)
+- Action tracing
+This RFC is scoped as a metadata and resolution hint layer. It does not guarantee stable identity across all platforms, but standardises how identity signals are exposed and consumed.

package/docs/rfcs/003-wait-and-synchronization-reliability ADDED

@@ -0,0 +1,232 @@
+# RFC-003: Wait and Synchronization Reliability
+Priority: 3
+Depends on: RFC-001 (Stronger State Verification), RFC-002 (Platform-Native Element Metadata and Resolution Hints)
+---
+# 1. Problem
+Agents can often identify the right element (RFC-002) and verify the right state (RFC-001), but still fail because they act before the UI has reached the intended post-action state.
+This causes:
+- retries caused by racing the UI
+- false failures from stale snapshots
+- overuse of network/log verification when UI evidence should suffice
+- flakiness in asynchronous and in-place update flows
+- unreliable behaviour in Compose-heavy or thin accessibility trees
+Current system limitations:
+- wait_for_ui is underused after actions involving async state changes
+- current waits focus on expected elements appearing, not general UI transition detection
+- snapshot staleness is not explicitly surfaced
+- loading state transitions are inconsistently observable
+---
+# 2. Goals
+This RFC introduces:
+1. UI-first synchronization policy after actions
+2. Snapshot staleness and revision metadata
+3. UI-change based waiting for in-place updates
+4. Structured loading-state detection
+5. Compose-aware synchronization hints
+Success goals:
+- reduce retries caused by premature actions
+- increase successful post-action verification
+- reduce unnecessary fallbacks to logs/network checks
+- improve reliability in asynchronous UI flows
+---
+# 3. Non-Goals
+This RFC does not:
+- redefine state verification semantics (RFC-001)
+- redefine element identity contracts (RFC-002)
+- add new interaction primitives (long press, pinch, etc.)
+- replace network or log verification where no UI outcome exists
+---
+# 4. Proposed Model
+## 4.1 UI-First Synchronization Policy
+Default post-action flow SHOULD be:
+```text
+action
+→ wait_for_ui(expected outcome)
+→ verify state
+→ only fall back to network/logs when no UI outcome exists or wait fails
+```
+Rules:
+- UI evidence MUST be preferred over network or log evidence when a UI outcome is expected.
+- Actions that trigger navigation, async mutation, or visible state changes SHOULD be followed by a wait.
+- Network/log checks are fallback signals, not primary synchronization mechanisms.
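The policy can be sketched as a single helper around hypothetical tool bindings. `SyncTools`, `actThenVerify`, and the 5000 ms default timeout are assumptions for illustration; the control flow mirrors the text diagram above: UI evidence first, network/log evidence only when no UI outcome is expected or the wait fails.

```typescript
// Hypothetical tool bindings; names and signatures are illustrative only.
interface SyncTools {
  act: () => Promise<void>;
  waitForUi: (timeoutMs: number) => Promise<boolean>; // true if the expected UI appeared
  verifyState: () => Promise<boolean>;
  checkNetworkOrLogs: () => Promise<boolean>;
}

interface Outcome {
  ok: boolean;
  evidence: "ui" | "network_or_logs";
}

// action -> wait_for_ui -> verify; fall back only if no UI outcome exists or the wait fails.
async function actThenVerify(
  tools: SyncTools,
  uiOutcomeExpected: boolean,
  timeoutMs = 5000
): Promise<Outcome> {
  await tools.act();
  if (uiOutcomeExpected) {
    const appeared = await tools.waitForUi(timeoutMs);
    if (appeared) return { ok: await tools.verifyState(), evidence: "ui" };
  }
  // No visible outcome expected, or the wait timed out.
  return { ok: await tools.checkNetworkOrLogs(), evidence: "network_or_logs" };
}
```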
+---
+## 4.2 Snapshot Revision / Staleness Metadata
+Snapshot responses MUST expose revision metadata.
+Preferred shape:
+```json
+{
+  "snapshot_revision": 184,
+  "captured_at_ms": 1714452012301
+}
+```
+Rules:
+- snapshot_revision MUST increment when the hierarchy meaningfully changes.
+- captured_at_ms MUST reflect the time the snapshot was captured.
+- Agents SHOULD use revision changes as synchronization signals.
+- Agents SHOULD treat verification performed against a stale revision as suspect.
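A sketch of how an agent might apply these rules, assuming hypothetical `isStale` and `revisionAdvanced` helpers. RFC-003 does not define a maximum snapshot age, so the 2000 ms `maxAgeMs` default is purely illustrative.

```typescript
interface SnapshotMeta {
  snapshot_revision: number;
  captured_at_ms: number;
}

// Hypothetical policy: stale if a newer revision has been observed, or if the
// snapshot is older than maxAgeMs (an assumed limit, not part of the RFC).
function isStale(snap: SnapshotMeta, latestRevision: number, nowMs: number, maxAgeMs = 2000): boolean {
  return snap.snapshot_revision < latestRevision || nowMs - snap.captured_at_ms > maxAgeMs;
}

// A revision increase between two snapshots is a synchronization signal.
function revisionAdvanced(before: SnapshotMeta, after: SnapshotMeta): boolean {
  return after.snapshot_revision > before.snapshot_revision;
}
```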
+---
+## 4.3 wait_for_ui_change Primitive
+A UI-diff-based wait primitive SHOULD support waiting on hierarchy changes, not only on explicitly expected elements.
+Conceptual contract:
+```ts
+declare function wait_for_ui_change(args: {
+  expected_change?: "hierarchy_diff" | "text_change" | "state_change";
+  timeout_ms?: number;
+}): Promise<unknown>; // result shape not yet specified
+```
+Use cases:
+- in-place content refresh
+- async partial rerender
+- list mutations
+- Compose recomposition-driven updates
+Rules:
+- wait_for_ui_change SHOULD detect meaningful UI deltas, not cosmetic noise.
+- It MAY use snapshot revisions as one signal.
+- It complements wait_for_ui; it does not replace it.
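A minimal polling implementation of the conceptual contract, under several assumptions: the `UiSnapshot`/`UiNode` shapes, the `capture` callback, the `poll_ms` option, and the defaults (`hierarchy_diff`, 5000 ms timeout, 250 ms poll) are hypothetical. "Meaningful" is approximated by comparing only the fields relevant to the requested change kind, and an unchanged `snapshot_revision` is used as a cheap signal to skip comparison, which is only safe if revisions are accurate.

```typescript
// Hypothetical snapshot shapes used only for this sketch.
interface UiNode {
  role: string;
  text?: string;
  checked?: boolean;
  enabled?: boolean;
}

interface UiSnapshot {
  snapshot_revision: number;
  elements: UiNode[];
}

type ExpectedChange = "hierarchy_diff" | "text_change" | "state_change";

// Projects a snapshot onto the fields relevant to one kind of change, so that
// differences in other fields do not count as a change of that kind.
function signature(s: UiSnapshot, kind: ExpectedChange): string {
  return JSON.stringify(
    s.elements.map((n) =>
      kind === "hierarchy_diff"
        ? [n.role]
        : kind === "text_change"
          ? [n.role, n.text ?? null]
          : [n.role, n.checked ?? null, n.enabled ?? null]
    )
  );
}

async function waitForUiChange(
  capture: () => Promise<UiSnapshot>,
  opts: { expected_change?: ExpectedChange; timeout_ms?: number; poll_ms?: number } = {}
): Promise<{ changed: boolean; snapshot: UiSnapshot }> {
  const kind = opts.expected_change ?? "hierarchy_diff";
  const deadline = Date.now() + (opts.timeout_ms ?? 5000);
  const pollMs = opts.poll_ms ?? 250;
  const baseline = await capture();
  const baseSig = signature(baseline, kind);
  let latest = baseline;
  while (Date.now() < deadline) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    latest = await capture();
    // Same revision as the baseline: assume nothing changed.
    if (latest.snapshot_revision === baseline.snapshot_revision) continue;
    if (signature(latest, kind) !== baseSig) return { changed: true, snapshot: latest };
  }
  return { changed: false, snapshot: latest };
}
```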
+---
+## 4.4 Structured Loading-State Detection
+Systems SHOULD surface structured loading signals when detectable.
+Examples:
+- progress indicator present/absent
+- disabled submit button becomes enabled
+- loading spinner removed
+Preferred model:
+```json
+{
+  "loading_state": {
+    "active": true,
+    "signal": "progress_indicator"
+  }
+}
+```
+Rules:
+- loading start and loading completion SHOULD be detectable when possible.
+- Loading signals MAY be used as synchronization hints, but not as the sole success criterion.
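A sketch of progress-indicator detection covering both loading start and loading completion. The role strings in `PROGRESS_ROLES` and the `detectLoadingState` helper are assumptions (real role names differ by platform), and only the progress-indicator example from the list above is handled.

```typescript
// Hypothetical element shape and role names.
interface UiElement {
  role: string;
}

interface LoadingState {
  active: boolean;
  signal: string;
}

const PROGRESS_ROLES = new Set(["progress_indicator", "progressbar", "activity_indicator"]);

// Reports loading start when an indicator is present, and loading completion
// when a previously seen indicator has disappeared. Returns undefined when
// nothing is detectable, rather than guessing.
function detectLoadingState(elements: UiElement[], previous?: LoadingState): LoadingState | undefined {
  if (elements.some((e) => PROGRESS_ROLES.has(e.role))) {
    return { active: true, signal: "progress_indicator" };
  }
  if (previous?.active && previous.signal === "progress_indicator") {
    return { active: false, signal: "progress_indicator" };
  }
  return undefined;
}
```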
+---
+## 4.5 Compose-Aware Synchronization Hints
+For Compose or thin accessibility structures:
+Systems SHOULD support:
+- merged semantic node changes as wait signals
+- text mutations within existing nodes
+- in-place recomposition awareness
+These are synchronization hints layered on top of standard wait behaviour.
+---
+# 5. Failure Modes
+## 5.1 Premature Action Progression
+If an action is followed immediately by verification without waiting:
+- system SHOULD bias toward suggesting wait_for_ui
+- retries SHOULD prefer synchronization correction before repeated action execution
+---
+## 5.2 Stale Snapshot Reads
+If verification uses an old snapshot:
+- revision metadata SHOULD expose staleness
+- agents SHOULD reacquire snapshot before retrying verification
+---
+## 5.3 No Visible UI Outcome
+If no UI outcome is expected:
+- network/log verification MAY be primary evidence
+- UI-first policy does not apply rigidly
+---
+# 6. Acceptance Criteria
+RFC-003 is complete when:
+- UI-first synchronization policy is enforced in agent guidance
+- snapshot revision metadata is exposed in snapshot responses
+- wait_for_ui_change contract is implemented or stubbed
+- structured loading-state hints are surfaced where detectable
+- retries caused by premature action progression are reduced
+---
+# 7. Success Metrics
+- Fewer retries caused by timing/synchronization errors
+- Higher post-action verification success rate
+- Reduced unnecessary fallback to network/log evidence
+- Improved stability in asynchronous and Compose-heavy flows
+---
+# 8. Deferred To Later RFCs
+- Advanced subscriptions / notify-when-element-appears APIs
+- Full action-to-UI trace correlation (Priority 7)
+- Gesture-trigger-specific synchronization logic
+---
+This RFC standardises temporal reliability and synchronization signals layered on top of state verification and element identity guarantees from RFC-001 and RFC-002.

package/docs/specs/mcp-tooling-spec-v1.md CHANGED

@@ -40,6 +40,7 @@ Outcome-specific guidance:
 - visible navigation expected -> `wait_for_screen_change` (optional) -> `expect_screen`
 - local UI change expected -> `wait_for_ui` (optional) -> `expect_element_visible`
+- readable element state expected -> `wait_for_ui` (optional) -> `expect_state`
 - backend/API activity expected without a visible UI change -> compare `get_screen_fingerprint` before/after, then call `get_network_activity` immediately after the action and `classify_action_outcome` with the observed requests
 For backend/API activity, `wait_for_screen_change` is not the right verification tool unless a visible transition is also expected.
@@ -108,6 +109,7 @@ Primary:
 - `expect_screen`
 - `expect_element_visible`
+- `expect_state`
 ### 5.2 Required Semantics
@@ -130,6 +132,7 @@ An `expect_*` tool is applicable when:
 - expected destination screen is known -> `expect_screen`
 - expected UI element state is known -> `expect_element_visible`
+- expected readable state property is known -> `expect_state`
 - outcome is explicitly defined or testable
 Rules:
@@ -234,6 +237,8 @@ The semantic layer is derived, best-effort, and MUST be generated exclusively fr
 Raw layer contents include:
 - UI hierarchy or accessibility tree
+- normalized readable element state where exposed by the platform
+- platform-native identity hints such as stable identifiers, roles, and test tags
 - screenshot when available
 - element-level attributes
 - logs and fingerprint/activity observations

package/docs/tools/interact.md CHANGED

@@ -53,6 +53,7 @@ Preferred verification:
 - navigation outcome known -> `expect_screen`
 - local UI change known -> `expect_element_visible`
+- readable element state known -> `expect_state`
 - backend/API activity expected -> `classify_action_outcome` + `get_network_activity`
 Use `wait_for_screen_change` only when a visible transition is the expected outcome. If a button should trigger an API request but the screen should stay the same, rely on network activity and classification instead.
@@ -459,6 +460,30 @@ Notes:
 ---
+## expect_state
+Deterministically verify a readable state property on a visible element.
+Input:
+```json
+{
+  "selector": { "text": "Notifications" },
+  "property": "checked",
+  "expected": true,
+  "platform": "android",
+  "deviceId": "emulator-5554"
+}
+```
+Notes:
+- Use this when the element is visible but its state also matters.
+- Supported properties include `checked`, `selected`, `focused`, `expanded`, `enabled`, `text_value`, `value`, and `raw_value`.
+- The tool compares normalized state and returns the observed value when available.
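The comparison described in these notes might look roughly like the following. It is a hypothetical sketch rather than the tool's implementation: the normalization rules (mapping `"true"`/`"false"` strings to booleans for flag properties, trimming `text_value`) and the result field names are assumptions.

```typescript
type StateProperty =
  | "checked" | "selected" | "focused" | "expanded" | "enabled"
  | "text_value" | "value" | "raw_value";

type StateValue = boolean | string | number;

// Hypothetical result shape.
interface ExpectStateResult {
  success: boolean;
  property: StateProperty;
  expected: StateValue;
  observed: StateValue | undefined;
}

const BOOLEAN_PROPS = new Set<StateProperty>(["checked", "selected", "focused", "expanded", "enabled"]);

// Maps platform strings such as "true"/"false" onto booleans for flag properties.
function normalize(property: StateProperty, raw: unknown): StateValue | undefined {
  if (raw === undefined || raw === null) return undefined;
  if (BOOLEAN_PROPS.has(property)) {
    if (typeof raw === "boolean") return raw;
    if (raw === "true") return true;
    if (raw === "false") return false;
    return undefined; // unreadable: report as not observed
  }
  if (typeof raw === "number" || typeof raw === "boolean") return raw;
  // Trim incidental whitespace for text_value; leave value/raw_value untouched.
  return property === "text_value" ? String(raw).trim() : String(raw);
}

function expectState(property: StateProperty, expected: StateValue, rawObserved: unknown): ExpectStateResult {
  const observed = normalize(property, rawObserved);
  return { success: observed !== undefined && observed === expected, property, expected, observed };
}
```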
+---
 ## classify_action_outcome + get_network_activity
 Use this pair when the action is expected to trigger network/backend work and the screen may not visibly change.