npm - mobile-debug-mcp - Versions diffs - 0.25.1 → 0.26.1 - Mend

mobile-debug-mcp 0.25.1 → 0.26.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/dist/interact/classify.js +48 -11
package/dist/interact/index.js +113 -0
package/dist/observe/android.js +10 -1
package/dist/observe/index.js +19 -1
package/dist/observe/ios.js +15 -1
package/dist/observe/snapshot-metadata.js +88 -0
package/dist/server/tool-definitions.js +49 -14
package/dist/server/tool-handlers.js +12 -0
package/dist/server-core.js +1 -1
package/docs/CHANGELOG.md +9 -0
package/docs/ROADMAP.md +66 -38
package/docs/rfcs/003-wait-and-synchronization-reliability.md +296 -0
package/docs/rfcs/004-action-verification-routing.md +342 -0
package/docs/specs/mcp-tooling-spec-v1.md +11 -3
package/docs/tools/interact.md +31 -8
package/docs/tools/observe.md +4 -2
package/package.json +1 -1
package/skills/rfc-review/SKILL.md +52 -0
package/skills/rfc-review/references/rfc-review-checklist.md +12 -0
package/skills/rfc-review/references/rfc-review-template.md +28 -0
package/src/interact/classify.ts +53 -13
package/src/interact/index.ts +151 -0
package/src/observe/android.ts +11 -1
package/src/observe/index.ts +26 -1
package/src/observe/ios.ts +28 -13
package/src/observe/snapshot-metadata.ts +107 -0
package/src/server/tool-definitions.ts +49 -14
package/src/server/tool-handlers.ts +13 -0
package/src/server-core.ts +1 -1
package/src/types.ts +23 -0
package/test/unit/interact/classify_action_outcome.test.ts +44 -25
package/test/unit/interact/wait_for_ui_change.test.ts +76 -0
package/test/unit/server/contract.test.ts +8 -6
package/test/unit/server/response_shapes.test.ts +37 -3
package/docs/rfcs/003-wait-and-synchronization-reliability +0 -232

package/docs/rfcs/004-action-verification-routing.md ADDED

@@ -0,0 +1,342 @@
+# RFC 004: Verification Routing for Local and Side-Effect Actions
+## Status
+Draft
+## Summary
+This RFC corrects a specification flaw in action verification routing where agents may treat lack of obvious UI change as a trigger to inspect network activity by default.
+The current fallback can cause unnecessary network calls during purely local UI interactions (for example sliders, pickers, toggles, text entry), creating noise and reinforcing incorrect agent behavior.
+This RFC separates:
+- action verification
+- failure diagnosis
+- backend signal inspection
+And introduces context-aware routing based on action type.
+## Motivation
+Observed agent sessions showed `get_network_activity` being invoked during local UI manipulation solely because an action produced no coarse-grained UI diff.
+Current implicit reasoning resembles:
+```text
+if uiChanged == false:
+  inspect network activity
+```
+This is overly broad.
+For many interactions, absence of obvious snapshot change does not imply backend ambiguity. It often means verification used the wrong signals.
+Examples:
+- Slider value changed but tree structure did not.
+- Picker selection updated in-place.
+- Toggle changed checked state only.
+- Text field value changed without large snapshot delta.
+- Tab or accordion state changed through selection metadata.
+In these cases network inspection is diagnostic noise, not evidence.
+## Problem Statement
+The current model conflates:
+1. Verifying whether an action succeeded.
+2. Diagnosing why an action may have failed.
+These are distinct phases.
+As a result:
+- agents overuse network inspection
+- verification costs increase
+- local-state actions are treated as ambiguous too often
+- network hints can be elevated beyond their intended role
+## Goals
+This RFC:
+- Prevents default network fallbacks for local-state actions.
+- Makes verification primarily state-driven.
+- Restricts network activity inspection to side-effect actions where ambiguity remains.
+- Refines `classify_action_outcome` decision routing.
+## Non-Goals
+This RFC does not:
+- change raw snapshot precedence (raw remains authoritative)
+- redefine expect_* ownership of verification
+- make network activity mandatory evidence
+- expand semantic hints into executable truth
+## Action Categories
+### Category A: Local-State Actions
+Actions expected to modify client-side UI state.
+Examples:
+- tap toggle
+- drag slider
+- picker selection
+- text entry
+- scrolling
+- tab switching
+- expand/collapse
+- local navigation controls
+### Category B: Side-Effect Actions
+Actions that may trigger backend or asynchronous side effects.
+Examples:
+- submit
+- save
+- sync
+- search
+- refresh
+- login
+- purchase flows
+## Action Classification Source of Truth
+## Action Type Emission (Runtime Contract)
+`action_type` MUST be emitted by the runtime layer that produces or executes actions. It is not inferred by the agent.
+There are three valid sources of truth, in order of precedence:
+### 1. Tool Schema Annotation (preferred)
+If the action originates from a tool invocation, `action_type` MUST be defined in the tool’s schema definition.
+Example:
+```json
+{
+  "name": "toggle_switch",
+  "action_type": "local_state"
+}
+```
+or
+```json
+{
+  "name": "submit_form",
+  "action_type": "side_effect"
+}
+```
+This is the canonical source.
+### 2. Handler Output (runtime execution layer)
+If tool schema does not define `action_type`, the runtime handler that executes the action MUST attach it before returning the action result.
+Example:
+```json
+{
+  "action": "click",
+  "target": "save_button",
+  "action_type": "side_effect"
+}
+```
+This is valid only when schema-level annotation is absent.
+### 3. Fallback Mapping Table (last resort, deterministic only)
+If neither schema nor handler provides `action_type`, the system MUST use a deterministic mapping table maintained by the runtime.
+This table MUST be:
+- static (no runtime inference)
+- versioned
+- explicitly defined in implementation
+Example mapping:
+| action | action_type |
+|--------|------------|
+| tap_toggle | local_state |
+| enter_text | local_state |
+| submit | side_effect |
+| refresh | side_effect |
+If an action is not in the table, it MUST default to:
+```
+side_effect
+```
+### Hard Constraint
+Agents MUST NOT infer or override `action_type` based on UI state changes, snapshot diffs, or network activity.
+### Normative Interpretation
+`action_type` is part of the execution contract, not the reasoning layer.
+Action type MUST be explicitly defined by the action schema or tool output.
+Valid values:
+- local_state
+- side_effect
+Agents MUST NOT infer action type from UI changes.
+If action type is missing, agents MUST treat it as side_effect only if backend interaction is plausible; otherwise classify as local_state.
+## Revised Verification Routing
+### For Local-State Actions
+Verification priority:
+1. Expected state assertions
+2. Refreshed snapshot comparison
+3. Element property checks
+4. Targeted expect_* verification
+Signals may include:
+- value changes
+- selected state
+- checked state
+- focus changes
+- labels/text
+- enabled/disabled transitions
+- position/state metadata
+Network activity should not be used as default fallback.
+## For Side-Effect Actions
+Verification priority:
+1. Expected UI/state verification first
+2. Retry richer local verification if ambiguous
+3. Only then optionally inspect network or log signals
+Network signals are supporting hints, not primary proof of success.
+## Decision Logic Update
+Replace implied logic:
+```text
+if uiChanged == false:
+  get_network_activity()
+```
+With:
+```text
+if expected_state_verified:
+  success
+elif action_type == local_state:
+  retry using richer state verification
+elif action_type == side_effect and ambiguity_remains:
+  optionally inspect network activity
+else:
+  inconclusive
+```
+## Definition of Ambiguity
+Ambiguity exists only when:
+- expected state cannot be evaluated from UI snapshot, AND
+- no single deterministic state predicate can be computed from UI fields
+Ambiguity does NOT include:
+- absence of visual diff
+- absence of network activity
+- lack of large UI tree changes
+## Normative Rules
+### Rule 1
+Agents MUST NOT use network activity inspection as a default fallback for local-state actions solely because coarse UI diffs are absent.
+### Rule 2
+Agents MUST prefer explicit state verification over backend diagnostics whenever the action is expected to be locally observable.
+### Rule 3
+Network activity MAY be consulted only when:
+- the action plausibly triggers backend work, and
+- local verification remains ambiguous under the defined ambiguity criteria.
+### Rule 4
+Network activity evidence MUST be treated as auxiliary signal, not authoritative proof of action success.
+## Unified Diagnostic Signals
+Network activity and log inspection are equivalent diagnostic signals.
+Both:
+- are secondary to UI state verification
+- MUST NOT be used as default fallback for local-state actions
+- follow the same escalation rules defined in this RFC
+## Impact on classify_action_outcome
+`classify_action_outcome` should be interpreted as routing logic, not a mandatory network escalation path.
+For `uiChanged=false`, action category determines next step.
+No automatic implication:
+```text
+uiChanged=false => inspect network
+```
+## Expected Benefits
+- Fewer unnecessary tool calls
+- Cleaner verification traces
+- Reduced cargo-cult network probing
+- Better behavior for local UI interactions
+- Stronger separation between verification and diagnosis
+- More reliable agent reasoning
+## Compatibility
+This is a patch-level specification correction.
+It refines routing semantics but does not break:
+- existing expect_* semantics
+- snapshot response shape
+- raw-over-semantic precedence
+- action execution model
+## Implementation Notes
+Follow-up work may include:
+- prompt updates
+- regression examples for sliders/toggles/pickers
+- protocol examples showing correct routing
+- telemetry on reduced unnecessary network inspections
+## Open Questions
+Questions for review:
+1. Should action category be explicitly emitted as runtime metadata, or is heuristic inference acceptable only within the fallback mapping layer defined in the Action Type Emission contract?
+2. Should side-effect actions permit optional log inspection alongside network hints?
+3. Should local-state verification examples be added to core spec or examples appendix?
+## Decision Requested
+Adopt verification routing based on action type and remove implicit default escalation from missing UI diffs to network inspection.
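
The precedence rules and the revised decision logic in RFC 004 above can be sketched in TypeScript. This is a minimal illustration, not code from the package: `resolveActionType`, `routeVerification`, the `Route` labels, and the input shapes are hypothetical names chosen for this example. Unlisted actions follow the fallback-table rule and default to `side_effect`; the differently worded missing-type case under "Normative Interpretation" is not modeled here.

```typescript
// Sketch only: every identifier below is an illustrative assumption,
// not an identifier taken from mobile-debug-mcp.
type ActionType = "local_state" | "side_effect";

// Source 3 in the RFC: a static fallback table (versioning omitted here).
const FALLBACK_ACTION_TYPES: ReadonlyMap<string, ActionType> = new Map<string, ActionType>([
  ["tap_toggle", "local_state"],
  ["enter_text", "local_state"],
  ["submit", "side_effect"],
  ["refresh", "side_effect"],
]);

interface ActionTypeSources {
  action: string;
  schemaActionType?: ActionType;  // source 1: tool schema annotation (preferred)
  handlerActionType?: ActionType; // source 2: handler output
}

// Resolve action_type in the RFC's order of precedence.
function resolveActionType(src: ActionTypeSources): ActionType {
  if (src.schemaActionType !== undefined) return src.schemaActionType;
  if (src.handlerActionType !== undefined) return src.handlerActionType;
  // Actions missing from the table default to side_effect.
  return FALLBACK_ACTION_TYPES.get(src.action) ?? "side_effect";
}

type Route =
  | "success"
  | "retry_richer_state_verification"
  | "optionally_inspect_network"
  | "inconclusive";

interface VerificationInput {
  actionType: ActionType;
  expectedStateVerified: boolean;
  ambiguityRemains: boolean;
}

// Mirrors the replacement pseudocode in "Decision Logic Update".
function routeVerification(input: VerificationInput): Route {
  if (input.expectedStateVerified) return "success";
  if (input.actionType === "local_state") return "retry_richer_state_verification";
  // Only side_effect actions reach this point.
  if (input.ambiguityRemains) return "optionally_inspect_network";
  return "inconclusive";
}
```

Note that a missing UI diff never appears as an input: under this sketch, network inspection is reachable only for `side_effect` actions whose ambiguity has already been established by the RFC's criteria.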

package/docs/specs/mcp-tooling-spec-v1.md CHANGED

@@ -41,7 +41,7 @@ Outcome-specific guidance:
 - visible navigation expected -> `wait_for_screen_change` (optional) -> `expect_screen`
 - local UI change expected -> `wait_for_ui` (optional) -> `expect_element_visible`
 - readable element state expected -> `wait_for_ui` (optional) -> `expect_state`
-- backend/API activity expected without a visible UI change -> compare `get_screen_fingerprint` before/after, then call `get_network_activity` immediately after the action and `classify_action_outcome` with the observed requests
+- backend/API activity expected without a visible UI change -> compare `get_screen_fingerprint` before/after, then call `classify_action_outcome` with the runtime `action_type`; collect `get_network_activity` only if the result remains ambiguous
 For backend/API activity, `wait_for_screen_change` is not the right verification tool unless a visible transition is also expected.
@@ -151,6 +151,7 @@ Examples:
 - `wait_for_ui`
 - `wait_for_screen_change`
+- `wait_for_ui_change`
 ### 6.2 Rules
@@ -239,6 +240,8 @@ Raw layer contents include:
 - UI hierarchy or accessibility tree
 - normalized readable element state where exposed by the platform
 - platform-native identity hints such as stable identifiers, roles, and test tags
+- snapshot metadata such as `snapshot_revision` and `captured_at_ms`
+- `loading_state` when a reliable loading signal is detectable
 - screenshot when available
 - element-level attributes
 - logs and fingerprint/activity observations
@@ -291,11 +294,11 @@ Tool: `classify_action_outcome`
 Rules:
-- MAY use UI, network, and log signals
+- MAY use UI, action, network, and log signals
 - MUST be deterministic
 - MUST NOT replace `expect_*` tools
 - MUST be treated as a supplementary signal only
-- SHOULD be used with `get_network_activity` when the expected outcome is backend/API activity without a visible UI change
+- SHOULD be used with `get_network_activity` only when the outcome is still ambiguous after routing by `action_type`
 It is not a verification mechanism.
@@ -305,10 +308,15 @@ Canonical pattern:
 `wait_for_ui -> tap_element -> wait_for_screen_change (optional) -> expect_screen`
+For in-place UI mutations, agents SHOULD prefer:
+`wait_for_ui_change -> expect_element_visible / expect_state`
 Interpretation:
 - `tap_element.success` = executed
 - `wait_for_screen_change.success` = UI changed
+- `wait_for_ui_change.success` = in-place UI mutation observed and stable
 - `expect_screen.success` = correct outcome verified
 ## 12. Known Deviations
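
The outcome-specific guidance in the mcp-tooling-spec-v1.md hunks above (not the "Known Deviations" section) can be read as a lookup table from expected outcome to tools. The sketch below encodes it that way, as one possible reading. The tool names come from the spec text; the `ExpectedOutcome` labels, the `OutcomeGuidance` shape, and `guidanceFor` are hypothetical and exist only for this illustration.

```typescript
// Sketch only: the type names, field names, and outcome labels are
// illustrative assumptions. Tool names are quoted from the spec diff.
type ExpectedOutcome =
  | "visible_navigation"
  | "local_ui_change"
  | "readable_element_state"
  | "in_place_ui_mutation"
  | "backend_activity_without_ui_change";

interface OutcomeGuidance {
  waitTool: string | null;            // wait_* tool to use, if any
  waitIsOptional: boolean;            // the spec marks some waits "(optional)"
  followUpTools: readonly string[];   // tools to call after the action or wait
  onlyIfAmbiguous: readonly string[]; // collected only when the result stays ambiguous
}

const GUIDANCE: Readonly<Record<ExpectedOutcome, OutcomeGuidance>> = {
  visible_navigation: {
    waitTool: "wait_for_screen_change",
    waitIsOptional: true,
    followUpTools: ["expect_screen"],
    onlyIfAmbiguous: [],
  },
  local_ui_change: {
    waitTool: "wait_for_ui",
    waitIsOptional: true,
    followUpTools: ["expect_element_visible"],
    onlyIfAmbiguous: [],
  },
  readable_element_state: {
    waitTool: "wait_for_ui",
    waitIsOptional: true,
    followUpTools: ["expect_state"],
    onlyIfAmbiguous: [],
  },
  in_place_ui_mutation: {
    waitTool: "wait_for_ui_change",
    waitIsOptional: false,
    followUpTools: ["expect_element_visible", "expect_state"],
    onlyIfAmbiguous: [],
  },
  backend_activity_without_ui_change: {
    // No wait tool: wait_for_screen_change is not the right tool for this case.
    waitTool: null,
    waitIsOptional: false,
    followUpTools: ["get_screen_fingerprint", "classify_action_outcome"],
    onlyIfAmbiguous: ["get_network_activity"],
  },
};

function guidanceFor(outcome: ExpectedOutcome): OutcomeGuidance {
  return GUIDANCE[outcome];
}
```

For the in-place mutation row, the spec lists `expect_element_visible` and `expect_state` as alternatives, so a caller would pick whichever matches the known final state rather than calling both.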

package/docs/tools/interact.md CHANGED

@@ -17,6 +17,7 @@ Important:
 - `wait_for_*` tools must not be used as the final verification of action success when an applicable `expect_*` tool exists.
 - action tools report execution success, not outcome correctness.
+- `classify_action_outcome` should receive the runtime `action_type` when you want routing to distinguish local-state and side-effect actions.
 ## tap / swipe / type_text / press_back
@@ -54,10 +55,11 @@ Preferred verification:
 - navigation outcome known -> `expect_screen`
 - local UI change known -> `expect_element_visible`
 - readable element state known -> `expect_state`
-- backend/API activity expected -> `classify_action_outcome` + `get_network_activity`
+- backend/API activity expected -> `classify_action_outcome` + optional `get_network_activity` if the UI signal remains ambiguous
-Use `wait_for_screen_change` only when a visible transition is the expected outcome. If a button should trigger an API request but the screen should stay the same, rely on network activity and classification instead.
-For backend-only actions, prefer comparing `get_screen_fingerprint` before/after and call `get_network_activity` immediately after the action; do not wait on `wait_for_screen_change` if no visible transition is expected.
+Use `wait_for_screen_change` only when a visible transition is the expected outcome. If a button should trigger an API request but the screen should stay the same, rely on `action_type` plus classification first.
+For backend-only actions, prefer comparing `get_screen_fingerprint` before/after and collect `get_network_activity` immediately after the action only if the result is still ambiguous; do not wait on `wait_for_screen_change` if no visible transition is expected.
+Use `wait_for_ui_change` when the screen stays in place but visible text or element state should change.
 ---
@@ -148,6 +150,26 @@ Notes:
 ---
+## wait_for_ui_change
+Purpose:
+- detect a stable in-place UI mutation without naming a target element first
+Capabilities:
+- waits for hierarchy, text, or state deltas
+- uses snapshot revision metadata when available
+- confirms the change remains stable before returning success
+Guidance:
+- prefer `wait_for_screen_change` for navigation
+- prefer `wait_for_ui_change` for in-place updates and recomposition-style changes
+- follow with `expect_*` when the expected final state is known
+---
 ## find_element
 Locate a UI element on the current screen using semantic matching and return an actionable element descriptor.
@@ -486,17 +508,18 @@ Notes:
 ## classify_action_outcome + get_network_activity
-Use this pair when the action is expected to trigger network/backend work and the screen may not visibly change.
+Use this pair when the action may trigger network/backend work and the screen may not visibly change.
 Pattern:
 1. perform the action
 2. call `classify_action_outcome` with `uiChanged` from `wait_for_screen_change` or a screen fingerprint comparison
-3. if the classifier asks for it, call `get_network_activity`
-4. call `classify_action_outcome` again with `networkRequests`
+3. pass the runtime `action_type` value as `actionType`
+4. collect `get_network_activity` only if the action is side-effect oriented and the UI signal remains ambiguous
+5. call `classify_action_outcome` again with `networkRequests` if you collected them
 Guidance:
 - `uiChanged=true` or `expectedElementVisible=true` means the action outcome is already verified
-- `nextAction="call_get_network_activity"` means the UI signal was inconclusive and the agent should inspect network activity
-- if network requests succeed but the UI stays unchanged, treat the outcome as a backend/API result rather than a screen transition
+- local-state actions should prefer refreshed snapshots, `expect_state`, or `expect_element_visible` over default network inspection
+- network activity is auxiliary evidence, not mandatory proof
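
The `wait_for_ui_change` section added earlier in this file says the tool uses snapshot revision metadata and confirms that a change remains stable before returning success. One way such a check could work is sketched below as a pure function over snapshots that have already been polled. It is not the package's implementation: `detectStableUiChange`, `SnapshotSample`, and `requiredStableSamples` are hypothetical, and the sketch assumes that `snapshot_revision` changes only when the captured UI content changes.

```typescript
// Sketch only: names and the stability rule are illustrative assumptions.
// Field names follow the snapshot metadata shown in the observe.md example.
interface SnapshotSample {
  snapshot_revision: number;
  captured_at_ms: number;
  loading_state?: { active: boolean };
}

interface StableChangeResult {
  changed: boolean;        // a revision different from the baseline was seen
  stable: boolean;         // that revision then repeated enough times
  revision: number | null; // last candidate revision, if any
}

function detectStableUiChange(
  baselineRevision: number,
  samples: readonly SnapshotSample[],
  requiredStableSamples = 2,
): StableChangeResult {
  let candidate: number | null = null;
  let streak = 0;

  for (const sample of samples) {
    if (sample.snapshot_revision === baselineRevision) {
      // Back at (or still on) the baseline: nothing has changed yet.
      candidate = null;
      streak = 0;
      continue;
    }
    if (sample.loading_state?.active === true) {
      // Changed, but a loading signal is active, so do not count it as settled.
      candidate = sample.snapshot_revision;
      streak = 0;
      continue;
    }
    if (sample.snapshot_revision === candidate) {
      streak += 1;
    } else {
      candidate = sample.snapshot_revision;
      streak = 1;
    }
    if (streak >= requiredStableSamples) {
      return { changed: true, stable: true, revision: candidate };
    }
  }
  return { changed: candidate !== null, stable: false, revision: candidate };
}
```

A real wait loop would poll asynchronously with a timeout and feed samples in as they arrive. Keeping the decision pure, as here, makes the stability rule easy to unit test. Per the guidance above, a stable change is still only a wait result, so an `expect_*` call should follow when the final state is known.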

package/docs/tools/observe.md CHANGED

@@ -83,13 +83,14 @@ Input:
 Response (example):
 ```json
-{ "device": { "platform": "android", "id": "emulator-5554" }, "screen": "", "resolution": { "width": 1080, "height": 2400 }, "elements": [ { "text": "Sign in", "type": "android.widget.Button", "resourceId": "com.example:id/signin", "clickable": true, "bounds": [0,0,100,50], "state": { "enabled": true }, "stable_id": "com.example:id/signin", "role": "button", "test_tag": "com.example:id/signin", "selector": { "value": "com.example:id/signin", "confidence": { "score": 1, "reason": "resource_id" } }, "semantic": { "is_clickable": true, "is_container": false } } ] }
+{ "device": { "platform": "android", "id": "emulator-5554" }, "screen": "", "resolution": { "width": 1080, "height": 2400 }, "snapshot_revision": 12, "captured_at_ms": 1710000000123, "loading_state": { "active": true, "signal": "spinner", "source": "ui_tree" }, "elements": [ { "text": "Sign in", "type": "android.widget.Button", "resourceId": "com.example:id/signin", "clickable": true, "bounds": [0,0,100,50], "state": { "enabled": true }, "stable_id": "com.example:id/signin", "role": "button", "test_tag": "com.example:id/signin", "selector": { "value": "com.example:id/signin", "confidence": { "score": 1, "reason": "resource_id" } }, "semantic": { "is_clickable": true, "is_container": false } } ] }
 ```
 Notes:
 - Useful for inspection, selector development, and fallback debugging.
 - Elements may include a normalized `state` object when the platform exposes readable state such as checked, selected, focused, expanded, text input, or slider values.
 - Elements may also include platform-native identity hints such as `stable_id`, `role`, `test_tag`, `selector`, and `semantic`.
+- The tree response may include `snapshot_revision`, `captured_at_ms`, and `loading_state` when a reliable signal is available.
 - Prefer `wait_for_ui` for deterministic element resolution in interactive flows.
 ---
@@ -136,7 +137,8 @@ Behavior:
 - Fast by default: does not wait for new logs and avoids long blocking operations.
 - Returns a dual-layer payload:
   - `raw` is authoritative and contains the underlying observation data unchanged.
-  - `semantic` is optional, derived from `raw`, and intended for planning only.
+- `semantic` is optional, derived from `raw`, and intended for planning only.
+- `raw` now includes `snapshot_revision`, `captured_at_ms`, and `loading_state` when detectable.
 Response (example):

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "mobile-debug-mcp",
-  "version": "0.25.1",
+  "version": "0.26.1",
   "description": "MCP server for mobile app debugging (Android + iOS), with focus on security and reliability",
   "type": "module",
   "bin": {

package/skills/rfc-review/SKILL.md ADDED

@@ -0,0 +1,52 @@
+# RFC Review skill
+name: rfc-review
+version: 0.1.1
+summary: Reusable workflow for reviewing RFCs/specs in this repository with a consistent readiness rubric and output template.
+# Purpose
+Help an agent review an RFC for clarity, implementation readiness, and alignment with the current codebase. Use a common template so reviews stay consistent across documents and reviewers.
+# Activation conditions
+Activate when an agent needs to:
+- review a new or revised RFC
+- assess whether an RFC is implementation-ready
+- identify whether feedback is an RFC issue or an implementation issue
+- compare a spec against the current `src/` contract surface and docs
+# Surface area (actions)
+- locate-rfc
+- compare-against-code
+- assess-contract-completeness
+- classify-gaps
+- produce-review
+# Core guidance
+1. Read the RFC first, then compare it against the relevant code, docs, and tests.
+2. Separate **spec gaps** from **implementation gaps**.
+3. Check for: problem clarity, scope boundaries, explicit contracts, acceptance criteria, non-goals, and consistency with existing behavior.
+4. Prefer precise feedback that names the missing contract, unclear rule, or inconsistent behavior.
+5. Use the shared review template in `references/rfc-review-template.md` for the final output.
+6. If the RFC is not ready, say exactly what must be clarified before implementation can start.
+7. Classify each blocker as either a **spec gap** or an **implementation contract gap** and stop at that boundary.
+# Inputs & outputs
+- review-rfc(input: { rfcPath, relatedPaths?, focusAreas? }) -> { verdict, risks, specGaps, implementationGaps, recommendations }
+- compare-against-code(input: { rfcPath, codePaths[] }) -> { matches, mismatches, notes }
+- produce-review(input: { rfcPath, findings[] }) -> { summary, verdict, checklist, nextStep }
+# Failure handling
+- If the RFC file is missing, stop and report the missing path explicitly.
+- If the RFC is ambiguous, classify each concern as either "spec" or "implementation" instead of blending them.
+- If the review cannot be grounded in the current repo, state that the RFC is not reviewable yet.
+# Progressive disclosure
+- Keep this file short.
+- Load the reference template only when writing the final review.
+# References
+- `references/rfc-review-template.md` — standard review format and verdict rubric
+- `references/rfc-review-checklist.md` — questions to apply while reviewing an RFC
+# License
+Same as repository (MIT).

package/skills/rfc-review/references/rfc-review-checklist.md ADDED

@@ -0,0 +1,12 @@
+# RFC Review Checklist
+Ask these questions while reviewing:
+1. Is the problem statement specific and grounded in current failures?
+2. Are non-goals explicit?
+3. Are contracts concrete enough to implement?
+4. Are acceptance criteria testable?
+5. Does the RFC define the source of truth for new fields or behaviors?
+6. Does it match existing code paths and public tool surfaces?
+7. Can each open concern be classified as a spec issue or an implementation issue?
+8. Is the RFC ready to implement without further interpretation?

package/skills/rfc-review/references/rfc-review-template.md ADDED

@@ -0,0 +1,28 @@
+# RFC Review Template
+Use this structure for every RFC review:
+## Verdict
+- Ready / Needs clarification / Needs implementation contract / Not ready
+## Summary
+- One short paragraph on the RFC's current quality.
+## What is good
+- List the strongest parts of the RFC.
+## Issues
+For each issue, include:
+- **Type:** spec / implementation / implementation contract / doc
+- **Severity:** low / medium / high
+- **Why it matters:** one sentence
+- **Fix:** exact change needed
+## Missing contract surfaces
+- List any API shapes, response fields, state transitions, or invariants that are still undefined.
+## Codebase alignment
+- Note whether the RFC matches current `src/`, docs, and tests.
+## Next step
+- State the smallest next action needed to move the RFC forward.