mobile-debug-mcp 0.26.4 → 0.27.0

Files changed (26)

package/dist/interact/index.js +392 -192
package/dist/observe/ios.js +47 -3
package/dist/server/common.js +39 -0
package/dist/server-core.js +1 -1
package/dist/utils/android/utils.js +35 -3
package/docs/CHANGELOG.md +6 -0
package/docs/ROADMAP.md +114 -16
package/docs/rfcs/009-semantic-control-modeling-for-custom-and-composite-controls.md +238 -0
package/docs/rfcs/010-verification-stabilization-and-temporal-convergence.md +265 -0
package/docs/rfcs/011-recovery-and-replanning-for-failed-or-ambiguous-interaction-flows.md +321 -0
package/docs/rfcs/011.1-recovery-contract-types-and-runtime-wiring-spec.md +253 -0
package/docs/rfcs/012.md +203 -0
package/docs/specs/mcp-tooling-spec-v1.md +34 -0
package/docs/tools/interact.md +10 -0
package/package.json +1 -1
package/src/interact/index.ts +433 -194
package/src/observe/ios.ts +42 -3
package/src/server/common.ts +44 -1
package/src/server-core.ts +1 -1
package/src/types.ts +41 -1
package/src/utils/android/utils.ts +30 -3
package/test/unit/interact/adjust_control.test.ts +77 -1
package/test/unit/interact/verification_stabilization.test.ts +94 -0
package/test/unit/observe/find_element.test.ts +46 -0
package/test/unit/observe/state_extraction.test.ts +65 -2
package/test/unit/server/common.test.ts +36 -1

package/docs/rfcs/010-verification-stabilization-and-temporal-convergence.md ADDED

@@ -0,0 +1,265 @@
+# RFC 010 — Verification Stabilization and Temporal Convergence
+## 1. Summary
+This RFC defines a verification stabilization layer that ensures UI state transitions are not misclassified due to timing instability, transient UI states, or stale snapshots.
+It introduces temporal semantics into verification so that readiness and state checks are based on convergence over time, not a single snapshot.
+---
+## 2. Problem Statement
+Current verification behavior is snapshot-based and may produce false-negative failures when UI state is in transition.
+Observed issues include:
+- readiness checks timing out even though UI converges shortly after
+- stale snapshots being treated as authoritative state
+- transient UI states causing premature failure classification
+- mismatch between UI convergence and verification success
+These issues lead to unnecessary retries, incorrect failure classification, and degraded automation reliability.
+---
+## 3. Goals
+This RFC introduces a temporal verification model that MUST:
+- reduce false-negative readiness failures
+- ensure verification reflects stable UI convergence
+- introduce bounded recheck before failure
+- debounce transient mismatches
+- maintain deterministic verification behavior
+---
+## 4. Non-Goals
+This RFC does NOT define:
+- recovery or replanning strategies (covered by a later RFC)
+- probabilistic verification
+- ML-based state inference
+- changes to action execution semantics
+Verification remains deterministic and grounded in observable UI state.
+---
+## 5. Runtime Ownership and Integration
+This RFC applies to existing verification surfaces:
+- expect_* handlers (e.g. expect_state)
+- readiness checks in wait_for_ui_element
+- post-action verification in src/interact
+It augments these surfaces with temporal semantics; it does not replace them.
+### 5.1 Ownership and Composition with Existing Logic
+This RFC refines existing behavior rather than introducing a parallel mechanism.
+- `wait_for_ui_element` (and underlying `waitForUICore`) owns **readiness stabilization**.
+- `expect_*` handlers (e.g. `expect_state`) own **state verification stabilization**.
+- `src/interact` owns **post-action verification application** of these rules.
+Composition rules:
+- `wait_for_ui_element` MUST apply stabilization for presence/readiness before returning success or failure.
+- `expect_*` MUST apply stabilization for state/value assertions.
+- If both are used in sequence, `wait_for_ui_element` completes first, then `expect_*` applies its own stabilization.
+- Stabilization MUST NOT be duplicated across layers for the same check.
+---
+## 6. Temporal Verification Model
+Verification MUST consider state over time, not a single observation.
+### 6.1 Stabilization Window
+Verification SHOULD use a bounded observation window before declaring failure.
+Within this window:
+- multiple UI reads MAY be performed
+- transient mismatches MUST NOT immediately trigger failure
+### 6.2 Verify-Until-Stable
+Verification SHOULD require state to be stable across consecutive observations before success is confirmed.
+Example:
+- state must match expected condition for N consecutive reads
+### 6.3 Debounce Semantics
+Transient mismatches SHOULD be debounced.
+Short-lived mismatches within the stabilization window MUST NOT be treated as terminal failure.
+### 6.4 Deterministic Defaults (Required)
+Implementations MUST use bounded defaults unless explicitly overridden:
+- `stabilization_window_ms`: 1000ms (range: 500–1500ms)
+- `stable_observation_count`: 2 consecutive matching reads
+- `max_recheck_attempts`: 3
+- `min_read_interval_ms`: 100–200ms between reads
+These values MUST be configurable but bounded to prevent unbounded waits.
+---
+## 6.1 Reference Stabilization Algorithm
+For a given verification predicate `P(snapshot)`:
+1. Start timer `t0`.
+2. Initialize `stable_count = 0`, `attempts = 0`.
+3. Loop until `now - t0 > stabilization_window_ms` OR `stable_count >= stable_observation_count`:
+   - Read fresh snapshot `S`.
+   - If `P(S)` is true:
+       - `stable_count += 1`
+     Else:
+       - `stable_count = 0`
+   - `attempts += 1`
+   - Sleep `min_read_interval_ms`.
+4. If `stable_count >= stable_observation_count`: SUCCESS
+5. Else if `attempts < max_recheck_attempts`:
+   - Perform one additional fresh read and re-evaluate once.
+6. Else: FAILURE
+Notes:
+- Implementations MUST ensure at least one fresh read occurs before failure.
+- Debounce is achieved via resetting `stable_count` on mismatch.
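The reference algorithm above can be sketched in TypeScript as follows. This is a minimal illustration, not the package's implementation: `verifyUntilStable`, `Clock`, `StabilizationOutcome`, and `DEFAULT_STABILIZATION` are hypothetical names, and only the configuration keys and default values come from Section 6.4. The clock is injected and synchronous so the loop is easy to test. Step 5 is read here as "the extra read counts toward `stable_count`", which is one possible interpretation of "re-evaluate once".

```ts
// Hypothetical sketch of the RFC 010 reference stabilization loop.
// Only the config keys and defaults are taken from the RFC text.
interface StabilizationConfig {
  stabilization_window_ms: number;
  stable_observation_count: number;
  max_recheck_attempts: number;
  min_read_interval_ms: number;
}

// Injected time source (assumption): a real runtime would await a timer.
interface Clock {
  now(): number;
  sleep(ms: number): void;
}

const DEFAULT_STABILIZATION: StabilizationConfig = {
  stabilization_window_ms: 1000,
  stable_observation_count: 2,
  max_recheck_attempts: 3,
  min_read_interval_ms: 100,
};

interface StabilizationOutcome {
  status: "SUCCESS" | "FAILURE";
  attempts: number;
}

function verifyUntilStable<S>(
  readFresh: () => S,
  predicate: (snapshot: S) => boolean,
  clock: Clock,
  cfg: StabilizationConfig = DEFAULT_STABILIZATION,
): StabilizationOutcome {
  const t0 = clock.now();
  let stableCount = 0;
  let attempts = 0;

  // Step 3: loop until the window is exhausted or enough consecutive matches are seen.
  while (
    clock.now() - t0 <= cfg.stabilization_window_ms &&
    stableCount < cfg.stable_observation_count
  ) {
    // A mismatch resets the counter, which is what debounces transient states.
    stableCount = predicate(readFresh()) ? stableCount + 1 : 0;
    attempts += 1;
    clock.sleep(cfg.min_read_interval_ms);
  }

  // Step 4.
  if (stableCount >= cfg.stable_observation_count) {
    return { status: "SUCCESS", attempts };
  }

  // Step 5: one additional fresh read if the recheck budget allows it.
  if (attempts < cfg.max_recheck_attempts) {
    stableCount = predicate(readFresh()) ? stableCount + 1 : 0;
    attempts += 1;
    if (stableCount >= cfg.stable_observation_count) {
      return { status: "SUCCESS", attempts };
    }
  }

  // Step 6.
  return { status: "FAILURE", attempts };
}
```

Because the loop always performs at least one read before it can exit, the "at least one fresh read before failure" note holds by construction in this sketch.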
+---
+## 7. Snapshot Freshness
+Verification MUST account for snapshot freshness.
+### 7.1 Freshness Constraints
+- snapshots older than `snapshot_stale_threshold_ms` MUST be considered stale (default: 500ms)
+- stale snapshots MUST NOT be used as final verification evidence and MUST trigger a fresh read
+### 7.2 Re-read Requirement
+Before declaring failure, the system MUST attempt at least one fresh UI read within the stabilization window.
+### 7.3 Freshness Defaults
+- `snapshot_stale_threshold_ms`: 500ms (range: 300–800ms)
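A small sketch of the freshness rule, assuming each snapshot carries a capture timestamp. `TimedSnapshot`, `captured_at_ms`, `isStale`, and `evidenceSnapshot` are hypothetical names used for illustration; only the 500ms default comes from the RFC. "Older than" is read as a strict comparison.

```ts
// Hypothetical snapshot wrapper; the RFC does not define this shape.
interface TimedSnapshot<T> {
  captured_at_ms: number;
  payload: T;
}

// Default from RFC 010 Section 7.3.
const SNAPSHOT_STALE_THRESHOLD_MS = 500;

// A snapshot is stale when it is strictly older than the threshold.
function isStale<T>(
  snapshot: TimedSnapshot<T>,
  nowMs: number,
  thresholdMs: number = SNAPSHOT_STALE_THRESHOLD_MS,
): boolean {
  return nowMs - snapshot.captured_at_ms > thresholdMs;
}

// Returns a snapshot that may be used as final verification evidence:
// the cached one if still fresh, otherwise the result of a fresh read.
function evidenceSnapshot<T>(
  cached: TimedSnapshot<T>,
  readFresh: () => TimedSnapshot<T>,
  nowMs: number,
  thresholdMs: number = SNAPSHOT_STALE_THRESHOLD_MS,
): TimedSnapshot<T> {
  return isStale(cached, nowMs, thresholdMs) ? readFresh() : cached;
}
```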
+---
+## 8. Runtime Failure Code Mapping
+Existing runtime failure signals MUST map into RFC 010 failure categories.
+| Runtime Code | RFC 010 Category |
+|--------------|------------------|
+| ELEMENT_NOT_FOUND | Target Resolution Failure |
+| STALE_REFERENCE | Target Resolution Failure |
+| AMBIGUOUS_TARGET | Target Resolution Failure |
+| TIMEOUT | Execution Failure |
+| ACTION_REJECTED | Execution Failure |
+| VERIFICATION_FAILED | Verification Failure |
+| EXPECT_STATE_MISMATCH | Verification Failure |
+| CONTROL_CONVERGENCE_FAILED | Control Convergence Failure |
+| SEMANTIC_MISMATCH | Semantic Mismatch Failure |
+| UNKNOWN | Execution Failure (default fallback) |
+This mapping MUST be deterministic, exhaustive, and versioned with the runtime.
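The table above could be encoded as a lookup with an explicit fallback, so that unrecognized codes are classified deterministically. This is a sketch only: `FailureCategory`, `CATEGORY_BY_RUNTIME_CODE`, and `classifyRuntimeCode` are hypothetical names, while the codes and categories are copied from the table.

```ts
type FailureCategory =
  | "Target Resolution Failure"
  | "Execution Failure"
  | "Verification Failure"
  | "Control Convergence Failure"
  | "Semantic Mismatch Failure";

// Rows copied from the mapping table; UNKNOWN is handled by the fallback below.
const CATEGORY_BY_RUNTIME_CODE = new Map<string, FailureCategory>([
  ["ELEMENT_NOT_FOUND", "Target Resolution Failure"],
  ["STALE_REFERENCE", "Target Resolution Failure"],
  ["AMBIGUOUS_TARGET", "Target Resolution Failure"],
  ["TIMEOUT", "Execution Failure"],
  ["ACTION_REJECTED", "Execution Failure"],
  ["VERIFICATION_FAILED", "Verification Failure"],
  ["EXPECT_STATE_MISMATCH", "Verification Failure"],
  ["CONTROL_CONVERGENCE_FAILED", "Control Convergence Failure"],
  ["SEMANTIC_MISMATCH", "Semantic Mismatch Failure"],
]);

// Any code not listed (including UNKNOWN) falls back to Execution Failure,
// which keeps the mapping exhaustive.
function classifyRuntimeCode(code: string): FailureCategory {
  return CATEGORY_BY_RUNTIME_CODE.get(code) ?? "Execution Failure";
}
```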
+### 8.1 Failure Gating Rules
+Failure MUST only be emitted when:
+- stabilization window is exhausted
+- fresh snapshot verification still fails
+Transient mismatches SHOULD NOT be classified as:
+- TIMEOUT
+- VERIFICATION_FAILED
+until stabilization logic has completed.
+- FAILURE MUST NOT be emitted if `stable_observation_count` has not been attempted within the stabilization window.
+- FAILURE MUST NOT be emitted without at least one fresh read within `snapshot_stale_threshold_ms`.
+- TIMEOUT MUST correspond to exhaustion of `stabilization_window_ms`, not a single read failure.
+---
+## 9. Integration with RFC 005 (Verification Correctness)
+RFC 005 defines what correctness means.
+RFC 010 defines when correctness can be confidently evaluated.
+RFC 010 augments RFC 005 by introducing temporal convergence requirements before asserting success or failure.
+---
+## 10. Integration with RFC 006 (Execution Layer)
+Post-action verification in src/interact MUST apply stabilization logic before returning failure.
+Execution MUST NOT prematurely surface verification failure without applying temporal checks defined in this RFC.
+`src/interact` MUST wrap post-action verification with the reference stabilization algorithm. It MUST pass through configuration (window, counts) and MUST NOT short-circuit on first mismatch.
+---
+## 11. Integration with RFC 011.1 (Recovery Contract)
+Verification stabilization reduces false-positive failure signals that would otherwise trigger downstream recovery mechanisms (defined in a companion RFC).
+---
+## 13. Output Behavior (Progressive Extension)
+Future implementations MAY expose additional metadata such as:
+```ts
+interface VerificationMetadata {
+  stabilization_attempts?: number;
+  stabilization_window_ms?: number;
+  stable_observation_count?: number;
+  snapshot_freshness_ms?: number;
+}
+```
+These fields are optional and for observability only.
+---
+## 14. Failure Modes
+Verification stabilization MAY fail due to:
+- UI never converging to expected state
+- repeated oscillation of UI state
+- persistent stale snapshot conditions
+In these cases, failure MUST be emitted after stabilization window is exhausted.
+---
+## 15. Success Metrics
+- reduced false-negative readiness failures
+- higher first-pass verification success
+- lower premature timeout rates
+- improved reliability of wait and readiness checks
+---
+## 16. Summary
+This RFC introduces temporal stabilization into verification, ensuring that UI state is evaluated based on convergence over time rather than single snapshots. It improves reliability by eliminating transient mismatches and stale-state errors without introducing probabilistic behavior.

package/docs/rfcs/011-recovery-and-replanning-for-failed-or-ambiguous-interaction-flows.md ADDED

@@ -0,0 +1,321 @@
+# RFC 011 — Recovery and Replanning for Failed or Ambiguous Interaction Flows
+## 1. Summary
+This RFC defines a structured recovery and replanning model for UI interaction failures, enabling the system to respond to execution uncertainty with bounded, deterministic recovery strategies.
+It extends the interaction stack defined in RFCs 005–009 by introducing explicit failure classification, recovery policy selection, and bounded replanning of interaction sequences.
+---
+## 2. Problem Statement
+Even with reliable execution primitives (RFC 005–009), UI interactions can fail due to:
+- incorrect or stale target resolution
+- state drift between observation and execution
+- ambiguous or partial UI snapshots
+- control convergence failures (RFC 008)
+- semantic mismatches in custom/Compose controls (RFC 009)
+Currently, failure handling is implicit and ad hoc, often resulting in:
+- repeated identical retries
+- stalled flows with no recovery path
+- loss of interaction context
+- inability to switch strategy after failure
+This leads to brittle automation behavior even when core primitives are correct.
+---
+## 3. Goals
+This RFC introduces a structured recovery system that MUST:
+- classify failures into distinct categories
+- select appropriate recovery strategies based on failure type
+- enable bounded replanning of interaction flows
+- prevent infinite retry loops
+- preserve interaction context across recovery attempts
+- improve robustness under UI drift or ambiguity
+---
+## 4. Non-Goals
+This RFC does NOT define:
+- new UI interaction primitives (covered in RFC 006–008)
+- new target resolution mechanisms (RFC 007)
+- new control semantics (RFC 008–009)
+- general autonomous planning system
+- ML-based decision making or probabilistic policy learning
+Recovery is deterministic and rule-based in this version.
+---
+## 5. Runtime Ownership and Integration
+Recovery is a cross-layer concern with explicit ownership:
+### 5.1 Server Layer (src/server)
+- Detects failure conditions from action execution results
+- Emits normalized failure objects
+- Applies initial failure classification mapping
+### 5.2 Interact Layer (src/interact)
+- Executes recovery strategies
+- Performs re-resolution, retry, and step-back operations where supported
+- Maintains bounded retry loops
+### 5.3 Shared Contract Layer
+- Defines failure schema
+- Defines recovery state machine transitions
+Recovery is NOT owned by a single layer; it is a coordinated contract between server and interact.
+---
+## 5. Failure Classification Model
+All interaction failures MUST be classified into one of the following categories:
+### 5.1 Target Resolution Failure
+- element not found
+- ambiguous or multiple matches
+- stale UI tree snapshot
+### 5.2 Execution Failure
+- action could not be dispatched
+- runtime rejection of interaction
+- invalid gesture or control interaction
+### 5.3 Verification Failure
+- action executed but expected state not observed
+- expect_state mismatch (RFC 005)
+### 5.4 Control Convergence Failure
+- adjustable control failed to reach target state (RFC 008)
+### 5.5 Semantic Mismatch Failure
+- control semantics inferred incorrectly (RFC 009)
+---
+## 6. Runtime Failure Code Mapping
+Existing runtime failure signals MUST map into RFC 011 failure categories.
+| Runtime Code | RFC 011 Category |
+|--------------|------------------|
+| ELEMENT_NOT_FOUND | Target Resolution Failure |
+| STALE_REFERENCE | Target Resolution Failure |
+| AMBIGUOUS_TARGET | Target Resolution Failure |
+| TIMEOUT | Execution Failure |
+| ACTION_REJECTED | Execution Failure |
+| VERIFICATION_FAILED | Verification Failure |
+| EXPECT_STATE_MISMATCH | Verification Failure |
+| CONTROL_CONVERGENCE_FAILED | Control Convergence Failure |
+| UNKNOWN | Execution Failure (default fallback) |
+This mapping MUST be deterministic and versioned with the runtime.
+---
+## 6. Recovery Strategy Model
+Each failure type MUST map to a bounded set of recovery strategies:
+### 6.1 Re-resolve Strategy
+Re-run target resolution (RFC 007) with updated context.
+Used for:
+- stale snapshot
+- ambiguous target
+---
+### 6.2 Alternate Candidate Strategy
+Select next-best candidate from resolved targets.
+Used for:
+- multiple matches
+- incorrect initial resolution
+---
+### 6.3 State Refresh Strategy
+Re-observe UI state before retrying action.
+Used for:
+- drift between observation and execution
+---
+### 6.4 Retry with Constraint Adjustment
+Retry action with adjusted parameters:
+- increased tolerance (RFC 008)
+- alternative interaction mode
+Used for:
+- convergence failures
+- flaky execution paths
+---
+### 6.5 Step-back Strategy
+Rollback interaction context one step and re-enter flow.
+Used for:
+- persistent verification failure
+- inconsistent UI state transitions
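One way to picture strategy selection is an ordered candidate list per failure class. The table below is an assumption, not something the RFC specifies: it is one plausible reading of the "Used for" lists above, and `STRATEGY_CANDIDATES` and `nextStrategy` are hypothetical names. The "Used for" lists do not mention semantic mismatch, so that entry is left empty in this sketch.

```ts
type FailureClass =
  | "TargetResolutionFailure"
  | "ExecutionFailure"
  | "VerificationFailure"
  | "ControlConvergenceFailure"
  | "SemanticMismatchFailure";

type RecoveryStrategy =
  | "re_resolve"
  | "alternate_candidate"
  | "state_refresh"
  | "retry_adjustment"
  | "step_back";

// ASSUMPTION: ordering and membership are illustrative, derived loosely
// from the "Used for" lists; the RFC does not give this table.
const STRATEGY_CANDIDATES: Record<FailureClass, readonly RecoveryStrategy[]> = {
  TargetResolutionFailure: ["re_resolve", "alternate_candidate"],
  ExecutionFailure: ["state_refresh", "retry_adjustment"],
  VerificationFailure: ["state_refresh", "step_back"],
  ControlConvergenceFailure: ["retry_adjustment"],
  SemanticMismatchFailure: [],
};

// Picks the first candidate not yet tried; undefined means no strategy is left.
function nextStrategy(
  failure: FailureClass,
  alreadyTried: readonly RecoveryStrategy[],
): RecoveryStrategy | undefined {
  return STRATEGY_CANDIDATES[failure].find((s) => alreadyTried.indexOf(s) === -1);
}
```

Skipping strategies that were already tried is what avoids the "repeated identical retries" problem described in Section 2.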
+---
+## 7. Replanning Model
+Replanning is the process of constructing a new bounded interaction sequence after failure.
+A replanned sequence MUST:
+- preserve original intent
+- incorporate failure classification context
+- apply a recovery strategy
+- remain bounded in retry depth
+Replanning is NOT full autonomous task planning.
+---
+## 7.1 Scope of Replanning
+Replanning in this RFC is strictly scoped to:
+- Single-action recovery sequences
+- Local retry chains
+- Bounded corrective adjustments
+It does NOT include:
+- multi-step autonomous task planning
+- global goal decomposition
+- long-horizon planning
+Replanning is therefore a bounded extension of execution, not a planning system.
+---
+## 8. Recovery State and Budget Contract
+The system MUST represent recovery state explicitly per action.
+### 8.1 Recovery State Schema (conceptual)
+{
+  "failure_class": "TargetResolutionFailure | ExecutionFailure | VerificationFailure | ControlConvergenceFailure | SemanticMismatchFailure",
+  "recovery_strategy": "re_resolve | alternate_candidate | state_refresh | retry_adjustment | step_back",
+  "recovery_attempts": 0,
+  "max_recovery_attempts": 3,
+  "retry_depth": 0,
+  "max_retry_depth": 3
+}
+### 8.2 Budget Rules
+- Each action MUST track recovery_attempts
+- Recovery MUST NOT exceed max_recovery_attempts
+- retry_depth MUST be bounded per interaction step
+- Exhaustion MUST produce a terminal failure state
+### 8.3 Enforcement Point
+Budget enforcement is the responsibility of the Interact layer (src/interact), with server providing initial values.
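The budget rules can be sketched as a pure function over the counters from the schema in 8.1. `RecoveryBudget`, `BudgetDecision`, and `consumeRecoveryAttempt` are hypothetical names. The RFC does not say how `retry_depth` advances, so this sketch only checks its bound and leaves the value unchanged.

```ts
// Counter fields taken from the conceptual schema in Section 8.1.
interface RecoveryBudget {
  recovery_attempts: number;
  max_recovery_attempts: number;
  retry_depth: number;
  max_retry_depth: number;
}

type BudgetDecision =
  | { kind: "continue"; budget: RecoveryBudget }
  | { kind: "exhausted"; budget: RecoveryBudget };

// Called before each recovery attempt. Returns an updated budget, or a
// terminal "exhausted" decision once either bound has been reached.
function consumeRecoveryAttempt(budget: RecoveryBudget): BudgetDecision {
  const attemptsSpent = budget.recovery_attempts >= budget.max_recovery_attempts;
  const depthSpent = budget.retry_depth >= budget.max_retry_depth;
  if (attemptsSpent || depthSpent) {
    return { kind: "exhausted", budget };
  }
  return {
    kind: "continue",
    budget: { ...budget, recovery_attempts: budget.recovery_attempts + 1 },
  };
}
```

Returning a new object instead of mutating the input keeps the counters easy to log and compare between attempts.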
+---
+## 9. Execution Context Model
+Full rollback is NOT required or assumed.
+The system MUST preserve:
+- last resolved target set (RFC 007)
+- last executed action descriptor (RFC 006)
+- last verification result (RFC 005)
+- recovery_attempts counter
+The system MAY optionally retain:
+- prior candidate selections
+- intermediate resolution outputs
+Step-back is implemented as a re-resolution + re-execution, NOT a full state rollback system.
+---
+## 10. Relationship to Existing RFCs
+### RFC 005 — Correctness Model
+Defines verification failures that trigger recovery.
+### RFC 006 — Runtime Binding
+Defines execution surface where failures occur.
+### RFC 007 — Target Resolution
+Provides alternate candidates for recovery strategies.
+### RFC 008 — Control-State Convergence
+Defines recovery paths for control adjustment failures.
+### RFC 009 — Semantic Control Model
+Defines classification of semantic mismatch failures.
+---
+## 11. Expected System Behaviour
+On failure:
+1. classify runtime failure using deterministic mapping (Section 6)
+2. select recovery strategy
+3. optionally re-resolve target
+4. re-execute bounded action
+5. verify outcome using RFC 005 or mark recovery attempt failure
+6. escalate if budget exceeded
+---
+## 12. Structured Failure Output Contract
+When recovery is exhausted or fails, the system MUST emit a structured failure object:
+{
+  "failure_class": "...",
+  "runtime_code": "...",
+  "resolved_target": "...",
+  "attempted_recovery_strategies": ["..."],
+  "recovery_attempts": 3,
+  "final_state": "failed"
+}
+This ensures consistent observability across server and interact layers.
+---
+## 12. Success Metrics
+- reduction in stuck interaction flows
+- reduced repeated identical retries
+- improved recovery success rate after first failure
+- improved robustness under UI drift
+- clearer structured failure outputs
+---
+## 13. Summary
+This RFC introduces deterministic recovery and replanning for UI interaction failures, enabling the system to remain robust under ambiguity, drift, and execution uncertainty while preserving bounded and explainable behavior.