minitap_mobile_use-2.5.3-py3-none-any.whl → minitap_mobile_use-2.6.0-py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of minitap-mobile-use has been flagged for review.
- minitap/mobile_use/agents/contextor/contextor.py +0 -8
- minitap/mobile_use/agents/cortex/cortex.md +122 -36
- minitap/mobile_use/agents/cortex/cortex.py +32 -17
- minitap/mobile_use/agents/cortex/types.py +18 -4
- minitap/mobile_use/agents/executor/executor.md +3 -3
- minitap/mobile_use/agents/executor/executor.py +10 -3
- minitap/mobile_use/agents/hopper/hopper.md +30 -2
- minitap/mobile_use/agents/hopper/hopper.py +19 -15
- minitap/mobile_use/agents/orchestrator/orchestrator.py +14 -5
- minitap/mobile_use/agents/outputter/outputter.py +13 -3
- minitap/mobile_use/agents/planner/planner.md +20 -9
- minitap/mobile_use/agents/planner/planner.py +12 -5
- minitap/mobile_use/agents/screen_analyzer/human.md +16 -0
- minitap/mobile_use/agents/screen_analyzer/screen_analyzer.py +111 -0
- minitap/mobile_use/clients/ios_client.py +7 -3
- minitap/mobile_use/config.py +87 -24
- minitap/mobile_use/controllers/mobile_command_controller.py +354 -88
- minitap/mobile_use/controllers/platform_specific_commands_controller.py +41 -27
- minitap/mobile_use/controllers/types.py +95 -0
- minitap/mobile_use/graph/graph.py +55 -11
- minitap/mobile_use/graph/state.py +10 -3
- minitap/mobile_use/main.py +12 -4
- minitap/mobile_use/sdk/agent.py +109 -72
- minitap/mobile_use/sdk/examples/smart_notification_assistant.py +59 -10
- minitap/mobile_use/servers/device_hardware_bridge.py +13 -6
- minitap/mobile_use/services/llm.py +5 -2
- minitap/mobile_use/tools/index.py +7 -9
- minitap/mobile_use/tools/mobile/{clear_text.py → focus_and_clear_text.py} +7 -7
- minitap/mobile_use/tools/mobile/{input_text.py → focus_and_input_text.py} +8 -8
- minitap/mobile_use/tools/mobile/long_press_on.py +130 -15
- minitap/mobile_use/tools/mobile/swipe.py +3 -26
- minitap/mobile_use/tools/mobile/tap.py +41 -28
- minitap/mobile_use/tools/mobile/wait_for_delay.py +84 -0
- minitap/mobile_use/utils/cli_helpers.py +10 -6
- {minitap_mobile_use-2.5.3.dist-info → minitap_mobile_use-2.6.0.dist-info}/METADATA +1 -1
- {minitap_mobile_use-2.5.3.dist-info → minitap_mobile_use-2.6.0.dist-info}/RECORD +38 -36
- minitap/mobile_use/tools/mobile/glimpse_screen.py +0 -74
- minitap/mobile_use/tools/mobile/wait_for_animation_to_end.py +0 -64
- {minitap_mobile_use-2.5.3.dist-info → minitap_mobile_use-2.6.0.dist-info}/WHEEL +0 -0
- {minitap_mobile_use-2.5.3.dist-info → minitap_mobile_use-2.6.0.dist-info}/entry_points.txt +0 -0
minitap/mobile_use/agents/planner/planner.md CHANGED

@@ -24,7 +24,13 @@ You work like an agile tech lead: defining the key milestones without locking in
 - A list of **agent thoughts**, including observations from the device, challenges encountered, and reasoning about what happened
 - Take into account the agent thoughts/previous plan to update the plan : maybe some steps are not required as we successfully completed them.
 
-
+Your job is **not to restart from scratch**. Instead:
+
+- Exclude subgoals that are already marked completed.
+- Begin the new plan at the **next major action** after the last success.
+- Use **agent thoughts only** as the source of truth when deciding what went wrong and what is possible next.
+- If a subgoal failed or was partially wrong, redefine it based on what the agent thoughts revealed (e.g., pivot to 'search' if a contact wasn't in recent chats).
+- Ensure the replanned steps still drive toward the original user goal, but always flow logically from the **current known state**.
 
 ### Output
 
@@ -56,17 +62,22 @@ Each subgoal should be:
 
 #### **Replanning Example**
 
-**Original Plan**:
-
+**Original Plan**:
+- Open the WhatsApp app to find the contact "Alice" (COMPLETED)
+- Open the conversation with Alice to send a message (FAILED)
+- Type the message "I'm running late" into the message field (NOT_STARTED)
+- Send the message (NOT_STARTED)
 
-
--
--
+**Agent Thoughts**:
+- Successfully launched WhatsApp app
+- Couldn't find Alice in recent chats - scrolled through visible conversations but no match
+- Search bar was present on top of the chat screen with resource-id com.whatsapp:id/menuitem_search
+- Previous approach of manually scrolling through chats is inefficient for this case
 
 **New Plan**:
-
-- Open WhatsApp
 - Tap the search bar to find a contact
 - Search for "Alice" in the search field
 - Select the correct chat to open the conversation
-- Type and send "I
+- Type and send "I'm running late"
+
+**Reasoning**: The agent thoughts reveal that WhatsApp is already open (first subgoal completed), but Alice wasn't in recent chats. Rather than restarting, we pivot to using the search feature that was observed, continuing from the current state.
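The replanning example above relies on per-subgoal statuses (COMPLETED, FAILED, NOT_STARTED). The actual models live in `minitap/mobile_use/agents/planner/types.py`, which this diff does not show; the following is a minimal hypothetical sketch of the bookkeeping the prompt assumes, not the package's real types:

```python
# Hypothetical sketch only: the real Subgoal/status definitions live in
# agents/planner/types.py and are not part of this diff.
from dataclasses import dataclass
from enum import Enum


class SubgoalStatus(Enum):
    NOT_STARTED = "NOT_STARTED"
    FAILED = "FAILED"
    COMPLETED = "COMPLETED"


@dataclass
class Subgoal:
    id: str
    description: str
    status: SubgoalStatus = SubgoalStatus.NOT_STARTED


def open_subgoals(plan: list[Subgoal]) -> list[Subgoal]:
    """Replanning keeps only the work that is still open: completed
    subgoals are excluded rather than redone."""
    return [s for s in plan if s.status is not SubgoalStatus.COMPLETED]
```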
minitap/mobile_use/agents/planner/planner.py CHANGED

@@ -7,7 +7,7 @@ from minitap.mobile_use.agents.planner.types import PlannerOutput, Subgoal, Subg
 from minitap.mobile_use.agents.planner.utils import generate_id, one_of_them_is_failure
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
 from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -46,10 +46,17 @@ class PlannerNode:
             HumanMessage(content=human_message),
         ]
 
-        llm = get_llm(ctx=self.ctx, name="planner")
-
-
-
+        llm = get_llm(ctx=self.ctx, name="planner").with_structured_output(PlannerOutput)
+        llm_fallback = get_llm(
+            ctx=self.ctx, name="planner", use_fallback=True
+        ).with_structured_output(PlannerOutput)
+        response: PlannerOutput = await with_fallback(
+            main_call=lambda: invoke_llm_with_timeout_message(
+                llm.ainvoke(messages), agent_name="Planner"
+            ),
+            fallback_call=lambda: invoke_llm_with_timeout_message(
+                llm_fallback.ainvoke(messages), agent_name="Planner (Fallback)"
+            ),
         )  # type: ignore
         subgoals_plan = [
             Subgoal(
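The hunk above only shows call sites of the new `with_fallback` helper; its implementation in `minitap/mobile_use/services/llm.py` is not part of this diff. Judging from the call sites (two zero-argument async callables, the second tried only if the first fails), it plausibly reduces to something like this sketch — the keyword names `main_call`/`fallback_call` come from the diff, the body is an assumption:

```python
# Sketch of with_fallback inferred from its call sites; the real implementation
# in minitap/mobile_use/services/llm.py may differ (retry policy, logging, etc.).
import logging
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def with_fallback(
    main_call: Callable[[], Awaitable[T]],
    fallback_call: Callable[[], Awaitable[T]],
) -> T:
    try:
        # Try the primary model first; the lambda defers building
        # the coroutine until it is actually awaited.
        return await main_call()
    except Exception as e:
        logger.warning(f"Primary LLM call failed ({e}); retrying with fallback model")
        return await fallback_call()
```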
minitap/mobile_use/agents/screen_analyzer/human.md ADDED

@@ -0,0 +1,16 @@
+## Task
+
+Analyze the provided screenshot and answer the following specific question:
+
+{{ prompt }}
+
+## Instructions
+
+1. Look carefully at the screenshot
+2. Provide a concise, direct answer to the question
+3. Focus only on what is visible in the screenshot
+4. Be specific and factual
+
+## Output
+
+Provide your analysis as a clear, concise text response.
minitap/mobile_use/agents/screen_analyzer/screen_analyzer.py ADDED

@@ -0,0 +1,111 @@
+from pathlib import Path
+
+from jinja2 import Template
+from langchain_core.messages import HumanMessage, SystemMessage
+
+from minitap.mobile_use.context import MobileUseContext
+from minitap.mobile_use.controllers.mobile_command_controller import (
+    take_screenshot as take_screenshot_controller,
+)
+from minitap.mobile_use.graph.state import State
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
+from minitap.mobile_use.utils.conversations import get_screenshot_message_for_llm
+from minitap.mobile_use.utils.decorators import wrap_with_callbacks
+from minitap.mobile_use.utils.logger import get_logger
+from minitap.mobile_use.utils.media import compress_base64_jpeg
+
+logger = get_logger(__name__)
+
+
+class ScreenAnalyzerNode:
+    def __init__(self, ctx: MobileUseContext):
+        self.ctx = ctx
+
+    @wrap_with_callbacks(
+        before=lambda: logger.info("Starting Screen Analyzer Agent..."),
+        on_success=lambda _: logger.success("Screen Analyzer Agent"),
+        on_failure=lambda _: logger.error("Screen Analyzer Agent"),
+    )
+    async def __call__(self, state: State):
+        # Check if there's a screen analysis request
+        if not state.screen_analysis_prompt:
+            logger.info("No screen analysis prompt, skipping")
+            return {}
+
+        prompt = state.screen_analysis_prompt
+        analysis_result = "Analysis failed"
+        has_failed = False
+
+        try:
+            # Take a fresh screenshot
+            screenshot_output = take_screenshot_controller(ctx=self.ctx)
+            compressed_image_base64 = compress_base64_jpeg(screenshot_output)
+
+            # Invoke the screen_analyzer
+            analysis_result = await screen_analyzer(
+                ctx=self.ctx, screenshot_base64=compressed_image_base64, prompt=prompt
+            )
+
+        except Exception as e:
+            logger.error(f"Screen analysis failed: {e}")
+            analysis_result = f"Failed to analyze screen: {str(e)}"
+            has_failed = True
+
+        # Create outcome message
+        if has_failed:
+            agent_outcome = f"Screen analysis failed: {analysis_result}"
+        else:
+            agent_outcome = f"Screen analysis result: {analysis_result}"
+
+        return await state.asanitize_update(
+            ctx=self.ctx,
+            update={
+                "agents_thoughts": [agent_outcome],
+                "screen_analysis_prompt": None,
+            },
+            agent="screen_analyzer",
+        )
+
+
+async def screen_analyzer(ctx: MobileUseContext, screenshot_base64: str, prompt: str) -> str:
+    """
+    Analyzes a screenshot using a VLM and returns a textual description based on the prompt.
+
+    Args:
+        ctx: The mobile use context
+        screenshot_base64: Base64 encoded screenshot
+        prompt: The specific question or instruction for analyzing the screenshot
+
+    Returns:
+        A concise textual description answering the prompt
+    """
+    logger.info("Starting Screen Analyzer Agent")
+
+    system_message = (
+        "You are a visual analysis assistant. "
+        "Your task is to examine screenshots and provide accurate, "
+        "concise answers to specific questions about what you see."
+    )
+
+    human_message = Template(
+        Path(__file__).parent.joinpath("human.md").read_text(encoding="utf-8")
+    ).render(prompt=prompt)
+
+    messages = [
+        SystemMessage(content=system_message),
+        get_screenshot_message_for_llm(screenshot_base64),
+        HumanMessage(content=human_message),
+    ]
+
+    llm = get_llm(ctx=ctx, name="screen_analyzer", temperature=0)
+    llm_fallback = get_llm(ctx=ctx, name="screen_analyzer", use_fallback=True, temperature=0)
+
+    response = await with_fallback(
+        main_call=lambda: invoke_llm_with_timeout_message(
+            llm.ainvoke(messages), agent_name="ScreenAnalyzer"
+        ),
+        fallback_call=lambda: invoke_llm_with_timeout_message(
+            llm_fallback.ainvoke(messages), agent_name="ScreenAnalyzer (Fallback)"
+        ),
+    )
+    return response.content  # type: ignore
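For orientation, here is a hypothetical standalone use of the new `screen_analyzer` helper; the context setup and the screenshot file are placeholders, not part of the package:

```python
# Hypothetical usage; build_context() and screenshot.jpg are placeholders.
import asyncio
import base64

from minitap.mobile_use.agents.screen_analyzer.screen_analyzer import screen_analyzer


async def main() -> None:
    ctx = build_context()  # placeholder: obtain a MobileUseContext however your app does
    with open("screenshot.jpg", "rb") as f:
        screenshot_base64 = base64.b64encode(f.read()).decode("ascii")
    answer = await screen_analyzer(
        ctx=ctx,
        screenshot_base64=screenshot_base64,
        prompt="What battery percentage is shown in the status bar?",
    )
    print(answer)


asyncio.run(main())
```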
minitap/mobile_use/clients/ios_client.py CHANGED

@@ -27,9 +27,13 @@ def get_ios_devices() -> tuple[bool, list[str], str]:
 
     for runtime, devices in devices_dict.items():
         if "ios" in runtime.lower():  # e.g. "com.apple.CoreSimulator.SimRuntime.iOS-17-0"
-            for
-            if "
-
+            for device in devices:
+                if device.get("state") != "Booted":
+                    continue
+                device_udid = device.get("udid")
+                if not device_udid:
+                    continue
+                serials.append(device_udid)
 
     return True, serials, ""
 
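The rewritten loop walks the structure returned by `xcrun simctl list devices --json`, where device records are grouped under runtime identifiers. A self-contained illustration of the filtering on sample data (device names and UDIDs invented):

```python
# Sample of the parsed `xcrun simctl list devices --json` "devices" mapping
# (UDIDs invented), showing what the rewritten loop keeps.
devices_dict = {
    "com.apple.CoreSimulator.SimRuntime.iOS-17-0": [
        {"name": "iPhone 15", "state": "Booted", "udid": "AAAA-1111"},
        {"name": "iPhone 15 Pro", "state": "Shutdown", "udid": "BBBB-2222"},
    ],
    "com.apple.CoreSimulator.SimRuntime.watchOS-10-0": [
        {"name": "Apple Watch Series 9", "state": "Booted", "udid": "CCCC-3333"},
    ],
}

serials: list[str] = []
for runtime, devices in devices_dict.items():
    if "ios" in runtime.lower():  # skips watchOS/tvOS runtimes
        for device in devices:
            if device.get("state") != "Booted":
                continue  # only running simulators are reachable
            device_udid = device.get("udid")
            if not device_udid:
                continue
            serials.append(device_udid)

assert serials == ["AAAA-1111"]  # only the booted iOS simulator survives the filter
```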
minitap/mobile_use/config.py CHANGED
@@ -94,8 +94,9 @@ def record_events(output_path: Path | None, events: list[str] | BaseModel | Any)
 
 LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai", "minitap"]
 LLMUtilsNode = Literal["outputter", "hopper"]
-
-
+LLMUtilsNodeWithFallback = LLMUtilsNode
+AgentNode = Literal["planner", "orchestrator", "cortex", "screen_analyzer", "executor"]
+AgentNodeWithFallback = AgentNode
 
 ROOT_DIR = Path(__file__).parent.parent.parent
 DEFAULT_LLM_CONFIG_FILENAME = "llm-config.defaults.jsonc"
@@ -149,21 +150,23 @@ class LLMWithFallback(LLM):
 
 
 class LLMConfigUtils(BaseModel):
-    outputter:
-    hopper:
+    outputter: LLMWithFallback
+    hopper: LLMWithFallback
 
 
 class LLMConfig(BaseModel):
-    planner:
-    orchestrator:
+    planner: LLMWithFallback
+    orchestrator: LLMWithFallback
     cortex: LLMWithFallback
-
+    screen_analyzer: LLMWithFallback
+    executor: LLMWithFallback
     utils: LLMConfigUtils
 
     def validate_providers(self):
         self.planner.validate_provider("Planner")
         self.orchestrator.validate_provider("Orchestrator")
         self.cortex.validate_provider("Cortex")
+        self.screen_analyzer.validate_provider("ScreenAnalyzer")
         self.executor.validate_provider("Executor")
         self.utils.outputter.validate_provider("Outputter")
         self.utils.hopper.validate_provider("Hopper")
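The hunk's context line names `class LLMWithFallback(LLM)`, but the class bodies sit outside this diff. From the constructor calls and `validate_provider` usage elsewhere in the file, the models plausibly look roughly like this sketch — field names come from the diff, everything else is assumed:

```python
# Rough sketch of the LLM / LLMWithFallback models, inferred from their call
# sites in this diff; the real definitions in config.py may differ.
from typing import Literal

from pydantic import BaseModel

LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai", "minitap"]


class LLM(BaseModel):
    provider: LLMProvider
    model: str

    def validate_provider(self, agent_label: str) -> None:
        # Presumably verifies credentials/support for the chosen provider;
        # the body here is a placeholder.
        ...


class LLMWithFallback(LLM):
    # `fallback` holds the secondary model that with_fallback() call sites use.
    fallback: LLM
```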
@@ -173,16 +176,17 @@ class LLMConfig(BaseModel):
 📃 Planner: {self.planner}
 🎯 Orchestrator: {self.orchestrator}
 🧠 Cortex: {self.cortex}
+👁️ ScreenAnalyzer: {self.screen_analyzer}
 🛠️ Executor: {self.executor}
 🧩 Utils:
 🔽 Hopper: {self.utils.hopper}
 📝 Outputter: {self.utils.outputter}
 """
 
-    def get_agent(self, item: AgentNode) ->
+    def get_agent(self, item: AgentNode) -> LLMWithFallback:
         return getattr(self, item)
 
-    def get_utils(self, item: LLMUtilsNode) ->
+    def get_utils(self, item: LLMUtilsNode) -> LLMWithFallback:
         return getattr(self.utils, item)
 
 
@@ -196,17 +200,42 @@ def get_default_llm_config() -> LLMConfig:
     except Exception as e:
         logger.error(f"Failed to load default llm config: {e}. Falling back to hardcoded config")
         return LLMConfig(
-            planner=
-
+            planner=LLMWithFallback(
+                provider="openai",
+                model="gpt-5-nano",
+                fallback=LLM(provider="openai", model="gpt-5-mini"),
+            ),
+            orchestrator=LLMWithFallback(
+                provider="openai",
+                model="gpt-5-nano",
+                fallback=LLM(provider="openai", model="gpt-5-mini"),
+            ),
             cortex=LLMWithFallback(
                 provider="openai",
-                model="
-                fallback=LLM(provider="openai", model="
+                model="gpt-5",
+                fallback=LLM(provider="openai", model="o4-mini"),
+            ),
+            screen_analyzer=LLMWithFallback(
+                provider="openai",
+                model="gpt-4o",
+                fallback=LLM(provider="openai", model="gpt-5-mini"),
+            ),
+            executor=LLMWithFallback(
+                provider="openai",
+                model="gpt-5-nano",
+                fallback=LLM(provider="openai", model="gpt-5-mini"),
             ),
-            executor=LLM(provider="openai", model="gpt-4.1"),
             utils=LLMConfigUtils(
-                outputter=
-
+                outputter=LLMWithFallback(
+                    provider="openai",
+                    model="gpt-5-nano",
+                    fallback=LLM(provider="openai", model="gpt-5-mini"),
+                ),
+                hopper=LLMWithFallback(
+                    provider="openai",
+                    model="gpt-5-nano",
+                    fallback=LLM(provider="openai", model="gpt-5-mini"),
+                ),
             ),
         )
 
@@ -223,26 +252,60 @@ def get_default_minitap_llm_config() -> LLMConfig | None:
         return None
 
     return LLMConfig(
-        planner=
-
+        planner=LLMWithFallback(
+            provider="minitap",
+            model="meta-llama/llama-4-scout",
+            fallback=LLM(provider="minitap", model="meta-llama/llama-4-maverick"),
+        ),
+        orchestrator=LLMWithFallback(
+            provider="minitap",
+            model="openai/gpt-oss-120b",
+            fallback=LLM(provider="minitap", model="meta-llama/llama-4-maverick"),
+        ),
         cortex=LLMWithFallback(
             provider="minitap",
             model="google/gemini-2.5-pro",
             fallback=LLM(provider="minitap", model="openai/gpt-5"),
         ),
-
+        screen_analyzer=LLMWithFallback(
+            provider="minitap",
+            model="meta-llama/llama-3.2-90b-vision-instruct",
+            fallback=LLM(provider="minitap", model="openai/gpt-4o"),
+        ),
+        executor=LLMWithFallback(
+            provider="minitap",
+            model="meta-llama/llama-3.3-70b-instruct",
+            fallback=LLM(provider="minitap", model="openai/gpt-5-mini"),
+        ),
         utils=LLMConfigUtils(
-            outputter=
-
+            outputter=LLMWithFallback(
+                provider="minitap",
+                model="openai/gpt-5-nano",
+                fallback=LLM(provider="minitap", model="openai/gpt-5-mini"),
+            ),
+            hopper=LLMWithFallback(
+                provider="minitap",
+                model="openai/gpt-5-nano",
+                fallback=LLM(provider="minitap", model="openai/gpt-5-mini"),
+            ),
         ),
     )
 
 
 def deep_merge_llm_config(default: LLMConfig, override: dict) -> LLMConfig:
-    def _deep_merge_dict(base: dict, extra: dict):
+    def _deep_merge_dict(base: dict, extra: dict, path: str = ""):
         for key, value in extra.items():
-            if
-
+            current_path = f"{path}.{key}" if path else key
+
+            if key not in base:
+                logger.warning(
+                    f"Unsupported config key '{current_path}' found in override config. "
+                    f"Ignoring this key."
+                )
+                continue
+
+            if isinstance(value, dict) and isinstance(base[key], dict):
+                _deep_merge_dict(base[key], value, current_path)
             else:
                 base[key] = value
 
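To see the new warning path in action, consider a hypothetical override (keys and values invented for illustration): a misspelled key is reported with its full dotted path and skipped, while known keys are merged recursively into the defaults.

```python
# Hypothetical override dict, e.g. parsed from a user-supplied JSONC file.
override = {
    "screen_analyzer": {"model": "openai/gpt-4o"},         # known key: merged
    "utils": {"hopper": {"model": "openai/gpt-5-mini"}},   # nested merge, path "utils.hopper"
    "screen_analyser": {"model": "typo/never-used"},       # unknown key: warned about and ignored
}

merged = deep_merge_llm_config(get_default_llm_config(), override)
# logs: Unsupported config key 'screen_analyser' found in override config. Ignoring this key.
```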