minitap-mobile-use 2.0.1__py3-none-any.whl → 2.2.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of minitap-mobile-use might be problematic.
- minitap/mobile_use/agents/cortex/cortex.md +7 -5
- minitap/mobile_use/agents/cortex/cortex.py +4 -1
- minitap/mobile_use/agents/cortex/types.py +1 -3
- minitap/mobile_use/agents/executor/executor.md +4 -5
- minitap/mobile_use/agents/executor/executor.py +3 -1
- minitap/mobile_use/agents/executor/tool_node.py +6 -6
- minitap/mobile_use/agents/outputter/outputter.py +1 -2
- minitap/mobile_use/agents/planner/planner.md +11 -2
- minitap/mobile_use/agents/planner/planner.py +7 -2
- minitap/mobile_use/agents/planner/types.py +3 -4
- minitap/mobile_use/agents/summarizer/summarizer.py +2 -1
- minitap/mobile_use/config.py +31 -16
- minitap/mobile_use/context.py +3 -4
- minitap/mobile_use/controllers/mobile_command_controller.py +36 -24
- minitap/mobile_use/controllers/platform_specific_commands_controller.py +3 -4
- minitap/mobile_use/graph/graph.py +1 -0
- minitap/mobile_use/graph/state.py +9 -9
- minitap/mobile_use/main.py +7 -8
- minitap/mobile_use/sdk/agent.py +25 -26
- minitap/mobile_use/sdk/builders/agent_config_builder.py +9 -10
- minitap/mobile_use/sdk/builders/task_request_builder.py +9 -9
- minitap/mobile_use/sdk/examples/smart_notification_assistant.py +1 -2
- minitap/mobile_use/sdk/types/agent.py +5 -5
- minitap/mobile_use/sdk/types/task.py +19 -18
- minitap/mobile_use/sdk/utils.py +4 -3
- minitap/mobile_use/servers/config.py +1 -2
- minitap/mobile_use/servers/device_hardware_bridge.py +3 -4
- minitap/mobile_use/servers/start_servers.py +4 -4
- minitap/mobile_use/servers/stop_servers.py +2 -3
- minitap/mobile_use/services/llm.py +24 -6
- minitap/mobile_use/tools/index.py +26 -14
- minitap/mobile_use/tools/mobile/back.py +1 -1
- minitap/mobile_use/tools/mobile/clear_text.py +277 -0
- minitap/mobile_use/tools/mobile/copy_text_from.py +1 -1
- minitap/mobile_use/tools/mobile/erase_one_char.py +56 -0
- minitap/mobile_use/tools/mobile/find_packages.py +1 -1
- minitap/mobile_use/tools/mobile/input_text.py +4 -80
- minitap/mobile_use/tools/mobile/launch_app.py +1 -1
- minitap/mobile_use/tools/mobile/long_press_on.py +2 -4
- minitap/mobile_use/tools/mobile/open_link.py +1 -1
- minitap/mobile_use/tools/mobile/paste_text.py +1 -1
- minitap/mobile_use/tools/mobile/press_key.py +1 -1
- minitap/mobile_use/tools/mobile/stop_app.py +2 -4
- minitap/mobile_use/tools/mobile/swipe.py +107 -9
- minitap/mobile_use/tools/mobile/take_screenshot.py +1 -1
- minitap/mobile_use/tools/mobile/tap.py +2 -4
- minitap/mobile_use/tools/mobile/wait_for_animation_to_end.py +2 -4
- minitap/mobile_use/tools/tool_wrapper.py +6 -1
- minitap/mobile_use/tools/utils.py +86 -0
- minitap/mobile_use/utils/cli_helpers.py +1 -2
- minitap/mobile_use/utils/cli_selection.py +5 -6
- minitap/mobile_use/utils/decorators.py +21 -20
- minitap/mobile_use/utils/logger.py +3 -4
- minitap/mobile_use/utils/media.py +1 -1
- minitap/mobile_use/utils/recorder.py +2 -9
- minitap/mobile_use/utils/ui_hierarchy.py +13 -5
- {minitap_mobile_use-2.0.1.dist-info → minitap_mobile_use-2.2.0.dist-info}/METADATA +35 -5
- minitap_mobile_use-2.2.0.dist-info/RECORD +96 -0
- minitap/mobile_use/tools/mobile/erase_text.py +0 -122
- minitap_mobile_use-2.0.1.dist-info/RECORD +0 -94
- {minitap_mobile_use-2.0.1.dist-info → minitap_mobile_use-2.2.0.dist-info}/WHEEL +0 -0
- {minitap_mobile_use-2.0.1.dist-info → minitap_mobile_use-2.2.0.dist-info}/entry_points.txt +0 -0
minitap/mobile_use/agents/cortex/cortex.md
CHANGED
@@ -35,17 +35,19 @@ Focus on the **current PENDING subgoal and the next subgoals not yet started**.
 - Past agent thoughts
 - Recent tool effects

-2.2. Otherwise, output a **stringified structured set of instructions** that an **Executor agent** can perform on a real mobile device:
+2.2. Otherwise, output a **stringified structured set of instructions** that an **Executor agent** can perform on a real mobile device:

-  - These must be **concrete low-level actions
-  -
-  -
+  - These must be **concrete low-level actions**.
+  - The executor has the following available tools: {{ executor_tools_list }}.
+  - Your goal is to achieve subgoals **fast** - so you must put as much actions as possible in your instructions to complete all achievable subgoals (based on your observations) in one go.
+  - To open URLs/links directly, use the `open_link` tool - it will automatically handle opening in the appropriate browser. It also handles deep links.
+  - When you need to open an app, use the `find_packages` low-level action to try and get its name. Then, simply use the `launch_app` low-level action to launch it.
   - If you refer to a UI element or coordinates, specify it clearly (e.g., `resource-id: com.whatsapp:id/search`, `text: "Alice"`, `x: 100, y: 200`).
   - **The structure is up to you**, but it must be valid **JSON stringified output**. You will accompany this output with a **natural-language summary** of your reasoning and approach in your agent thought.
   - **Never use a sequence of `tap` + `input_text` to type into a field. Always use a single `input_text` action** with the correct `resource_id` (this already ensures the element is focused and the cursor is moved to the end).
   - When you want to launch/stop an app, prefer using its package name.
   - **Only reference UI element IDs or visible texts that are explicitly present in the provided UI hierarchy or screenshot. Do not invent, infer, or guess any IDs or texts that are not directly observed**.
-  - **For text clearing**: When you need to completely clear text from an input field, always
+  - **For text clearing**: When you need to completely clear text from an input field, always call the `clear_text` tool with the correct resource_id. This tool automatically focuses the element, and ensures the field is emptied. If you notice this tool fails to clear the text, try to long press the input, select all, and call `erase_one_char`.

 ### Output

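The prompt above leaves the exact JSON layout of the "stringified structured set of instructions" up to the Cortex; only the tool names (`open_link`, `find_packages`, `launch_app`, `input_text`, ...) come from the package. A minimal sketch of what such output could look like, with a hypothetical key layout that is purely illustrative:

```python
import json

# Hypothetical structured decisions for a subgoal like "Send 'Hi' to Alice on WhatsApp".
# The prompt only requires valid stringified JSON; the keys below are not a fixed schema.
decisions = {
    "subgoal": "Send 'Hi' to Alice on WhatsApp",
    "actions": [
        {"tool": "find_packages", "query": "whatsapp"},
        {"tool": "launch_app", "package_name": "com.whatsapp"},
        {"tool": "tap", "resource_id": "com.whatsapp:id/search"},
        {"tool": "input_text", "resource_id": "com.whatsapp:id/search_input", "text": "Alice"},
    ],
}

# The Cortex would put this string in its `decisions` field and keep the natural-language
# reasoning in `agent_thought`.
stringified_decisions = json.dumps(decisions)
print(stringified_decisions)
```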
minitap/mobile_use/agents/cortex/cortex.py
CHANGED
@@ -10,12 +10,14 @@ from langchain_core.messages import (
     ToolMessage,
 )
 from langgraph.graph.message import REMOVE_ALL_MESSAGES
+
 from minitap.mobile_use.agents.cortex.types import CortexOutput
 from minitap.mobile_use.agents.planner.utils import get_current_subgoal
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
 from minitap.mobile_use.services.llm import get_llm, with_fallback
+from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.conversations import get_screenshot_message_for_llm
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -44,6 +46,7 @@ class CortexNode:
             current_subgoal=get_current_subgoal(state.subgoal_plan),
             agents_thoughts=state.agents_thoughts,
             executor_feedback=executor_feedback,
+            executor_tools_list=format_tools_list(ctx=self.ctx, wrappers=EXECUTOR_WRAPPERS_TOOLS),
         )
         messages = [
             SystemMessage(content=system_message),
@@ -83,7 +86,7 @@ class CortexNode:
         is_subgoal_completed = (
             response.complete_subgoals_by_ids is not None
             and len(response.complete_subgoals_by_ids) > 0
-            and len(response.decisions) == 0
+            and (len(response.decisions) == 0 or response.decisions in ["{}", "[]", "null", ""])
         )
         if not is_subgoal_completed:
             response.complete_subgoals_by_ids = []
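The new completion check treats a few "empty-looking" stringified decisions (`"{}"`, `"[]"`, `"null"`, `""`) the same as a zero-length string. A standalone sketch of that predicate (the helper name is mine, not from the package):

```python
EMPTY_DECISION_STRINGS = ["{}", "[]", "null", ""]


def has_no_decisions(decisions: str) -> bool:
    """Return True when the stringified decisions carry no actionable content.

    Mirrors the condition added in CortexNode: an empty string, or one of the
    JSON-ish empty literals, means the subgoal can be marked complete without
    dispatching anything to the Executor.
    """
    return len(decisions) == 0 or decisions in EMPTY_DECISION_STRINGS


assert has_no_decisions("{}")
assert has_no_decisions("")
assert not has_no_decisions('{"actions": [{"tool": "tap"}]}')
```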
minitap/mobile_use/agents/cortex/types.py
CHANGED
@@ -1,11 +1,9 @@
-from typing import Optional
-
 from pydantic import BaseModel, Field


 class CortexOutput(BaseModel):
     decisions: str = Field(..., description="The decisions to be made. A stringified JSON object")
     agent_thought: str = Field(..., description="The agent's thought")
-    complete_subgoals_by_ids:
+    complete_subgoals_by_ids: list[str] | None = Field(
         [], description="List of subgoal IDs to complete"
     )
minitap/mobile_use/agents/executor/executor.md
CHANGED
@@ -64,14 +64,13 @@ When using the `input_text` tool:

 #### 🔄 Text Clearing Best Practice

-When you need to completely clear text from an input field,
+When you need to completely clear text from an input field, always use the clear_text tool with the correct resource_id.

-
-2. **Then use `erase_text`** to clear the selected content
+This tool automatically takes care of focusing the element (if needed), and ensuring the field is fully emptied.

-
+Only and if only the clear_text tool fails to clear the text, try to long press the input, select all, and call erase_one_char.

-
+#### 🔁 Final Notes

 - **You do not need to reason or decide strategy** — that's the Cortex's job.
 - You simply interpret and execute — like hands following the brain.
minitap/mobile_use/agents/executor/executor.py
CHANGED
@@ -3,6 +3,8 @@ from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_google_genai import ChatGoogleGenerativeAI
+from langchain_google_vertexai.chat_models import ChatVertexAI
+
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
@@ -56,7 +58,7 @@ class ExecutorNode:
         }

         # ChatGoogleGenerativeAI does not support the "parallel_tool_calls" keyword
-        if not isinstance(llm, ChatGoogleGenerativeAI):
+        if not isinstance(llm, ChatGoogleGenerativeAI | ChatVertexAI):
             llm_bind_tools_kwargs["parallel_tool_calls"] = True

         llm = llm.bind_tools(**llm_bind_tools_kwargs)
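The executor change relies on Python 3.10+ accepting a PEP 604 union (`X | Y`) as the second argument of `isinstance`, which behaves like passing a tuple of classes. A small self-contained sketch of that pattern (the classes here are stand-ins, not the real chat-model types):

```python
class ChatA:  # stand-in for ChatGoogleGenerativeAI
    pass


class ChatB:  # stand-in for ChatVertexAI
    pass


class ChatC:  # stand-in for a provider that does support parallel tool calls
    pass


def bind_kwargs(llm: object) -> dict:
    kwargs: dict = {}
    # Same shape as the executor's check: skip the keyword for providers
    # that do not support it.
    if not isinstance(llm, ChatA | ChatB):  # equivalent to isinstance(llm, (ChatA, ChatB))
        kwargs["parallel_tool_calls"] = True
    return kwargs


assert bind_kwargs(ChatA()) == {}
assert bind_kwargs(ChatB()) == {}
assert bind_kwargs(ChatC()) == {"parallel_tool_calls": True}
```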
minitap/mobile_use/agents/executor/tool_node.py
CHANGED
@@ -1,8 +1,8 @@
 import asyncio
-from typing import Any
+from typing import Any
 from langgraph.types import Command
 from pydantic import BaseModel
-from
+from typing import override
 from langchain_core.runnables import RunnableConfig
 from langgraph.store.base import BaseStore
 from langchain_core.messages import AnyMessage, ToolCall, ToolMessage
@@ -21,7 +21,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store:
+        store: BaseStore | None,
     ):
         return await self.__func(is_async=True, input=input, config=config, store=store)

@@ -31,7 +31,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store:
+        store: BaseStore | None,
     ) -> Any:
         loop = asyncio.get_event_loop()
         return loop.run_until_complete(
@@ -44,7 +44,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store:
+        store: BaseStore | None,
     ) -> Any:
         tool_calls, input_type = self._parse_input(input, store)
         outputs: list[Command | ToolMessage] = []
@@ -74,7 +74,7 @@ class ExecutorToolNode(ToolNode):
         self,
         call: ToolCall,
         output: ToolMessage | Command,
-    ) ->
+    ) -> bool | None:
         if isinstance(output, ToolMessage):
             return output.status == "error"
         if isinstance(output, Command):
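tool_node.py now imports `override` from `typing`, which only exists in the standard library from Python 3.12 onwards (older interpreters get the same name from `typing_extensions`). The decorator is a static-checking aid: it marks a method as intentionally overriding a base-class method. A minimal sketch:

```python
# `override` lives in `typing` on Python 3.12+; on older versions the same
# decorator is available from `typing_extensions`.
try:
    from typing import override
except ImportError:  # pragma: no cover - fallback for Python < 3.12
    from typing_extensions import override


class BaseNode:
    def run(self) -> str:
        return "base"


class CustomNode(BaseNode):
    @override
    def run(self) -> str:  # type checkers flag this if `run` disappears from BaseNode
        return "custom"


print(CustomNode().run())  # -> "custom"
```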
minitap/mobile_use/agents/outputter/outputter.py
CHANGED
@@ -1,6 +1,5 @@
 import json
 from pathlib import Path
-from typing import Dict, Type, Union

 from jinja2 import Template
 from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
@@ -49,7 +48,7 @@ async def outputter(
     structured_llm = llm

     if output_config.structured_output:
-        schema:
+        schema: dict | type[BaseModel] | None = None
         so = output_config.structured_output

         if isinstance(so, dict):
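outputter.py drops the old `Dict`/`Type`/`Union` imports in favour of builtin generics, typing the schema as `dict | type[BaseModel] | None` and branching on `isinstance(so, dict)`. A small sketch of that narrowing pattern under the same assumption (the helper name is mine):

```python
from pydantic import BaseModel


class Answer(BaseModel):
    text: str


def describe_schema(so: dict | type[BaseModel] | None) -> str:
    """Illustrates the dict-vs-model branching used when configuring structured output."""
    if so is None:
        return "no structured output requested"
    if isinstance(so, dict):
        return f"raw JSON schema with keys: {sorted(so)}"
    return f"pydantic model: {so.__name__}"


print(describe_schema(None))
print(describe_schema({"type": "object", "properties": {}}))
print(describe_schema(Answer))
```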
minitap/mobile_use/agents/planner/planner.md
CHANGED
@@ -12,7 +12,9 @@ You work like an agile tech lead: defining the key milestones without locking in
 - Subgoals should reflect real interactions with mobile UIs (e.g. "Open app", "Tap search bar", "Scroll to item", "Send message to Bob", etc).
 - Don't assume the full UI is visible yet. Plan based on how most mobile apps work, and keep flexibility.
 - List of agents thoughts is empty which is expected, since it is the first plan.
--
+- Avoid too granular UI actions based tasks (e.g. "tap", "swipe", "copy", "paste") unless explicitly required.
+- The executor has the following available tools: {{ executor_tools_list }}.
+  When one of these tools offers a direct shortcut (e.g. `openLink` instead of manually launching a browser and typing a URL), prefer it over decomposed manual steps.

 2. **Replanning**
 If you're asked to **revise a previous plan**, you'll also receive:
@@ -47,12 +49,19 @@ If you're replaning and need to keep a previous subgoal, you **must keep the sam
 - Type the message "I’m running late" (ID: None)
 - Send the message (ID: None)

+#### **Initial Goal**: "Go on https://tesla.com, and tell me what is the first car being displayed"
+
+**Plan**:
+
+- Open the link https://tesla.com (ID: None)
+- Find the first car displayed on the home page (ID: None)
+
 #### **Replanning Example**

 **Original Plan**: same as above with IDs set
 **Agent Thoughts**:

-- Couldn
+- Couldn't find Alice in recent chats
 - Search bar was present on top of the chat screen
 - Keyboard appeared after tapping search

minitap/mobile_use/agents/planner/planner.py
CHANGED
@@ -1,13 +1,15 @@
-from pathlib import Path
 import uuid
+from pathlib import Path

 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
+
 from minitap.mobile_use.agents.planner.types import PlannerOutput, Subgoal, SubgoalStatus
 from minitap.mobile_use.agents.planner.utils import one_of_them_is_failure
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
 from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger

@@ -28,7 +30,10 @@ class PlannerNode:

         system_message = Template(
             Path(__file__).parent.joinpath("planner.md").read_text(encoding="utf-8")
-        ).render(
+        ).render(
+            platform=self.ctx.device.mobile_platform.value,
+            executor_tools_list=format_tools_list(ctx=self.ctx, wrappers=EXECUTOR_WRAPPERS_TOOLS),
+        )
         human_message = Template(
             Path(__file__).parent.joinpath("human.md").read_text(encoding="utf-8")
         ).render(
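Both the Planner and the Cortex now pass `format_tools_list(...)` into their prompt templates, which is what fills the `{{ executor_tools_list }}` placeholder seen in the markdown files above. A minimal sketch of that Jinja2 step, with a hand-written tools list standing in for whatever the real `format_tools_list` returns:

```python
from jinja2 import Template

# Stand-in for format_tools_list(...) output; the real helper lives in
# minitap.mobile_use.tools.index and formats the EXECUTOR_WRAPPERS_TOOLS wrappers.
executor_tools_list = "- tap\n- swipe\n- input_text\n- clear_text\n- open_link\n- launch_app"

prompt_template = Template(
    "You plan for the {{ platform }} platform.\n"
    "The executor has the following available tools:\n{{ executor_tools_list }}"
)

system_message = prompt_template.render(
    platform="ANDROID",
    executor_tools_list=executor_tools_list,
)
print(system_message)
```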
minitap/mobile_use/agents/planner/types.py
CHANGED
@@ -1,12 +1,11 @@
 from enum import Enum
-from typing import Optional

 from pydantic import BaseModel
-from
+from typing import Annotated


 class PlannerSubgoalOutput(BaseModel):
-    id: Annotated[
+    id: Annotated[str | None, "If not provided, it will be generated"] = None
     description: str


@@ -25,7 +24,7 @@ class Subgoal(BaseModel):
     id: Annotated[str, "Unique identifier of the subgoal"]
     description: Annotated[str, "Description of the subgoal"]
     completion_reason: Annotated[
-
+        str | None, "Reason why the subgoal was completed (failure or success)"
     ] = None
     status: SubgoalStatus

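With `id: Annotated[str | None, ...] = None`, a freshly planned subgoal can be emitted without an identifier and receive one later; planner.py imports `uuid`, which presumably is what fills the gap, though that code is not part of the hunks shown here. A sketch of the model plus a hypothetical id-backfill step, reusing the Tesla example plan from planner.md:

```python
import uuid
from typing import Annotated

from pydantic import BaseModel


class PlannerSubgoalOutput(BaseModel):
    # Mirrors the new definition: the planner may omit the id on the first plan.
    id: Annotated[str | None, "If not provided, it will be generated"] = None
    description: str


plan = [
    PlannerSubgoalOutput(description="Open the link https://tesla.com"),
    PlannerSubgoalOutput(description="Find the first car displayed on the home page"),
]

# Hypothetical backfill step - the package's actual id generation is not shown in this diff.
for subgoal in plan:
    if subgoal.id is None:
        subgoal.id = uuid.uuid4().hex

print([(s.id, s.description) for s in plan])
```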
minitap/mobile_use/agents/summarizer/summarizer.py
CHANGED
@@ -3,6 +3,7 @@ from langchain_core.messages (
     RemoveMessage,
     ToolMessage,
 )
+
 from minitap.mobile_use.constants import MAX_MESSAGES_IN_HISTORY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
@@ -22,7 +23,7 @@ class SummarizerNode:
         start_removal = False

         for msg in reversed(state.messages[:nb_removal_candidates]):
-            if isinstance(msg,
+            if isinstance(msg, ToolMessage | HumanMessage):
                 start_removal = True
             if start_removal and msg.id:
                 remove_messages.append(RemoveMessage(id=msg.id))
minitap/mobile_use/config.py
CHANGED
@@ -1,9 +1,11 @@
 import json
 import os
 from pathlib import Path
-from typing import Annotated, Any, Literal
+from typing import Annotated, Any, Literal

+import google.auth
 from dotenv import load_dotenv
+from google.auth.exceptions import DefaultCredentialsError
 from pydantic import BaseModel, Field, SecretStr, ValidationError, model_validator
 from pydantic_settings import BaseSettings

@@ -17,17 +19,17 @@ logger = get_logger(__name__)


 class Settings(BaseSettings):
-    OPENAI_API_KEY:
-    GOOGLE_API_KEY:
-    XAI_API_KEY:
-    OPEN_ROUTER_API_KEY:
+    OPENAI_API_KEY: SecretStr | None = None
+    GOOGLE_API_KEY: SecretStr | None = None
+    XAI_API_KEY: SecretStr | None = None
+    OPEN_ROUTER_API_KEY: SecretStr | None = None

-    OPENAI_BASE_URL:
+    OPENAI_BASE_URL: str | None = None

-    DEVICE_SCREEN_API_BASE_URL:
-    DEVICE_HARDWARE_BRIDGE_BASE_URL:
-    ADB_HOST:
-    ADB_PORT:
+    DEVICE_SCREEN_API_BASE_URL: str | None = None
+    DEVICE_HARDWARE_BRIDGE_BASE_URL: str | None = None
+    ADB_HOST: str | None = None
+    ADB_PORT: int | None = None

     model_config = {"env_file": ".env", "extra": "ignore"}

@@ -71,7 +73,7 @@ def prepare_output_files() -> tuple[str | None, str | None]:
     return validated_events_path, validated_results_path


-def record_events(output_path: Path | None, events:
+def record_events(output_path: Path | None, events: list[str] | BaseModel | Any):
     if not output_path:
         return

@@ -88,7 +90,7 @@ def record_events(output_path: Path | None, events: Union[list[str], BaseModel,

 ### LLM Configuration

-LLMProvider = Literal["openai", "google", "openrouter", "xai"]
+LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai"]
 LLMUtilsNode = Literal["outputter", "hopper"]
 AgentNode = Literal["planner", "orchestrator", "cortex", "executor"]
 AgentNodeWithFallback = Literal["cortex"]
@@ -98,6 +100,17 @@ DEFAULT_LLM_CONFIG_FILENAME = "llm-config.defaults.jsonc"
 OVERRIDE_LLM_CONFIG_FILENAME = "llm-config.override.jsonc"


+def validate_vertex_ai_credentials():
+    try:
+        _, project = google.auth.default()
+        if not project:
+            raise Exception("VertexAI requires a Google Cloud project to be set.")
+    except DefaultCredentialsError as e:
+        raise Exception(
+            f"VertexAI requires valid Google Application Default Credentials (ADC): {e}"
+        )
+
+
 class LLM(BaseModel):
     provider: LLMProvider
     model: str
@@ -110,6 +123,8 @@ class LLM(BaseModel):
             case "google":
                 if not settings.GOOGLE_API_KEY:
                     raise Exception(f"{name} requires GOOGLE_API_KEY in .env")
+            case "vertexai":
+                validate_vertex_ai_credentials()
             case "openrouter":
                 if not settings.OPEN_ROUTER_API_KEY:
                     raise Exception(f"{name} requires OPEN_ROUTER_API_KEY in .env")
@@ -170,7 +185,7 @@ def get_default_llm_config() -> LLMConfig:
     try:
         if not os.path.exists(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME):
             raise Exception("Default llm config not found")
-        with open(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME
+        with open(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME) as f:
             default_config_dict = load_jsonc(f)
         return LLMConfig.model_validate(default_config_dict["default"])
     except Exception as e:
@@ -211,7 +226,7 @@ def parse_llm_config() -> LLMConfig:
     override_config_dict = {}
     if os.path.exists(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME):
         logger.info("Loading custom llm config...")
-        with open(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME
+        with open(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME) as f:
             override_config_dict = load_jsonc(f)
     else:
         logger.warning("Custom llm config not found, loading default config")
@@ -237,7 +252,7 @@ def initialize_llm_config() -> LLMConfig:

 class OutputConfig(BaseModel):
     structured_output: Annotated[
-
+        type[BaseModel] | dict | None,
         Field(
             default=None,
             description=(
@@ -247,7 +262,7 @@ class OutputConfig(BaseModel):
         ),
     ]
     output_description: Annotated[
-
+        str | None,
         Field(
             default=None,
             description=(
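The new `vertexai` provider is validated through Application Default Credentials instead of an API key in `.env`. A hedged sketch of checking ADC the same way, useful for verifying a machine before pointing `llm-config.override.jsonc` at `vertexai` (credentials are typically set up with `gcloud auth application-default login` or the `GOOGLE_APPLICATION_CREDENTIALS` variable):

```python
import google.auth
from google.auth.exceptions import DefaultCredentialsError


def adc_status() -> str:
    """Report whether Application Default Credentials resolve to a usable project."""
    try:
        _credentials, project = google.auth.default()
    except DefaultCredentialsError as e:
        return f"no ADC found: {e}"
    if not project:
        return "ADC found, but no Google Cloud project is set"
    return f"ADC OK for project {project}"


if __name__ == "__main__":
    print(adc_status())
```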
minitap/mobile_use/context.py
CHANGED
@@ -6,12 +6,11 @@ Uses ContextVar to avoid prop drilling and maintain clean function signatures.

 from enum import Enum
 from pathlib import Path
-from typing import Optional

 from adbutils import AdbClient
 from openai import BaseModel
 from pydantic import ConfigDict
-from
+from typing import Literal

 from minitap.mobile_use.clients.device_hardware_client import DeviceHardwareClient
 from minitap.mobile_use.clients.screen_api_client import ScreenApiClient
@@ -56,8 +55,8 @@ class MobileUseContext(BaseModel):
     hw_bridge_client: DeviceHardwareClient
     screen_api_client: ScreenApiClient
     llm_config: LLMConfig
-    adb_client:
-    execution_setup:
+    adb_client: AdbClient | None = None
+    execution_setup: ExecutionSetup | None = None

     def get_adb_client(self) -> AdbClient:
         if self.adb_client is None:
minitap/mobile_use/controllers/mobile_command_controller.py
CHANGED
@@ -1,6 +1,6 @@
 import uuid
 from enum import Enum
-from typing import Annotated, Literal
+from typing import Annotated, Literal

 import yaml
 from langgraph.types import Command
@@ -43,7 +43,7 @@ class RunFlowRequest(BaseModel):
     dry_run: bool = Field(default=False, alias="dryRun")


-def run_flow(ctx: MobileUseContext, flow_steps: list, dry_run: bool = False) ->
+def run_flow(ctx: MobileUseContext, flow_steps: list, dry_run: bool = False) -> dict | None:
     """
     Run a flow i.e, a sequence of commands.
     Returns None on success, or the response body of the failed command.
@@ -137,20 +137,20 @@ class SelectorRequestWithPercentages(BaseModel):
         return {"point": self.percentages.to_str()}


-SelectorRequest =
-IdSelectorRequest
-SelectorRequestWithCoordinates
-SelectorRequestWithPercentages
-TextSelectorRequest
-IdWithTextSelectorRequest
-
+SelectorRequest = (
+    IdSelectorRequest
+    | SelectorRequestWithCoordinates
+    | SelectorRequestWithPercentages
+    | TextSelectorRequest
+    | IdWithTextSelectorRequest
+)


 def tap(
     ctx: MobileUseContext,
     selector_request: SelectorRequest,
     dry_run: bool = False,
-    index:
+    index: int | None = None,
 ):
     """
     Tap on a selector.
@@ -171,7 +171,7 @@ def long_press_on(
     ctx: MobileUseContext,
     selector_request: SelectorRequest,
     dry_run: bool = False,
-    index:
+    index: int | None = None,
 ):
     long_press_on_body = selector_request.to_dict()
     if not long_press_on_body:
@@ -211,7 +211,7 @@ SwipeDirection = Annotated[
 class SwipeRequest(BaseModel):
     model_config = ConfigDict(extra="forbid")
     swipe_mode: SwipeStartEndCoordinatesRequest | SwipeStartEndPercentagesRequest | SwipeDirection
-    duration:
+    duration: int | None = None  # in ms, default is 400ms

     def to_dict(self):
         res = {}
@@ -257,7 +257,7 @@ def paste_text(ctx: MobileUseContext, dry_run: bool = False):
     return run_flow(ctx, ["pasteText"], dry_run=dry_run)


-def erase_text(ctx: MobileUseContext, nb_chars:
+def erase_text(ctx: MobileUseContext, nb_chars: int | None = None, dry_run: bool = False):
     """
     Removes characters from the currently selected textfield (if any)
     Removes 50 characters if nb_chars is not specified.
@@ -275,7 +275,7 @@ def launch_app(ctx: MobileUseContext, package_name: str, dry_run: bool = False):
     return run_flow_with_wait_for_animation_to_end(ctx, flow_input, dry_run=dry_run)


-def stop_app(ctx: MobileUseContext, package_name:
+def stop_app(ctx: MobileUseContext, package_name: str | None = None, dry_run: bool = False):
     if package_name is None:
         flow_input = ["stopApp"]
     else:
@@ -311,13 +311,13 @@ def press_key(ctx: MobileUseContext, key: Key, dry_run: bool = False):


 class WaitTimeout(Enum):
-    SHORT = 500
-    MEDIUM = 1000
-    LONG = 5000
+    SHORT = "500"
+    MEDIUM = "1000"
+    LONG = "5000"


 def wait_for_animation_to_end(
-    ctx: MobileUseContext, timeout:
+    ctx: MobileUseContext, timeout: WaitTimeout | None = None, dry_run: bool = False
 ):
     if timeout is None:
         return run_flow(ctx, ["waitForAnimationToEnd"], dry_run=dry_run)
@@ -327,7 +327,7 @@ def wait_for_animation_to_end(
 def run_flow_with_wait_for_animation_to_end(
     ctx: MobileUseContext, base_flow: list, dry_run: bool = False
 ):
-    base_flow.append({"waitForAnimationToEnd": {"timeout": WaitTimeout.MEDIUM.value}})
+    base_flow.append({"waitForAnimationToEnd": {"timeout": int(WaitTimeout.MEDIUM.value)}})
     return run_flow(ctx, base_flow, dry_run=dry_run)


@@ -362,15 +362,27 @@ if __name__ == "__main__":
         agents_thoughts=[],
     )

-    from minitap.mobile_use.tools.mobile.input_text import get_input_text_tool
-
-    input_resource_id = "com.google.android.apps.nexuslauncher:id/search_container_hotseat"
-    command_output: Command = get_input_text_tool(ctx=ctx).invoke(
+    # from minitap.mobile_use.tools.mobile.input_text import get_input_text_tool
+
+    # input_resource_id = "com.google.android.apps.nexuslauncher:id/search_container_hotseat"
+    # command_output: Command = get_input_text_tool(ctx=ctx).invoke(
+    #     {
+    #         "tool_call_id": uuid.uuid4().hex,
+    #         "agent_thought": "",
+    #         "text_input_resource_id": input_resource_id,
+    #         "text": "Hello World",
+    #         "state": dummy_state,
+    #         "executor_metadata": None,
+    #     }
+    # )
+    from minitap.mobile_use.tools.mobile.clear_text import get_clear_text_tool
+
+    input_resource_id = "com.google.android.apps.nexuslauncher:id/input"
+    command_output: Command = get_clear_text_tool(ctx=ctx).invoke(
         {
             "tool_call_id": uuid.uuid4().hex,
             "agent_thought": "",
             "text_input_resource_id": input_resource_id,
-            "text": "Hello World",
             "state": dummy_state,
             "executor_metadata": None,
         }
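`WaitTimeout` now stores its durations as strings, so any call site that needs a number has to cast, as `run_flow_with_wait_for_animation_to_end` does with `int(WaitTimeout.MEDIUM.value)`. A tiny self-contained sketch of that pattern:

```python
from enum import Enum


class WaitTimeout(Enum):
    # Values are strings in 2.2.0; call sites cast when a number is required.
    SHORT = "500"
    MEDIUM = "1000"
    LONG = "5000"


def wait_step(timeout: WaitTimeout) -> dict:
    # Same shape as the flow step appended by run_flow_with_wait_for_animation_to_end.
    return {"waitForAnimationToEnd": {"timeout": int(timeout.value)}}


assert wait_step(WaitTimeout.MEDIUM) == {"waitForAnimationToEnd": {"timeout": 1000}}
```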
minitap/mobile_use/controllers/platform_specific_commands_controller.py
CHANGED
@@ -1,6 +1,5 @@
 from datetime import date
 import json
-from typing import Optional

 from adbutils import AdbDevice
 from minitap.mobile_use.utils.logger import MobileUseLogger
@@ -20,8 +19,8 @@ def get_adb_device(ctx: MobileUseContext) -> AdbDevice:


 def get_first_device(
-    logger:
-) -> tuple[
+    logger: MobileUseLogger | None = None,
+) -> tuple[str | None, DevicePlatform | None]:
     """Gets the first available device."""
     try:
         android_output = run_shell_command_on_host("adb devices")
@@ -50,7 +49,7 @@ def get_first_device(
         return None, None


-def get_focused_app_info(ctx: MobileUseContext) ->
+def get_focused_app_info(ctx: MobileUseContext) -> str | None:
     if ctx.device.mobile_platform == DevicePlatform.IOS:
         return None
     device = get_adb_device(ctx)
minitap/mobile_use/graph/graph.py
CHANGED
@@ -6,6 +6,7 @@ from langchain_core.messages (
 from langgraph.constants import END, START
 from langgraph.graph import StateGraph
 from langgraph.graph.state import CompiledStateGraph
+
 from minitap.mobile_use.agents.contextor.contextor import ContextorNode
 from minitap.mobile_use.agents.cortex.cortex import CortexNode
 from minitap.mobile_use.agents.executor.executor import ExecutorNode
minitap/mobile_use/graph/state.py
CHANGED
@@ -1,7 +1,7 @@
 from langchain_core.messages import AIMessage, AnyMessage
 from langgraph.graph import add_messages
 from langgraph.prebuilt.chat_agent_executor import AgentStatePydantic
-from
+from typing import Annotated

 from minitap.mobile_use.agents.planner.types import Subgoal
 from minitap.mobile_use.config import AgentNode
@@ -24,16 +24,16 @@ class State(AgentStatePydantic):
     subgoal_plan: Annotated[list[Subgoal], "The current plan, made of subgoals"]

     # contextor related keys
-    latest_screenshot_base64: Annotated[
+    latest_screenshot_base64: Annotated[str | None, "Latest screenshot of the device", take_last]
     latest_ui_hierarchy: Annotated[
-
+        list[dict] | None, "Latest UI hierarchy of the device", take_last
     ]
-    focused_app_info: Annotated[
-    device_date: Annotated[
+    focused_app_info: Annotated[str | None, "Focused app info", take_last]
+    device_date: Annotated[str | None, "Date of the device", take_last]

     # cortex related keys
     structured_decisions: Annotated[
-
+        str | None,
         "Structured decisions made by the cortex, for the executor to follow",
         take_last,
     ]
@@ -45,7 +45,7 @@ class State(AgentStatePydantic):

     # executor related keys
     executor_messages: Annotated[list[AnyMessage], "Sequential Executor messages", add_messages]
-    cortex_last_thought: Annotated[
+    cortex_last_thought: Annotated[str | None, "Last thought of the cortex for the executor"]

     # common keys
     agents_thoughts: Annotated[
@@ -58,13 +58,13 @@ class State(AgentStatePydantic):
         self,
         ctx: MobileUseContext,
         update: dict,
-        agent:
+        agent: AgentNode | None = None,
     ):
         """
         Sanitizes the state update to ensure it is valid and apply side effect logic where required.
         The agent is required if the update contains the "agents_thoughts" key.
         """
-        updated_agents_thoughts:
+        updated_agents_thoughts: str | list[str] | None = update.get("agents_thoughts", None)
         if updated_agents_thoughts is not None:
             if isinstance(updated_agents_thoughts, str):
                 updated_agents_thoughts = [updated_agents_thoughts]