PyPI - minitap-mobile-use - Versions diffs - 2.3.0__tar.gz → 2.4.0__tar.gz - Mend

minitap-mobile-use 2.3.0__tar.gz → 2.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of minitap-mobile-use might be problematic.

Files changed (104)

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: minitap-mobile-use
-Version: 2.3.0
+Version: 2.4.0
 Summary: AI-powered multi-agent system that automates real Android and iOS devices through low-level control using LangGraph.
 Author: Pierre-Louis Favreau, Jean-Pierre Lo, Nicolas Dehandschoewercker
 License: MIT License
@@ -43,9 +43,11 @@ Requires-Dist: uvicorn[standard]==0.30.1
 Requires-Dist: colorama>=0.4.6
 Requires-Dist: psutil>=5.9.0
 Requires-Dist: langchain-google-vertexai>=2.0.28
+Requires-Dist: httpx>=0.28.1
 Requires-Dist: ruff==0.5.3 ; extra == 'dev'
 Requires-Dist: pytest==8.4.1 ; extra == 'dev'
 Requires-Dist: pytest-cov==5.0.0 ; extra == 'dev'
+Requires-Dist: pyright==1.1.405 ; extra == 'dev'
 Requires-Python: >=3.12
 Project-URL: Homepage, https://minitap.ai/
 Project-URL: Source, https://github.com/minitap-ai/mobile-use

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/contextor/contextor.py RENAMED

@@ -21,7 +21,7 @@ class ContextorNode:
         on_success=lambda _: logger.success("Contextor Agent"),
         on_failure=lambda _: logger.error("Contextor Agent"),
     )
-    def __call__(self, state: State):
+    async def __call__(self, state: State):
         device_data = get_screen_data(self.ctx.screen_api_client)
         focused_app_info = get_focused_app_info(self.ctx)
         device_date = get_device_date(self.ctx)
@@ -30,7 +30,7 @@ class ContextorNode:
             list(state.executor_messages)
         )
-        return state.sanitize_update(
+        return await state.asanitize_update(
             ctx=self.ctx,
             update={
                 "latest_screenshot_base64": device_data.base64

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/cortex/cortex.md RENAMED

@@ -31,6 +31,12 @@ To understand the device state, you have two senses, each with its purpose:
     *   **Golden Rule:** When the UI hierarchy is ambiguous, seems incomplete, or when you need to verify a visual detail before acting, **`glimpse_screen` is always the most effective and reliable action.** Never guess what the screen looks like; use your sight to be sure.
   **CRITICAL NOTE ON SIGHT:** The visual information from `glimpse_screen` is **ephemeral**. It is available for **THIS decision turn ONLY**. You MUST extract all necessary information from it IMMEDIATELY, as it will be cleared before the next step.
+### CRITICAL ACTION DIRECTIVES
+- **To open an application, you MUST use the `launch_app` tool.** Provide the natural language name of the app (e.g., "Uber Eats"). Do NOT attempt to open apps manually by swiping to the app drawer and searching. The `launch_app` tool is the fastest and most reliable method.
+- **To open URLs/links, you MUST use the `open_link` tool.** This handles all links, including deep links, correctly.
 ### Context You Receive:
 - 📱 **Device state**:
@@ -75,13 +81,32 @@ Focus on the **current PENDING subgoal and the next subgoals not yet started**.
 **You MUST follow it for every element interaction.**
-When you target a UI element (for a `tap`, `input_text`, `clear_text`, etc.), you **MUST** provide a comprehensive target object containing every piece of information you can find about it.
+When you target a UI element (for a `tap`, `input_text`, `clear_text`, etc.), you **MUST** provide a comprehensive `target` object containing every piece of information you can find about **that single element**.
 *   **1. `resource_id`**: Include this if it is present in the UI hierarchy.
-*   **2. `coordinates`**: Include the full bounds (`x`, `y`, `width`, `height`) if they are available.
-*   **3. `text`**: Include the *current text* content of the element (e.g., "Sign In", "Search...", "First Name").
+*   **2. `resource_id_index`**: If there are multiple elements with the same `resource_id`, provide the zero-based index of the specific one you are targeting.
+*   **3. `coordinates`**: Include the full bounds (`x`, `y`, `width`, `height`) if they are available.
+*   **4. `text`**: Include the *current text* content of the element (e.g., placeholder text for an input).
+*   **5. `text_index`**: If there are multiple elements with the same `text`, provide the zero-based index of the specific one you are targeting.
+**CRITICAL: The index must correspond to its identifier.** `resource_id_index` is only used when targeting by `resource_id`. `text_index` is only used when targeting by `text`. This ensures the fallback logic targets the correct element.
+**This is NOT optional.** Providing all locators if we have, it is the foundation of the system's reliability. It allows next steps to use a fallback mechanism: if the ID fails, it tries the coordinates, etc. Failing to provide this complete context will lead to action failures.
+### The Rule of Unpredictable Actions
-**This is NOT optional.** Providing all three locators if we have, it is the foundation of the system's reliability. It allows next steps to use a fallback mechanism: if the ID fails, it tries the coordinates, etc. Failing to provide this complete context will lead to action failures.
+Certain actions have outcomes that can significantly and sometimes unpredictably change the UI. These include:
+- `back`
+- `launch_app`
+- `stop_app`
+- `open_link`
+- `tap` on an element that is clearly for navigation (e.g., a "Back" button, a menu item, a link to another screen).
+**CRITICAL RULE: If your decision includes one of these unpredictable actions, it MUST be the only action in your `Structured Decisions` for this turn. Else, use flows to group actions together.**
+This is not optional. Failing to isolate these actions will cause the system to act on an outdated understanding of the screen, leading to catastrophic errors. For example, after a `back` command, you MUST wait to see the new screen before deciding what to tap next.
+You may only group simple, predictable actions together, such as tapping a text field and then immediately typing into it (`tap` followed by `input_text`).
 ### Outputting Your Decisions
@@ -90,8 +115,8 @@ If you decide to act, output a **valid JSON stringified structured set of instru
 - These must be **concrete low-level actions**.
 - The executor has the following available tools: {{ executor_tools_list }}.
 - Your goal is to achieve subgoals **fast** - so you must put as much actions as possible in your instructions to complete all achievable subgoals (based on your observations) in one go.
-- To open URLs/links directly, use the `open_link` tool - it will automatically handle opening in the appropriate browser. It also handles deep links.
-- When you need to open an app, use the `find_packages` low-level action to try and get its name. Then, simply use the `launch_app` low-level action to launch it.
+- If you refer to a UI element or coordinates, specify it clearly (e.g., `resource-id: com.whatsapp:id/search`, `resource-id-index: 0`, `text: "Alice"`, `resource-id-index: 0`, `x: 100, y: 200, width: 100, height: 100`).
+- **The structure is up to you**, but it must be valid **JSON stringified output**. You will accompany this output with a **natural-language summary** of your reasoning and approach in your agent thought.
 -   **Always use a single `input_text` action** to type in a field. This tool handles focusing the element and placing the cursor correctly. If the tool feedback indicates verification is needed or shows None/empty content, perform verification before proceeding.
 - **Only reference UI element IDs or visible texts that are explicitly present in the provided UI hierarchy or screenshot. Do not invent, infer, or guess any IDs or texts that are not directly observed**.
 - **For text clearing**: When you need to completely clear text from an input field, always call the `clear_text` tool with the correct resource_id. This tool automatically focuses the element, and ensures the field is emptied. If you notice this tool fails to clear the text, try to long press the input, select all, and call `erase_one_char`.
@@ -116,7 +141,23 @@ If you decide to act, output a **valid JSON stringified structured set of instru
 ---
-### Example
+### Example 1
+#### Current Subgoal:
+> "Open WhatsApp"
+#### Structured Decisions:
+```text
+"{\"action\": \"launch_app\", \"app_name\": \"WhatsApp\"}"
+```
+#### Agent Thought:
+> I need to launch the WhatsApp app. I will use the `launch_app` tool to open it.
+### Exemple 2
 #### Current Subgoal:
@@ -125,7 +166,7 @@ If you decide to act, output a **valid JSON stringified structured set of instru
 #### Structured Decisions:
 ```text
-"{\"action\": \"tap\", \"target\": {\"text_input_resource_id\": \"com.whatsapp:id/menuitem_search\", \"text_input_coordinates\": {\"x\": 880, \"y\": 150, \"width\": 120, \"height\": 120}, \"text_input_text\": \"Search\"}}"
+"[{\"action\": \"tap\", \"target\": {\"resource_id\": \"com.whatsapp:id/menuitem_search\", \"resource_id_index\": 1, \"text\": \"Search\", \"text_index\": 0, \"coordinates\": {\"x\": 880, \"y\": 150, \"width\": 120, \"height\": 120}}}]"
 ```
 #### Agent Thought:
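
Note: the "Rule of Unpredictable Actions" added to cortex.md in this release is stated only as a prompt instruction. A minimal sketch of how it could also be checked mechanically, assuming the decisions arrive as a JSON string shaped like the examples in the prompt. `UNPREDICTABLE_ACTIONS` and `violates_isolation_rule` are hypothetical names and are not part of the package. The navigation-style `tap` case is left out because it cannot be detected from the action name alone.

```python
import json

# Action names copied from the prompt's list; navigation taps are not covered.
UNPREDICTABLE_ACTIONS = {"back", "launch_app", "stop_app", "open_link"}


def violates_isolation_rule(structured_decisions: str) -> bool:
    """Return True if an unpredictable action is grouped with other actions."""
    decisions = json.loads(structured_decisions)
    # Example 1 in the prompt is a single object, Example 2 a list: accept both.
    if isinstance(decisions, dict):
        decisions = [decisions]
    has_unpredictable = any(
        d.get("action") in UNPREDICTABLE_ACTIONS for d in decisions
    )
    return has_unpredictable and len(decisions) > 1
```

A caller could reject or split a decision set for which this returns `True` before handing it to the executor.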

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/cortex/cortex.py RENAMED

@@ -16,7 +16,7 @@ from minitap.mobile_use.agents.planner.utils import get_current_subgoal
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm, with_fallback
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
 from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.conversations import get_screenshot_message_for_llm
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
@@ -78,8 +78,12 @@ class CortexNode:
             ctx=self.ctx, name="cortex", use_fallback=True, temperature=1
         ).with_structured_output(CortexOutput)
         response: CortexOutput = await with_fallback(
-            main_call=lambda: llm.ainvoke(messages),
-            fallback_call=lambda: llm_fallback.ainvoke(messages),
+            main_call=lambda: invoke_llm_with_timeout_message(
+                llm.ainvoke(messages), agent_name="Cortex"
+            ),
+            fallback_call=lambda: invoke_llm_with_timeout_message(
+                llm_fallback.ainvoke(messages), agent_name="Cortex (Fallback)"
+            ),
         )  # type: ignore
         is_subgoal_completed = (
@@ -90,7 +94,7 @@ class CortexNode:
         if not is_subgoal_completed:
             response.complete_subgoals_by_ids = []
-        return state.sanitize_update(
+        return await state.asanitize_update(
             ctx=self.ctx,
             update={
                 "agents_thoughts": [response.agent_thought],

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/executor/executor.md RENAMED

@@ -25,12 +25,7 @@ and your previous actions, you must:
 "I'm tapping on the chat item labeled 'Alice' to open the conversation."
 ```json
-{
-  "action": "tap",
-  "target": {
-    "resource_id": "com.whatsapp:id/conversation_item"
-  }
-}
+  "[{\"tool_name\": \"tap\", \"arguments\": {\"target\": {\"resource_id\": \"com.whatsapp:id/conversation_item\", \"resource_id_index\": 0, \"text\": \"Alice\", \"text_index\": 0, \"coordinates\": {\"x\": 0, \"y\": 350, \"width\": 1080, \"height\": 80}}}}]"
 ```
 **→ Executor Action**:
@@ -38,13 +33,17 @@ and your previous actions, you must:
 Call the `tap_on_element` tool with:
 - `resource_id = "com.whatsapp:id/conversation_item"`
+- `resource_id_index = 0`
+- `text = "Alice"`
+- `text_index = 0`
+- `coordinates = {"x": 0, "y": 350, "width": 1080, "height": 80}`
 - `agent_thought = "I'm tapping on the chat item labeled 'Alice' to open the conversation."`
 ---
 ### ⚙️ Tools
-- Tools may include actions like: `tap`, `swipe`, `start_app`, `stop_app`, `find_packages`, `get_current_focus`, etc.
+- Tools may include actions like: `tap`, `swipe`, `launch_app`, `stop_app`, etc.
 - You **must not hardcode tool definitions** here.
 - Just use the right tool based on what the `structured_decisions` requires.
 - The tools are provided dynamically via LangGraph's tool binding mechanism.
@@ -53,10 +52,12 @@ Call the `tap_on_element` tool with:
 When using the `input_text` tool:
-- **Provide all available information** from the following optional parameters to identify the text input element:
-  - `text_input_resource_id`: The resource ID of the text input element (when available)
-  - `text_input_coordinates`: The bounds (ElementBounds) of the text input element (when available)
-  - `text_input_text`: The current text content of the text input element (when available)
+- **Provide all available information** in the target object to identify text input element
+  - `resource_id`: The resource ID of the text input element (when available)
+  - `resource_id_index`: The zero-based index of the specific resource ID you are targeting (when available)
+  - `text`: The current text content of the text input element (when available)
+  - `text_index`: The zero-based index of the specific text you are targeting (when available)
+  - `coordinates`: The bounds (ElementBounds) of the text input element (when available)
 - The tool will automatically:
@@ -64,6 +65,8 @@ When using the `input_text` tool:
   2. **Move the cursor to the end** of the existing text
   3. **Then type the new text**
+- **Important**: Special characters and markdown-like escape sequences (e.g., \n, \t, *, _) are not interpreted. For example, typing \n will insert the literal characters \ and n, not a line break.
 #### 🔄 Text Clearing Best Practice
 When you need to completely clear text from an input field, always use the clear_text tool with the correct resource_id.
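
Note: the locator fields introduced above (`resource_id`, `resource_id_index`, `text`, `text_index`, `coordinates`) imply a fallback chain, and the cortex prompt describes it as "if the ID fails, it tries the coordinates, etc." A sketch of such a resolution order, assuming the UI hierarchy is a flat list of dicts with `resource_id`, `text`, and `bounds` keys. `Target`, `resolve_target`, and that hierarchy shape are hypothetical and are not taken from the package.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Target:
    resource_id: str | None = None
    resource_id_index: int | None = None
    text: str | None = None
    text_index: int | None = None
    coordinates: dict[str, int] | None = None  # x, y, width, height


def _center(bounds: dict[str, int]) -> tuple[int, int]:
    """Return the midpoint of a bounds rectangle."""
    return (
        bounds["x"] + bounds["width"] // 2,
        bounds["y"] + bounds["height"] // 2,
    )


def _pick(matches: list[dict], index: int | None) -> dict | None:
    """Return the index-th match (default 0), or None if out of range."""
    i = index or 0
    return matches[i] if 0 <= i < len(matches) else None


def resolve_target(target: Target, elements: list[dict]) -> tuple[int, int] | None:
    """Return a tap point, trying resource_id, then coordinates, then text."""
    if target.resource_id is not None:
        by_id = [e for e in elements if e.get("resource_id") == target.resource_id]
        hit = _pick(by_id, target.resource_id_index)
        if hit is not None:
            return _center(hit["bounds"])
    if target.coordinates is not None:
        return _center(target.coordinates)
    if target.text is not None:
        by_text = [e for e in elements if e.get("text") == target.text]
        hit = _pick(by_text, target.text_index)
        if hit is not None:
            return _center(hit["bounds"])
    return None
```

Each index is consulted only together with its own identifier, which mirrors the prompt's requirement that `resource_id_index` pairs with `resource_id` and `text_index` with `text`.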

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/executor/executor.py RENAMED

@@ -8,7 +8,7 @@ from langchain_google_vertexai.chat_models import ChatVertexAI
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
 from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, get_tools_from_wrappers
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -29,7 +29,7 @@ class ExecutorNode:
         structured_decisions = state.structured_decisions
         if not structured_decisions:
             logger.warning("No structured decisions found.")
-            return state.sanitize_update(
+            return await state.asanitize_update(
                 ctx=self.ctx,
                 update={
                     "agents_thoughts": [
@@ -62,9 +62,10 @@ class ExecutorNode:
             llm_bind_tools_kwargs["parallel_tool_calls"] = True
         llm = llm.bind_tools(**llm_bind_tools_kwargs)
-        response = await llm.ainvoke(messages)
-        return state.sanitize_update(
+        response = await invoke_llm_with_timeout_message(
+            llm.ainvoke(messages), agent_name="Executor"
+        )
+        return await state.asanitize_update(
             ctx=self.ctx,
             update={
                 "cortex_last_thought": cortex_last_thought,

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/hopper/hopper.py RENAMED

@@ -2,10 +2,11 @@ from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
-from minitap.mobile_use.context import MobileUseContext
-from minitap.mobile_use.services.llm import get_llm
 from pydantic import BaseModel, Field
+from minitap.mobile_use.context import MobileUseContext
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
 class HopperOutput(BaseModel):
     step: str = Field(
@@ -33,7 +34,9 @@ async def hopper(
     llm = get_llm(ctx=ctx, name="hopper", is_utils=True, temperature=0)
     structured_llm = llm.with_structured_output(HopperOutput)
-    response: HopperOutput = await structured_llm.ainvoke(messages)  # type: ignore
+    response: HopperOutput = await invoke_llm_with_timeout_message(
+        structured_llm.ainvoke(messages), agent_name="Hopper"
+    )  # type: ignore
     return HopperOutput(
         step=response.step,
         output=response.output,
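
Note: `invoke_llm_with_timeout_message` now wraps the LLM calls in hopper.py and the other agents, but its body is not part of the hunks shown here. The sketch below is only a guess at the behaviour its name and call shape suggest: it takes an awaitable plus an `agent_name`, logs a message if the call is slow, and keeps waiting for the result. The function name `invoke_with_timeout_message_sketch`, the `warn_after_seconds` parameter, and its default value are assumptions.

```python
import asyncio
import logging
from collections.abc import Awaitable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def invoke_with_timeout_message_sketch(
    llm_call: Awaitable[T],
    agent_name: str,
    warn_after_seconds: float = 30.0,  # assumed threshold, not from the package
) -> T:
    """Await llm_call; log a warning if it is still running after the threshold."""
    task = asyncio.ensure_future(llm_call)
    # asyncio.wait returns once the task finishes or the timeout elapses;
    # unlike asyncio.wait_for, it does not cancel the task on timeout.
    done, _pending = await asyncio.wait({task}, timeout=warn_after_seconds)
    if not done:
        logger.warning(
            "%s: LLM call still running after %.1fs, waiting...",
            agent_name,
            warn_after_seconds,
        )
    # Awaiting the task returns its result or re-raises its exception.
    return await task
```

The choice of `asyncio.wait` over `asyncio.wait_for` matters in this sketch: the slow call is reported but not aborted, so the caller still receives the response.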

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/orchestrator/orchestrator.py RENAMED

@@ -15,7 +15,7 @@ from minitap.mobile_use.agents.planner.utils import (
 )
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -45,14 +45,18 @@ class OrchestratorNode:
                     else f"Starting the next subgoal: {new_subgoal}"
                 )
             ]
-            return _get_state_update(ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True)
+            return await _get_state_update(
+                ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True
+            )
         subgoals_to_examine = get_subgoals_by_ids(
             subgoals=state.subgoal_plan,
             ids=state.complete_subgoals_by_ids,
         )
         if len(subgoals_to_examine) <= 0:
-            return _get_state_update(ctx=self.ctx, state=state, thoughts=["No subgoal to examine."])
+            return await _get_state_update(
+                ctx=self.ctx, state=state, thoughts=["No subgoal to examine."]
+            )
         system_message = Template(
             Path(__file__).parent.joinpath("orchestrator.md").read_text(encoding="utf-8")
@@ -72,13 +76,16 @@ class OrchestratorNode:
         llm = get_llm(ctx=self.ctx, name="orchestrator", temperature=1)
         llm = llm.with_structured_output(OrchestratorOutput)
-        response: OrchestratorOutput = await llm.ainvoke(messages)  # type: ignore
+        response: OrchestratorOutput = await invoke_llm_with_timeout_message(
+            llm.ainvoke(messages), agent_name="Orchestrator"
+        )  # type: ignore
         if response.needs_replaning:
             thoughts = [response.reason]
             state.subgoal_plan = fail_current_subgoal(state.subgoal_plan)
             thoughts.append("==== END OF PLAN, REPLANNING ====")
-            return _get_state_update(ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True)
+            return await _get_state_update(
+                ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True
+            )
         state.subgoal_plan = complete_subgoals_by_ids(
             subgoals=state.subgoal_plan,
@@ -87,19 +94,25 @@ class OrchestratorNode:
         thoughts = [response.reason]
         if all_completed(state.subgoal_plan):
             logger.success("All the subgoals have been completed successfully.")
-            return _get_state_update(ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True)
+            return await _get_state_update(
+                ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True
+            )
         if current_subgoal.id not in response.completed_subgoal_ids:
             # The current subgoal is not yet complete.
-            return _get_state_update(ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True)
+            return await _get_state_update(
+                ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True
+            )
         state.subgoal_plan = start_next_subgoal(state.subgoal_plan)
         new_subgoal = get_current_subgoal(state.subgoal_plan)
         thoughts.append(f"==== NEXT SUBGOAL: {new_subgoal} ====")
-        return _get_state_update(ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True)
+        return await _get_state_update(
+            ctx=self.ctx, state=state, thoughts=thoughts, update_plan=True
+        )
-def _get_state_update(
+async def _get_state_update(
     ctx: MobileUseContext,
     state: State,
     thoughts: list[str],
@@ -111,4 +124,6 @@ def _get_state_update(
     }
     if update_plan:
         update["subgoal_plan"] = state.subgoal_plan
-    return state.sanitize_update(ctx=ctx, update=update, agent="orchestrator")
+        if ctx.on_plan_changes:
+            await ctx.on_plan_changes(state.subgoal_plan, False)
+    return await state.asanitize_update(ctx=ctx, update=update, agent="orchestrator")
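
Note: `_get_state_update` became a coroutine because it now awaits an optional `ctx.on_plan_changes` hook, called with the plan and a boolean. A minimal sketch of that optional async callback pattern, assuming the second argument means "this is a replan" (planner.py passes a `needs_replan` value for it). `ContextSketch` and `notify_plan_change` are hypothetical, and the plan is simplified to a list of strings.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

# (plan, is_replan) -> awaitable; the real plan holds Subgoal objects.
PlanCallback = Callable[[list[str], bool], Awaitable[None]]


@dataclass
class ContextSketch:
    on_plan_changes: PlanCallback | None = None


async def notify_plan_change(
    ctx: ContextSketch, plan: list[str], is_replan: bool
) -> None:
    """Invoke the hook only when the caller registered one."""
    if ctx.on_plan_changes:
        await ctx.on_plan_changes(plan, is_replan)


def demo() -> list[tuple[list[str], bool]]:
    """Run once with a recording hook and once with no hook at all."""
    seen: list[tuple[list[str], bool]] = []

    async def record(plan: list[str], is_replan: bool) -> None:
        seen.append((plan, is_replan))

    asyncio.run(notify_plan_change(ContextSketch(record), ["Open WhatsApp"], False))
    asyncio.run(notify_plan_change(ContextSketch(), ["ignored"], True))
    return seen
```

Because the hook is awaited inline, a slow callback delays the state update, which is one reason every caller in the diff had to switch to `await`.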

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/outputter/outputter.py RENAMED

@@ -3,13 +3,14 @@ from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
+from pydantic import BaseModel
 from minitap.mobile_use.config import OutputConfig
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
 from minitap.mobile_use.utils.conversations import is_ai_message
 from minitap.mobile_use.utils.logger import get_logger
-from pydantic import BaseModel
 logger = get_logger(__name__)
@@ -61,7 +62,9 @@ async def outputter(
         if schema is not None:
             structured_llm = llm.with_structured_output(schema)
-    response = await structured_llm.ainvoke(messages)  # type: ignore
+    response = await invoke_llm_with_timeout_message(
+        structured_llm.ainvoke(messages), agent_name="Outputter"
+    )  # type: ignore
     if isinstance(response, BaseModel):
         if output_config.output_description and hasattr(response, "content"):
             response = json.loads(response.content)  # type: ignore

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/planner/planner.md RENAMED

@@ -9,12 +9,13 @@ You work like an agile tech lead: defining the key milestones without locking in
    Given the **user's goal**:
    - Create a **high-level sequence of subgoals** to complete that goal.
-   - Subgoals should reflect real interactions with mobile UIs (e.g. "Open app", "Tap search bar", "Scroll to item", "Send message to Bob", etc).
+   - Subgoals should reflect real interactions with mobile UIs and describe the intent of the action (e.g., "Open the app to find a contact," "View the image to extract information," "Send a message to Bob confirming the appointment").
+   - Focus on the goal of the interaction, not just the physical action. For example, instead of 'View the receipt,' a better subgoal is 'Open and analyze the receipt to identify transactions.
    - Don't assume the full UI is visible yet. Plan based on how most mobile apps work, and keep flexibility.
-   - List of agents thoughts is empty which is expected, since it is the first plan.
-   - Avoid too granular UI actions based tasks (e.g. "tap", "swipe", "copy", "paste") unless explicitly required.
    - The executor has the following available tools: {{ executor_tools_list }}.
      When one of these tools offers a direct shortcut (e.g. `openLink` instead of manually launching a browser and typing a URL), prefer it over decomposed manual steps.
+   - Ensure that each subgoal prepares the ground for the next. If data needs to be gathered in one step to be used in another, the subgoal should reflect the intent to gather that data.
 2. **Replanning**
    If you're asked to **revise a previous plan**, you'll also receive:
@@ -27,38 +28,35 @@ You work like an agile tech lead: defining the key milestones without locking in
 ### Output
-You must output a **list of subgoals (description + optional subgoal ID)**, each representing a clear subgoal.
+You must output a **list of subgoals (description)**, each representing a clear subgoal.
 Each subgoal should be:
-- Focused on **realistic mobile interactions**
+- Focused on **purpose-driven mobile interactions** that clearly state the intent
 - Neither too vague nor too granular
 - Sequential (later steps may depend on earlier ones)
 - Don't use loop-like formulation unless necessary (e.g. don't say "repeat this X times", instead reuse the same steps X times as subgoals)
-If you're replaning and need to keep a previous subgoal, you **must keep the same subgoal ID**.
 ### Examples
-#### **Initial Goal**: "Open WhatsApp and send 'I’m running late' to Alice"
+#### **Initial Goal**: "Go on https://tesla.com, and tell me what is the first car being displayed"
 **Plan**:
-- Open the WhatsApp app (ID: None -> will be generated as a UUID like bc3c362d-f498-4f1a-991e-4a2d1f8c1226)
-- Locate or search for Alice (ID: None)
-- Open the conversation with Alice (ID: None)
-- Type the message "I’m running late" (ID: None)
-- Send the message (ID: None)
+- Open the link https://tesla.com to find information
+- Analyze the home page to identify the first car displayed
-#### **Initial Goal**: "Go on https://tesla.com, and tell me what is the first car being displayed"
+#### **Initial Goal**: "Open WhatsApp and send 'I’m running late' to Alice"
 **Plan**:
-- Open the link https://tesla.com (ID: None)
-- Find the first car displayed on the home page (ID: None)
+- Open the WhatsApp app to find the contact "Alice"
+- Open the conversation with Alice to send a message
+- Type the message "I’m running late" into the message field
+- Send the message
 #### **Replanning Example**
-**Original Plan**: same as above with IDs set
+**Original Plan**: same as above
 **Agent Thoughts**:
 - Couldn't find Alice in recent chats
@@ -67,8 +65,8 @@ If you're replaning and need to keep a previous subgoal, you **must keep the sam
 **New Plan**:
-- Open WhatsApp (ID: bc3c362d-f498-4f1a-991e-4a2d1f8c1226)
-- Tap the search bar (ID: None)
-- Search for "Alice" (ID: None)
-- Select the correct chat (ID: None)
-- Type and send "I’m running late" (ID: None)
+- Open WhatsApp
+- Tap the search bar to find a contact
+- Search for "Alice" in the search field
+- Select the correct chat to open the conversation
+- Type and send "I’m running late"

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/planner/planner.py RENAMED

@@ -1,14 +1,13 @@
-import uuid
 from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
 from minitap.mobile_use.agents.planner.types import PlannerOutput, Subgoal, SubgoalStatus
-from minitap.mobile_use.agents.planner.utils import one_of_them_is_failure
+from minitap.mobile_use.agents.planner.utils import generate_id, one_of_them_is_failure
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
-from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
 from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -49,11 +48,12 @@ class PlannerNode:
         llm = get_llm(ctx=self.ctx, name="planner")
         llm = llm.with_structured_output(PlannerOutput)
-        response: PlannerOutput = await llm.ainvoke(messages)  # type: ignore
+        response: PlannerOutput = await invoke_llm_with_timeout_message(
+            llm.ainvoke(messages), agent_name="Planner"
+        )  # type: ignore
         subgoals_plan = [
             Subgoal(
-                id=subgoal.id or str(uuid.uuid4()),
+                id=generate_id(),
                 description=subgoal.description,
                 status=SubgoalStatus.NOT_STARTED,
                 completion_reason=None,
@@ -63,7 +63,10 @@ class PlannerNode:
         logger.info("📜 Generated plan:")
         logger.info("\n".join(str(s) for s in subgoals_plan))
-        return state.sanitize_update(
+        if self.ctx.on_plan_changes:
+            await self.ctx.on_plan_changes(subgoals_plan, needs_replan)
+        return await state.asanitize_update(
             ctx=self.ctx,
             update={
                 "subgoal_plan": subgoals_plan,

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/planner/types.py RENAMED

@@ -1,11 +1,11 @@
+from datetime import datetime
 from enum import Enum
+from typing import Annotated
 from pydantic import BaseModel
-from typing import Annotated
 class PlannerSubgoalOutput(BaseModel):
-    id: Annotated[str | None, "If not provided, it will be generated"] = None
     description: str
@@ -27,6 +27,8 @@ class Subgoal(BaseModel):
         str | None, "Reason why the subgoal was completed (failure or success)"
     ] = None
     status: SubgoalStatus
+    started_at: Annotated[datetime | None, "When the subgoal started"] = None
+    ended_at: Annotated[datetime | None, "When the subgoal ended"] = None
     def __str__(self):
         status_emoji = "❓"

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/planner/utils.py RENAMED

@@ -1,4 +1,8 @@
+import random
+import string
 from minitap.mobile_use.agents.planner.types import Subgoal, SubgoalStatus
+from datetime import datetime, UTC
 def get_current_subgoal(subgoals: list[Subgoal]) -> Subgoal | None:
@@ -22,6 +26,7 @@ def complete_current_subgoal(subgoals: list[Subgoal]) -> list[Subgoal]:
     if not current_subgoal:
         return subgoals
     current_subgoal.status = SubgoalStatus.SUCCESS
+    current_subgoal.ended_at = datetime.now(UTC)
     return subgoals
@@ -29,6 +34,7 @@ def complete_subgoals_by_ids(subgoals: list[Subgoal], ids: list[str]) -> list[Su
     for subgoal in subgoals:
         if subgoal.id in ids:
             subgoal.status = SubgoalStatus.SUCCESS
+            subgoal.ended_at = datetime.now(UTC)
     return subgoals
@@ -37,6 +43,7 @@ def fail_current_subgoal(subgoals: list[Subgoal]) -> list[Subgoal]:
     if not current_subgoal:
         return subgoals
     current_subgoal.status = SubgoalStatus.FAILURE
+    current_subgoal.ended_at = datetime.now(UTC)
     return subgoals
@@ -53,4 +60,11 @@ def start_next_subgoal(subgoals: list[Subgoal]) -> list[Subgoal]:
     if not next_subgoal:
         return subgoals
     next_subgoal.status = SubgoalStatus.PENDING
+    next_subgoal.started_at = datetime.now(UTC)
     return subgoals
+def generate_id(length: int = 6) -> str:
+    """Generates a small and distinct random string ID."""
+    chars = string.ascii_lowercase + string.digits
+    return "".join(random.choice(chars) for _ in range(length))
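
Note: `generate_id` replaces UUID4 subgoal IDs with 6 characters drawn from 36 symbols, so the ID space shrinks to 36^6 = 2,176,782,336 values. A rough birthday-bound estimate of what that means for collisions, as a sketch; `collision_probability` is a hypothetical helper and is not in the package.

```python
import math


def collision_probability(
    n_ids: int, alphabet_size: int = 36, length: int = 6
) -> float:
    """Approximate probability that at least two of n_ids random IDs collide."""
    space = alphabet_size**length
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n_ids * (n_ids - 1) / (2 * space))
```

For a plan of ten subgoals the estimate is on the order of 2e-8, so short IDs look safe within a single plan. The probability grows quickly once IDs are compared across many thousands of plans.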

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/agents/summarizer/summarizer.py RENAMED

@@ -13,7 +13,7 @@ class SummarizerNode:
     def __init__(self, ctx: MobileUseContext):
         self.ctx = ctx
-    def __call__(self, state: State):
+    async def __call__(self, state: State):
         if len(state.messages) <= MAX_MESSAGES_IN_HISTORY:
             return {}
@@ -27,7 +27,7 @@ class SummarizerNode:
                 start_removal = True
             if start_removal and msg.id:
                 remove_messages.append(RemoveMessage(id=msg.id))
-            return state.sanitize_update(
+            return await state.asanitize_update(
                 ctx=self.ctx,
                 update={
                     "messages": remove_messages,

{minitap_mobile_use-2.3.0 → minitap_mobile_use-2.4.0}/minitap/mobile_use/config.py RENAMED

@@ -23,8 +23,10 @@ class Settings(BaseSettings):
     GOOGLE_API_KEY: SecretStr | None = None
     XAI_API_KEY: SecretStr | None = None
     OPEN_ROUTER_API_KEY: SecretStr | None = None
+    MINITAP_API_KEY: SecretStr | None = None
     OPENAI_BASE_URL: str | None = None
+    MINITAP_API_BASE_URL: str = "https://platform.minitap.ai"
     DEVICE_SCREEN_API_BASE_URL: str | None = None
     DEVICE_HARDWARE_BRIDGE_BASE_URL: str | None = None
@@ -90,7 +92,7 @@ def record_events(output_path: Path | None, events: list[str] | BaseModel | Any)
 ### LLM Configuration
-LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai"]
+LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai", "minitap"]
 LLMUtilsNode = Literal["outputter", "hopper"]
 AgentNode = Literal["planner", "orchestrator", "cortex", "executor"]
 AgentNodeWithFallback = Literal["cortex"]
@@ -131,6 +133,9 @@ class LLM(BaseModel):
             case "xai":
                 if not settings.XAI_API_KEY:
                     raise Exception(f"{name} requires XAI_API_KEY in .env")
+            case "minitap":
+                if not settings.MINITAP_API_KEY:
+                    raise Exception(f"{name} requires MINITAP_API_KEY in .env")
     def __str__(self):
         return f"{self.provider}/{self.model}"
