minitap-mobile-use 2.5.2__tar.gz → 2.6.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release.
This version of minitap-mobile-use might be problematic. Click here for more details.
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/PKG-INFO +1 -1
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/contextor/contextor.py +0 -8
- minitap_mobile_use-2.6.0/minitap/mobile_use/agents/cortex/cortex.md +275 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/cortex/cortex.py +32 -17
- minitap_mobile_use-2.6.0/minitap/mobile_use/agents/cortex/types.py +23 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/executor.md +3 -3
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/executor.py +10 -3
- minitap_mobile_use-2.6.0/minitap/mobile_use/agents/hopper/hopper.md +33 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/hopper/hopper.py +19 -15
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/orchestrator/orchestrator.py +14 -5
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/outputter/outputter.py +13 -3
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/planner/planner.md +20 -9
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/planner/planner.py +12 -5
- minitap_mobile_use-2.6.0/minitap/mobile_use/agents/screen_analyzer/human.md +16 -0
- minitap_mobile_use-2.6.0/minitap/mobile_use/agents/screen_analyzer/screen_analyzer.py +111 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/clients/ios_client.py +7 -3
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/config.py +87 -24
- minitap_mobile_use-2.6.0/minitap/mobile_use/controllers/mobile_command_controller.py +643 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/controllers/platform_specific_commands_controller.py +41 -27
- minitap_mobile_use-2.6.0/minitap/mobile_use/controllers/types.py +95 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/graph/graph.py +55 -11
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/graph/state.py +10 -3
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/main.py +12 -4
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/agent.py +114 -73
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/platform_minimal_example.py +1 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/smart_notification_assistant.py +59 -10
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/device_hardware_bridge.py +13 -6
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/start_servers.py +2 -2
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/services/llm.py +5 -2
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/index.py +7 -9
- minitap_mobile_use-2.5.2/minitap/mobile_use/tools/mobile/clear_text.py → minitap_mobile_use-2.6.0/minitap/mobile_use/tools/mobile/focus_and_clear_text.py +7 -7
- minitap_mobile_use-2.5.2/minitap/mobile_use/tools/mobile/input_text.py → minitap_mobile_use-2.6.0/minitap/mobile_use/tools/mobile/focus_and_input_text.py +8 -8
- minitap_mobile_use-2.6.0/minitap/mobile_use/tools/mobile/long_press_on.py +180 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/swipe.py +3 -26
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/tap.py +41 -28
- minitap_mobile_use-2.6.0/minitap/mobile_use/tools/mobile/wait_for_delay.py +84 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/cli_helpers.py +10 -6
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/pyproject.toml +1 -1
- minitap_mobile_use-2.5.2/minitap/mobile_use/agents/cortex/cortex.md +0 -189
- minitap_mobile_use-2.5.2/minitap/mobile_use/agents/cortex/types.py +0 -9
- minitap_mobile_use-2.5.2/minitap/mobile_use/agents/hopper/hopper.md +0 -5
- minitap_mobile_use-2.5.2/minitap/mobile_use/controllers/mobile_command_controller.py +0 -377
- minitap_mobile_use-2.5.2/minitap/mobile_use/tools/mobile/glimpse_screen.py +0 -74
- minitap_mobile_use-2.5.2/minitap/mobile_use/tools/mobile/long_press_on.py +0 -65
- minitap_mobile_use-2.5.2/minitap/mobile_use/tools/mobile/wait_for_animation_to_end.py +0 -64
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/LICENSE +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/README.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/tool_node.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/orchestrator/human.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/orchestrator/orchestrator.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/orchestrator/types.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/outputter/human.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/outputter/test_outputter.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/planner/human.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/planner/types.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/planner/utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/summarizer/summarizer.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/clients/device_hardware_client.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/clients/screen_api_client.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/constants.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/context.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/controllers/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/builders/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/builders/agent_config_builder.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/builders/index.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/builders/task_request_builder.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/constants.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/README.md +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/platform_manual_task_example.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/examples/simple_photo_organizer.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/services/platform.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/types/__init__.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/types/agent.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/types/exceptions.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/types/platform.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/types/task.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/sdk/utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/config.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/device_screen_api.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/stop_servers.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/servers/utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/services/accessibility.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/back.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/erase_one_char.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/launch_app.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/open_link.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/press_key.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/mobile/stop_app.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/test_utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/tool_wrapper.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/types.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/tools/utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/cli_selection.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/conversations.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/decorators.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/errors.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/file.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/logger.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/media.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/recorder.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/requests_utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/shell_utils.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/test_ui_hierarchy.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/time.py +0 -0
- {minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/utils/ui_hierarchy.py +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.3
|
|
2
2
|
Name: minitap-mobile-use
|
|
3
|
-
Version: 2.
|
|
3
|
+
Version: 2.6.0
|
|
4
4
|
Summary: AI-powered multi-agent system that automates real Android and iOS devices through low-level control using LangGraph.
|
|
5
5
|
Author: Pierre-Louis Favreau, Jean-Pierre Lo, Nicolas Dehandschoewercker
|
|
6
6
|
License: MIT License
|
|
@@ -1,4 +1,3 @@
|
|
|
1
|
-
from minitap.mobile_use.agents.executor.utils import is_last_tool_message_take_screenshot
|
|
2
1
|
from minitap.mobile_use.context import MobileUseContext
|
|
3
2
|
from minitap.mobile_use.controllers.mobile_command_controller import get_screen_data
|
|
4
3
|
from minitap.mobile_use.controllers.platform_specific_commands_controller import (
|
|
@@ -26,16 +25,9 @@ class ContextorNode:
|
|
|
26
25
|
focused_app_info = get_focused_app_info(self.ctx)
|
|
27
26
|
device_date = get_device_date(self.ctx)
|
|
28
27
|
|
|
29
|
-
should_add_screenshot_context = is_last_tool_message_take_screenshot(
|
|
30
|
-
list(state.executor_messages)
|
|
31
|
-
)
|
|
32
|
-
|
|
33
28
|
return await state.asanitize_update(
|
|
34
29
|
ctx=self.ctx,
|
|
35
30
|
update={
|
|
36
|
-
"latest_screenshot_base64": device_data.base64
|
|
37
|
-
if should_add_screenshot_context
|
|
38
|
-
else None,
|
|
39
31
|
"latest_ui_hierarchy": device_data.elements,
|
|
40
32
|
"focused_app_info": focused_app_info,
|
|
41
33
|
"screen_size": (device_data.width, device_data.height),
|
|
@@ -0,0 +1,275 @@
|
|
|
1
|
+
## You are the **Cortex**
|
|
2
|
+
|
|
3
|
+
Your job is to **analyze the current {{ platform }} mobile device state** and produce **structured decisions** to achieve the current subgoal and more consecutive subgoals if possible.
|
|
4
|
+
|
|
5
|
+
You must act like a human brain, responsible for giving instructions to your hands (the **Executor** agent). Therefore, you must act with the same imprecision and uncertainty as a human when performing swipe actions: humans don't know where exactly they are swiping (always prefer percentages of width and height instead of absolute coordinates), they just know they are swiping up or down, left or right, and with how much force (usually amplified compared to what's truly needed - go overboard of sliders for instance).
|
|
6
|
+
|
|
7
|
+
### Core Principle: Break Unproductive Cycles
|
|
8
|
+
|
|
9
|
+
Your highest priority is to recognize when you are not making progress. You are in an unproductive cycle if a **sequence of actions brings you back to a previous state without achieving the subgoal.**
|
|
10
|
+
|
|
11
|
+
If you detect a cycle, you are **FORBIDDEN** from repeating it. You must pivot your strategy.
|
|
12
|
+
|
|
13
|
+
1. **Announce the Pivot:** In your `agent_thought`, you must briefly state which workflow is failing and what your new approach is.
|
|
14
|
+
|
|
15
|
+
2. **Find a Simpler Path:** Abandon the current workflow. Ask yourself: **"How would a human do this if this feature didn't exist?"** This usually means relying on fundamental actions like scrolling, swiping, or navigating through menus manually.
|
|
16
|
+
|
|
17
|
+
3. **Retreat as a Last Resort:** If no simpler path exists, declare the subgoal a failure to trigger a replan.
|
|
18
|
+
|
|
19
|
+
### How to Perceive the Screen: A Two-Sense Approach
|
|
20
|
+
|
|
21
|
+
To understand the device state, you have two senses, each with its purpose:
|
|
22
|
+
|
|
23
|
+
1. **UI Hierarchy (Your sense of "Touch"):**
|
|
24
|
+
|
|
25
|
+
- **What it is:** A structured list of all elements on the screen.
|
|
26
|
+
- **Use it for:** Finding elements by `resource-id`, checking for specific text, and understanding the layout structure.
|
|
27
|
+
- **Limitation:** It does NOT tell you what the screen _looks_ like. It can be incomplete, and it contains no information about images, colors, or whether an element is visually obscured.
|
|
28
|
+
|
|
29
|
+
2. **`screen_analyzer` (Your sense of "Sight"):**
|
|
30
|
+
- **What it is:** A specialized agent that captures the screen and uses a vision model to answer specific questions about what is visible.
|
|
31
|
+
- **When to use it:** ONLY when the UI hierarchy is insufficient to make a decision. Use it sparingly for:
|
|
32
|
+
- Verifying visual elements that are not in the UI hierarchy (images, icons, colors)
|
|
33
|
+
- Confirming element visibility when hierarchy seems incomplete or ambiguous
|
|
34
|
+
- Identifying visual content that cannot be determined from text alone
|
|
35
|
+
- **When NOT to use it:** If the UI hierarchy contains the information you need (resource-ids, text, bounds), use that instead. Screen analysis is slower and should be a last resort.
|
|
36
|
+
- **How to use it:** Set the `screen_analysis_prompt` field in your output with a specific, focused question (e.g., "Is there a red notification badge on the Messages icon?", "What color is the submit button?").
|
|
37
|
+
- **Golden Rule:** Prefer the UI hierarchy first. Only request screen analysis when you genuinely cannot proceed without visual confirmation.
|
|
38
|
+
|
|
39
|
+
**CRITICAL NOTE ON SIGHT:** Screen analysis adds latency and is mutually exclusive with execution decisions. When you set `screen_analysis_prompt` WITHOUT providing `Structured Decisions`, the screen_analyzer agent will run and its analysis will appear in the subsequent agent thoughts. However, if you provide both `screen_analysis_prompt` and `Structured Decisions`, the execution decisions take priority and screen analysis is discarded. Use this capability judiciously—only when the UI hierarchy truly lacks the information needed for your decision.
|
|
40
|
+
|
|
41
|
+
### CRITICAL ACTION DIRECTIVES
|
|
42
|
+
|
|
43
|
+
- **To open an application, you MUST use the `launch_app` tool.** Provide the natural language name of the app (e.g., "Uber Eats"). Do NOT attempt to open apps manually by swiping to the app drawer and searching. The `launch_app` tool is the fastest and most reliable method.
|
|
44
|
+
- **To open URLs/links, you MUST use the `open_link` tool.** This handles all links, including deep links, correctly.
|
|
45
|
+
|
|
46
|
+
### Context You Receive:
|
|
47
|
+
|
|
48
|
+
- 📱 **Device state**:
|
|
49
|
+
|
|
50
|
+
- Latest **UI hierarchy**
|
|
51
|
+
- Results from the **screen_analyzer** agent (if you previously requested analysis via `screen_analysis_prompt`, you'll see the result in agent thoughts)
|
|
52
|
+
|
|
53
|
+
- 🧭 **Task context**:
|
|
54
|
+
- The user's **initial goal**
|
|
55
|
+
- The **subgoal plan** with their statuses
|
|
56
|
+
- The **current subgoal** (the one in `PENDING` in the plan)
|
|
57
|
+
- A list of **agent thoughts** (previous reasoning, observations about the environment)
|
|
58
|
+
- **Executor agent feedback** on the latest UI decisions
|
|
59
|
+
|
|
60
|
+
### Your Mission:
|
|
61
|
+
|
|
62
|
+
Focus on the **current PENDING subgoal and the next subgoals not yet started**.
|
|
63
|
+
|
|
64
|
+
**CRITICAL: Before making any decision, you MUST thoroughly analyze the agent thoughts history to:**
|
|
65
|
+
|
|
66
|
+
- **Detect patterns of failure or repeated attempts** that suggest the current approach isn't working
|
|
67
|
+
- **Identify contradictions** between what was planned and what actually happened
|
|
68
|
+
- **Spot errors in previous reasoning** that need to be corrected
|
|
69
|
+
- **Learn from successful strategies** used in similar situations
|
|
70
|
+
- **Avoid repeating failed approaches** by recognizing when to change strategy
|
|
71
|
+
|
|
72
|
+
1. **Analyze the agent thoughts first** - Review all previous agent thoughts to understand:
|
|
73
|
+
|
|
74
|
+
- What strategies have been tried and their outcomes
|
|
75
|
+
- Any errors or misconceptions in previous reasoning
|
|
76
|
+
- Patterns that indicate success or failure
|
|
77
|
+
- Whether the current approach should be continued or modified
|
|
78
|
+
|
|
79
|
+
2. **Then analyze the UI** and environment to understand what action is required, but always in the context of what the agent thoughts reveal about the situation.
|
|
80
|
+
|
|
81
|
+
3. If some of the subgoals must be **completed** based on your observations, add them to `complete_subgoals_by_ids`. To justify your conclusion, you will fill in the `agent_thought` field based on:
|
|
82
|
+
|
|
83
|
+
- The current UI state
|
|
84
|
+
- **Critical analysis of past agent thoughts and their accuracy**
|
|
85
|
+
- Recent tool effects and whether they matched expectations from agent thoughts
|
|
86
|
+
- **Any corrections needed to previous reasoning or strategy**
|
|
87
|
+
|
|
88
|
+
### The Rule of Element Interaction
|
|
89
|
+
|
|
90
|
+
**You MUST follow it for every element interaction.**
|
|
91
|
+
|
|
92
|
+
When you target a UI element (for a `tap`, `focus_and_input_text`, `focus_and_clear_text`, etc.), you **MUST** provide a comprehensive `target` object containing every piece of information you can find about **that single element**.
|
|
93
|
+
|
|
94
|
+
- **1. `resource_id`**: Include this if it is present in the UI hierarchy.
|
|
95
|
+
- **2. `resource_id_index`**: If there are multiple elements with the same `resource_id`, provide the zero-based index of the specific one you are targeting.
|
|
96
|
+
- **3. `coordinates`**: Include the full bounds (`x`, `y`, `width`, `height`) if they are available.
|
|
97
|
+
- **4. `text`**: Include the _current text_ content of the element (e.g., placeholder text for an input).
|
|
98
|
+
- **5. `text_index`**: If there are multiple elements with the same `text`, provide the zero-based index of the specific one you are targeting.
|
|
99
|
+
|
|
100
|
+
**CRITICAL: The index must correspond to its identifier.** `resource_id_index` is only used when targeting by `resource_id`. `text_index` is only used when targeting by `text`. This ensures the fallback logic targets the correct element.
|
|
101
|
+
|
|
102
|
+
**This is NOT optional.** Providing every locator you have is the foundation of the system's reliability. It allows next steps to use a fallback mechanism: if the ID fails, it tries the coordinates, etc. Failing to provide this complete context will lead to action failures.
|
|
103
|
+
|
|
104
|
+
### The Rule of Unpredictable Actions
|
|
105
|
+
|
|
106
|
+
Certain actions have outcomes that can significantly and sometimes unpredictably change the UI. These include:
|
|
107
|
+
|
|
108
|
+
- `back`
|
|
109
|
+
- `launch_app`
|
|
110
|
+
- `stop_app`
|
|
111
|
+
- `open_link`
|
|
112
|
+
- `tap` on an element that is clearly for navigation (e.g., a "Back" button, a menu item, a link to another screen).
|
|
113
|
+
|
|
114
|
+
**CRITICAL RULE: If your decision includes one of these unpredictable actions, it MUST be the only action in your `Structured Decisions` for this turn. Otherwise, provide multiple decisions in your `Structured Decisions`, in the right order, to group actions together.**
|
|
115
|
+
|
|
116
|
+
This is not optional. Failing to isolate these actions will cause the system to act on an outdated understanding of the screen, leading to catastrophic errors. For example, after a `back` command, you MUST wait to see the new screen before deciding what to tap next.
|
|
117
|
+
|
|
118
|
+
### Outputting Your Decisions
|
|
119
|
+
|
|
120
|
+
If you decide to act, output a **valid JSON stringified structured set of instructions** for the Executor.
|
|
121
|
+
|
|
122
|
+
- These must be **concrete low-level actions**.
|
|
123
|
+
- The executor has the following available tools: {{ executor_tools_list }}.
|
|
124
|
+
- Your goal is to achieve subgoals **fast** - so you must put as many actions as possible in your instructions to complete all achievable subgoals (based on your observations) in one go.
|
|
125
|
+
- If you refer to a UI element or coordinates, specify it clearly (e.g., `resource-id: com.whatsapp:id/search`, `resource-id-index: 0`, `text: "Alice"`, `text-index: 0`, `x: 100, y: 200, width: 100, height: 100`).
|
|
126
|
+
- **The structure is up to you**, but it must be valid **JSON stringified output**. You will accompany this output with a **natural-language summary** of your reasoning and approach in your agent thought.
|
|
127
|
+
- **Always use a single `focus_and_input_text` action** to type in a field. This tool handles focusing the element, placing the cursor correctly and typing the text. If the tool feedback indicates verification is needed or shows None/empty content, perform verification before proceeding.
|
|
128
|
+
- **Only reference UI element IDs or visible texts that are explicitly present in the provided UI hierarchy or screenshot. Do not invent, infer, or guess any IDs or texts that are not directly observed**.
|
|
129
|
+
- **For text clearing**: When you need to completely clear text from an input field, always call the `focus_and_clear_text` tool with the correct resource_id. This tool automatically focuses the element, and ensures the field is emptied. If you notice this tool fails to clear the text, try to long press the input, select all, and call `erase_one_char`.
|
|
130
|
+
|
|
131
|
+
### Output
|
|
132
|
+
|
|
133
|
+
- **complete_subgoals_by_ids** _(optional)_:
|
|
134
|
+
A list of subgoal IDs that should be marked as completed.
|
|
135
|
+
|
|
136
|
+
- **Structured Decisions** _(optional)_:
|
|
137
|
+
A **valid stringified JSON** describing what should be executed **right now** to advance through the subgoals as much as possible.
|
|
138
|
+
|
|
139
|
+
- **Decisions Reason** _(2-4 sentences)_:
|
|
140
|
+
Start by analyzing previous agent thoughts. Then explain your current decision. Explicitly mention if correcting errors or changing strategy. Include checkpoints for indefinite actions (e.g., "Swiping up - last seen recipe was X").
|
|
141
|
+
|
|
142
|
+
- **Goals Completion Reason**: Explain why marking subgoals complete based on observed evidence, or state "None".
|
|
143
|
+
|
|
144
|
+
- **Screen Analysis Prompt** _(optional)_: A specific question for visual analysis (e.g., "Is there a search icon visible?"). Leave empty if not needed.
|
|
145
|
+
|
|
146
|
+
**Important Decision Rules:**
|
|
147
|
+
|
|
148
|
+
1. **Goal Completion + Execution Decisions**: You CAN provide both `complete_subgoals_by_ids` AND `Structured Decisions` in the same turn. This is the PREFERRED approach when:
|
|
149
|
+
|
|
150
|
+
- Agent thoughts show a previous action has ALREADY succeeded → Complete that subgoal
|
|
151
|
+
- The current screen requires new actions → Provide structured decisions
|
|
152
|
+
- **CRITICAL**: Only complete goals based on OBSERVED evidence from agent thoughts. NEVER complete goals "in advance" assuming an action will succeed.
|
|
153
|
+
|
|
154
|
+
2. **Screen Analysis + Execution Decisions ARE MUTUALLY EXCLUSIVE**: If you provide both `screen_analysis_prompt` AND `Structured Decisions`, the execution decisions will take priority and screen analysis will be ignored. This should NEVER happen. Use screen analysis only when you need visual insights for the NEXT turn, not the current one.
|
|
155
|
+
|
|
156
|
+
3. **Maximum Decisions Per Turn**: You can make up to 2 types of decisions simultaneously (never all 3):
|
|
157
|
+
- Complete examined subgoals (based on agent thoughts showing completion) + Execute actions on the current screen
|
|
158
|
+
- OR Complete examined subgoals + Request screen analysis (only when no execution decisions are needed)
|
|
159
|
+
- **Note:** Screen analysis and execution decisions cannot coexist—execution always takes priority if both are provided.
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
### Example 1
|
|
164
|
+
|
|
165
|
+
#### Current Subgoal:
|
|
166
|
+
|
|
167
|
+
> "Open WhatsApp"
|
|
168
|
+
|
|
169
|
+
#### Structured Decisions:
|
|
170
|
+
|
|
171
|
+
```text
|
|
172
|
+
"{\"action\": \"launch_app\", \"app_name\": \"WhatsApp\"}"
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
#### Decisions Reason:
|
|
176
|
+
|
|
177
|
+
> I need to launch the WhatsApp app to achieve the current subgoal. The `launch_app` tool is the most reliable method for opening applications.
|
|
178
|
+
|
|
179
|
+
#### Goals Completion Reason:
|
|
180
|
+
|
|
181
|
+
> None
|
|
182
|
+
|
|
183
|
+
### Example 2: Execution Decisions + Goal Completion
|
|
184
|
+
|
|
185
|
+
#### Current Subgoal:
|
|
186
|
+
|
|
187
|
+
> "Send 'Hello!' to Alice on WhatsApp"
|
|
188
|
+
|
|
189
|
+
#### Context:
|
|
190
|
+
|
|
191
|
+
- **Agent thoughts history shows**: Previous turn executed `focus_and_input_text` to type "Hello!" in the message field. Executor feedback confirms the text was successfully entered.
|
|
192
|
+
- **Current UI state**: The UI hierarchy shows the message "Hello!" is in the input field, and a send button with resource_id `com.whatsapp:id/send` is present.
|
|
193
|
+
|
|
194
|
+
#### Complete Subgoals By IDs:
|
|
195
|
+
|
|
196
|
+
```text
|
|
197
|
+
["subgoal-4-type-message"]
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
#### Structured Decisions:
|
|
201
|
+
|
|
202
|
+
```text
|
|
203
|
+
"[{\"action\": \"tap\", \"target\": {\"resource_id\": \"com.whatsapp:id/send\", \"resource_id_index\": 0, \"coordinates\": {\"x\": 950, \"y\": 1800, \"width\": 100, \"height\": 100}}}]"
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
#### Decisions Reason:
|
|
207
|
+
|
|
208
|
+
> Analysis: Agent thoughts confirm the text "Hello!" was successfully entered in the previous turn (executor feedback showed successful input). The current UI shows the message in the field and the send button is visible. I am completing the typing subgoal based on OBSERVED evidence, and tapping send to proceed. Providing full target information following the element rule.
|
|
209
|
+
|
|
210
|
+
#### Goals Completion Reason:
|
|
211
|
+
|
|
212
|
+
> Completing "type-message" subgoal because agent thoughts show the Executor successfully entered "Hello!" in the previous turn, and the current UI hierarchy confirms the text is present in the message field.
|
|
213
|
+
|
|
214
|
+
#### Screen Analysis Prompt:
|
|
215
|
+
|
|
216
|
+
```text
|
|
217
|
+
None
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
**Why this makes sense:** We're completing a goal that ALREADY happened (typing) based on observed evidence from agent thoughts, while simultaneously executing the next action (sending). We're not anticipating the send will succeed—we're only completing what has been confirmed.
|
|
221
|
+
|
|
222
|
+
### Example 3: Screen Analysis + Goal Completion
|
|
223
|
+
|
|
224
|
+
#### Current Subgoal:
|
|
225
|
+
|
|
226
|
+
> "Verify the message was delivered to Alice"
|
|
227
|
+
|
|
228
|
+
#### Context:
|
|
229
|
+
|
|
230
|
+
- **Agent thoughts history shows**: Previous turn executed `tap` on the send button. Executor feedback confirms the tap was successful.
|
|
231
|
+
- **Current UI state**: The UI hierarchy shows we're still in the WhatsApp chat with Alice. The hierarchy contains text elements but doesn't clearly indicate delivery status.
|
|
232
|
+
- **Next step consideration**: We need visual confirmation of delivery checkmarks, which are not reliably exposed in the UI hierarchy.
|
|
233
|
+
|
|
234
|
+
#### Complete Subgoals By IDs:
|
|
235
|
+
|
|
236
|
+
```text
|
|
237
|
+
["subgoal-5-send-message"]
|
|
238
|
+
```
|
|
239
|
+
|
|
240
|
+
#### Structured Decisions:
|
|
241
|
+
|
|
242
|
+
```text
|
|
243
|
+
None
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
#### Decisions Reason:
|
|
247
|
+
|
|
248
|
+
> None
|
|
249
|
+
|
|
250
|
+
#### Goals Completion Reason:
|
|
251
|
+
|
|
252
|
+
> Completing "send-message" subgoal because agent thoughts show the send button tap was executed successfully in the previous turn, and we remain in the chat screen (not an error state).
|
|
253
|
+
|
|
254
|
+
#### Screen Analysis Prompt:
|
|
255
|
+
|
|
256
|
+
```text
|
|
257
|
+
Are there delivery checkmarks (single or double) visible next to the message "Hello!" in the chat? Describe their appearance.
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
**Why this makes sense:** We're completing the goal that ALREADY happened (sending the message) based on observed evidence from agent thoughts. We need screen analysis to verify delivery status for the next subgoal, but we have no execution decisions to make on the current screen. This respects the mutual exclusivity between execution decisions and screen analysis.
|
|
261
|
+
|
|
262
|
+
### Input
|
|
263
|
+
|
|
264
|
+
**Initial Goal:**
|
|
265
|
+
{{ initial_goal }}
|
|
266
|
+
|
|
267
|
+
**Subgoal Plan:**
|
|
268
|
+
{{ subgoal_plan }}
|
|
269
|
+
|
|
270
|
+
**Current Subgoal (what needs to be done right now):**
|
|
271
|
+
{{ current_subgoal }}
|
|
272
|
+
|
|
273
|
+
**Executor agent feedback on latest UI decisions:**
|
|
274
|
+
|
|
275
|
+
{{ executor_feedback }}
|
{minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/cortex/cortex.py
RENAMED
|
@@ -18,7 +18,6 @@ from minitap.mobile_use.context import MobileUseContext
|
|
|
18
18
|
from minitap.mobile_use.graph.state import State
|
|
19
19
|
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
|
|
20
20
|
from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
|
|
21
|
-
from minitap.mobile_use.utils.conversations import get_screenshot_message_for_llm
|
|
22
21
|
from minitap.mobile_use.utils.decorators import wrap_with_callbacks
|
|
23
22
|
from minitap.mobile_use.utils.logger import get_logger
|
|
24
23
|
|
|
@@ -62,10 +61,6 @@ class CortexNode:
|
|
|
62
61
|
for thought in state.agents_thoughts:
|
|
63
62
|
messages.append(AIMessage(content=thought))
|
|
64
63
|
|
|
65
|
-
if state.latest_screenshot_base64:
|
|
66
|
-
messages.append(get_screenshot_message_for_llm(state.latest_screenshot_base64))
|
|
67
|
-
logger.info("Added screenshot to context")
|
|
68
|
-
|
|
69
64
|
if state.latest_ui_hierarchy:
|
|
70
65
|
ui_hierarchy_dict: list[dict] = state.latest_ui_hierarchy
|
|
71
66
|
ui_hierarchy_str = json.dumps(ui_hierarchy_dict, indent=2, ensure_ascii=False)
|
|
@@ -86,27 +81,47 @@ class CortexNode:
|
|
|
86
81
|
),
|
|
87
82
|
) # type: ignore
|
|
88
83
|
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
94
|
-
|
|
95
|
-
|
|
84
|
+
EMPTY_STRING_TOKENS = ["{}", "[]", "null", "", "None"]
|
|
85
|
+
|
|
86
|
+
if response.decisions in EMPTY_STRING_TOKENS:
|
|
87
|
+
response.decisions = None
|
|
88
|
+
if response.goals_completion_reason in EMPTY_STRING_TOKENS:
|
|
89
|
+
response.goals_completion_reason = None
|
|
90
|
+
if response.screen_analysis_prompt in EMPTY_STRING_TOKENS:
|
|
91
|
+
response.screen_analysis_prompt = None
|
|
92
|
+
|
|
93
|
+
# Enforce mutual exclusivity: screen_analysis_prompt and decisions cannot coexist
|
|
94
|
+
# If both are provided, prioritize decisions and discard screen_analysis_prompt
|
|
95
|
+
if response.decisions is not None and response.screen_analysis_prompt is not None:
|
|
96
|
+
logger.warning(
|
|
97
|
+
"Both 'decisions' and 'screen_analysis_prompt' were provided. "
|
|
98
|
+
"Prioritizing execution decisions and discarding screen analysis request."
|
|
99
|
+
)
|
|
100
|
+
response.screen_analysis_prompt = None
|
|
101
|
+
|
|
102
|
+
thought_parts = []
|
|
103
|
+
if response.decisions_reason:
|
|
104
|
+
thought_parts.append(f"Decisions reason: {response.decisions_reason}")
|
|
105
|
+
if response.goals_completion_reason:
|
|
106
|
+
thought_parts.append(f"Goals completion reason: {response.goals_completion_reason}")
|
|
107
|
+
if response.screen_analysis_prompt:
|
|
108
|
+
thought_parts.append(f"Screen analysis query: {response.screen_analysis_prompt}")
|
|
109
|
+
|
|
110
|
+
agent_thought = "\n\n".join(thought_parts)
|
|
96
111
|
|
|
97
112
|
return await state.asanitize_update(
|
|
98
113
|
ctx=self.ctx,
|
|
99
114
|
update={
|
|
100
|
-
"agents_thoughts": [
|
|
101
|
-
"structured_decisions": response.decisions
|
|
102
|
-
"complete_subgoals_by_ids": response.complete_subgoals_by_ids
|
|
103
|
-
"
|
|
115
|
+
"agents_thoughts": [agent_thought],
|
|
116
|
+
"structured_decisions": response.decisions,
|
|
117
|
+
"complete_subgoals_by_ids": response.complete_subgoals_by_ids,
|
|
118
|
+
"screen_analysis_prompt": response.screen_analysis_prompt,
|
|
104
119
|
"latest_ui_hierarchy": None,
|
|
105
120
|
"focused_app_info": None,
|
|
106
121
|
"device_date": None,
|
|
107
122
|
# Executor related fields
|
|
108
123
|
EXECUTOR_MESSAGES_KEY: [RemoveMessage(id=REMOVE_ALL_MESSAGES)],
|
|
109
|
-
"cortex_last_thought":
|
|
124
|
+
"cortex_last_thought": agent_thought,
|
|
110
125
|
},
|
|
111
126
|
agent="cortex",
|
|
112
127
|
)
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
from pydantic import BaseModel, Field
|
|
2
|
+
|
|
3
|
+
|
|
4
|
+
class CortexOutput(BaseModel):
|
|
5
|
+
decisions: str | None = Field(
|
|
6
|
+
default=None, description="The decisions to be made. A stringified JSON object"
|
|
7
|
+
)
|
|
8
|
+
decisions_reason: str | None = Field(default=None, description="The reason for the decisions")
|
|
9
|
+
goals_completion_reason: str | None = Field(
|
|
10
|
+
default=None,
|
|
11
|
+
description="The reason for the goals completion, if there are any goals to be completed.",
|
|
12
|
+
)
|
|
13
|
+
complete_subgoals_by_ids: list[str] = Field(
|
|
14
|
+
default_factory=list, description="List of subgoal IDs to complete"
|
|
15
|
+
)
|
|
16
|
+
screen_analysis_prompt: str | None = Field(
|
|
17
|
+
default=None,
|
|
18
|
+
description=(
|
|
19
|
+
"Optional prompt for the screen_analyzer agent. "
|
|
20
|
+
"Set this if you need visual analysis of the current screen. "
|
|
21
|
+
"The screen_analyzer will take a screenshot and answer your specific question."
|
|
22
|
+
),
|
|
23
|
+
)
|
{minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/executor.md
RENAMED
|
@@ -50,7 +50,7 @@ Call the `tap_on_element` tool with:
|
|
|
50
50
|
|
|
51
51
|
#### 📝 Text Input Best Practice
|
|
52
52
|
|
|
53
|
-
When using the `
|
|
53
|
+
When using the `focus_and_input_text` tool:
|
|
54
54
|
|
|
55
55
|
- **Provide all available information** in the target object to identify text input element
|
|
56
56
|
- `resource_id`: The resource ID of the text input element (when available)
|
|
@@ -69,11 +69,11 @@ When using the `input_text` tool:
|
|
|
69
69
|
|
|
70
70
|
#### 🔄 Text Clearing Best Practice
|
|
71
71
|
|
|
72
|
-
When you need to completely clear text from an input field, always use the
|
|
72
|
+
When you need to completely clear text from an input field, always use the focus_and_clear_text tool with the correct resource_id.
|
|
73
73
|
|
|
74
74
|
This tool automatically takes care of focusing the element (if needed), and ensuring the field is fully emptied.
|
|
75
75
|
|
|
76
|
-
Only and if only the
|
|
76
|
+
If and only if the focus_and_clear_text tool fails to clear the text, try to long press the input, select all, and call erase_one_char.
|
|
77
77
|
|
|
78
78
|
#### 🔁 Final Notes
|
|
79
79
|
|
{minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/executor/executor.py
RENAMED
|
@@ -8,7 +8,7 @@ from langchain_google_vertexai.chat_models import ChatVertexAI
|
|
|
8
8
|
from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
|
|
9
9
|
from minitap.mobile_use.context import MobileUseContext
|
|
10
10
|
from minitap.mobile_use.graph.state import State
|
|
11
|
-
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
|
|
11
|
+
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
|
|
12
12
|
from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, get_tools_from_wrappers
|
|
13
13
|
from minitap.mobile_use.utils.decorators import wrap_with_callbacks
|
|
14
14
|
from minitap.mobile_use.utils.logger import get_logger
|
|
@@ -53,6 +53,7 @@ class ExecutorNode:
|
|
|
53
53
|
]
|
|
54
54
|
|
|
55
55
|
llm = get_llm(ctx=self.ctx, name="executor")
|
|
56
|
+
llm_fallback = get_llm(ctx=self.ctx, name="executor", use_fallback=True)
|
|
56
57
|
llm_bind_tools_kwargs: dict = {
|
|
57
58
|
"tools": get_tools_from_wrappers(self.ctx, EXECUTOR_WRAPPERS_TOOLS),
|
|
58
59
|
}
|
|
@@ -62,8 +63,14 @@ class ExecutorNode:
|
|
|
62
63
|
llm_bind_tools_kwargs["parallel_tool_calls"] = True
|
|
63
64
|
|
|
64
65
|
llm = llm.bind_tools(**llm_bind_tools_kwargs)
|
|
65
|
-
|
|
66
|
-
|
|
66
|
+
llm_fallback = llm_fallback.bind_tools(**llm_bind_tools_kwargs)
|
|
67
|
+
response = await with_fallback(
|
|
68
|
+
main_call=lambda: invoke_llm_with_timeout_message(
|
|
69
|
+
llm.ainvoke(messages), agent_name="Executor"
|
|
70
|
+
),
|
|
71
|
+
fallback_call=lambda: invoke_llm_with_timeout_message(
|
|
72
|
+
llm_fallback.ainvoke(messages), agent_name="Executor (Fallback)"
|
|
73
|
+
),
|
|
67
74
|
)
|
|
68
75
|
return await state.asanitize_update(
|
|
69
76
|
ctx=self.ctx,
|
|
@@ -0,0 +1,33 @@
|
|
|
1
|
+
## Hopper
|
|
2
|
+
|
|
3
|
+
The user will send you a **batch of data**. Your role is to **dig through it** and extract the most relevant information needed to reach the user's goal.
|
|
4
|
+
|
|
5
|
+
- **Keep the extracted information exactly as it appears** in the input. Do not reformat, paraphrase, or alter it.
|
|
6
|
+
- The user may rely on this raw data for triggering actions, so fidelity matters.
|
|
7
|
+
|
|
8
|
+
---
|
|
9
|
+
|
|
10
|
+
### Output Fields
|
|
11
|
+
|
|
12
|
+
- **output**: the extracted information.
|
|
13
|
+
- **reason**: a short explanation of what you looked for and how you decided what to extract.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
### Rules
|
|
18
|
+
|
|
19
|
+
1. **Search thoroughly**: The data may contain hundreds of entries. Scan the entire input carefully before concluding.
|
|
20
|
+
|
|
21
|
+
2. **Match app names to package names**: When looking for an app package, look for package names where the app name (or a close variation) appears in the package identifier. Common patterns:
|
|
22
|
+
- App name in lowercase as part of the package
|
|
23
|
+
- Company/developer name followed by app name
|
|
24
|
+
- Brand name or abbreviated form of the app name
|
|
25
|
+
- Sometimes a codename or internal name related to the app
|
|
26
|
+
|
|
27
|
+
3. **Prefer the most direct match**: If multiple packages contain similar terms, prefer the one where the app name appears most directly in the package identifier.
|
|
28
|
+
|
|
29
|
+
4. **Consider variations**: App names may appear in different forms (abbreviated, translated, or with slight modifications) in package names.
|
|
30
|
+
|
|
31
|
+
5. If the relevant information is **not found**, return `None`.
|
|
32
|
+
|
|
33
|
+
6. If multiple plausible matches exist and you cannot determine which is correct, return `None` instead of guessing.
|
{minitap_mobile_use-2.5.2 → minitap_mobile_use-2.6.0}/minitap/mobile_use/agents/hopper/hopper.py
RENAMED
|
@@ -5,17 +5,15 @@ from langchain_core.messages import HumanMessage, SystemMessage
|
|
|
5
5
|
from pydantic import BaseModel, Field
|
|
6
6
|
|
|
7
7
|
from minitap.mobile_use.context import MobileUseContext
|
|
8
|
-
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
|
|
8
|
+
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
|
|
9
9
|
|
|
10
10
|
|
|
11
11
|
class HopperOutput(BaseModel):
|
|
12
|
-
step: str = Field(
|
|
13
|
-
description=(
|
|
14
|
-
"The step that has been done, must be a valid one following the "
|
|
15
|
-
"current steps and the current goal to achieve."
|
|
16
|
-
)
|
|
17
|
-
)
|
|
18
12
|
output: str = Field(description="The interesting data extracted from the input data.")
|
|
13
|
+
reason: str = Field(
|
|
14
|
+
description="A short explanation of what you looked for"
|
|
15
|
+
+ " and how you decided what to extract."
|
|
16
|
+
)
|
|
19
17
|
|
|
20
18
|
|
|
21
19
|
async def hopper(
|
|
@@ -32,12 +30,18 @@ async def hopper(
|
|
|
32
30
|
HumanMessage(content=f"{request}\nHere is the data you must dig:\n{data}"),
|
|
33
31
|
]
|
|
34
32
|
|
|
35
|
-
llm = get_llm(ctx=ctx, name="hopper", is_utils=True, temperature=0)
|
|
36
|
-
|
|
37
|
-
response: HopperOutput = await invoke_llm_with_timeout_message(
|
|
38
|
-
structured_llm.ainvoke(messages), agent_name="Hopper"
|
|
39
|
-
) # type: ignore
|
|
40
|
-
return HopperOutput(
|
|
41
|
-
step=response.step,
|
|
42
|
-
output=response.output,
|
|
33
|
+
llm = get_llm(ctx=ctx, name="hopper", is_utils=True, temperature=0).with_structured_output(
|
|
34
|
+
HopperOutput
|
|
43
35
|
)
|
|
36
|
+
llm_fallback = get_llm(
|
|
37
|
+
ctx=ctx, name="hopper", is_utils=True, use_fallback=True, temperature=0
|
|
38
|
+
).with_structured_output(HopperOutput)
|
|
39
|
+
response: HopperOutput = await with_fallback(
|
|
40
|
+
main_call=lambda: invoke_llm_with_timeout_message(
|
|
41
|
+
llm.ainvoke(messages), agent_name="Hopper"
|
|
42
|
+
),
|
|
43
|
+
fallback_call=lambda: invoke_llm_with_timeout_message(
|
|
44
|
+
llm_fallback.ainvoke(messages), agent_name="Hopper (Fallback)"
|
|
45
|
+
),
|
|
46
|
+
) # type: ignore
|
|
47
|
+
return response
|
|
@@ -15,7 +15,7 @@ from minitap.mobile_use.agents.planner.utils import (
|
|
|
15
15
|
)
|
|
16
16
|
from minitap.mobile_use.context import MobileUseContext
|
|
17
17
|
from minitap.mobile_use.graph.state import State
|
|
18
|
-
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
|
|
18
|
+
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
|
|
19
19
|
from minitap.mobile_use.utils.decorators import wrap_with_callbacks
|
|
20
20
|
from minitap.mobile_use.utils.logger import get_logger
|
|
21
21
|
|
|
@@ -74,10 +74,19 @@ class OrchestratorNode:
|
|
|
74
74
|
HumanMessage(content=human_message),
|
|
75
75
|
]
|
|
76
76
|
|
|
77
|
-
llm = get_llm(ctx=self.ctx, name="orchestrator", temperature=1)
|
|
78
|
-
|
|
79
|
-
|
|
80
|
-
|
|
77
|
+
llm = get_llm(ctx=self.ctx, name="orchestrator", temperature=1).with_structured_output(
|
|
78
|
+
OrchestratorOutput
|
|
79
|
+
)
|
|
80
|
+
llm_fallback = get_llm(
|
|
81
|
+
ctx=self.ctx, name="orchestrator", use_fallback=True, temperature=1
|
|
82
|
+
).with_structured_output(OrchestratorOutput)
|
|
83
|
+
response: OrchestratorOutput = await with_fallback(
|
|
84
|
+
main_call=lambda: invoke_llm_with_timeout_message(
|
|
85
|
+
llm.ainvoke(messages), agent_name="Orchestrator"
|
|
86
|
+
),
|
|
87
|
+
fallback_call=lambda: invoke_llm_with_timeout_message(
|
|
88
|
+
llm_fallback.ainvoke(messages), agent_name="Orchestrator (Fallback)"
|
|
89
|
+
),
|
|
81
90
|
) # type: ignore
|
|
82
91
|
if response.needs_replaning:
|
|
83
92
|
thoughts = [response.reason]
|
|
@@ -8,7 +8,7 @@ from pydantic import BaseModel
|
|
|
8
8
|
from minitap.mobile_use.config import OutputConfig
|
|
9
9
|
from minitap.mobile_use.context import MobileUseContext
|
|
10
10
|
from minitap.mobile_use.graph.state import State
|
|
11
|
-
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message
|
|
11
|
+
from minitap.mobile_use.services.llm import get_llm, invoke_llm_with_timeout_message, with_fallback
|
|
12
12
|
from minitap.mobile_use.utils.conversations import is_ai_message
|
|
13
13
|
from minitap.mobile_use.utils.logger import get_logger
|
|
14
14
|
|
|
@@ -46,7 +46,11 @@ async def outputter(
|
|
|
46
46
|
messages.append(HumanMessage(content=output_config.output_description))
|
|
47
47
|
|
|
48
48
|
llm = get_llm(ctx=ctx, name="outputter", is_utils=True, temperature=1)
|
|
49
|
+
llm_fallback = get_llm(
|
|
50
|
+
ctx=ctx, name="outputter", is_utils=True, use_fallback=True, temperature=1
|
|
51
|
+
)
|
|
49
52
|
structured_llm = llm
|
|
53
|
+
structured_llm_fallback = llm_fallback
|
|
50
54
|
|
|
51
55
|
if output_config.structured_output:
|
|
52
56
|
schema: dict | type[BaseModel] | None = None
|
|
@@ -61,9 +65,15 @@ async def outputter(
|
|
|
61
65
|
|
|
62
66
|
if schema is not None:
|
|
63
67
|
structured_llm = llm.with_structured_output(schema)
|
|
68
|
+
structured_llm_fallback = llm_fallback.with_structured_output(schema)
|
|
64
69
|
|
|
65
|
-
response = await
|
|
66
|
-
|
|
70
|
+
response = await with_fallback(
|
|
71
|
+
main_call=lambda: invoke_llm_with_timeout_message(
|
|
72
|
+
structured_llm.ainvoke(messages), agent_name="Outputter"
|
|
73
|
+
),
|
|
74
|
+
fallback_call=lambda: invoke_llm_with_timeout_message(
|
|
75
|
+
structured_llm_fallback.ainvoke(messages), agent_name="Outputter (Fallback)"
|
|
76
|
+
),
|
|
67
77
|
) # type: ignore
|
|
68
78
|
if isinstance(response, BaseModel):
|
|
69
79
|
if output_config.output_description and hasattr(response, "content"):
|