PyPI - minitap-mobile-use - Versions diffs - 2.0.1__tar.gz → 2.2.0__tar.gz - Mend

minitap-mobile-use 2.0.1__tar.gz → 2.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of minitap-mobile-use might be problematic.

Files changed (97)

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: minitap-mobile-use
-Version: 2.0.1
+Version: 2.2.0
 Summary: AI-powered multi-agent system that automates real Android and iOS devices through low-level control using LangGraph.
 Author: Pierre-Louis Favreau, Jean-Pierre Lo, Nicolas Dehandschoewercker
 License: MIT License
@@ -24,11 +24,11 @@ License: MIT License
          LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
          OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
          SOFTWARE.
-Requires-Dist: langgraph==0.5.0
+Requires-Dist: langgraph>=0.6.6
 Requires-Dist: adbutils==2.9.3
-Requires-Dist: langchain-google-genai==2.1.5
-Requires-Dist: langchain==0.3.26
-Requires-Dist: langchain-core==0.3.66
+Requires-Dist: langchain-google-genai>=2.1.10
+Requires-Dist: langchain>=0.3.27
+Requires-Dist: langchain-core>=0.3.75
 Requires-Dist: jinja2==3.1.6
 Requires-Dist: python-dotenv==1.1.1
 Requires-Dist: pydantic-settings==2.10.1
@@ -42,6 +42,7 @@ Requires-Dist: fastapi==0.111.0
 Requires-Dist: uvicorn[standard]==0.30.1
 Requires-Dist: colorama>=0.4.6
 Requires-Dist: psutil>=5.9.0
+Requires-Dist: langchain-google-vertexai>=2.0.28
 Requires-Dist: ruff==0.5.3 ; extra == 'dev'
 Requires-Dist: pytest==8.4.1 ; extra == 'dev'
 Requires-Dist: pytest-cov==5.0.0 ; extra == 'dev'
@@ -69,6 +70,10 @@ Description-Content-Type: text/markdown
     <a href="https://x.com/minitap_ai?t=iRWtI497UhRGLeCKYQekig&s=09"><b>Twitter / X</b></a>
 </p>
+[![PyPI version](https://img.shields.io/pypi/v/minitap-mobile-use.svg?color=blue)](https://pypi.org/project/minitap-mobile-use/)
+[![Python Version](https://img.shields.io/badge/python-3.12%2B-blue)](https://www.python.org/downloads/)
+[![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/minitap-ai/mobile-use/blob/main/LICENSE)
 </div>
 Mobile-use is a powerful, open-source AI agent that controls your Android or IOS device using natural language. It understands your commands and interacts with the UI to perform tasks, from sending messages to navigating complex apps.
@@ -107,11 +112,26 @@ Ready to automate your mobile experience? Follow these steps to get mobile-use u
 2.  **(Optional) Customize LLM Configuration:**
     To use different models or providers, create your own LLM configuration file.
     ```bash
     cp llm-config.override.template.jsonc llm-config.override.jsonc
     ```
     Then, edit `llm-config.override.jsonc` to fit your needs.
+    You can also use local LLMs or any other openai-api compatible providers :
+    1. Set `OPENAI_BASE_URL` and `OPENAI_API_KEY` in your `.env`
+    2. In your `llm-config.override.jsonc`, set `openai` as the provider for the agent nodes you want, and choose a model supported by your provider.
+    > [!NOTE]
+    > If you want to use Google Vertex AI, you must either:
+    >
+    > - Have credentials configured for your environment (gcloud, workload identity, etc…)
+    > - Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable
+    >
+    > More information: - [Credential types](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) - [google.auth API reference](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth)
 ### Quick Launch (Docker)
 > [!NOTE]
@@ -257,6 +277,16 @@ python ./src/mobile_use/main.py \
 > [!NOTE]
 > If you haven't configured a specific model, mobile-use will prompt you to choose one from the available options.
+## 🔎 Agentic System Overview
+<div align="center">
+![Graph Visualization](doc/graph.png)
+_This diagram is automatically updated from the codebase. This is our current agentic system architecture._
+</div>
 ## ❤️ Contributing
 We love contributions! Whether you're fixing a bug, adding a feature, or improving documentation, your help is welcome. Please read our **[Contributing Guidelines](CONTRIBUTING.md)** to get started.

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/README.md RENAMED

@@ -16,6 +16,10 @@
     <a href="https://x.com/minitap_ai?t=iRWtI497UhRGLeCKYQekig&s=09"><b>Twitter / X</b></a>
 </p>
+[![PyPI version](https://img.shields.io/pypi/v/minitap-mobile-use.svg?color=blue)](https://pypi.org/project/minitap-mobile-use/)
+[![Python Version](https://img.shields.io/badge/python-3.12%2B-blue)](https://www.python.org/downloads/)
+[![License](https://img.shields.io/badge/license-MIT-blue)](https://github.com/minitap-ai/mobile-use/blob/main/LICENSE)
 </div>
 Mobile-use is a powerful, open-source AI agent that controls your Android or IOS device using natural language. It understands your commands and interacts with the UI to perform tasks, from sending messages to navigating complex apps.
@@ -54,11 +58,26 @@ Ready to automate your mobile experience? Follow these steps to get mobile-use u
 2.  **(Optional) Customize LLM Configuration:**
     To use different models or providers, create your own LLM configuration file.
     ```bash
     cp llm-config.override.template.jsonc llm-config.override.jsonc
     ```
     Then, edit `llm-config.override.jsonc` to fit your needs.
+    You can also use local LLMs or any other openai-api compatible providers :
+    1. Set `OPENAI_BASE_URL` and `OPENAI_API_KEY` in your `.env`
+    2. In your `llm-config.override.jsonc`, set `openai` as the provider for the agent nodes you want, and choose a model supported by your provider.
+    > [!NOTE]
+    > If you want to use Google Vertex AI, you must either:
+    >
+    > - Have credentials configured for your environment (gcloud, workload identity, etc…)
+    > - Store the path to a service account JSON file as the GOOGLE_APPLICATION_CREDENTIALS environment variable
+    >
+    > More information: - [Credential types](https://cloud.google.com/docs/authentication/application-default-credentials#GAC) - [google.auth API reference](https://googleapis.dev/python/google-auth/latest/reference/google.auth.html#module-google.auth)
 ### Quick Launch (Docker)
 > [!NOTE]
@@ -204,6 +223,16 @@ python ./src/mobile_use/main.py \
 > [!NOTE]
 > If you haven't configured a specific model, mobile-use will prompt you to choose one from the available options.
+## 🔎 Agentic System Overview
+<div align="center">
+![Graph Visualization](doc/graph.png)
+_This diagram is automatically updated from the codebase. This is our current agentic system architecture._
+</div>
 ## ❤️ Contributing
 We love contributions! Whether you're fixing a bug, adding a feature, or improving documentation, your help is welcome. Please read our **[Contributing Guidelines](CONTRIBUTING.md)** to get started.

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/cortex/cortex.md RENAMED

@@ -35,17 +35,19 @@ Focus on the **current PENDING subgoal and the next subgoals not yet started**.
 - Past agent thoughts
 - Recent tool effects
-2.2. Otherwise, output a **stringified structured set of instructions** that an **Executor agent** can perform on a real mobile device:
+  2.2. Otherwise, output a **stringified structured set of instructions** that an **Executor agent** can perform on a real mobile device:
-- These must be **concrete low-level actions**: back, tap, swipe, launch app, find packages, close app, input text, paste, erase text, copy, etc.
-- Your goal is to achieve subgoals **fast** - so you must put as much actions as possible in your instructions to complete all achievable subgoals (based on your observations) in one go.
-- When you need to open an app, use the `find_packages` low-level action to try and get its name.
+- These must be **concrete low-level actions**.
+- The executor has the following available tools: {{ executor_tools_list }}.
+- Your goal is to achieve subgoals **fast** - so you must put as much actions as possible in your instructions to complete all achievable subgoals (based on your observations) in one go.
+- To open URLs/links directly, use the `open_link` tool - it will automatically handle opening in the appropriate browser. It also handles deep links.
+- When you need to open an app, use the `find_packages` low-level action to try and get its name. Then, simply use the `launch_app` low-level action to launch it.
 - If you refer to a UI element or coordinates, specify it clearly (e.g., `resource-id: com.whatsapp:id/search`, `text: "Alice"`, `x: 100, y: 200`).
 - **The structure is up to you**, but it must be valid **JSON stringified output**. You will accompany this output with a **natural-language summary** of your reasoning and approach in your agent thought.
 - **Never use a sequence of `tap` + `input_text` to type into a field. Always use a single `input_text` action** with the correct `resource_id` (this already ensures the element is focused and the cursor is moved to the end).
 - When you want to launch/stop an app, prefer using its package name.
 - **Only reference UI element IDs or visible texts that are explicitly present in the provided UI hierarchy or screenshot. Do not invent, infer, or guess any IDs or texts that are not directly observed**.
-- **For text clearing**: When you need to completely clear text from an input field, always use **LONG PRESS** first to select the text field, then erase. Do NOT use tap + erase as this only clears from cursor position.
+- **For text clearing**: When you need to completely clear text from an input field, always call the `clear_text` tool with the correct resource_id. This tool automatically focuses the element, and ensures the field is emptied. If you notice this tool fails to clear the text, try to long press the input, select all, and call `erase_one_char`.
 ### Output

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/cortex/cortex.py RENAMED

@@ -10,12 +10,14 @@ from langchain_core.messages import (
     ToolMessage,
 )
 from langgraph.graph.message import REMOVE_ALL_MESSAGES
 from minitap.mobile_use.agents.cortex.types import CortexOutput
 from minitap.mobile_use.agents.planner.utils import get_current_subgoal
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
 from minitap.mobile_use.services.llm import get_llm, with_fallback
+from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.conversations import get_screenshot_message_for_llm
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -44,6 +46,7 @@ class CortexNode:
             current_subgoal=get_current_subgoal(state.subgoal_plan),
             agents_thoughts=state.agents_thoughts,
             executor_feedback=executor_feedback,
+            executor_tools_list=format_tools_list(ctx=self.ctx, wrappers=EXECUTOR_WRAPPERS_TOOLS),
         )
         messages = [
             SystemMessage(content=system_message),
@@ -83,7 +86,7 @@ class CortexNode:
         is_subgoal_completed = (
             response.complete_subgoals_by_ids is not None
             and len(response.complete_subgoals_by_ids) > 0
-            and len(response.decisions) == 0
+            and (len(response.decisions) == 0 or response.decisions in ["{}", "[]", "null", ""])
         )
         if not is_subgoal_completed:
             response.complete_subgoals_by_ids = []
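
Review note on the hunk above: the new condition compares `response.decisions` against four exact strings, so a whitespace variant such as `{ }` would still count as a non-empty decision. A minimal sketch of a parse-based check follows. `is_empty_decisions` is a hypothetical helper, not something this package defines, and treating unparseable text as non-empty is an assumption.

```python
import json


def is_empty_decisions(decisions: str) -> bool:
    """Hypothetical helper: True when the stringified JSON carries no decisions."""
    text = decisions.strip()
    if not text:
        # Blank or whitespace-only output means there is nothing to execute.
        return True
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: unparseable text is kept as "non-empty" so it is not silently dropped.
        return False
    # Empty object, empty array, null, or an empty JSON string all count as empty.
    return parsed is None or parsed == {} or parsed == [] or parsed == ""
```

With a helper like this, the last clause of the condition could read `is_empty_decisions(response.decisions)` instead of listing literal strings.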

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/cortex/types.py RENAMED

@@ -1,11 +1,9 @@
-from typing import Optional
 from pydantic import BaseModel, Field
 class CortexOutput(BaseModel):
     decisions: str = Field(..., description="The decisions to be made. A stringified JSON object")
     agent_thought: str = Field(..., description="The agent's thought")
-    complete_subgoals_by_ids: Optional[list[str]] = Field(
+    complete_subgoals_by_ids: list[str] | None = Field(
         [], description="List of subgoal IDs to complete"
     )

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/executor/executor.md RENAMED

@@ -64,14 +64,13 @@ When using the `input_text` tool:
 #### 🔄 Text Clearing Best Practice
-When you need to completely clear text from an input field, **DO NOT** simply use `erase_text` alone, as it only erases from the cursor position, backward. Instead:
+When you need to completely clear text from an input field, always use the clear_text tool with the correct resource_id.
-1. **Use `long_press_on` first** to select the text field and bring up selection options
-2. **Then use `erase_text`** to clear the selected content
+This tool automatically takes care of focusing the element (if needed), and ensuring the field is fully emptied.
-This approach ensures the **entire text content** is removed, not just the portion before the cursor position. The long press will typically select all text in the field, making the subsequent erase operation more effective.
+Only and if only the clear_text tool fails to clear the text, try to long press the input, select all, and call erase_one_char.
-### 🔁 Final Notes
+#### 🔁 Final Notes
 - **You do not need to reason or decide strategy** — that's the Cortex's job.
 - You simply interpret and execute — like hands following the brain.

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/executor/executor.py RENAMED

@@ -3,6 +3,8 @@ from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
 from langchain_google_genai import ChatGoogleGenerativeAI
+from langchain_google_vertexai.chat_models import ChatVertexAI
 from minitap.mobile_use.constants import EXECUTOR_MESSAGES_KEY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
@@ -56,7 +58,7 @@ class ExecutorNode:
         }
         # ChatGoogleGenerativeAI does not support the "parallel_tool_calls" keyword
-        if not isinstance(llm, ChatGoogleGenerativeAI):
+        if not isinstance(llm, ChatGoogleGenerativeAI | ChatVertexAI):
             llm_bind_tools_kwargs["parallel_tool_calls"] = True
         llm = llm.bind_tools(**llm_bind_tools_kwargs)

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/executor/tool_node.py RENAMED

@@ -1,8 +1,8 @@
 import asyncio
-from typing import Any, Optional
+from typing import Any
 from langgraph.types import Command
 from pydantic import BaseModel
-from typing_extensions import override
+from typing import override
 from langchain_core.runnables import RunnableConfig
 from langgraph.store.base import BaseStore
 from langchain_core.messages import AnyMessage, ToolCall, ToolMessage
@@ -21,7 +21,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store: Optional[BaseStore],
+        store: BaseStore | None,
     ):
         return await self.__func(is_async=True, input=input, config=config, store=store)
@@ -31,7 +31,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store: Optional[BaseStore],
+        store: BaseStore | None,
     ) -> Any:
         loop = asyncio.get_event_loop()
         return loop.run_until_complete(
@@ -44,7 +44,7 @@ class ExecutorToolNode(ToolNode):
         input: list[AnyMessage] | dict[str, Any] | BaseModel,
         config: RunnableConfig,
         *,
-        store: Optional[BaseStore],
+        store: BaseStore | None,
     ) -> Any:
         tool_calls, input_type = self._parse_input(input, store)
         outputs: list[Command | ToolMessage] = []
@@ -74,7 +74,7 @@ class ExecutorToolNode(ToolNode):
         self,
         call: ToolCall,
         output: ToolMessage | Command,
-    ) -> Optional[bool]:
+    ) -> bool | None:
         if isinstance(output, ToolMessage):
             return output.status == "error"
         if isinstance(output, Command):

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/outputter/outputter.py RENAMED

@@ -1,6 +1,5 @@
 import json
 from pathlib import Path
-from typing import Dict, Type, Union
 from jinja2 import Template
 from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
@@ -49,7 +48,7 @@ async def outputter(
     structured_llm = llm
     if output_config.structured_output:
-        schema: Union[Dict, Type[BaseModel], None] = None
+        schema: dict | type[BaseModel] | None = None
         so = output_config.structured_output
         if isinstance(so, dict):

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/planner/planner.md RENAMED

@@ -12,7 +12,9 @@ You work like an agile tech lead: defining the key milestones without locking in
    - Subgoals should reflect real interactions with mobile UIs (e.g. "Open app", "Tap search bar", "Scroll to item", "Send message to Bob", etc).
    - Don't assume the full UI is visible yet. Plan based on how most mobile apps work, and keep flexibility.
    - List of agents thoughts is empty which is expected, since it is the first plan.
-   - Don't use precise UI actions when formulating subgoals like "copy", "paste", "tap", "swipe", ... unless explicitly asked in the initial goal.
+   - Avoid too granular UI actions based tasks (e.g. "tap", "swipe", "copy", "paste") unless explicitly required.
+   - The executor has the following available tools: {{ executor_tools_list }}.
+     When one of these tools offers a direct shortcut (e.g. `openLink` instead of manually launching a browser and typing a URL), prefer it over decomposed manual steps.
 2. **Replanning**
    If you're asked to **revise a previous plan**, you'll also receive:
@@ -47,12 +49,19 @@ If you're replaning and need to keep a previous subgoal, you **must keep the sam
 - Type the message "I’m running late" (ID: None)
 - Send the message (ID: None)
+#### **Initial Goal**: "Go on https://tesla.com, and tell me what is the first car being displayed"
+**Plan**:
+- Open the link https://tesla.com (ID: None)
+- Find the first car displayed on the home page (ID: None)
 #### **Replanning Example**
 **Original Plan**: same as above with IDs set
 **Agent Thoughts**:
-- Couldn’t find Alice in recent chats
+- Couldn't find Alice in recent chats
 - Search bar was present on top of the chat screen
 - Keyboard appeared after tapping search

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/planner/planner.py RENAMED

@@ -1,13 +1,15 @@
-from pathlib import Path
 import uuid
+from pathlib import Path
 from jinja2 import Template
 from langchain_core.messages import HumanMessage, SystemMessage
 from minitap.mobile_use.agents.planner.types import PlannerOutput, Subgoal, SubgoalStatus
 from minitap.mobile_use.agents.planner.utils import one_of_them_is_failure
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
 from minitap.mobile_use.services.llm import get_llm
+from minitap.mobile_use.tools.index import EXECUTOR_WRAPPERS_TOOLS, format_tools_list
 from minitap.mobile_use.utils.decorators import wrap_with_callbacks
 from minitap.mobile_use.utils.logger import get_logger
@@ -28,7 +30,10 @@ class PlannerNode:
         system_message = Template(
             Path(__file__).parent.joinpath("planner.md").read_text(encoding="utf-8")
-        ).render(platform=self.ctx.device.mobile_platform.value)
+        ).render(
+            platform=self.ctx.device.mobile_platform.value,
+            executor_tools_list=format_tools_list(ctx=self.ctx, wrappers=EXECUTOR_WRAPPERS_TOOLS),
+        )
         human_message = Template(
             Path(__file__).parent.joinpath("human.md").read_text(encoding="utf-8")
         ).render(

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/planner/types.py RENAMED

@@ -1,12 +1,11 @@
 from enum import Enum
-from typing import Optional
 from pydantic import BaseModel
-from typing_extensions import Annotated
+from typing import Annotated
 class PlannerSubgoalOutput(BaseModel):
-    id: Annotated[Optional[str], "If not provided, it will be generated"] = None
+    id: Annotated[str | None, "If not provided, it will be generated"] = None
     description: str
@@ -25,7 +24,7 @@ class Subgoal(BaseModel):
     id: Annotated[str, "Unique identifier of the subgoal"]
     description: Annotated[str, "Description of the subgoal"]
     completion_reason: Annotated[
-        Optional[str], "Reason why the subgoal was completed (failure or success)"
+        str | None, "Reason why the subgoal was completed (failure or success)"
     ] = None
     status: SubgoalStatus

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/agents/summarizer/summarizer.py RENAMED

@@ -3,6 +3,7 @@ from langchain_core.messages import (
     RemoveMessage,
     ToolMessage,
 )
 from minitap.mobile_use.constants import MAX_MESSAGES_IN_HISTORY
 from minitap.mobile_use.context import MobileUseContext
 from minitap.mobile_use.graph.state import State
@@ -22,7 +23,7 @@ class SummarizerNode:
         start_removal = False
         for msg in reversed(state.messages[:nb_removal_candidates]):
-            if isinstance(msg, (ToolMessage, HumanMessage)):
+            if isinstance(msg, ToolMessage | HumanMessage):
                 start_removal = True
             if start_removal and msg.id:
                 remove_messages.append(RemoveMessage(id=msg.id))

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/config.py RENAMED

@@ -1,9 +1,11 @@
 import json
 import os
 from pathlib import Path
-from typing import Annotated, Any, Literal, Optional, Union
+from typing import Annotated, Any, Literal
+import google.auth
 from dotenv import load_dotenv
+from google.auth.exceptions import DefaultCredentialsError
 from pydantic import BaseModel, Field, SecretStr, ValidationError, model_validator
 from pydantic_settings import BaseSettings
@@ -17,17 +19,17 @@ logger = get_logger(__name__)
 class Settings(BaseSettings):
-    OPENAI_API_KEY: Optional[SecretStr] = None
-    GOOGLE_API_KEY: Optional[SecretStr] = None
-    XAI_API_KEY: Optional[SecretStr] = None
-    OPEN_ROUTER_API_KEY: Optional[SecretStr] = None
+    OPENAI_API_KEY: SecretStr | None = None
+    GOOGLE_API_KEY: SecretStr | None = None
+    XAI_API_KEY: SecretStr | None = None
+    OPEN_ROUTER_API_KEY: SecretStr | None = None
-    OPENAI_BASE_URL: Optional[str] = None
+    OPENAI_BASE_URL: str | None = None
-    DEVICE_SCREEN_API_BASE_URL: Optional[str] = None
-    DEVICE_HARDWARE_BRIDGE_BASE_URL: Optional[str] = None
-    ADB_HOST: Optional[str] = None
-    ADB_PORT: Optional[int] = None
+    DEVICE_SCREEN_API_BASE_URL: str | None = None
+    DEVICE_HARDWARE_BRIDGE_BASE_URL: str | None = None
+    ADB_HOST: str | None = None
+    ADB_PORT: int | None = None
     model_config = {"env_file": ".env", "extra": "ignore"}
@@ -71,7 +73,7 @@ def prepare_output_files() -> tuple[str | None, str | None]:
     return validated_events_path, validated_results_path
-def record_events(output_path: Path | None, events: Union[list[str], BaseModel, Any]):
+def record_events(output_path: Path | None, events: list[str] | BaseModel | Any):
     if not output_path:
         return
@@ -88,7 +90,7 @@ def record_events(output_path: Path | None, events: Union[list[str], BaseModel,
 ### LLM Configuration
-LLMProvider = Literal["openai", "google", "openrouter", "xai"]
+LLMProvider = Literal["openai", "google", "openrouter", "xai", "vertexai"]
 LLMUtilsNode = Literal["outputter", "hopper"]
 AgentNode = Literal["planner", "orchestrator", "cortex", "executor"]
 AgentNodeWithFallback = Literal["cortex"]
@@ -98,6 +100,17 @@ DEFAULT_LLM_CONFIG_FILENAME = "llm-config.defaults.jsonc"
 OVERRIDE_LLM_CONFIG_FILENAME = "llm-config.override.jsonc"
+def validate_vertex_ai_credentials():
+    try:
+        _, project = google.auth.default()
+        if not project:
+            raise Exception("VertexAI requires a Google Cloud project to be set.")
+    except DefaultCredentialsError as e:
+        raise Exception(
+            f"VertexAI requires valid Google Application Default Credentials (ADC): {e}"
+        )
 class LLM(BaseModel):
     provider: LLMProvider
     model: str
@@ -110,6 +123,8 @@ class LLM(BaseModel):
             case "google":
                 if not settings.GOOGLE_API_KEY:
                     raise Exception(f"{name} requires GOOGLE_API_KEY in .env")
+            case "vertexai":
+                validate_vertex_ai_credentials()
             case "openrouter":
                 if not settings.OPEN_ROUTER_API_KEY:
                     raise Exception(f"{name} requires OPEN_ROUTER_API_KEY in .env")
@@ -170,7 +185,7 @@ def get_default_llm_config() -> LLMConfig:
     try:
         if not os.path.exists(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME):
             raise Exception("Default llm config not found")
-        with open(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME, "r") as f:
+        with open(ROOT_DIR / DEFAULT_LLM_CONFIG_FILENAME) as f:
             default_config_dict = load_jsonc(f)
         return LLMConfig.model_validate(default_config_dict["default"])
     except Exception as e:
@@ -211,7 +226,7 @@ def parse_llm_config() -> LLMConfig:
     override_config_dict = {}
     if os.path.exists(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME):
         logger.info("Loading custom llm config...")
-        with open(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME, "r") as f:
+        with open(ROOT_DIR / OVERRIDE_LLM_CONFIG_FILENAME) as f:
             override_config_dict = load_jsonc(f)
     else:
         logger.warning("Custom llm config not found, loading default config")
@@ -237,7 +252,7 @@ def initialize_llm_config() -> LLMConfig:
 class OutputConfig(BaseModel):
     structured_output: Annotated[
-        Optional[Union[type[BaseModel], dict]],
+        type[BaseModel] | dict | None,
         Field(
             default=None,
             description=(
@@ -247,7 +262,7 @@ class OutputConfig(BaseModel):
         ),
     ]
     output_description: Annotated[
-        Optional[str],
+        str | None,
         Field(
             default=None,
             description=(

{minitap_mobile_use-2.0.1 → minitap_mobile_use-2.2.0}/minitap/mobile_use/context.py RENAMED

@@ -6,12 +6,11 @@ Uses ContextVar to avoid prop drilling and maintain clean function signatures.
 from enum import Enum
 from pathlib import Path
-from typing import Optional
 from adbutils import AdbClient
 from openai import BaseModel
 from pydantic import ConfigDict
-from typing_extensions import Literal
+from typing import Literal
 from minitap.mobile_use.clients.device_hardware_client import DeviceHardwareClient
 from minitap.mobile_use.clients.screen_api_client import ScreenApiClient
@@ -56,8 +55,8 @@ class MobileUseContext(BaseModel):
     hw_bridge_client: DeviceHardwareClient
     screen_api_client: ScreenApiClient
     llm_config: LLMConfig
-    adb_client: Optional[AdbClient] = None
-    execution_setup: Optional[ExecutionSetup] = None
+    adb_client: AdbClient | None = None
+    execution_setup: ExecutionSetup | None = None
     def get_adb_client(self) -> AdbClient:
         if self.adb_client is None:
