PyPI - droidrun - Versions diffs - 0.3.10.dev4__tar.gz → 0.3.10.dev6__tar.gz - Mend

droidrun 0.3.10.dev4__tar.gz → 0.3.10.dev6__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (124)

{droidrun-0.3.10.dev4 → droidrun-0.3.10.dev6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: droidrun
-Version: 0.3.10.dev4
+Version: 0.3.10.dev6
 Summary: A framework for controlling Android devices through LLM agents
 Project-URL: Homepage, https://github.com/droidrun/droidrun
 Project-URL: Bug Tracker, https://github.com/droidrun/droidrun/issues
@@ -13,8 +13,6 @@ Classifier: Intended Audience :: Developers
 Classifier: Intended Audience :: Information Technology
 Classifier: Intended Audience :: Science/Research
 Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
@@ -27,10 +25,11 @@ Classifier: Topic :: Software Development :: Testing
 Classifier: Topic :: Software Development :: Testing :: Acceptance
 Classifier: Topic :: System :: Emulators
 Classifier: Topic :: Utilities
-Requires-Python: >=3.13
+Requires-Python: >=3.11
 Requires-Dist: adbutils>=2.10.2
 Requires-Dist: apkutils==2.0.0
 Requires-Dist: arize-phoenix>=12.3.0
+Requires-Dist: httpx>=0.27.0
 Requires-Dist: llama-index==0.14.4
 Requires-Dist: posthog>=6.7.6
 Requires-Dist: pydantic>=2.11.10
@@ -38,14 +37,6 @@ Requires-Dist: rich>=14.1.0
 Provides-Extra: anthropic
 Requires-Dist: anthropic>=0.67.0; extra == 'anthropic'
 Requires-Dist: llama-index-llms-anthropic<0.9.0,>=0.8.6; extra == 'anthropic'
-Provides-Extra: backend
-Requires-Dist: aiohttp>=3.9.0; extra == 'backend'
-Requires-Dist: fastapi>=0.104.0; extra == 'backend'
-Requires-Dist: pydantic-settings>=2.0.0; extra == 'backend'
-Requires-Dist: python-dotenv>=1.0.0; extra == 'backend'
-Requires-Dist: python-multipart>=0.0.6; extra == 'backend'
-Requires-Dist: uvicorn[standard]>=0.24.0; extra == 'backend'
-Requires-Dist: websockets>=12.0; extra == 'backend'
 Provides-Extra: deepseek
 Requires-Dist: llama-index-llms-deepseek>=0.2.1; extra == 'deepseek'
 Provides-Extra: dev

droidrun-0.3.10.dev4/config/prompts/codeact/system.md → droidrun-0.3.10.dev6/config/prompts/codeact/system.jinja2 RENAMED

@@ -49,7 +49,7 @@ complete(success=True, reason="Successfully navigated to Wi-Fi settings and init
 ## Tools:
 In addition to the Python Standard Library and any functions you have already written, you can use the following functions:
-{tool_descriptions}
+{{ tool_descriptions }}
 ## Final Answer Guidelines:

droidrun-0.3.10.dev4/config/prompts/codeact/user.md → droidrun-0.3.10.dev6/config/prompts/codeact/user.jinja2 RENAMED

@@ -1,5 +1,5 @@
 **Current Request:**
-{goal}
+{{ goal }}
 **Is the precondition met? What is your reasoning and the next step to address this request?**
 Explain your thought process then provide code in ```python ... ``` tags if needed.
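
Note on the placeholder change: the single-brace `{goal}` form looks like a Python `str.format` field, while `{{ goal }}` is a Jinja2 expression. The sketch below assumes the old templates were filled with `str.format` (this diff does not show the dev4 loader) and only illustrates why the two syntaxes are not interchangeable: to `str.format`, doubled braces are escapes for literal braces, so the variable is never substituted.

```python
# Old-style template body: a str.format replacement field.
old_template = "**Current Request:**\n{goal}"
# New-style template body: a Jinja2 expression.
new_template = "**Current Request:**\n{{ goal }}"

goal = "Open Wi-Fi settings"

# str.format fills the single-brace field as expected.
old_rendered = old_template.format(goal=goal)

# Passing the Jinja2 text through str.format does NOT substitute anything:
# "{{" and "}}" are escape sequences that collapse to literal braces.
new_through_format = new_template.format(goal=goal)

print(old_rendered)
print(new_through_format)
```

A custom prompt still written with single braces would therefore need converting before it is used with the renamed `.jinja2` files.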

droidrun-0.3.10.dev6/config/prompts/executor/rev1.jinja2 ADDED

@@ -0,0 +1,78 @@
+# Android Action Executor
+You are an action executor. Your only job: execute the current subgoal exactly as written.
+## Context
+**User Request:** {{ instruction }}
+{% if app_card %}
+App card gives information on how to operate the app and perform actions.
+**App Card:** {{ app_card }}
+{% endif %}
+{% if device_state %}
+**Device State:** {{ device_state }}
+{% endif %}
+**Overall Plan:** {{ plan }}
+**Current Subgoal:** {{ subgoal }}
+**Progress:** {{ progress_status|default("No progress yet.") }}
+**Recent Actions:**
+{% if action_history %}
+{% for action in action_history[-5:] %}
+{% if action.outcome %}
+- Action: {{ action.action }} | Description: {{ action.summary }} | Outcome: Successful
+{% else %}
+- Action: {{ action.action }} | Description: {{ action.summary }} | Outcome: Failed | Feedback: {{ action.error }}
+{% endif %}
+{% endfor %}
+{% else %}
+No actions have been taken yet.
+{% endif %}
+---
+## Your Task
+1. Read the current subgoal
+2. Identify the action verb (tap, swipe, type, press, open)
+3. Identify the target (button name, text, coordinates)
+4. Execute that exact action
+**Do not:**
+- Answer questions
+- Make decisions about what to do next
+- Optimize or substitute actions
+- Repeat failed actions more than once
+---
+## Action Reference
+### Available Actions
+{% for action_name, action_info in atomic_actions.items() %}
+- {{ action_name }}({{ action_info.arguments|join(', ') }}): {{ action_info.description }}
+{% endfor %}
+### Key Rules
+- Close popups (permission requests) before proceeding
+- Always activate input box (click it) before typing
+- Use `open_app` to launch apps, not the app drawer
+- Try different swipe directions if content doesn't change
+---
+## Output Format
+### Thought ###
+What action and target does the subgoal specify?
+### Action ###
+{"action": "action_name", "argument": "value"}
+### Description ###
+One sentence describing the action you're taking.

droidrun-0.3.10.dev4/config/prompts/executor/system.md → droidrun-0.3.10.dev6/config/prompts/executor/system.jinja2 RENAMED

@@ -1,13 +1,24 @@
 You are a LOW-LEVEL ACTION EXECUTOR for an Android phone. You do NOT answer questions or provide results. You ONLY perform individual atomic actions as specified in the current subgoal. You are part of a larger system - your job is to execute actions, not to think about or answer the user's original question.
 ### User Request ###
-{instruction}
+{{ instruction }}
-{app_card}{device_state_text}### Overall Plan ###
-{plan}
+{% if app_card %}
+App card gives information on how to operate the app and perform actions.
+### App Card ###
+{{ app_card }}
+{% endif %}
+{% if device_state %}
+### Device State ###
+{{ device_state }}
+{% endif %}
+### Overall Plan ###
+{{ plan }}
 ### Current Subgoal ###
-EXECUTE THIS SUBGOAL: {subgoal}
+EXECUTE THIS SUBGOAL: {{ subgoal }}
 EXECUTION MODE: You are a dumb robot. Find the exact text/element mentioned in the subgoal above and perform the specified action on it. Do not read anything below this line until after you execute the subgoal.
@@ -25,7 +36,7 @@ Convert directly to atomic action:
 Execute the atomic action for the exact target mentioned. Ignore everything else.
 ### Progress Status ###
-{progress_status}
+{{ progress_status|default("No progress yet.") }}
 ### Guidelines ###
 General:
@@ -47,11 +58,25 @@ Execute the current subgoal mechanically. Do NOT examine the screen content or m
 #### Atomic Actions ####
 The atomic action functions are listed in the format of `action(arguments): description` as follows:
-{atomic_actions}
+{% for action_name, action_info in atomic_actions.items() %}
+- {{ action_name }}({{ action_info.arguments|join(', ') }}): {{ action_info.description }}
+{% endfor %}
 ### Latest Action History ###
-{action_history}
+{% if action_history %}
+Recent actions you took previously and whether they were successful:
+{% for action in action_history[-5:] %}
+{% if action.outcome %}
+Action: {{ action.action }} | Description: {{ action.summary }} | Outcome: Successful
+{% else %}
+Action: {{ action.action }} | Description: {{ action.summary }} | Outcome: Failed | Feedback: {{ action.error }}
+{% endif %}
+{% endfor %}
+{% else %}
+No actions have been taken yet.
+{% endif %}
 ---
 ### LITERAL EXECUTION RULE ###
 Whatever the current subgoal says to do, do that EXACTLY. Do not substitute with what you think is better. Do not optimize. Do not consider screen state. Parse the subgoal text literally and execute the matching atomic action.
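
Note on the template expressions used in the executor prompts: the sketch below renders two of them with the `jinja2` package directly. It assumes standard Jinja2 semantics and that values reach the template unchanged; how droidrun's `PromptLoader` builds its environment is not shown in this diff. Two behaviours are worth knowing: `default(...)` without its second argument only replaces an undefined variable, not an empty string, and dotted access such as `action.outcome` also works on plain dict entries.

```python
from jinja2 import Template

# Same filter expression as in the new executor templates.
plain = Template('{{ progress_status|default("No progress yet.") }}')
# Variant with the filter's second argument set, which also covers falsy values.
falsy_aware = Template('{{ progress_status|default("No progress yet.", true) }}')

missing = plain.render()                           # variable not passed at all
empty = plain.render(progress_status="")           # variable passed as an empty string
empty_with_flag = falsy_aware.render(progress_status="")

# Dotted lookups fall back to dict keys, so a list of dicts can be looped over.
rows = Template(
    "{% for a in action_history[-5:] %}"
    "{{ a.action }}:{{ 'ok' if a.outcome else a.error }};"
    "{% endfor %}"
)
rendered_rows = rows.render(
    action_history=[
        {"action": "tap", "outcome": True, "error": ""},
        {"action": "type", "outcome": False, "error": "no focus"},
    ]
)

print(repr(missing), repr(empty), repr(empty_with_flag))
print(rendered_rows)
```

If the progress status can arrive as an empty string, the fallback text in these templates may not appear unless the second argument is added or the key is left out of the render context.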

droidrun-0.3.10.dev4/config/prompts/manager/rev1.md → droidrun-0.3.10.dev6/config/prompts/manager/rev1.jinja2 RENAMED

@@ -3,12 +3,46 @@
 You operate an Android phone by creating high-level plans to fulfill user requests.
 ## User Request
-{instruction}
+{{ instruction }}
 ## Current Context
-{device_date}{app_card}{important_notes}{error_history}
-{custom_tools_descriptions}
+{% if device_date %}
+<device_date>
+{{ device_date }}
+</device_date>
+{% endif %}
+{% if app_card %}
+App card gives information on how to operate the app and perform actions.
+<app_card>
+{{ app_card }}
+</app_card>
+{% endif %}
+{% if important_notes %}
+<important_notes>
+{{ important_notes }}
+</important_notes>
+{% endif %}
+{% if error_history %}
+<potentially_stuck>
+You have encountered several failed attempts. Here are some logs:
+{% for error in error_history %}
+- Attempt: Action: {{ error.action }} | Description: {{ error.summary }} | Outcome: Failed | Feedback: {{ error.error }}
+{% endfor %}
+</potentially_stuck>
+{% endif %}
+{% if custom_tools_descriptions %}
+<custom_actions>
+The executor has access to these additional custom actions beyond the standard actions (click, type, swipe, etc.):
+{{ custom_tools_descriptions }}
+You can reference these custom actions or tell the Executer agent to use them in your plan when they help achieve the user's goal.
+</custom_actions>
+{% endif %}
 ---
@@ -28,7 +62,17 @@ You operate an Android phone by creating high-level plans to fulfill user reques
 - Use memory instead of clipboard unless specifically requested
 **Text Operations:**
-{text_manipulation_section}
+{% if text_manipulation_enabled %}
+<text_manipulation>
+1. Use **TEXT_TASK:** prefix in your plan when you need to modify text in the currently focused text input field
+2. TEXT_TASK is for editing, formatting, or transforming existing text content in text boxes using Python code
+3. Do not use TEXT_TASK for extracting text from messages, typing new text, or composing messages
+4. The focused text field contains editable text that you can modify
+5. Example plan item: 'TEXT_TASK: Add "Hello World" at the beginning of the text'
+6. Always use TEXT_TASK for modifying text, do not try to select the text to copy/cut/paste or adjust the text
+</text_manipulation>
+{% endif %}
 ---
@@ -68,4 +112,4 @@ Example: "At step 5, I obtained recipe from recipes.jpg: Chicken Pasta - ingredi
 <request_accomplished>
 Use ONLY when request is fully completed through concrete actions. Include confirmation message of what was accomplished.
-</request_accomplished>
+</request_accomplished>

droidrun-0.3.10.dev4/config/prompts/manager/system.md → droidrun-0.3.10.dev6/config/prompts/manager/system.jinja2 RENAMED

@@ -1,11 +1,37 @@
 You are an agent who can operate an Android phone on behalf of a user. Your goal is to track progress and devise high-level plans to achieve the user's requests.
 <user_request>
-{instruction}
+{{ instruction }}
 </user_request>
-{device_date}{app_card}{important_notes}{error_history}
+{% if device_date %}
+<device_date>
+{{ device_date }}
+</device_date>
+{% endif %}
+{% if app_card %}
+App card gives information on how to operate the app and perform actions.
+<app_card>
+{{ app_card }}
+</app_card>
+{% endif %}
+{% if important_notes %}
+<important_notes>
+{{ important_notes }}
+</important_notes>
+{% endif %}
+{% if error_history %}
+<potentially_stuck>
+You have encountered several failed attempts. Here are some logs:
+{% for error in error_history %}
+- Attempt: Action: {{ error.action }} | Description: {{ error.summary }} | Outcome: Failed | Feedback: {{ error.error }}
+{% endfor %}
+</potentially_stuck>
+{% endif %}
 <guidelines>
 The following guidelines will help you plan this request.
 General:
@@ -17,7 +43,17 @@ General:
 6. Make sure names and titles are not cutoff. If the request is to check who sent a message, make sure to check the message sender's full name not just what appears in the notification because it might be cut off.
 7. Dates and file names must match the user query exactly.
 8. Don't do more than what the user asks for.
-{text_manipulation_section}
+{% if text_manipulation_enabled %}
+<text_manipulation>
+1. Use **TEXT_TASK:** prefix in your plan when you need to modify text in the currently focused text input field
+2. TEXT_TASK is for editing, formatting, or transforming existing text content in text boxes using Python code
+3. Do not use TEXT_TASK for extracting text from messages, typing new text, or composing messages
+4. The focused text field contains editable text that you can modify
+5. Example plan item: 'TEXT_TASK: Add "Hello World" at the beginning of the text'
+6. Always use TEXT_TASK for modifying text, do not try to select the text to copy/cut/paste or adjust the text
+</text_manipulation>
+{% endif %}
 Memory Usage:
 - Always include step context: "At step [number], I obtained [actual content] from [source]"
@@ -27,7 +63,16 @@ Memory Usage:
 - Update memory to track progress on multi-step tasks
 </guidelines>
-{custom_tools_descriptions}
+{% if custom_tools_descriptions %}
+<custom_actions>
+The executor has access to these additional custom actions beyond the standard actions (click, type, swipe, etc.):
+{{ custom_tools_descriptions }}
+You can reference these custom actions or tell the Executer agent to use them in your plan when they help achieve the user's goal.
+</custom_actions>
+{% endif %}
 ---
 Carefully assess the current status and the provided screenshot. Check if the current plan needs to be revised.
 Determine if the user request has been fully completed. If you are confident that no further actions are required, use the request_accomplished tag with a message in it. If the user request is not finished, update the plan and don't use it. If you are stuck with errors, think step by step about whether the overall plan needs to be revised to address the error.

{droidrun-0.3.10.dev4 → droidrun-0.3.10.dev6}/config_example.yaml RENAMED

@@ -19,30 +19,38 @@ agent:
     # Enable vision capabilities (screenshots)
     vision: false
     # System prompt filename (located in prompts_dir/codeact/)
-    system_prompt: system.md
+    system_prompt: system.jinja2
     # User prompt filename (located in prompts_dir/codeact/)
-    user_prompt: user.md
+    user_prompt: user.jinja2
   # Manager Agent Configuration
   manager:
     # Enable vision capabilities (screenshots)
     vision: false
     # System prompt filename (located in prompts_dir/manager/)
-    system_prompt: system.md
+    system_prompt: system.jinja2
   # Executor Agent Configuration
   executor:
     # Enable vision capabilities (screenshots)
     vision: false
     # System prompt filename (located in prompts_dir/executor/)
-    system_prompt: system.md
+    system_prompt: system.jinja2
   # App Cards Configuration
   app_cards:
     # Enable app-specific instruction cards
     enabled: true
-    # Directory containing app card files
+    # Mode: local (file-based), server (HTTP API), or composite (server with local fallback)
+    mode: local
+    # Directory containing app card files (for local/composite modes)
     app_cards_dir: config/app_cards
+    # Server URL for remote app cards (for server/composite modes)
+    server_url: null
+    # Server request timeout in seconds
+    server_timeout: 10.0
+    # Number of server retry attempts
+    server_max_retries: 2
 # === LLM Profiles ===
 # Define LLM configurations for each agent type
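
Note on the new `app_cards` keys: the comments describe three modes, with `composite` meaning "server with local fallback". The sketch below is one possible reading of that behaviour, not droidrun's implementation. `AppCardSettings`, `load_app_card`, and the two injected callables are hypothetical names, and treating `server_max_retries` as extra attempts after the first one is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AppCardSettings:
    # Field names mirror the YAML keys; the class itself is hypothetical.
    mode: str = "local"
    server_url: Optional[str] = None
    server_timeout: float = 10.0
    server_max_retries: int = 2


def load_app_card(
    package: str,
    settings: AppCardSettings,
    fetch_remote: Callable[[str, str, float], str],
    read_local: Callable[[str], str],
) -> str:
    """Return the app card text for `package` according to the configured mode."""
    if settings.mode == "local":
        return read_local(package)

    # "server" and "composite" both try the server first (assumed: 1 try + retries).
    if settings.server_url is not None:
        for _ in range(1 + settings.server_max_retries):
            try:
                return fetch_remote(package, settings.server_url, settings.server_timeout)
            except OSError:
                continue  # network-style failure: try again

    # Only composite mode falls back to the local directory.
    if settings.mode == "composite":
        return read_local(package)
    return ""
```

Injecting the fetch and read functions keeps the sketch independent of any HTTP client; in the real package the new `httpx` dependency presumably plays that role.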

{droidrun-0.3.10.dev4 → droidrun-0.3.10.dev6}/droidrun/agent/codeact/codeact_agent.py RENAMED

@@ -85,15 +85,8 @@ class CodeActAgent(Workflow):
         self.tool_list = {}
         for action_name, signature in merged_signatures.items():
             func = signature["function"]
-            if asyncio.iscoroutinefunction(func):
-                # Create async bound function with proper closure
-                def make_bound(f, ti):
-                    async def bound_func(*args, **kwargs):
-                        return await f(ti, *args, **kwargs)
-                    return bound_func
-                self.tool_list[action_name] = make_bound(func, tools_instance)
-            else:
-                self.tool_list[action_name] = lambda *args, f=func, ti=tools_instance, **kwargs: f(ti, *args, **kwargs)
+            self.tool_list[action_name] = lambda *args, f=func, ti=tools_instance, **kwargs: f(ti, *args, **kwargs)
         self.tool_list["remember"] = tools_instance.remember
         self.tool_list["complete"] = tools_instance.complete
@@ -113,13 +106,10 @@ class CodeActAgent(Workflow):
         )
         self.system_prompt = ChatMessage(role="system", content=system_prompt_text)
-        self.user_prompt_template = PromptLoader.load_prompt(agent_config.get_codeact_user_prompt_path())
         self.executor = SimpleCodeExecutor(
             loop=asyncio.get_event_loop(),
             locals={},
             tools=self.tool_list,
-            tools_instance=tools_instance,
             globals={"__builtins__": __builtins__},
         )
@@ -293,27 +283,30 @@ Now, describe the next step you will take to address the original goal: {goal}""
         try:
             self.code_exec_counter += 1
             result = await self.executor.execute(ExecuterState(ui_state=ctx.store.get("ui_state", None)), code)
-            logger.info(f"💡 Code execution successful. Result: {result['output']}")
+            logger.info(f"💡 Code execution successful. Result: {result}")
             await asyncio.sleep(self.agent_config.after_sleep_action)
-            screenshots = result['screenshots']
-            for screenshot in screenshots[:-1]: # the last screenshot will be captured by next step
-                ctx.write_event_to_stream(ScreenshotEvent(screenshot=screenshot))
-            ui_states = result['ui_states']
-            for ui_state in ui_states[:-1]:
-                ctx.write_event_to_stream(RecordUIStateEvent(ui_state=ui_state['a11y_tree']))
+            # Check if complete() was called
             if self.tools.finished:
-                logger.debug("  - Task completed.")
-                event = TaskEndEvent(
-                    success=self.tools.success, reason=self.tools.reason
-                )
+                logger.info("✅ Task marked as complete via complete() function")
+                # Validate completion state
+                success = self.tools.success if self.tools.success is not None else False
+                reason = self.tools.reason if self.tools.reason else "Task completed without reason"
+                # Reset finished flag for next execution
+                self.tools.finished = False
+                logger.info(f"  - Success: {success}")
+                logger.info(f"  - Reason: {reason}")
+                event = TaskEndEvent(success=success, reason=reason)
                 ctx.write_event_to_stream(event)
                 return event
             self.remembered_info = self.tools.memory
-            event = TaskExecutionResultEvent(output=str(result['output']))
+            event = TaskExecutionResultEvent(output=str(result))
             ctx.write_event_to_stream(event)
             return event
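
Note on the first hunk of this file: dev6 binds every tool with the same `lambda *args, f=func, ti=tools_instance, **kwargs: ...` wrapper, whether the underlying function is sync or async. The sketch below uses hypothetical stand-ins (`Tools`, `tap`, `swipe`) to show what that implies: the default arguments pin `func` for each loop iteration, and calling the wrapper around an async function returns a coroutine object that the caller still has to await. How dev6's `SimpleCodeExecutor` handles that coroutine is not visible in this diff.

```python
import asyncio


class Tools:
    """Hypothetical stand-in for the tools instance."""

    def __init__(self):
        self.calls = []


def tap(tools, x, y):
    tools.calls.append(("tap", x, y))
    return f"tapped {x},{y}"


async def swipe(tools, direction):
    tools.calls.append(("swipe", direction))
    return f"swiped {direction}"


signatures = {"tap": {"function": tap}, "swipe": {"function": swipe}}
tools_instance = Tools()

tool_list = {}
for action_name, signature in signatures.items():
    func = signature["function"]
    # Default arguments capture the current func and tools_instance;
    # a bare closure over `func` would see only the last loop value.
    tool_list[action_name] = lambda *args, f=func, ti=tools_instance, **kwargs: f(ti, *args, **kwargs)

sync_result = tool_list["tap"](10, 20)      # runs immediately
pending = tool_list["swipe"]("up")          # coroutine object, nothing has run yet
was_coroutine = asyncio.iscoroutine(pending)
async_result = asyncio.run(pending)         # the caller must drive it to completion

print(sync_result, was_coroutine, async_result)
```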

{droidrun-0.3.10.dev4 → droidrun-0.3.10.dev6}/droidrun/agent/droid/events.py RENAMED

@@ -10,6 +10,7 @@ For internal events with full debugging metadata, see:
 - codeact/events.py (Task*, EpisodicMemoryEvent)
 """
+import asyncio
 from typing import Dict, List
 from llama_index.core.workflow import Event
@@ -49,7 +50,9 @@ class DroidAgentState(BaseModel):
     # Task context
     instruction: str = ""
+    # App Cards
+    app_card: str = ""
+    app_card_loading_task: asyncio.Task[str] | None = None
     # Formatted device state for prompts (complete text)
     formatted_device_state: str = ""
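
Note on the new `app_card_loading_task` field: storing an `asyncio.Task[str]` next to the resolved `app_card` string suggests the card is fetched in the background and awaited when needed. The sketch below shows that general pattern with a dataclass stand-in; `State`, `fetch_app_card`, and the package name are hypothetical, and the real state class is a Pydantic model whose configuration is not shown here.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    """Hypothetical stand-in carrying the two new fields."""
    app_card: str = ""
    app_card_loading_task: Optional[asyncio.Task] = None


async def fetch_app_card(package: str) -> str:
    # Placeholder for a slow lookup (file read or HTTP request).
    await asyncio.sleep(0.01)
    return f"card for {package}"


async def main() -> State:
    state = State()
    # Start loading without blocking the rest of the step.
    state.app_card_loading_task = asyncio.create_task(fetch_app_card("com.example.app"))

    await asyncio.sleep(0)  # other work would happen here

    # Resolve the task right before the card is needed in a prompt.
    state.app_card = await state.app_card_loading_task
    state.app_card_loading_task = None
    return state


final_state = asyncio.run(main())
print(final_state.app_card)
```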

{droidrun-0.3.10.dev4 → droidrun-0.3.10.dev6}/droidrun/agent/executor/executor_agent.py RENAMED

@@ -90,52 +90,38 @@ class ExecutorAgent(Workflow): # TODO: Fix a bug in bad prompt
         subgoal = ev.get("subgoal", "")
         logger.info(f"🧠 Executor thinking about action for: {subgoal}")
-        # Format app card (include tags in variable value or empty string)
-        app_card = ""  # TODO: Implement app card retrieval
-        app_card_text = ""
-        if app_card.strip():
-            app_card_text = "App card gives information on how to operate the app and perform actions.\n### App Card ###\n" + app_card.strip() + "\n\n"
-        # Format device state (use unified state)
-        device_state_text = ""
-        if self.shared_state.formatted_device_state and self.shared_state.formatted_device_state.strip():
-            device_state_text = "### Device State ###\n" + self.shared_state.formatted_device_state.strip() + "\n\n"
-        # Format progress status
-        progress_status_text = self.shared_state.progress_status + "\n\n" if self.shared_state.progress_status else "No progress yet.\n\n"
-        # Format atomic actions
-        atomic_actions_text = chr(10).join(
-            f"- {action_name}({', '.join(action_info['arguments'])}): {action_info['description']}"
-            for action_name, action_info in ATOMIC_ACTION_SIGNATURES.items()
-        ) + "\n"
-        # Format action history
+        # Prepare action history as structured data (last 5 actions)
+        action_history = []
         if self.shared_state.action_history:
-            action_history_text = "Recent actions you took previously and whether they were successful:\n" + "\n".join(
-                (f"Action: {act} | Description: {summ} | Outcome: Successful" if outcome
-                 else f"Action: {act} | Description: {summ} | Outcome: Failed | Feedback: {err_des}")
+            n = min(5, len(self.shared_state.action_history))
+            action_history = [
+                {
+                    "action": act,
+                    "summary": summ,
+                    "outcome": outcome,
+                    "error": err_des
+                }
                 for act, summ, outcome, err_des in zip(
-                    self.shared_state.action_history[-min(5, len(self.shared_state.action_history)):],
-                    self.shared_state.summary_history[-min(5, len(self.shared_state.action_history)):],
-                    self.shared_state.action_outcomes[-min(5, len(self.shared_state.action_history)):],
-                    self.shared_state.error_descriptions[-min(5, len(self.shared_state.action_history)):], strict=True)
-            ) + "\n\n"
-        else:
-            action_history_text = "No actions have been taken yet.\n\n"
-        # Load and format prompt
+                    self.shared_state.action_history[-n:],
+                    self.shared_state.summary_history[-n:],
+                    self.shared_state.action_outcomes[-n:],
+                    self.shared_state.error_descriptions[-n:],
+                    strict=True
+                )
+            ]
+        # Let Jinja2 handle all formatting
         system_prompt = PromptLoader.load_prompt(
             self.agent_config.get_executor_system_prompt_path(),
             {
                 "instruction": self.shared_state.instruction,
-                "app_card": app_card_text,
-                "device_state_text": device_state_text,
+                "app_card": "",  # TODO: Implement app card loader
+                "device_state": self.shared_state.formatted_device_state,
                 "plan": self.shared_state.plan,
                 "subgoal": subgoal,
-                "progress_status": progress_status_text,
-                "atomic_actions": atomic_actions_text,
-                "action_history": action_history_text
+                "progress_status": self.shared_state.progress_status,
+                "atomic_actions": ATOMIC_ACTION_SIGNATURES,
+                "action_history": action_history
             }
         )

droidrun-0.3.10.dev6/droidrun/agent/executor/prompts.py ADDED

@@ -0,0 +1,34 @@
+"""
+Prompts for the ExecutorAgent.
+"""
+def parse_executor_response(response: str) -> dict:
+    """
+    Parse the Executor LLM response.
+    Extracts:
+    - thought: Content between "### Thought" and "### Action"
+    - action: Content between "### Action" and "### Description"
+    - description: Content after "### Description"
+    Args:
+        response: Raw LLM response string
+    Returns:
+        Dictionary with 'thought', 'action', 'description' keys
+    """
+    thought = response.split("### Thought")[-1].split("### Action")[0].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
+    action_raw = response.split("### Action")[-1].split("### Description")[0].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
+    start_idx = action_raw.find('{')
+    end_idx = action_raw.rfind('}')
+    if start_idx != -1 and end_idx != -1:
+        action = action_raw[start_idx:end_idx + 1]
+    else:
+        action = action_raw
+    description = response.split("### Description")[-1].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
+    return {
+        "thought": thought,
+        "action": action,
+        "description": description
+    }
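
Usage note for `parse_executor_response`: the sketch below repeats the function body from the hunk above so that it runs on its own, then feeds it two sample responses. The action names and arguments (`click`, `index`, `type`, `text`) are made up for illustration. The second sample shows a side effect of the whitespace normalisation: double spaces inside the JSON payload are collapsed as well.

```python
import json


def parse_executor_response(response: str) -> dict:
    # Body copied from the added prompts.py so the example is self-contained.
    thought = response.split("### Thought")[-1].split("### Action")[0].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
    action_raw = response.split("### Action")[-1].split("### Description")[0].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
    start_idx = action_raw.find('{')
    end_idx = action_raw.rfind('}')
    if start_idx != -1 and end_idx != -1:
        action = action_raw[start_idx:end_idx + 1]
    else:
        action = action_raw
    description = response.split("### Description")[-1].replace("\n", " ").replace("  ", " ").replace("###", "").strip()
    return {"thought": thought, "action": action, "description": description}


sample = (
    "### Thought ###\n"
    "The subgoal says to tap the Settings icon.\n"
    "### Action ###\n"
    '{"action": "click", "index": 3}\n'
    "### Description ###\n"
    "Tap the Settings icon."
)
parsed = parse_executor_response(sample)
action = json.loads(parsed["action"])  # the action field is still a JSON string

# Double spaces inside a JSON string value are collapsed by the cleanup step.
spaced = (
    "### Thought ###\nType it.\n"
    '### Action ###\n{"action": "type", "text": "a  b"}\n'
    "### Description ###\nType the text."
)
spaced_action = json.loads(parse_executor_response(spaced)["action"])

print(parsed["thought"], action, spaced_action)
```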
