droidrun 0.1.0__tar.gz

@@ -0,0 +1,5 @@
1
+ dist/
2
+ # Python bytecode files
3
+ __pycache__/
4
+ *.py[cod]
5
+ *$py.class
droidrun-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Niels Schmidt
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,7 @@
1
+ include LICENSE
2
+ include README.md
3
+ include pyproject.toml
4
+
5
+ recursive-include droidrun *
6
+ recursive-exclude * __pycache__
7
+ recursive-exclude * *.py[cod]
@@ -0,0 +1,276 @@
1
+ Metadata-Version: 2.4
2
+ Name: droidrun
3
+ Version: 0.1.0
4
+ Summary: A framework for controlling Android devices through LLM agents
5
+ Project-URL: Homepage, https://github.com/droidrun/droidrun
6
+ Project-URL: Bug Tracker, https://github.com/droidrun/droidrun/issues
7
+ Project-URL: Documentation, https://docs.droidrun.ai/
8
+ Author-email: Niels Schmidt <niels.schmidt@droidrun.ai>
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Classifier: Development Status :: 4 - Beta
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Intended Audience :: Information Technology
14
+ Classifier: Intended Audience :: Science/Research
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Topic :: Communications :: Chat
20
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
21
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
22
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
+ Classifier: Topic :: Software Development :: Quality Assurance
24
+ Classifier: Topic :: Software Development :: Testing
25
+ Classifier: Topic :: Software Development :: Testing :: Acceptance
26
+ Classifier: Topic :: System :: Emulators
27
+ Classifier: Topic :: Utilities
28
+ Requires-Python: >=3.10
29
+ Requires-Dist: aiofiles>=23.0.0
30
+ Requires-Dist: anthropic>=0.7.0
31
+ Requires-Dist: click>=8.1.0
32
+ Requires-Dist: openai>=1.0.0
33
+ Requires-Dist: pillow>=10.0.0
34
+ Requires-Dist: pydantic>=2.0.0
35
+ Requires-Dist: python-dotenv>=1.0.0
36
+ Requires-Dist: rich>=13.0.0
37
+ Provides-Extra: dev
38
+ Requires-Dist: black>=23.0.0; extra == 'dev'
39
+ Requires-Dist: mypy>=1.0.0; extra == 'dev'
40
+ Requires-Dist: ruff>=0.1.0; extra == 'dev'
41
+ Description-Content-Type: text/markdown
42
+
43
+ # 🤖 DroidRun
44
+
45
+ DroidRun is a framework for controlling Android devices through LLM agents. It lets you automate Android device interactions with natural language commands.
46
+
47
+ ## ✨ Features
48
+
49
+ - Control Android devices with natural language commands
50
+ - Supports multiple LLM providers (OpenAI, Anthropic, Gemini)
51
+ - Easy-to-use CLI
52
+ - Extensible Python API for custom automations
53
+ - Screenshot analysis for visual understanding of the device
54
+
55
+ ## 📦 Installation
56
+
57
+ ### 🚀 Option 1: Install from PyPI (Recommended)
58
+
59
+ ```bash
60
+ pip install droidrun
61
+ ```
62
+
63
+ ### 🔧 Option 2: Install from Source
64
+
65
+ ```bash
66
+ git clone https://github.com/droidrun/droidrun.git
67
+ cd droidrun
68
+ pip install -e .
69
+ ```
70
+
71
+ ## 📋 Prerequisites
72
+
73
+ 1. An Android device connected via USB or ADB over TCP/IP
74
+ 2. ADB (Android Debug Bridge) installed and configured
75
+ 3. DroidRun Portal app installed on your Android device
76
+ 4. API key for at least one of the supported LLM providers:
77
+ - OpenAI
78
+ - Anthropic
79
+ - Google Gemini
80
+
81
+ ### 🔧 Setting up ADB
82
+
83
+ ADB (Android Debug Bridge) is required for DroidRun to communicate with your Android device:
84
+
85
+ 1. **Install ADB**:
86
+ - **Windows**: Download [Android SDK Platform Tools](https://developer.android.com/studio/releases/platform-tools) and extract the ZIP file
87
+ - **macOS**: `brew install android-platform-tools`
88
+ - **Linux**: `sudo apt install adb` (Ubuntu/Debian) or `sudo pacman -S android-tools` (Arch)
89
+
90
+ 2. **Add ADB to your PATH**:
91
+ - **Windows**: Add the path to the extracted platform-tools folder to your system's PATH environment variable
92
+ - **macOS/Linux**: Add the following to your ~/.bashrc or ~/.zshrc:
93
+ ```bash
94
+ export PATH=$PATH:/path/to/platform-tools
95
+ ```
96
+
97
+ 3. **Verify ADB installation**:
98
+ ```bash
99
+ adb version
100
+ ```
101
+
102
+ 4. **Enable USB debugging on your Android device**:
103
+ - Go to **Settings → About phone**
104
+ - Tap **Build number** 7 times to enable Developer options
105
+ - Go to **Settings → System → Developer options** (location may vary by device)
106
+ - Enable **USB debugging**
107
+
108
+ ## 🛠️ Setup
109
+
110
+ ### 📱 1. Install DroidRun Portal App
111
+
112
+ DroidRun requires the DroidRun Portal app to be installed on your Android device:
113
+
114
+ 1. Download the DroidRun Portal APK from the [DroidRun Portal repository](https://github.com/droidrun/droidrun-portal)
115
+ 2. Use DroidRun to install the portal app:
116
+ ```bash
117
+ droidrun setup --path=/path/to/droidrun-portal.apk
118
+ ```
119
+
120
+ Alternatively, you can use ADB to install it manually:
121
+ ```bash
122
+ adb install -r /path/to/droidrun-portal.apk
123
+ ```
124
+
125
+ ### 🔑 2. Set up API keys
126
+
127
+ Create a `.env` file in your working directory or set environment variables:
128
+
129
+ ```bash
130
+ # Choose at least one of these based on your preferred provider
131
+ export OPENAI_API_KEY="your_openai_api_key_here"
132
+ export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
133
+ export GEMINI_API_KEY="your_gemini_api_key_here"
134
+ ```
135
+
136
+ To load the environment variables from the `.env` file:
137
+
138
+ ```bash
139
+ source .env
140
+ ```
141
+
142
+ ### 📱 3. Connect to an Android device
143
+
144
+ Connect your device via USB or set up wireless ADB:
145
+
146
+ ```bash
147
+ # List connected devices
148
+ droidrun devices
149
+
150
+ # Connect to a device over Wi-Fi
151
+ droidrun connect 192.168.1.100
152
+ ```
153
+
154
+ ### 🔄 4. Verify the setup
155
+
156
+ Verify that everything is set up correctly:
157
+
158
+ ```bash
159
+ # Should list your connected device and show portal status
160
+ droidrun status
161
+ ```
162
+
163
+ ## 💻 Using the CLI
164
+
165
+ DroidRun's CLI is designed to be simple and intuitive. Pass the task as a quoted string, optionally followed by provider, device, or step-limit options:
166
+
167
+ ### 🚀 Basic Usage
168
+
169
+ ```bash
170
+ # Format: droidrun "task description" [options]
171
+ droidrun "Open the settings app"
172
+ ```
173
+
174
+ ### 🔌 With Provider Options
175
+
176
+ ```bash
177
+ # Using OpenAI
178
+ droidrun "Open the calculator app" --provider openai --model gpt-4o-mini
179
+
180
+ # Using Anthropic
181
+ droidrun "Check the battery level" --provider anthropic --model claude-3-sonnet-20240229
182
+
183
+ # Using Gemini
184
+ droidrun "Install and open Instagram" --provider gemini --model gemini-2.0-flash
185
+ ```
186
+
187
+ ### ⚙️ Additional Options
188
+
189
+ ```bash
190
+ # Specify a particular device
191
+ droidrun "Open Chrome and search for weather" --device abc123
192
+
193
+ # Set maximum number of steps
194
+ droidrun "Open settings and enable dark mode" --steps 20
195
+ ```
196
+
197
+ ## 📝 Creating a Minimal Test Script
198
+
199
+ If you want to use DroidRun in your Python code rather than via the CLI, you can create a minimal test script:
200
+
201
+ ```python
202
+ #!/usr/bin/env python3
203
+ import asyncio
204
+ import os
205
+ from droidrun.agent.react_agent import ReActAgent
206
+ from droidrun.agent.llm_reasoning import LLMReasoner
207
+ from dotenv import load_dotenv
208
+
209
+ # Load environment variables from .env file
210
+ load_dotenv()
211
+
212
+ async def main():
213
+     # Create an LLM instance (choose your preferred provider)
214
+     llm = LLMReasoner(
215
+         llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
216
+         model_name="gemini-2.0-flash",  # Choose appropriate model for your provider
217
+         api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
218
+         temperature=0.2
219
+     )
220
+
221
+     # Create and run the agent
222
+     agent = ReActAgent(
223
+         task="Open the Settings app and check the Android version",
224
+         llm=llm
225
+     )
226
+
227
+     steps = await agent.run()
228
+     print(f"Execution completed with {len(steps)} steps")
229
+
230
+ if __name__ == "__main__":
231
+     asyncio.run(main())
232
+ ```
233
+
234
+ Save this as `test_droidrun.py`, ensure your `.env` file has the appropriate API key, and run:
235
+
236
+ ```bash
237
+ python test_droidrun.py
238
+ ```
239
+
240
+ ## ❓ Troubleshooting
241
+
242
+ ### 🔑 API Key Issues
243
+
244
+ If you encounter errors about missing API keys, ensure:
245
+ 1. You've set the correct environment variable for your chosen provider
246
+ 2. The API key is valid and has appropriate permissions
247
+ 3. You've correctly sourced your `.env` file or exported the variables manually
248
+
249
+ ### 📱 Device Connection Issues
250
+
251
+ If you have trouble connecting to your device:
252
+ 1. Ensure USB debugging is enabled on your Android device
253
+ 2. Check that your device is recognized by ADB: `adb devices`
254
+ 3. For wireless connections, make sure your device and computer are on the same network
255
+
256
+ ### 🤖 LLM Provider Selection
257
+
258
+ If DroidRun is using the wrong LLM provider:
259
+ 1. Explicitly specify the provider with `--provider` (in CLI) or `llm_provider=` (in code)
260
+ 2. When using Gemini, ensure you have set `GEMINI_API_KEY` and specified `--provider gemini`
261
+
262
+ ## 💡 Example Use Cases
263
+
264
+ - Automated UI testing of Android applications
265
+ - Creating guided workflows for non-technical users
266
+ - Automating repetitive tasks on Android devices
267
+ - Remote assistance for less technical users
268
+ - Exploring Android UI with natural language commands
269
+
270
+ ## 👥 Contributing
271
+
272
+ Contributions are welcome! Please feel free to submit a Pull Request.
273
+
274
+ ## 📄 License
275
+
276
+ This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,234 @@
1
+ # 🤖 DroidRun
2
+
3
+ DroidRun is a framework for controlling Android devices through LLM agents. It lets you automate Android device interactions with natural language commands.
4
+
5
+ ## ✨ Features
6
+
7
+ - Control Android devices with natural language commands
8
+ - Supports multiple LLM providers (OpenAI, Anthropic, Gemini)
9
+ - Easy-to-use CLI
10
+ - Extensible Python API for custom automations
11
+ - Screenshot analysis for visual understanding of the device
12
+
13
+ ## 📦 Installation
14
+
15
+ ### 🚀 Option 1: Install from PyPI (Recommended)
16
+
17
+ ```bash
18
+ pip install droidrun
19
+ ```
20
+
21
+ ### 🔧 Option 2: Install from Source
22
+
23
+ ```bash
24
+ git clone https://github.com/droidrun/droidrun.git
25
+ cd droidrun
26
+ pip install -e .
27
+ ```
28
+
29
+ ## 📋 Prerequisites
30
+
31
+ 1. An Android device connected via USB or ADB over TCP/IP
32
+ 2. ADB (Android Debug Bridge) installed and configured
33
+ 3. DroidRun Portal app installed on your Android device
34
+ 4. API key for at least one of the supported LLM providers:
35
+ - OpenAI
36
+ - Anthropic
37
+ - Google Gemini
38
+
39
+ ### 🔧 Setting up ADB
40
+
41
+ ADB (Android Debug Bridge) is required for DroidRun to communicate with your Android device:
42
+
43
+ 1. **Install ADB**:
44
+ - **Windows**: Download [Android SDK Platform Tools](https://developer.android.com/studio/releases/platform-tools) and extract the ZIP file
45
+ - **macOS**: `brew install android-platform-tools`
46
+ - **Linux**: `sudo apt install adb` (Ubuntu/Debian) or `sudo pacman -S android-tools` (Arch)
47
+
48
+ 2. **Add ADB to your PATH**:
49
+ - **Windows**: Add the path to the extracted platform-tools folder to your system's PATH environment variable
50
+ - **macOS/Linux**: Add the following to your ~/.bashrc or ~/.zshrc:
51
+ ```bash
52
+ export PATH=$PATH:/path/to/platform-tools
53
+ ```
54
+
55
+ 3. **Verify ADB installation**:
56
+ ```bash
57
+ adb version
58
+ ```
59
+
60
+ 4. **Enable USB debugging on your Android device**:
61
+ - Go to **Settings → About phone**
62
+ - Tap **Build number** 7 times to enable Developer options
63
+ - Go to **Settings → System → Developer options** (location may vary by device)
64
+ - Enable **USB debugging**
65
+
66
+ ## 🛠️ Setup
67
+
68
+ ### 📱 1. Install DroidRun Portal App
69
+
70
+ DroidRun requires the DroidRun Portal app to be installed on your Android device:
71
+
72
+ 1. Download the DroidRun Portal APK from the [DroidRun Portal repository](https://github.com/droidrun/droidrun-portal)
73
+ 2. Use DroidRun to install the portal app:
74
+ ```bash
75
+ droidrun setup --path=/path/to/droidrun-portal.apk
76
+ ```
77
+
78
+ Alternatively, you can use ADB to install it manually:
79
+ ```bash
80
+ adb install -r /path/to/droidrun-portal.apk
81
+ ```
82
+
83
+ ### 🔑 2. Set up API keys
84
+
85
+ Create a `.env` file in your working directory or set environment variables:
86
+
87
+ ```bash
88
+ # Choose at least one of these based on your preferred provider
89
+ export OPENAI_API_KEY="your_openai_api_key_here"
90
+ export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
91
+ export GEMINI_API_KEY="your_gemini_api_key_here"
92
+ ```
93
+
94
+ To load the environment variables from the `.env` file:
95
+
96
+ ```bash
97
+ source .env
98
+ ```
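Before running a task, it can help to confirm which provider keys are actually visible to your process. The following is a minimal sketch, not part of DroidRun: the helper name `configured_providers` and the mapping are hypothetical, and only the three environment variable names come from the README.

```python
import os

# Environment variable names taken from the README; the mapping itself is illustrative.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

If the list comes back empty after `source .env`, the variables were probably written without `export` or the file was sourced in a different shell.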
99
+
100
+ ### 📱 3. Connect to an Android device
101
+
102
+ Connect your device via USB or set up wireless ADB:
103
+
104
+ ```bash
105
+ # List connected devices
106
+ droidrun devices
107
+
108
+ # Connect to a device over Wi-Fi
109
+ droidrun connect 192.168.1.100
110
+ ```
111
+
112
+ ### 🔄 4. Verify the setup
113
+
114
+ Verify that everything is set up correctly:
115
+
116
+ ```bash
117
+ # Should list your connected device and show portal status
118
+ droidrun status
119
+ ```
120
+
121
+ ## 💻 Using the CLI
122
+
123
+ DroidRun's CLI is designed to be simple and intuitive. Pass the task as a quoted string, optionally followed by provider, device, or step-limit options:
124
+
125
+ ### 🚀 Basic Usage
126
+
127
+ ```bash
128
+ # Format: droidrun "task description" [options]
129
+ droidrun "Open the settings app"
130
+ ```
131
+
132
+ ### 🔌 With Provider Options
133
+
134
+ ```bash
135
+ # Using OpenAI
136
+ droidrun "Open the calculator app" --provider openai --model gpt-4o-mini
137
+
138
+ # Using Anthropic
139
+ droidrun "Check the battery level" --provider anthropic --model claude-3-sonnet-20240229
140
+
141
+ # Using Gemini
142
+ droidrun "Install and open Instagram" --provider gemini --model gemini-2.0-flash
143
+ ```
144
+
145
+ ### ⚙️ Additional Options
146
+
147
+ ```bash
148
+ # Specify a particular device
149
+ droidrun "Open Chrome and search for weather" --device abc123
150
+
151
+ # Set maximum number of steps
152
+ droidrun "Open settings and enable dark mode" --steps 20
153
+ ```
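If you drive the CLI from a script, building the argument list programmatically avoids quoting mistakes in the task string. This is a sketch under assumptions: the function `build_droidrun_command` is hypothetical, it uses only the flags shown in the README (`--provider`, `--model`, `--device`, `--steps`), and it assumes those flags can be combined freely in one invocation.

```python
import shlex


def build_droidrun_command(task, provider=None, model=None, device=None, steps=None):
    """Build an argv list for the droidrun CLI from the documented options."""
    argv = ["droidrun", task]
    if provider is not None:
        argv += ["--provider", provider]
    if model is not None:
        argv += ["--model", model]
    if device is not None:
        argv += ["--device", device]
    if steps is not None:
        argv += ["--steps", str(steps)]
    return argv


if __name__ == "__main__":
    cmd = build_droidrun_command(
        "Open settings and enable dark mode",
        provider="openai",
        model="gpt-4o-mini",
        steps=20,
    )
    # shlex.join quotes the task so the printed line can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` rather than a single string keeps the task description as one argument regardless of spaces or quotes inside it.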
154
+
155
+ ## 📝 Creating a Minimal Test Script
156
+
157
+ If you want to use DroidRun in your Python code rather than via the CLI, you can create a minimal test script:
158
+
159
+ ```python
160
+ #!/usr/bin/env python3
161
+ import asyncio
162
+ import os
163
+ from droidrun.agent.react_agent import ReActAgent
164
+ from droidrun.agent.llm_reasoning import LLMReasoner
165
+ from dotenv import load_dotenv
166
+
167
+ # Load environment variables from .env file
168
+ load_dotenv()
169
+
170
+ async def main():
171
+     # Create an LLM instance (choose your preferred provider)
172
+     llm = LLMReasoner(
173
+         llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
174
+         model_name="gemini-2.0-flash",  # Choose appropriate model for your provider
175
+         api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
176
+         temperature=0.2
177
+     )
178
+
179
+     # Create and run the agent
180
+     agent = ReActAgent(
181
+         task="Open the Settings app and check the Android version",
182
+         llm=llm
183
+     )
184
+
185
+     steps = await agent.run()
186
+     print(f"Execution completed with {len(steps)} steps")
187
+
188
+ if __name__ == "__main__":
189
+     asyncio.run(main())
190
+ ```
191
+
192
+ Save this as `test_droidrun.py`, ensure your `.env` file has the appropriate API key, and run:
193
+
194
+ ```bash
195
+ python test_droidrun.py
196
+ ```
197
+
198
+ ## ❓ Troubleshooting
199
+
200
+ ### 🔑 API Key Issues
201
+
202
+ If you encounter errors about missing API keys, ensure:
203
+ 1. You've set the correct environment variable for your chosen provider
204
+ 2. The API key is valid and has appropriate permissions
205
+ 3. You've correctly sourced your `.env` file or exported the variables manually
206
+
207
+ ### 📱 Device Connection Issues
208
+
209
+ If you have trouble connecting to your device:
210
+ 1. Ensure USB debugging is enabled on your Android device
211
+ 2. Check that your device is recognized by ADB: `adb devices`
212
+ 3. For wireless connections, make sure your device and computer are on the same network
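When checking `adb devices`, the state column matters as much as the serial: `device` means the device is ready, while `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet. The sketch below parses that output into a dictionary. The function name `parse_adb_devices` is hypothetical and the sample output is made up for illustration.

```python
def parse_adb_devices(output):
    """Parse `adb devices` output into a {serial: state} dict."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon notices (lines starting with "*").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


if __name__ == "__main__":
    # Illustrative sample; real output comes from running `adb devices`.
    sample = "List of devices attached\nemulator-5554\tdevice\nABC123\tunauthorized\n\n"
    for serial, state in parse_adb_devices(sample).items():
        hint = "" if state == "device" else "  <- not ready"
        print(f"{serial}: {state}{hint}")
```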
213
+
214
+ ### 🤖 LLM Provider Selection
215
+
216
+ If DroidRun is using the wrong LLM provider:
217
+ 1. Explicitly specify the provider with `--provider` (in CLI) or `llm_provider=` (in code)
218
+ 2. When using Gemini, ensure you have set `GEMINI_API_KEY` and specified `--provider gemini`
219
+
220
+ ## 💡 Example Use Cases
221
+
222
+ - Automated UI testing of Android applications
223
+ - Creating guided workflows for non-technical users
224
+ - Automating repetitive tasks on Android devices
225
+ - Remote assistance for less technical users
226
+ - Exploring Android UI with natural language commands
227
+
228
+ ## 👥 Contributing
229
+
230
+ Contributions are welcome! Please feel free to submit a Pull Request.
231
+
232
+ ## 📄 License
233
+
234
+ This project is licensed under the MIT License - see the LICENSE file for details.
@@ -0,0 +1,141 @@
1
+ ---
2
+ title: 'ReAct Agent'
3
+ description: 'Understanding the ReAct Agent system in DroidRun'
4
+ ---
5
+
6
+ # 🤖 ReAct Agent
7
+
8
+ DroidRun uses a ReAct (Reasoning + Acting) agent to control Android devices. This powerful approach combines LLM reasoning with concrete actions to achieve complex automation tasks.
9
+
10
+ ## 📚 What is ReAct?
11
+
12
+ ReAct is a framework that combines:
13
+
14
+ - **Reasoning**: Using an LLM to interpret tasks, make decisions, and plan steps
15
+ - **Acting**: Executing concrete actions on an Android device
16
+ - **Observing**: Getting feedback from actions to inform future reasoning
17
+
18
+ This loop of reasoning, acting, and observing allows the agent to handle complex, multi-step tasks on Android devices.
19
+
20
+ ## 🔄 The ReAct Loop
21
+
22
+ <Steps>
23
+ <Step title="Goal Setting">
24
+ The user provides a natural language task like "Open settings and enable dark mode"
25
+ </Step>
26
+ <Step title="Reasoning">
27
+ The LLM analyzes the task and determines what steps are needed
28
+ </Step>
29
+ <Step title="Action Selection">
30
+ The agent selects an appropriate action (e.g., tapping a UI element)
31
+ </Step>
32
+ <Step title="Execution">
33
+ The action is executed on the Android device
34
+ </Step>
35
+ <Step title="Observation">
36
+ The agent observes the result (e.g., a new screen appears)
37
+ </Step>
38
+ <Step title="Further Reasoning">
39
+ The agent evaluates progress and decides on the next action
40
+ </Step>
41
+ </Steps>
42
+
43
+ This cycle repeats until the task is completed or the maximum number of steps is reached.
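The cycle can be pictured as a small loop. The following is a toy sketch, not DroidRun's implementation: `run_react_loop`, `scripted_reason`, and `fake_execute` are hypothetical names, and the scripted reasoner stands in for a real LLM so the example runs without a device or API key.

```python
def run_react_loop(task, reason, execute, max_steps=15):
    """Alternate reasoning and acting until `complete` is chosen or max_steps is hit.

    `reason(task, history)` returns (thought, action_name, argument);
    `execute(action_name, argument)` returns an observation string.
    Both are hypothetical callables standing in for the LLM and the device.
    """
    history = []
    for _ in range(max_steps):
        thought, action, arg = reason(task, history)
        history.append(("thought", thought))
        history.append(("action", f"{action}({arg!r})"))
        if action == "complete":
            break
        observation = execute(action, arg)
        history.append(("observation", observation))
    return history


def scripted_reason(task, history):
    """Scripted stand-in for the LLM: picks the next entry based on actions taken."""
    actions_so_far = sum(1 for kind, _ in history if kind == "action")
    script = [
        ("Settings must be opened first", "start_app", "com.android.settings"),
        ("The Display entry should be tapped", "tap", 3),
        ("Dark mode is now on", "complete", "Enabled dark mode"),
    ]
    return script[min(actions_so_far, len(script) - 1)]


def fake_execute(action, arg):
    """Stand-in for the device: every action reports success."""
    return f"{action} succeeded"


if __name__ == "__main__":
    steps = run_react_loop("Open settings and enable dark mode", scripted_reason, fake_execute)
    for kind, text in steps:
        print(f"{kind:12} {text}")
```

The `max_steps` bound is what guarantees termination when the reasoner never chooses `complete`.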
44
+
45
+ ## 🛠️ Available Actions
46
+
47
+ The ReAct agent can perform various actions on Android devices:
48
+
49
+ <AccordionGroup>
50
+ <Accordion title="UI Interaction">
51
+ - `tap(index)` - Tap on a UI element by its index
52
+ - `swipe(start_x, start_y, end_x, end_y)` - Swipe from one point to another
53
+ - `input_text(text)` - Type text into the current field
54
+ - `press_key(keycode)` - Press a specific key (e.g., HOME, BACK)
55
+ </Accordion>
56
+
57
+ <Accordion title="App Management">
58
+ - `start_app(package)` - Launch an app by package name
59
+ - `list_packages()` - List installed packages
60
+ - `install_app(apk_path)` - Install an app from APK
61
+ - `uninstall_app(package)` - Uninstall an app
62
+ </Accordion>
63
+
64
+ <Accordion title="UI Analysis">
65
+ - `take_screenshot()` - Capture the current screen (vision mode only)
66
+ - `get_clickables()` - Identify clickable elements on screen
67
+ - `extract(filename)` - Save complete UI state to a JSON file
68
+ </Accordion>
69
+
70
+ <Accordion title="Task Management">
71
+ - `complete(result)` - Mark the task as complete with a summary
72
+ </Accordion>
73
+ </AccordionGroup>
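One common way to connect an LLM's chosen action name to real code is a whitelist plus a dispatcher. The sketch below illustrates that idea only. `FakeDevice`, `ALLOWED_ACTIONS`, and `dispatch` are hypothetical and do not reflect how DroidRun wires its tools internally; the action names are borrowed from the UI Interaction list.

```python
class FakeDevice:
    """Stand-in for a device controller; it only returns strings (illustrative)."""

    def tap(self, index):
        return f"Tapped element {index}"

    def input_text(self, text):
        return f"Typed {len(text)} characters"

    def press_key(self, keycode):
        return f"Pressed {keycode}"


# Whitelist of action names the agent may call in this sketch.
ALLOWED_ACTIONS = {"tap", "input_text", "press_key"}


def dispatch(device, name, **kwargs):
    """Run one named action on the device, rejecting unknown names."""
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"Unknown action: {name}")
    return getattr(device, name)(**kwargs)


if __name__ == "__main__":
    device = FakeDevice()
    print(dispatch(device, "tap", index=2))
    print(dispatch(device, "press_key", keycode="HOME"))
```

Rejecting names outside the whitelist keeps a malformed or hallucinated action from reaching arbitrary methods on the device object.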
74
+
75
+ ## 📸 Vision Capabilities
76
+
77
+ When vision mode is enabled, the ReAct agent can analyze screenshots to better understand the UI:
78
+
79
+ ```python
80
+ agent = ReActAgent(
81
+     task="Open settings and enable dark mode",
82
+     llm=llm_instance,
83
+     vision=True  # Enable vision capabilities
84
+ )
85
+ ```
86
+
87
+ This provides several benefits:
88
+
89
+ - **Visual Context**: The LLM can see exactly what's on screen
90
+ - **Better UI Understanding**: Recognizes UI elements even if text detection is imperfect
91
+ - **Complex Navigation**: Handles apps with unusual or complex interfaces more effectively
92
+
93
+ ## 📊 Token Usage Tracking
94
+
95
+ The ReAct agent tracks token usage for all LLM interactions. The statistics are read from the LLM instance:
96
+
97
+ ```python
98
+ # After running the agent
99
+ stats = llm.get_token_usage_stats()
100
+ print(f"Total tokens: {stats['total_tokens']}")
101
+ print(f"API calls: {stats['api_calls']}")
102
+ ```
103
+
104
+ This information is useful for:
105
+
106
+ - **Cost Management**: Track and optimize your API usage costs
107
+ - **Performance Tuning**: Identify steps that require the most tokens
108
+ - **Troubleshooting**: Debug issues with prompt sizes or response lengths
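For cost management, the stats dictionary can be turned into a rough estimate. This is a sketch under assumptions: it relies only on the `total_tokens` and `api_calls` keys shown in the docs, the function `estimate_cost` is hypothetical, and the price is a placeholder blended rate (real providers usually price input and output tokens separately).

```python
def estimate_cost(stats, usd_per_1k_tokens):
    """Estimate spend and average tokens per call from a usage-stats dict."""
    total = stats["total_tokens"]
    calls = stats["api_calls"]
    return {
        "estimated_usd": total / 1000 * usd_per_1k_tokens,
        "tokens_per_call": total / calls if calls else 0.0,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration; use llm.get_token_usage_stats() in practice.
    example_stats = {"total_tokens": 12000, "api_calls": 8}
    report = estimate_cost(example_stats, usd_per_1k_tokens=0.5)  # placeholder price
    print(f"Estimated cost: ${report['estimated_usd']:.2f}")
    print(f"Average tokens per call: {report['tokens_per_call']:.0f}")
```

A high tokens-per-call figure is a hint that prompts (for example, large UI dumps or screenshots) dominate the bill rather than the number of steps.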
109
+
110
+ ## 🧠 Agent Parameters
111
+
112
+ When creating a ReAct agent, you can configure several parameters:
113
+
114
+ ```python
115
+ agent = ReActAgent(
116
+     task="Open settings and enable dark mode",  # The goal to achieve
117
+     llm=llm_instance,  # LLM to use for reasoning
118
+     device_serial="DEVICE123",  # Optional specific device
119
+     max_steps=15,  # Maximum steps to attempt
120
+     vision=False  # Whether to enable vision capabilities
121
+ )
122
+ ```
123
+
124
+ ## 📊 Step Types
125
+
126
+ The agent records its progress using different step types:
127
+
128
+ - **Thought**: Internal reasoning about what to do
129
+ - **Action**: An action to be executed on the device
130
+ - **Observation**: Result of an action
131
+ - **Plan**: A sequence of steps to achieve the goal
132
+ - **Goal**: The target state to achieve
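To make the step types concrete, here is one way such a record could be modeled. This is a hedged sketch: `StepType`, `Step`, and `count_by_type` are hypothetical names chosen for illustration, and DroidRun's actual classes and field names may differ.

```python
from dataclasses import dataclass
from enum import Enum


class StepType(Enum):
    """The five step kinds listed in the docs."""
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"
    PLAN = "plan"
    GOAL = "goal"


@dataclass
class Step:
    """One recorded step: its kind plus free-form text."""
    step_type: StepType
    content: str


def count_by_type(steps):
    """Count how many steps of each type a run produced."""
    counts = {}
    for step in steps:
        counts[step.step_type] = counts.get(step.step_type, 0) + 1
    return counts


if __name__ == "__main__":
    run = [
        Step(StepType.GOAL, "Enable dark mode"),
        Step(StepType.THOUGHT, "Open the Settings app first"),
        Step(StepType.ACTION, "start_app('com.android.settings')"),
        Step(StepType.OBSERVATION, "Settings is open"),
    ]
    for step_type, n in count_by_type(run).items():
        print(f"{step_type.value}: {n}")
```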
133
+
134
+ ## 💡 Best Practices
135
+
136
+ 1. **Clear Goals**: Provide specific, clear instructions
137
+ 2. **Realistic Tasks**: Break complex automation into manageable tasks
138
+ 3. **Vision for Complex UIs**: Enable vision mode for complex UI navigation
139
+ 4. **Step Limits**: Set a reasonable `max_steps` so a stuck agent stops instead of looping indefinitely
140
+ 5. **Device Connectivity**: Ensure stable connection to your device
141
+ 6. **Token Optimization**: Monitor token usage for cost-effective automation