droidrun-0.1.0.tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- droidrun-0.1.0/.gitignore +5 -0
- droidrun-0.1.0/LICENSE +21 -0
- droidrun-0.1.0/MANIFEST.in +7 -0
- droidrun-0.1.0/PKG-INFO +276 -0
- droidrun-0.1.0/README.md +234 -0
- droidrun-0.1.0/docs/concepts/agent.mdx +141 -0
- droidrun-0.1.0/docs/concepts/android-control.mdx +210 -0
- droidrun-0.1.0/docs/conf.py +25 -0
- droidrun-0.1.0/docs/favicon.png +0 -0
- droidrun-0.1.0/docs/installation.mdx +167 -0
- droidrun-0.1.0/docs/introduction.mdx +101 -0
- droidrun-0.1.0/docs/logo/dark.svg +10 -0
- droidrun-0.1.0/docs/logo/light.svg +10 -0
- droidrun-0.1.0/docs/mint.json +48 -0
- droidrun-0.1.0/docs/quickstart.mdx +155 -0
- droidrun-0.1.0/droidrun/__init__.py +19 -0
- droidrun-0.1.0/droidrun/__main__.py +8 -0
- droidrun-0.1.0/droidrun/adb/__init__.py +13 -0
- droidrun-0.1.0/droidrun/adb/device.py +315 -0
- droidrun-0.1.0/droidrun/adb/manager.py +93 -0
- droidrun-0.1.0/droidrun/adb/wrapper.py +226 -0
- droidrun-0.1.0/droidrun/agent/__init__.py +16 -0
- droidrun-0.1.0/droidrun/agent/llm_reasoning.py +567 -0
- droidrun-0.1.0/droidrun/agent/react_agent.py +556 -0
- droidrun-0.1.0/droidrun/cli/__init__.py +9 -0
- droidrun-0.1.0/droidrun/cli/main.py +265 -0
- droidrun-0.1.0/droidrun/llm/__init__.py +24 -0
- droidrun-0.1.0/droidrun/tools/__init__.py +35 -0
- droidrun-0.1.0/droidrun/tools/actions.py +854 -0
- droidrun-0.1.0/droidrun/tools/device.py +29 -0
- droidrun-0.1.0/pyproject.toml +77 -0
- droidrun-0.1.0/setup.py +8 -0
droidrun-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
MIT License
|
2
|
+
|
3
|
+
Copyright (c) 2025 Niels Schmidt
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
13
|
+
copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
+
SOFTWARE.
|
droidrun-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,276 @@
|
|
1
|
+
Metadata-Version: 2.4
|
2
|
+
Name: droidrun
|
3
|
+
Version: 0.1.0
|
4
|
+
Summary: A framework for controlling Android devices through LLM agents
|
5
|
+
Project-URL: Homepage, https://github.com/droidrun/droidrun
|
6
|
+
Project-URL: Bug Tracker, https://github.com/droidrun/droidrun/issues
|
7
|
+
Project-URL: Documentation, https://docs.droidrun.ai/
|
8
|
+
Author-email: Niels Schmidt <niels.schmidt@droidrun.ai>
|
9
|
+
License: MIT
|
10
|
+
License-File: LICENSE
|
11
|
+
Classifier: Development Status :: 4 - Beta
|
12
|
+
Classifier: Intended Audience :: Developers
|
13
|
+
Classifier: Intended Audience :: Information Technology
|
14
|
+
Classifier: Intended Audience :: Science/Research
|
15
|
+
Classifier: License :: OSI Approved :: MIT License
|
16
|
+
Classifier: Programming Language :: Python :: 3
|
17
|
+
Classifier: Programming Language :: Python :: 3.10
|
18
|
+
Classifier: Programming Language :: Python :: 3.11
|
19
|
+
Classifier: Topic :: Communications :: Chat
|
20
|
+
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
|
21
|
+
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
|
22
|
+
Classifier: Topic :: Software Development :: Libraries :: Python Modules
|
23
|
+
Classifier: Topic :: Software Development :: Quality Assurance
|
24
|
+
Classifier: Topic :: Software Development :: Testing
|
25
|
+
Classifier: Topic :: Software Development :: Testing :: Acceptance
|
26
|
+
Classifier: Topic :: System :: Emulators
|
27
|
+
Classifier: Topic :: Utilities
|
28
|
+
Requires-Python: >=3.10
|
29
|
+
Requires-Dist: aiofiles>=23.0.0
|
30
|
+
Requires-Dist: anthropic>=0.7.0
|
31
|
+
Requires-Dist: click>=8.1.0
|
32
|
+
Requires-Dist: openai>=1.0.0
|
33
|
+
Requires-Dist: pillow>=10.0.0
|
34
|
+
Requires-Dist: pydantic>=2.0.0
|
35
|
+
Requires-Dist: python-dotenv>=1.0.0
|
36
|
+
Requires-Dist: rich>=13.0.0
|
37
|
+
Provides-Extra: dev
|
38
|
+
Requires-Dist: black>=23.0.0; extra == 'dev'
|
39
|
+
Requires-Dist: mypy>=1.0.0; extra == 'dev'
|
40
|
+
Requires-Dist: ruff>=0.1.0; extra == 'dev'
|
41
|
+
Description-Content-Type: text/markdown
|
42
|
+
|
43
|
+
# 🤖 DroidRun
|
44
|
+
|
45
|
+
DroidRun is a powerful framework for controlling Android devices through LLM agents. It allows you to automate Android device interactions using natural language commands.
|
46
|
+
|
47
|
+
## ✨ Features
|
48
|
+
|
49
|
+
- Control Android devices with natural language commands
|
50
|
+
- Supports multiple LLM providers (OpenAI, Anthropic, Gemini)
|
51
|
+
- Easy-to-use CLI
|
52
|
+
- Extensible Python API for custom automations
|
53
|
+
- Screenshot analysis for visual understanding of the device
|
54
|
+
|
55
|
+
## 📦 Installation
|
56
|
+
|
57
|
+
### 🚀 Option 1: Install from PyPI (Recommended)
|
58
|
+
|
59
|
+
```bash
|
60
|
+
pip install droidrun
|
61
|
+
```
|
62
|
+
|
63
|
+
### 🔧 Option 2: Install from Source
|
64
|
+
|
65
|
+
```bash
|
66
|
+
git clone https://github.com/droidrun/droidrun.git
|
67
|
+
cd droidrun
|
68
|
+
pip install -e .
|
69
|
+
```
|
70
|
+
|
71
|
+
## 📋 Prerequisites
|
72
|
+
|
73
|
+
1. An Android device connected via USB or ADB over TCP/IP
|
74
|
+
2. ADB (Android Debug Bridge) installed and configured
|
75
|
+
3. DroidRun Portal app installed on your Android device
|
76
|
+
4. An API key for at least one of the supported LLM providers:
|
77
|
+
- OpenAI
|
78
|
+
- Anthropic
|
79
|
+
- Google Gemini
|
80
|
+
|
81
|
+
### 🔧 Setting up ADB
|
82
|
+
|
83
|
+
ADB (Android Debug Bridge) is required for DroidRun to communicate with your Android device:
|
84
|
+
|
85
|
+
1. **Install ADB**:
|
86
|
+
- **Windows**: Download [Android SDK Platform Tools](https://developer.android.com/studio/releases/platform-tools) and extract the ZIP file
|
87
|
+
- **macOS**: `brew install android-platform-tools`
|
88
|
+
- **Linux**: `sudo apt install adb` (Ubuntu/Debian) or `sudo pacman -S android-tools` (Arch)
|
89
|
+
|
90
|
+
2. **Add ADB to your PATH**:
|
91
|
+
- **Windows**: Add the path to the extracted platform-tools folder to your system's PATH environment variable
|
92
|
+
- **macOS/Linux**: Add the following to your ~/.bashrc or ~/.zshrc:
|
93
|
+
```bash
|
94
|
+
export PATH=$PATH:/path/to/platform-tools
|
95
|
+
```
|
96
|
+
|
97
|
+
3. **Verify ADB installation**:
|
98
|
+
```bash
|
99
|
+
adb version
|
100
|
+
```
|
101
|
+
|
102
|
+
4. **Enable USB debugging on your Android device**:
|
103
|
+
- Go to **Settings → About phone**
|
104
|
+
- Tap **Build number** 7 times to enable Developer options
|
105
|
+
- Go to **Settings → System → Developer options** (location may vary by device)
|
106
|
+
- Enable **USB debugging**
|
107
|
+
|
108
|
+
## 🛠️ Setup
|
109
|
+
|
110
|
+
### 📱 1. Install DroidRun Portal App
|
111
|
+
|
112
|
+
DroidRun requires the DroidRun Portal app to be installed on your Android device:
|
113
|
+
|
114
|
+
1. Download the DroidRun Portal APK from the [DroidRun Portal repository](https://github.com/droidrun/droidrun-portal)
|
115
|
+
2. Use DroidRun to install the portal app:
|
116
|
+
```bash
|
117
|
+
droidrun setup --path=/path/to/droidrun-portal.apk
|
118
|
+
```
|
119
|
+
|
120
|
+
Alternatively, you can use ADB to install it manually:
|
121
|
+
```bash
|
122
|
+
adb install -r /path/to/droidrun-portal.apk
|
123
|
+
```
|
124
|
+
|
125
|
+
### 🔑 2. Set up API keys
|
126
|
+
|
127
|
+
Create a `.env` file in your working directory or set environment variables:
|
128
|
+
|
129
|
+
```bash
|
130
|
+
# Choose at least one of these based on your preferred provider
|
131
|
+
export OPENAI_API_KEY="your_openai_api_key_here"
|
132
|
+
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
|
133
|
+
export GEMINI_API_KEY="your_gemini_api_key_here"
|
134
|
+
```
|
135
|
+
|
136
|
+
To load the environment variables from the `.env` file:
|
137
|
+
|
138
|
+
```bash
|
139
|
+
source .env
|
140
|
+
```
|
141
|
+
|
142
|
+
### 📱 3. Connect to an Android device
|
143
|
+
|
144
|
+
Connect your device via USB or set up wireless ADB:
|
145
|
+
|
146
|
+
```bash
|
147
|
+
# List connected devices
|
148
|
+
droidrun devices
|
149
|
+
|
150
|
+
# Connect to a device over Wi-Fi
|
151
|
+
droidrun connect 192.168.1.100
|
152
|
+
```
|
153
|
+
|
154
|
+
### 🔄 4. Verify the setup
|
155
|
+
|
156
|
+
Verify that everything is set up correctly:
|
157
|
+
|
158
|
+
```bash
|
159
|
+
# Should list your connected device and show portal status
|
160
|
+
droidrun status
|
161
|
+
```
|
162
|
+
|
163
|
+
## 💻 Using the CLI
|
164
|
+
|
165
|
+
DroidRun's CLI is designed to be simple and intuitive. Pass a task description, optionally followed by provider, model, device, or step-limit options:
|
166
|
+
|
167
|
+
### 🚀 Basic Usage
|
168
|
+
|
169
|
+
```bash
|
170
|
+
# Format: droidrun "task description" [options]
|
171
|
+
droidrun "Open the settings app"
|
172
|
+
```
|
173
|
+
|
174
|
+
### 🔌 With Provider Options
|
175
|
+
|
176
|
+
```bash
|
177
|
+
# Using OpenAI
|
178
|
+
droidrun "Open the calculator app" --provider openai --model gpt-4o-mini
|
179
|
+
|
180
|
+
# Using Anthropic
|
181
|
+
droidrun "Check the battery level" --provider anthropic --model claude-3-sonnet-20240229
|
182
|
+
|
183
|
+
# Using Gemini
|
184
|
+
droidrun "Install and open Instagram" --provider gemini --model gemini-2.0-flash
|
185
|
+
```
|
186
|
+
|
187
|
+
### ⚙️ Additional Options
|
188
|
+
|
189
|
+
```bash
|
190
|
+
# Specify a particular device
|
191
|
+
droidrun "Open Chrome and search for weather" --device abc123
|
192
|
+
|
193
|
+
# Set maximum number of steps
|
194
|
+
droidrun "Open settings and enable dark mode" --steps 20
|
195
|
+
```
|
196
|
+
|
197
|
+
## 📝 Creating a Minimal Test Script
|
198
|
+
|
199
|
+
If you want to use DroidRun in your Python code rather than via the CLI, you can create a minimal test script:
|
200
|
+
|
201
|
+
```python
|
202
|
+
#!/usr/bin/env python3
|
203
|
+
import asyncio
|
204
|
+
import os
|
205
|
+
from droidrun.agent.react_agent import ReActAgent
|
206
|
+
from droidrun.agent.llm_reasoning import LLMReasoner
|
207
|
+
from dotenv import load_dotenv
|
208
|
+
|
209
|
+
# Load environment variables from .env file
|
210
|
+
load_dotenv()
|
211
|
+
|
212
|
+
async def main():
|
213
|
+
    # Create an LLM instance (choose your preferred provider)
|
214
|
+
    llm = LLMReasoner(
|
215
|
+
        llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
|
216
|
+
        model_name="gemini-2.0-flash",  # Choose appropriate model for your provider
|
217
|
+
        api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
|
218
|
+
        temperature=0.2
|
219
|
+
    )
|
220
|
+
|
221
|
+
    # Create and run the agent
|
222
|
+
    agent = ReActAgent(
|
223
|
+
        task="Open the Settings app and check the Android version",
|
224
|
+
        llm=llm
|
225
|
+
    )
|
226
|
+
|
227
|
+
    steps = await agent.run()
|
228
|
+
    print(f"Execution completed with {len(steps)} steps")
|
229
|
+
|
230
|
+
if __name__ == "__main__":
|
231
|
+
    asyncio.run(main())
|
232
|
+
```
|
233
|
+
|
234
|
+
Save this as `test_droidrun.py`, ensure your `.env` file has the appropriate API key, and run:
|
235
|
+
|
236
|
+
```bash
|
237
|
+
python test_droidrun.py
|
238
|
+
```
|
239
|
+
|
240
|
+
## ❓ Troubleshooting
|
241
|
+
|
242
|
+
### 🔑 API Key Issues
|
243
|
+
|
244
|
+
If you encounter errors about missing API keys, ensure:
|
245
|
+
1. You've set the correct environment variable for your chosen provider
|
246
|
+
2. The API key is valid and has appropriate permissions
|
247
|
+
3. You've correctly sourced your `.env` file or exported the variables manually
|
248
|
+
|
249
|
+
### 📱 Device Connection Issues
|
250
|
+
|
251
|
+
If you have trouble connecting to your device:
|
252
|
+
1. Ensure USB debugging is enabled on your Android device
|
253
|
+
2. Check that your device is recognized by ADB: `adb devices`
|
254
|
+
3. For wireless connections, make sure your device and computer are on the same network
|
255
|
+
|
256
|
+
### 🤖 LLM Provider Selection
|
257
|
+
|
258
|
+
If DroidRun is using the wrong LLM provider:
|
259
|
+
1. Explicitly specify the provider with `--provider` (in CLI) or `llm_provider=` (in code)
|
260
|
+
2. When using Gemini, ensure you have set `GEMINI_API_KEY` and specified `--provider gemini`
|
261
|
+
|
262
|
+
## 💡 Example Use Cases
|
263
|
+
|
264
|
+
- Automated UI testing of Android applications
|
265
|
+
- Creating guided workflows for non-technical users
|
266
|
+
- Automating repetitive tasks on Android devices
|
267
|
+
- Remote assistance for less technical users
|
268
|
+
- Exploring Android UI with natural language commands
|
269
|
+
|
270
|
+
## 👥 Contributing
|
271
|
+
|
272
|
+
Contributions are welcome! Please feel free to submit a Pull Request.
|
273
|
+
|
274
|
+
## 📄 License
|
275
|
+
|
276
|
+
This project is licensed under the MIT License - see the LICENSE file for details.
|
droidrun-0.1.0/README.md
ADDED
@@ -0,0 +1,234 @@
|
|
1
|
+
# 🤖 DroidRun
|
2
|
+
|
3
|
+
DroidRun is a powerful framework for controlling Android devices through LLM agents. It allows you to automate Android device interactions using natural language commands.
|
4
|
+
|
5
|
+
## ✨ Features
|
6
|
+
|
7
|
+
- Control Android devices with natural language commands
|
8
|
+
- Supports multiple LLM providers (OpenAI, Anthropic, Gemini)
|
9
|
+
- Easy-to-use CLI
|
10
|
+
- Extensible Python API for custom automations
|
11
|
+
- Screenshot analysis for visual understanding of the device
|
12
|
+
|
13
|
+
## 📦 Installation
|
14
|
+
|
15
|
+
### 🚀 Option 1: Install from PyPI (Recommended)
|
16
|
+
|
17
|
+
```bash
|
18
|
+
pip install droidrun
|
19
|
+
```
|
20
|
+
|
21
|
+
### 🔧 Option 2: Install from Source
|
22
|
+
|
23
|
+
```bash
|
24
|
+
git clone https://github.com/droidrun/droidrun.git
|
25
|
+
cd droidrun
|
26
|
+
pip install -e .
|
27
|
+
```
|
28
|
+
|
29
|
+
## 📋 Prerequisites
|
30
|
+
|
31
|
+
1. An Android device connected via USB or ADB over TCP/IP
|
32
|
+
2. ADB (Android Debug Bridge) installed and configured
|
33
|
+
3. DroidRun Portal app installed on your Android device
|
34
|
+
4. An API key for at least one of the supported LLM providers:
|
35
|
+
- OpenAI
|
36
|
+
- Anthropic
|
37
|
+
- Google Gemini
|
38
|
+
|
39
|
+
### 🔧 Setting up ADB
|
40
|
+
|
41
|
+
ADB (Android Debug Bridge) is required for DroidRun to communicate with your Android device:
|
42
|
+
|
43
|
+
1. **Install ADB**:
|
44
|
+
- **Windows**: Download [Android SDK Platform Tools](https://developer.android.com/studio/releases/platform-tools) and extract the ZIP file
|
45
|
+
- **macOS**: `brew install android-platform-tools`
|
46
|
+
- **Linux**: `sudo apt install adb` (Ubuntu/Debian) or `sudo pacman -S android-tools` (Arch)
|
47
|
+
|
48
|
+
2. **Add ADB to your PATH**:
|
49
|
+
- **Windows**: Add the path to the extracted platform-tools folder to your system's PATH environment variable
|
50
|
+
- **macOS/Linux**: Add the following to your ~/.bashrc or ~/.zshrc:
|
51
|
+
```bash
|
52
|
+
export PATH=$PATH:/path/to/platform-tools
|
53
|
+
```
|
54
|
+
|
55
|
+
3. **Verify ADB installation**:
|
56
|
+
```bash
|
57
|
+
adb version
|
58
|
+
```
|
59
|
+
|
60
|
+
4. **Enable USB debugging on your Android device**:
|
61
|
+
- Go to **Settings → About phone**
|
62
|
+
- Tap **Build number** 7 times to enable Developer options
|
63
|
+
- Go to **Settings → System → Developer options** (location may vary by device)
|
64
|
+
- Enable **USB debugging**
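The ADB checks above can also be scripted. The following is a minimal sketch and not part of DroidRun: it assumes `adb` is on your PATH, and the helper names `parse_adb_devices` and `list_adb_devices` are hypothetical. It parses the serial/state pairs that `adb devices` prints after its header line.

```python
import shutil
import subprocess


def parse_adb_devices(output: str) -> list[tuple[str, str]]:
    """Parse the text printed by `adb devices` into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def list_adb_devices() -> list[tuple[str, str]]:
    """Run `adb devices` and return the parsed result (hypothetical helper)."""
    if shutil.which("adb") is None:
        raise RuntimeError("adb not found on PATH")
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

A device reported in the `unauthorized` state usually means the USB debugging prompt on the phone has not been accepted yet.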
|
65
|
+
|
66
|
+
## 🛠️ Setup
|
67
|
+
|
68
|
+
### 📱 1. Install DroidRun Portal App
|
69
|
+
|
70
|
+
DroidRun requires the DroidRun Portal app to be installed on your Android device:
|
71
|
+
|
72
|
+
1. Download the DroidRun Portal APK from the [DroidRun Portal repository](https://github.com/droidrun/droidrun-portal)
|
73
|
+
2. Use DroidRun to install the portal app:
|
74
|
+
```bash
|
75
|
+
droidrun setup --path=/path/to/droidrun-portal.apk
|
76
|
+
```
|
77
|
+
|
78
|
+
Alternatively, you can use ADB to install it manually:
|
79
|
+
```bash
|
80
|
+
adb install -r /path/to/droidrun-portal.apk
|
81
|
+
```
|
82
|
+
|
83
|
+
### 🔑 2. Set up API keys
|
84
|
+
|
85
|
+
Create a `.env` file in your working directory or set environment variables:
|
86
|
+
|
87
|
+
```bash
|
88
|
+
# Choose at least one of these based on your preferred provider
|
89
|
+
export OPENAI_API_KEY="your_openai_api_key_here"
|
90
|
+
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
|
91
|
+
export GEMINI_API_KEY="your_gemini_api_key_here"
|
92
|
+
```
|
93
|
+
|
94
|
+
To load the environment variables from the `.env` file:
|
95
|
+
|
96
|
+
```bash
|
97
|
+
source .env
|
98
|
+
```
|
99
|
+
|
100
|
+
### 📱 3. Connect to an Android device
|
101
|
+
|
102
|
+
Connect your device via USB or set up wireless ADB:
|
103
|
+
|
104
|
+
```bash
|
105
|
+
# List connected devices
|
106
|
+
droidrun devices
|
107
|
+
|
108
|
+
# Connect to a device over Wi-Fi
|
109
|
+
droidrun connect 192.168.1.100
|
110
|
+
```
|
111
|
+
|
112
|
+
### 🔄 4. Verify the setup
|
113
|
+
|
114
|
+
Verify that everything is set up correctly:
|
115
|
+
|
116
|
+
```bash
|
117
|
+
# Should list your connected device and show portal status
|
118
|
+
droidrun status
|
119
|
+
```
|
120
|
+
|
121
|
+
## 💻 Using the CLI
|
122
|
+
|
123
|
+
DroidRun's CLI is designed to be simple and intuitive. Pass a task description, optionally followed by provider, model, device, or step-limit options:
|
124
|
+
|
125
|
+
### 🚀 Basic Usage
|
126
|
+
|
127
|
+
```bash
|
128
|
+
# Format: droidrun "task description" [options]
|
129
|
+
droidrun "Open the settings app"
|
130
|
+
```
|
131
|
+
|
132
|
+
### 🔌 With Provider Options
|
133
|
+
|
134
|
+
```bash
|
135
|
+
# Using OpenAI
|
136
|
+
droidrun "Open the calculator app" --provider openai --model gpt-4o-mini
|
137
|
+
|
138
|
+
# Using Anthropic
|
139
|
+
droidrun "Check the battery level" --provider anthropic --model claude-3-sonnet-20240229
|
140
|
+
|
141
|
+
# Using Gemini
|
142
|
+
droidrun "Install and open Instagram" --provider gemini --model gemini-2.0-flash
|
143
|
+
```
|
144
|
+
|
145
|
+
### ⚙️ Additional Options
|
146
|
+
|
147
|
+
```bash
|
148
|
+
# Specify a particular device
|
149
|
+
droidrun "Open Chrome and search for weather" --device abc123
|
150
|
+
|
151
|
+
# Set maximum number of steps
|
152
|
+
droidrun "Open settings and enable dark mode" --steps 20
|
153
|
+
```
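When the CLI is driven from another program, it can help to assemble the arguments as a list instead of a shell string. This is a minimal sketch under stated assumptions: `build_droidrun_command` is a hypothetical helper, and it uses only the `--provider`, `--model`, `--device`, and `--steps` options shown in the examples above.

```python
import shlex


def build_droidrun_command(task, provider=None, model=None, device=None, steps=None):
    """Build an argv list for the droidrun CLI from the options shown above."""
    command = ["droidrun", task]
    if provider is not None:
        command += ["--provider", provider]
    if model is not None:
        command += ["--model", model]
    if device is not None:
        command += ["--device", device]
    if steps is not None:
        command += ["--steps", str(steps)]
    return command


argv = build_droidrun_command(
    "Open settings and enable dark mode",
    provider="openai",
    model="gpt-4o-mini",
    steps=20,
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems in task descriptions that contain spaces or quotes.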
|
154
|
+
|
155
|
+
## 📝 Creating a Minimal Test Script
|
156
|
+
|
157
|
+
If you want to use DroidRun in your Python code rather than via the CLI, you can create a minimal test script:
|
158
|
+
|
159
|
+
```python
|
160
|
+
#!/usr/bin/env python3
|
161
|
+
import asyncio
|
162
|
+
import os
|
163
|
+
from droidrun.agent.react_agent import ReActAgent
|
164
|
+
from droidrun.agent.llm_reasoning import LLMReasoner
|
165
|
+
from dotenv import load_dotenv
|
166
|
+
|
167
|
+
# Load environment variables from .env file
|
168
|
+
load_dotenv()
|
169
|
+
|
170
|
+
async def main():
|
171
|
+
    # Create an LLM instance (choose your preferred provider)
|
172
|
+
    llm = LLMReasoner(
|
173
|
+
        llm_provider="gemini",  # Can be "openai", "anthropic", or "gemini"
|
174
|
+
        model_name="gemini-2.0-flash",  # Choose appropriate model for your provider
|
175
|
+
        api_key=os.environ.get("GEMINI_API_KEY"),  # Get API key from environment
|
176
|
+
        temperature=0.2
|
177
|
+
    )
|
178
|
+
|
179
|
+
    # Create and run the agent
|
180
|
+
    agent = ReActAgent(
|
181
|
+
        task="Open the Settings app and check the Android version",
|
182
|
+
        llm=llm
|
183
|
+
    )
|
184
|
+
|
185
|
+
    steps = await agent.run()
|
186
|
+
    print(f"Execution completed with {len(steps)} steps")
|
187
|
+
|
188
|
+
if __name__ == "__main__":
|
189
|
+
    asyncio.run(main())
|
190
|
+
```
|
191
|
+
|
192
|
+
Save this as `test_droidrun.py`, ensure your `.env` file has the appropriate API key, and run:
|
193
|
+
|
194
|
+
```bash
|
195
|
+
python test_droidrun.py
|
196
|
+
```
|
197
|
+
|
198
|
+
## ❓ Troubleshooting
|
199
|
+
|
200
|
+
### 🔑 API Key Issues
|
201
|
+
|
202
|
+
If you encounter errors about missing API keys, ensure:
|
203
|
+
1. You've set the correct environment variable for your chosen provider
|
204
|
+
2. The API key is valid and has appropriate permissions
|
205
|
+
3. You've correctly sourced your `.env` file or exported the variables manually
|
206
|
+
|
207
|
+
### 📱 Device Connection Issues
|
208
|
+
|
209
|
+
If you have trouble connecting to your device:
|
210
|
+
1. Ensure USB debugging is enabled on your Android device
|
211
|
+
2. Check that your device is recognized by ADB: `adb devices`
|
212
|
+
3. For wireless connections, make sure your device and computer are on the same network
|
213
|
+
|
214
|
+
### 🤖 LLM Provider Selection
|
215
|
+
|
216
|
+
If DroidRun is using the wrong LLM provider:
|
217
|
+
1. Explicitly specify the provider with `--provider` (in CLI) or `llm_provider=` (in code)
|
218
|
+
2. When using Gemini, ensure you have set `GEMINI_API_KEY` and specified `--provider gemini`
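The two troubleshooting checks above (is the key set, and is the right provider selected) can be combined into one pre-flight step. The sketch below is an illustration only: `pick_provider` and `PROVIDER_ENV_VARS` are hypothetical names, not DroidRun APIs, and the only facts taken from this README are the three environment variable names.

```python
import os

# Environment variable names as documented in this README.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def pick_provider(env, requested=None):
    """Return (provider, api_key), preferring `requested` when it is given."""
    if requested is not None:
        if requested not in PROVIDER_ENV_VARS:
            raise ValueError(f"Unknown provider: {requested}")
        var = PROVIDER_ENV_VARS[requested]
        key = env.get(var)
        if not key:
            raise RuntimeError(f"{var} is not set for provider '{requested}'")
        return requested, key
    # No explicit request: fall back to the first provider with a key set.
    for provider, var in PROVIDER_ENV_VARS.items():
        key = env.get(var)
        if key:
            return provider, key
    raise RuntimeError(
        "No API key found; set one of: " + ", ".join(PROVIDER_ENV_VARS.values())
    )


if __name__ == "__main__":
    try:
        provider, _ = pick_provider(os.environ)
        print(f"Using provider: {provider}")
    except RuntimeError as error:
        print(error)
```

Failing early with the name of the missing variable tends to be easier to diagnose than an authentication error raised in the middle of a run.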
|
219
|
+
|
220
|
+
## 💡 Example Use Cases
|
221
|
+
|
222
|
+
- Automated UI testing of Android applications
|
223
|
+
- Creating guided workflows for non-technical users
|
224
|
+
- Automating repetitive tasks on Android devices
|
225
|
+
- Remote assistance for less technical users
|
226
|
+
- Exploring Android UI with natural language commands
|
227
|
+
|
228
|
+
## 👥 Contributing
|
229
|
+
|
230
|
+
Contributions are welcome! Please feel free to submit a Pull Request.
|
231
|
+
|
232
|
+
## 📄 License
|
233
|
+
|
234
|
+
This project is licensed under the MIT License - see the LICENSE file for details.
|
droidrun-0.1.0/docs/concepts/agent.mdx
ADDED
@@ -0,0 +1,141 @@
|
|
1
|
+
---
|
2
|
+
title: 'ReAct Agent'
|
3
|
+
description: 'Understanding the ReAct Agent system in DroidRun'
|
4
|
+
---
|
5
|
+
|
6
|
+
# 🤖 ReAct Agent
|
7
|
+
|
8
|
+
DroidRun uses a ReAct (Reasoning + Acting) agent to control Android devices. This powerful approach combines LLM reasoning with concrete actions to achieve complex automation tasks.
|
9
|
+
|
10
|
+
## 📚 What is ReAct?
|
11
|
+
|
12
|
+
ReAct is a framework that combines:
|
13
|
+
|
14
|
+
- **Reasoning**: Using an LLM to interpret tasks, make decisions, and plan steps
|
15
|
+
- **Acting**: Executing concrete actions on an Android device
|
16
|
+
- **Observing**: Getting feedback from actions to inform future reasoning
|
17
|
+
|
18
|
+
This loop of reasoning, acting, and observing allows the agent to handle complex, multi-step tasks on Android devices.
|
19
|
+
|
20
|
+
## 🔄 The ReAct Loop
|
21
|
+
|
22
|
+
<Steps>
|
23
|
+
<Step title="Goal Setting">
|
24
|
+
The user provides a natural language task like "Open settings and enable dark mode"
|
25
|
+
</Step>
|
26
|
+
<Step title="Reasoning">
|
27
|
+
The LLM analyzes the task and determines what steps are needed
|
28
|
+
</Step>
|
29
|
+
<Step title="Action Selection">
|
30
|
+
The agent selects an appropriate action (e.g., tapping a UI element)
|
31
|
+
</Step>
|
32
|
+
<Step title="Execution">
|
33
|
+
The action is executed on the Android device
|
34
|
+
</Step>
|
35
|
+
<Step title="Observation">
|
36
|
+
The agent observes the result (e.g., a new screen appears)
|
37
|
+
</Step>
|
38
|
+
<Step title="Further Reasoning">
|
39
|
+
The agent evaluates progress and decides on the next action
|
40
|
+
</Step>
|
41
|
+
</Steps>
|
42
|
+
|
43
|
+
This cycle repeats until the task is completed or the maximum number of steps is reached.
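The loop described above can be sketched in a few lines. This is a simplified illustration, not DroidRun's implementation: `Decision`, `run_react_loop`, and the `reason` and `act` callables are hypothetical stand-ins for the LLM call and the device action.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    """What the reasoner wants to do next (hypothetical structure)."""
    thought: str
    action: str
    done: bool = False


def run_react_loop(
    task: str,
    reason: Callable[[str, list[str]], Decision],
    act: Callable[[str], str],
    max_steps: int = 15,
) -> list[str]:
    """Alternate reasoning and acting until done or max_steps is reached."""
    history: list[str] = [f"goal: {task}"]
    for _ in range(max_steps):
        # Reasoning: decide the next action from the task and the history.
        decision = reason(task, history)
        history.append(f"thought: {decision.thought}")
        if decision.done:
            break
        # Execution: perform the chosen action.
        history.append(f"action: {decision.action}")
        observation = act(decision.action)
        # Observation: record the result so the next reasoning step sees it.
        history.append(f"observation: {observation}")
    return history
```

Because the history grows with every iteration, the step limit bounds both the run time and the amount of context sent to the LLM.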
|
44
|
+
|
45
|
+
## 🛠️ Available Actions
|
46
|
+
|
47
|
+
The ReAct agent can perform various actions on Android devices:
|
48
|
+
|
49
|
+
<AccordionGroup>
|
50
|
+
<Accordion title="UI Interaction">
|
51
|
+
- `tap(index)` - Tap on a UI element by its index
|
52
|
+
- `swipe(start_x, start_y, end_x, end_y)` - Swipe from one point to another
|
53
|
+
- `input_text(text)` - Type text into the current field
|
54
|
+
- `press_key(keycode)` - Press a specific key (e.g., HOME, BACK)
|
55
|
+
</Accordion>
|
56
|
+
|
57
|
+
<Accordion title="App Management">
|
58
|
+
- `start_app(package)` - Launch an app by package name
|
59
|
+
- `list_packages()` - List installed packages
|
60
|
+
- `install_app(apk_path)` - Install an app from APK
|
61
|
+
- `uninstall_app(package)` - Uninstall an app
|
62
|
+
</Accordion>
|
63
|
+
|
64
|
+
<Accordion title="UI Analysis">
|
65
|
+
- `take_screenshot()` - Capture the current screen (vision mode only)
|
66
|
+
- `get_clickables()` - Identify clickable elements on screen
|
67
|
+
- `extract(filename)` - Save complete UI state to a JSON file
|
68
|
+
</Accordion>
|
69
|
+
|
70
|
+
<Accordion title="Task Management">
|
71
|
+
- `complete(result)` - Mark the task as complete with a summary
|
72
|
+
</Accordion>
|
73
|
+
</AccordionGroup>
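One common way to connect action names like these to code is a small registry that maps each name to a handler. The sketch below is an assumption about how such dispatch could look, not a description of DroidRun's internals: `ActionRegistry` is hypothetical, the handler is a placeholder, and only the action names come from the list above.

```python
from typing import Any, Callable

# Action names taken from the list above; the handlers are placeholders.
KNOWN_ACTIONS = {
    "tap", "swipe", "input_text", "press_key",
    "start_app", "list_packages", "install_app", "uninstall_app",
    "take_screenshot", "get_clickables", "extract", "complete",
}


class ActionRegistry:
    """Map action names to handler callables (hypothetical helper)."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        # Reject names that are not in the documented action list.
        if name not in KNOWN_ACTIONS:
            raise ValueError(f"Unknown action: {name}")
        self._handlers[name] = handler

    def dispatch(self, name: str, **kwargs: Any) -> Any:
        # Look up the handler and call it with the given keyword arguments.
        if name not in self._handlers:
            raise KeyError(f"No handler registered for action: {name}")
        return self._handlers[name](**kwargs)


registry = ActionRegistry()
registry.register("tap", lambda index: f"tapped element {index}")
print(registry.dispatch("tap", index=3))
```

Validating the action name before execution gives a clear error when the LLM proposes an action that does not exist.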
|
74
|
+
|
75
|
+
## 📸 Vision Capabilities
|
76
|
+
|
77
|
+
When vision mode is enabled, the ReAct agent can analyze screenshots to better understand the UI:
|
78
|
+
|
79
|
+
```python
|
80
|
+
agent = ReActAgent(
|
81
|
+
    task="Open settings and enable dark mode",
|
82
|
+
    llm=llm_instance,
|
83
|
+
    vision=True  # Enable vision capabilities
|
84
|
+
)
|
85
|
+
```
|
86
|
+
|
87
|
+
This provides several benefits:
|
88
|
+
|
89
|
+
- **Visual Context**: The LLM can see exactly what's on screen
|
90
|
+
- **Better UI Understanding**: Recognizes UI elements even if text detection is imperfect
|
91
|
+
- **Complex Navigation**: Handles apps with unusual or complex interfaces more effectively
|
92
|
+
|
93
|
+
## 📊 Token Usage Tracking
|
94
|
+
|
95
|
+
The ReAct agent tracks token usage for all LLM interactions:
|
96
|
+
|
97
|
+
```python
|
98
|
+
# After running the agent
|
99
|
+
stats = llm.get_token_usage_stats()
|
100
|
+
print(f"Total tokens: {stats['total_tokens']}")
|
101
|
+
print(f"API calls: {stats['api_calls']}")
|
102
|
+
```
|
103
|
+
|
104
|
+
This information is useful for:
|
105
|
+
|
106
|
+
- **Cost Management**: Track and optimize your API usage costs
|
107
|
+
- **Performance Tuning**: Identify steps that require the most tokens
|
108
|
+
- **Troubleshooting**: Debug issues with prompt sizes or response lengths
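As an example of the cost-management use, the stats dictionary can be turned into a rough estimate. This sketch assumes only the `total_tokens` and `api_calls` keys shown above; `summarize_token_usage` is a hypothetical helper, and the price per 1,000 tokens is a placeholder you would replace with your provider's actual rate.

```python
def summarize_token_usage(stats: dict, usd_per_1k_tokens: float) -> dict:
    """Derive per-call averages and a rough cost from token usage stats."""
    total = stats["total_tokens"]
    calls = stats["api_calls"]
    return {
        "total_tokens": total,
        "api_calls": calls,
        # Guard against division by zero when no calls were made.
        "avg_tokens_per_call": total / calls if calls else 0.0,
        "estimated_cost_usd": total / 1000 * usd_per_1k_tokens,
    }


# Example values are made up for illustration.
summary = summarize_token_usage({"total_tokens": 12000, "api_calls": 8}, 0.5)
print(summary)
```

Providers typically price input and output tokens differently, so a single blended rate gives only an approximation.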
|
109
|
+
|
110
|
+
## 🧠 Agent Parameters
|
111
|
+
|
112
|
+
When creating a ReAct agent, you can configure several parameters:
|
113
|
+
|
114
|
+
```python
|
115
|
+
agent = ReActAgent(
|
116
|
+
    task="Open settings and enable dark mode",  # The goal to achieve
|
117
|
+
    llm=llm_instance,  # LLM to use for reasoning
|
118
|
+
    device_serial="DEVICE123",  # Optional specific device
|
119
|
+
    max_steps=15,  # Maximum steps to attempt
|
120
|
+
    vision=False  # Whether to enable vision capabilities
|
121
|
+
)
|
122
|
+
```
|
123
|
+
|
124
|
+
## 📊 Step Types
|
125
|
+
|
126
|
+
The agent records its progress using different step types:
|
127
|
+
|
128
|
+
- **Thought**: Internal reasoning about what to do
|
129
|
+
- **Action**: An action to be executed on the device
|
130
|
+
- **Observation**: Result of an action
|
131
|
+
- **Plan**: A sequence of steps to achieve the goal
|
132
|
+
- **Goal**: The target state to achieve
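For post-run analysis, the step types above can be modeled as an enumeration. The names `StepType`, `Step`, and `count_by_type` below are hypothetical and chosen for illustration; the classes DroidRun actually uses may be named and shaped differently.

```python
from dataclasses import dataclass
from enum import Enum


class StepType(Enum):
    """The five step types listed above."""
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"
    PLAN = "plan"
    GOAL = "goal"


@dataclass
class Step:
    """A single recorded step (hypothetical structure)."""
    step_type: StepType
    content: str


def count_by_type(steps):
    """Count how many recorded steps fall under each step type."""
    counts = {step_type: 0 for step_type in StepType}
    for step in steps:
        counts[step.step_type] += 1
    return counts


# Example trace, made up for illustration.
trace = [
    Step(StepType.GOAL, "Open settings and enable dark mode"),
    Step(StepType.THOUGHT, "The Settings app must be opened first"),
    Step(StepType.ACTION, "start_app(com.android.settings)"),
    Step(StepType.OBSERVATION, "Settings screen is visible"),
]
print(count_by_type(trace)[StepType.ACTION])
```

A count like this makes it easy to see how many device actions a task needed relative to its `max_steps` budget.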
|
133
|
+
|
134
|
+
## 💡 Best Practices
|
135
|
+
|
136
|
+
1. **Clear Goals**: Provide specific, clear instructions
|
137
|
+
2. **Realistic Tasks**: Break complex automation into manageable tasks
|
138
|
+
3. **Vision for Complex UIs**: Enable vision mode for complex UI navigation
|
139
|
+
4. **Step Limits**: Set a reasonable `max_steps` to prevent infinite loops
|
140
|
+
5. **Device Connectivity**: Ensure stable connection to your device
|
141
|
+
6. **Token Optimization**: Monitor token usage for cost-effective automation
|