PyPI - cua-agent - Versions diffs - 0.4.24__tar.gz → 0.4.25__tar.gz - Mend

cua-agent 0.4.24 tar.gz → 0.4.25 tar.gz


Potentially problematic release.

This version of cua-agent might be problematic.

Files changed (52)

cua_agent-0.4.25/PKG-INFO ADDED

@@ -0,0 +1,138 @@
+Metadata-Version: 2.1
+Name: cua-agent
+Version: 0.4.25
+Summary: CUA (Computer Use) Agent for AI-driven computer interaction
+Author-Email: TryCua <gh@trycua.com>
+Requires-Python: >=3.12
+Requires-Dist: httpx>=0.27.0
+Requires-Dist: aiohttp>=3.9.3
+Requires-Dist: asyncio
+Requires-Dist: anyio>=4.4.1
+Requires-Dist: typing-extensions>=4.12.2
+Requires-Dist: pydantic>=2.6.4
+Requires-Dist: rich>=13.7.1
+Requires-Dist: python-dotenv>=1.0.1
+Requires-Dist: cua-computer<0.5.0,>=0.4.0
+Requires-Dist: cua-core<0.2.0,>=0.1.8
+Requires-Dist: certifi>=2024.2.2
+Requires-Dist: litellm>=1.74.12
+Provides-Extra: openai
+Provides-Extra: anthropic
+Provides-Extra: omni
+Requires-Dist: cua-som<0.2.0,>=0.1.0; extra == "omni"
+Provides-Extra: uitars
+Provides-Extra: uitars-mlx
+Requires-Dist: mlx-vlm>=0.1.27; sys_platform == "darwin" and extra == "uitars-mlx"
+Provides-Extra: uitars-hf
+Requires-Dist: accelerate; extra == "uitars-hf"
+Requires-Dist: torch; extra == "uitars-hf"
+Requires-Dist: transformers>=4.54.0; extra == "uitars-hf"
+Provides-Extra: glm45v-hf
+Requires-Dist: accelerate; extra == "glm45v-hf"
+Requires-Dist: torch; extra == "glm45v-hf"
+Requires-Dist: transformers-v4.55.0-GLM-4.5V-preview; extra == "glm45v-hf"
+Provides-Extra: ui
+Requires-Dist: gradio>=5.23.3; extra == "ui"
+Requires-Dist: python-dotenv>=1.0.1; extra == "ui"
+Provides-Extra: cli
+Requires-Dist: yaspin>=3.1.0; extra == "cli"
+Provides-Extra: hud
+Requires-Dist: hud-python<0.5.0,>=0.4.12; extra == "hud"
+Provides-Extra: all
+Requires-Dist: mlx-vlm>=0.1.27; sys_platform == "darwin" and extra == "all"
+Requires-Dist: accelerate; extra == "all"
+Requires-Dist: torch; extra == "all"
+Requires-Dist: transformers>=4.54.0; extra == "all"
+Requires-Dist: gradio>=5.23.3; extra == "all"
+Requires-Dist: python-dotenv>=1.0.1; extra == "all"
+Requires-Dist: yaspin>=3.1.0; extra == "all"
+Requires-Dist: hud-python<0.5.0,>=0.4.12; extra == "all"
+Description-Content-Type: text/markdown
+<div align="center">
+<h1>
+  <div class="image-wrapper" style="display: inline-block;">
+    <picture>
+      <source media="(prefers-color-scheme: dark)" alt="logo" height="150" srcset="https://raw.githubusercontent.com/trycua/cua/main/img/logo_white.png" style="display: block; margin: auto;">
+      <source media="(prefers-color-scheme: light)" alt="logo" height="150" srcset="https://raw.githubusercontent.com/trycua/cua/main/img/logo_black.png" style="display: block; margin: auto;">
+      <img alt="Shows my svg">
+    </picture>
+  </div>
+  [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#)
+  [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
+  [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
+  [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/)
+</h1>
+</div>
+**cua-agent** is a general Computer-Use framework with liteLLM integration for running agentic workflows on macOS, Windows, and Linux sandboxes. It provides a unified interface for computer-use agents across multiple LLM providers with advanced callback system for extensibility.
+## Features
+- **Safe Computer-Use/Tool-Use**: Using Computer SDK for sandboxed desktops
+- **Multi-Agent Support**: Anthropic Claude, OpenAI computer-use-preview, UI-TARS, Omniparser + any LLM
+- **Multi-API Support**: Take advantage of liteLLM supporting 100+ LLMs / model APIs, including local models (`huggingface-local/`, `ollama_chat/`, `mlx/`)
+- **Cross-Platform**: Works on Windows, macOS, and Linux with cloud and local computer instances
+- **Extensible Callbacks**: Built-in support for image retention, cache control, PII anonymization, budget limits, and trajectory tracking
+## Install
+```bash
+pip install "cua-agent[all]"
+```
+## Quick Start
+```python
+import asyncio
+import os
+from agent import ComputerAgent
+from computer import Computer
+async def main():
+    # Set up computer instance
+    async with Computer(
+        os_type="linux",
+        provider_type="cloud",
+        name=os.getenv("CUA_CONTAINER_NAME"),
+        api_key=os.getenv("CUA_API_KEY")
+    ) as computer:
+        # Create agent
+        agent = ComputerAgent(
+            model="anthropic/claude-3-5-sonnet-20241022",
+            tools=[computer],
+            only_n_most_recent_images=3,
+            trajectory_dir="trajectories",
+            max_trajectory_budget=5.0  # $5 budget limit
+        )
+        # Run agent
+        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
+        async for result in agent.run(messages):
+            for item in result["output"]:
+                if item["type"] == "message":
+                    print(item["content"][0]["text"])
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+## Docs
+- [Agent Loops](https://trycua.com/docs/agent-sdk/agent-loops)
+- [Supported Agents](https://trycua.com/docs/agent-sdk/supported-agents)
+- [Supported Models](https://trycua.com/docs/agent-sdk/supported-models)
+- [Chat History](https://trycua.com/docs/agent-sdk/chat-history)
+- [Callbacks](https://trycua.com/docs/agent-sdk/callbacks)
+- [Custom Tools](https://trycua.com/docs/agent-sdk/custom-tools)
+- [Custom Computer Handlers](https://trycua.com/docs/agent-sdk/custom-computer-handlers)
+- [Prompt Caching](https://trycua.com/docs/agent-sdk/prompt-caching)
+- [Usage Tracking](https://trycua.com/docs/agent-sdk/usage-tracking)
+- [Benchmarks](https://trycua.com/docs/agent-sdk/benchmarks)
+## License
+MIT License - see LICENSE file for details.
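
The `Requires-Dist` lines in the PKG-INFO above attach optional dependencies to extras through `extra == "..."` markers. A minimal sketch of grouping those lines by extra, useful for checking which packages an extra such as `all` or `omni` pulls in. The helper name `group_by_extra` and the `SAMPLE` text (a few lines copied from the metadata above) are illustrative assumptions, and the regex only handles the simple marker shapes seen here; it is not a full marker parser.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches the extra name in markers such as: extra == "all"
EXTRA_RE = re.compile(r'extra\s*==\s*["\']([^"\']+)["\']')


def group_by_extra(metadata_text: str) -> Dict[str, List[str]]:
    """Hypothetical helper: map extra name -> requirement strings.

    Requirements without an extra marker are stored under the empty key "".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in metadata_text.splitlines():
        if not line.startswith("Requires-Dist:"):
            continue
        spec = line[len("Requires-Dist:"):].strip()
        # Everything after ';' is the environment marker, if present
        requirement, _, marker = spec.partition(";")
        match = EXTRA_RE.search(marker)
        groups[match.group(1) if match else ""].append(requirement.strip())
    return dict(groups)


# A few lines copied from the PKG-INFO above
SAMPLE = """\
Requires-Dist: litellm>=1.74.12
Requires-Dist: cua-som<0.2.0,>=0.1.0; extra == "omni"
Requires-Dist: mlx-vlm>=0.1.27; sys_platform == "darwin" and extra == "all"
Requires-Dist: yaspin>=3.1.0; extra == "all"
"""

groups = group_by_extra(SAMPLE)
for extra, requirements in groups.items():
    print(extra or "(core)", requirements)
```

Run against the full metadata, a grouping like this makes it easy to see that `cua-som` is listed only under `omni` in this release.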

cua_agent-0.4.25/README.md ADDED

@@ -0,0 +1,87 @@
+<div align="center">
+<h1>
+  <div class="image-wrapper" style="display: inline-block;">
+    <picture>
+      <source media="(prefers-color-scheme: dark)" alt="logo" height="150" srcset="https://raw.githubusercontent.com/trycua/cua/main/img/logo_white.png" style="display: block; margin: auto;">
+      <source media="(prefers-color-scheme: light)" alt="logo" height="150" srcset="https://raw.githubusercontent.com/trycua/cua/main/img/logo_black.png" style="display: block; margin: auto;">
+      <img alt="Shows my svg">
+    </picture>
+  </div>
+  [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#)
+  [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
+  [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
+  [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/)
+</h1>
+</div>
+**cua-agent** is a general Computer-Use framework with liteLLM integration for running agentic workflows on macOS, Windows, and Linux sandboxes. It provides a unified interface for computer-use agents across multiple LLM providers with advanced callback system for extensibility.
+## Features
+- **Safe Computer-Use/Tool-Use**: Using Computer SDK for sandboxed desktops
+- **Multi-Agent Support**: Anthropic Claude, OpenAI computer-use-preview, UI-TARS, Omniparser + any LLM
+- **Multi-API Support**: Take advantage of liteLLM supporting 100+ LLMs / model APIs, including local models (`huggingface-local/`, `ollama_chat/`, `mlx/`)
+- **Cross-Platform**: Works on Windows, macOS, and Linux with cloud and local computer instances
+- **Extensible Callbacks**: Built-in support for image retention, cache control, PII anonymization, budget limits, and trajectory tracking
+## Install
+```bash
+pip install "cua-agent[all]"
+```
+## Quick Start
+```python
+import asyncio
+import os
+from agent import ComputerAgent
+from computer import Computer
+async def main():
+    # Set up computer instance
+    async with Computer(
+        os_type="linux",
+        provider_type="cloud",
+        name=os.getenv("CUA_CONTAINER_NAME"),
+        api_key=os.getenv("CUA_API_KEY")
+    ) as computer:
+        # Create agent
+        agent = ComputerAgent(
+            model="anthropic/claude-3-5-sonnet-20241022",
+            tools=[computer],
+            only_n_most_recent_images=3,
+            trajectory_dir="trajectories",
+            max_trajectory_budget=5.0  # $5 budget limit
+        )
+        # Run agent
+        messages = [{"role": "user", "content": "Take a screenshot and tell me what you see"}]
+        async for result in agent.run(messages):
+            for item in result["output"]:
+                if item["type"] == "message":
+                    print(item["content"][0]["text"])
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+## Docs
+- [Agent Loops](https://trycua.com/docs/agent-sdk/agent-loops)
+- [Supported Agents](https://trycua.com/docs/agent-sdk/supported-agents)
+- [Supported Models](https://trycua.com/docs/agent-sdk/supported-models)
+- [Chat History](https://trycua.com/docs/agent-sdk/chat-history)
+- [Callbacks](https://trycua.com/docs/agent-sdk/callbacks)
+- [Custom Tools](https://trycua.com/docs/agent-sdk/custom-tools)
+- [Custom Computer Handlers](https://trycua.com/docs/agent-sdk/custom-computer-handlers)
+- [Prompt Caching](https://trycua.com/docs/agent-sdk/prompt-caching)
+- [Usage Tracking](https://trycua.com/docs/agent-sdk/usage-tracking)
+- [Benchmarks](https://trycua.com/docs/agent-sdk/benchmarks)
+## License
+MIT License - see LICENSE file for details.

{cua_agent-0.4.24 → cua_agent-0.4.25}/agent/human_tool/ui.py RENAMED

@@ -15,6 +15,11 @@ class HumanCompletionUI:
         self.current_call_id: Optional[str] = None
         self.refresh_interval = 2.0  # seconds
         self.last_image = None  # Store the last image for display
+        # Track current interactive action controls
+        self.current_action_type: str = "click"
+        self.current_button: str = "left"
+        self.current_scroll_x: int = 0
+        self.current_scroll_y: int = -120
     def format_messages_for_chatbot(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
         """Format messages for display in gr.Chatbot with type='messages'."""
@@ -440,8 +445,8 @@ def create_ui():
                     with gr.Group(visible=False) as click_actions_group:
                         with gr.Row():
                             action_type_radio = gr.Dropdown(
-                                label="Action",
-                                choices=["click", "double_click", "move", "left_mouse_up", "left_mouse_down"],
+                                label="Interactive Action",
+                                choices=["click", "double_click", "move", "left_mouse_up", "left_mouse_down", "scroll"],
                                 value="click",
                                 scale=2
                             )
@@ -452,6 +457,18 @@ def create_ui():
                                 visible=True,
                                 scale=1
                             )
+                            scroll_x_input = gr.Number(
+                                label="scroll_x",
+                                value=0,
+                                visible=False,
+                                scale=1
+                            )
+                            scroll_y_input = gr.Number(
+                                label="scroll_y",
+                                value=-120,
+                                visible=False,
+                                scale=1
+                            )
                     conversation_chatbot = gr.Chatbot(
                         label="Conversation",
@@ -545,9 +562,15 @@ def create_ui():
         def handle_image_click(evt: gr.SelectData):
             if evt.index is not None:
                 x, y = evt.index
-                action_type = action_type_radio.value or "click"
-                button = action_button_radio.value or "left"
-                result = ui_handler.submit_click_action(x, y, action_type, button)
+                action_type = ui_handler.current_action_type or "click"
+                button = ui_handler.current_button or "left"
+                if action_type == "scroll":
+                    sx_i = int(ui_handler.current_scroll_x or 0)
+                    sy_i = int(ui_handler.current_scroll_y or 0)
+                    # Submit a scroll action with x,y position and scroll deltas
+                    result = ui_handler.submit_action("scroll", x=x, y=y, scroll_x=sx_i, scroll_y=sy_i)
+                else:
+                    result = ui_handler.submit_click_action(x, y, action_type, button)
                 ui_handler.wait_for_pending_calls()
                 return result
             return "No coordinates selected"
@@ -570,14 +593,49 @@ def create_ui():
             outputs=[call_dropdown, screenshot_image, conversation_chatbot, submit_btn, click_actions_group, actions_group]
         )
-        # Toggle button radio visibility based on action type
-        def toggle_button_visibility(action_type):
-            return gr.update(visible=(action_type == "click"))
+        # Toggle visibility of controls based on action type
+        def toggle_action_controls(action_type):
+            # Button visible only for click
+            button_vis = gr.update(visible=(action_type == "click"))
+            # Scroll inputs visible only for scroll
+            scroll_x_vis = gr.update(visible=(action_type == "scroll"))
+            scroll_y_vis = gr.update(visible=(action_type == "scroll"))
+            # Update state
+            ui_handler.current_action_type = action_type or "click"
+            return button_vis, scroll_x_vis, scroll_y_vis
         action_type_radio.change(
-            fn=toggle_button_visibility,
+            fn=toggle_action_controls,
             inputs=[action_type_radio],
-            outputs=[action_button_radio]
+            outputs=[action_button_radio, scroll_x_input, scroll_y_input]
+        )
+        # Keep other control values in ui_handler state
+        def on_button_change(val):
+            ui_handler.current_button = (val or "left")
+        action_button_radio.change(
+            fn=on_button_change,
+            inputs=[action_button_radio]
+        )
+        def on_scroll_x_change(val):
+            try:
+                ui_handler.current_scroll_x = int(val) if val is not None else 0
+            except Exception:
+                ui_handler.current_scroll_x = 0
+        scroll_x_input.change(
+            fn=on_scroll_x_change,
+            inputs=[scroll_x_input]
+        )
+        def on_scroll_y_change(val):
+            try:
+                ui_handler.current_scroll_y = int(val) if val is not None else 0
+            except Exception:
+                ui_handler.current_scroll_y = 0
+        scroll_y_input.change(
+            fn=on_scroll_y_change,
+            inputs=[scroll_y_input]
         )
         type_submit_btn.click(
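
The hunks above stop reading `action_type_radio.value` and `action_button_radio.value` inside the image click handler and instead read state kept on `ui_handler`, which the `.change` callbacks update. The dispatch logic can be sketched without Gradio as follows. `ActionState`, `to_int`, and `build_action` are hypothetical names, and the returned dict is an assumed payload shape used only for illustration; the diff does not show what `submit_action` or `submit_click_action` actually build.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ActionState:
    """Hypothetical stand-in for the fields added to HumanCompletionUI."""
    current_action_type: str = "click"
    current_button: str = "left"
    current_scroll_x: int = 0
    current_scroll_y: int = -120


def to_int(val: Any, default: int = 0) -> int:
    """Coerce a number-input value to int, falling back to a default."""
    try:
        return int(val) if val is not None else default
    except (TypeError, ValueError, OverflowError):
        return default


def build_action(state: ActionState, x: int, y: int) -> Dict[str, Any]:
    """Pick the action for an image click from the tracked control state."""
    action_type = state.current_action_type or "click"
    if action_type == "scroll":
        # Scroll carries the click position plus the two scroll deltas
        return {
            "type": "scroll",
            "x": x,
            "y": y,
            "scroll_x": to_int(state.current_scroll_x),
            "scroll_y": to_int(state.current_scroll_y),
        }
    # Every other action type goes through the click-style path
    return {
        "type": action_type,
        "x": x,
        "y": y,
        "button": state.current_button or "left",
    }


state = ActionState()
print(build_action(state, 10, 20))
state.current_action_type = "scroll"
print(build_action(state, 10, 20))
```

Unlike the `except Exception` in the diff, this sketch catches only the exceptions `int()` is expected to raise, which is a stylistic choice rather than a behavior the package requires.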

{cua_agent-0.4.24 → cua_agent-0.4.25}/agent/loops/anthropic.py RENAMED

@@ -132,23 +132,22 @@ def _convert_responses_items_to_completion_messages(messages: Messages) -> List[
                 converted_content = []
                 for item in content:
                     if isinstance(item, dict) and item.get("type") == "input_image":
-                        # Convert input_image to Anthropic image format
+                        # Convert input_image to OpenAI image format
                         image_url = item.get("image_url", "")
                         if image_url and image_url != "[omitted]":
-                            # Extract base64 data from data URL
-                            if "," in image_url:
-                                base64_data = image_url.split(",")[-1]
-                            else:
-                                base64_data = image_url
                             converted_content.append({
-                                "type": "image",
-                                "source": {
-                                    "type": "base64",
-                                    "media_type": "image/png",
-                                    "data": base64_data
+                                "type": "image_url",
+                                "image_url": {
+                                    "url": image_url
                                 }
                             })
+                    elif isinstance(item, dict) and item.get("type") == "input_text":
+                        # Convert input_text to OpenAI text format
+                        text = item.get("text", "")
+                        converted_content.append({
+                            "type": "text",
+                            "text": text
+                        })
                     else:
                         # Keep other content types as-is
                         converted_content.append(item)
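
The hunk above changes how content items are converted: `input_image` items now become OpenAI-style `image_url` parts that carry the original URL unchanged (the 0.4.24 code stripped the data-URL prefix and emitted an Anthropic `image`/`base64` source), and `input_text` items become `text` parts. A minimal standalone sketch of the 0.4.25 branch logic, assuming a hypothetical helper name `convert_content_items`; in the package this loop lives inside `_convert_responses_items_to_completion_messages`, whose surrounding code is not shown in the diff.

```python
from typing import Any, List


def convert_content_items(content: List[Any]) -> List[Any]:
    """Hypothetical helper mirroring the per-item branches in the hunk above."""
    converted: List[Any] = []
    for item in content:
        if isinstance(item, dict) and item.get("type") == "input_image":
            image_url = item.get("image_url", "")
            # Empty or "[omitted]" images are dropped, as in the hunk
            if image_url and image_url != "[omitted]":
                converted.append({
                    "type": "image_url",
                    "image_url": {"url": image_url},
                })
        elif isinstance(item, dict) and item.get("type") == "input_text":
            converted.append({"type": "text", "text": item.get("text", "")})
        else:
            # Other content types pass through unchanged
            converted.append(item)
    return converted


parts = convert_content_items([
    {"type": "input_text", "text": "What is on screen?"},
    {"type": "input_image", "image_url": "data:image/png;base64,AAAA"},
    {"type": "input_image", "image_url": "[omitted]"},
])
print(parts)
```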

{cua_agent-0.4.24 → cua_agent-0.4.25}/pyproject.toml RENAMED

@@ -6,7 +6,7 @@ build-backend = "pdm.backend"
 [project]
 name = "cua-agent"
-version = "0.4.24"
+version = "0.4.25"
 description = "CUA (Computer Use) Agent for AI-driven computer interaction"
 readme = "README.md"
 authors = [
@@ -32,7 +32,6 @@ requires-python = ">=3.12"
 openai = []
 anthropic = []
 omni = [
-    "ultralytics>=8.0.0",
     "cua-som>=0.1.0,<0.2.0",
 ]
 uitars = []
@@ -60,8 +59,6 @@ hud = [
     "hud-python>=0.4.12,<0.5.0",
 ]
 all = [
-    "ultralytics>=8.0.0",
-    "cua-som>=0.1.0,<0.2.0",
     "mlx-vlm>=0.1.27; sys_platform == 'darwin'",
     "accelerate",
     "torch",
