PyPI - cua-agent - Versions diffs - 0.1.6__tar.gz → 0.1.18__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of cua-agent might be problematic.

Files changed (96)

cua_agent-0.1.18/PKG-INFO ADDED

@@ -0,0 +1,165 @@
+Metadata-Version: 2.1
+Name: cua-agent
+Version: 0.1.18
+Summary: CUA (Computer Use) Agent for AI-driven computer interaction
+Author-Email: TryCua <gh@trycua.com>
+Requires-Python: <3.13,>=3.10
+Requires-Dist: httpx<0.29.0,>=0.27.0
+Requires-Dist: aiohttp<4.0.0,>=3.9.3
+Requires-Dist: asyncio
+Requires-Dist: anyio<5.0.0,>=4.4.1
+Requires-Dist: typing-extensions<5.0.0,>=4.12.2
+Requires-Dist: pydantic<3.0.0,>=2.6.4
+Requires-Dist: rich<14.0.0,>=13.7.1
+Requires-Dist: python-dotenv<2.0.0,>=1.0.1
+Requires-Dist: cua-computer<0.2.0,>=0.1.0
+Requires-Dist: cua-core<0.2.0,>=0.1.0
+Requires-Dist: certifi>=2024.2.2
+Provides-Extra: anthropic
+Requires-Dist: anthropic>=0.49.0; extra == "anthropic"
+Requires-Dist: boto3<2.0.0,>=1.35.81; extra == "anthropic"
+Provides-Extra: openai
+Requires-Dist: openai<2.0.0,>=1.14.0; extra == "openai"
+Requires-Dist: httpx<0.29.0,>=0.27.0; extra == "openai"
+Provides-Extra: som
+Requires-Dist: torch>=2.2.1; extra == "som"
+Requires-Dist: torchvision>=0.17.1; extra == "som"
+Requires-Dist: ultralytics>=8.0.0; extra == "som"
+Requires-Dist: transformers>=4.38.2; extra == "som"
+Requires-Dist: cua-som<0.2.0,>=0.1.0; extra == "som"
+Requires-Dist: anthropic<0.47.0,>=0.46.0; extra == "som"
+Requires-Dist: boto3<2.0.0,>=1.35.81; extra == "som"
+Requires-Dist: openai<2.0.0,>=1.14.0; extra == "som"
+Requires-Dist: groq<0.5.0,>=0.4.0; extra == "som"
+Requires-Dist: dashscope<2.0.0,>=1.13.0; extra == "som"
+Requires-Dist: requests<3.0.0,>=2.31.0; extra == "som"
+Provides-Extra: all
+Requires-Dist: torch>=2.2.1; extra == "all"
+Requires-Dist: torchvision>=0.17.1; extra == "all"
+Requires-Dist: ultralytics>=8.0.0; extra == "all"
+Requires-Dist: transformers>=4.38.2; extra == "all"
+Requires-Dist: cua-som<0.2.0,>=0.1.0; extra == "all"
+Requires-Dist: anthropic<0.47.0,>=0.46.0; extra == "all"
+Requires-Dist: boto3<2.0.0,>=1.35.81; extra == "all"
+Requires-Dist: openai<2.0.0,>=1.14.0; extra == "all"
+Requires-Dist: groq<0.5.0,>=0.4.0; extra == "all"
+Requires-Dist: dashscope<2.0.0,>=1.13.0; extra == "all"
+Requires-Dist: requests<3.0.0,>=2.31.0; extra == "all"
+Description-Content-Type: text/markdown
+<div align="center">
+<h1>
+  <div class="image-wrapper" style="display: inline-block;">
+    <picture>
+      <source media="(prefers-color-scheme: dark)" alt="logo" height="150" srcset="../../img/logo_white.png" style="display: block; margin: auto;">
+      <source media="(prefers-color-scheme: light)" alt="logo" height="150" srcset="../../img/logo_black.png" style="display: block; margin: auto;">
+      <img alt="Shows my svg">
+    </picture>
+  </div>
+  [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#)
+  [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
+  [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
+  [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/)
+</h1>
+</div>
+**cua-agent** is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandbox created with Cua, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
+### Get started with Agent
+<div align="center">
+    <img src="../../img/agent.png"/>
+</div>
+## Install
+```bash
+pip install "cua-agent[all]"
+# or install specific loop providers
+pip install "cua-agent[openai]" # OpenAI Cua Loop
+pip install "cua-agent[anthropic]" # Anthropic Cua Loop
+pip install "cua-agent[omni]" # Cua Loop based on OmniParser
+```
+## Run
+```bash
+async with Computer() as macos_computer:
+  # Create agent with loop and provider
+  agent = ComputerAgent(
+      computer=macos_computer,
+      loop=AgentLoop.OPENAI,
+      model=LLM(provider=LLMProvider.OPENAI)
+  )
+  tasks = [
+      "Look for a repository named trycua/cua on GitHub.",
+      "Check the open issues, open the most recent one and read it.",
+      "Clone the repository in users/lume/projects if it doesn't exist yet.",
+      "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
+      "From Cursor, open Composer if not already open.",
+      "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
+  ]
+  for i, task in enumerate(tasks):
+      print(f"\nExecuting task {i}/{len(tasks)}: {task}")
+      async for result in agent.run(task):
+          print(result)
+      print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
+```
+Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):
+- [Agent Notebook](../../notebooks/agent_nb.ipynb) - Complete examples and workflows
+## Agent Loops
+The `cua-agent` package provides three agent loops variations, based on different CUA models providers and techniques:
+| Agent Loop | Supported Models | Description | Set-Of-Marks |
+|:-----------|:-----------------|:------------|:-------------|
+| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
+| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
+| `AgentLoop.OMNI` <br>(preview) | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `gpt-3.5-turbo` | Use OmniParser for element pixel-detection (SoM) and any VLMs | OmniParser |
+## AgentResponse
+The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
+```python
+async for result in agent.run(task):
+  print("Response ID: ", result.get("id"))
+  # Print detailed usage information
+  usage = result.get("usage")
+  if usage:
+      print("\nUsage Details:")
+      print(f"  Input Tokens: {usage.get('input_tokens')}")
+      if "input_tokens_details" in usage:
+          print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
+      print(f"  Output Tokens: {usage.get('output_tokens')}")
+      if "output_tokens_details" in usage:
+          print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
+      print(f"  Total Tokens: {usage.get('total_tokens')}")
+  print("Response Text: ", result.get("text"))
+  # Print tools information
+  tools = result.get("tools")
+  if tools:
+      print("\nTools:")
+      print(tools)
+  # Print reasoning and tool call outputs
+  outputs = result.get("output", [])
+  for output in outputs:
+      output_type = output.get("type")
+      if output_type == "reasoning":
+          print("\nReasoning Output:")
+          print(output)
+      elif output_type == "computer_call":
+          print("\nTool Call Output:")
+          print(output)
+```
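
Note on the "Run" snippet in the description above: it is fenced as `bash` but is Python, it uses `async with` at top level, and its first progress line prints a zero-based `{i}` while the completion line prints `{i+1}`. The following is a minimal sketch of the same task loop with consistent 1-based counters, wrapped in a coroutine so it can be started with `asyncio.run`. `StubAgent` and `run_tasks` are hypothetical names introduced here so the sketch runs without the package installed; they are not part of cua-agent.

```python
import asyncio
from typing import AsyncGenerator, Dict, List


class StubAgent:
    """Hypothetical stand-in for ComputerAgent; it only echoes the task."""

    async def run(self, task: str) -> AsyncGenerator[Dict[str, str], None]:
        # A real agent would yield one response per turn; the stub yields once.
        yield {"text": f"done: {task}"}


async def run_tasks(agent: StubAgent, tasks: List[str]) -> List[str]:
    """Run tasks in order and return the progress lines that were printed."""
    lines: List[str] = []
    total = len(tasks)
    # start=1 keeps the "Executing" and "completed" counters in step.
    for number, task in enumerate(tasks, start=1):
        lines.append(f"Executing task {number}/{total}: {task}")
        async for result in agent.run(task):
            lines.append(str(result.get("text")))
        lines.append(f"Task {number}/{total} completed: {task}")
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    asyncio.run(run_tasks(StubAgent(), ["open the browser", "search for trycua/cua"]))
```

With the real package, the stub would presumably be replaced by a `ComputerAgent` created inside `async with Computer() as ...`, as the description shows.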

cua_agent-0.1.18/README.md ADDED

@@ -0,0 +1,116 @@
+<div align="center">
+<h1>
+  <div class="image-wrapper" style="display: inline-block;">
+    <picture>
+      <source media="(prefers-color-scheme: dark)" alt="logo" height="150" srcset="../../img/logo_white.png" style="display: block; margin: auto;">
+      <source media="(prefers-color-scheme: light)" alt="logo" height="150" srcset="../../img/logo_black.png" style="display: block; margin: auto;">
+      <img alt="Shows my svg">
+    </picture>
+  </div>
+  [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#)
+  [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
+  [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
+  [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/)
+</h1>
+</div>
+**cua-agent** is a general Computer-Use framework for running multi-app agentic workflows targeting macOS and Linux sandbox created with Cua, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen).
+### Get started with Agent
+<div align="center">
+    <img src="../../img/agent.png"/>
+</div>
+## Install
+```bash
+pip install "cua-agent[all]"
+# or install specific loop providers
+pip install "cua-agent[openai]" # OpenAI Cua Loop
+pip install "cua-agent[anthropic]" # Anthropic Cua Loop
+pip install "cua-agent[omni]" # Cua Loop based on OmniParser
+```
+## Run
+```bash
+async with Computer() as macos_computer:
+  # Create agent with loop and provider
+  agent = ComputerAgent(
+      computer=macos_computer,
+      loop=AgentLoop.OPENAI,
+      model=LLM(provider=LLMProvider.OPENAI)
+  )
+  tasks = [
+      "Look for a repository named trycua/cua on GitHub.",
+      "Check the open issues, open the most recent one and read it.",
+      "Clone the repository in users/lume/projects if it doesn't exist yet.",
+      "Open the repository with an app named Cursor (on the dock, black background and white cube icon).",
+      "From Cursor, open Composer if not already open.",
+      "Focus on the Composer text area, then write and submit a task to help resolve the GitHub issue.",
+  ]
+  for i, task in enumerate(tasks):
+      print(f"\nExecuting task {i}/{len(tasks)}: {task}")
+      async for result in agent.run(task):
+          print(result)
+      print(f"\n✅ Task {i+1}/{len(tasks)} completed: {task}")
+```
+Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):
+- [Agent Notebook](../../notebooks/agent_nb.ipynb) - Complete examples and workflows
+## Agent Loops
+The `cua-agent` package provides three agent loops variations, based on different CUA models providers and techniques:
+| Agent Loop | Supported Models | Description | Set-Of-Marks |
+|:-----------|:-----------------|:------------|:-------------|
+| `AgentLoop.OPENAI` | • `computer_use_preview` | Use OpenAI Operator CUA model | Not Required |
+| `AgentLoop.ANTHROPIC` | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219` | Use Anthropic Computer-Use | Not Required |
+| `AgentLoop.OMNI` <br>(preview) | • `claude-3-5-sonnet-20240620`<br>• `claude-3-7-sonnet-20250219`<br>• `gpt-4.5-preview`<br>• `gpt-4o`<br>• `gpt-4`<br>• `gpt-3.5-turbo` | Use OmniParser for element pixel-detection (SoM) and any VLMs | OmniParser |
+## AgentResponse
+The `AgentResponse` class represents the structured output returned after each agent turn. It contains the agent's response, reasoning, tool usage, and other metadata. The response format aligns with the new [OpenAI Agent SDK specification](https://platform.openai.com/docs/api-reference/responses) for better consistency across different agent loops.
+```python
+async for result in agent.run(task):
+  print("Response ID: ", result.get("id"))
+  # Print detailed usage information
+  usage = result.get("usage")
+  if usage:
+      print("\nUsage Details:")
+      print(f"  Input Tokens: {usage.get('input_tokens')}")
+      if "input_tokens_details" in usage:
+          print(f"  Input Tokens Details: {usage.get('input_tokens_details')}")
+      print(f"  Output Tokens: {usage.get('output_tokens')}")
+      if "output_tokens_details" in usage:
+          print(f"  Output Tokens Details: {usage.get('output_tokens_details')}")
+      print(f"  Total Tokens: {usage.get('total_tokens')}")
+  print("Response Text: ", result.get("text"))
+  # Print tools information
+  tools = result.get("tools")
+  if tools:
+      print("\nTools:")
+      print(tools)
+  # Print reasoning and tool call outputs
+  outputs = result.get("output", [])
+  for output in outputs:
+      output_type = output.get("type")
+      if output_type == "reasoning":
+          print("\nReasoning Output:")
+          print(output)
+      elif output_type == "computer_call":
+          print("\nTool Call Output:")
+          print(output)
+```
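
Note on the `AgentResponse` example above: it reads the response with dict-style `.get(...)` calls for `id`, `usage`, `text`, `tools`, and `output`. A minimal sketch of folding those reads into one helper follows, assuming the response behaves like a plain dict with exactly those keys. `summarize_response` and the sample values are hypothetical and only illustrate the shape the README prints.

```python
from typing import Any, Dict, List


def summarize_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the fields the README loop prints into one flat summary."""
    usage = result.get("usage") or {}
    outputs: List[Dict[str, Any]] = result.get("output") or []
    return {
        "id": result.get("id"),
        "text": result.get("text"),
        "total_tokens": usage.get("total_tokens"),
        # Split output items by the two types the README checks for.
        "reasoning": [o for o in outputs if o.get("type") == "reasoning"],
        "computer_calls": [o for o in outputs if o.get("type") == "computer_call"],
    }


# Made-up sample data, not captured from a real agent run.
sample = {
    "id": "resp_example",
    "text": "Opened the repository page.",
    "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    "output": [
        {"type": "reasoning", "summary": "Need to open the browser first."},
        {"type": "computer_call", "action": "click"},
    ],
}

summary = summarize_response(sample)
print(summary["id"], summary["total_tokens"], len(summary["computer_calls"]))
```

Missing keys fall back to `None` or empty lists, which mirrors the defensive `if usage:` and `result.get("output", [])` checks in the README.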

{cua_agent-0.1.6 → cua_agent-0.1.18}/agent/__init__.py RENAMED

@@ -49,6 +49,7 @@ except Exception as e:
     logger.warning(f"Error initializing telemetry: {e}")
 from .providers.omni.types import LLMProvider, LLM
-from .types.base import AgentLoop
+from .core.factory import AgentLoop
+from .core.agent import ComputerAgent
-__all__ = ["AgentLoop", "LLMProvider", "LLM"]
+__all__ = ["AgentLoop", "LLMProvider", "LLM", "ComputerAgent"]

{cua_agent-0.1.6 → cua_agent-0.1.18}/agent/core/__init__.py RENAMED

@@ -1,12 +1,7 @@
 """Core agent components."""
-from .loop import BaseLoop
+from .factory import BaseLoop
 from .messages import (
-    create_user_message,
-    create_assistant_message,
-    create_system_message,
-    create_image_message,
-    create_screen_message,
     BaseMessageManager,
     ImageRetentionConfig,
 )

cua_agent-0.1.6/agent/core/computer_agent.py → cua_agent-0.1.18/agent/core/agent.py RENAMED

@@ -3,31 +3,18 @@
 import asyncio
 import logging
 import os
-from typing import Any, AsyncGenerator, Dict, Optional, cast
-from dataclasses import dataclass
+from typing import AsyncGenerator, Optional
 from computer import Computer
-from ..providers.anthropic.loop import AnthropicLoop
-from ..providers.omni.loop import OmniLoop
-from ..providers.omni.parser import OmniParser
-from ..providers.omni.types import LLMProvider, LLM
+from ..providers.omni.types import LLM
 from .. import AgentLoop
+from .types import AgentResponse
+from .factory import LoopFactory
+from .provider_config import DEFAULT_MODELS, ENV_VARS
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
-# Default models for different providers
-DEFAULT_MODELS = {
-    LLMProvider.OPENAI: "gpt-4o",
-    LLMProvider.ANTHROPIC: "claude-3-7-sonnet-20250219",
-}
-# Map providers to their environment variable names
-ENV_VARS = {
-    LLMProvider.OPENAI: "OPENAI_API_KEY",
-    LLMProvider.ANTHROPIC: "ANTHROPIC_API_KEY",
-}
 class ComputerAgent:
     """A computer agent that can perform automated tasks using natural language instructions."""
@@ -44,7 +31,6 @@ class ComputerAgent:
         save_trajectory: bool = True,
         trajectory_dir: str = "trajectories",
         only_n_most_recent_images: Optional[int] = None,
-        parser: Optional[OmniParser] = None,
         verbosity: int = logging.INFO,
     ):
         """Initialize the ComputerAgent.
@@ -61,12 +47,11 @@ class ComputerAgent:
             save_trajectory: Whether to save the trajectory.
             trajectory_dir: Directory to save the trajectory.
             only_n_most_recent_images: Maximum number of recent screenshots to include in API requests.
-            parser: Parser instance for the OmniLoop. Only used if provider is not ANTHROPIC.
             verbosity: Logging level.
         """
         # Basic agent configuration
         self.max_retries = max_retries
-        self.computer = computer or Computer()
+        self.computer = computer
         self.queue = asyncio.Queue()
         self.screenshot_dir = screenshot_dir
         self.log_dir = log_dir
@@ -99,39 +84,30 @@ class ComputerAgent:
                     f"No model specified for provider {self.provider} and no default found"
                 )
-        # Ensure computer is properly cast for typing purposes
-        computer_instance = cast(Computer, self.computer)
         # Get API key from environment if not provided
         actual_api_key = api_key or os.environ.get(ENV_VARS[self.provider], "")
         if not actual_api_key:
             raise ValueError(f"No API key provided for {self.provider}")
-        # Initialize the appropriate loop based on the loop parameter
-        if loop == AgentLoop.ANTHROPIC:
-            self._loop = AnthropicLoop(
-                api_key=actual_api_key,
-                model=actual_model_name,
-                computer=computer_instance,
-                save_trajectory=save_trajectory,
-                base_dir=trajectory_dir,
-                only_n_most_recent_images=only_n_most_recent_images,
-            )
-        else:
-            # Default to OmniLoop for other loop types
-            # Initialize parser if not provided
-            actual_parser = parser or OmniParser()
-            self._loop = OmniLoop(
+        # Create the appropriate loop using the factory
+        try:
+            # Let the factory create the appropriate loop with needed components
+            self._loop = LoopFactory.create_loop(
+                loop_type=loop,
                 provider=self.provider,
+                computer=self.computer,
+                model_name=actual_model_name,
                 api_key=actual_api_key,
-                model=actual_model_name,
-                computer=computer_instance,
                 save_trajectory=save_trajectory,
-                base_dir=trajectory_dir,
+                trajectory_dir=trajectory_dir,
                 only_n_most_recent_images=only_n_most_recent_images,
-                parser=actual_parser,
             )
+        except ValueError as e:
+            logger.error(f"Failed to create loop: {str(e)}")
+            raise
+        # Initialize the message manager from the loop
+        self.message_manager = self._loop.message_manager
         logger.info(
             f"ComputerAgent initialized with provider: {self.provider}, model: {actual_model_name}"
@@ -154,21 +130,6 @@ class ComputerAgent:
             else:
                 logger.info("Computer already initialized, skipping initialization")
-            # Take a test screenshot to verify the computer is working
-            logger.info("Testing computer with a screenshot...")
-            try:
-                test_screenshot = await self.computer.interface.screenshot()
-                # Determine the screenshot size based on its type
-                if isinstance(test_screenshot, (bytes, bytearray, memoryview)):
-                    size = len(test_screenshot)
-                elif hasattr(test_screenshot, "base64_image"):
-                    size = len(test_screenshot.base64_image)
-                else:
-                    size = "unknown"
-                logger.info(f"Screenshot test successful, size: {size}")
-            except Exception as e:
-                logger.error(f"Screenshot test failed: {str(e)}")
-                # Even though screenshot failed, we continue since some tests might not need it
         except Exception as e:
             logger.error(f"Error initializing computer in __aenter__: {str(e)}")
             raise
@@ -201,36 +162,30 @@ class ComputerAgent:
                 await self.computer.run()
             self._initialized = True
-    async def _init_if_needed(self):
-        """Initialize the computer interface if it hasn't been initialized yet."""
-        if not self.computer._initialized:
-            logger.info("Computer not initialized, initializing now...")
-            try:
-                # Call run directly
-                await self.computer.run()
-                logger.info("Computer interface initialized successfully")
-            except Exception as e:
-                logger.error(f"Error initializing computer interface: {str(e)}")
-                raise
-    async def run(self, task: str) -> AsyncGenerator[Dict[str, Any], None]:
+    async def run(self, task: str) -> AsyncGenerator[AgentResponse, None]:
         """Run a task using the computer agent.
         Args:
             task: Task description
         Yields:
-            Task execution updates
+            Agent response format
         """
         try:
             logger.info(f"Running task: {task}")
+            logger.info(
+                f"Message history before task has {len(self.message_manager.messages)} messages"
+            )
             # Initialize the computer if needed
             if not self._initialized:
                 await self.initialize()
-            # Format task as a message
-            messages = [{"role": "user", "content": task}]
+            # Add task as a user message using the message manager
+            self.message_manager.add_user_message([{"type": "text", "text": task}])
+            logger.info(
+                f"Added task message. Message history now has {len(self.message_manager.messages)} messages"
+            )
             # Pass properly formatted messages to the loop
             if self._loop is None:
@@ -239,7 +194,7 @@ class ComputerAgent:
                 return
             # Execute the task and yield results
-            async for result in self._loop.run(messages):
+            async for result in self._loop.run(self.message_manager.messages):
                 yield result
         except Exception as e:
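
Note on the two behavioral changes in this file: loop construction moves from an inline `if/else` to `LoopFactory.create_loop(...)`, which is expected to raise `ValueError` on failure, and `run()` now appends each task to a message manager owned by the loop instead of building a fresh `messages` list per call. The sketch below illustrates both ideas in isolation. Every class here is a simplified, hypothetical stand-in written for this note; the real `LoopFactory`, loop classes, and message manager take more parameters and are not shown in this diff.

```python
from enum import Enum, auto
from typing import Any, Dict, List


class AgentLoop(Enum):
    """Stand-in for the package's AgentLoop enum (member names from the README)."""
    ANTHROPIC = auto()
    OMNI = auto()
    OPENAI = auto()


class MessageManager:
    """Hypothetical message store: keeps every user message it is given."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, Any]] = []

    def add_user_message(self, content: List[Dict[str, str]]) -> None:
        self.messages.append({"role": "user", "content": content})


class SketchLoop:
    """Hypothetical loop object that owns its message manager."""

    def __init__(self, name: str, model_name: str) -> None:
        self.name = name
        self.model_name = model_name
        self.message_manager = MessageManager()


class LoopFactory:
    """Hypothetical registry-based factory keyed by loop type."""

    _REGISTRY = {
        AgentLoop.ANTHROPIC: "anthropic",
        AgentLoop.OMNI: "omni",
        AgentLoop.OPENAI: "openai",
    }

    @classmethod
    def create_loop(cls, loop_type: Any, model_name: str) -> SketchLoop:
        if loop_type not in cls._REGISTRY:
            # Mirrors the diff, where the caller logs and re-raises ValueError.
            raise ValueError(f"Unsupported loop type: {loop_type}")
        return SketchLoop(cls._REGISTRY[loop_type], model_name)


loop = LoopFactory.create_loop(AgentLoop.OPENAI, "computer_use_preview")
# Two consecutive tasks share one history, as the new log lines suggest.
loop.message_manager.add_user_message([{"type": "text", "text": "first task"}])
loop.message_manager.add_user_message([{"type": "text", "text": "second task"}])
print(loop.name, len(loop.message_manager.messages))
```

One consequence worth checking when upgrading, if the real message manager behaves like this sketch: a second `agent.run(task)` call would see the first task's messages, whereas 0.1.6 passed only the current task to the loop.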
