cua-agent 0.1.0__tar.gz → 0.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of cua-agent might be problematic. Click here for more details.

Files changed (69) hide show
  1. {cua_agent-0.1.0 → cua_agent-0.1.2}/PKG-INFO +1 -1
  2. cua_agent-0.1.2/README.md +126 -0
  3. cua_agent-0.1.2/agent/__init__.py +10 -0
  4. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/README.md +2 -2
  5. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/agent.py +78 -35
  6. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/messages.py +15 -0
  7. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/__init__.py +2 -2
  8. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/api/client.py +43 -46
  9. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/loop.py +2 -2
  10. cua_agent-0.1.2/agent/providers/anthropic/types.py +16 -0
  11. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/__init__.py +2 -2
  12. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/loop.py +17 -13
  13. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/messages.py +3 -0
  14. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/prompts.py +0 -14
  15. cua_agent-0.1.2/agent/providers/omni/types.py +52 -0
  16. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/types/base.py +2 -1
  17. {cua_agent-0.1.0 → cua_agent-0.1.2}/pyproject.toml +3 -3
  18. {cua_agent-0.1.0 → cua_agent-0.1.2}/tests/test_agent.py +3 -3
  19. cua_agent-0.1.0/README.md +0 -213
  20. cua_agent-0.1.0/agent/__init__.py +0 -10
  21. cua_agent-0.1.0/agent/providers/anthropic/types.py +0 -16
  22. cua_agent-0.1.0/agent/providers/omni/types.py +0 -30
  23. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/README.md +0 -0
  24. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/__init__.py +0 -0
  25. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/base_agent.py +0 -0
  26. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/callbacks.py +0 -0
  27. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/computer_agent.py +0 -0
  28. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/experiment.py +0 -0
  29. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/factory.py +0 -0
  30. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/loop.py +0 -0
  31. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/__init__.py +0 -0
  32. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/base.py +0 -0
  33. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/bash.py +0 -0
  34. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/collection.py +0 -0
  35. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/computer.py +0 -0
  36. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/edit.py +0 -0
  37. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/core/tools/manager.py +0 -0
  38. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/__init__.py +0 -0
  39. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/api/logging.py +0 -0
  40. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/callbacks/manager.py +0 -0
  41. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/messages/manager.py +0 -0
  42. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/prompts.py +0 -0
  43. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/__init__.py +0 -0
  44. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/base.py +0 -0
  45. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/bash.py +0 -0
  46. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/collection.py +0 -0
  47. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/computer.py +0 -0
  48. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/edit.py +0 -0
  49. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/manager.py +0 -0
  50. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/anthropic/tools/run.py +0 -0
  51. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/callbacks.py +0 -0
  52. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/clients/anthropic.py +0 -0
  53. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/clients/base.py +0 -0
  54. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/clients/groq.py +0 -0
  55. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/clients/openai.py +0 -0
  56. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/clients/utils.py +0 -0
  57. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/experiment.py +0 -0
  58. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/image_utils.py +0 -0
  59. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/parser.py +0 -0
  60. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/tool_manager.py +0 -0
  61. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/tools/__init__.py +0 -0
  62. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/tools/bash.py +0 -0
  63. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/tools/computer.py +0 -0
  64. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/tools/manager.py +0 -0
  65. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/utils.py +0 -0
  66. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/providers/omni/visualization.py +0 -0
  67. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/types/__init__.py +0 -0
  68. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/types/messages.py +0 -0
  69. {cua_agent-0.1.0 → cua_agent-0.1.2}/agent/types/tools.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: cua-agent
3
- Version: 0.1.0
3
+ Version: 0.1.2
4
4
  Summary: CUA (Computer Use) Agent for AI-driven computer interaction
5
5
  Author-Email: TryCua <gh@trycua.com>
6
6
  Requires-Python: <3.13,>=3.10
@@ -0,0 +1,126 @@
1
+ <div align="center">
2
+ <h1>
3
+ <div class="image-wrapper" style="display: inline-block;">
4
+ <picture>
5
+ <source media="(prefers-color-scheme: dark)" alt="logo" height="150" srcset="../../img/logo_white.png" style="display: block; margin: auto;">
6
+ <source media="(prefers-color-scheme: light)" alt="logo" height="150" srcset="../../img/logo_black.png" style="display: block; margin: auto;">
7
+ <img alt="Cua logo">
8
+ </picture>
9
+ </div>
10
+
11
+ [![Python](https://img.shields.io/badge/Python-333333?logo=python&logoColor=white&labelColor=333333)](#)
12
+ [![macOS](https://img.shields.io/badge/macOS-000000?logo=apple&logoColor=F0F0F0)](#)
13
+ [![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?&logo=discord&logoColor=white)](https://discord.com/invite/mVnXXpdE85)
14
+ [![PyPI](https://img.shields.io/pypi/v/cua-computer?color=333333)](https://pypi.org/project/cua-computer/)
15
+ </h1>
16
+ </div>
17
+
18
+ **Agent** is a Computer Use (CUA) framework for running multi-app agentic workflows targeting macOS and Linux sandboxes, supporting local (Ollama) and cloud model providers (OpenAI, Anthropic, Groq, DeepSeek, Qwen). The framework integrates with Microsoft's OmniParser for enhanced UI understanding and interaction.
19
+
20
+ ### Get started with Agent
21
+
22
+ ```python
23
+ from agent import ComputerAgent, AgentLoop, LLM, LLMProvider
24
+ from computer import Computer
25
+
26
+ computer = Computer(verbosity=logging.INFO)
27
+
28
+ agent = ComputerAgent(
29
+ computer=computer,
30
+ loop=AgentLoop.ANTHROPIC,
31
+ # loop=AgentLoop.OMNI,
32
+ model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-3-7-sonnet-20250219"),
33
+ # model=LLM(provider=LLMProvider.OPENAI, name="gpt-4.5-preview"),
34
+ save_trajectory=True,
35
+ trajectory_dir=str(Path("trajectories")),
36
+ only_n_most_recent_images=3,
37
+ verbosity=logging.INFO,
38
+ )
39
+
40
+ tasks = [
41
+ """
42
+ Please help me with the following task:
43
+ 1. Open Safari browser
44
+ 2. Go to Wikipedia.org
45
+ 3. Search for "Claude AI"
46
+ 4. Summarize the main points you find about Claude AI
47
+ """
48
+ ]
49
+
50
+ async with agent:
51
+ for i, task in enumerate(tasks, 1):
52
+ print(f"\nExecuting task {i}/{len(tasks)}: {task}")
53
+ async for result in agent.run(task):
54
+ print(result)
55
+ print(f"Task {i} completed")
56
+ ```
57
+
58
+ ## Install
59
+
60
+ ### cua-agent
61
+
62
+ ```bash
63
+
64
+ pip install cua-agent[all]
65
+
66
+ # or install specific loop providers
67
+ pip install cua-agent[anthropic]
68
+ pip install cua-agent[omni]
69
+
70
+
71
+ ```
72
+
73
+ ## Features
74
+
75
+ ### OmniParser Integration
76
+ - Enhanced UI understanding with element detection
77
+ - Automatic bounding box detection for UI elements
78
+ - Improved accuracy for complex UI interactions
79
+ - Support for icon and text element recognition
80
+
81
+ ### Basic Computer Control
82
+ - Direct keyboard and mouse control
83
+ - Window and application management
84
+ - Screenshot capabilities
85
+ - Basic UI element detection
86
+
87
+ ### Provider Support
88
+ - OpenAI (GPT-4V) - Recommended for OmniParser integration
89
+ - Anthropic (Claude) - Strong general performance
90
+ - Groq - Fast inference with Llama models
91
+ - DeepSeek - Alternative model provider
92
+ - Qwen - Alibaba's multimodal model
93
+
94
+ ## Run
95
+
96
+ Refer to these notebooks for step-by-step guides on how to use the Computer-Use Agent (CUA):
97
+
98
+ - [Agent Notebook](../../notebooks/agent_nb.ipynb) - Complete examples and workflows
99
+
100
+ ## Components
101
+
102
+ The library consists of several components:
103
+
104
+ - **Core**
105
+ - `ComputerAgent`: Unified agent class supporting multiple loop types
106
+ - `BaseComputerAgent`: Abstract base class for computer agents
107
+
108
+ - **Providers**
109
+ - `Anthropic`: Implementation for Anthropic Claude models
110
+ - `Omni`: Implementation for multiple providers (OpenAI, Groq, etc.)
111
+
112
+ - **Loops**
113
+ - `AnthropicLoop`: Loop implementation for Anthropic
114
+ - `OmniLoop`: Generic loop supporting multiple providers
115
+
116
+ ## Configuration
117
+
118
+ The agent can be configured with various parameters:
119
+
120
+ - **loop_type**: The type of loop to use (ANTHROPIC or OMNI)
121
+ - **provider**: AI provider to use with the loop
122
+ - **model**: The AI model to use
123
+ - **save_trajectory**: Whether to save screenshots and logs
124
+ - **only_n_most_recent_images**: Only keep a specific number of recent images
125
+
126
+ See the [Core README](./agent/core/README.md) for more details on the unified agent.
@@ -0,0 +1,10 @@
1
+ """CUA (Computer Use) Agent for AI-driven computer interaction."""
2
+
3
+ __version__ = "0.1.0"
4
+
5
+ from .core.factory import AgentFactory
6
+ from .core.agent import ComputerAgent
7
+ from .providers.omni.types import LLMProvider, LLM
8
+ from .types.base import Provider, AgentLoop
9
+
10
+ __all__ = ["AgentFactory", "Provider", "ComputerAgent", "AgentLoop", "LLMProvider", "LLM"]
@@ -34,7 +34,7 @@ Here's how to use the unified ComputerAgent:
34
34
  ```python
35
35
  from agent.core.agent import ComputerAgent
36
36
  from agent.types.base import AgenticLoop
37
- from agent.providers.omni.types import APIProvider
37
+ from agent.providers.omni.types import LLMProvider
38
38
  from computer import Computer
39
39
 
40
40
  # Create a Computer instance
@@ -44,7 +44,7 @@ computer = Computer()
44
44
  agent = ComputerAgent(
45
45
  computer=computer,
46
46
  loop_type=AgenticLoop.OMNI,
47
- provider=APIProvider.OPENAI,
47
+ provider=LLMProvider.OPENAI,
48
48
  model="gpt-4o",
49
49
  api_key="your_api_key_here", # Can also use OPENAI_API_KEY environment variable
50
50
  save_trajectory=True,
@@ -3,12 +3,12 @@
3
3
  import os
4
4
  import logging
5
5
  import asyncio
6
- from typing import Any, AsyncGenerator, Dict, List, Optional, TYPE_CHECKING
6
+ from typing import Any, AsyncGenerator, Dict, List, Optional, TYPE_CHECKING, Union, cast
7
7
  from datetime import datetime
8
8
 
9
9
  from computer import Computer
10
10
 
11
- from ..types.base import Provider, AgenticLoop
11
+ from ..types.base import Provider, AgentLoop
12
12
  from .base_agent import BaseComputerAgent
13
13
 
14
14
  # Only import types for type checking to avoid circular imports
@@ -17,23 +17,23 @@ if TYPE_CHECKING:
17
17
  from ..providers.omni.loop import OmniLoop
18
18
  from ..providers.omni.parser import OmniParser
19
19
 
20
- # Import the APIProvider enum without importing the whole module
21
- from ..providers.omni.types import APIProvider
20
+ # Import the provider types
21
+ from ..providers.omni.types import LLMProvider, LLM, Model, LLMModel
22
22
 
23
23
  logger = logging.getLogger(__name__)
24
24
 
25
25
  # Default models for different providers
26
26
  DEFAULT_MODELS = {
27
- APIProvider.OPENAI: "gpt-4o",
28
- APIProvider.ANTHROPIC: "claude-3-7-sonnet-20250219",
29
- APIProvider.GROQ: "llama3-70b-8192",
27
+ LLMProvider.OPENAI: "gpt-4o",
28
+ LLMProvider.ANTHROPIC: "claude-3-7-sonnet-20250219",
29
+ LLMProvider.GROQ: "llama3-70b-8192",
30
30
  }
31
31
 
32
32
  # Map providers to their environment variable names
33
33
  ENV_VARS = {
34
- APIProvider.OPENAI: "OPENAI_API_KEY",
35
- APIProvider.GROQ: "GROQ_API_KEY",
36
- APIProvider.ANTHROPIC: "ANTHROPIC_API_KEY",
34
+ LLMProvider.OPENAI: "OPENAI_API_KEY",
35
+ LLMProvider.GROQ: "GROQ_API_KEY",
36
+ LLMProvider.ANTHROPIC: "ANTHROPIC_API_KEY",
37
37
  }
38
38
 
39
39
 
@@ -47,10 +47,9 @@ class ComputerAgent(BaseComputerAgent):
47
47
  def __init__(
48
48
  self,
49
49
  computer: Computer,
50
- loop_type: AgenticLoop = AgenticLoop.OMNI,
51
- ai_provider: APIProvider = APIProvider.OPENAI,
50
+ loop: AgentLoop = AgentLoop.OMNI,
51
+ model: Optional[Union[LLM, Dict[str, str], str]] = None,
52
52
  api_key: Optional[str] = None,
53
- model: Optional[str] = None,
54
53
  save_trajectory: bool = True,
55
54
  trajectory_dir: Optional[str] = "trajectories",
56
55
  only_n_most_recent_images: Optional[int] = None,
@@ -62,10 +61,13 @@ class ComputerAgent(BaseComputerAgent):
62
61
 
63
62
  Args:
64
63
  computer: Computer instance to control
65
- loop_type: The type of loop to use (Anthropic or Omni)
66
- ai_provider: AI provider to use (required for Cua loop)
64
+ loop: The type of loop to use (Anthropic or Omni)
65
+ model: LLM configuration. Can be:
66
+ - LLM object with provider and name
67
+ - Dict with 'provider' and 'name' keys
68
+ - String with model name (defaults to OpenAI provider)
69
+ - None (defaults based on loop)
67
70
  api_key: Optional API key (will use environment variable if not provided)
68
- model: Optional model name (will use provider default if not specified)
69
71
  save_trajectory: Whether to save screenshots and logs
70
72
  trajectory_dir: Directory to save trajectories (defaults to "trajectories")
71
73
  only_n_most_recent_images: Limit history to N most recent images
@@ -87,8 +89,7 @@ class ComputerAgent(BaseComputerAgent):
87
89
  **kwargs,
88
90
  )
89
91
 
90
- self.loop_type = loop_type
91
- self.provider = ai_provider
92
+ self.loop_type = loop
92
93
  self.save_trajectory = save_trajectory
93
94
  self.trajectory_dir = trajectory_dir
94
95
  self.only_n_most_recent_images = only_n_most_recent_images
@@ -98,14 +99,19 @@ class ComputerAgent(BaseComputerAgent):
98
99
  # Configure logging based on verbosity
99
100
  self._configure_logging(verbosity)
100
101
 
102
+ # Process model configuration
103
+ self.model_config = self._process_model_config(model, loop)
104
+
101
105
  # Get API key from environment if not provided
102
106
  if api_key is None:
103
107
  env_var = (
104
- ENV_VARS.get(ai_provider) if loop_type == AgenticLoop.OMNI else "ANTHROPIC_API_KEY"
108
+ ENV_VARS.get(self.model_config.provider)
109
+ if loop == AgentLoop.OMNI
110
+ else "ANTHROPIC_API_KEY"
105
111
  )
106
112
  if not env_var:
107
113
  raise ValueError(
108
- f"Unsupported provider: {ai_provider}. Please use one of: {list(ENV_VARS.keys())}"
114
+ f"Unsupported provider: {self.model_config.provider}. Please use one of: {list(ENV_VARS.keys())}"
109
115
  )
110
116
 
111
117
  api_key = os.environ.get(env_var)
@@ -119,18 +125,49 @@ class ComputerAgent(BaseComputerAgent):
119
125
  )
120
126
  self.api_key = api_key
121
127
 
122
- # Set model based on provider if not specified
123
- if model is None:
124
- if loop_type == AgenticLoop.OMNI:
125
- self.model = DEFAULT_MODELS[ai_provider]
126
- else: # Anthropic loop
127
- self.model = DEFAULT_MODELS[APIProvider.ANTHROPIC]
128
- else:
129
- self.model = model
130
-
131
128
  # Initialize the appropriate loop based on loop_type
132
129
  self.loop = self._init_loop()
133
130
 
131
+ def _process_model_config(
132
+ self, model_input: Optional[Union[LLM, Dict[str, str], str]], loop: AgentLoop
133
+ ) -> LLM:
134
+ """Process and normalize model configuration.
135
+
136
+ Args:
137
+ model_input: Input model configuration (LLM, dict, string, or None)
138
+ loop: The loop type being used
139
+
140
+ Returns:
141
+ Normalized LLM instance
142
+ """
143
+ # Handle case where model_input is None
144
+ if model_input is None:
145
+ # Use Anthropic for Anthropic loop, OpenAI for Omni loop
146
+ default_provider = (
147
+ LLMProvider.ANTHROPIC if loop == AgentLoop.ANTHROPIC else LLMProvider.OPENAI
148
+ )
149
+ return LLM(provider=default_provider)
150
+
151
+ # Handle case where model_input is already a LLM or one of its aliases
152
+ if isinstance(model_input, (LLM, Model, LLMModel)):
153
+ return model_input
154
+
155
+ # Handle case where model_input is a dict
156
+ if isinstance(model_input, dict):
157
+ provider = model_input.get("provider", LLMProvider.OPENAI)
158
+ if isinstance(provider, str):
159
+ provider = LLMProvider(provider)
160
+ return LLM(provider=provider, name=model_input.get("name"))
161
+
162
+ # Handle case where model_input is a string (model name)
163
+ if isinstance(model_input, str):
164
+ default_provider = (
165
+ LLMProvider.ANTHROPIC if loop == AgentLoop.ANTHROPIC else LLMProvider.OPENAI
166
+ )
167
+ return LLM(provider=default_provider, name=model_input)
168
+
169
+ raise ValueError(f"Unsupported model configuration: {model_input}")
170
+
134
171
  def _configure_logging(self, verbosity: int):
135
172
  """Configure logging based on verbosity level."""
136
173
  # Use the logging level directly without mapping
@@ -159,12 +196,15 @@ class ComputerAgent(BaseComputerAgent):
159
196
  from ..providers.omni.loop import OmniLoop
160
197
  from ..providers.omni.parser import OmniParser
161
198
 
162
- if self.loop_type == AgenticLoop.ANTHROPIC:
199
+ if self.loop_type == AgentLoop.ANTHROPIC:
163
200
  from ..providers.anthropic.loop import AnthropicLoop
164
201
 
202
+ # Ensure we always have a valid model name
203
+ model_name = self.model_config.name or DEFAULT_MODELS[LLMProvider.ANTHROPIC]
204
+
165
205
  return AnthropicLoop(
166
206
  api_key=self.api_key,
167
- model=self.model,
207
+ model=model_name,
168
208
  computer=self.computer,
169
209
  save_trajectory=self.save_trajectory,
170
210
  base_dir=self.trajectory_dir,
@@ -176,10 +216,13 @@ class ComputerAgent(BaseComputerAgent):
176
216
  if "parser" not in self._kwargs:
177
217
  self._kwargs["parser"] = OmniParser()
178
218
 
219
+ # Ensure we always have a valid model name
220
+ model_name = self.model_config.name or DEFAULT_MODELS[self.model_config.provider]
221
+
179
222
  return OmniLoop(
180
- provider=self.provider,
223
+ provider=self.model_config.provider,
181
224
  api_key=self.api_key,
182
- model=self.model,
225
+ model=model_name,
183
226
  computer=self.computer,
184
227
  save_trajectory=self.save_trajectory,
185
228
  base_dir=self.trajectory_dir,
@@ -198,7 +241,7 @@ class ComputerAgent(BaseComputerAgent):
198
241
  """
199
242
  try:
200
243
  # Format the messages based on loop type
201
- if self.loop_type == AgenticLoop.ANTHROPIC:
244
+ if self.loop_type == AgentLoop.ANTHROPIC:
202
245
  # Anthropic format
203
246
  messages = [{"role": "user", "content": [{"type": "text", "text": task}]}]
204
247
  else:
@@ -221,7 +264,7 @@ class ComputerAgent(BaseComputerAgent):
221
264
  continue
222
265
 
223
266
  # Extract content and metadata based on loop type
224
- if self.loop_type == AgenticLoop.ANTHROPIC:
267
+ if self.loop_type == AgentLoop.ANTHROPIC:
225
268
  # Handle Anthropic format
226
269
  if "content" in result:
227
270
  content_text = ""
@@ -37,6 +37,17 @@ class BaseMessageManager:
37
37
  if self.image_retention_config.min_removal_threshold < 1:
38
38
  raise ValueError("min_removal_threshold must be at least 1")
39
39
 
40
+ # Track provider for message formatting
41
+ self.provider = "openai" # Default provider
42
+
43
+ def set_provider(self, provider: str) -> None:
44
+ """Set the current provider to format messages for.
45
+
46
+ Args:
47
+ provider: Provider name (e.g., 'openai', 'anthropic')
48
+ """
49
+ self.provider = provider.lower()
50
+
40
51
  def prepare_messages(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
41
52
  """Prepare messages by applying image retention and caching as configured.
42
53
 
@@ -96,6 +107,10 @@ class BaseMessageManager:
96
107
  Args:
97
108
  messages: Messages to inject caching into
98
109
  """
110
+ # Only apply cache_control for Anthropic API, not OpenAI
111
+ if self.provider != "anthropic":
112
+ return
113
+
99
114
  # Default to caching last 3 turns
100
115
  turns_to_cache = 3
101
116
  for message in reversed(messages):
@@ -1,6 +1,6 @@
1
1
  """Anthropic provider implementation."""
2
2
 
3
3
  from .loop import AnthropicLoop
4
- from .types import APIProvider
4
+ from .types import LLMProvider
5
5
 
6
- __all__ = ["AnthropicLoop", "APIProvider"]
6
+ __all__ = ["AnthropicLoop", "LLMProvider"]
@@ -3,25 +3,28 @@ import httpx
3
3
  import asyncio
4
4
  from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex
5
5
  from anthropic.types.beta import BetaMessage, BetaMessageParam, BetaToolUnionParam
6
- from ..types import APIProvider
6
+ from ..types import LLMProvider
7
7
  from .logging import log_api_interaction
8
8
  import random
9
9
  import logging
10
10
 
11
11
  logger = logging.getLogger(__name__)
12
12
 
13
+
13
14
  class APIConnectionError(Exception):
14
15
  """Error raised when there are connection issues with the API."""
16
+
15
17
  pass
16
18
 
19
+
17
20
  class BaseAnthropicClient:
18
21
  """Base class for Anthropic API clients."""
19
-
22
+
20
23
  MAX_RETRIES = 10
21
24
  INITIAL_RETRY_DELAY = 1.0
22
25
  MAX_RETRY_DELAY = 60.0
23
26
  JITTER_FACTOR = 0.1
24
-
27
+
25
28
  async def create_message(
26
29
  self,
27
30
  *,
@@ -36,79 +39,67 @@ class BaseAnthropicClient:
36
39
 
37
40
  async def _make_api_call_with_retries(self, api_call):
38
41
  """Make an API call with exponential backoff retry logic.
39
-
42
+
40
43
  Args:
41
44
  api_call: Async function that makes the actual API call
42
-
45
+
43
46
  Returns:
44
47
  API response
45
-
48
+
46
49
  Raises:
47
50
  APIConnectionError: If all retries fail
48
51
  """
49
52
  retry_count = 0
50
53
  last_error = None
51
-
54
+
52
55
  while retry_count < self.MAX_RETRIES:
53
56
  try:
54
57
  return await api_call()
55
58
  except Exception as e:
56
59
  last_error = e
57
60
  retry_count += 1
58
-
61
+
59
62
  if retry_count == self.MAX_RETRIES:
60
63
  break
61
-
64
+
62
65
  # Calculate delay with exponential backoff and jitter
63
66
  delay = min(
64
- self.INITIAL_RETRY_DELAY * (2 ** (retry_count - 1)),
65
- self.MAX_RETRY_DELAY
67
+ self.INITIAL_RETRY_DELAY * (2 ** (retry_count - 1)), self.MAX_RETRY_DELAY
66
68
  )
67
69
  # Add jitter to avoid thundering herd
68
70
  jitter = delay * self.JITTER_FACTOR * (2 * random.random() - 1)
69
71
  final_delay = delay + jitter
70
-
72
+
71
73
  logger.info(
72
74
  f"Retrying request (attempt {retry_count}/{self.MAX_RETRIES}) "
73
75
  f"in {final_delay:.2f} seconds after error: {str(e)}"
74
76
  )
75
77
  await asyncio.sleep(final_delay)
76
-
78
+
77
79
  raise APIConnectionError(
78
- f"Failed after {self.MAX_RETRIES} retries. "
79
- f"Last error: {str(last_error)}"
80
+ f"Failed after {self.MAX_RETRIES} retries. " f"Last error: {str(last_error)}"
80
81
  )
81
82
 
83
+
82
84
  class AnthropicDirectClient(BaseAnthropicClient):
83
85
  """Direct Anthropic API client implementation."""
84
-
86
+
85
87
  def __init__(self, api_key: str, model: str):
86
88
  self.model = model
87
- self.client = Anthropic(
88
- api_key=api_key,
89
- http_client=self._create_http_client()
90
- )
91
-
89
+ self.client = Anthropic(api_key=api_key, http_client=self._create_http_client())
90
+
92
91
  def _create_http_client(self) -> httpx.Client:
93
92
  """Create an HTTP client with appropriate settings."""
94
93
  return httpx.Client(
95
94
  verify=True,
96
- timeout=httpx.Timeout(
97
- connect=30.0,
98
- read=300.0,
99
- write=30.0,
100
- pool=30.0
101
- ),
95
+ timeout=httpx.Timeout(connect=30.0, read=300.0, write=30.0, pool=30.0),
102
96
  transport=httpx.HTTPTransport(
103
97
  retries=3,
104
98
  verify=True,
105
- limits=httpx.Limits(
106
- max_keepalive_connections=5,
107
- max_connections=10
108
- )
109
- )
99
+ limits=httpx.Limits(max_keepalive_connections=5, max_connections=10),
100
+ ),
110
101
  )
111
-
102
+
112
103
  async def create_message(
113
104
  self,
114
105
  *,
@@ -119,6 +110,7 @@ class AnthropicDirectClient(BaseAnthropicClient):
119
110
  betas: list[str],
120
111
  ) -> BetaMessage:
121
112
  """Create a message using the direct Anthropic API with retry logic."""
113
+
122
114
  async def api_call():
123
115
  response = self.client.beta.messages.with_raw_response.create(
124
116
  max_tokens=max_tokens,
@@ -130,20 +122,21 @@ class AnthropicDirectClient(BaseAnthropicClient):
130
122
  )
131
123
  log_api_interaction(response.http_response.request, response.http_response, None)
132
124
  return response.parse()
133
-
125
+
134
126
  try:
135
127
  return await self._make_api_call_with_retries(api_call)
136
128
  except Exception as e:
137
129
  log_api_interaction(None, None, e)
138
130
  raise
139
131
 
132
+
140
133
  class AnthropicVertexClient(BaseAnthropicClient):
141
134
  """Google Cloud Vertex AI implementation of Anthropic client."""
142
-
135
+
143
136
  def __init__(self, model: str):
144
137
  self.model = model
145
138
  self.client = AnthropicVertex()
146
-
139
+
147
140
  async def create_message(
148
141
  self,
149
142
  *,
@@ -154,6 +147,7 @@ class AnthropicVertexClient(BaseAnthropicClient):
154
147
  betas: list[str],
155
148
  ) -> BetaMessage:
156
149
  """Create a message using Vertex AI with retry logic."""
150
+
157
151
  async def api_call():
158
152
  response = self.client.beta.messages.with_raw_response.create(
159
153
  max_tokens=max_tokens,
@@ -165,20 +159,21 @@ class AnthropicVertexClient(BaseAnthropicClient):
165
159
  )
166
160
  log_api_interaction(response.http_response.request, response.http_response, None)
167
161
  return response.parse()
168
-
162
+
169
163
  try:
170
164
  return await self._make_api_call_with_retries(api_call)
171
165
  except Exception as e:
172
166
  log_api_interaction(None, None, e)
173
167
  raise
174
168
 
169
+
175
170
  class AnthropicBedrockClient(BaseAnthropicClient):
176
171
  """AWS Bedrock implementation of Anthropic client."""
177
-
172
+
178
173
  def __init__(self, model: str):
179
174
  self.model = model
180
175
  self.client = AnthropicBedrock()
181
-
176
+
182
177
  async def create_message(
183
178
  self,
184
179
  *,
@@ -189,6 +184,7 @@ class AnthropicBedrockClient(BaseAnthropicClient):
189
184
  betas: list[str],
190
185
  ) -> BetaMessage:
191
186
  """Create a message using AWS Bedrock with retry logic."""
187
+
192
188
  async def api_call():
193
189
  response = self.client.beta.messages.with_raw_response.create(
194
190
  max_tokens=max_tokens,
@@ -200,23 +196,24 @@ class AnthropicBedrockClient(BaseAnthropicClient):
200
196
  )
201
197
  log_api_interaction(response.http_response.request, response.http_response, None)
202
198
  return response.parse()
203
-
199
+
204
200
  try:
205
201
  return await self._make_api_call_with_retries(api_call)
206
202
  except Exception as e:
207
203
  log_api_interaction(None, None, e)
208
204
  raise
209
205
 
206
+
210
207
  class AnthropicClientFactory:
211
208
  """Factory for creating appropriate Anthropic client implementations."""
212
-
209
+
213
210
  @staticmethod
214
- def create_client(provider: APIProvider, api_key: str, model: str) -> BaseAnthropicClient:
211
+ def create_client(provider: LLMProvider, api_key: str, model: str) -> BaseAnthropicClient:
215
212
  """Create an appropriate client based on the provider."""
216
- if provider == APIProvider.ANTHROPIC:
213
+ if provider == LLMProvider.ANTHROPIC:
217
214
  return AnthropicDirectClient(api_key, model)
218
- elif provider == APIProvider.VERTEX:
215
+ elif provider == LLMProvider.VERTEX:
219
216
  return AnthropicVertexClient(model)
220
- elif provider == APIProvider.BEDROCK:
217
+ elif provider == LLMProvider.BEDROCK:
221
218
  return AnthropicBedrockClient(model)
222
- raise ValueError(f"Unsupported provider: {provider}")
219
+ raise ValueError(f"Unsupported provider: {provider}")
@@ -32,7 +32,7 @@ from .tools.manager import ToolManager
32
32
  from .messages.manager import MessageManager
33
33
  from .callbacks.manager import CallbackManager
34
34
  from .prompts import SYSTEM_PROMPT
35
- from .types import APIProvider
35
+ from .types import LLMProvider
36
36
  from .tools import ToolResult
37
37
 
38
38
  # Constants
@@ -86,7 +86,7 @@ class AnthropicLoop(BaseLoop):
86
86
  self.model = "claude-3-7-sonnet-20250219"
87
87
 
88
88
  # Anthropic-specific attributes
89
- self.provider = APIProvider.ANTHROPIC
89
+ self.provider = LLMProvider.ANTHROPIC
90
90
  self.client = None
91
91
  self.retry_count = 0
92
92
  self.tool_manager = None