PyPI - cua-mcp-server - Versions diffs - 0.1.7__py3-none-any.whl → 0.1.9__py3-none-any.whl - Mend

cua-mcp-server 0.1.7__py3-none-any.whl → 0.1.9__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of cua-mcp-server might be problematic.

Files changed (6)

{cua_mcp_server-0.1.7.dist-info → cua_mcp_server-0.1.9.dist-info}/METADATA RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: cua-mcp-server
-Version: 0.1.7
+Version: 0.1.9
 Summary: MCP Server for Computer-Use Agent (CUA)
 Author-Email: TryCua <gh@trycua.com>
 Requires-Python: >=3.10
@@ -29,6 +29,17 @@ Description-Content-Type: text/markdown
 **cua-mcp-server** is a MCP server for the Computer-Use Agent (CUA), allowing you to run CUA through Claude Desktop or other MCP clients.
 ### Get started with Agent
+## Prerequisites
+Before installing the MCP server, you'll need to set up the full Computer-Use Agent capabilities as described in [Option 2 of the main README](../../README.md#option-2-full-computer-use-agent-capabilities). This includes:
+1. Installing the Lume CLI
+2. Pulling the latest macOS CUA image
+3. Starting the Lume daemon service
+4. Installing the required Python libraries (Optional: only needed if you want to verify the agent is working before installing MCP server)
+Make sure these steps are completed and working before proceeding with the MCP server installation.
 ## Installation
 Install the package from PyPI:
@@ -68,13 +79,51 @@ You can then use the script in your MCP configuration like this:
         "CUA_AGENT_LOOP": "OMNI",
         "CUA_MODEL_PROVIDER": "ANTHROPIC",
         "CUA_MODEL_NAME": "claude-3-7-sonnet-20250219",
-        "ANTHROPIC_API_KEY": "your-api-key"
+        "CUA_PROVIDER_API_KEY": "your-api-key"
+      }
+    }
+  }
+}
+```
+## Development Guide
+If you want to develop with the cua-mcp-server directly without installation, you can use this configuration:
+```json
+{
+  "mcpServers": {
+    "cua-agent": {
+      "command": "/bin/bash",
+      "args": ["~/cua/libs/mcp-server/scripts/start_mcp_server.sh"],
+      "env": {
+        "CUA_AGENT_LOOP": "UITARS",
+        "CUA_MODEL_PROVIDER": "OAICOMPAT",
+        "CUA_MODEL_NAME": "ByteDance-Seed/UI-TARS-1.5-7B",
+        "CUA_PROVIDER_BASE_URL": "https://****************.us-east-1.aws.endpoints.huggingface.cloud/v1",
+        "CUA_PROVIDER_API_KEY": "your-api-key"
       }
     }
   }
 }
 ```
+This configuration:
+- Uses the start_mcp_server.sh script which automatically sets up the Python path and runs the server module
+- Works with Claude Desktop, Cursor, or any other MCP client
+- Automatically uses your development code without requiring installation
+Just add this to your MCP client's configuration and it will use your local development version of the server.
+### Troubleshooting
+If you get a `/bin/bash: ~/cua/libs/mcp-server/scripts/start_mcp_server.sh: No such file or directory` error, try changing the path to the script to be absolute instead of relative.
+To see the logs:
+```
+tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
+```
 ## Claude Desktop Integration
 To use with Claude Desktop, add an entry to your Claude Desktop configuration (`claude_desktop_config.json`, typically found in `~/.config/claude-desktop/`):
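
The troubleshooting note added in this hunk suggests switching to an absolute script path. A likely cause is that the MCP client passes `args` to `/bin/bash` without shell expansion, so `~` reaches bash as a literal character. The sketch below builds the same development configuration with the path expanded up front; `dev_server_config` is a hypothetical helper for illustration, not part of the package.

```python
import json
import os


def dev_server_config(script_path: str, env: dict) -> dict:
    """Build an MCP client config entry with an absolute script path.

    Hypothetical helper: expands a leading "~" and resolves the result to an
    absolute path, so nothing depends on the client performing shell expansion.
    """
    absolute = os.path.abspath(os.path.expanduser(script_path))
    return {
        "mcpServers": {
            "cua-agent": {
                "command": "/bin/bash",
                "args": [absolute],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    config = dev_server_config(
        "~/cua/libs/mcp-server/scripts/start_mcp_server.sh",
        {"CUA_AGENT_LOOP": "UITARS", "CUA_PROVIDER_API_KEY": "your-api-key"},
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's configuration file in place of the `~`-based entry.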
@@ -104,7 +153,7 @@ The server is configured using environment variables (can be set in the Claude D
 | Variable | Description | Default |
 |----------|-------------|---------|
-| `CUA_AGENT_LOOP` | Agent loop to use (OPENAI, ANTHROPIC, OMNI) | OMNI |
+| `CUA_AGENT_LOOP` | Agent loop to use (OPENAI, ANTHROPIC, UITARS, OMNI) | OMNI |
 | `CUA_MODEL_PROVIDER` | Model provider (ANTHROPIC, OPENAI, OLLAMA, OAICOMPAT) | ANTHROPIC |
 | `CUA_MODEL_NAME` | Model name to use | None (provider default) |
 | `CUA_PROVIDER_BASE_URL` | Base URL for provider API | None |
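
As a reading aid for the table above, here is a minimal sketch of how the four listed variables and their documented defaults could be collected in one place. `CuaSettings` and `load_settings` are hypothetical names introduced for illustration; the package itself reads the variables inline with `os.getenv`, as the `server.py` diff shows.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CuaSettings:
    """Hypothetical container for the variables in the table above."""

    agent_loop: str
    model_provider: str
    model_name: Optional[str]
    provider_base_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> CuaSettings:
    """Read the documented variables, applying the documented defaults."""
    return CuaSettings(
        agent_loop=env.get("CUA_AGENT_LOOP", "OMNI"),
        model_provider=env.get("CUA_MODEL_PROVIDER", "ANTHROPIC"),
        model_name=env.get("CUA_MODEL_NAME"),  # None means provider default
        provider_base_url=env.get("CUA_PROVIDER_BASE_URL"),
    )


if __name__ == "__main__":
    print(load_settings({"CUA_AGENT_LOOP": "UITARS"}))
```

Passing a plain mapping instead of `os.environ` makes it easy to check a client configuration's `env` block before launching the server.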

cua_mcp_server-0.1.9.dist-info/RECORD ADDED

@@ -0,0 +1,7 @@
+cua_mcp_server-0.1.9.dist-info/METADATA,sha256=Bw2ET7kbetLRmVqVkyWwZc91NoDSXjn9qNic6pS7T7I,6668
+cua_mcp_server-0.1.9.dist-info/WHEEL,sha256=tSfRZzRHthuv7vxpI4aehrdN9scLjk-dCJkPLzkHxGg,90
+cua_mcp_server-0.1.9.dist-info/entry_points.txt,sha256=Y3uEunDRfoc-RUDS3HnD942RCxYKquiyk-2HRSqphoc,74
+mcp_server/__init__.py,sha256=G5Bps3KxzYfH79B1TDVQI9vbzjamC_mdgi7GJMgbVcA,575
+mcp_server/__main__.py,sha256=BE2ManEiNpz56nqc7Z_asNjQ6TPtvyu5AbWbyJFePnM,132
+mcp_server/server.py,sha256=nV0aNGymSUB1BjwVzS1snUH2phfbVrP3Bl_P_Y4HWII,7907
+cua_mcp_server-0.1.9.dist-info/RECORD,,
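
Each RECORD line above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the SHA-256 hash with trailing `=` padding removed. The sketch below reproduces that format for arbitrary bytes, which is one way to check an unpacked file against its RECORD entry; `record_entry` is a hypothetical helper name.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Format one wheel RECORD line for the given file contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 without the trailing "=" padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


if __name__ == "__main__":
    # Illustrative bytes only; not the real contents of any file in the wheel.
    print(record_entry("mcp_server/__main__.py", b"print('hi')\n"))
```

Comparing the output for the real file contents against the listed line shows whether a file matches what the wheel declares.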

mcp_server/server.py CHANGED

@@ -1,9 +1,10 @@
 import asyncio
+import base64
 import logging
 import os
 import sys
 import traceback
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Union, Tuple
 # Configure logging to output to stderr for debug visibility
 logging.basicConfig(
@@ -17,7 +18,7 @@ logger = logging.getLogger("mcp-server")
 logger.debug("MCP Server module loading...")
 try:
-    from mcp.server.fastmcp import Context, FastMCP
+    from mcp.server.fastmcp import Context, FastMCP, Image
     logger.debug("Successfully imported FastMCP")
 except ImportError as e:
@@ -49,16 +50,37 @@ def serve() -> FastMCP:
     server = FastMCP("cua-agent")
     @server.tool()
-    async def run_cua_task(ctx: Context, task: str) -> str:
+    async def screenshot_cua(ctx: Context) -> Image:
         """
-        Run a Computer-Use Agent (CUA) task and return the results.
+        Take a screenshot of the current MacOS VM screen and return the image. Use this before running a CUA task to get a snapshot of the current state.
+        Args:
+            ctx: The MCP context
+        Returns:
+            An image resource containing the screenshot
+        """
+        global global_computer
+        if global_computer is None:
+            global_computer = Computer(verbosity=logging.INFO)
+            await global_computer.run()
+        screenshot = await global_computer.interface.screenshot()
+        return Image(
+            format="png",
+            data=screenshot
+        )
+    @server.tool()
+    async def run_cua_task(ctx: Context, task: str) -> Tuple[str, Image]:
+        """
+        Run a Computer-Use Agent (CUA) task in a MacOS VM and return the results.
         Args:
             ctx: The MCP context
             task: The instruction or task for the agent to perform
         Returns:
-            A string containing the agent's response
+            A tuple containing the agent's response and the final screenshot
         """
         global global_computer
@@ -72,12 +94,7 @@ def serve() -> FastMCP:
             # Determine which loop to use
             loop_str = os.getenv("CUA_AGENT_LOOP", "OMNI")
-            if loop_str == "OPENAI":
-                loop = AgentLoop.OPENAI
-            elif loop_str == "ANTHROPIC":
-                loop = AgentLoop.ANTHROPIC
-            else:
-                loop = AgentLoop.OMNI
+            loop = getattr(AgentLoop, loop_str)
             # Determine provider
             provider_str = os.getenv("CUA_MODEL_PROVIDER", "ANTHROPIC")
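
One behavioral consequence of this hunk: the removed `if`/`elif` chain fell back to `AgentLoop.OMNI` for any unrecognized value, whereas `getattr` on an enum class raises `AttributeError` for a name that is not a member. The sketch below illustrates the difference and one possible way to keep the fallback. The `AgentLoop` class here is a stand-in defined only for this example, and `resolve_loop` is a hypothetical helper; neither is taken from the package.

```python
from enum import Enum, auto


class AgentLoop(Enum):
    """Stand-in enum for illustration; the real one lives in the agent library."""

    OPENAI = auto()
    ANTHROPIC = auto()
    OMNI = auto()
    UITARS = auto()


def resolve_loop(name: str, default: AgentLoop = AgentLoop.OMNI) -> AgentLoop:
    """Look up a member by name, falling back to a default for unknown names."""
    try:
        # Subscript lookup only matches enum members, unlike getattr.
        return AgentLoop[name.strip().upper()]
    except KeyError:
        return default


if __name__ == "__main__":
    print(resolve_loop("UITARS"))
    print(resolve_loop("not-a-loop"))
    try:
        getattr(AgentLoop, "NOT_A_LOOP")
    except AttributeError as exc:
        print(f"getattr raised: {exc!r}")
```

Whether a silent fallback or a hard failure is preferable is a design choice; failing loudly surfaces a misspelled `CUA_AGENT_LOOP` sooner.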
@@ -89,6 +106,9 @@ def serve() -> FastMCP:
             # Get base URL for provider (if needed)
             provider_base_url = os.getenv("CUA_PROVIDER_BASE_URL", None)
+            # Get api key for provider (if needed)
+            api_key = os.getenv("CUA_PROVIDER_API_KEY", None)
             # Create agent with the specified configuration
             agent = ComputerAgent(
                 computer=global_computer,
@@ -98,6 +118,7 @@ def serve() -> FastMCP:
                     name=model_name,
                     provider_base_url=provider_base_url,
                 ),
+                api_key=api_key,
                 save_trajectory=False,
                 only_n_most_recent_images=int(os.getenv("CUA_MAX_IMAGES", "3")),
                 verbosity=logging.INFO,
@@ -107,33 +128,34 @@ def serve() -> FastMCP:
             full_result = ""
             async for result in agent.run(task):
                 logger.info(f"Agent step complete: {result.get('id', 'unknown')}")
+                ctx.info(f"Agent step complete: {result.get('id', 'unknown')}")
                 # Add response ID to output
                 full_result += f"\n[Response ID: {result.get('id', 'unknown')}]\n"
-                # Extract and concatenate text responses
-                if "text" in result:
-                    # Handle both string and dict responses
-                    text_response = result.get("text", "")
-                    if isinstance(text_response, str):
-                        full_result += f"Response: {text_response}\n"
-                    else:
-                        # If it's a dict or other structure, convert to string representation
-                        full_result += f"Response: {str(text_response)}\n"
-                # Log detailed information
-                if "tools" in result:
-                    tools_info = result.get("tools")
-                    logger.debug(f"Tools used: {tools_info}")
-                    full_result += f"\nTools used: {tools_info}\n"
+                if "content" in result:
+                    full_result += f"Response: {result.get('content', '')}\n"
                 # Process output if available
                 outputs = result.get("output", [])
                 for output in outputs:
                     output_type = output.get("type")
-                    if output_type == "reasoning":
+                    if output_type == "message":
+                        logger.debug(f"Message: {output}")
+                        content = output.get("content", [])
+                        for content_part in content:
+                            if content_part.get("text"):
+                                full_result += f"\nMessage: {content_part.get('text', '')}\n"
+                    elif output_type == "reasoning":
                         logger.debug(f"Reasoning: {output}")
-                        full_result += f"\nReasoning: {output.get('content', '')}\n"
+                        summary_content = output.get("summary", [])
+                        if summary_content:
+                            for summary_part in summary_content:
+                                if summary_part.get("text"):
+                                    full_result += f"\nReasoning: {summary_part.get('text', '')}\n"
+                        else:
+                            full_result += f"\nReasoning: {output.get('text', output.get('content', ''))}\n"
                     elif output_type == "computer_call":
                         logger.debug(f"Computer call: {output}")
                         action = output.get("action", "")
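
The new parsing logic in this hunk can be read as a small pure function over the `output` list: `message` items contribute the text of each content part, and `reasoning` items contribute their summary parts, with a fallback to `text` or `content` when no summary exists. The sketch below mirrors that logic in isolation; `summarize_outputs` is a hypothetical name, and the sample dictionaries are invented to match the keys the diff reads, not captured from a real agent run.

```python
from typing import Any, Dict, List


def summarize_outputs(outputs: List[Dict[str, Any]]) -> str:
    """Flatten message and reasoning items into labeled text lines."""
    lines: List[str] = []
    for output in outputs:
        kind = output.get("type")
        if kind == "message":
            for part in output.get("content", []):
                if part.get("text"):
                    lines.append(f"Message: {part['text']}")
        elif kind == "reasoning":
            summary = output.get("summary", [])
            if summary:
                for part in summary:
                    if part.get("text"):
                        lines.append(f"Reasoning: {part['text']}")
            else:
                # Same fallback order as the diff: "text", then "content".
                fallback = output.get("text", output.get("content", ""))
                lines.append(f"Reasoning: {fallback}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"type": "reasoning", "summary": [{"text": "Open the browser first."}]},
        {"type": "message", "content": [{"text": "Browser is open."}]},
        {"type": "reasoning", "text": "No summary available."},
    ]
    print(summarize_outputs(sample))
```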
@@ -144,17 +166,25 @@ def serve() -> FastMCP:
                 full_result += "\n" + "-" * 40 + "\n"
             logger.info(f"CUA task completed successfully")
-            return full_result or "Task completed with no text output."
+            ctx.info(f"CUA task completed successfully")
+            return (
+                full_result or "Task completed with no text output.",
+                Image(
+                    format="png",
+                    data=await global_computer.interface.screenshot()
+                )
+            )
         except Exception as e:
             error_msg = f"Error running CUA task: {str(e)}\n{traceback.format_exc()}"
             logger.error(error_msg)
+            ctx.error(error_msg)
             return f"Error during task execution: {str(e)}"
     @server.tool()
-    async def run_multi_cua_tasks(ctx: Context, tasks: List[str]) -> str:
+    async def run_multi_cua_tasks(ctx: Context, tasks: List[str]) -> List:
         """
-        Run multiple CUA tasks in sequence and return the combined results.
+        Run multiple CUA tasks in a MacOS VM in sequence and return the combined results.
         Args:
             ctx: The MCP context
@@ -164,13 +194,15 @@ def serve() -> FastMCP:
             Combined results from all tasks
         """
         results = []
         for i, task in enumerate(tasks):
             logger.info(f"Running task {i+1}/{len(tasks)}: {task}")
-            result = await run_cua_task(ctx, task)
-            results.append(f"Task {i+1}: {task}\nResult: {result}\n")
-        return "\n".join(results)
+            ctx.info(f"Running task {i+1}/{len(tasks)}: {task}")
+            ctx.report_progress(i / len(tasks))
+            results.extend(await run_cua_task(ctx, task))
+            ctx.report_progress((i + 1) / len(tasks))
+        return results
     return server
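
A detail worth checking in the `run_multi_cua_tasks` hunk: `results.extend(...)` flattens the `(text, image)` tuple returned on success, but the error path of `run_cua_task` still returns a plain string, and extending a list with a string adds one element per character. The sketch below demonstrates the difference with placeholder values and shows one possible guard; `collect` is a hypothetical helper, not code from the package.

```python
from typing import Any, List, Tuple, Union


def collect(results: List[Any], outcome: Union[str, Tuple[Any, ...]]) -> None:
    """Add a task outcome to results without splitting strings into characters."""
    if isinstance(outcome, str):
        results.append(outcome)  # error message: keep it as a single item
    else:
        results.extend(outcome)  # (text, image) tuple: flatten into the list


if __name__ == "__main__":
    naive: List[Any] = []
    naive.extend(("done", "<image placeholder>"))
    naive.extend("Err")  # a string is iterated character by character
    print(naive)

    guarded: List[Any] = []
    collect(guarded, ("done", "<image placeholder>"))
    collect(guarded, "Err")
    print(guarded)
```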

cua_mcp_server-0.1.7.dist-info/RECORD DELETED

@@ -1,7 +0,0 @@
-cua_mcp_server-0.1.7.dist-info/METADATA,sha256=tYO69KlAhGJJBgR3hs3qIggKUnqF-fhebU1Aghgnh-Q,4811
-cua_mcp_server-0.1.7.dist-info/WHEEL,sha256=tSfRZzRHthuv7vxpI4aehrdN9scLjk-dCJkPLzkHxGg,90
-cua_mcp_server-0.1.7.dist-info/entry_points.txt,sha256=Y3uEunDRfoc-RUDS3HnD942RCxYKquiyk-2HRSqphoc,74
-mcp_server/__init__.py,sha256=G5Bps3KxzYfH79B1TDVQI9vbzjamC_mdgi7GJMgbVcA,575
-mcp_server/__main__.py,sha256=BE2ManEiNpz56nqc7Z_asNjQ6TPtvyu5AbWbyJFePnM,132
-mcp_server/server.py,sha256=RdM0kytzt8uF-vbqPXQ3oay-jtGhum4k_Z0jTDZmfoc,6547
-cua_mcp_server-0.1.7.dist-info/RECORD,,

{cua_mcp_server-0.1.7.dist-info → cua_mcp_server-0.1.9.dist-info}/WHEEL RENAMED

File without changes

{cua_mcp_server-0.1.7.dist-info → cua_mcp_server-0.1.9.dist-info}/entry_points.txt RENAMED

File without changes
