PyPI - hud-python - Versions diffs - 0.4.35__tar.gz → 0.4.36__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (235)

{hud_python-0.4.35 → hud_python-0.4.36}/.gitignore RENAMED

@@ -22,7 +22,6 @@ uv.lock
 # Test files
 /*.ipynb
-test.json
 TODO.md
 .coverage

{hud_python-0.4.35 → hud_python-0.4.36}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.35
+Version: 0.4.36
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -40,7 +40,7 @@ Requires-Dist: datasets>=2.14.0
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
-Requires-Dist: hud-mcp-use-python-sdk>=2.3.16
+Requires-Dist: hud-mcp-use-python-sdk==2.3.19
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: openai
 Requires-Dist: opentelemetry-api>=1.34.1
@@ -50,8 +50,8 @@ Requires-Dist: opentelemetry-sdk>=1.34.1
 Requires-Dist: pathspec>=0.12.1
 Requires-Dist: pillow>=11.1.0
 Requires-Dist: prompt-toolkit==3.0.51
-Requires-Dist: pydantic-settings<3,>=2
-Requires-Dist: pydantic<3,>=2
+Requires-Dist: pydantic-settings<3,>=2.2
+Requires-Dist: pydantic<3,>=2.6
 Requires-Dist: questionary==2.1.0
 Requires-Dist: rich>=13.0.0
 Requires-Dist: toml>=0.10.2
@@ -59,7 +59,9 @@ Requires-Dist: typer>=0.9.0
 Requires-Dist: watchfiles>=0.21.0
 Requires-Dist: wrapt>=1.14.0
 Provides-Extra: agent
+Requires-Dist: aiodocker>=0.24.0; extra == 'agent'
 Requires-Dist: dotenv>=0.9.9; extra == 'agent'
+Requires-Dist: inspect-ai>=0.3.80; extra == 'agent'
 Requires-Dist: ipykernel; extra == 'agent'
 Requires-Dist: ipython<9; extra == 'agent'
 Requires-Dist: jupyter-client; extra == 'agent'
@@ -67,8 +69,21 @@ Requires-Dist: jupyter-core; extra == 'agent'
 Requires-Dist: langchain; extra == 'agent'
 Requires-Dist: langchain-anthropic; extra == 'agent'
 Requires-Dist: langchain-openai; extra == 'agent'
+Requires-Dist: pillow>=11.1.0; extra == 'agent'
+Requires-Dist: playwright; extra == 'agent'
+Requires-Dist: pyautogui>=0.9.54; extra == 'agent'
+Requires-Dist: pyright==1.1.401; extra == 'agent'
+Requires-Dist: pytest-asyncio; extra == 'agent'
+Requires-Dist: pytest-cov; extra == 'agent'
+Requires-Dist: pytest-mock; extra == 'agent'
+Requires-Dist: pytest<9,>=8.1.1; extra == 'agent'
+Requires-Dist: ruff>=0.11.8; extra == 'agent'
+Requires-Dist: setuptools; extra == 'agent'
+Requires-Dist: textdistance<5,>=4.5.0; extra == 'agent'
 Provides-Extra: agents
+Requires-Dist: aiodocker>=0.24.0; extra == 'agents'
 Requires-Dist: dotenv>=0.9.9; extra == 'agents'
+Requires-Dist: inspect-ai>=0.3.80; extra == 'agents'
 Requires-Dist: ipykernel; extra == 'agents'
 Requires-Dist: ipython<9; extra == 'agents'
 Requires-Dist: jupyter-client; extra == 'agents'
@@ -76,6 +91,17 @@ Requires-Dist: jupyter-core; extra == 'agents'
 Requires-Dist: langchain; extra == 'agents'
 Requires-Dist: langchain-anthropic; extra == 'agents'
 Requires-Dist: langchain-openai; extra == 'agents'
+Requires-Dist: pillow>=11.1.0; extra == 'agents'
+Requires-Dist: playwright; extra == 'agents'
+Requires-Dist: pyautogui>=0.9.54; extra == 'agents'
+Requires-Dist: pyright==1.1.401; extra == 'agents'
+Requires-Dist: pytest-asyncio; extra == 'agents'
+Requires-Dist: pytest-cov; extra == 'agents'
+Requires-Dist: pytest-mock; extra == 'agents'
+Requires-Dist: pytest<9,>=8.1.1; extra == 'agents'
+Requires-Dist: ruff>=0.11.8; extra == 'agents'
+Requires-Dist: setuptools; extra == 'agents'
+Requires-Dist: textdistance<5,>=4.5.0; extra == 'agents'
 Provides-Extra: dev
 Requires-Dist: aiodocker>=0.24.0; extra == 'dev'
 Requires-Dist: dotenv>=0.9.9; extra == 'dev'
@@ -100,14 +126,6 @@ Requires-Dist: setuptools; extra == 'dev'
 Requires-Dist: textdistance<5,>=4.5.0; extra == 'dev'
 Provides-Extra: rl
 Requires-Dist: bitsandbytes>=0.41.0; (sys_platform == 'linux') and extra == 'rl'
-Requires-Dist: dotenv>=0.9.9; extra == 'rl'
-Requires-Dist: ipykernel; extra == 'rl'
-Requires-Dist: ipython<9; extra == 'rl'
-Requires-Dist: jupyter-client; extra == 'rl'
-Requires-Dist: jupyter-core; extra == 'rl'
-Requires-Dist: langchain; extra == 'rl'
-Requires-Dist: langchain-anthropic; extra == 'rl'
-Requires-Dist: langchain-openai; extra == 'rl'
 Requires-Dist: liger-kernel>=0.5.0; (sys_platform == 'linux') and extra == 'rl'
 Requires-Dist: peft>=0.17.1; extra == 'rl'
 Requires-Dist: vllm==0.10.1.1; extra == 'rl'
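The extras hunks above are easier to audit programmatically. Below is a minimal sketch that groups `Requires-Dist` strings by the `extra` named in their marker. It uses plain string handling, not a full PEP 508 parser. For an installed package, `importlib.metadata.requires("hud-python")` returns the same kind of strings. The helper name `group_by_extra` is hypothetical. Any non-extra part of a marker, such as `sys_platform == 'linux'`, is simply discarded.

```python
import re
from collections import defaultdict

# Matches extra == 'name' or extra == "name" inside an environment marker.
EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Group requirement strings by extra; base dependencies go under ""."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in requirements:
        # Accept raw PKG-INFO lines as well as importlib.metadata strings.
        text = line.removeprefix("Requires-Dist:").strip()
        spec, _, marker = text.partition(";")
        match = EXTRA_RE.search(marker)
        # Other marker clauses (e.g. sys_platform) are dropped in this sketch.
        groups[match.group(1) if match else ""].append(spec.strip())
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Requires-Dist: hud-mcp-use-python-sdk==2.3.19",
        "Requires-Dist: aiodocker>=0.24.0; extra == 'agent'",
        "Requires-Dist: inspect-ai>=0.3.80; extra == 'agents'",
        "Requires-Dist: bitsandbytes>=0.41.0; (sys_platform == 'linux') and extra == 'rl'",
    ]
    for extra, specs in group_by_extra(sample).items():
        print(extra or "<base>", specs)
```

To compare releases, run it over the `Requires-Dist` lines of both PKG-INFO files and diff the resulting sets per extra.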

hud_python-0.4.36/environments/blank/README.md ADDED

@@ -0,0 +1,92 @@
+# test-test
+## Environment design pattern
+- Controller (Think of this as a frontend in web development)
+  - Creates the UX and manages the lifecycle of an app (in this case for an agent)
+  - Define `mcp = MCPServer()` and register `@mcp.tool` as tools the agent can interact with
+- Environment (Think of this as a backend in web development)
+  - Owns all long‑lived state of the environment and exposes the environment data structure
+  - Expose simple HTTP endpoints (`/health`, `/act`, `/reset`, `/state`)
+IMPORTANT: Make sure all logs go to stderr instead of stdout, which is reserved for MCP communication
+### Interactive Development
+```bash
+# 1. Configure your API keys (optional - only needed for evaluation)
+# Edit .env file to add your HUD_API_KEY and ANTHROPIC_API_KEY
+# 2. Start the environment (optional: with --inspector or --interactive)
+hud dev --build --interactive
+# 3. Choose your preferred way to test:
+# Option A: Run the task with Claude (requires ANTHROPIC_API_KEY)
+hud eval tasks.json --agent claude
+# Option B: Interactive notebook test_env.ipynb (great for learning!)
+# Requires installation:
+pip install hud-python[agents]
+# Option C: Simple Python script (runs all tasks from tasks.json)
+python test_task.py
+```
+## Layout
+```
+controller/
+  __init__.py   # mcp + shared HTTP client
+  __main__.py   # python -m controller → mcp.run()
+  hooks.py      # @mcp.initialize / @mcp.shutdown
+  tools.py      # @mcp.tool act / setup / evaluate
+environment/
+  ├── __init__.py
+  └── server.py       # FastAPI app: /health, /act, /reset, /state
+```
+## Publishing Your Environment
+Once your environment is ready, you can share it with the community:
+### 1. Push to Registry
+```bash
+# Build and push your environment (requires docker hub login and hud api key)
+hud build
+hud push
+```
+### 2. Create a Dataset
+Create a dataset on HuggingFace with your tasks:
+**Option A: Upload manually**
+1. Upload your `tasks.json` to HuggingFace
+2. Make sure it's **public** to appear on leaderboards
+**Option B: Use the SDK**
+```python
+from hud.datasets import save_tasks
+import json
+# Load your tasks
+with open("tasks.json") as f:
+    tasks = json.load(f)
+# Push to HuggingFace
+save_tasks(tasks, repo_id="your-org/your-dataset")
+```
+### 3. Run and Track Performance
+```bash
+# Run Claude on your benchmark
+hud eval "your-org/your-dataset" --agent claude
+# View results at:
+# app.hud.so/leaderboards/your-org/your-dataset
+```
+**Note**: Only public HuggingFace datasets appear as leaderboards!
+📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
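The blank template's rule about logging matters for a specific reason. With MCP's stdio transport, stdout carries the protocol messages, so a stray `print()` or a stdout log handler can corrupt the stream. Below is a minimal standard-library sketch of routing logs to stderr. `configure_logging` is a hypothetical helper, not part of hud-python.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send every log record to stderr, leaving stdout free for MCP messages."""
    logging.basicConfig(
        stream=sys.stderr,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any root handlers configured earlier (Python 3.8+)
    )
    return logging.getLogger()


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("controller").info("starting up")  # written to stderr
    print("debug output", file=sys.stderr)  # ad-hoc prints should target stderr too
```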

hud_python-0.4.36/environments/blank/controller/README.md ADDED

@@ -0,0 +1,16 @@
+# Controller
+Frontend for the agent: defines tools, minimal state, calls the environment over HTTP.
+What to implement
+- Shared client in `__init__.py` (one `httpx.AsyncClient`)
+- Lifecycle in `hooks.py` (`@mcp.initialize`/`@mcp.shutdown`)
+- Tools in `tools.py` (`@mcp.tool`) — keep logic thin; docstrings = descriptions
+Run
+```bash
+hud run controller --transport http --reload
+# Helper endpoints: http://localhost:8765/hud and /hud/tools
+```
+Principle: the controller is UX, not state. Keep long‑lived state in the environment.

hud_python-0.4.36/environments/blank/environment/README.md ADDED

@@ -0,0 +1,16 @@
+# Environment
+Backend service: owns state and exposes HTTP APIs the controller calls.
+Endpoints (FastAPI)
+- `GET /health` → {status: ok}
+- `POST /act` → increments counter and returns {count}
+- `POST /reset` → resets counter
+- `GET /state` → returns {count}
+Run (dev)
+```bash
+uv run uvicorn environment.server:app --reload --port 8005
+```
+Principle: treat like a backend. Keep long‑lived state here; add endpoints as tools need them.
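The four endpoints listed in the environment README can be sketched with the standard library's `http.server` in place of FastAPI, so the example runs without extra dependencies. This is not the template's actual `server.py`, which this diff does not show. `CounterState`, `make_server` and `call_endpoint` are hypothetical names. The `/reset` response body is also an assumption, since the README only says that it resets the counter. `call_endpoint` stands in for what a controller tool does: make one HTTP call to the environment and get JSON back.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CounterState:
    """Long-lived environment state: a single counter guarded by a lock."""

    def __init__(self) -> None:
        self.count = 0
        self.lock = threading.Lock()


STATE = CounterState()


class EnvironmentHandler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler.log_message already writes access logs to stderr.

    def _send_json(self, payload: dict, status: int = 200) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json({"status": "ok"})
        elif self.path == "/state":
            with STATE.lock:
                count = STATE.count
            self._send_json({"count": count})
        else:
            self._send_json({"error": "not found"}, status=404)

    def do_POST(self) -> None:
        # Read (and ignore) any request body before responding.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.path == "/act":
            with STATE.lock:
                STATE.count += 1
                count = STATE.count
            self._send_json({"count": count})
        elif self.path == "/reset":
            with STATE.lock:
                STATE.count = 0
            self._send_json({"count": 0})  # response shape is an assumption
        else:
            self._send_json({"error": "not found"}, status=404)


def make_server(port: int = 8005) -> ThreadingHTTPServer:
    """Bind the environment on localhost; port 0 picks a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), EnvironmentHandler)


def call_endpoint(base_url: str, method: str, path: str) -> dict:
    """Controller-side call: one HTTP request, JSON response decoded."""
    # Ignore any proxy environment variables for localhost calls.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    data = b"" if method == "POST" else None
    request = urllib.request.Request(base_url + path, data=data, method=method)
    with opener.open(request) as response:
        return json.loads(response.read())
```

Start it with `make_server().serve_forever()`. This listens on port 8005, the same port as the uvicorn command above. All counter state stays in the server process, which keeps it consistent with the "keep long-lived state here" principle.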

hud_python-0.4.36/environments/blank/pyproject.toml ADDED

@@ -0,0 +1,19 @@
+[project]
+name = "test_test"
+version = "0.1.0"
+description = "A minimal HUD environment"
+requires-python = ">=3.11"
+dependencies = [ "hud-python==0.4.36", "fastapi", "uvicorn[standard]", "httpx>=0.28.1",]
+[build-system]
+requires = [ "hatchling",]
+build-backend = "hatchling.build"
+[tool.hud]
+image = "test_test:dev"
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.build.targets.wheel]
+packages = [ "controller", "environment",]

{hud_python-0.4.35 → hud_python-0.4.36}/environments/browser/README.md RENAMED

@@ -2,100 +2,99 @@
 A browser automation environment for HUD that provides GUI access and web app interaction capabilities. This environment supports hot-reloading during development while maintaining persistent state.
-## Architecture Overview
+## Quick Start
-The browser environment uses a two-process architecture:
+### Interactive Development
+```bash
+# 1. Configure your API keys (optional - only needed for evaluation)
+# Edit .env file to add your HUD_API_KEY and ANTHROPIC_API_KEY
-1. **Context Server** (`context.py`): Long-running process that maintains persistent state
-2. **MCP Server** (`server.py`): Hot-reloadable process that handles tool requests
+# 2. Start the environment (optional: with inspector)
+hud dev --build --inspector
-### Key Components
+# 3. Choose your preferred way to test:
-- **BrowserContext**: Stores persistent state (running apps, ports, playwright instance)
-- **ServiceManager**: Manages X11, VNC, and app processes
-- **BaseHub Tools**: Setup and evaluate tools organized by app (2048, todo)
-- **Multiprocessing Proxy**: Enables state sharing between processes
+# Option A: Run the task with Claude (requires ANTHROPIC_API_KEY)
+hud eval tasks.json --agent claude
-## Context Management and Common Pitfalls
+# Option B: Interactive notebook test_env.ipynb (great for learning!)
+# Requires installation:
+pip install hud-python[agents]
-### Understanding the Proxy System
+# Option C: Simple Python script (runs all tasks from tasks.json)
+python test_task.py
+```
-The browser environment uses Python's `multiprocessing.Manager` to share state between the context server and MCP server. This introduces important constraints:
+## How HUD Environments Work
-#### ❌ Common Pitfall: Unpicklable Objects
+The environment is split into two components:
-```python
-# BAD: This will fail with "cannot pickle 'coroutine' object"
-@setup.tool("my_tool")
-async def my_tool():
-    env = setup.env
-    result = await env.call_app_api("app", "/api/endpoint")  # Returns coroutine
-    # The coroutine can't be serialized through the proxy!
-```
+- **`env.py`** - Stateful logic that persists across reloads
+- **`server.py`** - MCP server with tools (reloads on file changes)
-#### ✅ Solution: Direct HTTP Calls
+This separation is crucial for `hud dev` - it allows you to modify the MCP tools and see changes immediately without losing the environment state. The environment runs as a separate process and communicates via socket, while the server can be restarted freely.
-```python
-# GOOD: Make HTTP calls directly
-@setup.tool("my_tool")
-async def my_tool():
-    import httpx
-    # Get the backend port from persistent context
-    persistent_ctx = setup.env
-    backend_port = persistent_ctx.get_app_backend_port("app")
-    # Make API call directly
-    url = f"http://localhost:{backend_port}/api/endpoint"
-    async with httpx.AsyncClient() as client:
-        response = await client.get(url)
-        response.raise_for_status()
-        result = response.json()
-```
+If you see issues with the environment itself, run `hud dev --full-reload` to reload both the environment and the server.
-### State Synchronization Issues
+## Publishing Your Environment
-#### ❌ Common Pitfall: Direct List/Dict Manipulation
+Once your environment is ready, you can share it with the community:
-```python
-# BAD: Regular Python lists don't sync through proxy
-class ServiceManager:
-    def __init__(self):
-        self._launched_apps = []  # Won't sync!
+### 1. Push to Registry
+```bash
+# Build and push your environment (requires docker hub login and hud api key)
+hud build
+hud push
 ```
-#### ✅ Solution: Store State in Persistent Context
+### 2. Create a Dataset
+Create a dataset on HuggingFace with your tasks:
+**Option A: Upload manually**
+1. Upload your `tasks.json` to HuggingFace
+2. Make sure it's **public** to appear on leaderboards
+**Option B: Use the SDK**
 ```python
-# GOOD: Use the persistent context for shared state
-class BrowserContext:
-    def __init__(self):
-        self._running_apps: List[str] = []
-        self._app_ports: Dict[str, Dict[str, int]] = {}
-    def add_running_app(self, app_name: str) -> None:
-        """Add app to running list."""
-        if app_name not in self._running_apps:
-            self._running_apps.append(app_name)
+from hud.datasets import save_tasks
+import json
+# Load your tasks
+with open("tasks.json") as f:
+    tasks = json.load(f)
+# Push to HuggingFace
+save_tasks(tasks, repo_id="your-org/your-dataset")
 ```
-### Accessing Shared Resources
+### 3. Run and Track Performance
-#### ❌ Common Pitfall: Direct Attribute Access
+```bash
+# Run Claude on your benchmark
+hud eval "your-org/your-dataset" --agent claude
-```python
-# BAD: Direct attribute access on proxy objects
-playwright_tool = env.playwright  # May not work with proxy
+# View results at:
+# app.hud.so/leaderboards/your-org/your-dataset
 ```
-#### ✅ Solution: Use Getter Methods
+**Note**: Only public HuggingFace datasets appear as leaderboards!
-```python
-# GOOD: Use proxy-friendly getter methods
-playwright_tool = persistent_ctx.get_playwright_tool()
-```
+📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
-## Best Practices
+## Architecture Overview
+The browser environment uses a two-process architecture:
+1. **Context Server** (`context.py`): Long-running process that maintains persistent state
+2. **MCP Server** (`server.py`): Hot-reloadable process that handles tool requests
+### Key Components
+- **BrowserContext**: Stores persistent state (running apps, ports, playwright instance)
+- **ServiceManager**: Manages X11, VNC, and app processes
+- **BaseHub Tools**: Setup and evaluate tools organized by app (2048, todo)
+- **Multiprocessing Proxy**: Enables state sharing between processes
 ### 1. Tool Implementation Pattern
@@ -166,26 +165,6 @@ from . import setup
 # Not inside functions
 ```
-## Troubleshooting
-### "Cannot pickle 'coroutine' object"
-**Cause**: Trying to return an async function result through the proxy.
-**Fix**: Don't use async methods on proxied objects. Make direct HTTP calls instead.
-### "App not launched" errors
-**Cause**: State synchronization issue between ServiceManager and persistent context.
-**Fix**: Ensure `launch_app` stores app info in the persistent context, and setup/evaluate tools check the persistent context's app list.
-### "Object has no attribute" on proxy objects
-**Cause**: Direct attribute access on multiprocessing proxy objects.
-**Fix**: Use getter/setter methods instead of direct attribute access.
 ## Development Workflow
 1. **Start the environment**: `hud dev`

hud_python-0.4.36/environments/browser/environment/pyproject.toml ADDED

@@ -0,0 +1,20 @@
+[project]
+name = "browser-environment"
+version = "0.1.0"
+description = "Browser environment server for managing X11, VNC, and applications"
+requires-python = ">=3.11"
+dependencies = [
+    "fastapi>=0.104.1",
+    "uvicorn[standard]>=0.24.0",
+    "httpx>=0.25.2",
+    "pydantic>=2.6,<3",
+    "pydantic-settings>=2.2,<3",
+    "python-multipart>=0.0.6",
+]
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.build.targets.wheel]
+packages = ["controller", "environment"]

hud_python-0.4.36/environments/browser/pyproject.toml ADDED

@@ -0,0 +1,22 @@
+[project]
+name = "hud-browser-controller"
+version = "0.1.0"
+description = "HUD Browser Controller - MCP interface for browser environments"
+requires-python = ">=3.11,<3.14"
+dependencies = [ "pydantic>=2.6,<3", "pydantic-settings>=2.2,<3", "hud-python@git+https://github.com/hud-evals/hud-python@env-cli-improvements", "playwright", "pyautogui", "httpx", "typer", "fastapi", "uvicorn",]
+[build-system]
+requires = [ "hatchling",]
+build-backend = "hatchling.build"
+[project.scripts]
+hud-browser-controller = "controller.__main__:main"
+[tool.hud]
+image = "hud-browser:dev"
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.build.targets.wheel]
+packages = [ "controller", "problems",]

hud_python-0.4.36/environments/deepresearch/pyproject.toml ADDED

@@ -0,0 +1,19 @@
+[project]
+name = "deepresearch"
+version = "0.1.0"
+description = "DeepResearch HUD environment with HTTP backend (EXA on server)"
+requires-python = ">=3.11"
+dependencies = [ "hud-python==0.4.36", "fastapi>=0.104.1", "uvicorn[standard]>=0.24.0", "httpx>=0.24.0",]
+[build-system]
+requires = [ "hatchling",]
+build-backend = "hatchling.build"
+[tool.hud]
+image = "deepresearch:dev"
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.build.targets.wheel]
+packages = [ "controller", "environment",]

{hud_python-0.4.35 → hud_python-0.4.36}/hud/agents/tests/test_claude.py RENAMED

@@ -86,6 +86,7 @@ class TestClaudeAgent:
             model_client=mock_model_client,
             model="claude-3-opus-20240229",
             max_tokens=1000,
+            validate_api_key=False,  # Skip validation in tests
         )
         assert agent.model_name == "claude-3-opus-20240229"
@@ -93,10 +94,14 @@ class TestClaudeAgent:
         assert agent.anthropic_client == mock_model_client
     @pytest.mark.asyncio
-    async def test_init_without_model_client(self, mock_mcp_client):
+    async def test_init_without_model_client(self, mock_mcp_client, mock_anthropic):
         """Test agent initialization without model client."""
         with patch("hud.settings.settings.anthropic_api_key", "test_key"):
-            agent = ClaudeAgent(mcp_client=mock_mcp_client, model="claude-3-opus-20240229")
+            agent = ClaudeAgent(
+                mcp_client=mock_mcp_client,
+                model="claude-3-opus-20240229",
+                validate_api_key=False,  # Skip validation in tests
+            )
             assert agent.model_name == "claude-3-opus-20240229"
             assert agent.anthropic_client is not None
@@ -105,7 +110,11 @@ class TestClaudeAgent:
     async def test_format_blocks(self, mock_mcp_client):
         """Test formatting content blocks into Claude messages."""
         mock_model_client = MagicMock()
-        agent = ClaudeAgent(mcp_client=mock_mcp_client, model_client=mock_model_client)
+        agent = ClaudeAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_model_client,
+            validate_api_key=False,  # Skip validation in tests
+        )
         # Test with text only
         text_blocks: list[types.ContentBlock] = [
@@ -141,7 +150,11 @@ class TestClaudeAgent:
     async def test_format_tool_results_method(self, mock_mcp_client):
         """Test the agent's format_tool_results method."""
         mock_model_client = MagicMock()
-        agent = ClaudeAgent(mcp_client=mock_mcp_client, model_client=mock_model_client)
+        agent = ClaudeAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_model_client,
+            validate_api_key=False,  # Skip validation in tests
+        )
         tool_calls = [
             MCPToolCall(name="test_tool", arguments={}, id="id1"),
@@ -171,7 +184,11 @@ class TestClaudeAgent:
         """Test getting model response from Claude API."""
         # Disable telemetry for this test to avoid backend configuration issues
         with patch("hud.settings.settings.telemetry_enabled", False):
-            agent = ClaudeAgent(mcp_client=mock_mcp_client, model_client=mock_anthropic)
+            agent = ClaudeAgent(
+                mcp_client=mock_mcp_client,
+                model_client=mock_anthropic,
+                validate_api_key=False,  # Skip validation in tests
+            )
             # Mock the API response
             mock_response = MagicMock()
@@ -215,7 +232,11 @@ class TestClaudeAgent:
         """Test getting text-only response."""
         # Disable telemetry for this test to avoid backend configuration issues
         with patch("hud.settings.settings.telemetry_enabled", False):
-            agent = ClaudeAgent(mcp_client=mock_mcp_client, model_client=mock_anthropic)
+            agent = ClaudeAgent(
+                mcp_client=mock_mcp_client,
+                model_client=mock_anthropic,
+                validate_api_key=False,  # Skip validation in tests
+            )
             mock_response = MagicMock()
             # Create text block
@@ -242,7 +263,11 @@ class TestClaudeAgent:
         """Test handling API errors."""
         # Disable telemetry for this test to avoid backend configuration issues
         with patch("hud.settings.settings.telemetry_enabled", False):
-            agent = ClaudeAgent(mcp_client=mock_mcp_client, model_client=mock_anthropic)
+            agent = ClaudeAgent(
+                mcp_client=mock_mcp_client,
+                model_client=mock_anthropic,
+                validate_api_key=False,  # Skip validation in tests
+            )
             # Mock API error
             mock_anthropic.beta.messages.create = AsyncMock(

{hud_python-0.4.35 → hud_python-0.4.36}/hud/agents/tests/test_openai.py RENAMED

@@ -44,7 +44,10 @@ class TestOperatorAgent:
         """Test agent initialization."""
         mock_model_client = MagicMock()
         agent = OperatorAgent(
-            mcp_client=mock_mcp_client, model_client=mock_model_client, model="gpt-4"
+            mcp_client=mock_mcp_client,
+            model_client=mock_model_client,
+            model="gpt-4",
+            validate_api_key=False,  # Skip validation in tests
         )
         assert agent.model_name == "openai-gpt-4"
@@ -55,7 +58,11 @@ class TestOperatorAgent:
     async def test_format_blocks(self, mock_mcp_client):
         """Test formatting content blocks."""
         mock_model_client = MagicMock()
-        agent = OperatorAgent(mcp_client=mock_mcp_client, model_client=mock_model_client)
+        agent = OperatorAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_model_client,
+            validate_api_key=False,  # Skip validation in tests
+        )
         # Test with text blocks
         blocks: list[types.ContentBlock] = [
@@ -85,7 +92,11 @@ class TestOperatorAgent:
     @pytest.mark.asyncio
     async def test_format_tool_results(self, mock_mcp_client, mock_openai):
         """Test formatting tool results."""
-        agent = OperatorAgent(mcp_client=mock_mcp_client, model_client=mock_openai)
+        agent = OperatorAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_openai,
+            validate_api_key=False,  # Skip validation in tests
+        )
         tool_calls = [
             MCPToolCall(name="test_tool", arguments={}, id="call_123"),  # type: ignore
@@ -111,7 +122,11 @@ class TestOperatorAgent:
     @pytest.mark.asyncio
     async def test_format_tool_results_with_error(self, mock_mcp_client, mock_openai):
         """Test formatting tool results with errors."""
-        agent = OperatorAgent(mcp_client=mock_mcp_client, model_client=mock_openai)
+        agent = OperatorAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_openai,
+            validate_api_key=False,  # Skip validation in tests
+        )
         tool_calls = [
             MCPToolCall(name="failing_tool", arguments={}, id="call_error"),  # type: ignore
@@ -131,7 +146,11 @@ class TestOperatorAgent:
     @pytest.mark.asyncio
     async def test_get_model_response(self, mock_mcp_client, mock_openai):
         """Test getting model response from OpenAI API."""
-        agent = OperatorAgent(mcp_client=mock_mcp_client, model_client=mock_openai)
+        agent = OperatorAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_openai,
+            validate_api_key=False,  # Skip validation in tests
+        )
         # Set up available tools so agent doesn't return "No computer use tools available"
         agent._available_tools = [
@@ -162,7 +181,11 @@ class TestOperatorAgent:
     @pytest.mark.asyncio
     async def test_handle_empty_response(self, mock_mcp_client, mock_openai):
         """Test handling empty response from API."""
-        agent = OperatorAgent(mcp_client=mock_mcp_client, model_client=mock_openai)
+        agent = OperatorAgent(
+            mcp_client=mock_mcp_client,
+            model_client=mock_openai,
+            validate_api_key=False,  # Skip validation in tests
+        )
         # Set up available tools
         agent._available_tools = [
