PyPI - hud-python - Versions diffs - 0.4.52__tar.gz → 0.4.54__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (299)

{hud_python-0.4.52 → hud_python-0.4.54}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.52
+Version: 0.4.54
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -42,6 +42,7 @@ Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
 Requires-Dist: hud-mcp-use-python-sdk==2.3.20
+Requires-Dist: langchain==0.3.27
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: openai
 Requires-Dist: opentelemetry-api>=1.34.1
@@ -160,12 +161,12 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
-- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
 - ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
 - 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
+- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
 > We welcome contributors and feature requests – open an issue or hop on a call to discuss improvements!
@@ -186,29 +187,6 @@ uv tool install hud-python
 Before starting, get your HUD_API_KEY at [hud.so](https://hud.so).
-## Quickstart: Training
-RL using GRPO a Qwen2.5-VL model on any hud dataset:
-```bash
-hud get hud-evals/basic-2048 # from HF
-hud rl basic-2048.json
-```
-> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
-Or make your own environment and dataset:
-```bash
-hud init my-env && cd my-env
-hud dev --interactive
-# When ready to run:
-hud rl
-```
-> See [environment design docs](https://docs.hud.so/build-environments)
 ## Quickstart: Evals
 For a tutorial that explains the agent and evaluation design, run:
@@ -265,38 +243,27 @@ The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
-## Reinforcement Learning with GRPO
-This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
-![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
+## Quickstart: Training
-Train with the new interactive `hud rl` flow:
+RL using GRPO a Qwen2.5-VL model on any hud dataset:
 ```bash
-# Install CLI
-uv tool install hud-python
-# Option A: Run directly from a HuggingFace dataset
-hud rl hud-evals/basic-2048
-# Option B: Download first, modify, then train
-hud get hud-evals/basic-2048
-hud rl basic-2048.json
-# Optional: baseline evaluation
-hud eval basic-2048.json
+hud get hud-evals/2048-basic # from HF
+hud rl 2048-basic.json
 ```
-Supports multi‑turn RL for both:
-- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
-- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
-By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+Or make your own environment and dataset:
-Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+```bash
+hud init my-env && cd my-env
+hud dev --interactive
+# When ready to run:
+hud rl
+```
-Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
+> See [environment design docs](https://docs.hud.so/build-environments)
 ## Benchmarking Agents
@@ -460,6 +427,39 @@ We highly suggest running 3-5 evaluations per dataset for the most consistent re
 Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
+## Reinforcement Learning with GRPO
+This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
+![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
+Train with the new interactive `hud rl` flow:
+```bash
+# Install CLI
+uv tool install hud-python
+# Option A: Run directly from a HuggingFace dataset
+hud rl hud-evals/2048-basic
+# Option B: Download first, modify, then train
+hud get hud-evals/2048-basic
+hud rl 2048-basic.json
+# Optional: baseline evaluation
+hud eval 2048-basic.json
+```
+Supports multi‑turn RL for both:
+- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
+- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
 ## Architecture
 ```mermaid

{hud_python-0.4.52 → hud_python-0.4.54}/README.md RENAMED

@@ -22,12 +22,12 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
-- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
 - ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
 - 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
+- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
 > We welcome contributors and feature requests – open an issue or hop on a call to discuss improvements!
@@ -48,29 +48,6 @@ uv tool install hud-python
 Before starting, get your HUD_API_KEY at [hud.so](https://hud.so).
-## Quickstart: Training
-RL using GRPO a Qwen2.5-VL model on any hud dataset:
-```bash
-hud get hud-evals/basic-2048 # from HF
-hud rl basic-2048.json
-```
-> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
-Or make your own environment and dataset:
-```bash
-hud init my-env && cd my-env
-hud dev --interactive
-# When ready to run:
-hud rl
-```
-> See [environment design docs](https://docs.hud.so/build-environments)
 ## Quickstart: Evals
 For a tutorial that explains the agent and evaluation design, run:
@@ -127,38 +104,27 @@ The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
-## Reinforcement Learning with GRPO
-This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
-![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
+## Quickstart: Training
-Train with the new interactive `hud rl` flow:
+RL using GRPO a Qwen2.5-VL model on any hud dataset:
 ```bash
-# Install CLI
-uv tool install hud-python
-# Option A: Run directly from a HuggingFace dataset
-hud rl hud-evals/basic-2048
-# Option B: Download first, modify, then train
-hud get hud-evals/basic-2048
-hud rl basic-2048.json
-# Optional: baseline evaluation
-hud eval basic-2048.json
+hud get hud-evals/2048-basic # from HF
+hud rl 2048-basic.json
 ```
-Supports multi‑turn RL for both:
-- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
-- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
-By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+Or make your own environment and dataset:
-Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+```bash
+hud init my-env && cd my-env
+hud dev --interactive
+# When ready to run:
+hud rl
+```
-Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
+> See [environment design docs](https://docs.hud.so/build-environments)
 ## Benchmarking Agents
@@ -322,6 +288,39 @@ We highly suggest running 3-5 evaluations per dataset for the most consistent re
 Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
+## Reinforcement Learning with GRPO
+This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
+![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
+Train with the new interactive `hud rl` flow:
+```bash
+# Install CLI
+uv tool install hud-python
+# Option A: Run directly from a HuggingFace dataset
+hud rl hud-evals/2048-basic
+# Option B: Download first, modify, then train
+hud get hud-evals/2048-basic
+hud rl 2048-basic.json
+# Optional: baseline evaluation
+hud eval 2048-basic.json
+```
+Supports multi‑turn RL for both:
+- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
+- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
 ## Architecture
 ```mermaid

{hud_python-0.4.52 → hud_python-0.4.54}/environments/README.md RENAMED

@@ -804,9 +804,9 @@ class TodoCompleted:
 @problem("todo_basic", description="Complete two todo items", difficulty="easy")
 class TodoBasic:
     def get_setup(self):
-        return {"function": "todo_seed", "args": {"num_items": 5}}
+        return {"name": "todo_seed", "arguments": {"num_items": 5}}
     def get_evaluation(self):
-        return {"function": "todo_completed", "args": {"expected_count": 2}}
+        return {"name": "todo_completed", "arguments": {"expected_count": 2}}
 ```
 Decorators keep registration *next to the implementation* and avoid manual bookkeeping.  The server simply exposes the combined metadata through an MCP **resource**.  Follow `environments/browser/src/hud_controller/problems/registry.py` as a template and expose the JSON with `@mcp.resource("problems://registry")`.
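In this release the setup and evaluation specs move from `{"function": ..., "args": ...}` to `{"name": ..., "arguments": ...}`. A minimal sketch for migrating older task dictionaries, assuming the key rename is the only difference between the two shapes; `migrate_call_spec` is a hypothetical helper, not part of hud-python:

```python
from typing import Any


def migrate_call_spec(spec: dict[str, Any]) -> dict[str, Any]:
    """Rename legacy keys ("function", "args") to ("name", "arguments").

    Hypothetical helper: assumes the rename is the only change between
    the 0.4.52 and 0.4.54 spec shapes. Already-migrated specs pass through.
    """
    migrated = dict(spec)  # shallow copy so the caller's dict is untouched
    if "function" in migrated and "name" not in migrated:
        migrated["name"] = migrated.pop("function")
    if "args" in migrated and "arguments" not in migrated:
        migrated["arguments"] = migrated.pop("args")
    return migrated


legacy = {"function": "todo_seed", "args": {"num_items": 5}}
print(migrate_call_spec(legacy))  # {'name': 'todo_seed', 'arguments': {'num_items': 5}}
```

The copy keeps the original task file's dictionaries intact, so the helper can be applied while loading tasks without side effects.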

{hud_python-0.4.52 → hud_python-0.4.54}/environments/blank/README.md RENAMED

@@ -6,10 +6,12 @@ See [docs](https://docs.hud.so/build-environments) for the complete environment
 ## Architecture
 **`environment/`** - Produces structured data
 - Owns all state (game logic, browser sessions, databases, etc.)
 - Exposes HTTP endpoints `/health`, `/act`, `/reset`, `/state` that return structured information about the environment state
 **`server/`** - Wraps data in MCP tools
 - Calls environment endpoints to get structured data for the agent, and environment setup/evaluation
 - Agents and tasks interact only with these tools!
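The blank template splits an `environment/` backend that owns state and exposes `/health`, `/act`, `/reset`, and `/state` from a `server/` layer that calls those endpoints. A minimal stdlib-only sketch of that split, assuming JSON over HTTP and a single counter as the environment state; the payload shapes and the `call_env` client helper are illustrative, not the template's actual schema:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical environment state owned by the backend: a single counter.
STATE = {"counter": 0}
STATE_LOCK = threading.Lock()


class EnvHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload: dict, status: int = 200) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._send_json({"status": "ok"})
        elif self.path == "/state":
            with STATE_LOCK:
                snapshot = dict(STATE)
            self._send_json(snapshot)
        else:
            self._send_json({"error": "not found"}, status=404)

    def do_POST(self) -> None:
        # Drain the request body; this sketch ignores action parameters.
        self.rfile.read(int(self.headers.get("Content-Length") or 0))
        if self.path not in ("/reset", "/act"):
            self._send_json({"error": "not found"}, status=404)
            return
        with STATE_LOCK:
            if self.path == "/reset":
                STATE["counter"] = 0
            else:  # "/act" advances the state by one step
                STATE["counter"] += 1
            snapshot = dict(STATE)
        self._send_json(snapshot)

    def log_message(self, *args) -> None:
        pass  # silence per-request logging


def start_env_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a daemon thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EnvHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_env(base_url: str, path: str, method: str = "GET") -> dict:
    """Roughly what the MCP server layer does: call an endpoint, decode JSON."""
    data = b"{}" if method == "POST" else None
    req = urllib.request.Request(base_url + path, data=data, method=method)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())
```

In the real template, the `server/` side would wrap calls like `call_env(base, "/act", "POST")` in MCP tools so that agents never talk to the HTTP backend directly.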
@@ -33,12 +35,14 @@ Visit http://localhost:8765/docs to see the new tool appear instantly.
 In general, we recommend starting work on the environment backend first, then developing the MCP server to expose the right things to the agent.
 For complex environments that require many dependencies, we recommend running `hud dev` in the environment root:
 ```bash
 cd ..
 hud dev
 ```
 ## Tasks & Evaluation
 ```bash
 # Build first in the global folder with the Dockerfile (creates blank:0.1.0)
 hud build
@@ -59,6 +63,7 @@ Your `tasks.json` uses `docker run` to launch the environment:
 ```
 **Commands:**
 ```bash
 # Build first
 hud build
@@ -78,6 +83,7 @@ hud rl tasks.json  # Auto-converts docker→remote, builds & pushes if needed
 Once your environment is ready, you can share it with the community:
 ### 1. Push to Registry
 ```bash
 # Build and push your environment (requires docker hub login and hud api key)
 hud build
@@ -89,10 +95,12 @@ hud push
 Create a dataset on HuggingFace with your tasks:
 **Option A: Upload manually**
 1. Upload your `tasks.json` to HuggingFace
 2. Make sure it's **public** to appear on leaderboards
 **Option B: Use the SDK**
 ```python
 from hud.datasets import save_tasks
 import json
@@ -109,7 +117,7 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 ```bash
 # Run Claude on your benchmark
-hud eval "your-org/your-dataset" --agent claude
+hud eval "your-org/your-dataset" claude
 # View results at:
 # hud.so/leaderboards/your-org/your-dataset
@@ -118,4 +126,3 @@ hud eval "your-org/your-dataset" --agent claude
 **Note**: Only public HuggingFace datasets appear as leaderboards!
 📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)

{hud_python-0.4.52 → hud_python-0.4.54}/environments/blank/server/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "MCP server for blank environment"
 requires-python = ">=3.11"
 dependencies = [
-    "hud-python>=0.4.52",
+    "hud-python>=0.4.54",
     "httpx>=0.28.1",
 ]

{hud_python-0.4.52 → hud_python-0.4.54}/environments/browser/environment/todo/README.md RENAMED

@@ -47,8 +47,8 @@ await setup({"name": "todo_basic_usage"})
 await evaluate({"name": "todo_basic_usage"})
 # Direct function calls
-await setup({"function": "todo_reset", "args": {}})
-await evaluate({"function": "todo_completion_rate", "args": {"min_rate": 0.5}})
+await setup({"name": "todo_reset", "arguments": {}})
+await evaluate({"name": "todo_completion_rate", "arguments": {"min_rate": 0.5}})
 # MCP resource discovery
 todo_evaluators = await client.read_resource("evaluators://todo")

{hud_python-0.4.52 → hud_python-0.4.54}/environments/browser/server/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "HUD Browser MCP Server"
 requires-python = ">=3.11,<3.14"
 dependencies = [
-    "hud-python@git+https://github.com/hud-evals/hud-python@cli-dev",
+    "hud-python>=0.4.54",
     "httpx",
     "playwright",
     "pyautogui",

{hud_python-0.4.52 → hud_python-0.4.54}/environments/deepresearch/server/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "MCP server for DeepResearch environment"
 requires-python = ">=3.11"
 dependencies = [
-    "hud-python>=0.4.52",
+    "hud-python>=0.4.54",
     "httpx>=0.24.0",
 ]
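All three environment servers now require `hud-python>=0.4.54`, and the browser server no longer tracks the `cli-dev` git branch. A minimal sketch for checking an installed distribution against that floor with `importlib.metadata`. It compares only the numeric release prefix, so a pre-release such as `0.4.54rc1` is treated as `0.4.54`; that is a simplifying assumption, and `packaging.version.Version` is the tool for exact PEP 440 ordering:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver: str) -> tuple[int, ...]:
    """Leading numeric components of a version, e.g. "0.4.54" -> (0, 4, 54).

    Simplification: parsing stops at the first non-numeric character,
    so pre-/post-release and local tags are ignored.
    """
    parts: list[int] = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. "54rc1": keep 54, drop the suffix
    return tuple(parts)


def version_at_least(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "0.4.54" and "0.4.54.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_meets_minimum(dist: str = "hud-python", minimum: str = "0.4.54") -> bool:
    """True if `dist` is installed at `minimum` or newer; False if missing."""
    try:
        return version_at_least(version(dist), minimum)
    except PackageNotFoundError:
        return False
```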

{hud_python-0.4.52 → hud_python-0.4.54}/hud/agents/base.py RENAMED

@@ -137,7 +137,11 @@ class MCPAgent(ABC):
                 "No MCPClient. Please provide one when initializing the agent or pass a Task with mcp_config."  # noqa: E501
             )
-        await self._setup_config(self.mcp_client.mcp_config)
+        try:
+            client_cfg = getattr(self.mcp_client, "mcp_config", None)
+        except Exception:
+            client_cfg = None
+        await self._setup_config(client_cfg)
         # Initialize client if needed
         try:
@@ -618,8 +622,11 @@ class MCPAgent(ABC):
             except Exception as e:
                 self.console.error_log(f"Response lifecycle tool failed: {e}")
-    async def _setup_config(self, mcp_config: dict[str, dict[str, Any]]) -> None:
+    async def _setup_config(self, mcp_config: dict[str, dict[str, Any]] | None) -> None:
         """Inject metadata into the metadata of the initialize request."""
+        if not isinstance(mcp_config, dict):
+            return
         if self.metadata:
             patch_mcp_config(
                 mcp_config,
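The `MCPAgent` change replaces a direct `self.mcp_client.mcp_config` access with a guarded lookup, and `_setup_config` now returns early when the config is not a dict. Both guards matter because `getattr` with a default only suppresses `AttributeError`, not other errors raised while computing the attribute. A self-contained sketch of the pattern, using stand-in clients and a simplified setup function in place of hud's real `MCPAgent` and `patch_mcp_config`:

```python
from __future__ import annotations

from typing import Any


class ClientWithoutConfig:
    """Stand-in client with no `mcp_config` attribute at all."""


class ClientWithFailingConfig:
    """Stand-in client whose `mcp_config` property raises."""

    @property
    def mcp_config(self) -> dict[str, Any]:
        raise RuntimeError("config not loaded")


def read_client_config(client: Any) -> dict[str, Any] | None:
    # getattr's default covers a missing attribute (AttributeError);
    # the try/except also covers other errors raised by a property.
    try:
        return getattr(client, "mcp_config", None)
    except Exception:
        return None


def setup_config(
    mcp_config: dict[str, dict[str, Any]] | None, metadata: dict[str, Any]
) -> bool:
    """Attach metadata to each server entry; return False if there is nothing to patch.

    Hypothetical simplification of `_setup_config`/`patch_mcp_config`: the
    "meta" key is illustrative, not hud's actual config layout.
    """
    if not isinstance(mcp_config, dict):
        return False  # e.g. a client that exposes no config
    for server_cfg in mcp_config.values():
        server_cfg.setdefault("meta", {}).update(metadata)
    return True
```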

{hud_python-0.4.52 → hud_python-0.4.54}/hud/agents/openai_chat_generic.py RENAMED

@@ -20,6 +20,7 @@ import logging
 from typing import TYPE_CHECKING, Any, ClassVar, cast
 import mcp.types as types
+from openai import AsyncOpenAI
 from hud import instrument
 from hud.types import AgentResponse, MCPToolCall, MCPToolResult
@@ -28,7 +29,6 @@ from hud.utils.hud_console import HUDConsole
 from .base import MCPAgent
 if TYPE_CHECKING:
-    from openai import AsyncOpenAI
     from openai.types.chat import ChatCompletionToolParam
 logger = logging.getLogger(__name__)
@@ -42,14 +42,26 @@ class GenericOpenAIChatAgent(MCPAgent):
     def __init__(
         self,
         *,
-        openai_client: AsyncOpenAI | None,
+        openai_client: AsyncOpenAI | None = None,
+        api_key: str | None = None,
+        base_url: str | None = None,
         model_name: str = "gpt-4o-mini",
         completion_kwargs: dict[str, Any] | None = None,
         **agent_kwargs: Any,
     ) -> None:
         # Accept base-agent settings via **agent_kwargs (e.g., mcp_client, system_prompt, etc.)
         super().__init__(**agent_kwargs)
-        self.oai = openai_client
+        # Handle client creation - support both patterns
+        if openai_client is not None:
+            # Use provided client (backward compatibility)
+            self.oai = openai_client
+        elif api_key is not None or base_url is not None:
+            # Create client from config (new pattern, consistent with other agents)
+            self.oai = AsyncOpenAI(api_key=api_key, base_url=base_url)
+        else:
+            raise ValueError("Either openai_client or (api_key and base_url) must be provided")
         self.model_name = model_name
         self.completion_kwargs: dict[str, Any] = completion_kwargs or {}
         self.mcp_schemas = []
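`GenericOpenAIChatAgent` now accepts either a ready-made `openai_client` or `api_key`/`base_url`. The guard is `api_key is not None or base_url is not None`, so either value alone is enough, even though the error message says "and". A minimal sketch of the same precedence, with a hypothetical `FakeClient` standing in for `AsyncOpenAI` so it runs without the `openai` package:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeClient:
    """Hypothetical stand-in for openai.AsyncOpenAI."""

    api_key: str | None
    base_url: str | None


def resolve_client(
    openai_client: Any | None = None,
    api_key: str | None = None,
    base_url: str | None = None,
) -> Any:
    # 1. An explicit client always wins (the backward-compatible path).
    if openai_client is not None:
        return openai_client
    # 2. Otherwise build one from whichever settings were supplied.
    if api_key is not None or base_url is not None:
        return FakeClient(api_key=api_key, base_url=base_url)
    # 3. Nothing to construct a client from.
    raise ValueError("Either openai_client or api_key/base_url must be provided")
```

With the real `AsyncOpenAI`, passing only `base_url` likely still needs a key: in current openai-python releases, `api_key=None` falls back to the `OPENAI_API_KEY` environment variable and raises if it is unset.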

{hud_python-0.4.52 → hud_python-0.4.54}/hud/agents/tests/test_base.py RENAMED

@@ -329,6 +329,21 @@ class TestBaseMCPAgent:
         # call_tools doesn't validate empty names, it will return error
         await agent.call_tools(tool_call)
+    def test_get_tool_schemas(self):
+        """Test getting tool schemas."""
+        agent = MockMCPAgent()
+        agent._available_tools = [
+            types.Tool(name="tool1", description="Tool 1", inputSchema={"type": "object"}),
+            types.Tool(name="setup", description="Setup", inputSchema={"type": "object"}),
+        ]
+        schemas = agent.get_tool_schemas()
+        # Should include non-lifecycle tools
+        assert len(schemas) == 2
+        assert schemas[0]["name"] == "tool1"
     def test_get_tools_by_server(self):
         """Test getting tools grouped by server."""
         agent = MockMCPAgent()

hud_python-0.4.54/hud/agents/tests/test_base_runtime.py ADDED

@@ -0,0 +1,164 @@
+from __future__ import annotations
+from unittest import mock
+import mcp.types as types
+import pytest
+from hud.agents.base import MCPAgent, find_content, find_reward, text_to_blocks
+from hud.types import AgentResponse, MCPToolCall, MCPToolResult
+class DummyAgent(MCPAgent):
+    async def get_system_messages(self):
+        return [types.TextContent(text="sys", type="text")]
+    async def get_response(self, messages):
+        # Single step: no tool calls -> done
+        return AgentResponse(content="ok", tool_calls=[], done=True)
+    async def format_blocks(self, blocks):
+        # Return as-is
+        return blocks
+    async def format_tool_results(self, tool_calls, tool_results):
+        return [types.TextContent(text="tools", type="text")]
+@pytest.mark.asyncio
+async def test_run_with_string_prompt_auto_client(monkeypatch):
+    # Fake MCPClient with required methods
+    fake_client = mock.AsyncMock()
+    fake_client.initialize.return_value = None
+    fake_client.list_tools.return_value = []
+    fake_client.shutdown.return_value = None
+    # Patch MCPClient construction inside initialize()
+    with mock.patch("hud.clients.MCPClient", return_value=fake_client):
+        agent = DummyAgent(mcp_client=fake_client, auto_trace=False)
+        result = await agent.run("hello", max_steps=1)
+    assert result.done is True and result.isError is False
+def test_find_reward_and_content_extractors():
+    # Structured content
+    r = MCPToolResult(
+        content=text_to_blocks("{}"), isError=False, structuredContent={"reward": 0.7}
+    )
+    assert find_reward(r) == 0.7
+    # Text JSON
+    r2 = MCPToolResult(content=text_to_blocks('{"score": 0.5, "content": "hi"}'), isError=False)
+    assert find_reward(r2) == 0.5
+    assert find_content(r2) == "hi"
+@pytest.mark.asyncio
+async def test_call_tools_error_paths():
+    fake_client = mock.AsyncMock()
+    # First call succeeds
+    ok_result = MCPToolResult(content=text_to_blocks("ok"), isError=False)
+    fake_client.call_tool.side_effect = [ok_result, RuntimeError("boom")]
+    agent = DummyAgent(mcp_client=fake_client, auto_trace=False)
+    results = await agent.call_tools(
+        [MCPToolCall(name="a", arguments={}), MCPToolCall(name="b", arguments={})]
+    )
+    assert results[0].isError is False
+    assert results[1].isError is True
+@pytest.mark.asyncio
+async def test_initialize_without_client_raises_valueerror():
+    agent = DummyAgent(mcp_client=None, auto_trace=False)
+    with pytest.raises(ValueError):
+        await agent.initialize(None)
+def test_get_available_tools_before_initialize_raises():
+    agent = DummyAgent(mcp_client=mock.AsyncMock(), auto_trace=False)
+    with pytest.raises(RuntimeError):
+        agent.get_available_tools()
+@pytest.mark.asyncio
+async def test_format_message_invalid_type_raises():
+    agent = DummyAgent(mcp_client=mock.AsyncMock(), auto_trace=False)
+    with pytest.raises(ValueError):
+        await agent.format_message({"oops": 1})  # type: ignore
+@pytest.mark.asyncio
+async def test_call_tools_timeout_error_shutdown_called():
+    fake_client = mock.AsyncMock()
+    fake_client.call_tool.side_effect = TimeoutError("timeout")
+    fake_client.shutdown.return_value = None
+    agent = DummyAgent(mcp_client=fake_client, auto_trace=False)
+    with pytest.raises(TimeoutError):
+        await agent.call_tools(MCPToolCall(name="x", arguments={}))
+    fake_client.shutdown.assert_awaited_once()
+def test_text_to_blocks_shapes():
+    blocks = text_to_blocks("x")
+    assert isinstance(blocks, list) and blocks and isinstance(blocks[0], types.TextContent)
+@pytest.mark.asyncio
+async def test_run_returns_connection_error_trace(monkeypatch):
+    fake_client = mock.AsyncMock()
+    fake_client.mcp_config = {}
+    fake_client.initialize.side_effect = RuntimeError("Connection refused http://localhost:1234")
+    fake_client.list_tools.return_value = []
+    fake_client.shutdown.return_value = None
+    class DummyCM:
+        def __exit__(self, *args, **kwargs):
+            return False
+    monkeypatch.setattr("hud.utils.mcp.setup_hud_telemetry", lambda *args, **kwargs: DummyCM())
+    agent = DummyAgent(mcp_client=fake_client, auto_trace=False)
+    result = await agent.run("p", max_steps=1)
+    assert result.isError is True
+    assert "Could not connect" in (result.content or "")
+@pytest.mark.asyncio
+async def test_run_calls_response_tool_when_configured(monkeypatch):
+    fake_client = mock.AsyncMock()
+    fake_client.mcp_config = {}
+    fake_client.initialize.return_value = None
+    fake_client.list_tools.return_value = []
+    fake_client.shutdown.return_value = None
+    ok = MCPToolResult(content=text_to_blocks("ok"), isError=False)
+    fake_client.call_tool.return_value = ok
+    class DummyCM:
+        def __exit__(self, *args, **kwargs):
+            return False
+    monkeypatch.setattr("hud.utils.mcp.setup_hud_telemetry", lambda *args, **kwargs: DummyCM())
+    agent = DummyAgent(mcp_client=fake_client, auto_trace=False, response_tool_name="submit")
+    result = await agent.run("hello", max_steps=1)
+    assert result.isError is False
+    fake_client.call_tool.assert_awaited()
+@pytest.mark.asyncio
+async def test_get_available_tools_after_initialize(monkeypatch):
+    fake_client = mock.AsyncMock()
+    fake_client.mcp_config = {}
+    fake_client.initialize.return_value = None
+    fake_client.list_tools.return_value = []
+    fake_client.shutdown.return_value = None
+    class DummyCM:
+        def __exit__(self, *args, **kwargs):
+            return False
+    monkeypatch.setattr("hud.utils.mcp.setup_hud_telemetry", lambda *args, **kwargs: DummyCM())
+    agent = DummyAgent(mcp_client=fake_client, auto_trace=False)
+    await agent.initialize(None)
+    assert agent.get_available_tools() == []
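The new tests pin down two ways `find_reward` and `find_content` read a tool result: a `reward` key in `structuredContent`, or a JSON text block carrying `score` and `content`. A hypothetical re-implementation consistent with those two test cases, using plain dicts and strings instead of `MCPToolResult`. The real functions may check more keys or fall back differently, and the `0.0` default here is an assumption:

```python
from __future__ import annotations

import json
from typing import Any

REWARD_KEYS = ("reward", "score")  # assumption: checked in this order


def _numeric(data: dict[str, Any]) -> float | None:
    for key in REWARD_KEYS:
        if isinstance(data.get(key), (int, float)):
            return float(data[key])
    return None


def _json_dicts(texts: list[str]):
    # Yield each text block that parses as a JSON object; skip the rest.
    for text in texts:
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            yield data


def extract_reward(structured: dict[str, Any] | None, texts: list[str]) -> float:
    """Prefer structured content, then fall back to JSON text blocks."""
    if structured:
        value = _numeric(structured)
        if value is not None:
            return value
    for data in _json_dicts(texts):
        value = _numeric(data)
        if value is not None:
            return value
    return 0.0


def extract_content(texts: list[str]) -> str | None:
    for data in _json_dicts(texts):
        if isinstance(data.get("content"), str):
            return data["content"]
    return None
```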
