PyPI - hud-python - Versions diffs - 0.4.36__tar.gz → 0.4.37__tar.gz - Mend

hud-python 0.4.36tar.gz → 0.4.37tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic. Click here for more details.

Files changed (248) hide show

{hud_python-0.4.36 → hud_python-0.4.37}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.36
+Version: 0.4.37
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -36,11 +36,13 @@ Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Requires-Python: <3.13,>=3.11
 Requires-Dist: anthropic
+Requires-Dist: blessed>=1.20.0
 Requires-Dist: datasets>=2.14.0
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
 Requires-Dist: hud-mcp-use-python-sdk==2.3.19
+Requires-Dist: litellm>=1.55.0
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: openai
 Requires-Dist: opentelemetry-api>=1.34.1
@@ -156,8 +158,8 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://app.hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://app.hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌱 **[Reinforcement learning built-in](rl/)** – Verifiers gym pipelines for GRPO on any environment.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
@@ -203,14 +205,14 @@ from hud.agents import ClaudeAgent
 from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://app.hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
                     "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://app.hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -237,7 +239,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://app.hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -268,7 +270,7 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
 Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
@@ -278,7 +280,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _app.hud.so_](https://app.hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -304,7 +306,7 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [app.hud.so](https://app.hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
@@ -395,7 +397,7 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [app.hud.so](https://app.hud.so):
+5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
 ```python
 from hud.agents import ClaudeAgent
@@ -426,7 +428,7 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [app.hud.so/leaderboards](https://app.hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
@@ -440,7 +442,7 @@ Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) funct
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 app.hud.so"]
+        Dashboard["📊 hud.so"]
         API["🔌 mcp.hud.so"]
     end

{hud_python-0.4.36 → hud_python-0.4.37}/README.md RENAMED Viewed

@@ -23,8 +23,8 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://app.hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://app.hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌱 **[Reinforcement learning built-in](rl/)** – Verifiers gym pipelines for GRPO on any environment.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
@@ -70,14 +70,14 @@ from hud.agents import ClaudeAgent
 from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://app.hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
                     "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://app.hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -104,7 +104,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://app.hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -135,7 +135,7 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
 Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
@@ -145,7 +145,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _app.hud.so_](https://app.hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -171,7 +171,7 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [app.hud.so](https://app.hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
@@ -262,7 +262,7 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [app.hud.so](https://app.hud.so):
+5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
 ```python
 from hud.agents import ClaudeAgent
@@ -293,7 +293,7 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [app.hud.so/leaderboards](https://app.hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
@@ -307,7 +307,7 @@ Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) funct
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 app.hud.so"]
+        Dashboard["📊 hud.so"]
         API["🔌 mcp.hud.so"]
     end

{hud_python-0.4.36 → hud_python-0.4.37}/environments/README.md RENAMED Viewed

@@ -495,7 +495,7 @@ from hud.agents import ClaudeAgent
 from hud.clients import MCPClient
 async def main():
-    # `trace` captures *everything* that happens and sends it to app.hud.so
+    # `trace` captures *everything* that happens and sends it to hud.so
     with hud.trace("local_test"):
         task = Task(
             prompt="Complete the task",
@@ -524,7 +524,7 @@ async def main():
 asyncio.run(main())
 ```
-The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to app.hud.so – perfect for debugging.
+The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to hud.so – perfect for debugging.
 See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for larger end-to-end demos.
@@ -532,7 +532,7 @@ See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for large
 ## Phase 4 – Remote Deployment & HUD Runner
-**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the app.hud.so can visualise the whole lifecycle.
+**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the hud.so can visualise the whole lifecycle.
 ### 1. Publish your image
@@ -595,11 +595,11 @@ async def initialize_environment(session=None, progress_token=None):
     await send(100, "ready")
 ```
-Those messages are displayed live on app.hud.so alongside resource graphs – perfect feedback while you wait.
+Those messages are displayed live on hud.so alongside resource graphs – perfect feedback while you wait.
 ### 4. Live telemetry (`telemetry://live`) (Optional)
-Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on app.hud.so.
+Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on hud.so.
 Once all of the above works you can unleash *hundreds* of concurrent agents on your new environment.

{hud_python-0.4.36 → hud_python-0.4.37}/environments/blank/README.md RENAMED Viewed

@@ -10,7 +10,7 @@
 IMPORTANT: Make sure all logs are going to stderr instead of stdio, which is reserved for MCP communication
-### Interactive Development
+### Testing your environment
 ```bash
 # 1. Configure your API keys (optional - only needed for evaluation)
 # Edit .env file to add your HUD_API_KEY and ANTHROPIC_API_KEY
@@ -24,13 +24,29 @@ hud dev --build --interactive
 hud eval tasks.json --agent claude
 # Option B: Interactive notebook test_env.ipynb (great for learning!)
-# Requires installation:
-pip install hud-python[agents]
 # Option C: Simple Python script (runs all tasks from tasks.json)
 python test_task.py
 ```
+## Iterating on your environment
+This is usually the process for making any environment better:
+```bash
+# 1. Start the environment and interact with it directly (or give MCP server to an agent):
+hud dev --build --interactive
+# 2. If the environment cannot start or fails inexplicably:
+hud debug test_env:dev # Or your env name that appears when you run hud dev
+# After fixing the error, go back to 1.
+# 3. When the environment is in a stable state:
+hud build
+hud push # Requires docker login
+# 4. As soon as it's pushed to the newest version, make sure tasks have it updated and run:
+hud rl
+# This is a good test to see if your environment and tasks are high quality!
 ## Layout
 ```
 controller/
@@ -83,7 +99,7 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 hud eval "your-org/your-dataset" --agent claude
 # View results at:
-# app.hud.so/leaderboards/your-org/your-dataset
+# hud.so/leaderboards/your-org/your-dataset
 ```
 **Note**: Only public HuggingFace datasets appear as leaderboards!

{hud_python-0.4.36 → hud_python-0.4.37}/environments/blank/pyproject.toml RENAMED Viewed

@@ -3,7 +3,7 @@ name = "test_test"
 version = "0.1.0"
 description = "A minimal HUD environment"
 requires-python = ">=3.11"
-dependencies = [ "hud-python==0.4.36", "fastapi", "uvicorn[standard]", "httpx>=0.28.1",]
+dependencies = [ "hud-python==0.4.37", "fastapi", "uvicorn[standard]", "httpx>=0.28.1",]
 [build-system]
 requires = [ "hatchling",]

{hud_python-0.4.36 → hud_python-0.4.37}/environments/browser/README.md RENAMED Viewed

@@ -75,7 +75,7 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 hud eval "your-org/your-dataset" --agent claude
 # View results at:
-# app.hud.so/leaderboards/your-org/your-dataset
+# hud.so/leaderboards/your-org/your-dataset
 ```
 **Note**: Only public HuggingFace datasets appear as leaderboards!

{hud_python-0.4.36 → hud_python-0.4.37}/environments/browser/pyproject.toml RENAMED Viewed

@@ -3,7 +3,7 @@ name = "hud-browser-controller"
 version = "0.1.0"
 description = "HUD Browser Controller - MCP interface for browser environments"
 requires-python = ">=3.11,<3.14"
-dependencies = [ "pydantic>=2.6,<3", "pydantic-settings>=2.2,<3", "hud-python@git+https://github.com/hud-evals/hud-python@env-cli-improvements", "playwright", "pyautogui", "httpx", "typer", "fastapi", "uvicorn",]
+dependencies = [ "pydantic>=2.6,<3", "pydantic-settings>=2.2,<3", "hud-python@git+https://github.com/hud-evals/hud-python@env-cli-improvements", "playwright", "pyautogui", "httpx", "typer", "fastapi>=0.104.1", "uvicorn[standard]>=0.24.0", "python-multipart>=0.0.6",]
 [build-system]
 requires = [ "hatchling",]
@@ -19,4 +19,4 @@ image = "hud-browser:dev"
 allow-direct-references = true
 [tool.hatch.build.targets.wheel]
-packages = [ "controller", "problems",]
+packages = [ "controller", "environment",]

{hud_python-0.4.36 → hud_python-0.4.37}/environments/deepresearch/pyproject.toml RENAMED Viewed

@@ -3,7 +3,7 @@ name = "deepresearch"
 version = "0.1.0"
 description = "DeepResearch HUD environment with HTTP backend (EXA on server)"
 requires-python = ">=3.11"
-dependencies = [ "hud-python==0.4.36", "fastapi>=0.104.1", "uvicorn[standard]>=0.24.0", "httpx>=0.24.0",]
+dependencies = [ "hud-python==0.4.37", "fastapi>=0.104.1", "uvicorn[standard]>=0.24.0", "httpx>=0.24.0",]
 [build-system]
 requires = [ "hatchling",]

{hud_python-0.4.36 → hud_python-0.4.37}/hud/agents/__init__.py RENAMED Viewed

@@ -2,12 +2,14 @@ from __future__ import annotations
 from .base import MCPAgent
 from .claude import ClaudeAgent
+from .lite_llm import LiteAgent
 from .openai import OperatorAgent
 from .openai_chat_generic import GenericOpenAIChatAgent
 __all__ = [
     "ClaudeAgent",
     "GenericOpenAIChatAgent",
+    "LiteAgent",
     "MCPAgent",
     "OperatorAgent",
 ]

hud_python-0.4.37/hud/agents/lite_llm.py ADDED Viewed

@@ -0,0 +1,72 @@
+"""LiteLLM MCP Agent implementation.
+Same OpenAI chat-completions shape + MCP tool plumbing,
+but transport is LiteLLM and (optionally) tools are shaped by LiteLLM's MCP transformer.
+"""
+from __future__ import annotations
+import logging
+from typing import Any, ClassVar
+import litellm
+from .openai_chat_generic import GenericOpenAIChatAgent
+logger = logging.getLogger(__name__)
+# Prefer LiteLLM's built-in MCP -> OpenAI tool transformer (handles Bedrock nuances)
+try:
+    from litellm.experimental_mcp_client.tools import (
+        transform_mcp_tool_to_openai_tool,
+    )
+except Exception:  # pragma: no cover - optional dependency
+    transform_mcp_tool_to_openai_tool = None  # type: ignore
+class LiteAgent(GenericOpenAIChatAgent):
+    """
+    Same OpenAI chat-completions shape + MCP tool plumbing,
+    but transport is LiteLLM and (optionally) tools are shaped by LiteLLM's MCP transformer.
+    """
+    metadata: ClassVar[dict[str, Any]] = {}
+    def __init__(
+        self,
+        *,
+        model_name: str = "gpt-4o-mini",
+        completion_kwargs: dict[str, Any] | None = None,
+        **agent_kwargs: Any,
+    ) -> None:
+        # We don't need an OpenAI client; pass None
+        super().__init__(
+            openai_client=None,
+            model_name=model_name,
+            completion_kwargs=completion_kwargs,
+            **agent_kwargs,
+        )
+    def get_tool_schemas(self) -> list[dict]:
+        # Prefer LiteLLM's stricter transformer (handles Bedrock & friends)
+        if transform_mcp_tool_to_openai_tool is not None:
+            return [
+                transform_mcp_tool_to_openai_tool(t)  # returns ChatCompletionToolParam-like dict
+                for t in self.get_available_tools()
+            ]
+        # Fallback to the generic OpenAI sanitizer
+        return GenericOpenAIChatAgent.get_tool_schemas(self)
+    async def _invoke_chat_completion(
+        self,
+        *,
+        messages: list[Any],
+        tools: list[dict] | None,
+        extra: dict[str, Any],
+    ):
+        return await litellm.acompletion(
+            model=self.model_name,
+            messages=messages,
+            tools=tools or None,  # LiteLLM tolerates None better than []
+            **extra,
+        )

{hud_python-0.4.36 → hud_python-0.4.37}/hud/agents/openai_chat_generic.py RENAMED Viewed

@@ -42,7 +42,7 @@ class GenericOpenAIChatAgent(MCPAgent):
     def __init__(
         self,
         *,
-        openai_client: AsyncOpenAI,
+        openai_client: AsyncOpenAI | None,
         model_name: str = "gpt-4o-mini",
         completion_kwargs: dict[str, Any] | None = None,
         **agent_kwargs: Any,
@@ -171,6 +171,23 @@ class GenericOpenAIChatAgent(MCPAgent):
             openai_tools.append(openai_tool)
         return openai_tools
+    async def _invoke_chat_completion(
+        self,
+        *,
+        messages: list[Any],
+        tools: list[dict] | None,
+        extra: dict[str, Any],
+    ):
+        if self.oai is None:
+            raise ValueError("openai_client is required for GenericOpenAIChatAgent")
+        # default transport = OpenAI SDK
+        return await self.oai.chat.completions.create(
+            model=self.model_name,
+            messages=messages,
+            tools=tools,  # already ChatCompletionToolParam-shaped
+            **extra,
+        )
     @instrument(
         span_type="agent",
         record_args=False,
@@ -180,17 +197,14 @@ class GenericOpenAIChatAgent(MCPAgent):
         """Send chat request to OpenAI and convert the response."""
         # Convert MCP tool schemas to OpenAI format
-        mcp_schemas = self.get_tool_schemas()
+        tools = cast("list[ChatCompletionToolParam]", self.get_tool_schemas())
         protected_keys = {"model", "messages", "tools"}
         extra = {k: v for k, v in (self.completion_kwargs or {}).items() if k not in protected_keys}
         try:
-            response = await self.oai.chat.completions.create(
-                model=self.model_name,
-                messages=messages,
-                tools=cast("list[ChatCompletionToolParam]", mcp_schemas),
-                **extra,
+            response = await self._invoke_chat_completion(
+                messages=messages, tools=tools, extra=extra
             )
         except Exception as e:
             error_content = f"Error getting response {e}"

{hud_python-0.4.36 → hud_python-0.4.37}/hud/cli/__init__.py RENAMED Viewed

@@ -912,7 +912,7 @@ def eval(
     agent: str | None = typer.Argument(
         None,
         help=(
-            "Agent backend to use (claude, openai, or vllm). If not provided, will prompt interactively."  # noqa: E501
+            "Agent backend to use (claude, openai, vllm, or litellm). If not provided, will prompt interactively."  # noqa: E501
         ),
     ),
     full: bool = typer.Option(
@@ -960,6 +960,12 @@ def eval(
         "--verbose",
         help="Enable verbose output from the agent",
     ),
+    very_verbose: bool = typer.Option(
+        False,
+        "--very-verbose",
+        "-vv",
+        help="Enable debug-level logs for maximum visibility",
+    ),
     vllm_base_url: str | None = typer.Option(
         None,
         "--vllm-base-url",
@@ -1025,13 +1031,14 @@ def eval(
                 {"name": "Claude 4 Sonnet", "value": "claude"},
                 {"name": "OpenAI Computer Use", "value": "openai"},
                 {"name": "vLLM (Local Server)", "value": "vllm"},
+                {"name": "LiteLLM (Multi-provider)", "value": "litellm"},
             ]
         )
         agent = hud_console.select("Select an agent to use:", choices=choices, default=0)
     # Handle HUD model selection
-    if agent and agent not in ["claude", "openai", "vllm"]:
+    if agent and agent not in ["claude", "openai", "vllm", "litellm"]:
         # Find remote model name
         model = agent
         if not vllm_base_url:
@@ -1052,7 +1059,7 @@ def eval(
         hud_console.info(f"Using HUD model: {model} (trained on {base_model})")
     # Validate agent choice
-    valid_agents = ["claude", "openai", "vllm"]
+    valid_agents = ["claude", "openai", "vllm", "litellm"]
     if agent not in valid_agents:
         hud_console.error(f"Invalid agent: {agent}. Must be one of: {', '.join(valid_agents)}")
         raise typer.Exit(1)
@@ -1070,6 +1077,7 @@ def eval(
         max_workers=max_workers,
         max_concurrent_per_worker=max_concurrent_per_worker,
         verbose=verbose,
+        very_verbose=very_verbose,
         vllm_base_url=vllm_base_url,
         group_size=group_size,
     )
@@ -1119,7 +1127,7 @@ def rl(
     ),
     model: str | None = typer.Argument(
         None,
-        help="Model to train (default: interactive selection)",
+        help="Model to train from https://hud.so/models (default: interactive selection)",
     ),
     config_file: Path | None = typer.Option(  # noqa: B008
         None,
@@ -1159,6 +1167,12 @@ def rl(
         "--ddp-gpus",
         help="Specific GPUs for DDP (e.g., '0,1,2,3')",
     ),
+    yes: bool = typer.Option(
+        False,
+        "--yes",
+        "-y",
+        help="Auto-accept all prompts and use defaults (lazy mode)",
+    ),
     vllm_gpu: int | None = typer.Option(
         None,
         "--vllm-gpu",
@@ -1180,6 +1194,7 @@ def rl(
         no_ddp=no_ddp,
         ddp_gpus=ddp_gpus,
         vllm_gpu=vllm_gpu,
+        yes=yes,
     )

{hud_python-0.4.36 → hud_python-0.4.37}/hud/cli/build.py RENAMED Viewed

@@ -3,6 +3,7 @@
 from __future__ import annotations
 import asyncio
+import contextlib
 import hashlib
 import subprocess
 import time
@@ -13,6 +14,7 @@ from typing import Any
 import typer
 import yaml
+from hud.cli.utils.source_hash import compute_source_hash, list_source_files
 from hud.clients import MCPClient
 from hud.utils.hud_console import HUDConsole
 from hud.version import __version__ as hud_version
@@ -341,10 +343,11 @@ def build_environment(
     required_env, optional_env = extract_env_vars_from_dockerfile(dockerfile_path)
     # Merge user-provided env vars with detected ones
-    provided_env_vars = {}
+    provided_env_vars: dict[str, str] = {}
     missing_required = []
     if env_vars:
-        provided_env_vars = env_vars.copy()
+        # Use placeholders in lock file for any provided values to avoid storing secrets
+        provided_env_vars = {k: f"${{{k}}}" for k in env_vars}
         # Track which required vars are still missing
         missing_required = [e for e in required_env if e not in env_vars]
@@ -384,6 +387,8 @@ def build_environment(
             "hudVersion": hud_version,
             "directory": str(env_dir.name),
             "version": new_version,  # Internal environment version
+            # Fast source fingerprint for change detection
+            "sourceHash": compute_source_hash(env_dir),
         },
         "environment": {
             "initializeMs": analysis["initializeMs"],
@@ -424,6 +429,16 @@ def build_environment(
     with open(lock_path, "w") as f:
         yaml.dump(lock_content, f, default_flow_style=False, sort_keys=False)
+    # Also write the file list we hashed for transparency (non-essential)
+    with contextlib.suppress(Exception):
+        files = [
+            str(p.resolve().relative_to(env_dir)).replace("\\", "/")
+            for p in list_source_files(env_dir)
+        ]
+        lock_content["build"]["sourceFiles"] = files
+        with open(lock_path, "w") as f:
+            yaml.dump(lock_content, f, default_flow_style=False, sort_keys=False)
     hud_console.success("Created lock file: hud.lock.yaml")
     # Calculate lock file hash

{hud_python-0.4.36 → hud_python-0.4.37}/hud/cli/dev.py RENAMED Viewed

@@ -530,7 +530,7 @@ async def start_mcp_proxy(
                     stderr=asyncio.subprocess.DEVNULL,
                 )
                 await stop_result.communicate()
-                hud_console.success("✅ Container stopped successfully")
+                hud_console.success("Container stopped successfully")
                 container_stopped = True
         except Exception as e:
             hud_console.warning(f"Failed to stop container: {e}")

hud-python 0.4.36__tar.gz → 0.4.37__tar.gz

Potentially problematic release.

hud-python 0.4.36tar.gz → 0.4.37tar.gz