PyPI - hud-python - Versions diffs - 0.4.30__tar.gz → 0.4.31__tar.gz - Mend

hud-python 0.4.30tar.gz → 0.4.31tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (222)

{hud_python-0.4.30 → hud_python-0.4.31}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.30
+Version: 0.4.31
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -35,15 +35,20 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Requires-Python: <3.13,>=3.11
+Requires-Dist: anthropic
+Requires-Dist: datasets>=2.14.0
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
 Requires-Dist: hud-mcp-use-python-sdk>=2.3.16
+Requires-Dist: numpy>=1.24.0
+Requires-Dist: openai
 Requires-Dist: opentelemetry-api>=1.34.1
 Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.34.1
 Requires-Dist: opentelemetry-instrumentation-mcp==0.47.0
 Requires-Dist: opentelemetry-sdk>=1.34.1
 Requires-Dist: pathspec>=0.12.1
+Requires-Dist: pillow>=11.1.0
 Requires-Dist: prompt-toolkit==3.0.51
 Requires-Dist: pydantic-settings<3,>=2
 Requires-Dist: pydantic<3,>=2
@@ -54,8 +59,6 @@ Requires-Dist: typer>=0.9.0
 Requires-Dist: watchfiles>=0.21.0
 Requires-Dist: wrapt>=1.14.0
 Provides-Extra: agent
-Requires-Dist: anthropic; extra == 'agent'
-Requires-Dist: datasets>=2.14.0; extra == 'agent'
 Requires-Dist: dotenv>=0.9.9; extra == 'agent'
 Requires-Dist: ipykernel; extra == 'agent'
 Requires-Dist: ipython<9; extra == 'agent'
@@ -64,12 +67,7 @@ Requires-Dist: jupyter-core; extra == 'agent'
 Requires-Dist: langchain; extra == 'agent'
 Requires-Dist: langchain-anthropic; extra == 'agent'
 Requires-Dist: langchain-openai; extra == 'agent'
-Requires-Dist: numpy>=1.24.0; extra == 'agent'
-Requires-Dist: openai; extra == 'agent'
-Requires-Dist: pillow>=11.1.0; extra == 'agent'
 Provides-Extra: agents
-Requires-Dist: anthropic; extra == 'agents'
-Requires-Dist: datasets>=2.14.0; extra == 'agents'
 Requires-Dist: dotenv>=0.9.9; extra == 'agents'
 Requires-Dist: ipykernel; extra == 'agents'
 Requires-Dist: ipython<9; extra == 'agents'
@@ -78,13 +76,8 @@ Requires-Dist: jupyter-core; extra == 'agents'
 Requires-Dist: langchain; extra == 'agents'
 Requires-Dist: langchain-anthropic; extra == 'agents'
 Requires-Dist: langchain-openai; extra == 'agents'
-Requires-Dist: numpy>=1.24.0; extra == 'agents'
-Requires-Dist: openai; extra == 'agents'
-Requires-Dist: pillow>=11.1.0; extra == 'agents'
 Provides-Extra: dev
 Requires-Dist: aiodocker>=0.24.0; extra == 'dev'
-Requires-Dist: anthropic; extra == 'dev'
-Requires-Dist: datasets>=2.14.0; extra == 'dev'
 Requires-Dist: dotenv>=0.9.9; extra == 'dev'
 Requires-Dist: inspect-ai>=0.3.80; extra == 'dev'
 Requires-Dist: ipykernel; extra == 'dev'
@@ -94,8 +87,6 @@ Requires-Dist: jupyter-core; extra == 'dev'
 Requires-Dist: langchain; extra == 'dev'
 Requires-Dist: langchain-anthropic; extra == 'dev'
 Requires-Dist: langchain-openai; extra == 'dev'
-Requires-Dist: numpy>=1.24.0; extra == 'dev'
-Requires-Dist: openai; extra == 'dev'
 Requires-Dist: pillow>=11.1.0; extra == 'dev'
 Requires-Dist: playwright; extra == 'dev'
 Requires-Dist: pyautogui>=0.9.54; extra == 'dev'
@@ -108,9 +99,7 @@ Requires-Dist: ruff>=0.11.8; extra == 'dev'
 Requires-Dist: setuptools; extra == 'dev'
 Requires-Dist: textdistance<5,>=4.5.0; extra == 'dev'
 Provides-Extra: rl
-Requires-Dist: anthropic; extra == 'rl'
 Requires-Dist: bitsandbytes>=0.41.0; (sys_platform == 'linux') and extra == 'rl'
-Requires-Dist: datasets>=2.14.0; extra == 'rl'
 Requires-Dist: dotenv>=0.9.9; extra == 'rl'
 Requires-Dist: ipykernel; extra == 'rl'
 Requires-Dist: ipython<9; extra == 'rl'
@@ -120,10 +109,7 @@ Requires-Dist: langchain; extra == 'rl'
 Requires-Dist: langchain-anthropic; extra == 'rl'
 Requires-Dist: langchain-openai; extra == 'rl'
 Requires-Dist: liger-kernel>=0.5.0; (sys_platform == 'linux') and extra == 'rl'
-Requires-Dist: numpy>=1.24.0; extra == 'rl'
-Requires-Dist: openai; extra == 'rl'
 Requires-Dist: peft>=0.17.1; extra == 'rl'
-Requires-Dist: pillow>=11.1.0; extra == 'rl'
 Requires-Dist: vllm==0.10.1.1; extra == 'rl'
 Description-Content-Type: text/markdown
@@ -239,21 +225,34 @@ The above example let's the agent play 2048 ([See replay](https://app.hud.so/tra
 ## Reinforcement Learning with GRPO
-This is a Qwen-2.5-3B agent training a policy on the [`text-2048`](environments/text_2048/) environment (see above) using [Verifiers](rl/):
+This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
 ![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
-To start training, check out the [`rl/README.md`](rl/README.md) folder:
+Train with the new interactive `hud rl` flow:
 ```bash
-git clone https://github.com/hud-evals/hud-python
-cd hud-python/rl
-python train_2048.py
+# Install CLI with RL extras
+uv tool install "hud-python[rl]"
+# Option A: Run directly from a HuggingFace dataset
+hud rl hud-evals/basic-2048
+# Option B: Download first, modify, then train
+hud get hud-evals/basic-2048
+hud rl basic-2048.jsonl
+# Optional: baseline evaluation
+hud eval basic-2048.jsonl
 ```
-Any hud MCP environment and evaluation works with our RL pipeline. Even our remote configurations!
+Supports multi‑turn RL for both:
+- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
+- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
-> The [`rl/README.md`](rl/README.md) walks you through several examples of RL training and takes less than 15 minutes to set up for your custom agent!
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
 ## Benchmarking Agents
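
The PKG-INFO hunks above move `anthropic`, `datasets`, `numpy`, `openai`, and `pillow` out of the `agent`, `agents`, and `rl` extras and into the unconditional requirements. The following is a minimal sketch, assuming plain string parsing rather than a full marker parser, of how `Requires-Dist` lines could be split into core and per-extra groups to check such a change. The helper name `split_requirements` and the sample lines are illustrative and not part of hud-python.

```python
import re
from collections import defaultdict

# Matches the `extra == 'name'` marker in a Requires-Dist value.
_EXTRA_RE = re.compile(r"extra\s*==\s*['\"]([^'\"]+)['\"]")


def split_requirements(lines):
    """Group Requires-Dist values into core deps and per-extra deps."""
    core, extras = [], defaultdict(list)
    for line in lines:
        if not line.startswith("Requires-Dist:"):
            continue  # ignore other metadata fields
        value = line[len("Requires-Dist:"):].strip()
        spec, _, marker = value.partition(";")
        match = _EXTRA_RE.search(marker)
        if match:
            extras[match.group(1)].append(spec.strip())
        else:
            core.append(spec.strip())
    return core, dict(extras)


# A few lines in the shape of the 0.4.31 metadata shown above.
sample = [
    "Requires-Dist: anthropic",
    "Requires-Dist: numpy>=1.24.0",
    "Requires-Dist: dotenv>=0.9.9; extra == 'agent'",
    "Requires-Dist: bitsandbytes>=0.41.0; (sys_platform == 'linux') and extra == 'rl'",
]
core, extras = split_requirements(sample)
print(core)
print(extras)
```

Running the same function over the 0.4.30 and 0.4.31 metadata and comparing the two `core` lists would surface the packages that became mandatory.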

{hud_python-0.4.30 → hud_python-0.4.31}/README.md RENAMED

@@ -110,21 +110,34 @@ The above example let's the agent play 2048 ([See replay](https://app.hud.so/tra
 ## Reinforcement Learning with GRPO
-This is a Qwen-2.5-3B agent training a policy on the [`text-2048`](environments/text_2048/) environment (see above) using [Verifiers](rl/):
+This is a Qwen‑2.5‑VL‑3B agent training a policy on the 2048-basic browser environment:
 ![RL curve](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/rl_2.png)
-To start training, check out the [`rl/README.md`](rl/README.md) folder:
+Train with the new interactive `hud rl` flow:
 ```bash
-git clone https://github.com/hud-evals/hud-python
-cd hud-python/rl
-python train_2048.py
+# Install CLI with RL extras
+uv tool install "hud-python[rl]"
+# Option A: Run directly from a HuggingFace dataset
+hud rl hud-evals/basic-2048
+# Option B: Download first, modify, then train
+hud get hud-evals/basic-2048
+hud rl basic-2048.jsonl
+# Optional: baseline evaluation
+hud eval basic-2048.jsonl
 ```
-Any hud MCP environment and evaluation works with our RL pipeline. Even our remote configurations!
+Supports multi‑turn RL for both:
+- Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
+- Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
-> The [`rl/README.md`](rl/README.md) walks you through several examples of RL training and takes less than 15 minutes to set up for your custom agent!
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
 ## Benchmarking Agents

{hud_python-0.4.30 → hud_python-0.4.31}/hud/agents/openai_chat_generic.py RENAMED

@@ -231,7 +231,7 @@ class GenericOpenAIChatAgent(MCPAgent):
             for tc in msg.tool_calls:
                 if tc.function.name is not None:  # type: ignore
                     # _oai_to_mcp returns a single MCPToolCall; append it
-                    tool_calls.append(self._oai_to_mcp(tc)) # noqa: PERF401
+                    tool_calls.append(self._oai_to_mcp(tc))  # noqa: PERF401
         # Only stop on length (token limit), never on "stop"
         done = choice.finish_reason == "length"

hud_python-0.4.31/hud/cli/flows/tasks.py ADDED

@@ -0,0 +1,185 @@
+from __future__ import annotations
+import json
+import re
+from pathlib import Path
+from typing import TYPE_CHECKING, Any
+import typer
+import yaml
+from hud.cli.build import build_environment
+from hud.cli.push import push_environment
+from hud.cli.utils.docker import require_docker_running
+from hud.cli.utils.environment import is_environment_directory
+from hud.cli.utils.registry import extract_name_and_tag
+from hud.utils.hud_console import hud_console
+from hud.utils.tasks import load_tasks
+if TYPE_CHECKING:
+    from hud.types import Task
+def _is_remote_url(url: str) -> bool:
+    """Match the remote url."""
+    # See if a url is a remote url
+    return bool(re.match(r"^(https?:\/\/)?(www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?$", url))
+def _validate_tasks(tasks: list[Task]) -> bool:
+    """Validate the tasks file."""
+    for task in tasks:
+        if not task.mcp_config or (not _is_remote_url(task.mcp_config.get("url", ""))):
+            return False
+    return True
+def _find_environment_dir(tasks_path: Path) -> Path | None:
+    """Find the environment directory related to a tasks file.
+    Strategy:
+    - Prefer a directory containing hud.lock.yaml
+    - Fallback to a directory that looks like an environment (Dockerfile + pyproject.toml)
+    - Search the tasks file directory, CWD, and a couple of parents
+    """
+    candidates: list[Path] = []
+    cwd = Path.cwd()
+    candidates.extend([tasks_path.parent, cwd])
+    # Add parents (up to 2 levels for each)
+    for base in list(candidates):
+        p = base
+        for _ in range(2):
+            p = p.parent
+            if p not in candidates:
+                candidates.append(p)
+    # Prefer those with hud.lock.yaml
+    for d in candidates:
+        if (d / "hud.lock.yaml").exists():
+            return d
+    # Otherwise, find a plausible environment dir
+    for d in candidates:
+        try:
+            if is_environment_directory(d):
+                return d
+        except Exception as e:
+            hud_console.debug(f"Skipping path {d}: {e}")
+            continue
+    return None
+def _ensure_built(env_dir: Path) -> dict[str, Any]:
+    """Ensure the environment is built and a lock file exists; return lock data."""
+    lock_path = env_dir / "hud.lock.yaml"
+    if not lock_path.exists():
+        hud_console.warning("No hud.lock.yaml found. The environment hasn't been built.")
+        if not hud_console.confirm("Build the environment now (runs 'hud build')?", default=True):
+            raise typer.Exit(1)
+        # Check Docker availability before attempting a build
+        require_docker_running()
+        # Run build (non-interactive). If Docker isn't running, this will raise and stop the flow.
+        build_environment(str(env_dir))
+    # Load lock file
+    with open(lock_path) as f:
+        lock_data = yaml.safe_load(f) or {}
+    return lock_data
+def _ensure_pushed(env_dir: Path, lock_data: dict[str, Any]) -> dict[str, Any]:
+    """Ensure the environment is pushed to a registry; return updated lock data."""
+    pushed = bool(lock_data.get("push"))
+    if not pushed:
+        hud_console.warning("Environment not pushed to a registry yet.")
+        if not hud_console.confirm("Push to a registry now (runs 'hud push')?", default=True):
+            raise typer.Exit(1)
+        # Check Docker availability before attempting a push
+        require_docker_running()
+        # If Docker or login is not configured, the push function will fail and halt.
+        push_environment(str(env_dir))
+        # Reload lock after push
+        lock_path = env_dir / "hud.lock.yaml"
+        with open(lock_path) as f:
+            lock_data = yaml.safe_load(f) or {}
+    return lock_data
+def _derive_remote_image(lock_data: dict[str, Any]) -> str:
+    """Derive org/name:tag from lock file image field for MCP header."""
+    image_ref = str(lock_data.get("image", "")).strip()
+    if not image_ref:
+        raise typer.Exit("Lock file missing image reference")
+    name, tag = extract_name_and_tag(image_ref)
+    return f"{name}:{tag}"
+def convert_tasks_to_remote(tasks_file: str) -> str:
+    """Convert a local tasks file to remote MCP tasks and return new filename.
+    Steps:
+    1) Find env dir; ensure built (hud.lock.yaml), otherwise build
+    2) Ensure pushed to registry, otherwise push
+    3) Create remote_[tasks].json with mcp_config pointing to mcp.hud.so and Mcp-Image
+    4) Return the new tasks file path
+    """
+    tasks_path = Path(tasks_file).resolve()
+    tasks = load_tasks(str(tasks_path))
+    # Ensure HUD_API_KEY is available: prefer process env, else load from env_dir/.env
+    from hud.settings import settings
+    if not settings.api_key or not settings.api_key.strip():
+        hud_console.error("HUD_API_KEY is not set")
+        raise typer.Exit(1)
+    # Load tasks (supports .json and .jsonl)
+    if _validate_tasks(tasks):
+        return str(tasks_path)
+    # Locate environment
+    env_dir = _find_environment_dir(tasks_path)
+    if not env_dir:
+        hud_console.error("Could not locate an environment directory (Dockerfile + pyproject.toml)")
+        hud_console.hint("Ensure you're in or near your environment folder before running 'hud rl'")
+        raise typer.Exit(1)
+    # Ensure built and pushed
+    lock_data = _ensure_built(env_dir)
+    lock_data = _ensure_pushed(env_dir, lock_data)
+    # Derive remote image name org/name:tag
+    remote_image = _derive_remote_image(lock_data)
+    # Convert to list[dict]
+    tasks_payload: list[dict[str, Any]] = []
+    for t in tasks:
+        item = t.model_dump()
+        item["mcp_config"] = {
+            "hud": {
+                "url": "https://mcp.hud.so/v3/mcp",
+                "headers": {
+                    "Authorization": "Bearer ${HUD_API_KEY}",
+                    "Mcp-Image": remote_image,
+                },
+            }
+        }
+        tasks_payload.append(item)
+    # Write new file: remote_<name>.json (always JSON array)
+    remote_name = f"remote_{tasks_path.stem}.json"
+    remote_path = tasks_path.parent / remote_name
+    with open(remote_path, "w", encoding="utf-8") as f:
+        json.dump(tasks_payload, f, ensure_ascii=False, indent=2)
+        f.write("\n")
+    hud_console.success(f"Created remote tasks file: {remote_path.name}")
+    hud_console.hint("Proceeding with RL training on the remote environment")
+    return str(remote_path)
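
The new `hud/cli/flows/tasks.py` above does two things that are easy to check in isolation: it decides whether a URL looks remote, and it rewrites each task's `mcp_config` to point at the hosted endpoint. The sketch below reproduces both with plain dictionaries, assuming no hud-python import. The regular expression, endpoint URL, and header names are copied from the diff; `to_remote`, the sample task, and the image name `my-org/my-env:0.1.0` are hypothetical.

```python
import copy
import re

# Same pattern as `_is_remote_url` in the diff above.
_REMOTE_URL_RE = re.compile(
    r"^(https?:\/\/)?(www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?$"
)


def is_remote_url(url: str) -> bool:
    """Return True when the string looks like a host with a dotted TLD."""
    return bool(_REMOTE_URL_RE.match(url))


def to_remote(task: dict, remote_image: str) -> dict:
    """Return a copy of `task` whose mcp_config targets the hosted MCP endpoint."""
    item = copy.deepcopy(task)
    item["mcp_config"] = {
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                # Left as a placeholder string, exactly as in the diff.
                "Authorization": "Bearer ${HUD_API_KEY}",
                "Mcp-Image": remote_image,
            },
        }
    }
    return item


# Hypothetical local task; every field other than mcp_config is illustrative.
local_task = {
    "prompt": "Reach the 64 tile",
    "mcp_config": {"local": {"command": "docker", "args": ["run", "--rm", "-i", "my-env:dev"]}},
}
remote_task = to_remote(local_task, "my-org/my-env:0.1.0")

print(is_remote_url("https://mcp.hud.so/v3/mcp"))
print(is_remote_url("http://localhost:8765/mcp"))
print(remote_task["mcp_config"]["hud"]["headers"]["Mcp-Image"])
```

Note that `_validate_tasks`, as written in this diff, reads `url` from the top level of `mcp_config`, while the rewritten config nests it under a `hud` key. Whether the loaded `Task` objects flatten that nesting is not visible here, so the sketch leaves the lookup out.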

{hud_python-0.4.30 → hud_python-0.4.31}/hud/cli/init.py RENAMED

@@ -433,11 +433,11 @@ NOTEBOOK_TEMPLATE = """{{
 ENV_FILE_TEMPLATE = """# HUD API Configuration
 # Get your API key from https://app.hud.so/account
-HUD_API_KEY=your_hud_api_key_here
+HUD_API_KEY=""
 # Anthropic API Configuration (optional)
 # Required for using Claude agents - get from https://console.anthropic.com/
-ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ANTHROPIC_API_KEY=""
 """
 README_TEMPLATE = """# {title}
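
One plausible effect of blanking the template values is that an unedited `.env` no longer looks like it contains a key. The check in `hud/cli/flows/tasks.py` above (`not settings.api_key or not settings.api_key.strip()`) treats an empty string as missing, whereas the old placeholder text would have passed it. A minimal sketch of that check, assuming the value arrives as a plain string or `None`; the function name `has_api_key` is hypothetical.

```python
from typing import Optional


def has_api_key(value: Optional[str]) -> bool:
    """Mirror the emptiness test applied before remote task conversion."""
    return bool(value and value.strip())


print(has_api_key(""))                       # new template default
print(has_api_key("your_hud_api_key_here"))  # old template placeholder
print(has_api_key("   "))                    # whitespace only
print(has_api_key(None))                     # variable not set at all
```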

hud_python-0.4.31/hud/cli/rl/__init__.py ADDED

@@ -0,0 +1,165 @@
+"""RL training command for HUD CLI."""
+from __future__ import annotations
+import logging
+import os
+from typing import TYPE_CHECKING
+import typer
+from rich.console import Console
+from hud.cli.utils.tasks import find_tasks_file
+from hud.utils.hud_console import hud_console
+console = Console()
+if TYPE_CHECKING:
+    from pathlib import Path
+def rl_command(
+    tasks_file: str | None = typer.Argument(
+        None,
+        help="Path to tasks file (JSON/JSONL) or HuggingFace dataset name",
+    ),
+    model: str | None = typer.Argument(
+        None,
+        help="Model to train (default: interactive selection)",
+    ),
+    config_file: Path | None = typer.Option(  # noqa: B008
+        None,
+        "--config",
+        "-c",
+        help="Path to existing configuration file",
+    ),
+    output_dir: str = typer.Option(
+        "/checkpoints",
+        "--output-dir",
+        "-o",
+        help="Output directory for checkpoints",
+    ),
+    restart: bool = typer.Option(
+        False,
+        "--restart",
+        help="Restart the vLLM server before training",
+    ),
+    verbose: bool = typer.Option(
+        False,
+        "--verbose",
+        "-v",
+        help="Enable verbose output",
+    ),
+    # DDP options
+    no_ddp: bool = typer.Option(
+        False,
+        "--no-ddp",
+        help="Disable DDP even with multiple GPUs",
+    ),
+    ddp_gpus: str | None = typer.Option(
+        None,
+        "--ddp-gpus",
+        help="Specific GPUs for DDP (e.g., '0,1,2,3')",
+    ),
+    vllm_gpu: int | None = typer.Option(
+        None,
+        "--vllm-gpu",
+        help="Specific GPU for vLLM server",
+    ),
+    # Execution mode options
+    local: bool = typer.Option(
+        False,
+        "--local",
+        help="Run training locally instead of using remote API server",
+    ),
+    # Internal flag
+    skip_vllm_startup: bool = typer.Option(
+        False,
+        hidden=True,
+        help="Skip local vLLM server startup (for internal use)",
+    ),
+) -> None:
+    """Run GRPO reinforcement learning training on tasks."""
+    # Configure logging based on verbose flag BEFORE any output
+    if not verbose:
+        os.environ["HUD_LOG_LEVEL"] = "WARNING"
+        logging.basicConfig(level=logging.WARNING, force=True)
+        root_logger = logging.getLogger()
+        root_logger.setLevel(logging.WARNING)
+        # Suppress INFO logs from various components
+        for logger_name in [
+            "httpx",
+            "hud.agents",
+            "hud.utils.design",
+            "hud",
+            "asyncio",
+            "transformers",
+        ]:
+            logging.getLogger(logger_name).setLevel(logging.WARNING)
+        logging.getLogger("hud.agents.base").setLevel(logging.WARNING)
+    else:
+        logging.basicConfig(level=logging.INFO)
+    hud_console.header("HUD RL Training")
+    # Determine execution mode
+    use_remote = not local
+    if not tasks_file:
+        tasks_file = find_tasks_file(tasks_file)
+        if not tasks_file:
+            hud_console.warning("No tasks file found in current directory")
+            hud_console.hint(
+                "Download a HF dataset using `hud get <dataset_name>` (e.g., `hud get hud-evals/2048-basic`)"  # noqa: E501
+            )
+            hud_console.hint("or create a tasks file manually.")
+            raise typer.Exit(1)
+    # If user ran bare `hud rl`, guide them through remote task conversion flow
+    # before proceeding (remote only)
+    if use_remote:
+        try:
+            from hud.cli.flows.tasks import convert_tasks_to_remote
+            console.print("\n[cyan]Preparing remote training tasks...[/cyan]")
+            console.print("[cyan](build/push if needed)[/cyan]")
+            tasks_file = convert_tasks_to_remote(tasks_file)
+        except typer.Exit:
+            raise
+        except Exception as e:
+            hud_console.warning(f"[red]Tasks file is not valid for remote training: {e!s}[/red]")
+            hud_console.hint("Either ensure the tasks file has remote urls")
+            hud_console.hint("Or rerun `hud rl` within an environment directory")
+            raise typer.Exit(1) from e
+        try:
+            from .remote_runner import run_remote_training
+            run_remote_training(
+                tasks_file=tasks_file, model=model, config_file=config_file, output_dir=output_dir
+            )
+            return
+        except Exception as e:
+            console.print(f"[red]❌ Remote training failed: {e!s}[/red]")
+            raise typer.Exit(1) from e
+    # Local execution flow delegated to local_runner (imports heavy deps lazily)
+    from .local_runner import run_local_training
+    run_local_training(
+        tasks_file=tasks_file,
+        model=model,
+        config_file=config_file,
+        output_dir=output_dir,
+        restart=restart,
+        verbose=verbose,
+        no_ddp=no_ddp,
+        ddp_gpus=ddp_gpus,
+        vllm_gpu=vllm_gpu,
+        skip_vllm_startup=skip_vllm_startup,
+    )
+# Export the command function
+__all__ = ["rl_command"]
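
The `rl_command` body above quiets logging unless `--verbose` is passed: it resets the root logger to `WARNING` and then lowers a list of chatty libraries individually. A minimal standard-library sketch of that pattern follows. The logger names are taken from the diff, while `configure_logging` is a hypothetical helper, and unlike the original it passes `force=True` in both branches for simplicity.

```python
import logging

# Logger names copied from the diff; any library could be listed here.
NOISY_LOGGERS = ["httpx", "hud.agents", "hud", "asyncio", "transformers"]


def configure_logging(verbose: bool) -> None:
    """Set the root level, then clamp noisy libraries when not verbose."""
    level = logging.INFO if verbose else logging.WARNING
    # force=True replaces handlers installed by earlier basicConfig calls.
    logging.basicConfig(level=level, force=True)
    if not verbose:
        for name in NOISY_LOGGERS:
            logging.getLogger(name).setLevel(logging.WARNING)


configure_logging(verbose=False)
logging.getLogger("httpx").info("suppressed at WARNING level")
logging.getLogger("httpx").warning("still shown")
```

Setting the level on each named logger matters because a library may have set its own logger to `INFO` or `DEBUG` explicitly, in which case the root level alone would not silence it.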
