PyPI - hud-python - Versions diffs - 0.4.15__py3-none-any.whl → 0.4.16__py3-none-any.whl - Mend

hud-python 0.4.15__py3-none-any.whl → 0.4.16__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (14)

hud/cli/rl/README.md +243 -0
hud/cli/rl/__init__.py +82 -0
hud/cli/rl/init.py +370 -0
hud/cli/rl/pod.py +491 -0
hud/cli/rl/ssh.py +288 -0
hud/cli/rl/train.py +421 -0
hud/cli/rl/utils.py +165 -0
hud/utils/tests/test_version.py +1 -1
hud/version.py +1 -1
{hud_python-0.4.15.dist-info → hud_python-0.4.16.dist-info}/METADATA +1 -1
{hud_python-0.4.15.dist-info → hud_python-0.4.16.dist-info}/RECORD +14 -7
{hud_python-0.4.15.dist-info → hud_python-0.4.16.dist-info}/WHEEL +0 -0
{hud_python-0.4.15.dist-info → hud_python-0.4.16.dist-info}/entry_points.txt +0 -0
{hud_python-0.4.15.dist-info → hud_python-0.4.16.dist-info}/licenses/LICENSE +0 -0

hud/cli/rl/README.md ADDED

@@ -0,0 +1,243 @@
+# HUD RL Commands
+This module provides reinforcement learning commands for training agents on HUD environments using the `hud-vf-gym` adapter and verifiers framework.
+## Configuration
+API keys can be configured in two ways:
+1. **Environment Variables**:
+   ```bash
+   export HUD_API_KEY="your-hud-api-key"
+   export WANDB_API_KEY="your-wandb-api-key"
+   export PRIME_API_KEY="your-prime-api-key"
+   ```
+2. **`.env` File** (recommended):
+   Create a `.env` file in your project root:
+   ```env
+   HUD_API_KEY=your-hud-api-key
+   WANDB_API_KEY=your-wandb-api-key
+   PRIME_API_KEY=your-prime-api-key
+   ```
+HUD automatically loads settings from the `.env` file if present.
+## Quick Start
+```bash
+# 1. Generate config from environment
+hud rl init my-env:latest
+# 2. Create dataset from tasks
+hud hf tasks.json --name my-org/my-tasks
+# 3. Start training (interactive mode)
+hud rl
+```
+## Commands
+### `hud rl init`
+Generates a `hud-vf-gym` configuration file by analyzing a HUD environment:
+```bash
+hud rl init hudpython/hud-text-2048:latest
+hud rl init my-env:latest -o configs/my-env.yaml
+hud rl init my-env:latest --force  # Overwrite existing
+```
+This command:
+- Analyzes the environment's available tools
+- Generates appropriate action mappings
+- Creates a system prompt with tool descriptions
+- Sets up default parser and rubric configurations
+### `hud hf`
+Converts HUD tasks to HuggingFace dataset format:
+```bash
+hud hf tasks.json --name my-org/my-dataset
+hud hf tasks.json --name my-org/private-dataset --private
+hud hf tasks.json --name local-dataset --no-push  # Local only
+```
+Features:
+- Validates task format
+- Auto-infers MCP config from `hud.lock.yaml`
+- Updates lock file with primary dataset reference
+- Supports both single task and task array formats
+### `hud rl` (main command)
+Runs RL training with automatic setup:
+```bash
+# Interactive mode - prompts for missing components
+hud rl
+# Specify options
+hud rl --model gpt-4o-mini --dataset my-org/my-tasks
+hud rl --config configs/2048.yaml --gpus 4xH100
+hud rl --gpus 4xH100 --provider prime
+```
+The command will:
+1. Check for required files (config, dataset)
+2. Offer to generate missing components
+3. Push environment to registry if needed
+4. Start training (local or remote)
+## Task Format
+Tasks should follow this JSON format:
+```json
+{
+  "id": "task-001",
+  "prompt": "Complete the task description",
+  "mcp_config": {
+    "hud": {
+      "url": "https://mcp.hud.so/v3/mcp",
+      "headers": {
+        "Authorization": "Bearer $HUD_API_KEY",
+        "Mcp-Image": "your-org/your-env:latest"
+      }
+    }
+  },
+  "setup_tool": {
+    "name": "setup",
+    "arguments": {
+      "name": "function_name",
+      "param": "value"
+    }
+  },
+  "evaluate_tool": {
+    "name": "evaluate",
+    "arguments": {
+      "name": "evaluator_name",
+      "expected": "value"
+    }
+  },
+  "metadata": {
+    "difficulty": "easy",
+    "category": "task_type"
+  }
+}
+```
+## Configuration Format
+The generated YAML configs follow the `hud-vf-gym` specification:
+```yaml
+job:
+  name: "RL Training - my-env"
+  metadata:
+    environment: "my-env:latest"
+system_prompt: |
+  You are an AI agent interacting with my-env.
+  Available tools:
+  - tool_name(params): Description
+    Usage: <tool>tool_name(...)</tool>
+parser:
+  use_thinking: true
+  xml_weight: 0.6
+  action_weight: 0.4
+action_mappings:
+  tool_name:
+    _tool: "mcp_tool_name"
+    _parser:
+      positional: ["param1", "param2"]
+    param1:
+      from_arg: "param1"
+rubric:
+  weights:
+    task_completion: 0.8
+    tool_execution: 0.1
+    format_compliance: 0.1
+```
+## Lock File Integration
+The commands integrate with `hud.lock.yaml`:
+```yaml
+image: "my-org/my-env:latest"
+primary_dataset:
+  name: "my-org/my-tasks"
+  task_count: 50
+  updated_at: "2024-01-01T00:00:00"
+```
+This allows:
+- Automatic dataset discovery for `hud rl`
+- MCP config inference for tasks
+- Environment image tracking
+## Remote Training
+The `hud rl` command fully automates remote training on GPU instances:
+1. **Automatic Pod Creation**: Provisions GPU instances via Prime Intellect API
+2. **Environment Setup**: Installs all required dependencies automatically
+3. **Training Execution**: Runs distributed training with vLLM inference server
+4. **Live Monitoring**: Streams training logs with WANDB integration
+### What Happens Automatically
+When you run `hud rl`, the system will:
+1. **Create GPU Pod**:
+   - Selects lowest-cost provider (typically datacrunch)
+   - Allocates specified GPUs (e.g., 2xA100 for GRPO training)
+   - Configures with PyTorch CUDA image
+   - Polls until SSH is available (5-20 minutes)
+2. **Transfer Files**:
+   - Copies your config YAML to the pod
+   - Creates a custom training script
+3. **Install Dependencies**:
+   - Installs `uv` package manager
+   - Creates Python 3.12 virtual environment
+   - Installs `hud-vf-gym` via Prime registry
+   - Installs `verifiers[train]` for GRPO training
+   - Installs `flash-attn` for efficient attention
+4. **Setup Training**:
+   - Exports WANDB_API_KEY and HUD_API_KEY
+   - Starts vLLM inference server on GPU 0 via tmux
+   - Runs GRPO training on GPU 1
+   - Logs metrics to Weights & Biases
+### Required API Keys
+Ensure these are set in your `.env` file or environment:
+- `HUD_API_KEY`: For HUD telemetry and MCP connections
+- `WANDB_API_KEY`: For training metrics and logging
+- `PRIME_API_KEY`: For pod provisioning
+### SSH Key Configuration
+Before using Prime pods:
+1. Generate SSH keys at: https://app.primeintellect.ai/dashboard/profile
+2. Download and save as: `~/.ssh/prime_key.pem`
+3. Set permissions: `chmod 400 ~/.ssh/prime_key.pem`
+4. Configure Prime CLI: `prime config set-ssh-key-path ~/.ssh/prime_key.pem`
+## Implementation Notes
+The RL commands are built on top of:
+- `hud-vf-gym`: Generic adapter for HUD environments
+- `verifiers`: RL training framework
+- HuggingFace datasets: Task storage and distribution
+- Prime Intellect infrastructure: GPU provisioning (planned)
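The README above states that `hud hf` accepts both a single task object and an array of tasks. The following is a minimal sketch of how such input could be normalized before validation; it is not the package's own implementation. The helper names `load_tasks` and `missing_keys` are hypothetical, and the set of required keys is an assumption, since the README does not say which fields are mandatory.

```python
import json

# Assumption: these keys are treated as mandatory for illustration only;
# the README shows them in its example but does not state which are required.
ASSUMED_REQUIRED_KEYS = ("id", "prompt", "mcp_config")


def load_tasks(text: str) -> list[dict]:
    """Parse JSON holding either one task object or an array of task objects."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Single-task format: wrap it so callers always get a list.
        return [data]
    if isinstance(data, list) and all(isinstance(item, dict) for item in data):
        return data
    raise ValueError("expected a task object or an array of task objects")


def missing_keys(task: dict) -> list[str]:
    """Return the assumed-required keys that are absent from a task."""
    return [key for key in ASSUMED_REQUIRED_KEYS if key not in task]


single = '{"id": "task-001", "prompt": "Complete the task description"}'
tasks = load_tasks(single)
print(missing_keys(tasks[0]))
```

Normalizing to a list first keeps the per-task checks identical for both input shapes.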

hud/cli/rl/__init__.py ADDED

@@ -0,0 +1,82 @@
+"""HUD RL - Commands for reinforcement learning with HUD environments."""
+from __future__ import annotations
+from pathlib import Path  # noqa: TC003
+import typer
+from hud.utils.design import HUDDesign
+# Create the RL subcommand app
+rl_app = typer.Typer(
+    name="rl",
+    help="🤖 Reinforcement learning commands for HUD environments",
+    rich_markup_mode="rich",
+)
+design = HUDDesign()
+@rl_app.callback(invoke_without_command=True)
+def rl_main(
+    ctx: typer.Context,
+    model: str = typer.Option("Qwen/Qwen2.5-3B-Instruct", "--model", "-m", help="Model to train"),
+    dataset: str | None = typer.Option(
+        None, "--dataset", "-d", help="Override dataset from lock file"
+    ),
+    config: Path | None = typer.Option(None, "--config", "-c", help="Config YAML path"),  # noqa: B008
+    gpus: str = typer.Option("2xA100", "--gpus", help="GPU configuration (e.g., 2xA100, 4xH100)"),
+    provider: str = typer.Option("prime", "--provider", help="Infrastructure provider"),
+    output_dir: Path = typer.Option("./checkpoints", "--output", "-o", help="Output directory"),  # noqa: B008
+) -> None:
+    """🤖 Train RL models on HUD environments.
+    Runs training on remote GPU infrastructure with automatic setup.
+    The command will:
+    1. Check for required files (config, dataset)
+    2. Offer to generate missing files
+    3. Push environment to registry if needed
+    4. Start remote training on Prime Intellect
+    Examples:
+        hud rl                    # Interactive mode with prompts
+        hud rl --model gpt2       # Train with specific model
+        hud rl --gpus 4xH100      # Use different GPU configuration
+        hud rl init my-env:latest # Generate config for environment
+    """
+    # Only run main command if no subcommand was invoked
+    if ctx.invoked_subcommand is None:
+        from .train import train_command_wrapper
+        train_command_wrapper(
+            model=model,
+            dataset=dataset,
+            config=config,
+            gpus=gpus,
+            provider=provider,
+            output_dir=output_dir,
+        )
+@rl_app.command()
+def init(
+    directory: str = typer.Argument(".", help="Environment directory or Docker image"),
+    output: Path = typer.Option(None, "--output", "-o", help="Output config file path"),  # noqa: B008
+    force: bool = typer.Option(False, "--force", "-f", help="Overwrite existing config"),
+    build: bool = typer.Option(False, "--build", "-b", help="Build environment if no lock file"),
+) -> None:
+    """🔧 Generate hud-vf-gym config from environment.
+    Generates a YAML configuration file compatible with the hud-vf-gym adapter
+    from either a directory with hud.lock.yaml or a Docker image.
+    Examples:
+        hud rl init                    # Use current directory
+        hud rl init environments/test  # Use specific directory
+        hud rl init my-env:latest      # Use Docker image directly
+        hud rl init . -o configs/2048.yaml --build
+    """
+    from .init import init_command_wrapper
+    init_command_wrapper(directory, output, force, build)
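The `--gpus` option above takes strings such as `2xA100` or `4xH100`. The code that interprets this value is not part of the file shown here, so the following is only a sketch of one way to split such a string into a count and a model name. The function `parse_gpu_spec` and its regular expression are hypothetical.

```python
import re

# Hypothetical pattern: a positive count, a literal "x", then the GPU model name.
_GPU_SPEC = re.compile(r"(?P<count>[1-9]\d*)x(?P<model>[A-Za-z0-9]+)")


def parse_gpu_spec(spec: str) -> tuple[int, str]:
    """Split a spec such as '2xA100' into (count, model)."""
    match = _GPU_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid GPU spec: {spec!r} (expected e.g. '2xA100')")
    return int(match["count"]), match["model"]


print(parse_gpu_spec("2xA100"))
print(parse_gpu_spec("4xH100"))
```

Rejecting malformed input up front gives a clear error before any remote resources are requested.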

hud/cli/rl/init.py ADDED

@@ -0,0 +1,370 @@
+"""Initialize RL configuration from environment analysis."""
+from __future__ import annotations
+import asyncio
+from pathlib import Path
+from typing import Any
+import typer
+import yaml
+from hud.clients import MCPClient
+from hud.utils.design import HUDDesign
+design = HUDDesign()
+def init_command_wrapper(directory: str, output: Path | None, force: bool, build: bool) -> None:
+    """Wrapper to handle interactive prompts before entering async context."""
+    design.header("RL Config Generator", icon="🔧")
+    # Determine if this is a directory or Docker image
+    path = Path(directory)
+    is_directory = path.exists() and path.is_dir()
+    if is_directory:
+        # Working with a directory - check for lock file
+        lock_path = path / "hud.lock.yaml"
+        if not lock_path.exists():
+            if build:
+                # Auto-build was requested
+                design.info("Building environment...")
+                from hud.cli.build import build_command
+                build_command(str(directory), None, False, False, {})
+                # After build, lock file should exist
+            else:
+                # Try to get image from pyproject.toml or auto-generate
+                from hud.cli.utils.environment import get_image_name, image_exists
+                image, source = get_image_name(directory)
+                if not (source == "cache" and image_exists(image)):
+                    design.warning(f"No hud.lock.yaml found in {directory}")
+                    # Need to handle interactive prompt here, before async
+                    action = design.select(
+                        "No lock file found. Would you like to:",
+                        ["Build the environment", "Use Docker image directly", "Cancel"],
+                    )
+                    if action == "Build the environment":
+                        design.info("Building environment...")
+                        from hud.cli.build import build_command
+                        build_command(str(directory), None, False, False, {})
+                        # After build, lock file should exist
+                    elif action == "Use Docker image directly":
+                        # Prompt for image name
+                        image = typer.prompt("Enter Docker image name")
+                        directory = image  # Override to use as Docker image
+                        is_directory = False  # Treat as image, not directory
+                    else:
+                        raise typer.Exit(1)
+    # Now run the async command with resolved parameters
+    asyncio.run(init_command(directory, output, force, False))
+async def init_command(directory: str, output: Path | None, force: bool, build: bool) -> None:
+    """Generate hud-vf-gym config from environment."""
+    # Determine if this is a directory or Docker image
+    path = Path(directory)
+    is_directory = path.exists() and path.is_dir()
+    if is_directory:
+        # Working with a directory - look for lock file
+        lock_path = path / "hud.lock.yaml"
+        if lock_path.exists():
+            design.info(f"Found lock file: {lock_path}")
+            lock_data = read_lock_file_path(lock_path)
+            if not lock_data:
+                design.error("Failed to read lock file")
+                raise typer.Exit(1)
+            # Get image and tools from lock file
+            image = lock_data.get("image", "")
+            tools = lock_data.get("tools", [])
+            if not image:
+                design.error("No image found in lock file")
+                design.hint("Run 'hud build' to create a proper lock file")
+                raise typer.Exit(1)
+            if not tools:
+                design.error("No tools found in lock file")
+                design.hint("Lock file may be outdated. Run 'hud build' to regenerate")
+                raise typer.Exit(1)
+            # Use lock file data to generate config
+            await generate_from_lock(image, tools, output, force)
+        else:
+            # No lock file - try to use cached image
+            # Build should have been handled in the wrapper
+            from hud.cli.utils.environment import get_image_name, image_exists
+            image, source = get_image_name(directory)
+            if source == "cache" and image_exists(image):
+                # Found cached image in pyproject.toml
+                design.info(f"Using cached image: {image}")
+                await analyze_and_generate(image, output, force)
+            else:
+                # This should have been handled in the wrapper
+                design.error("No valid image or lock file found")
+                raise typer.Exit(1)
+    else:
+        # Working with a Docker image directly
+        image = directory
+        await analyze_and_generate(image, output, force)
+def read_lock_file_path(lock_path: Path) -> dict[str, Any]:
+    """Read lock file from specific path."""
+    try:
+        with open(lock_path) as f:
+            return yaml.safe_load(f) or {}
+    except Exception as e:
+        design.error(f"Failed to read lock file: {e}")
+        return {}
+async def generate_from_lock(
+    image: str, tools: list[dict], output: Path | None, force: bool
+) -> None:
+    """Generate config from lock file data."""
+    # Determine output path
+    if output is None:
+        # Default to configs/{image_name}.yaml
+        image_name = image.split("/")[-1].split(":")[0]
+        if "/" in image_name:
+            image_name = image_name.split("/")[-1]
+        output = Path("configs") / f"{image_name}.yaml"
+    # Check if file exists
+    if output.exists() and not force:
+        design.error(f"Config file already exists: {output}")
+        design.info("Use --force to overwrite")
+        raise typer.Exit(1)
+    # Create output directory if needed
+    output.parent.mkdir(parents=True, exist_ok=True)
+    # Convert lock file tool format to full tool format
+    # Lock file may have full or simplified format
+    full_tools = []
+    for tool in tools:
+        full_tool = {
+            "name": tool["name"],
+            "description": tool.get("description", ""),
+        }
+        # Check if lock file has inputSchema (newer format)
+        if "inputSchema" in tool:
+            full_tool["inputSchema"] = tool["inputSchema"]
+        else:
+            # Old lock file format without schema
+            full_tool["inputSchema"] = {"type": "object", "properties": {}, "required": []}
+        full_tools.append(full_tool)
+    # Generate config
+    config = await generate_config(image, full_tools)
+    # Write to file
+    with open(output, "w") as f:  # noqa: ASYNC230
+        yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+    design.success(f"Generated config: {output}")
+    # Show summary
+    design.section_title("📋 Generated Configuration")
+    design.info("Source: hud.lock.yaml")
+    design.info(f"Image: {image}")
+    design.info(f"System prompt: {len(config['system_prompt'])} characters")
+    design.info(f"Action mappings: {len(config['action_mappings'])} tools")
+    design.info("")
+    design.info("Next steps:")
+    design.command_example("hud hf tasks.json --name my-tasks", "Create dataset")
+    design.command_example(f"hud rl --config {output}", "Start training")
+async def analyze_and_generate(image: str, output: Path | None, force: bool) -> None:
+    """Analyze Docker image and generate config."""
+    # Determine output path
+    if output is None:
+        # Default to configs/{image_name}.yaml
+        image_name = image.split("/")[-1].split(":")[0]
+        output = Path("configs") / f"{image_name}.yaml"
+    # Check if file exists
+    if output.exists() and not force:
+        design.error(f"Config file already exists: {output}")
+        design.info("Use --force to overwrite")
+        raise typer.Exit(1)
+    # Create output directory if needed
+    output.parent.mkdir(parents=True, exist_ok=True)
+    design.info(f"Analyzing environment: {image}")
+    # Analyze the environment
+    try:
+        # Create MCP config for Docker
+        mcp_config = {"local": {"command": "docker", "args": ["run", "--rm", "-i", image]}}
+        # Initialize client and analyze
+        client = MCPClient(mcp_config=mcp_config, auto_trace=False)
+        await client.initialize()
+        try:
+            analysis = await client.analyze_environment()
+            tools = analysis.get("tools", [])
+            # Generate config
+            config = await generate_config(image, tools)
+            # Write to file
+            with open(output, "w") as f:  # noqa: ASYNC230
+                yaml.dump(config, f, default_flow_style=False, sort_keys=False)
+            design.success(f"Generated config: {output}")
+            # Show summary
+            design.section_title("📋 Generated Configuration")
+            design.info(f"System prompt: {len(config['system_prompt'])} characters")
+            design.info(f"Action mappings: {len(config['action_mappings'])} tools")
+            design.info("")
+            design.info("Next steps:")
+            design.command_example("hud hf tasks.json --name my-tasks", "Create dataset")
+            design.command_example(f"hud rl --config {output}", "Start training")
+        finally:
+            await client.shutdown()
+    except Exception as e:
+        design.error(f"Failed to analyze environment: {e}")
+        design.hint("Make sure the Docker image exists and contains a valid MCP server")
+        raise typer.Exit(1) from e
+async def generate_config(image: str, tools: list[dict[str, Any]]) -> dict[str, Any]:
+    """Generate hud-vf-gym configuration from tool analysis."""
+    # Clean up image name for display
+    display_name = image.split("@")[0] if "@" in image else image  # Remove SHA hash
+    env_name = display_name.split("/")[-1].split(":")[0]  # Extract just the env name
+    # Filter out setup/evaluate tools
+    interaction_tools = [t for t in tools if t["name"] not in ["setup", "evaluate"]]
+    # Generate system prompt
+    tool_descriptions = []
+    for tool in interaction_tools:
+        # Check if we have schema (from direct analysis) or just name/description (from lock file)
+        has_schema = "inputSchema" in tool and tool["inputSchema"].get("properties")
+        if has_schema:
+            params = tool.get("inputSchema", {}).get("properties", {})
+            required = tool.get("inputSchema", {}).get("required", [])
+            # Build parameter string
+            param_parts = []
+            for name, schema in params.items():
+                param_type = schema.get("type", "any")
+                if name in required:
+                    param_parts.append(f"{name}: {param_type}")
+                else:
+                    param_parts.append(f"{name}?: {param_type}")
+            param_str = ", ".join(param_parts) if param_parts else ""
+        else:
+            # No schema information
+            param_str = "..."
+        desc = tool.get("description", "No description")
+        tool_descriptions.append(
+            f"- {tool['name']}({param_str}): {desc}\n  Usage: <tool>{tool['name']}(...)</tool>"
+        )
+    # Add note if any tools are missing schema info
+    if interaction_tools and any("inputSchema" not in t for t in interaction_tools):
+        tool_descriptions.append(
+            "\nNote: Some tools are missing parameter information. Update manually if needed."
+        )
+    system_prompt = f"""You are an AI agent in a HUD environment.
+You have access to the following tools:
+{chr(10).join(tool_descriptions)}
+Always use the exact XML format shown above for tool calls.
+Think step by step about what you need to do."""
+    # Generate action mappings
+    action_mappings = {}
+    for tool in interaction_tools:
+        # Check if we have inputSchema information
+        has_input_schema = "inputSchema" in tool
+        if has_input_schema:
+            # We have schema info (even if no parameters)
+            params = tool.get("inputSchema", {}).get("properties", {})
+            required = tool.get("inputSchema", {}).get("required", [])
+            # Simple 1:1 mapping by default
+            mapping = {
+                "_tool": tool["name"],
+                "_parser": {
+                    "positional": list(required)  # Use required params as positional
+                },
+            }
+            # Add parameter mappings (only if there are params)
+            for param_name in params:
+                mapping[param_name] = {"from_arg": param_name}
+        else:
+            # No schema information at all
+            mapping = {
+                "_tool": tool["name"],
+                "_parser": {
+                    "positional": []  # No positional args without schema
+                },
+                "# TODO": "Update with actual parameters",
+            }
+        action_mappings[tool["name"]] = mapping
+    # Add special "done" action if not present
+    if "done" not in action_mappings:
+        action_mappings["done"] = {
+            "_tool": None,  # Special marker for task completion
+            "_parser": {"positional": []},
+        }
+    # Build full config
+    config = {
+        "# Generated by hud rl init": f"for {env_name}",
+        "job": {
+            "name": f"RL Training - {env_name}",
+            "metadata": {
+                "environment": display_name,
+                "full_image": image,
+                "generated_by": "hud rl init",
+            },
+        },
+        "system_prompt": system_prompt,
+        "parser": {"use_thinking": True, "xml_weight": 0.6, "action_weight": 0.4},
+        "action_mappings": action_mappings,
+        "rubric": {
+            "weights": {"task_completion": 0.8, "tool_execution": 0.1, "format_compliance": 0.1}
+        },
+        "defaults": {"max_turns": 100},
+    }
+    return config
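To make the behavior of `generate_config` concrete, the sketch below restates two of its steps in condensed form: building the parameter string for the system prompt, and building the default action mapping. The helper names `describe_params` and `build_mapping` are hypothetical, as is the example `move` tool; the logic is intended to mirror the added code above, not to replace it.

```python
from __future__ import annotations

from typing import Any


def describe_params(tool: dict[str, Any]) -> str:
    """Render 'name: type' for required and 'name?: type' for optional params."""
    schema = tool.get("inputSchema", {})
    params = schema.get("properties", {})
    if not params:
        # Mirrors the source: no usable schema yields a "..." placeholder.
        return "..."
    required = schema.get("required", [])
    parts = []
    for name, spec in params.items():
        param_type = spec.get("type", "any")
        suffix = "" if name in required else "?"
        parts.append(f"{name}{suffix}: {param_type}")
    return ", ".join(parts)


def build_mapping(tool: dict[str, Any]) -> dict[str, Any]:
    """Build a 1:1 action mapping with required params as positional args."""
    if "inputSchema" not in tool:
        # Mirrors the source: no schema means no positional args and a TODO marker.
        return {
            "_tool": tool["name"],
            "_parser": {"positional": []},
            "# TODO": "Update with actual parameters",
        }
    schema = tool["inputSchema"]
    mapping: dict[str, Any] = {
        "_tool": tool["name"],
        "_parser": {"positional": list(schema.get("required", []))},
    }
    for param_name in schema.get("properties", {}):
        mapping[param_name] = {"from_arg": param_name}
    return mapping


# Hypothetical tool definition, used only for illustration.
move_tool = {
    "name": "move",
    "description": "Move the tiles",
    "inputSchema": {
        "type": "object",
        "properties": {
            "direction": {"type": "string"},
            "steps": {"type": "integer"},
        },
        "required": ["direction"],
    },
}

print(describe_params(move_tool))
print(build_mapping(move_tool))
```

For this input the parameter string is `direction: string, steps?: integer`, and only `direction` appears in the positional list because it is the only required parameter.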
