PyPI - hud-python - Versions diffs - 0.5.33__tar.gz → 0.5.34__tar.gz - Mend

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (344)

{hud_python-0.5.33 → hud_python-0.5.34}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.5.33
+Version: 0.5.34
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -101,7 +101,7 @@ Description-Content-Type: text/markdown
   </picture>
 </div>
-The HUD SDK is an open-source Python toolkit for building, evaluating, and training AI agents. Use a unified API for any model provider, wrap your code as MCP environments, run A/B evals at scale, and train with reinforcement learning.
+HUD is a platform for building RL environments for AI agents. Define agent-callable tools, write evaluation scenarios, run evals at scale, and train models on the results.
 To learn more, check out our [Documentation](https://docs.hud.ai) and [API Reference](https://docs.hud.ai/reference).
@@ -110,15 +110,14 @@ To learn more, check out our [Documentation](https://docs.hud.ai) and [API Refer
 [![Add docs to Cursor](https://img.shields.io/badge/Add%20docs%20to-Cursor-black?style=flat-square)](https://cursor.com/en/install-mcp?name=docs-hud-python&config=eyJ1cmwiOiJodHRwczovL2RvY3MuaHVkLmFpL21jcCJ9)
 [![Discord](https://img.shields.io/discord/1327447144772407390?label=Discord&logo=discord&style=flat-square)](https://discord.gg/wkjtmHYYjm)
 [![X Follow](https://img.shields.io/twitter/follow/hud_evals?style=social)](https://x.com/intent/user?screen_name=hud_evals)
-[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.ai)
 [![Scarf](https://static.scarf.sh/a.png?x-pxid=6530ff33-4945-452b-81f9-626872593933)](https://scarf.sh)
 [![Docs](https://img.shields.io/badge/docs-hud.ai-blue?style=flat-square)](https://docs.hud.ai)
 ## Install
 ```bash
-pip install hud-python
-```
+# Install CLI (recommended)
+uv tool install hud-python --python 3.12
 Get your API key at [hud.ai](https://hud.ai) and set it:
@@ -126,65 +125,88 @@ Get your API key at [hud.ai](https://hud.ai) and set it:
 export HUD_API_KEY=your-key-here
 ```
-> For CLI tools (`hud init`, `hud dev`, etc.): `uv tool install hud-python --python 3.12`
+Get your API key at [hud.ai/project/api-keys](https://hud.ai/project/api-keys).
+> Or install as a library: `pip install hud-python`
 ![Agent running on SheetBench](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-## Usage
+## Environments
-### Unified Model API
+An environment is the harness an agent operates in. It packages tools (functions agents can call) and scenarios (how agents are evaluated) into a single deployable unit. Each environment spins up fresh and isolated for every evaluation.
-Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:
+```python
+from hud import Environment
+env = Environment("my-env")
+@env.scenario("count")
+async def count(word: str, letter: str):
+    # PROMPT — send a question to the agent.
+    # The agent runs its reasoning loop and returns an answer.
+    answer = yield f"How many '{letter}' in '{word}'?"
+    # SCORE — check the agent's answer against the correct count.
+    # Return a reward: 1.0 for correct, 0.0 for wrong.
+    correct = str(word.lower().count(letter.lower()))
+    yield 1.0 if answer and correct in answer else 0.0
+```
+A scenario has two yields. The first sends a prompt — the agent runs between the yields, calling tools and reasoning. The second checks the result and returns a reward (0.0 to 1.0). → [Core Concepts](https://docs.hud.ai/concepts)
+## Run an Agent
 ```python
-from openai import AsyncOpenAI
-import os
+import hud
+from hud.agents import create_agent
-client = AsyncOpenAI(
-    base_url="https://inference.hud.ai",
-    api_key=os.environ["HUD_API_KEY"]
-)
+task = env("count", word="strawberry", letter="r")
+agent = create_agent("claude-sonnet-4-5")
-response = await client.chat.completions.create(
-    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
-    messages=[{"role": "user", "content": "Hello!"}]
-)
+async with hud.eval(task) as ctx:
+    result = await agent.run(ctx)
+print(f"Reward: {result.reward}")  # 1.0 if agent answers "3"
 ```
-Every call is traced at [hud.ai](https://hud.ai). → [Docs](https://docs.hud.ai/quick-links/models)
+`create_agent()` picks the right agent class and native tools for each model. → [Environments](https://docs.hud.ai/quick-links/environments)
-### Environments
+## Workflow
-Turn your code into tools agents can call. Define how to evaluate them:
+```bash
+hud init my-env          # Scaffold environment
+cd my-env
+hud dev env:env -w env.py    # Run locally with hot-reload
+hud eval tasks.py claude     # Run evals locally
+hud deploy                   # Deploy to platform
+hud sync tasks my-taskset    # Sync tasks to platform
+```
-```python
-from hud import Environment
+Once deployed, run evals at scale from the CLI or the [platform UI](https://hud.ai):
-env = Environment("my-env")
+```bash
+hud eval my-taskset claude --remote --full
+```
-@env.tool()
-def add(a: int, b: int) -> int:
-    """Add two numbers."""
-    return a + b
+→ [Deploy](https://docs.hud.ai/quick-links/deploy) · [Testing & Evaluation](https://docs.hud.ai/advanced/testing-environments)
-@env.scenario("solve-math")
-async def solve_math(problem: str, answer: int):
-    response = yield problem                    # Prompt
-    yield 1.0 if str(answer) in response else 0.0  # Reward
+## Pre-built Tools
-async with env("solve-math", problem="What is 2+2?", answer=4) as ctx:
-    # Your agent logic here - call tools, get response
-    result = await ctx.call_tool("add", a=2, b=2)
-    await ctx.submit(f"The answer is {result}")
+HUD ships tools for computer control, shell execution, file editing, browser automation, and web search. Add them to any environment:
-print(ctx.reward)  # 1.0
+```python
+from hud.tools import AnthropicComputerTool, BashTool, EditTool
+env.add_tool(AnthropicComputerTool())  # Mouse, keyboard, screenshots
+env.add_tool(BashTool())               # Persistent bash shell
+env.add_tool(EditTool())               # File viewing and editing
 ```
-The agent runs between the yields. First yield sends the prompt, second yield scores the result. → [Docs](https://docs.hud.ai/quick-links/environments) · [Templates](https://hud.ai/environments)
+HUD adapts each tool to the model's native format — Claude gets `computer_20250124`, OpenAI gets `computer_use_preview`, Gemini gets `ComputerUse`. → [Tools Reference](https://docs.hud.ai/tools/computer)
-### A/B Evals
+## Model Gateway
-Test different models. Repeat runs to see the distribution:
+Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:
 ```python
 from openai import AsyncOpenAI
@@ -195,31 +217,13 @@ client = AsyncOpenAI(
     api_key=os.environ["HUD_API_KEY"]
 )
-# Using the env from above
-async with env("solve-math", problem="What is 2+2?", answer=4, variants={"model": ["gpt-4o", "claude-sonnet-4-5"]}, group=5) as ctx:
-    response = await client.chat.completions.create(
-        model=ctx.variants["model"],
-        messages=[{"role": "user", "content": ctx.prompt}],
-        tools=ctx.tools  # Environment tools available to the model
-    )
-    await ctx.submit(response.choices[0].message.content)
-```
-**Variants** test configurations. **Groups** repeat for distribution. Results stream to [hud.ai](https://hud.ai). → [Docs](https://docs.hud.ai/quick-links/evals)
-### Deploy & Train
-Push to GitHub, connect on hud.ai, run at scale:
-```bash
-hud init                  # Scaffold environment
-git push                  # Push to GitHub
-# Connect on hud.ai → New → Environment
-hud eval my-eval --model gpt-4o --group-size 100
-# Or create and run tasks on the platform
+response = await client.chat.completions.create(
+    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
+    messages=[{"role": "user", "content": "Hello!"}]
+)
 ```
-Every run generates training data. Use it to fine-tune or run RL. → [Docs](https://docs.hud.ai/quick-links/deploy)
+Every call is traced at [hud.ai](https://hud.ai). → [Models](https://docs.hud.ai/quick-links/models)
 ## Links

{hud_python-0.5.33 → hud_python-0.5.34}/README.md RENAMED

@@ -6,7 +6,7 @@
   </picture>
 </div>
-The HUD SDK is an open-source Python toolkit for building, evaluating, and training AI agents. Use a unified API for any model provider, wrap your code as MCP environments, run A/B evals at scale, and train with reinforcement learning.
+HUD is a platform for building RL environments for AI agents. Define agent-callable tools, write evaluation scenarios, run evals at scale, and train models on the results.
 To learn more, check out our [Documentation](https://docs.hud.ai) and [API Reference](https://docs.hud.ai/reference).
@@ -15,15 +15,14 @@ To learn more, check out our [Documentation](https://docs.hud.ai) and [API Refer
 [![Add docs to Cursor](https://img.shields.io/badge/Add%20docs%20to-Cursor-black?style=flat-square)](https://cursor.com/en/install-mcp?name=docs-hud-python&config=eyJ1cmwiOiJodHRwczovL2RvY3MuaHVkLmFpL21jcCJ9)
 [![Discord](https://img.shields.io/discord/1327447144772407390?label=Discord&logo=discord&style=flat-square)](https://discord.gg/wkjtmHYYjm)
 [![X Follow](https://img.shields.io/twitter/follow/hud_evals?style=social)](https://x.com/intent/user?screen_name=hud_evals)
-[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.ai)
 [![Scarf](https://static.scarf.sh/a.png?x-pxid=6530ff33-4945-452b-81f9-626872593933)](https://scarf.sh)
 [![Docs](https://img.shields.io/badge/docs-hud.ai-blue?style=flat-square)](https://docs.hud.ai)
 ## Install
 ```bash
-pip install hud-python
-```
+# Install CLI (recommended)
+uv tool install hud-python --python 3.12
 Get your API key at [hud.ai](https://hud.ai) and set it:
@@ -31,65 +30,88 @@ Get your API key at [hud.ai](https://hud.ai) and set it:
 export HUD_API_KEY=your-key-here
 ```
-> For CLI tools (`hud init`, `hud dev`, etc.): `uv tool install hud-python --python 3.12`
+Get your API key at [hud.ai/project/api-keys](https://hud.ai/project/api-keys).
+> Or install as a library: `pip install hud-python`
 ![Agent running on SheetBench](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-## Usage
+## Environments
-### Unified Model API
+An environment is the harness an agent operates in. It packages tools (functions agents can call) and scenarios (how agents are evaluated) into a single deployable unit. Each environment spins up fresh and isolated for every evaluation.
-Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:
+```python
+from hud import Environment
+env = Environment("my-env")
+@env.scenario("count")
+async def count(word: str, letter: str):
+    # PROMPT — send a question to the agent.
+    # The agent runs its reasoning loop and returns an answer.
+    answer = yield f"How many '{letter}' in '{word}'?"
+    # SCORE — check the agent's answer against the correct count.
+    # Return a reward: 1.0 for correct, 0.0 for wrong.
+    correct = str(word.lower().count(letter.lower()))
+    yield 1.0 if answer and correct in answer else 0.0
+```
+A scenario has two yields. The first sends a prompt — the agent runs between the yields, calling tools and reasoning. The second checks the result and returns a reward (0.0 to 1.0). → [Core Concepts](https://docs.hud.ai/concepts)
+## Run an Agent
 ```python
-from openai import AsyncOpenAI
-import os
+import hud
+from hud.agents import create_agent
-client = AsyncOpenAI(
-    base_url="https://inference.hud.ai",
-    api_key=os.environ["HUD_API_KEY"]
-)
+task = env("count", word="strawberry", letter="r")
+agent = create_agent("claude-sonnet-4-5")
-response = await client.chat.completions.create(
-    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
-    messages=[{"role": "user", "content": "Hello!"}]
-)
+async with hud.eval(task) as ctx:
+    result = await agent.run(ctx)
+print(f"Reward: {result.reward}")  # 1.0 if agent answers "3"
 ```
-Every call is traced at [hud.ai](https://hud.ai). → [Docs](https://docs.hud.ai/quick-links/models)
+`create_agent()` picks the right agent class and native tools for each model. → [Environments](https://docs.hud.ai/quick-links/environments)
-### Environments
+## Workflow
-Turn your code into tools agents can call. Define how to evaluate them:
+```bash
+hud init my-env          # Scaffold environment
+cd my-env
+hud dev env:env -w env.py    # Run locally with hot-reload
+hud eval tasks.py claude     # Run evals locally
+hud deploy                   # Deploy to platform
+hud sync tasks my-taskset    # Sync tasks to platform
+```
-```python
-from hud import Environment
+Once deployed, run evals at scale from the CLI or the [platform UI](https://hud.ai):
-env = Environment("my-env")
+```bash
+hud eval my-taskset claude --remote --full
+```
-@env.tool()
-def add(a: int, b: int) -> int:
-    """Add two numbers."""
-    return a + b
+→ [Deploy](https://docs.hud.ai/quick-links/deploy) · [Testing & Evaluation](https://docs.hud.ai/advanced/testing-environments)
-@env.scenario("solve-math")
-async def solve_math(problem: str, answer: int):
-    response = yield problem                    # Prompt
-    yield 1.0 if str(answer) in response else 0.0  # Reward
+## Pre-built Tools
-async with env("solve-math", problem="What is 2+2?", answer=4) as ctx:
-    # Your agent logic here - call tools, get response
-    result = await ctx.call_tool("add", a=2, b=2)
-    await ctx.submit(f"The answer is {result}")
+HUD ships tools for computer control, shell execution, file editing, browser automation, and web search. Add them to any environment:
-print(ctx.reward)  # 1.0
+```python
+from hud.tools import AnthropicComputerTool, BashTool, EditTool
+env.add_tool(AnthropicComputerTool())  # Mouse, keyboard, screenshots
+env.add_tool(BashTool())               # Persistent bash shell
+env.add_tool(EditTool())               # File viewing and editing
 ```
-The agent runs between the yields. First yield sends the prompt, second yield scores the result. → [Docs](https://docs.hud.ai/quick-links/environments) · [Templates](https://hud.ai/environments)
+HUD adapts each tool to the model's native format — Claude gets `computer_20250124`, OpenAI gets `computer_use_preview`, Gemini gets `ComputerUse`. → [Tools Reference](https://docs.hud.ai/tools/computer)
-### A/B Evals
+## Model Gateway
-Test different models. Repeat runs to see the distribution:
+Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:
 ```python
 from openai import AsyncOpenAI
@@ -100,31 +122,13 @@ client = AsyncOpenAI(
     api_key=os.environ["HUD_API_KEY"]
 )
-# Using the env from above
-async with env("solve-math", problem="What is 2+2?", answer=4, variants={"model": ["gpt-4o", "claude-sonnet-4-5"]}, group=5) as ctx:
-    response = await client.chat.completions.create(
-        model=ctx.variants["model"],
-        messages=[{"role": "user", "content": ctx.prompt}],
-        tools=ctx.tools  # Environment tools available to the model
-    )
-    await ctx.submit(response.choices[0].message.content)
-```
-**Variants** test configurations. **Groups** repeat for distribution. Results stream to [hud.ai](https://hud.ai). → [Docs](https://docs.hud.ai/quick-links/evals)
-### Deploy & Train
-Push to GitHub, connect on hud.ai, run at scale:
-```bash
-hud init                  # Scaffold environment
-git push                  # Push to GitHub
-# Connect on hud.ai → New → Environment
-hud eval my-eval --model gpt-4o --group-size 100
-# Or create and run tasks on the platform
+response = await client.chat.completions.create(
+    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
+    messages=[{"role": "user", "content": "Hello!"}]
+)
 ```
-Every run generates training data. Use it to fine-tune or run RL. → [Docs](https://docs.hud.ai/quick-links/deploy)
+Every call is traced at [hud.ai](https://hud.ai). → [Models](https://docs.hud.ai/quick-links/models)
 ## Links
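
Note on the README hunks above: the new `count` scenario relies on a two-yield async generator, where the first yield hands out a prompt and the second yields a reward. A minimal sketch of how such a generator can be driven, using only the standard library, is shown below. The `run_scenario` driver and `fake_agent` are hypothetical names for illustration; this is not HUD's implementation, and the `@env.scenario` decorator is deliberately left out.

```python
import asyncio


async def count(word: str, letter: str):
    # First yield: hand the prompt to the driver and wait for the answer.
    answer = yield f"How many '{letter}' in '{word}'?"
    # Second yield: score the answer (1.0 if the correct count appears in it).
    correct = str(word.lower().count(letter.lower()))
    yield 1.0 if answer and correct in answer else 0.0


async def run_scenario(scenario, answer_fn):
    """Hypothetical driver: read the prompt, send the answer back, read the reward."""
    prompt = await scenario.__anext__()    # run up to the first yield
    answer = await answer_fn(prompt)       # the agent would run here
    reward = await scenario.asend(answer)  # resume and run up to the second yield
    await scenario.aclose()                # finish the generator cleanly
    return prompt, reward


async def fake_agent(prompt: str) -> str:
    # Stand-in for a real agent; always gives the same reply.
    return "There are 3."


prompt, reward = asyncio.run(run_scenario(count("strawberry", "r"), fake_agent))
print(prompt, reward)
```

Because the README's check is a substring test (`correct in answer`), a reply such as "13" would also score 1.0 when the correct count is 3; a stricter scenario might parse the number first.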

{hud_python-0.5.33 → hud_python-0.5.34}/hud/agents/openai.py RENAMED

@@ -485,10 +485,9 @@ class OpenAIAgent(MCPAgent):
                         type="computer_screenshot",
                         image_url=f"data:image/png;base64,{screenshot}",
                     ),
-                    acknowledged_safety_checks=(
-                        acknowledged_checks if acknowledged_checks else None
-                    ),
                 )
+                if acknowledged_checks:
+                    output_payload["acknowledged_safety_checks"] = acknowledged_checks
                 computer_outputs.append(output_payload)
                 self.pending_call_id = None
                 self.pending_safety_checks = []
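
Note on the hunk above: the change stops passing `acknowledged_safety_checks=None` and instead sets the key only when there are checks to acknowledge, so an empty list no longer produces an explicit null field in the payload. A minimal sketch of the same pattern with a plain dict follows. The `build_output` helper and the `type`/`call_id` field values are assumptions for illustration; only the conditional key assignment mirrors the diff.

```python
def build_output(call_id: str, screenshot_b64: str, acknowledged_checks: list) -> dict:
    """Hypothetical helper: build a computer-call output payload as a plain dict."""
    payload = {
        "type": "computer_call_output",  # assumed field value, not shown in the hunk
        "call_id": call_id,              # assumed field name, not shown in the hunk
        "output": {
            "type": "computer_screenshot",
            "image_url": f"data:image/png;base64,{screenshot_b64}",
        },
    }
    # Add the key only when there is something to acknowledge,
    # rather than sending it with a None value.
    if acknowledged_checks:
        payload["acknowledged_safety_checks"] = acknowledged_checks
    return payload


without_checks = build_output("call_1", "AAAA", [])
with_checks = build_output("call_2", "AAAA", [{"id": "sc_1"}])
print("acknowledged_safety_checks" in without_checks)
print(with_checks["acknowledged_safety_checks"])
```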

{hud_python-0.5.33 → hud_python-0.5.34}/hud/cli/__init__.py RENAMED

@@ -11,7 +11,7 @@ from rich.panel import Panel
 # Create the main Typer app
 app = typer.Typer(
     name="hud",
-    help="🚀 HUD CLI - build, test, and deploy RL environments",
+    help="HUD CLI - build, test, and deploy evaluation environments",
     add_completion=False,
     rich_markup_mode="rich",
     pretty_exceptions_enable=False,
@@ -40,8 +40,8 @@ from .init import init_command  # noqa: E402
 from .link import link_command  # noqa: E402
 from .models import models_command  # noqa: E402
 from .push import push_command  # noqa: E402
-from .rft import rft_run_command  # noqa: E402
-from .rft_status import rft_status_typer_command  # noqa: E402
+from .scenario import scenario_app  # noqa: E402
+from .sync import sync_app  # noqa: E402
 _EXTRA_ARGS = {"allow_extra_args": True, "ignore_unknown_options": True}
@@ -50,7 +50,7 @@ app.command(name="debug", context_settings=_EXTRA_ARGS)(debug_command)
 app.command(name="dev", context_settings=_EXTRA_ARGS)(dev_command)
 app.command(name="build", context_settings=_EXTRA_ARGS)(build_command)
 app.command(name="deploy")(deploy_command)
-app.command(name="link")(link_command)
+app.command(name="link", hidden=True)(link_command)
 app.command(name="eval")(eval_command)
 app.command(name="push", hidden=True)(push_command)
 app.command(name="init")(init_command)
@@ -108,11 +108,11 @@ def version() -> None:
         console.print("HUD CLI version: [cyan]unknown[/cyan]")
-# RFT subcommand group
-rft_app = typer.Typer(help="🚀 Reinforcement Fine-Tuning (RFT) commands")
-rft_app.command("run")(rft_run_command)
-rft_app.command("status")(rft_status_typer_command)
-app.add_typer(rft_app, name="rft")
+# Scenario subcommand group
+app.add_typer(scenario_app, name="scenario")
+# Sync subcommand group
+app.add_typer(sync_app, name="sync")
 # ---------------------------------------------------------------------------
@@ -140,7 +140,7 @@ def main() -> None:
         if len(sys.argv) == 1 or (len(sys.argv) == 2 and sys.argv[1] in ["--help", "-h"]):
             console.print(
                 Panel.fit(
-                    "[bold cyan]🚀 HUD CLI[/bold cyan]\nBuild, test, and deploy RL environments",
+                    "[bold cyan]HUD CLI[/bold cyan]\nBuild, test, and deploy environments",
                     border_style="cyan",
                 )
             )
@@ -150,10 +150,8 @@ def main() -> None:
             )
             console.print("  2. Start dev server:        [cyan]hud dev[/cyan]")
             console.print("  3. Deploy to HUD platform:  [cyan]hud deploy[/cyan]")
-            console.print("  4. Run evaluations:         [cyan]hud eval tasks.jsonl[/cyan]")
-            console.print("\n[yellow]Training:[/yellow]")
-            console.print("  [cyan]hud rft run tasks.jsonl[/cyan]      Launch an RFT training job")
-            console.print("  [cyan]hud rft status <model-id>[/cyan]  Check training status\n")
+            console.print("  4. Sync tasks:              [cyan]hud sync tasks my-taskset[/cyan]")
+            console.print("  5. Run evaluations:         [cyan]hud eval tasks.py claude[/cyan]\n")
         app()
     except typer.Exit as e:
