PyPI - hud-python - Versions diffs - 0.4.60__tar.gz → 0.4.62__tar.gz - Mend

hud-python 0.4.60 tar.gz → 0.4.62 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (316)

{hud_python-0.4.60 → hud_python-0.4.62}/PKG-INFO RENAMED

@@ -1,11 +1,11 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.60
+Version: 0.4.62
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
-Project-URL: Documentation, https://docs.hud.so
-Author-email: HUD SDK <founders@hud.so>
+Project-URL: Documentation, https://docs.hud.ai
+Author-email: HUD <founders@hud.ai>
 License: MIT License
         Copyright (c) 2025 Human Union Data, Inc
@@ -59,6 +59,7 @@ Requires-Dist: pydantic<3,>=2.6
 Requires-Dist: questionary==2.1.0
 Requires-Dist: rich>=13.0.0
 Requires-Dist: toml>=0.10.2
+Requires-Dist: tornado>=6.5.2
 Requires-Dist: typer>=0.9.0
 Requires-Dist: watchfiles>=0.21.0
 Requires-Dist: wrapt>=1.14.0
@@ -153,21 +154,21 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 [![Add docs to Cursor](https://img.shields.io/badge/Add%20docs%20to-Cursor-black?style=flat-square)](https://cursor.com/en/install-mcp?name=docs-hud-python&config=eyJ1cmwiOiJodHRwczovL2RvY3MuaHVkLnNvL21jcCJ9)
 [![Discord](https://img.shields.io/discord/1327447144772407390?label=Discord&logo=discord&style=flat-square)](https://discord.gg/wkjtmHYYjm)
 [![X Follow](https://img.shields.io/twitter/follow/hud_evals?style=social)](https://x.com/intent/user?screen_name=hud_evals)
-[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.so)
+[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.ai)
 ### Are you a startup building agents?
-[📅 Hop on a call](https://cal.com/jay-ram-z6st6w/demo) or [📧 founders@hud.so](mailto:founders@hud.so)
+[📅 Hop on a call](https://cal.com/jay-ram-z6st6w/demo) or [📧 founders@hud.ai](mailto:founders@hud.ai)
 ## Highlights
-- 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- 🚀 **[MCP environment skeleton](https://docs.hud.ai/core-concepts/mcp-protocol)** – any agent can call any environment.
+- ⚡️ **[Live telemetry](https://hud.ai)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.ai/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
-- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
+- 🎓 **[One-click RL](https://hud.ai/models)** – Run `hud rl` to get a trained model on any environment.
 > We welcome contributors and feature requests – open an issue or hop on a call to discuss improvements!
@@ -182,10 +183,10 @@ uv tool install hud-python
 # uv tool update-shell
 ```
-> See [docs.hud.so](https://docs.hud.so), or add docs to any MCP client:
-> `claude mcp add --transport http docs-hud https://docs.hud.so/mcp`
+> See [docs.hud.ai](https://docs.hud.ai), or add docs to any MCP client:
+> `claude mcp add --transport http docs-hud https://docs.hud.ai/mcp`
-Before starting, get your HUD_API_KEY at [hud.so](https://hud.so).
+Before starting, get your HUD_API_KEY at [hud.ai](https://hud.ai).
 ## Quickstart: Evals
@@ -203,17 +204,17 @@ import asyncio, hud, os
 from hud.settings import settings
 from hud.clients import MCPClient
 from hud.agents import ClaudeAgent
-from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
+from hud.datasets import Task  # See docs: https://docs.hud.ai/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.ai)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
-                    "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
+                    "url": "https://mcp.hud.ai/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.ai/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.ai
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -240,7 +241,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.ai/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -253,7 +254,7 @@ hud get hud-evals/2048-basic # from HF
 hud rl 2048-basic.json
 ```
-> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
+> See [agent training docs](https://docs.hud.ai/train-agents/quickstart)
 Or make your own environment and dataset:
@@ -264,7 +265,7 @@ hud dev --interactive
 hud rl
 ```
-> See [environment design docs](https://docs.hud.so/build-environments)
+> See [environment design docs](https://docs.hud.ai/build-environments)
 ## Benchmarking Agents
@@ -272,7 +273,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.ai_](https://hud.ai/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -290,7 +291,7 @@ from hud.agents import ClaudeAgent
 results = await run_dataset(
     name="My SheetBench-50 Evaluation",
     dataset="hud-evals/SheetBench-50",      # <-- HuggingFace dataset
-    agent_class=ClaudeAgent,                # <-- Your custom agent can replace this (see https://docs.hud.so/evaluate-agents/create-agents)
+    agent_class=ClaudeAgent,                # <-- Your custom agent can replace this (see https://docs.hud.ai/evaluate-agents/create-agents)
     agent_config={"model": "claude-sonnet-4-20250514"},
     max_concurrent=50,
     max_steps=30,
@@ -298,13 +299,13 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.ai](https://hud.ai) platform for analysis and [leaderboard submission](https://docs.hud.ai/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
 This is how you can make any environment into an interactable one in 5 steps:
-1. Define MCP server layer using [`MCPServer`](https://docs.hud.so/reference/environments)
+1. Define MCP server layer using [`MCPServer`](https://docs.hud.ai/reference/environments)
 ```python
 from hud.server import MCPServer
@@ -312,10 +313,10 @@ from hud.tools import HudComputerTool
 mcp = MCPServer("My Environment")
-# Add hud tools (see all tools: https://docs.hud.so/reference/tools)
+# Add hud tools (see all tools: https://docs.hud.ai/reference/tools)
 mcp.tool(HudComputerTool())
-# Or custom tools (see https://docs.hud.so/build-environments/adapting-software)
+# Or custom tools (see https://docs.hud.ai/build-environments/adapting-software)
 @mcp.tool("launch_app"):
 def launch_app(name: str = "Gmail")
 ...
@@ -389,16 +390,16 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
+5. Now you can use `mcp.hud.ai` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.ai](https://hud.ai):
 ```python
 from hud.agents import ClaudeAgent
-result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/reference/agents
+result = await ClaudeAgent().run({  # See all agents: https://docs.hud.ai/reference/agents
     "prompt": "Please explore this environment",
     "mcp_config": {
         "my-environment": {
-            "url": "https://mcp.hud.so/v3/mcp",
+            "url": "https://mcp.hud.ai/v3/mcp",
             "headers": {
                 "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                 "Mcp-Image": "my-name/my-environment:latest"
@@ -420,13 +421,13 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.ai/leaderboards](https://hud.ai/leaderboards) (see [docs](https://docs.hud.ai/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
 We highly suggest running 3-5 evaluations per dataset for the most consistent results across multiple jobs.
-Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
+Using the [`run_dataset`](https://docs.hud.ai/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
 ## Reinforcement Learning with GRPO
@@ -455,11 +456,11 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.ai`, and lets you monitor/manage models at `hud.ai/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
-Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.ai/train-agents/quickstart`.
-Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
+Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.ai/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.ai/project/billing).
 ## Architecture
@@ -467,8 +468,8 @@ Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quicksta
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 hud.so"]
-        API["🔌 mcp.hud.so"]
+        Dashboard["📊 hud.ai"]
+        API["🔌 mcp.hud.ai"]
     end
     subgraph "hud"
@@ -506,14 +507,14 @@ graph LR
 | Command                 | Purpose                                    | Docs |
 | ----------------------- | ------------------------------------------ | ---- |
-| [`hud init`](https://docs.hud.so/reference/cli/init)            | Create new environment with boilerplate.  | [📖](https://docs.hud.so/reference/cli/init) |
-| [`hud dev`](https://docs.hud.so/reference/cli/dev)              | Hot-reload development with Docker.        | [📖](https://docs.hud.so/reference/cli/dev) |
-| [`hud build`](https://docs.hud.so/reference/cli/build)          | Build image and generate lock file.       | [📖](https://docs.hud.so/reference/cli/build) |
-| [`hud push`](https://docs.hud.so/reference/cli/push)            | Share environment to registry.            | [📖](https://docs.hud.so/reference/cli/push) |
-| [`hud pull <target>`](https://docs.hud.so/reference/cli/pull)   | Get environment from registry.            | [📖](https://docs.hud.so/reference/cli/pull) |
-| [`hud analyze <image>`](https://docs.hud.so/reference/cli/analyze) | Discover tools, resources, and metadata.   | [📖](https://docs.hud.so/reference/cli/analyze) |
-| [`hud debug <image>`](https://docs.hud.so/reference/cli/debug)   | Five-phase health check of an environment. | [📖](https://docs.hud.so/reference/cli/debug) |
-| [`hud run <image>`](https://docs.hud.so/reference/cli/run)       | Run MCP server locally or remotely.       | [📖](https://docs.hud.so/reference/cli/run) |
+| [`hud init`](https://docs.hud.ai/reference/cli/init)            | Create new environment with boilerplate.  | [📖](https://docs.hud.ai/reference/cli/init) |
+| [`hud dev`](https://docs.hud.ai/reference/cli/dev)              | Hot-reload development with Docker.        | [📖](https://docs.hud.ai/reference/cli/dev) |
+| [`hud build`](https://docs.hud.ai/reference/cli/build)          | Build image and generate lock file.       | [📖](https://docs.hud.ai/reference/cli/build) |
+| [`hud push`](https://docs.hud.ai/reference/cli/push)            | Share environment to registry.            | [📖](https://docs.hud.ai/reference/cli/push) |
+| [`hud pull <target>`](https://docs.hud.ai/reference/cli/pull)   | Get environment from registry.            | [📖](https://docs.hud.ai/reference/cli/pull) |
+| [`hud analyze <image>`](https://docs.hud.ai/reference/cli/analyze) | Discover tools, resources, and metadata.   | [📖](https://docs.hud.ai/reference/cli/analyze) |
+| [`hud debug <image>`](https://docs.hud.ai/reference/cli/debug)   | Five-phase health check of an environment. | [📖](https://docs.hud.ai/reference/cli/debug) |
+| [`hud run <image>`](https://docs.hud.ai/reference/cli/run)       | Run MCP server locally or remotely.       | [📖](https://docs.hud.ai/reference/cli/run) |
 ## Roadmap
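
Apart from the version bump and the domain rename, the PKG-INFO hunks above add one dependency line, `Requires-Dist: tornado>=6.5.2`. A minimal sketch of how such a change could be picked out programmatically, assuming both metadata files are available as text. The `requires_dist` helper and the trimmed `OLD`/`NEW` samples are illustrative assumptions and are not part of hud-python:

```python
from email import message_from_string


def requires_dist(pkg_info_text: str) -> set[str]:
    """Return the Requires-Dist entries of a PKG-INFO file (RFC 822 style headers)."""
    message = message_from_string(pkg_info_text)
    return set(message.get_all("Requires-Dist") or [])


# Trimmed, hypothetical excerpts modelled on the hunk shown in the diff.
OLD = """Metadata-Version: 2.4
Name: hud-python
Version: 0.4.60
Requires-Dist: rich>=13.0.0
Requires-Dist: toml>=0.10.2
Requires-Dist: typer>=0.9.0
"""

NEW = """Metadata-Version: 2.4
Name: hud-python
Version: 0.4.62
Requires-Dist: rich>=13.0.0
Requires-Dist: toml>=0.10.2
Requires-Dist: tornado>=6.5.2
Requires-Dist: typer>=0.9.0
"""

added = requires_dist(NEW) - requires_dist(OLD)
removed = requires_dist(OLD) - requires_dist(NEW)
print("added:", sorted(added))
print("removed:", sorted(removed))
```

Comparing sets ignores ordering, so a requirement that merely moved within the file would not be reported as a change.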

{hud_python-0.4.60 → hud_python-0.4.62}/README.md RENAMED

@@ -13,21 +13,21 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 [![Add docs to Cursor](https://img.shields.io/badge/Add%20docs%20to-Cursor-black?style=flat-square)](https://cursor.com/en/install-mcp?name=docs-hud-python&config=eyJ1cmwiOiJodHRwczovL2RvY3MuaHVkLnNvL21jcCJ9)
 [![Discord](https://img.shields.io/discord/1327447144772407390?label=Discord&logo=discord&style=flat-square)](https://discord.gg/wkjtmHYYjm)
 [![X Follow](https://img.shields.io/twitter/follow/hud_evals?style=social)](https://x.com/intent/user?screen_name=hud_evals)
-[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.so)
+[![Shop](https://img.shields.io/badge/_-white.svg?label=shop&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAAJCAYAAAAywQxIAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAACxMAAAsTAQCanBgAAAF6SURBVChTlZA9ixNhFIWf8yaTpFHRRMXCKpAZhCAYFvwoLHZhwUKw9A9YCJb+Bq0sxGbBQrTxX1j41dvIRAjGZbdwRUUGIzPMeyw2swS3WZ/ynHvP5VylafoAWAd+5Xm+wX+SpukmcMf29RDCZrD9BViz3f53+CjYngKZpD5A2/Y7SQBMJpOkKIprdV1vdzqdHzHGblmW9Ww2+5pl2TmAxWKxmM/nP8fj8cmqqtZijJ9sb0u6ABBWjh0riuIt8CqE8LGu66e2d5MkeQ8QY3xme7fb7T4ZjUbrZVl+jjFuSXoEXGxCDgIl9WzfAO5LSmzvNB771R6vzG4Bx0MIt/M8vwV8aLyDQNt70+n0G1AspaTxVln+aghQluVsKbvxVysflT9NQK/XO7R/SGiQ9Nt2aftElmWXJd1kv0kbeANQVdWl4XB4XtJouXaqNRgMHkrqS+r0+/3XwD1JXdungRfAVWBi+6WkK8D3EMJz22cl3W21WgNgx3YAzvwFd0Chdq03gKUAAAAASUVORK5CYII=&style=social)](https://shop.hud.ai)
 ### Are you a startup building agents?
-[📅 Hop on a call](https://cal.com/jay-ram-z6st6w/demo) or [📧 founders@hud.so](mailto:founders@hud.so)
+[📅 Hop on a call](https://cal.com/jay-ram-z6st6w/demo) or [📧 founders@hud.ai](mailto:founders@hud.ai)
 ## Highlights
-- 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- 🚀 **[MCP environment skeleton](https://docs.hud.ai/core-concepts/mcp-protocol)** – any agent can call any environment.
+- ⚡️ **[Live telemetry](https://hud.ai)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.ai/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
-- 🎓 **[One-click RL](https://hud.so/models)** – Run `hud rl` to get a trained model on any environment.
+- 🎓 **[One-click RL](https://hud.ai/models)** – Run `hud rl` to get a trained model on any environment.
 > We welcome contributors and feature requests – open an issue or hop on a call to discuss improvements!
@@ -42,10 +42,10 @@ uv tool install hud-python
 # uv tool update-shell
 ```
-> See [docs.hud.so](https://docs.hud.so), or add docs to any MCP client:
-> `claude mcp add --transport http docs-hud https://docs.hud.so/mcp`
+> See [docs.hud.ai](https://docs.hud.ai), or add docs to any MCP client:
+> `claude mcp add --transport http docs-hud https://docs.hud.ai/mcp`
-Before starting, get your HUD_API_KEY at [hud.so](https://hud.so).
+Before starting, get your HUD_API_KEY at [hud.ai](https://hud.ai).
 ## Quickstart: Evals
@@ -63,17 +63,17 @@ import asyncio, hud, os
 from hud.settings import settings
 from hud.clients import MCPClient
 from hud.agents import ClaudeAgent
-from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
+from hud.datasets import Task  # See docs: https://docs.hud.ai/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.ai)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
-                    "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
+                    "url": "https://mcp.hud.ai/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.ai/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.ai
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -100,7 +100,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.ai/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -113,7 +113,7 @@ hud get hud-evals/2048-basic # from HF
 hud rl 2048-basic.json
 ```
-> See [agent training docs](https://docs.hud.so/train-agents/quickstart)
+> See [agent training docs](https://docs.hud.ai/train-agents/quickstart)
 Or make your own environment and dataset:
@@ -124,7 +124,7 @@ hud dev --interactive
 hud rl
 ```
-> See [environment design docs](https://docs.hud.so/build-environments)
+> See [environment design docs](https://docs.hud.ai/build-environments)
 ## Benchmarking Agents
@@ -132,7 +132,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.ai_](https://hud.ai/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -150,7 +150,7 @@ from hud.agents import ClaudeAgent
 results = await run_dataset(
     name="My SheetBench-50 Evaluation",
     dataset="hud-evals/SheetBench-50",      # <-- HuggingFace dataset
-    agent_class=ClaudeAgent,                # <-- Your custom agent can replace this (see https://docs.hud.so/evaluate-agents/create-agents)
+    agent_class=ClaudeAgent,                # <-- Your custom agent can replace this (see https://docs.hud.ai/evaluate-agents/create-agents)
     agent_config={"model": "claude-sonnet-4-20250514"},
     max_concurrent=50,
     max_steps=30,
@@ -158,13 +158,13 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.ai](https://hud.ai) platform for analysis and [leaderboard submission](https://docs.hud.ai/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
 This is how you can make any environment into an interactable one in 5 steps:
-1. Define MCP server layer using [`MCPServer`](https://docs.hud.so/reference/environments)
+1. Define MCP server layer using [`MCPServer`](https://docs.hud.ai/reference/environments)
 ```python
 from hud.server import MCPServer
@@ -172,10 +172,10 @@ from hud.tools import HudComputerTool
 mcp = MCPServer("My Environment")
-# Add hud tools (see all tools: https://docs.hud.so/reference/tools)
+# Add hud tools (see all tools: https://docs.hud.ai/reference/tools)
 mcp.tool(HudComputerTool())
-# Or custom tools (see https://docs.hud.so/build-environments/adapting-software)
+# Or custom tools (see https://docs.hud.ai/build-environments/adapting-software)
 @mcp.tool("launch_app"):
 def launch_app(name: str = "Gmail")
 ...
@@ -249,16 +249,16 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
+5. Now you can use `mcp.hud.ai` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.ai](https://hud.ai):
 ```python
 from hud.agents import ClaudeAgent
-result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/reference/agents
+result = await ClaudeAgent().run({  # See all agents: https://docs.hud.ai/reference/agents
     "prompt": "Please explore this environment",
     "mcp_config": {
         "my-environment": {
-            "url": "https://mcp.hud.so/v3/mcp",
+            "url": "https://mcp.hud.ai/v3/mcp",
             "headers": {
                 "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                 "Mcp-Image": "my-name/my-environment:latest"
@@ -280,13 +280,13 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.ai/leaderboards](https://hud.ai/leaderboards) (see [docs](https://docs.hud.ai/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
 We highly suggest running 3-5 evaluations per dataset for the most consistent results across multiple jobs.
-Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
+Using the [`run_dataset`](https://docs.hud.ai/reference/tasks#run_dataset) function with a HuggingFace dataset automatically assigns your job to that leaderboard page, and allows you to create a scorecard out of it:
 ## Reinforcement Learning with GRPO
@@ -315,11 +315,11 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistent server and trainer in the cloud, streams telemetry to `hud.ai`, and lets you monitor/manage models at `hud.ai/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
-Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
+Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.ai/train-agents/quickstart`.
-Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.so/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.so/project/billing).
+Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quickstart → Pricing](https://docs.hud.ai/train-agents/quickstart#pricing). Manage billing at the [HUD billing dashboard](https://hud.ai/project/billing).
 ## Architecture
@@ -327,8 +327,8 @@ Pricing: Hosted vLLM and training GPU rates are listed in the [Training Quicksta
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 hud.so"]
-        API["🔌 mcp.hud.so"]
+        Dashboard["📊 hud.ai"]
+        API["🔌 mcp.hud.ai"]
     end
     subgraph "hud"
@@ -366,14 +366,14 @@ graph LR
 | Command                 | Purpose                                    | Docs |
 | ----------------------- | ------------------------------------------ | ---- |
-| [`hud init`](https://docs.hud.so/reference/cli/init)            | Create new environment with boilerplate.  | [📖](https://docs.hud.so/reference/cli/init) |
-| [`hud dev`](https://docs.hud.so/reference/cli/dev)              | Hot-reload development with Docker.        | [📖](https://docs.hud.so/reference/cli/dev) |
-| [`hud build`](https://docs.hud.so/reference/cli/build)          | Build image and generate lock file.       | [📖](https://docs.hud.so/reference/cli/build) |
-| [`hud push`](https://docs.hud.so/reference/cli/push)            | Share environment to registry.            | [📖](https://docs.hud.so/reference/cli/push) |
-| [`hud pull <target>`](https://docs.hud.so/reference/cli/pull)   | Get environment from registry.            | [📖](https://docs.hud.so/reference/cli/pull) |
-| [`hud analyze <image>`](https://docs.hud.so/reference/cli/analyze) | Discover tools, resources, and metadata.   | [📖](https://docs.hud.so/reference/cli/analyze) |
-| [`hud debug <image>`](https://docs.hud.so/reference/cli/debug)   | Five-phase health check of an environment. | [📖](https://docs.hud.so/reference/cli/debug) |
-| [`hud run <image>`](https://docs.hud.so/reference/cli/run)       | Run MCP server locally or remotely.       | [📖](https://docs.hud.so/reference/cli/run) |
+| [`hud init`](https://docs.hud.ai/reference/cli/init)            | Create new environment with boilerplate.  | [📖](https://docs.hud.ai/reference/cli/init) |
+| [`hud dev`](https://docs.hud.ai/reference/cli/dev)              | Hot-reload development with Docker.        | [📖](https://docs.hud.ai/reference/cli/dev) |
+| [`hud build`](https://docs.hud.ai/reference/cli/build)          | Build image and generate lock file.       | [📖](https://docs.hud.ai/reference/cli/build) |
+| [`hud push`](https://docs.hud.ai/reference/cli/push)            | Share environment to registry.            | [📖](https://docs.hud.ai/reference/cli/push) |
+| [`hud pull <target>`](https://docs.hud.ai/reference/cli/pull)   | Get environment from registry.            | [📖](https://docs.hud.ai/reference/cli/pull) |
+| [`hud analyze <image>`](https://docs.hud.ai/reference/cli/analyze) | Discover tools, resources, and metadata.   | [📖](https://docs.hud.ai/reference/cli/analyze) |
+| [`hud debug <image>`](https://docs.hud.ai/reference/cli/debug)   | Five-phase health check of an environment. | [📖](https://docs.hud.ai/reference/cli/debug) |
+| [`hud run <image>`](https://docs.hud.ai/reference/cli/run)       | Run MCP server locally or remotely.       | [📖](https://docs.hud.ai/reference/cli/run) |
 ## Roadmap

{hud_python-0.4.60 → hud_python-0.4.62}/environments/blank/README.md RENAMED

@@ -1,7 +1,7 @@
 # Blank Environment
 Minimal starter template for building HUD environments.
-See [docs](https://docs.hud.so/build-environments) for the complete environment design workflow.
+See [docs](https://docs.hud.ai/build-environments) for the complete environment design workflow.
 ## Architecture
@@ -120,9 +120,9 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 hud eval "your-org/your-dataset" claude
 # View results at:
-# hud.so/leaderboards/your-org/your-dataset
+# hud.ai/leaderboards/your-org/your-dataset
 ```
 **Note**: Only public HuggingFace datasets appear as leaderboards!
-📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
+📚 Learn more: [Creating Benchmarks](https://docs.hud.ai/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.ai/evaluate-agents/leaderboards)

{hud_python-0.4.60 → hud_python-0.4.62}/environments/browser/README.md RENAMED

@@ -74,12 +74,12 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 hud eval "your-org/your-dataset" --agent claude
 # View results at:
-# hud.so/leaderboards/your-org/your-dataset
+# hud.ai/leaderboards/your-org/your-dataset
 ```
 **Note**: Only public HuggingFace datasets appear as leaderboards!
-📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
+📚 Learn more: [Creating Benchmarks](https://docs.hud.ai/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.ai/evaluate-agents/leaderboards)
 ## Architecture Overview

{hud_python-0.4.60 → hud_python-0.4.62}/environments/browser/server/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "HUD Browser MCP Server"
 requires-python = ">=3.11,<3.14"
 dependencies = [
-    "hud-python>=0.4.60",
+    "hud-python>=0.4.62",
     "httpx",
     "playwright",
     "pyautogui",

{hud_python-0.4.60 → hud_python-0.4.62}/environments/deepresearch/README.md RENAMED

@@ -1,7 +1,7 @@
 # Deep Research Environment
 Web research environment powered by Exa API for searching and fetching content.
-See [docs](https://docs.hud.so/build-environments) for the complete environment design workflow.
+See [docs](https://docs.hud.ai/build-environments) for the complete environment design workflow.
 ## Architecture
@@ -141,12 +141,12 @@ save_tasks(tasks, repo_id="your-org/your-dataset")
 hud eval "your-org/your-dataset" --agent claude
 # View results at:
-# hud.so/leaderboards/your-org/your-dataset
+# hud.ai/leaderboards/your-org/your-dataset
 ```
 **Note**: Only public HuggingFace datasets appear as leaderboards!
-📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
+📚 Learn more: [Creating Benchmarks](https://docs.hud.ai/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.ai/evaluate-agents/leaderboards)
 ## Example Research Workflow

hud_python-0.4.62/environments/jupyter/README.md ADDED

@@ -0,0 +1,68 @@
+# Jupyter Env (for SpreadSheetBench)
+## QuickStart
+### MCP Server from Docker Hub (No Need to Build the Docker Image)
+Run the task with
+```
+hud eval Genteki/SpreadSheetBench
+```
+### Local MCP Server
+First, build the Docker image with
+```
+docker build -t <image/name> .
+```
+Then update the Docker image name in `test_task.json`. Finally, load all required `api_key` values into environment variables and run
+```
+hud eval
+```
+## File Structure
+`environments/jupyter` file structure:
+```
+├── Dockerfile
+├── server
+│   ├── config.py
+│   ├── evaluate
+│   │   ├── compare.py
+│   │   ├── dumb.py
+│   │   ├── eval_all.py
+│   │   ├── eval_single.py
+│   │   ├── generalize.py
+│   │   └── __init__.py
+│   ├── __init__.py
+│   ├── main.py
+│   ├── pyproject.toml
+│   ├── setup
+│   │   ├── __init__.py
+│   │   └── load_spreadsheet.py
+│   └── tools
+│       ├── __init__.py
+│       └── jupyter_with_record.py
+└── test_task.json
+```
+The main parts of the environment are:
+* `main.py`: entry point of the MCP server
+* `tools/jupyter_with_record.py`: provides the `execute_code` method, which lets the agent interact with the Jupyter kernel and records the solution
+* `setup/`: setup methods for the eval task
+* `evaluate/`: evaluation methods for the eval task
+## Related Links
+### Hugging Face
+* [Genteki/SpreadSheetBench-Tiny](https://huggingface.co/datasets/Genteki/SpreadSheetBench-Tiny) (Size: 10)
+* [Genteki/SpreadSheetBench-200](https://huggingface.co/datasets/Genteki/SpreadSheetBench-200) (Size: 200)
+* [Genteki/SpreadSheetBench](https://huggingface.co/datasets/Genteki/SpreadSheetBench) (Size: 912)
+### Example Traces (May require permission)
+* [Single Test Task](https://www.hud.ai/trace/d31de170-e70a-4abb-8f95-70512515dade)
+* [Genteki/SpreadSheetBench-Tiny Test](https://www.hud.ai/jobs/2c426368-e352-4c79-af4a-aefb136e3f58)
+### GitHub
+* Feature Branch: [New-Env-Jupyter](https://github.com/Genteki/hud-python/tree/New-Env-Jupyter)
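
Most of the README hunks in this diff make the same change: the `hud.so` domain becomes `hud.ai`. A minimal sketch of how that substitution could be applied to text is shown here. The helper name `migrate_domain` and the regular expression are assumptions made for illustration; they are not part of hud-python and not necessarily how the maintainers made the change.

```python
import re

# Hypothetical helper, not part of hud-python.
# Matches "hud.so" only when it is not glued to other word characters or
# hyphens, so names such as "libhud.so" or "hud.solutions" are left alone.
_OLD_DOMAIN = re.compile(r"(?<![\w-])hud\.so(?![\w-])")


def migrate_domain(text: str) -> str:
    """Return text with the old hud.so domain rewritten to hud.ai."""
    return _OLD_DOMAIN.sub("hud.ai", text)


if __name__ == "__main__":
    line = "See [docs](https://docs.hud.so/build-environments) for details."
    print(migrate_domain(line))
```

The lookaround guards are a deliberately conservative choice: a bare `hud.so` token that is really a file name would still be rewritten, so any real migration should be reviewed hunk by hunk, as in the diff above.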
