PyPI - hud-python - Versions diffs - 0.4.35__tar.gz → 0.4.37__tar.gz - Mend

Potentially problematic release.

Files changed (250)

{hud_python-0.4.35 → hud_python-0.4.37}/.gitignore RENAMED

@@ -22,7 +22,6 @@ uv.lock
 # Test files
 /*.ipynb
-test.json
 TODO.md
 .coverage

{hud_python-0.4.35 → hud_python-0.4.37}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.35
+Version: 0.4.37
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -36,11 +36,13 @@ Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Requires-Python: <3.13,>=3.11
 Requires-Dist: anthropic
+Requires-Dist: blessed>=1.20.0
 Requires-Dist: datasets>=2.14.0
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
-Requires-Dist: hud-mcp-use-python-sdk>=2.3.16
+Requires-Dist: hud-mcp-use-python-sdk==2.3.19
+Requires-Dist: litellm>=1.55.0
 Requires-Dist: numpy>=1.24.0
 Requires-Dist: openai
 Requires-Dist: opentelemetry-api>=1.34.1
@@ -50,8 +52,8 @@ Requires-Dist: opentelemetry-sdk>=1.34.1
 Requires-Dist: pathspec>=0.12.1
 Requires-Dist: pillow>=11.1.0
 Requires-Dist: prompt-toolkit==3.0.51
-Requires-Dist: pydantic-settings<3,>=2
-Requires-Dist: pydantic<3,>=2
+Requires-Dist: pydantic-settings<3,>=2.2
+Requires-Dist: pydantic<3,>=2.6
 Requires-Dist: questionary==2.1.0
 Requires-Dist: rich>=13.0.0
 Requires-Dist: toml>=0.10.2
@@ -59,7 +61,9 @@ Requires-Dist: typer>=0.9.0
 Requires-Dist: watchfiles>=0.21.0
 Requires-Dist: wrapt>=1.14.0
 Provides-Extra: agent
+Requires-Dist: aiodocker>=0.24.0; extra == 'agent'
 Requires-Dist: dotenv>=0.9.9; extra == 'agent'
+Requires-Dist: inspect-ai>=0.3.80; extra == 'agent'
 Requires-Dist: ipykernel; extra == 'agent'
 Requires-Dist: ipython<9; extra == 'agent'
 Requires-Dist: jupyter-client; extra == 'agent'
@@ -67,8 +71,21 @@ Requires-Dist: jupyter-core; extra == 'agent'
 Requires-Dist: langchain; extra == 'agent'
 Requires-Dist: langchain-anthropic; extra == 'agent'
 Requires-Dist: langchain-openai; extra == 'agent'
+Requires-Dist: pillow>=11.1.0; extra == 'agent'
+Requires-Dist: playwright; extra == 'agent'
+Requires-Dist: pyautogui>=0.9.54; extra == 'agent'
+Requires-Dist: pyright==1.1.401; extra == 'agent'
+Requires-Dist: pytest-asyncio; extra == 'agent'
+Requires-Dist: pytest-cov; extra == 'agent'
+Requires-Dist: pytest-mock; extra == 'agent'
+Requires-Dist: pytest<9,>=8.1.1; extra == 'agent'
+Requires-Dist: ruff>=0.11.8; extra == 'agent'
+Requires-Dist: setuptools; extra == 'agent'
+Requires-Dist: textdistance<5,>=4.5.0; extra == 'agent'
 Provides-Extra: agents
+Requires-Dist: aiodocker>=0.24.0; extra == 'agents'
 Requires-Dist: dotenv>=0.9.9; extra == 'agents'
+Requires-Dist: inspect-ai>=0.3.80; extra == 'agents'
 Requires-Dist: ipykernel; extra == 'agents'
 Requires-Dist: ipython<9; extra == 'agents'
 Requires-Dist: jupyter-client; extra == 'agents'
@@ -76,6 +93,17 @@ Requires-Dist: jupyter-core; extra == 'agents'
 Requires-Dist: langchain; extra == 'agents'
 Requires-Dist: langchain-anthropic; extra == 'agents'
 Requires-Dist: langchain-openai; extra == 'agents'
+Requires-Dist: pillow>=11.1.0; extra == 'agents'
+Requires-Dist: playwright; extra == 'agents'
+Requires-Dist: pyautogui>=0.9.54; extra == 'agents'
+Requires-Dist: pyright==1.1.401; extra == 'agents'
+Requires-Dist: pytest-asyncio; extra == 'agents'
+Requires-Dist: pytest-cov; extra == 'agents'
+Requires-Dist: pytest-mock; extra == 'agents'
+Requires-Dist: pytest<9,>=8.1.1; extra == 'agents'
+Requires-Dist: ruff>=0.11.8; extra == 'agents'
+Requires-Dist: setuptools; extra == 'agents'
+Requires-Dist: textdistance<5,>=4.5.0; extra == 'agents'
 Provides-Extra: dev
 Requires-Dist: aiodocker>=0.24.0; extra == 'dev'
 Requires-Dist: dotenv>=0.9.9; extra == 'dev'
@@ -100,14 +128,6 @@ Requires-Dist: setuptools; extra == 'dev'
 Requires-Dist: textdistance<5,>=4.5.0; extra == 'dev'
 Provides-Extra: rl
 Requires-Dist: bitsandbytes>=0.41.0; (sys_platform == 'linux') and extra == 'rl'
-Requires-Dist: dotenv>=0.9.9; extra == 'rl'
-Requires-Dist: ipykernel; extra == 'rl'
-Requires-Dist: ipython<9; extra == 'rl'
-Requires-Dist: jupyter-client; extra == 'rl'
-Requires-Dist: jupyter-core; extra == 'rl'
-Requires-Dist: langchain; extra == 'rl'
-Requires-Dist: langchain-anthropic; extra == 'rl'
-Requires-Dist: langchain-openai; extra == 'rl'
 Requires-Dist: liger-kernel>=0.5.0; (sys_platform == 'linux') and extra == 'rl'
 Requires-Dist: peft>=0.17.1; extra == 'rl'
 Requires-Dist: vllm==0.10.1.1; extra == 'rl'
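
The core `Requires-Dist` changes in the hunks above (two new dependencies, one range tightened to an exact pin, two raised minimums) can be summarised programmatically. The following is a minimal sketch using only the standard library; `parse_requirement` and `diff_requirements` are hypothetical helper names, and the parsing is deliberately naive (it is not a full PEP 508 parser).

```python
import re

# Core requirements that differ between the two PKG-INFO versions,
# copied from the diff above (extras are left out on purpose).
OLD = [
    "hud-mcp-use-python-sdk>=2.3.16",
    "pydantic-settings<3,>=2",
    "pydantic<3,>=2",
]
NEW = [
    "blessed>=1.20.0",
    "hud-mcp-use-python-sdk==2.3.19",
    "litellm>=1.55.0",
    "pydantic-settings<3,>=2.2",
    "pydantic<3,>=2.6",
]


def parse_requirement(line: str) -> tuple[str, str]:
    """Split a requirement string into (name, everything after the name)."""
    match = re.match(r"\s*([A-Za-z0-9._-]+)\s*(.*)", line)
    if match is None:
        raise ValueError(f"not a requirement: {line!r}")
    return match.group(1), match.group(2).strip()


def diff_requirements(old: list[str], new: list[str]):
    """Return (added, removed, changed) keyed by distribution name."""
    old_map = dict(parse_requirement(r) for r in old)
    new_map = dict(parse_requirement(r) for r in new)
    added = {n: s for n, s in new_map.items() if n not in old_map}
    removed = {n: s for n, s in old_map.items() if n not in new_map}
    changed = {
        n: (old_map[n], new_map[n])
        for n in old_map.keys() & new_map.keys()
        if old_map[n] != new_map[n]
    }
    return added, removed, changed


added, removed, changed = diff_requirements(OLD, NEW)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Because the result is keyed by name only, this sketch suits the core requirements; the same distribution listed under several extras (`agent`, `agents`, `dev`) would need the environment marker as part of the key.
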
@@ -138,8 +158,8 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://app.hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://app.hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌱 **[Reinforcement learning built-in](rl/)** – Verifiers gym pipelines for GRPO on any environment.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
@@ -185,14 +205,14 @@ from hud.agents import ClaudeAgent
 from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://app.hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
                     "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://app.hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -219,7 +239,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://app.hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -250,7 +270,7 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
 Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
@@ -260,7 +280,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _app.hud.so_](https://app.hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -286,7 +306,7 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [app.hud.so](https://app.hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
@@ -377,7 +397,7 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [app.hud.so](https://app.hud.so):
+5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
 ```python
 from hud.agents import ClaudeAgent
@@ -408,7 +428,7 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [app.hud.so/leaderboards](https://app.hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
@@ -422,7 +442,7 @@ Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) funct
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 app.hud.so"]
+        Dashboard["📊 hud.so"]
         API["🔌 mcp.hud.so"]
     end

{hud_python-0.4.35 → hud_python-0.4.37}/README.md RENAMED

@@ -23,8 +23,8 @@ OSS RL environment + evals toolkit. Wrap software as environments, run benchmark
 ## Highlights
 - 🚀 **[MCP environment skeleton](https://docs.hud.so/core-concepts/mcp-protocol)** – any agent can call any environment.
-- ⚡️ **[Live telemetry](https://app.hud.so)** – inspect every tool call, observation, and reward in real time.
-- 🗂️ **[Public benchmarks](https://app.hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
+- ⚡️ **[Live telemetry](https://hud.so)** – inspect every tool call, observation, and reward in real time.
+- 🗂️ **[Public benchmarks](https://hud.so/leaderboards)** – OSWorld-Verified, SheetBench-50, and more.
 - 🌱 **[Reinforcement learning built-in](rl/)** – Verifiers gym pipelines for GRPO on any environment.
 - 🌐 **[Cloud browsers](environments/remote_browser/)** – AnchorBrowser, Steel, BrowserBase integrations for browser automation.
 - 🛠️ **[Hot-reload dev loop](environments/README.md#phase-5-hot-reload-development-with-cursor-agent)** – `hud dev` for iterating on environments without rebuilds.
@@ -70,14 +70,14 @@ from hud.agents import ClaudeAgent
 from hud.datasets import Task  # See docs: https://docs.hud.so/reference/tasks
 async def main() -> None:
-    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://app.hud.so)
+    with hud.trace("Quick Start 2048"): # All telemetry works for any MCP-based agent (see https://hud.so)
         task = {
             "prompt": "Reach 64 in 2048.",
             "mcp_config": {
                 "hud": {
                     "url": "https://mcp.hud.so/v3/mcp",  # HUD's cloud MCP server (see https://docs.hud.so/core-concepts/architecture)
                     "headers": {
-                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://app.hud.so
+                        "Authorization": f"Bearer {settings.api_key}",  # Get your key at https://hud.so
                         "Mcp-Image": "hudpython/hud-text-2048:v1.2"  # Docker image from https://hub.docker.com/u/hudpython
                     }
                 }
@@ -104,7 +104,7 @@ async def main() -> None:
 asyncio.run(main())
 ```
-The above example let's the agent play 2048 ([See replay](https://app.hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
+The above example let's the agent play 2048 ([See replay](https://hud.so/trace/6feed7bd-5f67-4d66-b77f-eb1e3164604f))
 ![Agent playing 2048](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/2048_1.gif)
@@ -135,7 +135,7 @@ Supports multi‑turn RL for both:
 - Language‑only models (e.g., `Qwen/Qwen2.5-7B-Instruct`)
 - Vision‑Language models (e.g., `Qwen/Qwen2.5-VL-3B-Instruct`)
-By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `app.hud.so`, and lets you monitor/manage models at `app.hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
+By default, `hud rl` provisions a persistant server and trainer in the cloud, streams telemetry to `hud.so`, and lets you monitor/manage models at `hud.so/models`. Use `--local` to run entirely on your machines (typically 2+ GPUs: one for vLLM, the rest for training).
 Any HUD MCP environment and evaluation works with our RL pipeline (including remote configurations). See the guided docs: `https://docs.hud.so/train-agents/quickstart`.
@@ -145,7 +145,7 @@ This is Claude Computer Use running on our proprietary financial analyst benchma
 ![Trace screenshot](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/trace_sheet.gif)
-> [See this trace on _app.hud.so_](https://app.hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
+> [See this trace on _hud.so_](https://hud.so/trace/9e212e9e-3627-4f1f-9eb5-c6d03c59070a)
 This example runs the full dataset (only takes ~20 minutes) using [run_evaluation.py](examples/run_evaluation.py):
@@ -171,7 +171,7 @@ results = await run_dataset(
 print(f"Average reward: {sum(r.reward for r in results) / len(results):.2f}")
 ```
-> Running a dataset creates a job and streams results to the [app.hud.so](https://app.hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
+> Running a dataset creates a job and streams results to the [hud.so](https://hud.so) platform for analysis and [leaderboard submission](https://docs.hud.so/evaluate-agents/leaderboards).
 ## Building Environments (MCP)
@@ -262,7 +262,7 @@ Tools
 hud push # needs docker login, hud api key
 ```
-5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [app.hud.so](https://app.hud.so):
+5. Now you can use `mcp.hud.so` to launch 100s of instances of this environment in parallel with any agent, and see everything live on [hud.so](https://hud.so):
 ```python
 from hud.agents import ClaudeAgent
@@ -293,7 +293,7 @@ result = await ClaudeAgent().run({  # See all agents: https://docs.hud.so/refere
 ## Leaderboards & benchmarks
-All leaderboards are publicly available on [app.hud.so/leaderboards](https://app.hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
+All leaderboards are publicly available on [hud.so/leaderboards](https://hud.so/leaderboards) (see [docs](https://docs.hud.so/evaluate-agents/leaderboards))
 ![Leaderboard](https://raw.githubusercontent.com/hud-evals/hud-python/main/docs/src/images/leaderboards_3.png)
@@ -307,7 +307,7 @@ Using the [`run_dataset`](https://docs.hud.so/reference/tasks#run_dataset) funct
 %%{init: {"theme": "neutral", "themeVariables": {"fontSize": "14px"}} }%%
 graph LR
     subgraph "Platform"
-        Dashboard["📊 app.hud.so"]
+        Dashboard["📊 hud.so"]
         API["🔌 mcp.hud.so"]
     end

{hud_python-0.4.35 → hud_python-0.4.37}/environments/README.md RENAMED

@@ -495,7 +495,7 @@ from hud.agents import ClaudeAgent
 from hud.clients import MCPClient
 async def main():
-    # `trace` captures *everything* that happens and sends it to app.hud.so
+    # `trace` captures *everything* that happens and sends it to hud.so
     with hud.trace("local_test"):
         task = Task(
             prompt="Complete the task",
@@ -524,7 +524,7 @@ async def main():
 asyncio.run(main())
 ```
-The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to app.hud.so – perfect for debugging.
+The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to hud.so – perfect for debugging.
 See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for larger end-to-end demos.
@@ -532,7 +532,7 @@ See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for large
 ## Phase 4 – Remote Deployment & HUD Runner
-**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the app.hud.so can visualise the whole lifecycle.
+**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the hud.so can visualise the whole lifecycle.
 ### 1. Publish your image
@@ -595,11 +595,11 @@ async def initialize_environment(session=None, progress_token=None):
     await send(100, "ready")
 ```
-Those messages are displayed live on app.hud.so alongside resource graphs – perfect feedback while you wait.
+Those messages are displayed live on hud.so alongside resource graphs – perfect feedback while you wait.
 ### 4. Live telemetry (`telemetry://live`) (Optional)
-Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on app.hud.so.
+Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on hud.so.
 Once all of the above works you can unleash *hundreds* of concurrent agents on your new environment.
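
The README hunks above consistently replace the `app.hud.so` host with `hud.so` while leaving `mcp.hud.so` and `docs.hud.so` untouched. For downstream notes or scripts that still mention the old host, a rewrite along these lines might be enough. This is a sketch: `migrate_dashboard_urls` is a hypothetical helper, not part of the package.

```python
import re

# Match "app.hud.so" only when it is a complete host name: not preceded by
# another host label (e.g. "myapp.hud.so") and not followed by further
# labels (e.g. "app.hud.so.example.com"). Underscores are allowed on both
# sides so Markdown emphasis such as _app.hud.so_ is still rewritten.
OLD_HOST = re.compile(
    r"(?<![A-Za-z0-9.-])app\.hud\.so(?![A-Za-z0-9-]|\.[A-Za-z0-9])"
)


def migrate_dashboard_urls(text: str) -> str:
    """Replace the old dashboard host with the new one."""
    return OLD_HOST.sub("hud.so", text)


print(migrate_dashboard_urls("Get your key at https://app.hud.so"))
print(migrate_dashboard_urls("https://mcp.hud.so/v3/mcp"))
```
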

hud_python-0.4.37/environments/blank/README.md ADDED

@@ -0,0 +1,108 @@
+# test-test
+## Environment design pattern
+- Controller (Think of this as a frontend in web development)
+  - Creates the UX and manages the lifecycle of an app (in this case for an agent)
+  - Define `mcp = MCPServer()` and register `@mcp.tool` as tools the agent can interact with
+- Environment (Think of this as a backend in web development)
+  - Owns all long‑lived states of the environment and exposes the environment data structure
+  - Expose simple HTTP endpoints (`/health`, `/act`, `/reset`, `/state`)
+IMPORTANT: Make sure all logs are going to stderr instead of stdio, which is reserved for MCP communication
+### Testing your environment
+```bash
+# 1. Configure your API keys (optional - only needed for evaluation)
+# Edit .env file to add your HUD_API_KEY and ANTHROPIC_API_KEY
+# 2. Start the environment (optional: with --inspector or --interactive)
+hud dev --build --interactive
+# 3. Choose your preferred way to test:
+# Option A: Run the task with Claude (requires ANTHROPIC_API_KEY)
+hud eval tasks.json --agent claude
+# Option B: Interactive notebook test_env.ipynb (great for learning!)
+# Option C: Simple Python script (runs all tasks from tasks.json)
+python test_task.py
+```
+## Iterating on your environment
+This is usually the process for making any environment better:
+```bash
+# 1. Start the environment and interact with it directly (or give MCP server to an agent):
+hud dev --build --interactive
+# 2. If the environment cannot start or fails inexplicably:
+hud debug test_env:dev # Or your env name that appears when you run hud dev
+# After fixing the error, go back to 1.
+# 3. When the environment is in a stable state:
+hud build
+hud push # Requires docker login
+# 4. As soon as it's pushed to the newest version, make sure tasks have it updated and run:
+hud rl
+# This is a good test to see if your environment and tasks are high quality!
+## Layout
+```
+controller/
+  __init__.py   # mcp + shared HTTP client
+  __main__.py   # python -m controller → mcp.run()
+  hooks.py      # @mcp.initialize / @mcp.shutdown
+  tools.py      # @mcp.tool act / setup / evaluate
+./environment
+  ├── __init__.py
+  └── server.py       # FastAPI app: /health, /act, /reset, /state
+```
+## Publishing Your Environment
+Once your environment is ready, you can share it with the community:
+### 1. Push to Registry
+```bash
+# Build and push your environment (requires docker hub login and hud api key)
+hud build
+hud push
+```
+### 2. Create a Dataset
+Create a dataset on HuggingFace with your tasks:
+**Option A: Upload manually**
+1. Upload your `tasks.json` to HuggingFace
+2. Make sure it's **public** to appear on leaderboards
+**Option B: Use the SDK**
+```python
+from hud.datasets import save_tasks
+import json
+# Load your tasks
+with open("tasks.json") as f:
+    tasks = json.load(f)
+# Push to HuggingFace
+save_tasks(tasks, repo_id="your-org/your-dataset")
+```
+### 3. Run and Track Performance
+```bash
+# Run Claude on your benchmark
+hud eval "your-org/your-dataset" --agent claude
+# View results at:
+# hud.so/leaderboards/your-org/your-dataset
+```
+**Note**: Only public HuggingFace datasets appear as leaderboards!
+📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
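
The blank environment README above asks that all logs go to stderr, because the standard streams used for MCP communication must stay clean. One way to arrange that with Python's standard `logging` module is sketched below. `configure_logging` and the `"controller"` logger name are illustrative assumptions; the template's own logging setup is not shown in this diff.

```python
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Route every log record to stderr so stdout stays free for MCP."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.handlers.clear()  # drop any handler that might write to stdout
    root.addHandler(handler)
    root.setLevel(level)
    return logging.getLogger("controller")


logger = configure_logging()
logger.info("environment starting")  # appears on stderr only
```

Stray `print()` calls have the same problem as stdout log handlers; passing `file=sys.stderr` or replacing them with logger calls avoids it.
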

hud_python-0.4.37/environments/blank/controller/README.md ADDED

@@ -0,0 +1,16 @@
+# Controller
+Frontend for the agent: defines tools, minimal state, calls the environment over HTTP.
+What to implement
+- Shared client in `__init__.py` (one `httpx.AsyncClient`)
+- Lifecycle in `hooks.py` (`@mcp.initialize`/`@mcp.shutdown`)
+- Tools in `tools.py` (`@mcp.tool`) — keep logic thin; docstrings = descriptions
+Run
+```bash
+hud run controller --transport http --reload
+# Helper endpoints: http://localhost:8765/hud and /hud/tools
+```
+Principle: the controller is UX, not state. Keep long‑lived state in the environment.

hud_python-0.4.37/environments/blank/environment/README.md ADDED

@@ -0,0 +1,16 @@
+# Environment
+Backend service: owns state and exposes HTTP APIs the controller calls.
+Endpoints (FastAPI)
+- `GET /health` → {status: ok}
+- `POST /act` → increments counter and returns {count}
+- `POST /reset` → resets counter
+- `GET /state` → returns {count}
+Run (dev)
+```bash
+uv run uvicorn environment.server:app --reload --port 8005
+```
+Principle: treat like a backend. Keep long‑lived state here; add endpoints as tools need them.
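
The four endpoints listed above describe a small counter service. The state logic behind them can be sketched without any web framework, as below. The class and method names are hypothetical, the return value of `reset` is an assumption (the README only says it resets the counter), and the actual `server.py` in the package wires this kind of state to FastAPI routes, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class CounterEnvironment:
    """Owns the long-lived state; each method mirrors one endpoint."""

    count: int = 0

    def health(self) -> dict:
        # GET /health
        return {"status": "ok"}

    def act(self) -> dict:
        # POST /act: increment the counter and report the new value
        self.count += 1
        return {"count": self.count}

    def reset(self) -> dict:
        # POST /reset: return shape is an assumption, not from the README
        self.count = 0
        return {"count": self.count}

    def state(self) -> dict:
        # GET /state
        return {"count": self.count}


env = CounterEnvironment()
env.act()
env.act()
print(env.state())
env.reset()
print(env.state())
```

Keeping the state in one object like this matches the stated principle: the controller stays thin and calls the environment for anything long-lived.
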

hud_python-0.4.37/environments/blank/pyproject.toml ADDED

@@ -0,0 +1,19 @@
+[project]
+name = "test_test"
+version = "0.1.0"
+description = "A minimal HUD environment"
+requires-python = ">=3.11"
+dependencies = [ "hud-python==0.4.37", "fastapi", "uvicorn[standard]", "httpx>=0.28.1",]
+[build-system]
+requires = [ "hatchling",]
+build-backend = "hatchling.build"
+[tool.hud]
+image = "test_test:dev"
+[tool.hatch.metadata]
+allow-direct-references = true
+[tool.hatch.build.targets.wheel]
+packages = [ "controller", "environment",]
