PyPI - hud-python - Versions diffs - 0.1.0__tar.gz → 0.1.0b2__tar.gz - Mend

hud-python 0.1.0__tar.gz → 0.1.0b2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (54)

{hud_python-0.1.0 → hud_python-0.1.0b2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.1.0
+Version: 0.1.0b2
 Summary: SDK for the HUD evaluation platform.
 Project-URL: Homepage, https://github.com/Human-Data/hud-sdk
 Project-URL: Bug Tracker, https://github.com/Human-Data/hud-sdk/issues
@@ -44,6 +44,7 @@ Requires-Dist: pydantic-settings<3,>=2
 Requires-Dist: pydantic<3,>=2
 Provides-Extra: dev
 Requires-Dist: anthropic; extra == 'dev'
+Requires-Dist: dotenv; extra == 'dev'
 Requires-Dist: ipykernel; extra == 'dev'
 Requires-Dist: ipython<9; extra == 'dev'
 Requires-Dist: jupyter-client; extra == 'dev'
@@ -54,38 +55,40 @@ Requires-Dist: pytest<9,>=8.1.1; extra == 'dev'
 Requires-Dist: ruff==0.9.8; extra == 'dev'
 Description-Content-Type: text/markdown
-# HUD SDK (Alpha Release)
+# HUD
-A Python SDK for interacting with HUD environments and evaluation benchmarks for browser use and computer use models.
+A Python SDK for interacting with HUD environments and evaluation benchmarks for browser use and computer use models. Visit [hud.so](https://hud.so).
-Visit [hud.so](https://hud.so) for more information about HUD.
-> **Alpha Release Notice**: This SDK is currently in alpha status (v0.1.0-alpha). The API is still evolving and may change in future releases as we gather feedback and improve functionality.
+> **Alpha Release Notice**: This SDK is currently in alpha status (v0.1.0-alpha). The API is evolving and may change in future releases as we gather feedback and improve functionality.
 [![PyPI version](https://img.shields.io/pypi/v/hud-python)](https://pypi.org/project/hud-python/)
-[📚 Documentation](https://docs.hud.so) | [🏠 Homepage](https://hud.so)
+[📚 Documentation](https://documentation.hud.so) | [🏠 Homepage](https://hud.so)
+## Quick start
-## Quick Start
+[RECOMMENDED] To set get started with an agent, see the [Claude Computer use example](https://github.com/Human-Data/hud-sdk/tree/main/examples).
+Otherwise, install the package with Python>=3.9:
 ```bash
-# Install the latest stable release
 pip install hud-python
+```
-# Install the latest alpha release (may include breaking changes)
-pip install --pre hud-python
-# Install a specific alpha version
-pip install hud-python==0.1.0-alpha
+Make sure to setup your account [here](https://hud.so/settings) and add your API key to the environment variables:
+```bash
+HUD_API_KEY=<your-api-key>
 ```
+Load in your agent and create a run! Go to the [examples](https://github.com/Human-Data/hud-sdk/tree/main/examples) folder for more examples.
 ```python
 import asyncio
 from hud import HUDClient
 async def main():
     # Initialize client with API key
-    client = HUDClient(api_key="your-api-key")
+    client = HUDClient(api_key=os.getenv("HUD_API_KEY"))
     # Load a gym and evaluation set
     gym = await client.load_gym(id="OSWorld-Ubuntu")
@@ -93,24 +96,33 @@ async def main():
     # Create a run and environment
     run = client.create_run(name="example-run", gym=gym, evalset=evalset)
-    env = await run.make(metadata={"agent_id": "example"})
+    env = await run.make(metadata={"agent_id": "OSWORLD-1"})
+    await env.wait_for_ready()
+    ###
+    ### Agent loop goes here, see example in /examples
+    ###
-    # Agent loop goes here
-    # For complete examples and usage guides, see our documentation
+    # Evaluate the environment
+    result = await env.evaluate()
     # Close the environment when done
     await env.close()
+    # Get analytics for the run such as rewards, task completions, etc.
+    analytics = await run.get_analytics()
+    print(analytics)
 if __name__ == "__main__":
     asyncio.run(main())
 ```
-## Key Features
+## Features
 - Connect to HUD evaluation environments
 - Run benchmarks across various tasks
 - Support for different agent adapters
-- Asynchronous API for efficient interaction
+- Asynchronous API
 ## Documentation
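PKG-INFO is a core-metadata file written as email-style headers, so the dependency change in this hunk (the new `dotenv` entry under the `dev` extra) can be read back with the standard library's `email` parser. A minimal sketch over an abridged copy of those headers; `requirements_for_extra` is a hypothetical helper, and it only understands the simple `extra == '<name>'` marker form that appears here (real tooling evaluates markers with `packaging.markers`).

```python
from email import message_from_string
from typing import List

# Abridged copy of the 0.1.0b2 PKG-INFO headers shown in this diff.
SAMPLE_PKG_INFO = (
    "Metadata-Version: 2.4\n"
    "Name: hud-python\n"
    "Version: 0.1.0b2\n"
    "Requires-Dist: pydantic<3,>=2\n"
    "Provides-Extra: dev\n"
    "Requires-Dist: anthropic; extra == 'dev'\n"
    "Requires-Dist: dotenv; extra == 'dev'\n"
    "Description-Content-Type: text/markdown\n"
    "\n"
    "# HUD\n"
)


def requirements_for_extra(pkg_info: str, extra: str) -> List[str]:
    """Return requirement names gated on `extra`.

    Simplification: only markers of the exact form extra == '<name>' are
    recognised; anything else is treated as not belonging to the extra.
    """
    metadata = message_from_string(pkg_info)
    names = []
    for requirement in metadata.get_all("Requires-Dist", []):
        name, _, marker = requirement.partition(";")
        if marker.strip() == f"extra == '{extra}'":
            names.append(name.strip())
    return names


print(message_from_string(SAMPLE_PKG_INFO)["Version"])  # 0.1.0b2
print(requirements_for_extra(SAMPLE_PKG_INFO, "dev"))   # ['anthropic', 'dotenv']
```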

hud_python-0.1.0b2/README.md ADDED

@@ -0,0 +1,80 @@
+# HUD
+A Python SDK for interacting with HUD environments and evaluation benchmarks for browser use and computer use models. Visit [hud.so](https://hud.so).
+> **Alpha Release Notice**: This SDK is currently in alpha status (v0.1.0-alpha). The API is evolving and may change in future releases as we gather feedback and improve functionality.
+[![PyPI version](https://img.shields.io/pypi/v/hud-python)](https://pypi.org/project/hud-python/)
+[📚 Documentation](https://documentation.hud.so) | [🏠 Homepage](https://hud.so)
+## Quick start
+[RECOMMENDED] To set get started with an agent, see the [Claude Computer use example](https://github.com/Human-Data/hud-sdk/tree/main/examples).
+Otherwise, install the package with Python>=3.9:
+```bash
+pip install hud-python
+```
+Make sure to setup your account [here](https://hud.so/settings) and add your API key to the environment variables:
+```bash
+HUD_API_KEY=<your-api-key>
+```
+Load in your agent and create a run! Go to the [examples](https://github.com/Human-Data/hud-sdk/tree/main/examples) folder for more examples.
+```python
+import asyncio
+from hud import HUDClient
+async def main():
+    # Initialize client with API key
+    client = HUDClient(api_key=os.getenv("HUD_API_KEY"))
+    # Load a gym and evaluation set
+    gym = await client.load_gym(id="OSWorld-Ubuntu")
+    evalset = await client.load_evalset(id="OSWorld-Ubuntu")
+    # Create a run and environment
+    run = client.create_run(name="example-run", gym=gym, evalset=evalset)
+    env = await run.make(metadata={"agent_id": "OSWORLD-1"})
+    await env.wait_for_ready()
+    ###
+    ### Agent loop goes here, see example in /examples
+    ###
+    # Evaluate the environment
+    result = await env.evaluate()
+    # Close the environment when done
+    await env.close()
+    # Get analytics for the run such as rewards, task completions, etc.
+    analytics = await run.get_analytics()
+    print(analytics)
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+## Features
+- Connect to HUD evaluation environments
+- Run benchmarks across various tasks
+- Support for different agent adapters
+- Asynchronous API
+## Documentation
+For comprehensive guides, examples, and API reference, visit:
+- [Getting Started](https://docs.hud.so/introduction)
+- [Installation](https://docs.hud.so/installation)
+- [API Reference](https://docs.hud.so/api-reference)
+- [Examples](https://docs.hud.so/examples)
+## License
+[MIT License](LICENSE)
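As published, the quickstart calls `os.getenv` without `import os`, so it raises `NameError` if copied verbatim; and `os.getenv` returns `None` when `HUD_API_KEY` is unset, which 0.1.0b2's `HUDClient` accepts and silently replaces with `settings.api_key`. A sketch of a fail-fast lookup, assuming the key lives in `HUD_API_KEY` as the README says; `resolve_api_key` is hypothetical and not part of the SDK.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None, env_var: str = "HUD_API_KEY") -> str:
    """Prefer an explicit key, fall back to the environment, fail fast otherwise."""
    api_key = explicit or os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"No API key: pass one explicitly or set {env_var}")
    return api_key


# Hypothetical wiring into the quickstart (HUDClient is not imported in this sketch):
# client = HUDClient(api_key=resolve_api_key())
```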

{hud_python-0.1.0 → hud_python-0.1.0b2}/agent/base.py RENAMED

@@ -1,5 +1,8 @@
+from typing import Any
 class Agent:
-    def __init__(self):
+    def __init__(self, client: Any):
+        self.client = client
         self.messages = []
         self.responses = []
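In 0.1.0b2 the base `Agent` receives its model client from the caller instead of building one itself (the Claude agent previously created `Anthropic(...)` in its own constructor). Injection makes it possible to hand the agent a stub in tests. A sketch of that idea; `FakeClient`, `EchoAgent`, and the `complete` method are hypothetical names used only for illustration.

```python
from typing import Any, List


class Agent:
    """Mirror of the 0.1.0b2 base class: the client is passed in."""

    def __init__(self, client: Any):
        self.client = client
        self.messages: List[str] = []
        self.responses: List[str] = []


class FakeClient:
    """Hypothetical stand-in for Anthropic(); returns a canned reply and counts calls."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return self.reply


class EchoAgent(Agent):
    """Hypothetical agent that forwards each prompt to whatever client it was given."""

    def ask(self, prompt: str) -> str:
        self.messages.append(prompt)
        reply = self.client.complete(prompt)
        self.responses.append(reply)
        return reply


agent = EchoAgent(FakeClient(reply="ok"))
print(agent.ask("take a screenshot"))  # ok
```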

{hud_python-0.1.0 → hud_python-0.1.0b2}/agent/claude.py RENAMED

@@ -2,19 +2,18 @@ import os
 import json
 from agent.base import Agent
 from anthropic import Anthropic
+from anthropic.types import Message
-class Claude(Agent):
-    def __init__(self):
-        super().__init__()
-        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
+class ClaudeAgent(Agent):
+    def __init__(self, client: Anthropic):
+        super().__init__(client)
         self.model = "claude-3-7-sonnet-20250219"
         self.max_tokens = 4096
         self.tool_version = "20250124"
         self.thinking_budget = 1024
         self.conversation = []  # Store the full conversation history including Claude's responses
-    async def predict(self, base64_image: str | None = None, input_text: str | None = None):
+    async def predict(self, base64_image: str | None = None, input_text: str | None = None) -> tuple[bool, str | object | None]:
         message = self._create_message(base64_image, input_text)
         # Only append the message if it's not empty
@@ -33,7 +32,10 @@ class Claude(Agent):
         self.conversation.append(assistant_message)
         self.responses.append(response)
-        return response
+        done, processed = await self.process_response(response)
+        return done, processed
     def _create_message(self, base64_image: str | None = None, input_text: str | None = None):
         """Create appropriate message based on context and inputs"""
@@ -120,19 +122,17 @@ class Claude(Agent):
         except Exception as e:
             raise
-    def process_response(self, response: dict) -> tuple[bool, str | None]:
+    async def process_response(self, response: Message) -> tuple[bool, str | object | None]:
         # Check if response contains a computer tool use
-        has_computer_tool_use = False
         computer_action = None
-        for block in response["content"]:
+        for block in response.content:
             if block.type == "tool_use" and block.name == "computer":
-                has_computer_tool_use = True
                 computer_action = block.input
                 break
-        if not has_computer_tool_use:
+        if response.content[-1].type == "text":
             # No computer tool use, treat as final response
-            return True, str(response["content"][-1].text)
+            return True, str(response.content[-1].text)
         # If we have a computer action, adapt it to environment actions
         if computer_action:
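`predict` now returns `(done, processed)` from `process_response`, which reads the typed `Message.content` list instead of indexing a dict. Note the changed rule: a response counts as final whenever its *last* block is text, even if an earlier block was a `computer` tool call. The sketch mirrors that logic with hypothetical dataclass stand-ins for the Anthropic types. Because the diff cuts off before the adaptation step, it returns the raw tool input, and the empty-content guard is an addition not present in the diff.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class Block:
    """Hypothetical stand-in for an Anthropic content block."""
    type: str
    name: Optional[str] = None
    input: Dict[str, Any] = field(default_factory=dict)
    text: str = ""


@dataclass
class Response:
    """Hypothetical stand-in for anthropic.types.Message; only .content is used."""
    content: List[Block]


def process_response(response: Response) -> Tuple[bool, Union[str, Dict[str, Any], None]]:
    # Guard not present in the diff: an empty content list would raise IndexError below.
    if not response.content:
        return True, None
    computer_action = None
    for block in response.content:
        if block.type == "tool_use" and block.name == "computer":
            computer_action = block.input
            break
    # Same rule as 0.1.0b2: a trailing text block means the agent is finished.
    if response.content[-1].type == "text":
        return True, response.content[-1].text
    # The diff's adaptation step is truncated, so the raw action is returned as-is.
    return False, computer_action
```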

{hud_python-0.1.0 → hud_python-0.1.0b2}/docs/installation.mdx RENAMED

@@ -15,7 +15,7 @@ pip install hud-python
 pip install --pre hud-python
 # Install a specific alpha version
-pip install hud-python==0.1.0-alpha
+pip install hud-python==0.1.0
 ```
 ## Alpha Release Status
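This hunk pins `hud-python==0.1.0` under a comment that still says "specific alpha version"; the old pin `0.1.0-alpha` normalizes under PEP 440 to `0.1.0a0`. PEP 440 also orders every pre-release before its final release, so `0.1.0b2` sorts *below* `0.1.0` even though it was uploaded later, and pip skips pre-releases by default unless `--pre` is given, the pin names one (for example `==0.1.0b2`), or no final release satisfies the requirement. The ordering can be sketched for the simple `N.N.N[{a|b|rc}N]` subset used here; `version_key` is a hypothetical helper, and `packaging.version.Version` is the real implementation to use in practice.

```python
import re
from typing import Tuple

# Simplified PEP 440 subset: N(.N)* with an optional a/b/rc pre-release segment.
_PATTERN = re.compile(
    r"^(\d+(?:\.\d+)*)"
    r"(?:[-_.]?(a|b|rc|alpha|beta|c|pre|preview)[-_.]?(\d*))?$",
    re.IGNORECASE,
)
_SPELLINGS = {"alpha": "a", "beta": "b", "c": "rc", "pre": "rc", "preview": "rc"}
_PHASE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3  # a final release sorts after every pre-release of the same version


def version_key(version: str) -> Tuple[Tuple[int, ...], int, int]:
    match = _PATTERN.match(version.strip())
    if match is None:
        raise ValueError(f"outside the supported subset: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    while len(release) > 1 and release[-1] == 0:  # 0.1.0 compares equal to 0.1
        release = release[:-1]
    phase = match.group(2)
    if phase is None:
        return release, _FINAL_RANK, 0
    phase = _SPELLINGS.get(phase.lower(), phase.lower())
    # A missing pre-release number is implicitly 0, so "-alpha" means a0.
    return release, _PHASE_RANK[phase], int(match.group(3) or 0)


print(sorted(["0.1.0", "0.1.0b2", "0.1.0-alpha"], key=version_key))
# ['0.1.0-alpha', '0.1.0b2', '0.1.0']
```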

hud_python-0.1.0b2/examples/README.md ADDED

@@ -0,0 +1,44 @@
+## Claude Computer Use evaluation on OSWorld
+### 1. Setup
+Step 1: Install from the source repository:
+```bash
+# Clone the repository
+git clone https://github.com/Human-Data/hud-sdk.git
+cd hud-sdk
+```
+Step 2: Create a virtual environment:
+```bash
+# Option 1: using venv
+python -m venv .venv
+source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+# Option 2: using uv (recommended)
+uv venv
+# Then activate according to your shell
+```
+Step 3: Install in development mode with all dependencies:
+```bash
+# Option 1: using pip
+pip install -e ".[dev]"
+# Option 2: using uv (recommended)
+uv pip install -e ".[dev]"
+```
+### 2. Set up environment variables
+```bash
+HUD_API_KEY=<your-api-key>
+ANTHROPIC_API_KEY=<your-api-key>
+```
+### 3. Run the OSWorld example
+Explore the [claude_osworld.ipynb](https://github.com/Human-Data/hud-sdk/blob/main/examples/claude_osworld.ipynb) notebook from this folder in Jupyter Notebook.
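The notebook loads these keys with `load_dotenv()`, from the `dotenv` dev dependency added in PKG-INFO. The `KEY=<value>` lines in step 2 are shell-style: typed into a shell without `export` they set only a shell variable that a Jupyter kernel started from that shell never sees, so a `.env` file that `load_dotenv()` can find is the safer route. The stdlib sketch below is a hypothetical stand-in for that loading step plus a fail-fast check; it handles only plain `KEY=VALUE` lines and is not python-dotenv's parser.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Union


def load_env_file(path: Union[str, Path], override: bool = False) -> Dict[str, str]:
    """Minimal KEY=VALUE reader: no interpolation, multi-line values or `export` prefix."""
    loaded = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("\"'")
        if override or key not in os.environ:
            os.environ[key] = value  # like load_dotenv(), existing variables win by default
        loaded[key] = value
    return loaded


def missing_keys(required: Iterable[str] = ("HUD_API_KEY", "ANTHROPIC_API_KEY")) -> List[str]:
    """Names from `required` that are unset or empty in the current environment."""
    return [name for name in required if not os.environ.get(name)]
```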

hud_python-0.1.0b2/examples/claude_osworld.ipynb ADDED

@@ -0,0 +1,154 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from dotenv import load_dotenv\n",
+    "load_dotenv()\n",
+    "\n",
+    "from hud import HUDClient\n",
+    "from hud.adapters.claude.adapter import ClaudeAdapter\n",
+    "from agent.claude import ClaudeAgent\n",
+    "\n",
+    "from anthropic import Anthropic"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# initialize HUD client\n",
+    "client = HUDClient(api_key=os.getenv(\"HUD_API_KEY\"))\n",
+    "\n",
+    "# initalize Claude Computer Use agent\n",
+    "anthropic = Anthropic(api_key=os.getenv(\"ANTHROPIC_API_KEY\"))\n",
+    "agent = ClaudeAgent(anthropic)\n",
+    "\n",
+    "# initialize adapter to interact with the environment\n",
+    "cua_adapter = ClaudeAdapter()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# load OSWorld environment\n",
+    "gym = await client.load_gym(id=\"OSWorld-Ubuntu\")\n",
+    "\n",
+    "# load OSWorld evalset\n",
+    "evalset = await client.load_evalset(id=\"OSWorld-Ubuntu\")\n",
+    "\n",
+    "# create a run that will host all evaluations\n",
+    "run = client.create_run(name=\"Claude-test-OSWorld\", gym=gym, evalset=evalset)\n",
+    "\n",
+    "# fetch all task ids from the run\n",
+    "tasks = await run.fetch_task_ids()\n",
+    "print(f\"Total tasks in OSWorld: {len(tasks)}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# it may take around 3 minutes to initialize the OSWorld platform and reset to a task\n",
+    "\n",
+    "# make a HUD environment\n",
+    "env = await run.make()\n",
+    "await env.wait_for_ready()\n",
+    "\n",
+    "# reset to a task with an observation (screenshot and text)\n",
+    "obs = await env.reset(task_id=tasks[1])\n",
+    "print(f\"Task description: {obs.text}\")\n",
+    "\n",
+    "# watch the agent live\n",
+    "live_url = await env.get_vnc_url()\n",
+    "client.display_stream(live_url)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# agent loop\n",
+    "for i in range(8):\n",
+    "    # rescale screenshot to Claude's resolution\n",
+    "    screenshot = cua_adapter.rescale(obs.screenshot)\n",
+    "\n",
+    "    # agent's next action\n",
+    "    done, response = await agent.predict(screenshot, obs.text)\n",
+    "    if done:\n",
+    "        env.final_response = str(response)\n",
+    "        break\n",
+    "\n",
+    "    # convert to HUD action space\n",
+    "    actions = cua_adapter.adapt_list([response])\n",
+    "    print(f\"Agent's action: {response}\")\n",
+    "\n",
+    "    # step the environment forward\n",
+    "    obs, reward, terminated, info = await env.step(actions)\n",
+    "\n",
+    "    # drop out if terminated\n",
+    "    if terminated:\n",
+    "        break\n",
+    "    print(f\"Step {i+1} completed\")\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# evaluate environment state\n",
+    "result = await env.evaluate()\n",
+    "print(f\"Evaluation result: {result}\")\n",
+    "\n",
+    "# close environment\n",
+    "await env.close()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "analytics = await run.get_analytics()\n",
+    "print(analytics)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
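The notebook drives the episode with top-level `await`, which works in Jupyter; in a plain script the same calls belong in an `async def` run with `asyncio.run`. Because the loop and `env.close()` live in separate cells, an exception mid-loop leaves the remote environment open. A sketch of the loop as one coroutine with the same 8-step cap and a `try`/`finally` around `close()`; `Obs`, `FakeEnv`, and `FakeAgent` are hypothetical stubs standing in for HUD's observation, environment, and the Claude agent, and the `ClaudeAdapter` conversion is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Obs:
    """Hypothetical stand-in for hud.Observation (screenshot + text)."""
    screenshot: str
    text: str


class FakeEnv:
    """Hypothetical environment stub that terminates after `limit` steps."""

    def __init__(self, limit: int):
        self.limit = limit
        self.steps = 0
        self.closed = False
        self.final_response: Optional[str] = None

    async def step(self, actions: List[Any]) -> Tuple[Obs, float, bool, Dict[str, Any]]:
        self.steps += 1
        return Obs("png-bytes", "next"), 0.0, self.steps >= self.limit, {}

    async def evaluate(self) -> Dict[str, Any]:
        return {"steps": self.steps}

    async def close(self) -> None:
        self.closed = True


class FakeAgent:
    """Hypothetical agent stub that reports done after `finish_after` actions."""

    def __init__(self, finish_after: int):
        self.finish_after = finish_after
        self.calls = 0

    async def predict(self, screenshot: str, text: str) -> Tuple[bool, Any]:
        self.calls += 1
        if self.calls > self.finish_after:
            return True, "final answer"
        return False, {"action": "screenshot"}


async def run_episode(env, agent, obs: Obs, max_steps: int = 8) -> Tuple[int, Dict[str, Any]]:
    """Notebook loop with a step cap; close() runs even if a step raises."""
    steps_taken = 0
    try:
        for _ in range(max_steps):
            done, response = await agent.predict(obs.screenshot, obs.text)
            if done:
                env.final_response = str(response)
                break
            # The notebook converts `response` with ClaudeAdapter first; omitted here.
            obs, _reward, terminated, _info = await env.step([response])
            steps_taken += 1
            if terminated:
                break
        result = await env.evaluate()
    finally:
        await env.close()
    return steps_taken, result


env = FakeEnv(limit=100)
print(asyncio.run(run_episode(env, FakeAgent(finish_after=3), Obs("png-bytes", "task"))))
# (3, {'steps': 3})
```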

{hud_python-0.1.0 → hud_python-0.1.0b2}/hud/__init__.py RENAMED

@@ -5,14 +5,14 @@ HUD Gym SDK - A Python SDK for interacting with HUD environments.
 from __future__ import annotations
 from hud.client import HUDClient
-from hud.env import Env, EvalSet, Observation, TaskResult
+from hud.environment import Environment, EvalSet, Observation, TaskResult
 from hud.gym import Gym
 from hud.run import Run
-__version__ = "0.1.0"
+__version__ = "0.1.0b2"
 __all__ = [
-    "Env",
+    "Environment",
     "EvalSet",
     "Gym",
     "HUDClient",

{hud_python-0.1.0 → hud_python-0.1.0b2}/hud/client.py RENAMED

@@ -8,7 +8,7 @@ import json
 from typing import Any
 from .adapters.common import Adapter
-from .env import EvalSet
+from .environment import EvalSet
 from .gym import Gym
 from .run import Run, RunResponse
 from .server import make_request, make_sync_request
@@ -23,15 +23,15 @@ class HUDClient:
     evalsets, and create runs.
     """
-    def __init__(self, api_key: str) -> None:
+    def __init__(self, api_key: str | None = None) -> None:
         """
         Initialize the HUD client with an API key.
         Args:
             api_key: API key for authentication with the HUD API
         """
-        self.api_key = api_key
-        settings.api_key = api_key  # Set global config
+        self.api_key = api_key or settings.api_key
+        settings.api_key = self.api_key
     async def load_gym(self, id: str) -> Gym:
         """
@@ -182,3 +182,18 @@ class HUDClient:
             config=config,
             metadata=metadata,
         )
+    def display_stream(self, live_url: str) -> None:
+        """
+        Display a stream in the HUD system.
+        """
+        from IPython.display import HTML, display
+        html_content = f"""
+        <div style="width: 960px; height: 540px; overflow: hidden;">
+            <div style="transform: scale(0.5); transform-origin: top left;">
+                <iframe src="{live_url}" width="1920" height="1080" style="border: 1px solid #ddd;">
+                </iframe>
+            </div>
+        </div>
+        """
+        display(HTML(html_content))
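The new `display_stream` interpolates `live_url` straight into the iframe's `src` attribute, so a URL containing `"` or `<` would break out of the attribute. A sketch that keeps the same geometry (a 1920x1080 frame scaled to 50% inside a 960x540 box) but escapes the URL with `html.escape`; `build_stream_html` is a hypothetical helper, not part of the SDK, while `IPython.display.HTML` and `display` are the same calls the diff uses.

```python
import html


def build_stream_html(live_url: str, width: int = 1920, height: int = 1080, scale: float = 0.5) -> str:
    """Scaled iframe markup in the style of HUDClient.display_stream, with the URL escaped."""
    src = html.escape(live_url, quote=True)  # escapes & < > " and '
    outer_width, outer_height = round(width * scale), round(height * scale)
    return (
        f'<div style="width: {outer_width}px; height: {outer_height}px; overflow: hidden;">'
        f'<div style="transform: scale({scale}); transform-origin: top left;">'
        f'<iframe src="{src}" width="{width}" height="{height}" style="border: 1px solid #ddd;">'
        "</iframe></div></div>"
    )


# In Jupyter:
# from IPython.display import HTML, display
# display(HTML(build_stream_html(live_url)))
```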

hud_python-0.1.0/hud/env.py → hud_python-0.1.0b2/hud/environment.py RENAMED

@@ -1,5 +1,8 @@
 from __future__ import annotations
+import asyncio
+import enum
+import logging
 from typing import TYPE_CHECKING, Any
 from pydantic import BaseModel
@@ -10,6 +13,7 @@ from hud.settings import settings
 if TYPE_CHECKING:
     from .adapters.common import Adapter
+logger = logging.getLogger("hud.environment")
 class Observation(BaseModel):
     """
@@ -38,8 +42,29 @@ class TaskResult(BaseModel):
     terminated: bool
     info: dict[str, Any]
+class EnvironmentStatus(str, enum.Enum):
+    """
+    Status of the environment.
+    Attributes:
+        INITIALIZING: The environment is initializing
+        RUNNING: The environment is running
+        COMPLETED: The environment is completed
+        ERROR: The environment is in an error state
+    """
+    INITIALIZING = "initializing"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    ERROR = "error"
+status_messages = {
+    EnvironmentStatus.RUNNING.value: "is running",
+    EnvironmentStatus.ERROR.value: "had an error initializing",
+    EnvironmentStatus.COMPLETED.value: "completed",
+}
-class Env:
+class Environment:
     """
     Environment interface for agent interactions.
@@ -192,7 +217,9 @@ class Env:
             api_key=settings.api_key,
         )
-    async def reset(self, task_id: str, metadata: dict[str, Any] | None = None) -> Observation:
+    async def reset(
+        self, task_id: str, metadata: dict[str, Any] | None = None
+    ) -> Observation:
         """
         Reset the environment to the task.
@@ -213,6 +240,18 @@ class Env:
         )
         return Observation(**data["observation"])
+    async def wait_for_ready(self) -> None:
+        """Wait for the environment to be ready"""
+        while True:
+            state = await self.get_env_state()
+            if state in (
+                EnvironmentStatus.RUNNING.value,
+                EnvironmentStatus.ERROR.value,
+                EnvironmentStatus.COMPLETED.value,
+            ):
+                logger.info("Environment %s %s", self.id, status_messages.get(state))
+                break
+            await asyncio.sleep(10)
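The new `wait_for_ready` polls `get_env_state()` every 10 seconds with no upper bound, and it returns normally on `ERROR` (only logging it), so the caller cannot tell a failed start from a ready one. A sketch of the same polling with a deadline and the final status returned; `wait_until_settled` and its `get_state` callable are hypothetical, and the 600 s default timeout is an arbitrary choice. It uses the event loop's clock rather than `asyncio.timeout`, which needs Python 3.11, because the README targets Python>=3.9.

```python
import asyncio
import enum
from typing import Awaitable, Callable


class EnvironmentStatus(str, enum.Enum):
    """Same values as the enum added in hud/environment.py."""
    INITIALIZING = "initializing"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


# States that end the wait, mirroring the tuple checked in wait_for_ready.
SETTLED = {EnvironmentStatus.RUNNING, EnvironmentStatus.COMPLETED, EnvironmentStatus.ERROR}


async def wait_until_settled(
    get_state: Callable[[], Awaitable[str]],
    interval: float = 10.0,
    timeout: float = 600.0,
) -> EnvironmentStatus:
    """Poll get_state() until a settled status; raise TimeoutError past `timeout` seconds."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = EnvironmentStatus(await get_state())  # unknown strings raise ValueError
        if status in SETTLED:
            return status  # the caller decides what ERROR means
        if loop.time() >= deadline:
            raise TimeoutError(f"still {status.value} after {timeout} s")
        await asyncio.sleep(interval)
```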
 class EvalSet:
     """
