PyPI - hud-python - Versions diffs - 0.2.6__tar.gz → 0.2.7__tar.gz - Mend

hud-python 0.2.6tar.gz → 0.2.7tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (171) hide show

{hud_python-0.2.6 → hud_python-0.2.7}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.2.6
+Version: 0.2.7
 Summary: SDK for the HUD evaluation platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-sdk
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-sdk/issues
@@ -47,6 +47,7 @@ Requires-Dist: langchain-openai
 Requires-Dist: mcp
 Requires-Dist: numpy
 Requires-Dist: openai
+Requires-Dist: pathspec>=0.12.1
 Requires-Dist: pillow>=11.1.0
 Requires-Dist: pydantic-settings<3,>=2
 Requires-Dist: pydantic<3,>=2
@@ -61,7 +62,7 @@ Requires-Dist: ipython<9; extra == 'dev'
 Requires-Dist: jupyter-client; extra == 'dev'
 Requires-Dist: jupyter-core; extra == 'dev'
 Requires-Dist: openai; extra == 'dev'
-Requires-Dist: pyright==1.1.364; extra == 'dev'
+Requires-Dist: pyright==1.1.401; extra == 'dev'
 Requires-Dist: pytest-asyncio; extra == 'dev'
 Requires-Dist: pytest-cov; extra == 'dev'
 Requires-Dist: pytest-mock; extra == 'dev'
@@ -90,7 +91,7 @@ We're here to help with eval strategies, custom environments, or improving your
 ## ✨ What You Can Do
-**Evaluate Existing Benchmarks**
+**[Evaluate Existing Benchmarks](https://docs.hud.so/examples/benchmarking-agents)**
 ```python
 from hud import load_taskset, run_job, ClaudeAgent
@@ -98,7 +99,7 @@ taskset = await load_taskset("WebVoyager")  # or GAIA, OSWorld-Ubuntu, Mind2Web
 job = await run_job(ClaudeAgent, taskset, "my-evaluation")
 ```
-**Create Custom Tasks**
+**[Create Custom Tasks](https://docs.hud.so/task-creation)**
 ```python
 from hud.task import Task
@@ -110,7 +111,7 @@ task = Task(
 )
 ```
-**Build Custom Environments**
+**[Build Custom Environments](https://docs.hud.so/environment-creation)**
 ```python
 from hud.types import CustomGym
@@ -123,7 +124,7 @@ custom_gym = CustomGym(
 # Or create complex Docker environments - see environments/ folder for examples
 ```
-**Trace Tool Calls Alongside HUD Environments (or Independently)**
+**[Trace Tool Calls Alongside HUD Environments (or Independently)](https://docs.hud.so/examples/mcp-agent-tracing)**
 ```python
 import hud
@@ -171,6 +172,7 @@ async def main():
         setup=("goto", "google.com"),
         evaluate=("contains_text", "capybara")
     )
+    print(f"Running task with prompt: {task.prompt}")
     # Create environment using the gym module
     env = await gym.make(task)
@@ -182,6 +184,7 @@ async def main():
     obs, _ = await env.reset() # Gets first observation
     for i in range(5):
         actions, done = await agent.predict(obs)
+        print(f"Agent action {i}: {actions}")
         obs, reward, terminated, info = await env.step(actions)
         if done or terminated: break

{hud_python-0.2.6 → hud_python-0.2.7}/README.md RENAMED Viewed

@@ -19,7 +19,7 @@ We're here to help with eval strategies, custom environments, or improving your
 ## ✨ What You Can Do
-**Evaluate Existing Benchmarks**
+**[Evaluate Existing Benchmarks](https://docs.hud.so/examples/benchmarking-agents)**
 ```python
 from hud import load_taskset, run_job, ClaudeAgent
@@ -27,7 +27,7 @@ taskset = await load_taskset("WebVoyager")  # or GAIA, OSWorld-Ubuntu, Mind2Web
 job = await run_job(ClaudeAgent, taskset, "my-evaluation")
 ```
-**Create Custom Tasks**
+**[Create Custom Tasks](https://docs.hud.so/task-creation)**
 ```python
 from hud.task import Task
@@ -39,7 +39,7 @@ task = Task(
 )
 ```
-**Build Custom Environments**
+**[Build Custom Environments](https://docs.hud.so/environment-creation)**
 ```python
 from hud.types import CustomGym
@@ -52,7 +52,7 @@ custom_gym = CustomGym(
 # Or create complex Docker environments - see environments/ folder for examples
 ```
-**Trace Tool Calls Alongside HUD Environments (or Independently)**
+**[Trace Tool Calls Alongside HUD Environments (or Independently)](https://docs.hud.so/examples/mcp-agent-tracing)**
 ```python
 import hud
@@ -100,6 +100,7 @@ async def main():
         setup=("goto", "google.com"),
         evaluate=("contains_text", "capybara")
     )
+    print(f"Running task with prompt: {task.prompt}")
     # Create environment using the gym module
     env = await gym.make(task)
@@ -111,6 +112,7 @@ async def main():
     obs, _ = await env.reset() # Gets first observation
     for i in range(5):
         actions, done = await agent.predict(obs)
+        print(f"Agent action {i}: {actions}")
         obs, reward, terminated, info = await env.step(actions)
         if done or terminated: break

{hud_python-0.2.6 → hud_python-0.2.7}/docs/docs.json RENAMED Viewed

@@ -24,6 +24,7 @@
       "pages": [
         "examples/benchmarking-agents",
         "examples/alignment-evaluation",
+        "examples/web-mocks",
         "examples/custom-os-env",
         "examples/mcp-agent-tracing",
         "examples/web-app-testing"

{hud_python-0.2.6 → hud_python-0.2.7}/docs/environment-creation.mdx RENAMED Viewed

@@ -329,7 +329,7 @@ We strongly encourage community contributions! If you've built a useful custom e
 Check the `environments/` directory in the SDK for inspiration:
 -   `environments/novnc_ubuntu/`: Provides an Ubuntu desktop accessible via VNC, for GUI-based tasks.
 -   `environments/custom_website/`: A template for packaging and testing your own web application.
--   `environments/gameboy/`: Example of a retro gaming environment.
+-   `environments/pokemon_controller/`: Example of a retro gaming environment.
 ## Using Remote Custom Environments
@@ -375,4 +375,4 @@ task_on_remote2 = Task(
 - **[Task Creation](/task-creation)**: How to define tasks that use your custom environments.
 - **[Custom Environments Overview](/environments/custom)**: Higher-level concepts of custom environments.
-- **[Browser Environment](/environments/browser)**: For standard web interaction tasks.
+- **[Browser Environment](/environments/browser)**: For standard web interaction tasks.

hud_python-0.2.7/docs/examples/web-mocks.mdx ADDED Viewed

@@ -0,0 +1,240 @@
+---
+title: 'Web Mocks'
+description: 'Clone websites and host them as stable test environments for AI agents using HUD page archives.'
+icon: 'clone'
+---
+# Page Cloning
+This guide demonstrates how to create and host web archives for testing AI agents with consistent, offline-first environments. By cloning websites into WACZ (Web ARChiveZip) files, you can ensure your agents always test against specific, unchanging versions of web pages.
+**Goal**: Create reproducible web environments for testing browser-based agents without depending on live websites that might change or go offline.
+**Concepts Covered**:
+- Using ArchiveWeb.page to clone websites into WACZ files
+- Hosting archives locally with the HUD page archives repository and `CustomGym`
+- Uploading archives to app.hud.so for immediate cloud hosting
+- Creating tasks that use these stable archived environments
+## Prerequisites
+- HUD SDK installed
+- Docker installed (for local hosting option)
+- ArchiveWeb.page browser extension (for cloning pages)
+- API keys for HUD and your chosen agent
+## Part 1: Cloning the Page
+### Installing ArchiveWeb.page
+1. **Install the Browser Extension**:
+   - Visit [ArchiveWeb.page](https://archiveweb.page)
+   - Install the extension for Chrome/Chromium-based browsers
+   - The extension icon will appear in your browser toolbar
+2. **Create a New Archive**:
+   - Click the ArchiveWeb.page extension icon
+   - Click "Create New Collection"
+   - Give your collection a descriptive name (e.g., "my-test-site")
+### Capturing Web Pages
+1. **Start Archiving**:
+   - Click "Start" in the extension popup to begin an archiving session
+   - Navigate to the website you want to clone
+   - Interact with the site as your agent would (login, navigate through pages, fill forms)
+   - All pages and resources will be captured automatically
+2. **Best Practices for Agent Testing**:
+   - Capture all relevant pages and states your agent will interact with
+   - Include error pages and edge cases
+   - If testing login flows, capture both logged-out and logged-in states
+   - For form submissions, capture the form page and success/error pages
+3. **Stop and Download**:
+   - Click "Stop" in the extension when done capturing
+   - Click "Download" to save your collection
+   - Choose WACZ format (default)
+   - Save with a meaningful filename (e.g., `my-test-site.wacz`)
+### Example: Cloning a Login Flow
+```
+1. Start archiving session
+2. Visit https://example.com/login
+3. Enter test credentials (e.g., testuser/password123)
+4. Submit the form
+5. Capture the dashboard/welcome page
+6. Optionally capture logout flow
+7. Stop and download as my-test-site.wacz
+```
+## Part 2: Hosting the Website
+You have two options for hosting your archived website:
+### Option 1: Local Hosting with CustomGym
+This approach uses the [HUD page archives repository](https://github.com/hud-evals/page-archives) to host archives locally and access them via `CustomGym`.
+#### Step 1: Clone the Page Archives Repository
+```bash
+git clone https://github.com/hud-evals/page-archives.git
+cd page-archives
+```
+#### Step 2: Add Your Archive
+1. **Place your WACZ file**:
+   ```bash
+   cp ~/Downloads/my-test-site.wacz archives/
+   ```
+2. **Update `archives/archive_list.json`**:
+   ```json
+   {
+     "archives": [
+       {
+         "name": "my-test-site",
+         "displayName": "My Test Site Archive",
+         "startPage": "https://example.com/login"  // Optional: default page to open
+       }
+       // ... other archives
+     ]
+   }
+   ```
+   Note: The `name` field must match your WACZ filename without the `.wacz` extension.
+#### Step 3: Create a CustomGym for the Archive Server
+```python
+from hud.types import CustomGym
+from pathlib import Path
+# Create a Dockerfile for the archive server
+archive_server_dockerfile = """
+FROM node:18-slim
+WORKDIR /app
+COPY . /app
+RUN npm install
+EXPOSE 3000
+CMD ["npm", "run", "start"]
+"""
+# Save Dockerfile in the page-archives directory
+with open("page-archives/Dockerfile", "w") as f:
+    f.write(archive_server_dockerfile)
+# Define the CustomGym
+archive_server_gym = CustomGym(
+    location="local",
+    image_or_build_context=Path("./page-archives"),
+    host_config={
+        "port_bindings": {3000: 3000}  # Expose port 3000
+    }
+)
+```
+#### Step 4: Create Tasks Using the Archived Site
+```python
+from hud import Task, run_job
+from hud.agent import ClaudeAgent
+# Task to test login flow on the archived site
+login_task = Task(
+    prompt="Log into the website using username 'testuser' and password 'password123'.",
+    gym="hud-browser",  # Use browser to interact
+    setup=[
+        # Navigate to your archived site running locally
+        ("goto", "http://localhost:3000/my-test-site")
+    ],
+    evaluate=("page_contains", "Welcome, testuser!")
+)
+```
+#### Advanced: Query Parameters
+The archive viewer supports useful query parameters:
+```python
+# Open a specific page within the archive
+specific_page_task = Task(
+    prompt="Navigate to the user profile page",
+    gym="hud-browser",
+    setup=[
+        ("goto", "http://localhost:3000/my-test-site?page=https%3A%2F%2Fexample.com%2Fprofile")
+    ]
+)
+# Debug mode - shows full ReplayWeb.page UI
+debug_task = Task(
+    prompt="Explore the archive interface",
+    gym="hud-browser",
+    setup=[
+        ("goto", "http://localhost:3000/my-test-site?debug=true")
+    ]
+)
+```
+### Option 2: Cloud Hosting on app.hud.so
+For immediate hosting without local setup, use the HUD platform's built-in page cloning feature.
+#### Step 1: Access Page Clone Feature
+1. Go to [app.hud.so](https://app.hud.so)
+2. Click "Create" in the navigation
+3. Select "Page Clone"
+#### Step 2: Upload Your Archive
+1. Click "Upload WACZ file"
+2. Select your `.wacz` file created in Part 1
+3. Provide a name for your cloned environment
+4. Click "Create"
+#### Step 3: Use the Hosted Archive
+Once uploaded, you'll receive a URL for your hosted archive (e.g., `https://archives.hud.so/your-archive-id`).
+```python
+from hud import Task, run_job
+from hud.agent import ClaudeAgent
+# Task using the cloud-hosted archive
+cloud_login_task = Task(
+    prompt="Log into the website using username 'testuser' and password 'password123'.",
+    gym="hud-browser",
+    setup=[
+        # Navigate to your cloud-hosted archive
+        ("goto", "https://archives.hud.so/your-archive-id")
+    ],
+    evaluate=("page_contains", "Welcome, testuser!")
+)
+# Run evaluation
+job = await run_job(
+    agent_cls=ClaudeAgent,
+    task_or_taskset=cloud_login_task,
+    job_name="Cloud Archive Test"
+)
+```
+## Tips for Effective Page Cloning
+1. **Capture Complete Flows**: Don't just capture individual pages - capture entire user journeys
+2. **Include Resources**: Ensure CSS, JavaScript, and images are properly captured
+3. **Test Your Archives**: Always verify your archives work correctly before using them in evaluations
+4. **Document States**: Keep notes on what states and pages are included in each archive
+5. **Update Regularly**: Re-clone sites when significant changes occur
+## Key Takeaways
+- ArchiveWeb.page makes it easy to create WACZ archives of any website
+- Local hosting with CustomGym gives you full control and fast performance
+- Cloud hosting on app.hud.so provides instant deployment without infrastructure
+- Page cloning ensures consistent, reproducible testing environments for AI agents
+- Archived sites eliminate external dependencies and enable offline testing

{hud_python-0.2.6 → hud_python-0.2.7}/docs/task-creation.mdx RENAMED Viewed

@@ -30,6 +30,10 @@ task = Task(
     setup=("goto", "https://news.example.com"), # Function to run at env.reset()
     evaluate=("page_contains", "artificial intelligence") # Function to run at env.evaluate()
 )
+# Create environment
+env = gym.make(task)
+# ...
 ```
 ## Setup Functions (for `hud-browser`)

hud-python 0.2.6__tar.gz → 0.2.7__tar.gz

hud-python 0.2.6tar.gz → 0.2.7tar.gz