PyPI - hud-python - Versions diffs - 0.4.57__tar.gz → 0.4.59__tar.gz - Mend

hud-python 0.4.57__tar.gz → 0.4.59__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of hud-python might be problematic.

Files changed (309)

{hud_python-0.4.57 → hud_python-0.4.59}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.4.57
+Version: 0.4.59
 Summary: SDK for the HUD platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-python
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-python/issues
@@ -38,6 +38,7 @@ Requires-Python: <3.13,>=3.11
 Requires-Dist: anthropic
 Requires-Dist: blessed>=1.20.0
 Requires-Dist: datasets>=2.14.0
+Requires-Dist: google-genai
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: hud-fastmcp-python-sdk>=0.1.2
 Requires-Dist: hud-mcp-python-sdk>=3.13.2
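
The hunk above adds `google-genai` to the declared dependencies. One way to confirm such a change from a `PKG-INFO` file is to read its `Requires-Dist` fields with the standard library, since core metadata uses an email-header style format. The following is a minimal sketch, not tooling from the package: the embedded text is an excerpt of the fields visible in this diff (the real file has more), and the helper name `requires_dist` is hypothetical.

```python
from email.parser import Parser

# Excerpt of the 0.4.59 PKG-INFO fields visible in the hunk above.
# The real file contains additional fields and requirements.
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: hud-python
Version: 0.4.59
Requires-Dist: anthropic
Requires-Dist: blessed>=1.20.0
Requires-Dist: datasets>=2.14.0
Requires-Dist: google-genai
Requires-Dist: httpx<1,>=0.23.0
"""


def requires_dist(pkg_info_text: str) -> list[str]:
    """Return every Requires-Dist value found in PKG-INFO text (hypothetical helper)."""
    message = Parser().parsestr(pkg_info_text)
    # get_all returns None when the field is absent, so fall back to an empty list.
    return message.get_all("Requires-Dist") or []


requirements = requires_dist(PKG_INFO_EXCERPT)
print("google-genai" in requirements)
```

Running the same helper on the 0.4.57 metadata and comparing the two lists would surface the added requirement without reading the diff by hand.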

{hud_python-0.4.57 → hud_python-0.4.59}/environments/README.md RENAMED

@@ -495,7 +495,7 @@ from hud.agents import ClaudeAgent
 from hud.clients import MCPClient
 async def main():
-    # `trace` captures *everything* that happens and sends it to hud.so
+    # `trace` captures *everything* that happens and sends it to hud.ai
     with hud.trace("local_test"):
         task = Task(
             prompt="Complete the task",
@@ -524,7 +524,7 @@ async def main():
 asyncio.run(main())
 ```
-The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to hud.so – perfect for debugging.
+The `trace` context manager sends a full timeline of agent actions, tool calls, and rewards to hud.ai – perfect for debugging.
 See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for larger end-to-end demos.
@@ -532,7 +532,7 @@ See `examples/01_hello_2048.py` and `examples/task_with_setup_eval.py` for large
 ## Phase 4 – Remote Deployment & HUD Runner
-**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the hud.so can visualise the whole lifecycle.
+**Goal →** the exact same image runs in parallel on hundreds of instances, and exposes more telemetry so the hud.ai can visualise the whole lifecycle.
 ### 1. Publish your image
@@ -595,11 +595,11 @@ async def initialize_environment(session=None, progress_token=None):
     await send(100, "ready")
 ```
-Those messages are displayed live on hud.so alongside resource graphs – perfect feedback while you wait.
+Those messages are displayed live on hud.ai alongside resource graphs – perfect feedback while you wait.
 ### 4. Live telemetry (`telemetry://live`) (Optional)
-Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on hud.so.
+Expose a resource named `telemetry://live` exactly like in `environments/browser/src/hud_controller/server.py` to return live url to be displayed on hud.ai.
 Once all of the above works you can unleash *hundreds* of concurrent agents on your new environment.

hud_python-0.4.59/environments/browser/browser-base/README.md ADDED

@@ -0,0 +1,58 @@
+# Browser Base Image
+Base Docker image for browser environments with Playwright, Chromium, and VNC support.
+## Build
+```bash
+docker build -t browser-base:latest .
+```
+## Test with VNC Access
+### 1. Start the container
+```bash
+docker run -it --rm \
+  -p 6080:6080 \
+  -p 5900:5900 \
+  -e DISPLAY=:1 \
+  browser-base:latest \
+  bash
+```
+### 2. Inside the container, start display servers
+```bash
+Xvfb :1 -screen 0 1920x1080x24 > /dev/null 2>&1 &
+x11vnc -display :1 -nopw -listen 0.0.0.0 -forever > /dev/null 2>&1 &
+/usr/share/novnc/utils/novnc_proxy --vnc localhost:5900 --listen 6080 > /dev/null 2>&1 &
+```
+### 3. Test Playwright
+```bash
+python3 -c "
+from playwright.sync_api import sync_playwright
+with sync_playwright() as p:
+    browser = p.chromium.launch(headless=False)
+    page = browser.new_page()
+    page.goto('https://example.com')
+    print('Title:', page.title())
+    input('Press Enter to close...')
+    browser.close()
+"
+```
+### 4. View in browser
+Open `http://localhost:6080/vnc.html` to see Chromium running.
+## What's Included
+- Ubuntu 24.04
+- Desktop environment (Xvfb, x11vnc, noVNC, xfce4)
+- Node.js & npm
+- Python 3 with uv package manager
+- Playwright with Chromium
+- Development tools (git, curl, wget, etc.)

{hud_python-0.4.57 → hud_python-0.4.59}/environments/browser/server/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ version = "0.1.0"
 description = "HUD Browser MCP Server"
 requires-python = ">=3.11,<3.14"
 dependencies = [
-    "hud-python>=0.4.54",
+    "hud-python>=0.4.59",
     "httpx",
     "playwright",
     "pyautogui",

hud_python-0.4.59/environments/rubrics/README.md ADDED

@@ -0,0 +1,239 @@
+# SEC EDGAR Rubrics Environment
+SEC filing research environment powered by the SEC EDGAR database for accessing company filings and financial data, with rubric-based evaluation for structured grading provided by [The LLM Data Company](https://llmdata.com).
+See [docs](https://docs.hud.so/build-environments) for the complete environment design workflow.
+## Architecture
+**`environment/`** - Manages SEC EDGAR and web search integration
+- Uses the edgartools Python library to access SEC filing data
+- Integrates with Exa API for supplementary web search capabilities
+- Exposes HTTP endpoints for research workflows with exponential backoff for rate limiting
+**`server/`** - Wraps data in MCP tools
+- Provides research tools for agents to access SEC filings, financial data, and web search
+- Agents and tasks interact only with these tools
+**Why separate?** Edit tools for the agent or tasks without restarting the environment backend.
+## Tools
+### SEC EDGAR Tools
+- **`setup()`** - Initialize the environment and reset state.
+- **`search_company(query: str)`** - Search for a company by ticker symbol or name. Returns company information including ticker, name, and CIK.
+- **`get_filings(ticker?: str, form_type?: str, limit?: int, cutoff_date?: str)`** - Get SEC filings. When `ticker` is provided, returns company-specific filings. Otherwise, returns global recent filings. Can filter by form type (e.g., "10-K", "10-Q", "8-K"), limit results, and filter by date (YYYY-MM-DD).
+- **`get_filing_content(filing_url: str)`** - Fetch the full text content of a specific SEC filing from its URL.
+- **`get_financial_data(ticker: str, accession_number: str)`** - Extract financial statements and key metrics from a 10-K or 10-Q filing. Returns income statement, balance sheet, cash flow, and other financial data.
+- **`get_segment_data(ticker: str, accession_number: str)`** - Extract segment-level financial data from a 10-K or 10-Q filing for companies with multiple business segments.
+- **`get_filing_sections(ticker: str, accession_number: str)`** - Extract specific sections from a 10-K or 10-Q filing (e.g., Business, Risk Factors, MD&A).
+### Web Search Tools
+- **`web_search(query: str)`** - Search the web using Exa API. Returns titles and URLs of relevant results.
+- **`web_fetch(url: str)`** - Fetch and extract content from a web URL. Returns summary, highlights, and full content.
+### Evaluation Tools
+- **`answer(final_answer: str)`** - Submit the final research answer.
+- **`evaluate(rubric: list[dict])`** - Evaluate submitted answer using a structured rubric with weighted requirements.
+### Rubric-Based Evaluation
+The `evaluate` tool uses The LLM Data Company's [rubric](https://github.com/The-LLM-Data-Company/rubric/) package to grade answers against structured criteria with autograders.
+## Setup
+### Environment Variables
+The environment requires several API keys and configuration:
+**Required:**
+- `EDGAR_IDENTITY` - Your identity for SEC EDGAR access (required by SEC regulations)
+  - Format: `"Your Name your.email@example.com"`
+**Optional:**
+- `EXA_API_KEY` - For web search and content fetching capabilities (if using web_search/web_fetch tools)
+- `HUD_API_KEY` - For HUD telemetry and tracing
+- `ANTHROPIC_API_KEY` - For Claude agent (if using Claude)
+- `OPENAI_API_KEY` - For rubric evaluation (if using OpenAI-based autograders)
+Add these to your .env before running `hud eval`:
+```bash
+export EDGAR_IDENTITY="Your Name your.email@example.com"
+export EXA_API_KEY="your-exa-key" # optional, for web search
+export ANTHROPIC_API_KEY="your-anthropic-key" # only if using an Anthropic model
+export OPENAI_API_KEY="your-openai-key"
+# Optional
+export HUD_API_KEY="your-hud-key"
+```
+## Development
+```bash
+# Terminal 1 - Environment backend
+cd environment
+export EDGAR_IDENTITY="Your Name your.email@example.com"
+export EXA_API_KEY="your-exa-key"  # optional, for web search
+uv run uvicorn server:app --reload
+# Terminal 2 - MCP server
+cd server
+uv run hud dev
+```
+The environment includes exponential backoff for rate limiting, so API calls will automatically retry on 429 errors.
+In general, we recommend starting work on the environment backend first, then developing the MCP server to expose the right things to the agent.
+For complex environments that require many dependencies, we recommend running `hud dev` in the environment root:
+```bash
+cd ..
+hud dev
+```
+## Tasks & Evaluation
+```bash
+# Build first in the global folder with the Dockerfile (creates rubrics:latest)
+hud build
+```
+Your `tasks.json` uses `docker run` to launch the environment:
+```json
+{
+  "prompt": "Analyze Tesla's FY2024 10-K filing...",
+  "mcp_config": {
+    "local": {
+      "command": "docker",
+      "args": ["run", "--rm", "-i", "rubrics:latest"]
+    }
+  },
+  "evaluate_tool": {
+    "name": "evaluate",
+    "arguments": {
+      "rubric": [...]
+    }
+  }
+}
+```
+**Note:** Export environment variables before running. The Docker container will inherit them from your shell.
+**Commands:**
+```bash
+# Build first
+hud build
+# Test task locally
+export EDGAR_IDENTITY="Your Name your.email@example.com"
+export EXA_API_KEY="your-exa-key"  # optional, for web search
+export ANTHROPIC_API_KEY="your-anthropic-key"
+export OPENAI_API_KEY="your-openai-key"
+hud eval tasks.json --max-steps 25
+# Push environment for remote running
+hud push
+# Production RL training
+hud rl tasks.json  # Auto-converts docker→remote, builds & pushes if needed
+```
+## Publishing Your Environment
+Once your environment is ready, you can share it with the community:
+### 1. Push to Registry
+```bash
+# Build and push your environment (requires docker hub login and hud api key)
+hud build
+hud push
+```
+### 2. Create a Dataset
+Create a dataset on HuggingFace with your tasks:
+**Option A: Upload manually**
+1. Upload your `tasks.json` to HuggingFace
+2. Make sure it's **public** to appear on leaderboards
+**Option B: Use the SDK**
+```python
+from hud.datasets import save_tasks
+import json
+# Load your tasks
+with open("tasks.json") as f:
+    tasks = json.load(f)
+# Push to HuggingFace
+save_tasks(tasks, repo_id="your-org/your-dataset")
+```
+### 3. Run and Track Performance
+```bash
+# Run Claude on your benchmark
+hud eval "your-org/your-dataset" --agent claude
+# View results at:
+# hud.so/leaderboards/your-org/your-dataset
+```
+**Note**: Only public HuggingFace datasets appear as leaderboards!
+📚 Learn more: [Creating Benchmarks](https://docs.hud.so/evaluate-agents/create-benchmarks) | [Leaderboards](https://docs.hud.so/evaluate-agents/leaderboards)
+## Example Research Workflow
+```python
+# Initialize environment
+setup()
+# Agent searches for a company
+company_info = search_company("TSLA")
+# Returns: [{"ticker": "TSLA", "name": "Tesla Inc", "cik": "1318605"}]
+# Agent gets recent filings
+filings = get_filings(ticker="TSLA", form_type="10-K", limit=1)
+# Returns: [{"filing_date": "2024-01-01", "form_type": "10-K", "accession_number": "...", "filing_url": "..."}]
+# Agent extracts financial data
+financial_data = get_financial_data(ticker="TSLA", accession_number=filings[0]["accession_number"])
+# Returns: {"has_financials": True, "financial_data": {...income statement, balance sheet, etc...}}
+# Agent gets specific sections from the filing
+sections = get_filing_sections(ticker="TSLA", accession_number=filings[0]["accession_number"])
+# Returns: {"sections": {"business": "...", "risk_factors": "...", "mda": "..."}}
+# Agent uses web search for additional context
+search_results = web_search("Tesla FY2024 revenue analysis")
+# Returns: [{"title": "...", "url": "..."}]
+# Agent fetches web content
+web_content = web_fetch(search_results[0]["url"])
+# Returns: "=== SUMMARY ===\n...\n=== KEY HIGHLIGHTS ===\n...\n=== FULL CONTENT ===\n..."
+# Agent submits final answer
+answer("Based on Tesla's FY2024 10-K, revenue was $96.8B...")
+# Evaluate answer using rubric
+result = evaluate(rubric=[
+    {"requirement": "Correctly states FY2024 revenue", "weight": 15},
+    {"requirement": "Provides segment breakdown", "weight": 5},
+])
+# Returns: {"reward": float, "info": {"report": [...]}, "done": True}
+```
+## Dependencies
+- **edgartools**: Python library for accessing SEC EDGAR data
+- **fastapi**: Web framework for the environment server
+- **httpx**: HTTP client for API calls
+- **rubric**: LLM Data Company's rubric evaluation package
+- **Exa API**: Web search and content extraction (optional, for web_search/web_fetch tools)
+## Acknowledgments
+* [EdgarTools](https://github.com/dgunning/edgartools) - Python library to access SEC EDGAR
+* [SEC EDGAR MCP](https://github.com/stefanoamorelli/sec-edgar-mcp) - Rich OSS SEC MCP server
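
The added README describes `evaluate` as grading an answer against a rubric with weighted requirements, but the diff does not show how the reward is computed. The following is a minimal sketch of one plausible scheme, the weight-normalized share of requirements that were met. It is an assumption for illustration, not the `rubric` package's actual algorithm: the `Criterion` class, the `met` field, and `weighted_reward` are hypothetical names, and weights are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    requirement: str
    weight: float
    met: bool  # hypothetical verdict; in the real tool an autograder would decide this


def weighted_reward(criteria: list[Criterion]) -> float:
    """Return earned weight divided by total weight (assumed scheme, positive weights)."""
    total = sum(c.weight for c in criteria)
    if total <= 0:
        # An empty or zero-weight rubric gives no signal, so report zero reward.
        return 0.0
    earned = sum(c.weight for c in criteria if c.met)
    return earned / total


# Requirements and weights taken from the README's example rubric;
# the met/unmet verdicts are made up for illustration.
criteria = [
    Criterion("Correctly states FY2024 revenue", 15, met=True),
    Criterion("Provides segment breakdown", 5, met=False),
]
reward = weighted_reward(criteria)
print(reward)  # 0.75
```

Under this assumed scheme, missing the low-weight segment breakdown costs a quarter of the reward, which shows how weights let a task author prioritize the requirements that matter most.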

{hud_python-0.4.57 → hud_python-0.4.59}/environments/rubrics/environment/pyproject.toml RENAMED

@@ -1,13 +1,14 @@
 [project]
 name = "rubrics-environment"
 version = "0.1.0"
-description = "Backend service for Rubrics environment"
+description = "Backend service for Rubrics environment with SEC EDGAR integration"
 requires-python = ">=3.11"
 dependencies = [
     "fastapi>=0.104.1",
     "uvicorn[standard]>=0.24.0",
     "httpx>=0.24.0",
-    "rubric>=1.1.7",
+    "rubric==1.1.8",
+    "edgartools>=4.21.3",
 ]
 [build-system]

{hud_python-0.4.57 → hud_python-0.4.59}/environments/rubrics/pyproject.toml RENAMED

@@ -16,4 +16,4 @@ image = "rubrics:dev"
 allow-direct-references = true
 [tool.hatch.build.targets.wheel]
-packages = [ "controller", "environment",]
+packages = [ "server", "environment",]

{hud_python-0.4.57 → hud_python-0.4.59}/hud/agents/__init__.py RENAMED

@@ -2,11 +2,13 @@ from __future__ import annotations
 from .base import MCPAgent
 from .claude import ClaudeAgent
+from .gemini import GeminiAgent
 from .openai import OperatorAgent
 from .openai_chat_generic import GenericOpenAIChatAgent
 __all__ = [
     "ClaudeAgent",
+    "GeminiAgent",
     "GenericOpenAIChatAgent",
     "MCPAgent",
     "OperatorAgent",
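
The hunk above exports `GeminiAgent` from `hud.agents` starting with 0.4.59. Code that must also run against older releases could check for the export before relying on it. The following is a minimal sketch under that assumption. The helper `exports_name` is hypothetical and uses only the standard library, and nothing here reflects how `GeminiAgent` is constructed, since the diff does not show its signature.

```python
import importlib


def exports_name(module_name: str, name: str) -> bool:
    """Report whether a module can be imported and exposes `name` (hypothetical helper)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package, or one of its dependencies, is not installed.
        return False
    exported = getattr(module, "__all__", None)
    if exported is not None:
        # Prefer the declared public API when the module defines __all__.
        return name in exported
    return hasattr(module, name)


# Prints True only when an installed hud-python exports the new agent,
# and False when the package is missing or predates the export.
print(exports_name("hud.agents", "GeminiAgent"))
```

Checking `__all__` first mirrors the diff, which adds the name both as an import and as an entry in the module's `__all__` list.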
