PyPI - hud-python - Versions diffs - 0.2.1__tar.gz → 0.2.3__tar.gz - Mend

hud-python 0.2.1__tar.gz → 0.2.3__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128)

{hud_python-0.2.1 → hud_python-0.2.3}/.github/workflows/ci.yml RENAMED

@@ -4,7 +4,7 @@ on:
   push:
     branches: [ "main" ]
   pull_request:
-    branches: [ "main" ]
+    branches: [ "*" ]
 jobs:
   test:
@@ -24,7 +24,7 @@ jobs:
         run: uv python install ${{ matrix.python-version }}
       - name: Run tests
-        run: uv run --python ${{ matrix.python-version }} --with=".[dev]" pytest
+        run: uv run --python ${{ matrix.python-version }} --with=".[dev]" pytest --rootdir=hud --cov --cov-report=''
   lint-ruff:
     runs-on: ubuntu-latest
@@ -35,7 +35,7 @@ jobs:
       - name: Run ruff
         run: |
-          uv run --with=".[dev]" ruff format .
+          uv run --with=".[dev]" ruff format . --check
           uv run --with=".[dev]" ruff check .
   lint-pyright:

{hud_python-0.2.1 → hud_python-0.2.3}/.gitignore RENAMED

@@ -25,3 +25,5 @@ uv.lock
 /*.ipynb
 test.json
 TODO.md
+.coverage

{hud_python-0.2.1 → hud_python-0.2.3}/PKG-INFO RENAMED

@@ -1,11 +1,11 @@
 Metadata-Version: 2.4
 Name: hud-python
-Version: 0.2.1
+Version: 0.2.3
 Summary: SDK for the HUD evaluation platform.
 Project-URL: Homepage, https://github.com/hud-evals/hud-sdk
 Project-URL: Bug Tracker, https://github.com/hud-evals/hud-sdk/issues
 Project-URL: Documentation, https://hud.so
-Author-email: Human Union Data SDK <founders@hud.so>
+Author-email: HUD SDK <founders@hud.so>
 License: MIT License
         Copyright (c) 2025 Human Union Data, Inc
@@ -37,8 +37,14 @@ Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Requires-Python: <3.14,>=3.10
 Requires-Dist: aiodocker>=0.24.0
+Requires-Dist: anthropic
 Requires-Dist: httpx<1,>=0.23.0
 Requires-Dist: inspect-ai>=0.3.80
+Requires-Dist: ipykernel
+Requires-Dist: langchain
+Requires-Dist: langchain-openai
+Requires-Dist: numpy
+Requires-Dist: openai
 Requires-Dist: pillow>=11.1.0
 Requires-Dist: pydantic-settings<3,>=2
 Requires-Dist: pydantic<3,>=2
@@ -53,8 +59,11 @@ Requires-Dist: jupyter-client; extra == 'dev'
 Requires-Dist: jupyter-core; extra == 'dev'
 Requires-Dist: openai; extra == 'dev'
 Requires-Dist: pyright==1.1.364; extra == 'dev'
+Requires-Dist: pytest-asyncio; extra == 'dev'
+Requires-Dist: pytest-cov; extra == 'dev'
+Requires-Dist: pytest-mock; extra == 'dev'
 Requires-Dist: pytest<9,>=8.1.1; extra == 'dev'
-Requires-Dist: ruff==0.9.8; extra == 'dev'
+Requires-Dist: ruff==0.11.8; extra == 'dev'
 Description-Content-Type: text/markdown
 # HUD
@@ -88,17 +97,17 @@ pip install hud-python
 ### Simple Browser Example with Claude Computer Use
-> This example uses the `@job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on the your [HUD Jobs page](https://app.hud.so/jobs).
+> This example uses the `@register_job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on the your [HUD Jobs page](https://app.hud.so/jobs).
 Make sure your have defined your `ANTRHOPIC_API_KEY` in environment variables to run Claude.
 ```python
 import asyncio
-from hud import gym, job
+from hud import gym, register_job
 from hud.task import Task
 from hud.agent import ClaudeAgent
-@job("test-run")
+@register_job("test-run")
 async def main():
     task = Task(
         prompt="Insert the text 'capybara' into the search bar",
@@ -117,10 +126,9 @@ async def main():
     obs, _ = await env.reset() # Gets first observation
     for i in range(5):
         actions, done = await agent.predict(obs)
-        if done:
-            break
         obs, reward, terminated, info = await env.step(actions)
+        if done or terminated: break
     # Evaluate and close
     result = await env.evaluate()
@@ -132,22 +140,37 @@ if __name__ == "__main__":
 ```
+Alternatively, run a full evaluation set via the ```run_job``` command:
+```python
+from hud import load_taskset, run_job, ClaudeAgent
+# load
+taskset = load_taskset("GAIA")
+# evaluate
+job = await run_job(ClaudeAgent, taskset, "test-gaia-job")
+# get results OR view them in app.hud.so
+print(await job.get_analytics())
+```
 ## Documentation Sections
 Explore the core concepts and features of the SDK:
-*   **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
-*   **[Environments](/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
-*   **[Agents](/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
-*   **[Adapters](/concepts/adapter)**: See how actions and observations are translated between agents and environments.
-*   **[Jobs](/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
-*   **[Trajectories](/concepts/trajectory)**: Understand the recorded data from each agent run.
+*   **[Tasks and TaskSets](https://documentation.hud.so/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
+*   **[Environments](https://documentation.hud.so/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
+*   **[Agents](https://documentation.hud.so/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
+*   **[Adapters](https://documentation.hud.so/concepts/adapter)**: See how actions and observations are translated between agents and environments.
+*   **[Jobs](https://documentation.hud.so/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
+*   **[Trajectories](https://documentation.hud.so/concepts/trajectory)**: Understand the recorded data from each agent run.
 *   **Advanced Topics**:
-    *   **[CLA Action Details](/advanced/cla-details)**: Explore the standardized action format.
-    *   **[Custom Environments](/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
-    *   **[Advanced Environment Control](/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
+    *   **[CLA Action Details](https://documentation.hud.so/advanced/cla-details)**: Explore the standardized action format.
+    *   **[Custom Environments](https://documentation.hud.so/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
+    *   **[Advanced Environment Control](https://documentation.hud.so/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
-*   **[Full API Reference](/api-reference/gym)**: Detailed specifications for all modules and classes.
+*   **[Full API Reference](https://documentation.hud.so/api-reference/gym)**: Detailed specifications for all modules and classes.
 ## [Examples](examples/)
@@ -160,7 +183,7 @@ We recommend you first take a look at the example notebooks showing how to use t
 ## Documentation
-For comprehensive guides, examples, and API reference, visit [our docs](https://docs.hud.so/introduction)
+For comprehensive guides, examples, and API reference, visit [our docs](https://documentation.hud.so/introduction)
 ## License
@@ -172,7 +195,7 @@ If you use this SDK in your research, please cite it as follows:
 ```bibtex
 @software{hud2025agentevalplatform,
-  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Max Muoto and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
+  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
   title = {{HUD: An Evaluation Platform for Agents}},
   date = {2025-04},
   url = {https://github.com/hud-evals/hud-sdk},

{hud_python-0.2.1 → hud_python-0.2.3}/README.md RENAMED

@@ -29,17 +29,17 @@ pip install hud-python
 ### Simple Browser Example with Claude Computer Use
-> This example uses the `@job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on the your [HUD Jobs page](https://app.hud.so/jobs).
+> This example uses the `@register_job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on the your [HUD Jobs page](https://app.hud.so/jobs).
 Make sure your have defined your `ANTRHOPIC_API_KEY` in environment variables to run Claude.
 ```python
 import asyncio
-from hud import gym, job
+from hud import gym, register_job
 from hud.task import Task
 from hud.agent import ClaudeAgent
-@job("test-run")
+@register_job("test-run")
 async def main():
     task = Task(
         prompt="Insert the text 'capybara' into the search bar",
@@ -58,10 +58,9 @@ async def main():
     obs, _ = await env.reset() # Gets first observation
     for i in range(5):
         actions, done = await agent.predict(obs)
-        if done:
-            break
         obs, reward, terminated, info = await env.step(actions)
+        if done or terminated: break
     # Evaluate and close
     result = await env.evaluate()
@@ -73,22 +72,37 @@ if __name__ == "__main__":
 ```
+Alternatively, run a full evaluation set via the ```run_job``` command:
+```python
+from hud import load_taskset, run_job, ClaudeAgent
+# load
+taskset = load_taskset("GAIA")
+# evaluate
+job = await run_job(ClaudeAgent, taskset, "test-gaia-job")
+# get results OR view them in app.hud.so
+print(await job.get_analytics())
+```
 ## Documentation Sections
 Explore the core concepts and features of the SDK:
-*   **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
-*   **[Environments](/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
-*   **[Agents](/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
-*   **[Adapters](/concepts/adapter)**: See how actions and observations are translated between agents and environments.
-*   **[Jobs](/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
-*   **[Trajectories](/concepts/trajectory)**: Understand the recorded data from each agent run.
+*   **[Tasks and TaskSets](https://documentation.hud.so/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
+*   **[Environments](https://documentation.hud.so/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
+*   **[Agents](https://documentation.hud.so/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
+*   **[Adapters](https://documentation.hud.so/concepts/adapter)**: See how actions and observations are translated between agents and environments.
+*   **[Jobs](https://documentation.hud.so/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
+*   **[Trajectories](https://documentation.hud.so/concepts/trajectory)**: Understand the recorded data from each agent run.
 *   **Advanced Topics**:
-    *   **[CLA Action Details](/advanced/cla-details)**: Explore the standardized action format.
-    *   **[Custom Environments](/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
-    *   **[Advanced Environment Control](/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
+    *   **[CLA Action Details](https://documentation.hud.so/advanced/cla-details)**: Explore the standardized action format.
+    *   **[Custom Environments](https://documentation.hud.so/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
+    *   **[Advanced Environment Control](https://documentation.hud.so/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
-*   **[Full API Reference](/api-reference/gym)**: Detailed specifications for all modules and classes.
+*   **[Full API Reference](https://documentation.hud.so/api-reference/gym)**: Detailed specifications for all modules and classes.
 ## [Examples](examples/)
@@ -101,7 +115,7 @@ We recommend you first take a look at the example notebooks showing how to use t
 ## Documentation
-For comprehensive guides, examples, and API reference, visit [our docs](https://docs.hud.so/introduction)
+For comprehensive guides, examples, and API reference, visit [our docs](https://documentation.hud.so/introduction)
 ## License
@@ -113,7 +127,7 @@ If you use this SDK in your research, please cite it as follows:
 ```bibtex
 @software{hud2025agentevalplatform,
-  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Max Muoto and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
+  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
   title = {{HUD: An Evaluation Platform for Agents}},
   date = {2025-04},
   url = {https://github.com/hud-evals/hud-sdk},
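
Reviewer note on the loop change in the README example above: the `done` check moved from before `env.step(actions)` to after it, and it now also honors `terminated`. The sketch below illustrates the practical difference with stand-in classes. `StubAgent`, `StubEnv`, and `run_loop` are hypothetical names invented for this note and are not part of hud-python; only the ordering of the two checks is taken from the diff.

```python
import asyncio


class StubAgent:
    """Hypothetical agent: reports done=True on its second prediction."""

    def __init__(self) -> None:
        self.calls = 0

    async def predict(self, obs):
        self.calls += 1
        return [f"action-{self.calls}"], self.calls >= 2


class StubEnv:
    """Hypothetical environment that records every action it executes."""

    def __init__(self) -> None:
        self.executed: list[str] = []

    async def reset(self):
        return "obs-0", {}

    async def step(self, actions):
        self.executed.extend(actions)
        # (observation, reward, terminated, info), mirroring the README tuple
        return f"obs-{len(self.executed)}", 0.0, False, {}


async def run_loop(break_before_step: bool) -> list[str]:
    agent, env = StubAgent(), StubEnv()
    obs, _ = await env.reset()
    for _ in range(5):
        actions, done = await agent.predict(obs)
        if break_before_step and done:
            break  # 0.2.1 ordering: the final actions are never sent
        obs, reward, terminated, info = await env.step(actions)
        if done or terminated:
            break  # 0.2.3 ordering: the final actions are executed first
    return env.executed


old_style = asyncio.run(run_loop(break_before_step=True))
new_style = asyncio.run(run_loop(break_before_step=False))
print(old_style)
print(new_style)
```

Under these assumptions the 0.2.1 ordering drops the actions returned alongside `done=True`, while the 0.2.3 ordering executes them before leaving the loop. That matters if the last prediction carries a final answer or a last click.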

{hud_python-0.2.1 → hud_python-0.2.3}/docs/advanced/cla-details.mdx RENAMED

@@ -69,6 +69,11 @@ Here are some key CLA types grouped by category:
 *   **`ScreenshotFetch`**: Requests a screenshot (used internally, typically not sent by agents directly).
 *   **`PositionFetch`**: Requests the current cursor position (used internally).
+### Response Actions
+*   **`ResponseAction`**: Used to submit a final text answer.
+    *   `text: str`: The final textual response from the agent.
 ### Custom Actions
 *   **`CustomAction`**: Allows defining arbitrary actions specific to a custom environment controller.

{hud_python-0.2.1 → hud_python-0.2.3}/docs/advanced/environment-control.mdx RENAMED

@@ -12,11 +12,11 @@ While the standard `step`, `evaluate`, and `close` methods cover most interactio
 The `env._invoke_all()` method (and its underlying `client.invoke()`) is the core mechanism for calling specific functions *within* the environment's controller script.
 ```python
-async def _invoke_all(self, configs: HudStyleConfigs) -> list[Any]: ...
+async def _invoke_all(self, configs: FunctionConfigs) -> list[Any]: ...
 ```
 *   **Purpose:** Execute custom functions defined in your environment controller (the Python code running inside the Docker container or remote instance). This is how `setup` and `evaluate` configurations in a `Task` are ultimately executed.
-*   **Usage:** You provide a configuration (string, tuple, dict, or list) matching the `HudStyleConfigs` format. The SDK sends this to the environment controller, which runs the specified function(s) with the given arguments.
+*   **Usage:** You provide a configuration (string, tuple, dict, or list) matching the `FunctionConfigs` format. The SDK sends this to the environment controller, which runs the specified function(s) with the given arguments.
 *   **When to Use:**
     *   Triggering custom evaluation logic not suitable for the standard `evaluate` attribute.
     *   Running specific diagnostic or state-setting functions within your custom environment controller during development or debugging.
@@ -71,7 +71,7 @@ print("Exit Code:", result['exit_code'])
 ## `_setup`
 ```python
-async def _setup(self, config: HudStyleConfigs | None = None) -> None: ...
+async def _setup(self, config: FunctionConfigs | None = None) -> None: ...
 ```
 *   **Purpose:** Executes the setup configuration for the environment.

{hud_python-0.2.1 → hud_python-0.2.3}/docs/api-reference/env.mdx RENAMED

@@ -25,11 +25,11 @@ class Environment(pydantic.BaseModel):
     ) -> tuple[Observation, float, bool, dict[str, Any]]: ...
     async def evaluate(
-        self, config: HudStyleConfigs | None = None
+        self, config: FunctionConfigs | None = None
     ) -> Any: ...
     async def reset(
-        self, configs: HudStyleConfigs | None = None
+        self, configs: FunctionConfigs | None = None
     ) -> tuple[Observation, dict[str, Any]]: ...
     async def get_urls(self) -> dict[str, Any]: ...
@@ -37,8 +37,8 @@ class Environment(pydantic.BaseModel):
     async def close(self) -> None: ...
     # Internal/Advanced Methods
-    # async def _setup(self, config: HudStyleConfigs | None = None) -> None: ...
-    # async def _invoke_all(self, configs: HudStyleConfigs) -> list[Any]: ...
+    # async def _setup(self, config: FunctionConfigs | None = None) -> None: ...
+    # async def _invoke_all(self, configs: FunctionConfigs) -> list[Any]: ...
 ```
 Represents a running instance (browser, OS) where an [Agent](/concepts/agent) interacts. Environments are typically created using `hud.gym.make()` rather than direct construction.
@@ -58,11 +58,11 @@ Represents a running instance (browser, OS) where an [Agent](/concepts/agent) in
     *   **Parameters:**
         *   `actions`: List of [CLA](/concepts/adapter) actions, or `None` to get initial observation.
     *   **Returns:** `(Observation, reward, terminated, info)` tuple. `reward` is typically 0 unless overridden by custom logic. `terminated` is typically `False`.
-*   **`evaluate(self, config: HudStyleConfigs | None = None)`:** Runs the evaluation logic defined in the [Task](/concepts/task) (or the provided `config`).
+*   **`evaluate(self, config: FunctionConfigs | None = None)`:** Runs the evaluation logic defined in the [Task](/concepts/task) (or the provided `config`).
     *   **Parameters:**
         *   `config`: Optional override for evaluation logic using [Configuration Styles](/concepts/task#configuration-styles).
     *   **Returns:** The result from the evaluation function(s).
-*   **`reset(self, configs: HudStyleConfigs | None = None)`:** Resets the environment state, usually running setup logic.
+*   **`reset(self, configs: FunctionConfigs | None = None)`:** Resets the environment state, usually running setup logic.
     *   **Parameters:**
         *   `configs`: Optional override for setup logic.
     *   **Returns:** `(Observation, info)` tuple after resetting. *(Note: `gym.make(task)` handles initial setup; direct `reset` is less common).*

{hud_python-0.2.1 → hud_python-0.2.3}/docs/api-reference/gym.mdx RENAMED

@@ -20,7 +20,7 @@ async def make(
 Creates and initializes an [Environment](/concepts/environment) instance based on a specification.
-This function handles selecting the correct client (local docker, remote docker, remote direct) based on the `env_src`, automatically linking to an active [Job](/concepts/job) (from `@job` decorator or the `job` parameter), and running the initial [Task](/concepts/task) setup if `env_src` is a `Task`.
+This function handles selecting the correct client (local docker, remote docker, remote direct) based on the `env_src`, automatically linking to an active [Job](/concepts/job) (from `@register_job` decorator or the `job` parameter), and running the initial [Task](/concepts/task) setup if `env_src` is a `Task`.
 **Parameters:**
@@ -28,7 +28,7 @@ This function handles selecting the correct client (local docker, remote docker,
     *   If a `str` (Gym ID like `"hud-browser"`, `"OSWorld-Ubuntu"`), creates a standard remote environment.
     *   If a `CustomGym` object, creates a custom environment based on its definition (local or remote docker).
     *   If a `Task` object, uses the `task.gym` attribute to determine the environment type and automatically runs `task.setup` after creation.
-*   **`job` (`Job` | None, optional):** A specific [Job](/concepts/job) object to associate this environment run with. If `None`, it attempts to find an active job created by the `@job` decorator.
+*   **`job` (`Job` | None, optional):** A specific [Job](/concepts/job) object to associate this environment run with. If `None`, it attempts to find an active job created by the `@register_job` decorator.
 *   **`metadata` (dict[str, Any] | None, optional):** Additional metadata to attach to the environment instance and its resulting trajectory.
 **Returns:**

{hud_python-0.2.1 → hud_python-0.2.3}/docs/api-reference/job.mdx RENAMED

@@ -3,13 +3,13 @@ title: 'hud.job'
 description: 'API reference for Jobs and related functions/decorators'
 ---
-The `hud.job` module provides the `@job` decorator, functions to manage Jobs (`create_job`, `load_job`), and the `Job` class itself.
+The `hud.job` module provides the `@register_job` decorator, functions to manage Jobs (`create_job`, `load_job`), and the `Job` class itself.
 See the [Job Concepts](/concepts/job) page for explanations and usage examples.
 # Decorators
-## @job
+## @register_job
 ```python
 def job(
@@ -92,7 +92,7 @@ class Job(pydantic.BaseModel):
     ) -> list[Trajectory]: ...
 ```
-Represents a Job, typically obtained via `@job`, `create_job`, or `load_job`. Primarily used to access associated trajectories.
+Represents a Job, typically obtained via `@register_job`, `create_job`, or `load_job`. Primarily used to access associated trajectories.
 **Attributes:**

{hud_python-0.2.1 → hud_python-0.2.3}/docs/api-reference/task.mdx RENAMED

@@ -13,8 +13,8 @@ The `hud.task` module provides the `Task` class for defining evaluation scenario
 class Task(pydantic.BaseModel):
     id: str | None = None
     prompt: str
-    setup: HudStyleConfigs | None = None
-    evaluate: HudStyleConfigs | None = None
+    setup: FunctionConfigs | None = None
+    evaluate: FunctionConfigs | None = None
     gym: Gym | None = None
     target: str | list[str] | None = None # Inspect compatibility
     choices: list[str] | None = None      # Inspect compatibility
@@ -33,8 +33,8 @@ See the [Tasks and TaskSets Concepts](/concepts/task) page for detailed explanat
 *   **`id` (str | None):** Optional unique identifier, often assigned when loaded from the HUD platform.
 *   **`prompt` (str):** The main instruction or goal for the agent.
-*   **`setup` (`HudStyleConfigs` | None):** Configuration for setup actions executed before the agent starts. See [Configuration Styles](/concepts/task#configuration-styles).
-*   **`evaluate` (`HudStyleConfigs` | None):** Configuration defining the evaluation logic executed by `env.evaluate()`. See [Configuration Styles](/concepts/task#configuration-styles).
+*   **`setup` (`FunctionConfigs` | None):** Configuration for setup actions executed before the agent starts. See [Configuration Styles](/concepts/task#configuration-styles).
+*   **`evaluate` (`FunctionConfigs` | None):** Configuration defining the evaluation logic executed by `env.evaluate()`. See [Configuration Styles](/concepts/task#configuration-styles).
 *   **`gym` (`Gym` | None):** Specifies the required environment type (e.g., `"hud-browser"`, `CustomGym` object). See `hud.types`.
 *   **`target` (str | list[str] | None):** Ideal target output (primarily for compatibility with `inspect-ai`).
 *   **`choices` (list[str] | None):** Multiple choice options (primarily for compatibility with `inspect-ai`).
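
Reviewer note on the `HudStyleConfigs` → `FunctionConfigs` rename: the docs in this diff describe the type as accepting a string, a tuple, a dict, or a list of those. The sketch below shows one way such shapes could be flattened into uniform `{"function", "args"}` dicts. `normalize_configs` is a hypothetical helper written for this note; it is not the SDK's implementation, and the SDK's real normalization rules may differ.

```python
from typing import Any


def normalize_configs(configs: Any) -> list[dict[str, Any]]:
    """Hypothetical helper: expand string/tuple/dict/list configs into uniform dicts."""
    if configs is None:
        return []
    if isinstance(configs, str):
        # "browser.maximize" -> function name with no arguments
        return [{"function": configs, "args": []}]
    if isinstance(configs, tuple):
        # ("goto", "https://google.com") -> name followed by positional args
        name, *args = configs
        return [{"function": name, "args": list(args)}]
    if isinstance(configs, dict):
        # {"function": ..., "args": [...]} is already close to the target shape
        return [{"function": configs["function"], "args": list(configs.get("args", []))}]
    if isinstance(configs, list):
        # A list holds several configs, run in order
        return [item for entry in configs for item in normalize_configs(entry)]
    raise TypeError(f"Unsupported config type: {type(configs).__name__}")


setup_calls = normalize_configs(
    [
        ("goto", "https://example.com/login"),
        {"function": "wait_for_element", "args": ["#username"]},
        "browser.maximize",
    ]
)
print(setup_calls)
```

The example inputs are the ones shown in the task docs (`goto`, `wait_for_element`, `browser.maximize`); whether a given function exists in a particular environment controller is outside what this diff confirms.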

{hud_python-0.2.1 → hud_python-0.2.3}/docs/concepts/environment.mdx RENAMED

@@ -46,7 +46,21 @@ env_os = await gym.make("OSWorld-Ubuntu")
 # await env_os.close()
 ```
-Environments created this way won't have a default `Task` associated unless you explicitly reset them with one later using `env.reset()`. The `gym.make()` function also automatically links the environment to an active [Job](/concepts/job) if one was defined using the `@job` decorator.
+Environments created this way won't have a default `Task` associated unless you explicitly reset them with one later using `env.reset()`. The `gym.make()` function also automatically links the environment to an active [Job](/concepts/job) if one was defined using the `@register_job` decorator.
+## Available Environment Types
+The HUD SDK provides several standard environment types, specified via the `gym` attribute in a [Task](/concepts/task) or directly in `hud.gym.make()`:
+*   **`"hud-browser"`**: Provides a remote Chromium browser instance managed via Playwright. Ideal for web navigation, form interaction, and testing web applications.
+    *   [See `hud-browser` Details](../environments/hud-browser.mdx)
+*   **`"hud-ubuntu"`**: Provides a remote Ubuntu desktop environment accessed via VNC. Suitable for tasks involving GUI applications, file system interaction, or running Linux software.
+    *   [See `hud-ubuntu` Details](../environments/hud-ubuntu.mdx)
+*   **`"qa"`**: A non-interactive environment for question-answering tasks where the agent provides a direct textual response.
+    *   [See `qa` Environment Details](../environments/qa.mdx)
+*   **`CustomGym`**: Allows defining and running your own [Custom Environments](../advanced/custom-environments.mdx) using Docker, either locally or remotely. This provides maximum flexibility for specific testing needs.
+The `gym` attribute in a Task tells `hud.gym.make()` which environment to instantiate.
 ## Interaction Loop
@@ -78,10 +92,10 @@ for _ in range(10):
 ## Key Methods
 *   **`env.step(actions: list[CLA] | None = None)`**: Executes actions (or gets initial state). Returns `(Observation, reward, terminated, info)`.
-*   **`env.evaluate(config: HudStyleConfigs | None = None)`**: Runs evaluation logic defined in the [Task](/concepts/task) (or the provided `config`). Returns evaluation result.
+*   **`env.evaluate(config: FunctionConfigs | None = None)`**: Runs evaluation logic defined in the [Task](/concepts/task) (or the provided `config`). Returns evaluation result.
 *   **`env.close()`**: Shuts down the environment. Saves the [Trajectory](/concepts/trajectory) if linked to a [Job](/concepts/job).
 *   **`env.get_urls()`**: Returns URLs (`url`, `live_url`) for accessing/viewing the environment.
-*   **`env.reset(configs: HudStyleConfigs | None = None)`**: Resets state, often running setup steps. *Mostly used internally or for environments created without an initial Task.*
+*   **`env.reset(configs: FunctionConfigs | None = None)`**: Resets state, often running setup steps. *Mostly used internally or for environments created without an initial Task.*
 *   **`env._setup(...)` / `env._invoke_all(...)`**: Internal methods for running setup/evaluate/custom configurations defined in a [Task](/concepts/task).
 ## Observations
@@ -96,5 +110,5 @@ The `Observation` object returned by `env.step()` contains:
 *   [Task](/concepts/task): Defines the environment type (`gym`), `setup`, and `evaluate` logic.
 *   [Agent](/concepts/agent): Interacts with the Environment via the `step` and `predict` methods.
 *   [Adapter](/concepts/adapter): Ensures actions passed to `step` are in the correct `CLA` format.
-*   [Job](/concepts/job): Groups environment runs; linking happens via `@job` or `gym.make(job=...)`.
+*   [Job](/concepts/job): Groups environment runs; linking happens via `@register_job` or `gym.make(job=...)`.
 *   [Trajectory](/concepts/trajectory): The recording generated when a job-linked environment is closed.

{hud_python-0.2.1 → hud_python-0.2.3}/docs/concepts/job.mdx RENAMED

@@ -18,16 +18,16 @@ Jobs help organize evaluation data, useful for:
 ## Creating Jobs
-### 1. The `@job` Decorator (Recommended)
+### 1. The `@register_job` Decorator (Recommended)
 Decorate an `async` function. A new Job is created per function call, and any environments created within using `hud.gym.make()` are automatically linked.
 ```python
-from hud import gym, job
+from hud import gym, register_job
 from hud.task import Task
 from hud.agent import OperatorAgent # Example agent
-@job(name="my-evaluation-run", metadata={"agent_version": "1.1"})
+@register_job(name="my-evaluation-run", metadata={"agent_version": "1.1"})
 async def run_evaluation():
     task = Task(prompt="Example", gym="hud-browser")
     env = await gym.make(task) # Linked to "my-evaluation-run" job
@@ -89,7 +89,7 @@ async def analyze_job(job_id: str):
 ## Best Practices
-*   Use `@job` for most scripts.
+*   Use `@register_job` for most scripts.
 *   Use descriptive names and metadata.
 *   Create separate jobs for distinct experiments.

{hud_python-0.2.1 → hud_python-0.2.3}/docs/concepts/task.mdx RENAMED

@@ -15,8 +15,8 @@ A `Task` object provides the configuration for a specific scenario.
 *   **`prompt` (str):** The primary instruction given to the agent.
 *   **`gym` (str | `CustomGym` | None):** Specifies the type of [Environment](/concepts/environment) needed. Used by `hud.gym.make()`.
-*   **`setup` (`HudStyleConfigs` | None):** Defines actions executed *before* the agent starts. See [Setup Configuration](#setup-configuration).
-*   **`evaluate` (`HudStyleConfigs` | None):** Defines how to check if the agent succeeded *after* interaction. See [Evaluation Configuration](#evaluation-configuration).
+*   **`setup` (`FunctionConfigs` | None):** Defines actions executed *before* the agent starts. See [Setup Configuration](#setup-configuration).
+*   **`evaluate` (`FunctionConfigs` | None):** Defines how to check if the agent succeeded *after* interaction. See [Evaluation Configuration](#evaluation-configuration).
 *   **`id` (str | None):** Optional identifier.
 *   **`metadata` (dict | None):** Optional dictionary for extra information.
 *   **`config` (dict | None):** Optional dictionary, primarily for remote execution.
@@ -30,11 +30,10 @@ task = Task(
     prompt="Log in to example.com with username 'test'",
     gym="hud-browser", # Request a browser environment
     setup=[ # Actions run by gym.make(task)
-        ("goto", "https://example.com/login"),
-        {"function": "wait_for_element", "args": ["#username"]}
+        ("goto", "https://example.com/login")
     ],
     evaluate={ # Logic run by env.evaluate()
-        "function": "check_login_status",
+        "function": "page_contains",
         "args": ["test"]
     }
 )
@@ -42,7 +41,7 @@ task = Task(
 ### <a name="configuration-styles"></a>Configuration Styles (`setup` and `evaluate`)
-Both `setup` and `evaluate` accept configurations defining function calls within the environment's controller, using flexible formats (`HudStyleConfigs`):
+Both `setup` and `evaluate` accept configurations defining function calls within the environment's controller, using flexible formats (`FunctionConfigs`):
 1.  **String:** `"browser.maximize"`
 2.  **Tuple:** `("goto", "https://google.com")`
@@ -82,11 +81,13 @@ Load predefined sets from the HUD platform:
 ```python
 from hud import load_taskset
-taskset = await load_taskset("OSWorld-Ubuntu-Links")
+taskset = await load_taskset("OSWorld-Ubuntu")
 print(f"Number of tasks: {len(taskset)}") # TaskSet acts like a list
 first_task = taskset[0]
 ```
+Currently supported TaskSets available via `load_taskset` include OSWorld, GAIA, and WebVoyager subsets.
 ### Creating a TaskSet Manually
 ```python

{hud_python-0.2.1 → hud_python-0.2.3}/docs/docs.json RENAMED

@@ -29,6 +29,14 @@
           "concepts/trajectory"
         ]
       },
+      {
+        "group": "Environments",
+        "pages": [
+          "environments/hud-browser",
+          "environments/hud-ubuntu",
+          "environments/qa"
+        ]
+      },
       {
         "group": "Advanced Topics",
         "pages": [

hud_python-0.2.3/docs/environments/hud-browser.mdx ADDED

@@ -0,0 +1,67 @@
+# HUD Browser Environment
+## Introduction
+The `hud-browser` environment provides a remote Chromium browser instance, managed by Playwright, for agents to interact with websites. It's ideal for tasks involving web navigation, form filling, information retrieval, and testing web applications.
+## Setup
+Setup actions for the `hud-browser` are defined in the `setup` attribute of a [Task](../concepts/task.mdx) and executed by `hud.gym.make()`. They typically involve browser controller functions.
+*   **`goto(url: str)`**: Navigates the browser to the specified `url`. Automatically prepends `http://` if no scheme is provided. Waits for `domcontentloaded` (up to 10s timeout) and adds a 1s wait for rendering.
+    ```python
+    # Example Task Setup:
+    setup=[("goto", "https://google.com")]
+    ```
+*   **Other common setup functions coming soon:** `wait_for_element`, `click`, `type`, `set_cookies` etc.
+Refer to [Task Setup Configuration](../concepts/task.mdx#setup-configuration) for how to define these.
+## Step Interaction
+Agents interact with the browser environment by sending a list of [CLA Actions](../advanced/cla-details.mdx) to `env.step()`. An [Adapter](../concepts/adapter.mdx) typically handles the conversion from the agent model's output to the CLA format.
+Common CLAs used with `hud-browser`:
+*   [`ClickAction`](../advanced/cla-details.mdx#mouse-actions)
+*   [`MoveAction`](../advanced/cla-details.mdx#mouse-actions)
+*   [`TypeAction`](../advanced/cla-details.mdx#keyboard-actions)
+*   [`PressAction`](../advanced/cla-details.mdx#keyboard-actions)
+*   [`ScrollAction`](../advanced/cla-details.mdx#mouse-actions)
+*   [`DragAction`](../advanced/cla-details.mdx#mouse-actions)
+*   [`ResponseAction`](../advanced/cla-details.mdx#response-actions) (to submit a final text answer)
+*See [CLA Action Details](../advanced/cla-details.mdx) for the full specification.*
+## Evaluate
+The `evaluate` attribute of a [Task](../concepts/task.mdx) defines how success is measured using `env.evaluate()`. This calls functions within the browser controller.
+Built-in evaluation functions for `hud-browser`:
+*   **`url_match(expected_url: str)`**: Checks if the current browser URL exactly matches `expected_url`. Returns `1.0` for a match, `0.0` otherwise.
+    ```python
+    # Example Task Evaluation:
+    evaluate=("url_match", "https://google.com/search?q=expected")
+    ```
+*   **`page_contains(texts: list[str])`** (alias `contains_text`): Checks if *all* strings in `texts` are present in `page.content()`. Returns `1.0` if all texts are found, `0.0` otherwise.
+    ```python
+    # Example Task Evaluation:
+    evaluate=("page_contains", ["Search Results", "About 1,000,000 results"])
+    ```
+*   **`sheet_contains(texts: list[str])`**: Custom function for Google Sheets. Returns `1.0` if any text is found, `0.0` otherwise.
+    ```python
+    # Example Task Evaluation:
+    evaluate=("sheet_contains", ["Expected value in cell A1"])
+    ```
+*   **`cookie_exists(cookie_names: list[str])`**: Checks if all cookies in `cookie_names` exist in `context.cookies()`. Returns `1.0` if all exist, `0.0` otherwise.
+    ```python
+    # Example Task Evaluation:
+    evaluate=("cookie_exists", ["session_id", "user_pref"])
+    ```
+*   **`cookie_match(name_value_pairs: list[str])`**: Checks if cookies exist *and* match expected values. `name_value_pairs` format: `[name1, value1, name2, value2, ...]`. Returns `1.0` if all match, `0.0` otherwise.
+    ```python
+    # Example Task Evaluation:
+    evaluate=("cookie_match", ["user_id", "12345", "theme", "dark"])
+    ```
+Refer to [Task Evaluation Configuration](../concepts/task.mdx#evaluation-configuration) for more details.
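
Reviewer note on `cookie_match`: the new page documents its argument as a flat list, `[name1, value1, name2, value2, ...]`, scored `1.0` only when every pair matches. The sketch below shows how that flat format could be interpreted. It is a hypothetical stand-alone re-implementation for illustration, not the browser controller's code, and it assumes cookies arrive as dicts with `name` and `value` keys (the shape Playwright's `context.cookies()` returns).

```python
def cookie_match(cookies: list[dict[str, str]], name_value_pairs: list[str]) -> float:
    """Hypothetical sketch: score 1.0 if every expected cookie has the expected value."""
    if len(name_value_pairs) % 2 != 0:
        raise ValueError("name_value_pairs must alternate names and values")
    # Pair up even-indexed names with odd-indexed values
    expected = dict(zip(name_value_pairs[0::2], name_value_pairs[1::2]))
    actual = {cookie["name"]: cookie["value"] for cookie in cookies}
    matched = all(actual.get(name) == value for name, value in expected.items())
    return 1.0 if matched else 0.0


browser_cookies = [
    {"name": "user_id", "value": "12345"},
    {"name": "theme", "value": "dark"},
]
full_match = cookie_match(browser_cookies, ["user_id", "12345", "theme", "dark"])
wrong_value = cookie_match(browser_cookies, ["user_id", "12345", "theme", "light"])
print(full_match, wrong_value)
```

Raising on an odd-length list is a design choice of this sketch; the diff does not say how the real function treats malformed input.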
