hud-python 0.2.0__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of hud-python might be problematic. Click here for more details.

Files changed (105) hide show
  1. {hud_python-0.2.0 → hud_python-0.2.1}/PKG-INFO +19 -26
  2. {hud_python-0.2.0 → hud_python-0.2.1}/README.md +16 -23
  3. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/environment.mdx +3 -2
  4. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/task.mdx +32 -2
  5. {hud_python-0.2.0 → hud_python-0.2.1}/docs/docs.json +5 -4
  6. {hud_python-0.2.0 → hud_python-0.2.1}/docs/quickstart.mdx +3 -9
  7. hud_python-0.2.1/docs/running-your-agent.mdx +237 -0
  8. hud_python-0.2.1/examples/browser_use.ipynb +119 -0
  9. hud_python-0.2.1/examples/inspect.ipynb +169 -0
  10. {hud_python-0.2.0 → hud_python-0.2.1}/examples/jobs.ipynb +4 -6
  11. {hud_python-0.2.0 → hud_python-0.2.1}/examples/local.ipynb +7 -3
  12. {hud_python-0.2.0 → hud_python-0.2.1}/examples/osworld.ipynb +3 -3
  13. hud_python-0.2.1/examples/tasks.ipynb +257 -0
  14. {hud_python-0.2.0 → hud_python-0.2.1}/hud/__init__.py +1 -1
  15. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/claude/adapter.py +9 -1
  16. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/common/types.py +7 -0
  17. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/operator/adapter.py +4 -0
  18. {hud_python-0.2.0 → hud_python-0.2.1}/hud/agent/claude.py +22 -2
  19. {hud_python-0.2.0 → hud_python-0.2.1}/hud/agent/operator.py +35 -17
  20. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/docker_client.py +1 -1
  21. hud_python-0.2.1/hud/env/environment.py +354 -0
  22. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/local_docker_client.py +3 -1
  23. {hud_python-0.2.0 → hud_python-0.2.1}/hud/task.py +41 -30
  24. {hud_python-0.2.0 → hud_python-0.2.1}/hud/taskset.py +8 -0
  25. {hud_python-0.2.0 → hud_python-0.2.1}/hud/utils/common.py +28 -1
  26. hud_python-0.2.1/hud/utils/config.py +94 -0
  27. {hud_python-0.2.0 → hud_python-0.2.1}/pyproject.toml +3 -3
  28. {hud_python-0.2.0 → hud_python-0.2.1}/tests/test_import.py +1 -1
  29. hud_python-0.2.0/examples/browser_use.ipynb +0 -324
  30. hud_python-0.2.0/examples/tasks.ipynb +0 -117
  31. hud_python-0.2.0/hud/env/environment.py +0 -181
  32. hud_python-0.2.0/hud/utils/config.py +0 -185
  33. {hud_python-0.2.0 → hud_python-0.2.1}/.env.example +0 -0
  34. {hud_python-0.2.0 → hud_python-0.2.1}/.github/workflows/ci.yml +0 -0
  35. {hud_python-0.2.0 → hud_python-0.2.1}/.github/workflows/release.yml +0 -0
  36. {hud_python-0.2.0 → hud_python-0.2.1}/.gitignore +0 -0
  37. {hud_python-0.2.0 → hud_python-0.2.1}/LICENSE +0 -0
  38. {hud_python-0.2.0 → hud_python-0.2.1}/MANIFEST.in +0 -0
  39. {hud_python-0.2.0 → hud_python-0.2.1}/docs/advanced/cla-details.mdx +0 -0
  40. {hud_python-0.2.0 → hud_python-0.2.1}/docs/advanced/custom-environments.mdx +0 -0
  41. {hud_python-0.2.0 → hud_python-0.2.1}/docs/advanced/environment-control.mdx +0 -0
  42. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api/reference/adapters.mdx +0 -0
  43. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/adapters.mdx +0 -0
  44. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/env.mdx +0 -0
  45. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/gym.mdx +0 -0
  46. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/job.mdx +0 -0
  47. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/task.mdx +0 -0
  48. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/taskset.mdx +0 -0
  49. {hud_python-0.2.0 → hud_python-0.2.1}/docs/api-reference/trajectory.mdx +0 -0
  50. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/adapter.mdx +0 -0
  51. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/agent.mdx +0 -0
  52. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/job.mdx +0 -0
  53. {hud_python-0.2.0 → hud_python-0.2.1}/docs/concepts/trajectory.mdx +0 -0
  54. {hud_python-0.2.0 → hud_python-0.2.1}/docs/examples/basic.mdx +0 -0
  55. {hud_python-0.2.0 → hud_python-0.2.1}/docs/examples/claude-agent.mdx +0 -0
  56. {hud_python-0.2.0 → hud_python-0.2.1}/docs/examples/custom-agent.mdx +0 -0
  57. {hud_python-0.2.0 → hud_python-0.2.1}/docs/favicon.png +0 -0
  58. {hud_python-0.2.0 → hud_python-0.2.1}/docs/installation.mdx +0 -0
  59. {hud_python-0.2.0 → hud_python-0.2.1}/docs/logo/HUD-light-optimized.svg +0 -0
  60. {hud_python-0.2.0 → hud_python-0.2.1}/docs/logo/HUD.svg +0 -0
  61. {hud_python-0.2.0 → hud_python-0.2.1}/environments/novnc_ubuntu/Dockerfile +0 -0
  62. {hud_python-0.2.0 → hud_python-0.2.1}/environments/novnc_ubuntu/pyproject.toml +0 -0
  63. {hud_python-0.2.0 → hud_python-0.2.1}/environments/novnc_ubuntu/src/novnc_ubuntu/__init__.py +0 -0
  64. {hud_python-0.2.0 → hud_python-0.2.1}/environments/novnc_ubuntu/src/novnc_ubuntu/pyautogui_rosetta.py +0 -0
  65. {hud_python-0.2.0 → hud_python-0.2.1}/environments/novnc_ubuntu/src/novnc_ubuntu/step.py +0 -0
  66. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/Dockerfile +0 -0
  67. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/pyproject.toml +0 -0
  68. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/__init__.py +0 -0
  69. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/evaluate/__init__.py +0 -0
  70. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/evaluate/matchers.py +0 -0
  71. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/info.py +0 -0
  72. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/setup/__init__.py +0 -0
  73. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/setup/question.py +0 -0
  74. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/step.py +0 -0
  75. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/utils/__init__.py +0 -0
  76. {hud_python-0.2.0 → hud_python-0.2.1}/environments/qa_controller/src/qa_controller/utils/state.py +0 -0
  77. {hud_python-0.2.0 → hud_python-0.2.1}/examples/README.md +0 -0
  78. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/__init__.py +0 -0
  79. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/claude/__init__.py +0 -0
  80. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/common/__init__.py +0 -0
  81. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/common/adapter.py +0 -0
  82. {hud_python-0.2.0 → hud_python-0.2.1}/hud/adapters/operator/__init__.py +0 -0
  83. {hud_python-0.2.0 → hud_python-0.2.1}/hud/agent/__init__.py +0 -0
  84. {hud_python-0.2.0 → hud_python-0.2.1}/hud/agent/base.py +0 -0
  85. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/__init__.py +0 -0
  86. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/client.py +0 -0
  87. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/remote_client.py +0 -0
  88. {hud_python-0.2.0 → hud_python-0.2.1}/hud/env/remote_docker_client.py +0 -0
  89. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/__init__.py +0 -0
  90. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/base.py +0 -0
  91. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/inspect.py +0 -0
  92. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/judge.py +0 -0
  93. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/match.py +0 -0
  94. {hud_python-0.2.0 → hud_python-0.2.1}/hud/evaluators/remote.py +0 -0
  95. {hud_python-0.2.0 → hud_python-0.2.1}/hud/gym.py +0 -0
  96. {hud_python-0.2.0 → hud_python-0.2.1}/hud/job.py +0 -0
  97. {hud_python-0.2.0 → hud_python-0.2.1}/hud/py.typed +0 -0
  98. {hud_python-0.2.0 → hud_python-0.2.1}/hud/server/__init__.py +0 -0
  99. {hud_python-0.2.0 → hud_python-0.2.1}/hud/server/requests.py +0 -0
  100. {hud_python-0.2.0 → hud_python-0.2.1}/hud/settings.py +0 -0
  101. {hud_python-0.2.0 → hud_python-0.2.1}/hud/trajectory.py +0 -0
  102. {hud_python-0.2.0 → hud_python-0.2.1}/hud/types.py +0 -0
  103. {hud_python-0.2.0 → hud_python-0.2.1}/hud/utils/__init__.py +0 -0
  104. {hud_python-0.2.0 → hud_python-0.2.1}/hud/utils/telemetry.py +0 -0
  105. {hud_python-0.2.0 → hud_python-0.2.1}/tests/__init__.py +0 -0
@@ -1,9 +1,9 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: hud-python
3
- Version: 0.2.0
3
+ Version: 0.2.1
4
4
  Summary: SDK for the HUD evaluation platform.
5
- Project-URL: Homepage, https://github.com/Human-Data/hud-sdk
6
- Project-URL: Bug Tracker, https://github.com/Human-Data/hud-sdk/issues
5
+ Project-URL: Homepage, https://github.com/hud-evals/hud-sdk
6
+ Project-URL: Bug Tracker, https://github.com/hud-evals/hud-sdk/issues
7
7
  Project-URL: Documentation, https://hud.so
8
8
  Author-email: Human Union Data SDK <founders@hud.so>
9
9
  License: MIT License
@@ -57,7 +57,7 @@ Requires-Dist: pytest<9,>=8.1.1; extra == 'dev'
57
57
  Requires-Dist: ruff==0.9.8; extra == 'dev'
58
58
  Description-Content-Type: text/markdown
59
59
 
60
- # HUD SDK - Human-Agent Interaction Toolkit
60
+ # HUD
61
61
 
62
62
  A Python SDK for creating, evaluating, and benchmarking agent interactions with web browsers and OS environments.
63
63
 
@@ -86,21 +86,20 @@ export HUD_API_KEY=your_api_key_here
86
86
  pip install hud-python
87
87
  ```
88
88
 
89
- ### Simple Browser Example with Operator
89
+ ### Simple Browser Example with Claude Computer Use
90
90
 
91
91
  > This example uses the `@job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on your [HUD Jobs page](https://app.hud.so/jobs).
92
92
 
93
+ Make sure you have defined your `ANTHROPIC_API_KEY` in environment variables to run Claude.
94
+
93
95
  ```python
94
- import os
95
96
  import asyncio
96
97
  from hud import gym, job
97
98
  from hud.task import Task
98
- from hud.utils import stream
99
- from hud.agent import OperatorAgent
99
+ from hud.agent import ClaudeAgent
100
100
 
101
101
  @job("test-run")
102
102
  async def main():
103
- # Define a simple task
104
103
  task = Task(
105
104
  prompt="Insert the text 'capybara' into the search bar",
106
105
  gym="hud-browser",
@@ -108,26 +107,20 @@ async def main():
108
107
  evaluate=("contains_text", "capybara")
109
108
  )
110
109
 
111
- # Create environment
110
+ # Create environment using the gym module
112
111
  env = await gym.make(task)
113
112
 
114
- # Get URLs and display live view (optional)
115
- # urls = await env.get_urls()
116
- # stream(urls["live_url"])
117
-
118
113
  # Initialize Operator agent (API key is loaded automatically)
119
- agent = OperatorAgent()
114
+ agent = ClaudeAgent()
120
115
 
121
- # Agent loop
122
- obs, _ = env.reset()
116
+ # Agent loop with predict and step functions
117
+ obs, _ = await env.reset() # Gets first observation
123
118
  for i in range(5):
124
119
  actions, done = await agent.predict(obs)
125
120
  if done:
126
121
  break
127
122
 
128
123
  obs, reward, terminated, info = await env.step(actions)
129
- if terminated:
130
- break
131
124
 
132
125
  # Evaluate and close
133
126
  result = await env.evaluate()
@@ -143,26 +136,26 @@ if __name__ == "__main__":
143
136
 
144
137
  Explore the core concepts and features of the SDK:
145
138
 
146
- * **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios.
139
+ * **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
147
140
  * **[Environments](/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
148
141
  * **[Agents](/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
149
142
  * **[Adapters](/concepts/adapter)**: See how actions and observations are translated between agents and environments.
150
143
  * **[Jobs](/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
151
144
  * **[Trajectories](/concepts/trajectory)**: Understand the recorded data from each agent run.
152
145
  * **Advanced Topics**:
146
+ * **[CLA Action Details](/advanced/cla-details)**: Explore the standardized action format.
153
147
  * **[Custom Environments](/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
154
148
  * **[Advanced Environment Control](/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
155
- * **[CLA Action Details](/advanced/cla-details)**: Dive deeper into the standardized action format.
156
149
 
157
150
  * **[Full API Reference](/api-reference/gym)**: Detailed specifications for all modules and classes.
158
151
 
159
152
  ## [Examples](examples/)
160
153
 
161
- We provide several example notebooks showing how to use the HUD SDK:
154
+ We recommend you first take a look at the example notebooks showing how to use the HUD SDK:
162
155
 
163
156
  1. [Browser Basics](examples/browser_use.ipynb) - Simple browser interaction with live view
164
157
  2. [Task Design](examples/tasks.ipynb) - Creating and customizing tasks
165
- 3. [OSWorld](examples/osworld.ipynb) - Working with OS environments
158
+ 3. [OSWorld](examples/osworld.ipynb) - Running the OSWorld benchmark
166
159
  4. [Local Development](examples/local.ipynb) - Setting up local custom environments
167
160
 
168
161
  ## Documentation
@@ -180,9 +173,9 @@ If you use this SDK in your research, please cite it as follows:
180
173
  ```bibtex
181
174
  @software{hud2025agentevalplatform,
182
175
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Max Muoto and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
183
- title = {{HUD: An Evaluation Platform for Computer Use Agents}},
184
- date = {2025-03},
185
- url = {https://github.com/Human-Data/hud-sdk},
176
+ title = {{HUD: An Evaluation Platform for Agents}},
177
+ date = {2025-04},
178
+ url = {https://github.com/hud-evals/hud-sdk},
186
179
  langid = {en}
187
180
  }
188
181
  ```
@@ -1,4 +1,4 @@
1
- # HUD SDK - Human-Agent Interaction Toolkit
1
+ # HUD
2
2
 
3
3
  A Python SDK for creating, evaluating, and benchmarking agent interactions with web browsers and OS environments.
4
4
 
@@ -27,21 +27,20 @@ export HUD_API_KEY=your_api_key_here
27
27
  pip install hud-python
28
28
  ```
29
29
 
30
- ### Simple Browser Example with Operator
30
+ ### Simple Browser Example with Claude Computer Use
31
31
 
32
32
  > This example uses the `@job("test-run")` decorator, so the results of this run will appear under the job named "test-run" on your [HUD Jobs page](https://app.hud.so/jobs).
33
33
 
34
+ Make sure you have defined your `ANTHROPIC_API_KEY` in environment variables to run Claude.
35
+
34
36
  ```python
35
- import os
36
37
  import asyncio
37
38
  from hud import gym, job
38
39
  from hud.task import Task
39
- from hud.utils import stream
40
- from hud.agent import OperatorAgent
40
+ from hud.agent import ClaudeAgent
41
41
 
42
42
  @job("test-run")
43
43
  async def main():
44
- # Define a simple task
45
44
  task = Task(
46
45
  prompt="Insert the text 'capybara' into the search bar",
47
46
  gym="hud-browser",
@@ -49,26 +48,20 @@ async def main():
49
48
  evaluate=("contains_text", "capybara")
50
49
  )
51
50
 
52
- # Create environment
51
+ # Create environment using the gym module
53
52
  env = await gym.make(task)
54
53
 
55
- # Get URLs and display live view (optional)
56
- # urls = await env.get_urls()
57
- # stream(urls["live_url"])
58
-
59
54
  # Initialize Operator agent (API key is loaded automatically)
60
- agent = OperatorAgent()
55
+ agent = ClaudeAgent()
61
56
 
62
- # Agent loop
63
- obs, _ = env.reset()
57
+ # Agent loop with predict and step functions
58
+ obs, _ = await env.reset() # Gets first observation
64
59
  for i in range(5):
65
60
  actions, done = await agent.predict(obs)
66
61
  if done:
67
62
  break
68
63
 
69
64
  obs, reward, terminated, info = await env.step(actions)
70
- if terminated:
71
- break
72
65
 
73
66
  # Evaluate and close
74
67
  result = await env.evaluate()
@@ -84,26 +77,26 @@ if __name__ == "__main__":
84
77
 
85
78
  Explore the core concepts and features of the SDK:
86
79
 
87
- * **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios.
80
+ * **[Tasks and TaskSets](/concepts/task)**: Define goals, context, setup, and evaluation criteria for agent scenarios. This includes both interactive and **question-answering (QA)** style tasks.
88
81
  * **[Environments](/concepts/environment)**: Understand the browser and OS runtimes where agents interact.
89
82
  * **[Agents](/concepts/agent)**: Learn about the agent architecture (Claude, Operator) and how they process observations and predict actions.
90
83
  * **[Adapters](/concepts/adapter)**: See how actions and observations are translated between agents and environments.
91
84
  * **[Jobs](/concepts/job)**: Group related runs for analysis and viewing on the HUD platform.
92
85
  * **[Trajectories](/concepts/trajectory)**: Understand the recorded data from each agent run.
93
86
  * **Advanced Topics**:
87
+ * **[CLA Action Details](/advanced/cla-details)**: Explore the standardized action format.
94
88
  * **[Custom Environments](/advanced/custom-environments)**: Build your own Docker-based local or remote environments.
95
89
  * **[Advanced Environment Control](/advanced/environment-control)**: Use `invoke`, `execute`, and `_setup` for finer control.
96
- * **[CLA Action Details](/advanced/cla-details)**: Dive deeper into the standardized action format.
97
90
 
98
91
  * **[Full API Reference](/api-reference/gym)**: Detailed specifications for all modules and classes.
99
92
 
100
93
  ## [Examples](examples/)
101
94
 
102
- We provide several example notebooks showing how to use the HUD SDK:
95
+ We recommend you first take a look at the example notebooks showing how to use the HUD SDK:
103
96
 
104
97
  1. [Browser Basics](examples/browser_use.ipynb) - Simple browser interaction with live view
105
98
  2. [Task Design](examples/tasks.ipynb) - Creating and customizing tasks
106
- 3. [OSWorld](examples/osworld.ipynb) - Working with OS environments
99
+ 3. [OSWorld](examples/osworld.ipynb) - Running the OSWorld benchmark
107
100
  4. [Local Development](examples/local.ipynb) - Setting up local custom environments
108
101
 
109
102
  ## Documentation
@@ -121,9 +114,9 @@ If you use this SDK in your research, please cite it as follows:
121
114
  ```bibtex
122
115
  @software{hud2025agentevalplatform,
123
116
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Max Muoto and Oskars Putans and Govind Pimpale and Mayank Singamreddy and Nguyen Nhat Minh},
124
- title = {{HUD: An Evaluation Platform for Computer Use Agents}},
125
- date = {2025-03},
126
- url = {https://github.com/Human-Data/hud-sdk},
117
+ title = {{HUD: An Evaluation Platform for Agents}},
118
+ date = {2025-04},
119
+ url = {https://github.com/hud-evals/hud-sdk},
127
120
  langid = {en}
128
121
  }
129
122
  ```
@@ -67,13 +67,14 @@ obs, _ = await env.reset()
67
67
  for _ in range(10):
68
68
  # 2. Agent predicts action(s)
69
69
  actions, done = await agent.predict(obs)
70
- if done: break
71
70
 
72
71
  # 3. Execute action(s) in environment
73
72
  obs, reward, terminated, info = await env.step(actions)
74
- if terminated: break
73
+ if done or terminated: break
75
74
  ```
76
75
 
76
+ * **Note on QA Tasks:** For [Question-Answering Tasks](/concepts/task#defining-question-answering-qa-tasks), the agent might only need one `predict` call. The agent should output a `ResponseAction`, which the environment stores. The subsequent `env.evaluate()` call then checks this stored response. The environment itself remains largely passive for QA.
77
+
77
78
  ## Key Methods
78
79
 
79
80
  * **`env.step(actions: list[CLA] | None = None)`**: Executes actions (or gets initial state). Returns `(Observation, reward, terminated, info)`.
@@ -60,7 +60,10 @@ Both `setup` and `evaluate` accept configurations defining function calls within
60
60
  * **Purpose:** Determines task success after the agent finishes.
61
61
  * **Execution:** Triggered by `await env.evaluate()`.
62
62
  * **Result:** The return value of `env.evaluate()`, often a reward score (e.g., `1.0` or `0.0`). This is stored in the `reward` field of the [Trajectory](/concepts/trajectory) if linked to a [Job](/concepts/job).
63
- * **Examples:** `("contains_text", "Success!")`, `("file_exists", "/path/to/output.txt")`. Check specific environment controller docs for available functions.
63
+ * **Examples:**
64
+ * Interactive: `("contains_text", "Success!")`, `("file_exists", "/path/to/output.txt")`. These typically call functions *within* the active environment controller.
65
+ * QA: `("response_includes", "Paris")`. These functions often check the text stored in `env.final_response` (which comes from the agent's `ResponseAction`).
66
+ * **Note:** Check specific environment or evaluation service documentation for available functions.
64
67
 
65
68
  ## TaskSet
66
69
 
@@ -99,4 +102,31 @@ my_taskset = TaskSet(tasks=[task1, task2], description="My set")
99
102
  * [Environment](/concepts/environment): Where Tasks are executed and evaluated.
100
103
  * [Agent](/concepts/agent): Aims to complete the Task `prompt`.
101
104
  * [Job](/concepts/job): Groups runs of different Tasks.
102
- * [Trajectory](/concepts/trajectory): Records the execution of a Task.
105
+ * [Trajectory](/concepts/trajectory): Records the execution of a Task.
106
+
107
+ ### Defining Question-Answering (QA) Tasks
108
+
109
+ While HUD excels at interactive tasks, you can also define tasks that are primarily question-answering. The key differences are:
110
+
111
+ * **`gym`:** You might still use an existing environment type like `"hud-browser"` if you want the QA to happen *within* that context (e.g., asking the agent to answer based on a webpage). For pure QA without environment interaction, a future specific `"qa"` gym type might be introduced, but currently, you'd use an existing type.
112
+ * **`prompt`:** Contains the question for the agent.
113
+ * **`setup`:** Often minimal or unnecessary for pure QA.
114
+ * **`evaluate`:** Defines how to check the agent's final text answer. This typically involves calling a specific evaluation function that compares the agent's final submitted response (see `ResponseAction` in [CLA Details](/advanced/cla-details)) against expected criteria. The `env.final_response` attribute holds the text submitted by the agent via `ResponseAction`.
115
+ * **`target`:** (Recommended) Store the ground truth answer in the `metadata` or potentially a dedicated `target` field for clarity during evaluation function design.
116
+
117
+ ```python
118
+ from hud.task import Task
119
+
120
+ qa_task = Task(
121
+ prompt="What is the powerhouse of the cell?",
122
+ gym="hud-browser", # Or potentially a future "qa" type
123
+ # No complex setup needed for pure QA
124
+ setup=(),
125
+ # Evaluation checks the agent's final submitted text response
126
+ evaluate=("response_includes", "mitochondria"), # Assumes a function checking env.final_response
127
+ )
128
+ ```
129
+
130
+ The [Agent](/concepts/agent) handling such a task should recognize it doesn't need complex interaction and output a `ResponseAction` containing the final answer. The `env.evaluate()` call then triggers the specified check (like `response_includes`) against the stored response.
131
+
132
+ ### <a name="configuration-styles"></a>Configuration Styles (`setup` and `evaluate`)
@@ -14,6 +14,7 @@
14
14
  "group": "Getting Started",
15
15
  "pages": [
16
16
  "quickstart",
17
+ "running-your-agent",
17
18
  "installation"
18
19
  ]
19
20
  },
@@ -31,9 +32,9 @@
31
32
  {
32
33
  "group": "Advanced Topics",
33
34
  "pages": [
35
+ "advanced/cla-details",
34
36
  "advanced/custom-environments",
35
- "advanced/environment-control",
36
- "advanced/cla-details"
37
+ "advanced/environment-control"
37
38
  ]
38
39
  },
39
40
  {
@@ -59,13 +60,13 @@
59
60
  "links": [
60
61
  {
61
62
  "label": "GitHub",
62
- "href": "https://github.com/Human-Data/hud-sdk"
63
+ "href": "https://github.com/hud-evals/hud-sdk"
63
64
  }
64
65
  ]
65
66
  },
66
67
  "footer": {
67
68
  "socials": {
68
- "github": "https://github.com/Human-Data/hud-sdk",
69
+ "github": "https://github.com/hud-evals/hud-sdk",
69
70
  "website": "https://hud.so"
70
71
  }
71
72
  }
@@ -31,7 +31,7 @@ The SDK automatically loads API keys from environment variables or a `.env` file
31
31
 
32
32
  Example `.env` file:
33
33
  ```
34
- HUD_API_KEY=hud_...
34
+ HUD_API_KEY=sk-hud-...
35
35
  OPENAI_API_KEY=sk-...
36
36
  # ANTHROPIC_API_KEY=sk-ant-...
37
37
  ```
@@ -79,16 +79,10 @@ async def main():
79
79
  actions, done = await agent.predict(obs)
80
80
  print(f"Agent action(s): {actions}")
81
81
 
82
- if done:
83
- print("Agent signaled task completion.")
84
- break
85
-
86
82
  # Execute the action(s) in the environment
87
83
  obs, reward, terminated, info = await env.step(actions)
88
84
 
89
- if terminated:
90
- print("Environment terminated.")
91
- break
85
+ if done or terminated: break # Agent signaled task completion or environment terminated
92
86
 
93
87
  # 5. Evaluate & Close
94
88
  print("Evaluating task...")
@@ -127,4 +121,4 @@ if __name__ == "__main__":
127
121
 
128
122
  * Explore the [Core Concepts](/concepts/environment) to understand the SDK architecture in more detail.
129
123
  * Check out the [Examples folder in the GitHub repo](/examples/) for more detailed, runnable notebooks covering different agents and environments.
130
- * Review the [API Reference](/api-reference/gym) for comprehensive documentation on specific functions and classes.
124
+ * Review the [API Reference](/api-reference/gym) for comprehensive documentation on specific functions and classes.
@@ -0,0 +1,237 @@
1
+ ---
2
+ title: 'Running Your Own Agent'
3
+ description: 'Integrating custom agent logic with HUD environments'
4
+ ---
5
+
6
+ # Running Your Own Agent
7
+
8
+ The HUD SDK is designed to be flexible, allowing you to integrate various types of AI agents. While the SDK provides built-in agents (`ClaudeAgent`, `OperatorAgent`), you can easily run your own custom agent logic. This guide outlines the primary ways to achieve this.
9
+
10
+ The core interaction loop with any HUD [Environment](/concepts/environment) involves:
11
+ 1. Creating the environment: `env = await hud.gym.make(...)`
12
+ 2. Getting an initial observation: `obs, _ = await env.reset()`
13
+ 3. **Agent Decision:** Processing `obs` to decide on the next action(s).
14
+ 4. Executing actions: `obs, _, _, _ = await env.step(actions)`
15
+ 5. Evaluating the outcome: `result = await env.evaluate()`
16
+ 6. Closing the environment: `await env.close()`
17
+
18
+ The key difference lies in how **Step 3 (Agent Decision)** is implemented and how the resulting `actions` are formatted for **Step 4**.
19
+
20
+ ## Approach 1: Direct CLA Interaction
21
+
22
+ This is the most straightforward approach if your agent logic can directly generate actions conforming to HUD's **Common Language Actions (CLA)** format. See [CLA Action Details](/advanced/cla-details) for format specifics.
23
+
24
+ * **Concept:** Your agent code, running outside the HUD `Agent` class structure, processes the `Observation` and directly constructs a list of `CLA` objects.
25
+ * **Implementation:**
26
+ * Focus on your agent's decision-making process based on `obs.screenshot` and `obs.text`.
27
+ * Your agent's output must be `list[CLA]`. You'll need to import specific `CLA` types (like `ClickAction`, `TypeAction`, etc.) from `hud.adapters.common.types`.
28
+ * Pass this list directly to `env.step()`.
29
+
30
+ ```python
31
+ import asyncio
32
+ from hud import gym, job
33
+ from hud.task import Task
34
+ from hud.env import Observation
35
+ # Import specific CLA types you need
36
+ from hud.adapters import CLA
37
+ from hud.adapters.common.types import ClickAction, TypeAction, Point
38
+
39
+ # --- Your Custom Agent Logic ---
40
+ def my_custom_agent_logic(observation: Observation) -> list[CLA]:
41
+ # Process screenshot/text...
42
+ # Decide on next actions...
43
+ # Example: Click at (100, 150) and type "hello"
44
+ actions = [
45
+ ClickAction(point=Point(x=100, y=150)),
46
+ TypeAction(text="hello")
47
+ ]
48
+ # Ensure the return type is list[CLA]
49
+ return actions
50
+
51
+ @job("custom-cla-agent-run")
52
+ async def main():
53
+ task = Task(prompt="Click and type", gym="hud-browser")
54
+ env = await gym.make(task)
55
+ obs, _ = await env.reset() # Initial observation
56
+
57
+ for i in range(5):
58
+ print(f"--- Step {i+1} ---")
59
+ # Get actions directly from your logic
60
+ agent_actions: list[CLA] = my_custom_agent_logic(obs)
61
+ print(f"Agent actions: {agent_actions}")
62
+
63
+ # Step the environment with CLA actions
64
+ obs, _, terminated, info = await env.step(agent_actions)
65
+
66
+ if terminated: break # Check termination
67
+
68
+ result = await env.evaluate()
69
+ print(f"Evaluation: {result}")
70
+ await env.close()
71
+
72
+ # if __name__ == "__main__":
73
+ # asyncio.run(main())
74
+ ```
75
+
76
+ * **Pros:** Simple integration, doesn't require understanding the `Agent`/`Adapter` inheritance structure.
77
+ * **Cons:** Your agent logic needs to be aware of and construct specific `CLA` Pydantic models. No automatic observation preprocessing (like screenshot rescaling) or action postprocessing (like coordinate rescaling) provided by the `Adapter` framework.
78
+
79
+ ## Approach 2: Inheriting `hud.agent.Agent`
80
+
81
+ This approach leverages the SDK's structure for a more integrated solution.
82
+
83
+ * **Concept:** Create a class that inherits from `hud.agent.Agent`. Implement the `fetch_response` method to contain your core agent logic (calling your model, processing results). Optionally, create a custom `hud.adapters.Adapter` if your model uses a non-standard action format or requires specific observation rescaling.
84
+ * **Implementation:**
85
+ * Define `MyAgent(Agent[MyClientType, MyRawActionType])`.
86
+ * Implement `async def fetch_response(self, observation: Observation) -> tuple[list[MyRawActionType], bool]: ...`. This method should return the *raw* actions from your model and a `done` flag.
87
+ * (Optional) Define `MyAdapter(Adapter)` and implement `convert(self, raw_action: MyRawActionType) -> CLA: ...`. You might also override `__init__` to set `self.agent_width`/`height` if different from the default.
88
+ * Instantiate your agent, optionally passing your custom adapter: `agent = MyAgent(client=my_llm_client, adapter=MyAdapter())`. If you provide an adapter, the base `Agent.predict` method will automatically call `adapter.rescale` before `fetch_response` and `adapter.adapt_list` after.
89
+
90
+ ```python
91
+ import asyncio
92
+ from typing import Any # Placeholder for your raw action type
93
+ from hud import gym, job
94
+ from hud.task import Task
95
+ from hud.env import Observation
96
+ from hud.agent import Agent # Import base class
97
+ from hud.adapters import Adapter, CLA # Import base adapter and CLA type
98
+ # Import your specific CLA types if needed for a custom adapter
99
+ from hud.adapters.common.types import ClickAction, TypeAction, Point
100
+
101
+ # --- Your Custom Agent ---
102
+ class MyRawAction(dict): # Example raw action type (e.g., a dictionary)
103
+ pass
104
+
105
+ class MyAgent(Agent[Any, MyRawAction]): # Specify client type and raw action type
106
+ # You might initialize your LLM client here
107
+ def __init__(self, adapter: Adapter | None = None): # Optionally take an adapter
108
+ super().__init__(client=None, adapter=adapter) # Pass adapter to base
109
+
110
+ async def fetch_response(self, observation: Observation) -> tuple[list[MyRawAction], bool]:
111
+ # 1. Process observation (screenshot already rescaled if adapter exists)
112
+ prompt = f"Image received. Task: {observation.text}. What to do?"
113
+ # 2. Call your custom LLM / logic
114
+ # llm_response = await my_llm_call(prompt, observation.screenshot)
115
+ llm_response = {"action_type": "click", "x": 200, "y": 250} # Dummy response
116
+
117
+ # 3. Convert LLM response to your raw action format
118
+ raw_actions: list[MyRawAction] = [MyRawAction(llm_response)] # Example
119
+ done = False # Decide if task is done
120
+ return raw_actions, done
121
+
122
+ # --- (Optional) Your Custom Adapter ---
123
+ class MyAdapter(Adapter):
124
+ def __init__(self):
125
+ super().__init__()
126
+ self.agent_width = 1000 # Example: If your model expects 1000px wide images
127
+ self.agent_height = 800
128
+
129
+ def convert(self, raw_action: MyRawAction) -> CLA:
130
+ # Convert your raw action dict to a CLA Pydantic model
131
+ if raw_action.get("action_type") == "click":
132
+ return ClickAction(point=Point(x=raw_action["x"], y=raw_action["y"]))
133
+ elif raw_action.get("action_type") == "type":
134
+ return TypeAction(text=raw_action.get("text", ""))
135
+ # ... handle other action types ...
136
+ raise ValueError(f"Unknown raw action type: {raw_action}")
137
+
138
+ # --- Usage ---
139
+ @job("custom-agent-framework-run")
140
+ async def main():
141
+ task = Task(prompt="Use custom agent", gym="hud-browser")
142
+ env = await gym.make(task)
143
+
144
+ # Initialize agent, optionally with the adapter
145
+ my_agent = MyAgent(adapter=MyAdapter()) # Adapter handles conversion + rescaling
146
+
147
+ obs, _ = await env.reset()
148
+ for i in range(5):
149
+ print(f"--- Step {i+1} ---")
150
+ # Predict handles preprocess, fetch_response, postprocess
151
+ processed_actions, done = await my_agent.predict(obs)
152
+ print(f"Processed CLA actions: {processed_actions}")
153
+
154
+ if done: break
155
+ obs, _, terminated, info = await env.step(processed_actions)
156
+ if terminated: break
157
+
158
+ result = await env.evaluate()
159
+ print(f"Evaluation: {result}")
160
+ await env.close()
161
+
162
+ # if __name__ == "__main__":
163
+ # asyncio.run(main())
164
+ ```
165
+
166
+ * **Pros:** Leverages SDK structure, benefits from automatic rescaling (if adapter used), cleaner separation of agent logic (`fetch_response`) and action conversion (`Adapter`).
167
+ * **Cons:** Requires understanding the `Agent` and `Adapter` base classes.
168
+
169
+ ## Approach 3: External Control (e.g., CDP)
170
+
171
+ This approach uses HUD primarily for environment provisioning and lifecycle management, while the core interaction happens via a direct connection using protocols like CDP.
172
+
173
+ * **Concept:** Use `gym.make()` to start an environment (e.g., `"hud-browser"`). Use `env.get_urls()` to retrieve connection details (like a CDP endpoint URL). Use an external library (`pyppeteer`, `playwright`, `selenium` with CDP) to connect directly to the browser instance and control it using that library's commands.
174
+ * **Implementation:**
175
+ * Create the HUD environment: `env = await gym.make(...)`.
176
+ * Get connection info: `urls = await env.get_urls()`, `cdp_url = urls['url']`.
177
+ * Initialize your external library (e.g., `pyppeteer.connect(browserURL=cdp_url)`).
178
+ * Use the external library's functions for interaction (e.g., `page.click()`, `page.type()`). You would likely still use `env.step()` *without actions* periodically to get updated `Observation` (screenshots) for your agent's decision-making, but you wouldn't pass actions *back* to `env.step()`.
179
+ * When finished, call `await env.evaluate()` and `await env.close()` on the HUD `env` object.
180
+
181
+ ```python
182
+ import asyncio
183
+ import os
184
+ from hud import gym, job
185
+ from hud.task import Task
186
+ from hud.utils import stream # For live view
187
+ # Need external library, e.g., pyppeteer (pip install pyppeteer)
188
+ # import pyppeteer
189
+
190
+ @job("external-control-run")
191
+ async def main():
192
+ task = Task(prompt="Externally controlled task", gym="hud-browser", setup=("goto", "google.com"))
193
+ env = await gym.make(task)
194
+
195
+ try:
196
+ urls = await env.get_urls()
197
+ cdp_url = urls.get('url')
198
+ live_url = urls.get('live_url')
199
+
200
+ if not cdp_url:
201
+ raise ConnectionError("Could not get CDP URL from environment.")
202
+
203
+ if live_url:
204
+ stream(live_url) # Show live view
205
+
206
+ print(f"Connecting via CDP: {cdp_url}")
207
+ # --- Connect using external library (Example: pyppeteer) ---
208
+ # browser = await pyppeteer.connect(browserURL=cdp_url)
209
+ # page = (await browser.pages())[0] # Assume first page
210
+
211
+ print("Performing actions via external library (e.g., pyppeteer)...")
212
+ # await page.waitForSelector('textarea[name="q"]', {'visible': True})
213
+ # await page.type('textarea[name="q"]', 'capybara')
214
+ # await page.keyboard.press('Enter')
215
+ # await asyncio.sleep(2) # Wait for results
216
+
217
+ # --- End external library interaction ---
218
+ # await browser.disconnect()
219
+
220
+ print("Evaluating task via env.evaluate()...")
221
+ result = await env.evaluate(("contains_text", "capybara")) # Example eval
222
+ print(f"Evaluation result: {result}")
223
+
224
+ finally:
225
+ print("Closing environment...")
226
+ await env.close()
227
+
228
+ # if __name__ == "__main__":
229
+ # if not os.getenv("HUD_API_KEY"): print("Set HUD_API_KEY")
230
+ # else: asyncio.run(main())
231
+
232
+ ```
233
+
234
+ * **Pros:** Maximum control over the environment using specialized libraries. Useful if existing automation scripts use these tools.
235
+ * **Cons:** **Actions taken via the external library are NOT recorded in the HUD trajectory.** Only observations fetched via `env.step()` and the final `env.evaluate()` result are captured. Bypasses the `CLA` abstraction. Requires managing dependencies for the external control library.
236
+
237
+ Choose the approach that best fits your agent's design and your integration needs with the HUD framework's features like trajectory recording and standardized actions.