poly-hammer-worker 0.1.0.dev5__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,12 @@
node_modules/
/test-results/
/playwright-report/
/blob-report/
/playwright/.cache/
scratches
backend/reports
.terraform
terraform.tfvars
.vite/
secret.env
__pycache__/
@@ -0,0 +1,125 @@
Metadata-Version: 2.4
Name: poly-hammer-worker
Version: 0.1.0.dev5
Summary: Poly Hammer self-hosted GPU worker agent
License-Expression: MIT
Requires-Python: >=3.13
Requires-Dist: docker>=7.0
Requires-Dist: httpx<1,>=0.27
Requires-Dist: psutil>=5.9
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.12
Description-Content-Type: text/markdown

# ph-worker — Poly Hammer Self-Hosted GPU Worker

A CLI agent that connects your GPU workstation to the Poly Hammer Portal, letting you run AI inference jobs on your own hardware instead of consuming cloud credits.

## Requirements

- **NVIDIA GPU** with CUDA drivers installed
- **Docker** with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) (`--gpus` support)
- **Python 3.13+** (matching the package's `Requires-Python: >=3.13`)
- An active Poly Hammer Portal account with a registered worker key

## Installation

```bash
cd workers/self-hosted/client
uv sync
```
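
The path above assumes a source checkout. Since this version is a published dev release, it can also be installed straight from the registry (dev versions must be pinned explicitly; this assumes the release is on PyPI):

```bash
pip install poly-hammer-worker==0.1.0.dev5
```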

## Quick Start

### 1. Check GPU compatibility

```bash
ph-worker check-gpu
```

This displays your GPU model, VRAM, CUDA version, CPU cores, RAM, and free disk space.
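
Example output (values are illustrative; the fields mirror what `check-gpu` prints):

```
GPU / Hardware Info

 GPU Model: NVIDIA GeForce RTX 4090
 VRAM: 24 GB
 CUDA Version: 13.1
 CPU Cores: 16
 RAM: 64 GB
 Disk Free: 512 GB

 CUDA 13.1 detected — ready for self-hosted workers
```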

### 2. Register a worker in the Portal

Navigate to the **Workers** page in the Poly Hammer Portal and click **Add Worker**. You'll receive a worker key (`ph_worker_...`) — save it; it's shown only once.

### 3. Start the worker

```bash
ph-worker start --worker-key ph_worker_XXXXXXXX
```

Or use environment variables:

```bash
export PH_WORKER_KEY=ph_worker_XXXXXXXX
export PH_PORTAL_URL=https://portal.polyhammer.com  # default
ph-worker start
```

The worker will:

1. Detect your GPU/CUDA capabilities
2. Send heartbeats to the portal
3. Long-poll for inference jobs
4. Pull the correct Docker image (tagged by CUDA version) on first job
5. Run inference in an isolated container with GPU access
6. Upload results and report completion

## How It Works

```
Portal API ←→ ph-worker agent ←→ Docker (GPU container)
                   │                     │
                   ├── heartbeat (60s)   ├── JSON stdin (job)
                   ├── poll for jobs     ├── JSON stdout (progress)
                   ├── report progress   └── S3 upload (result)
                   └── report complete/fail
```

- **Images are pre-built** with model weights baked in — no download delays at inference time
- **CUDA-versioned tags** (`cuda13.1`, etc.) ensure compatibility with your local drivers
- **Dynamic pulling** — the worker automatically pulls the right image when it receives a job for a model it hasn't run before
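
In code terms, the agent's control flow reduces to a poll/run/report loop. The following is a minimal sketch, not the actual `run_worker_loop` implementation: the `poly_hammer_worker.portal_client` module path and the job payload's `"id"` key are assumptions, and `run_job_in_container` is a hypothetical stand-in for the Docker/S3 machinery:

```python
import time

from poly_hammer_worker.portal_client import PortalClient  # module path assumed


def run_job_in_container(job: dict) -> str | None:
    """Hypothetical stand-in: pull the image, run inference, upload to S3."""
    raise NotImplementedError


client = PortalClient("https://portal.polyhammer.com", "ph_worker_XXXXXXXX")
last_heartbeat = 0.0

while True:
    # Heartbeat every 60s so the portal keeps the worker marked ONLINE.
    if time.monotonic() - last_heartbeat >= 60:
        client.heartbeat()
        last_heartbeat = time.monotonic()

    job = client.poll_for_job()  # long-poll; returns None when no job arrived
    if job is None:
        continue

    started = time.monotonic()
    try:
        result_url = run_job_in_container(job)
        client.complete_job(job["id"], time.monotonic() - started, result_url=result_url)
    except Exception as exc:
        client.fail_job(job["id"], error_message=str(exc))
```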

## Configuration

| Option | Env Var | Default | Description |
|--------|---------|---------|-------------|
| `--worker-key` | `PH_WORKER_KEY` | (required) | Worker API key from the portal |
| `--portal-url` | `PH_PORTAL_URL` | `https://portal.polyhammer.com` | Portal API base URL |
| `--auto-cleanup` | `PH_AUTO_CLEANUP` | off | Remove old worker images when disk space is low |
| `--registry` | `PH_REGISTRY_URL` | (none) | Container registry to pull images from if not found locally |

## Docker Image Tags

Worker images are published to GHCR with CUDA-versioned tags:

| Image | Tag | Description |
|-------|-----|-------------|
| `ghcr.io/poly-hammer/hy-motion-worker` | `cuda13.1` | HY-Motion models (latest stable) |
| `ghcr.io/poly-hammer/mdm-worker` | `cuda13.1` | MDM models (latest stable) |
| (either image) | `cuda13.1-<sha>` | Pinned to a specific commit |
| (either image) | `latest` | Latest build (any CUDA version) |
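
The worker pulls these on demand, but you can also pre-pull an image manually to warm the cache, e.g.:

```bash
docker pull ghcr.io/poly-hammer/hy-motion-worker:cuda13.1
```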

## Troubleshooting

### "No CUDA detected"

- Ensure NVIDIA drivers are installed: `nvidia-smi`
- The NVIDIA Container Toolkit must be installed for Docker GPU support

### Container fails to start

- Verify Docker GPU support: `docker run --rm --gpus all nvidia/cuda:13.1.0-base-ubuntu22.04 nvidia-smi`
- Check that the Docker daemon is running: `docker info`

### Worker shows OFFLINE in portal

- Heartbeats are sent every 60 seconds; the portal marks a worker offline after 120 seconds without one
- Check network connectivity to the portal URL
- Verify your worker key is correct
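
To test reachability and your key by hand, you can call the heartbeat endpoint directly (path and header name taken from the client code in this package; any 2xx status means the key was accepted):

```bash
curl -i -X POST \
  -H "X-Worker-Key: ph_worker_XXXXXXXX" \
  -H "Content-Type: application/json" \
  -d '{}' \
  https://portal.polyhammer.com/api/v1/worker-jobs/heartbeat
```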

## Development

```bash
cd workers/self-hosted/client
uv sync --group dev
```
@@ -0,0 +1,112 @@
README.md — identical to the README embedded in the PKG-INFO hunk above.
@@ -0,0 +1 @@
"""Poly Hammer self-hosted GPU worker agent."""
@@ -0,0 +1,113 @@
"""
HTTP client for the Poly Hammer Portal worker API.

All requests authenticate via the X-Worker-Key header.
"""

import logging
from typing import Any

import httpx

logger = logging.getLogger(__name__)

# Endpoints (relative to portal base URL)
_POLL_PATH = "/api/v1/worker-jobs/poll"
_COMPLETE_PATH = "/api/v1/worker-jobs/{job_id}/complete"
_FAIL_PATH = "/api/v1/worker-jobs/{job_id}/fail"
_PROGRESS_PATH = "/api/v1/worker-jobs/{job_id}/progress"
_HEARTBEAT_PATH = "/api/v1/worker-jobs/heartbeat"


class PortalClient:
    """Thin HTTP wrapper around the worker-jobs API."""

    def __init__(self, portal_url: str, worker_key: str) -> None:
        self.base_url = portal_url.rstrip("/")
        self._headers = {"X-Worker-Key": worker_key}
        self._client = httpx.Client(
            base_url=self.base_url,
            headers=self._headers,
            timeout=httpx.Timeout(connect=10.0, read=60.0, write=30.0, pool=10.0),
        )

    def poll_for_job(self) -> dict | None:
        """Long-poll for a pending job. Returns the job payload or None."""
        try:
            resp = self._client.get(_POLL_PATH)
            # A 204, or a 200 with an empty body, means no job is available.
            if resp.status_code == 204 or (resp.status_code == 200 and not resp.text):
                return None
            resp.raise_for_status()
            return resp.json()
        except httpx.TimeoutException:
            # Normal — server held connection for poll timeout with no job
            return None
        except httpx.HTTPStatusError as e:
            logger.error("Poll failed: %s %s", e.response.status_code, e.response.text)
            raise

    def complete_job(
        self,
        job_id: str,
        elapsed_seconds: float,
        result_url: str | None = None,
        result_metadata: dict | None = None,
    ) -> None:
        """Report job completion."""
        body: dict[str, Any] = {"elapsed_seconds": elapsed_seconds}
        if result_url:
            body["result_url"] = result_url
        if result_metadata:
            body["result_metadata"] = result_metadata

        resp = self._client.post(
            _COMPLETE_PATH.format(job_id=job_id),
            json=body,
        )
        resp.raise_for_status()

    def fail_job(
        self,
        job_id: str,
        error_message: str,
        error_code: str = "INFERENCE_FAILED",
    ) -> None:
        """Report job failure."""
        resp = self._client.post(
            _FAIL_PATH.format(job_id=job_id),
            json={"error_message": error_message, "error_code": error_code},
        )
        resp.raise_for_status()

    def report_progress(self, job_id: str, progress: float) -> None:
        """Send a progress update (0.0–1.0)."""
        try:
            resp = self._client.post(
                _PROGRESS_PATH.format(job_id=job_id),
                json={"progress": progress},
            )
            resp.raise_for_status()
        except httpx.HTTPError:
            # Progress updates are best-effort
            logger.debug("Progress report failed (non-fatal)")

    def heartbeat(
        self,
        supported_models: list[str] | None = None,
        hardware_info: dict | None = None,
    ) -> None:
        """Send a heartbeat to keep the worker marked ONLINE."""
        body: dict[str, Any] = {}
        if supported_models is not None:
            body["supported_models"] = supported_models
        if hardware_info is not None:
            body["hardware_info"] = hardware_info

        try:
            resp = self._client.post(_HEARTBEAT_PATH, json=body)
            resp.raise_for_status()
        except httpx.HTTPError:
            logger.warning("Heartbeat failed (will retry)")

    def close(self) -> None:
        self._client.close()
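
A quick way to exercise this client interactively for debugging (the `poly_hammer_worker.portal_client` module path is assumed from the package layout, since the diff does not show file names):

```python
from poly_hammer_worker.portal_client import PortalClient

client = PortalClient("https://portal.polyhammer.com", "ph_worker_XXXXXXXX")
try:
    client.heartbeat()           # marks the worker ONLINE in the portal
    job = client.poll_for_job()  # blocks until a job arrives or the poll times out
    print(job)
finally:
    client.close()
```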
@@ -0,0 +1,123 @@
"""
ph-worker CLI — Poly Hammer self-hosted GPU worker agent.

Usage:
    ph-worker start --worker-key <key> --portal-url <url>
    ph-worker check-gpu
"""

from typing import Annotated

import typer
from rich.console import Console

console = Console()

app = typer.Typer(help="Poly Hammer self-hosted GPU worker agent.")


def version_callback(value: bool) -> None:
    if value:
        from importlib.metadata import version

        # Look up the installed distribution by its published name
        # ("poly-hammer-worker" per the package metadata, not "ph-worker").
        typer.echo(f"ph-worker {version('poly-hammer-worker')}")
        raise typer.Exit()


@app.callback()
def main(
    _version: Annotated[
        bool,
        typer.Option(
            "--version",
            callback=version_callback,
            is_eager=True,
            help="Show the version and exit.",
        ),
    ] = False,
) -> None:
    pass


@app.command()
def start(
    worker_key: Annotated[
        str,
        typer.Option(
            envvar="PH_WORKER_KEY",
            help="Worker API key (ph_worker_...). Can also be set via PH_WORKER_KEY env var.",
        ),
    ],
    portal_url: Annotated[
        str,
        typer.Option(
            envvar="PH_PORTAL_URL",
            help="Poly Hammer Portal API base URL.",
        ),
    ] = "https://portal.polyhammer.com",
    auto_cleanup: Annotated[
        bool,
        typer.Option(
            "--auto-cleanup",
            envvar="PH_AUTO_CLEANUP",
            help="Automatically remove old worker images when disk space is low.",
        ),
    ] = False,
    registry: Annotated[
        str | None,
        typer.Option(
            envvar="PH_REGISTRY_URL",
            help="Optional container registry URL to pull images from if not found locally.",
        ),
    ] = None,
) -> None:
    """Start the worker agent loop.

    The worker will:
    1. Detect local GPU/CUDA capabilities
    2. Connect to the portal and send heartbeats
    3. Long-poll for inference jobs
    4. Use locally-built Docker images (or pull from --registry if set)
    5. Run inference and upload results
    """
    from poly_hammer_worker.worker_loop import run_worker_loop

    run_worker_loop(
        worker_key=worker_key,
        portal_url=portal_url,
        auto_cleanup=auto_cleanup,
        registry_url=registry,
        console=console,
    )


@app.command()
def check_gpu() -> None:
    """Detect and display local GPU/CUDA capabilities."""
    from poly_hammer_worker.hardware import detect_hardware

    info = detect_hardware()

    console.print("\n[bold]GPU / Hardware Info[/bold]\n")
    console.print(f" GPU Model: {info.get('gpu_model', 'N/A')}")
    console.print(f" VRAM: {info.get('vram_gb', 'N/A')} GB")
    console.print(f" CUDA Version: {info.get('cuda_version', 'N/A')}")
    console.print(f" CPU Cores: {info.get('cpu_cores', 'N/A')}")
    console.print(f" RAM: {info.get('ram_gb', 'N/A')} GB")
    console.print(f" Disk Free: {info.get('disk_free_gb', 'N/A')} GB")

    cuda_version = info.get("cuda_version")
    if cuda_version:
        console.print(
            f"\n [green]CUDA {cuda_version} detected — ready for self-hosted workers[/green]"
        )
    else:
        console.print(
            "\n [red]No CUDA detected — self-hosted workers require an NVIDIA GPU[/red]"
        )

    console.print()


if __name__ == "__main__":
    app()