PyPI - gpufl - Versions diffs - 1.0.2__tar.gz → 1.1.0__tar.gz - Mend

gpufl 1.0.2__tar.gz → 1.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (275)

{gpufl-1.0.2 → gpufl-1.1.0}/.dockerignore RENAMED

@@ -1,6 +1,6 @@
-# Python / notebooks — not needed for the C++ daemon build
-python/
-example/python/
+# Python / notebooks — uncomment if building Python wheels inside Docker
+# python/
+# example/python/
 **/.Trash-*
 **/__pycache__/
 **/*.pyc
@@ -15,4 +15,4 @@ build/
 .git/
 .idea/
 .vscode/
-*.md
+# *.md

{gpufl-1.0.2 → gpufl-1.1.0}/.github/workflows/build.yml RENAMED

@@ -3,8 +3,28 @@ name: Build GPUFl Client
 on:
   push:
     branches: [ "main" ]
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
+      - 'Dockerfile*'
+      - '**/Dockerfile'
+      - '**/Dockerfile.*'
+      - 'LICENSE'
+      - 'THIRD-PARTY-NOTICES.txt'
+      - '.gitignore'
+      - 'images/**'
   pull_request:
     branches: [ "main" ]
+    paths-ignore:
+      - '**.md'
+      - 'docs/**'
+      - 'Dockerfile*'
+      - '**/Dockerfile'
+      - '**/Dockerfile.*'
+      - 'LICENSE'
+      - 'THIRD-PARTY-NOTICES.txt'
+      - '.gitignore'
+      - 'images/**'
 jobs:
   build:

{gpufl-1.0.2 → gpufl-1.1.0}/.github/workflows/release.yml RENAMED

@@ -148,7 +148,7 @@ jobs:
           # quay.io/pypa/manylinux_2_28_x86_64 image (OpenSSL 3.5.5).
           CIBW_ENVIRONMENT_LINUX: "CUDA_HOME=/usr/local/cuda PATH=/usr/local/cuda/bin:$PATH CMAKE_ARGS='-DGPUFL_ENABLE_NVIDIA=ON -DGPUFL_ENABLE_AMD=OFF -DBUILD_TESTING=OFF -DOPENSSL_INCLUDE_DIR=/usr/include/openssl3 -DOPENSSL_SSL_LIBRARY=/usr/lib64/openssl3/libssl.so -DOPENSSL_CRYPTO_LIBRARY=/usr/lib64/openssl3/libcrypto.so'"
           # Windows build needs the OpenSSL install path so find_package(OpenSSL)
-          # in CMakeLists.txt succeeds, otherwise HTTPS upload (HttpLogSink)
+          # in CMakeLists.txt succeeds, otherwise HTTPS upload (gpufl::uploadLogs)
           # silently falls back to HTTP-only — see openssl-windows.html in the
           # manual repo for the user-facing story. CIBW_BEFORE_ALL_WINDOWS
           # installs choco's openssl package into this path.

{gpufl-1.0.2 → gpufl-1.1.0}/.gitignore RENAMED

@@ -1,5 +1,6 @@
-### claude
+### ai
 .claude/
+.junie/
 ### idea
 .idea/**
@@ -14,6 +15,7 @@ wget-log*
 ### docker
 example/python/docker/**/
+dist/
 ### C++ template
 # Prerequisites
@@ -89,3 +91,28 @@ example/python/docker/**/
 *.hex
 *.log
+### Python
+# Byte-compiled / optimized files
+__pycache__/
+*.py[cod]
+*$py.class
+# Test / coverage caches
+.pytest_cache/
+.mypy_cache/
+.ruff_cache/
+.coverage
+.coverage.*
+htmlcov/
+coverage.xml
+# Packaging / distribution
+*.egg-info/
+*.egg
+.eggs/
+# Virtualenvs
+.venv/
+venv/
+env/

gpufl-1.1.0/CHANGELOG.md ADDED

@@ -0,0 +1,214 @@
+# Changelog
+All notable changes to `gpufl-client` are documented here. Format
+inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/);
+versioning follows PEP 440 for the Python wheel and semver-style
+`MAJOR.MINOR.PATCH` for the C++ library.
+## [1.1.0] — 2026-06-03
+### Breaking changes
+#### `HttpLogSink` removed — upload is now a separate post-shutdown step
+The in-process `HttpLogSink` that POSTed every NDJSON event live
+during a session has been deleted. Network failures during the
+workload could leak into the GPU job's exit code, and per-event HTTP
+added measurable jitter to PyTorch training runs. Upload now happens
+as an explicit step after `gpufl::shutdown()` returns.
+For Python customers, the migration is **soft** — `remote_upload=True`
+still works in v1.1 as a deprecation shim (see Deprecations below).
+For pure-C++ customers who `#include`'d the header directly, the
+break is a compile error.
+| Surface | Before (v1.0.x) | New (v1.1+) | v1.1 backward-compat behavior |
+|---|---|---|---|
+| Python `init(remote_upload=True)` | Live HttpLogSink during session | `with gpufl.session(...)` or `gpufl.upload_logs(...)` after shutdown | **Still works** — `DeprecationWarning` at init + `atexit` handler that calls `upload_logs()` at interpreter exit |
+| C++ `opts.remote_upload = true;` | Live HttpLogSink during session | `gpufl::uploadLogs(uopts)` after `shutdown()` | **Still works** — deprecation log at init + auto-call to `gpufl::uploadLogs()` at the end of `gpufl::shutdown()` (shutdown now blocks until upload completes) |
+| Env var `GPUFL_REMOTE_UPLOAD=1` | Live HttpLogSink during session | `gpufl.upload_logs()` post-shutdown | **Still works** — routes through the Python shim above |
+| `#include "gpufl/core/logger/http_log_sink.hpp"` | The header | gone | **Compile error** — drop the include |
+See [docs/getting-started/sending-data.md](docs/getting-started/sending-data.md)
+for the full migration guide.
+### Deprecations (scheduled for v1.2 removal)
+| Field / kwarg | Status in v1.1 | What to use instead |
+|---|---|---|
+| `InitOptions::remote_upload` (Python kwarg + C++ field) | DeprecationWarning + atexit shim that calls `upload_logs()` at interpreter exit | `with gpufl.session(...)` or call `gpufl.upload_logs()` explicitly after `shutdown()` |
+| `InitOptions::backend_url` | Still functional; read by the version-discovery probe and stored for `upload_logs()` to read back | Pass `backend_url` directly to `UploadOptions` / `gpufl.upload_logs()` |
+| `InitOptions::api_key` | Same as `backend_url` | Pass `api_key` directly to `UploadOptions` / `gpufl.upload_logs()` |
+| `GPUFL_REMOTE_UPLOAD` env var | Still read; routes to the Python atexit shim | Drop from container manifests / start scripts |
+All three fields ship in v1.1 to keep the migration painless and will
+be removed together in v1.2 — at which point creds live exclusively on
+`UploadOptions` and `gpufl::init()` stops touching network config
+entirely.
+### Breaking changes (cont.)
+#### `sampling_auto_start` renamed to `continuous_system_sampling`
+The old name only described init-time behavior. The new flag covers
+the full policy — the semantics also got fixed (see Bug fixes).
+- **Python**: old kwarg still accepted for this release with a
+  `DeprecationWarning`. Will be removed in the next release.
+- **C++**: hard rename. Compile error points at the call site with
+  a clear "no member named 'sampling_auto_start'" message.
+### Added
+#### Deferred upload — `gpufl.upload_logs()` / `gpufl::uploadLogs()`
+A new module under `include/gpufl/upload/`. Reads the session's
+NDJSON files post-shutdown, POSTs each event to the existing
+`/api/v1/events/{eventType}` backend endpoints. Never throws on
+network errors; returns an `UploadResult` with `.success`,
+`.events_uploaded`, `.warnings`, etc.
+Python orchestration via context manager:
+```python
+with gpufl.session(app_name="train",
+                   backend_url="https://api.gpuflight.com",
+                   api_key="gpfl_xxxxx"):
+    train_one_epoch()
+# On __exit__: shutdown() then upload_logs() — automatic.
+```
+#### `gpufl upload` CLI
+Post-mortem / ad-hoc shipping tool. Registered via
+`[project.scripts]` in `pyproject.toml`:
+```bash
+gpufl upload /tmp/runs/train --backend-url ... --api-key ...
+gpufl upload /tmp/runs/train --session-id <uuid>
+gpufl upload /tmp/runs/train --all-sessions
+gpufl upload /tmp/runs/train --force                 # bypass cursor check
+```
+Default behavior uploads only the **latest** session found in the
+directory (most recent `job_start.ts_ns`). `--session-id` picks a
+specific one; `--all-sessions` ships every session present.
+#### Session-aware cursor file
+`.gpufl-upload-cursor.json` (in the log directory) tracks which
+sessions have completed a successful upload. Re-running `gpufl
+upload` on a completed session refuses with a clear message
+suggesting `--force`; `--all-sessions` mode silently skips completed
+sessions and uploads the rest. Survives across runs to skip
+already-uploaded rotated files.
+#### `ProfilingEngine` — clarified names
+The engine enum was reworked into a single, plainly-named ladder
+(no aliases). New default is `Monitor` (telemetry only, no CUPTI).
+| Name | What it captures |
+|---|---|
+| `Monitor` | GPU/host health metrics only — no CUPTI. The default. |
+| `Trace` | + activity trace: kernels, memcpy, sync (no sampling) |
+| `PcSampling` | + PC stall-reason sampling |
+| `SassMetrics` | + per-instruction SASS counters |
+| `RangeProfiler` | + hardware throughput counters |
+| `Deep` | `PcSampling` + `SassMetrics` in one run |
+Replaces the earlier `None` / `KernelTrace` / `Continuous` / `Range` /
+`PcSamplingWithSass` names. Pre-1.0, no deprecation shim — the old
+names are gone.
+#### Ref-counted system-metric sampler
+`Sampler::configure()` / `activate()` / `deactivate()` / `shutdown()`
+replaces the old `start()` / `stop()`. Activation count composes
+across `continuous_system_sampling` baseline, `GFL_SCOPE` enter/exit,
+and explicit `systemStart()` / `systemStop()` calls — the sampler
+runs while any activator is in flight.
+### Bug fixes
+#### Scope-driven system sampling now works
+Before: setting `sampling_auto_start=false` silently disabled all
+system metrics, even inside `GFL_SCOPE` regions. The flag's name
+suggested "wait for explicit start" semantics but the code disabled
+sampling entirely. Now, under the renamed `continuous_system_sampling
+= false`, the sampler activates while inside any scope or between
+`systemStart` / `systemStop` calls, then idles outside that window.
+#### EventWrapper envelope on upload POSTs
+The initial `uploadLogs()` draft POSTed bare NDJSON event lines.
+The backend's `EventIngestionController` deserialized those into an
+`EventWrapper` with every field null, the inner `objectMapper.readValue
+(null, ...)` threw, the exception was caught and swallowed, and the
+controller returned 200 OK anyway — silent data loss. Every event is
+now correctly wrapped in `{data, agentSendingTime, hostname, ipAddr}`.
+Regression test added in `tests/upload/test_upload_logs.cpp`.
+### Tests added
+- `tests/core/test_sampler.cpp` — 8 scenarios for the ref-counted
+  Sampler (activate/deactivate, nesting, force-shutdown, unbalanced
+  deactivate clamping).
+- `tests/upload/test_upload_logs.cpp` — 12 scenarios for the upload
+  path (happy path, headers, cursor refusal + force override, auth
+  failure, malformed lines, session-id filter, all-sessions,
+  lifecycle ordering, EventWrapper envelope regression guard).
+- `tests/python/test_continuous_system_sampling.py` — 5 integration
+  scenarios for the three sampling modes plus deprecation behavior.
+### Internal / build
+- Removed `include/gpufl/core/logger/http_log_sink.{hpp,cpp}`.
+- Added `include/gpufl/upload/upload_logs.{hpp,cpp}` to the CMake
+  target sources.
+- `CMakeLists.txt` `project(VERSION)` bumped to 1.1.0; new
+  `GPUFL_VERSION_SUFFIX` variable layers optional PEP 440 pre-release
+  tokens onto `GPUFL_CLIENT_VERSION`.
+### Migration checklist for 1.0.x → 1.1.0
+**Optional in v1.1, required by v1.2:**
+- [ ] Python: replace every `gpufl.init(remote_upload=True, ...)` call
+  with `with gpufl.session(backend_url=..., api_key=...):` or an
+  explicit `gpufl.upload_logs(...)` after `shutdown()`. The old form
+  still works in v1.1 with a `DeprecationWarning`; v1.2 will remove it.
+- [ ] C++: replace `opts.remote_upload = true;` with an explicit
+  `gpufl::uploadLogs(uopts)` after `gpufl::shutdown()`. The field
+  still compiles in v1.1 but is a no-op; v1.2 will delete it.
+- [ ] Container manifests: prefer dropping `GPUFL_REMOTE_UPLOAD` and
+  driving upload via your app code (or the `gpufl upload` CLI in a
+  lifecycle hook). The env var still routes through the Python shim
+  in v1.1; v1.2 stops reading it.
+- [ ] Future-proof: start passing `backend_url` and `api_key` directly
+  to `gpufl::uploadLogs()` / `gpufl.upload_logs()` rather than relying
+  on the InitOptions fields. Those InitOptions fields will move to
+  UploadOptions only in v1.2.
+**Required in v1.1 (no grace period):**
+- [ ] Python: rename `sampling_auto_start` → `continuous_system_sampling`.
+  The old name still works with a `DeprecationWarning` (removed in v1.2).
+- [ ] C++: rename `opts.sampling_auto_start` → `opts.continuous_system_sampling`
+  (compile-time error otherwise — no grace period for C++).
+- [ ] If you `#include`'d `http_log_sink.hpp` directly anywhere,
+  drop the include — the header is gone.
+---
+## Releases prior to 1.1.0
+See git tags for the historical record. Highlights:
+- **1.0.3** — `ScopeMeta` benchmark-iteration helper, scope iterator
+  form, `gpufl.report` text summary improvements.
+- **1.0.2** — first version published to PyPI; "Stable" status.
+- **1.0.1** — `kernel_sample_rate_ms` deprecated (no-op).
+- **1.0.0** — first stable contract.
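
Reviewer note on the "Ref-counted system-metric sampler" entry in the changelog above: the activation-count behavior it describes can be illustrated with a minimal Python sketch. The class name `RefCountedSampler` and its `running` property are hypothetical stand-ins, not the library's API; the method names mirror the `activate()` / `deactivate()` / `shutdown()` names listed in the changelog. Clamping at zero follows the "unbalanced deactivate clamping" test scenario, and refusing activations after `shutdown()` is an assumption about what "force-shutdown" means.

```python
import threading


class RefCountedSampler:
    """Hypothetical stand-in for a ref-counted sampler (not the gpufl API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0          # number of activators currently in flight
        self._shut_down = False

    def activate(self):
        """Register one activator; returns False once shut down (assumption)."""
        with self._lock:
            if self._shut_down:
                return False
            self._active += 1
            return True

    def deactivate(self):
        """Release one activator, clamping at zero on unbalanced calls."""
        with self._lock:
            self._active = max(0, self._active - 1)

    def shutdown(self):
        """Force the count to zero regardless of outstanding activators."""
        with self._lock:
            self._active = 0
            self._shut_down = True

    @property
    def running(self):
        """The sampler runs while at least one activator is in flight."""
        with self._lock:
            return self._active > 0


sampler = RefCountedSampler()
sampler.activate()      # e.g. entering a scope
sampler.activate()      # e.g. an explicit systemStart()
sampler.deactivate()    # scope exit: one activator remains
print(sampler.running)  # True
sampler.deactivate()
sampler.deactivate()    # unbalanced call: count stays at zero
print(sampler.running)  # False
```

A counter rather than a boolean is what lets the three activation sources named in the changelog compose: each one only undoes its own activation, so the sampler idles only when the last activator leaves.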

{gpufl-1.0.2 → gpufl-1.1.0}/CMakeLists.txt RENAMED

@@ -1,11 +1,17 @@
 cmake_minimum_required(VERSION 3.31)
 project(gpufl_client
-    VERSION 1.0.2
+    VERSION 1.1.0
     LANGUAGES CXX
     DESCRIPTION "Header-only GPU monitoring client library"
 )
+# Pre-release suffix appended to GPUFL_CLIENT_VERSION below. PEP 440
+# pre-release tokens (`rc1`, `a1`, `b1`, …) aren't valid in CMake's
+# `project(... VERSION ...)`, so we layer them on top here. Final releases
+# leave this empty.
+set(GPUFL_VERSION_SUFFIX "")
 # -----------------------
 # CUDA Architectures (CI Friendly)
 # -----------------------
@@ -22,6 +28,7 @@ set(CMAKE_CXX_EXTENSIONS OFF)
 # -----------------------
 option(GPUFL_ENABLE_NVIDIA "Enable NVIDIA backends (CUDA + NVML when available)" ON)
 option(GPUFL_ENABLE_AMD    "Enable AMD backends (ROCm when available)" OFF)
+option(GPUFL_ENABLE_AMD_ROCPROFILER "Enable AMD rocprofiler-sdk tracing backend when available" ON)
 option(BUILD_GPUFL_EXAMPLE "Build gpufl example application" ON)
 option(BUILD_GPUFL_MONITOR "Build gpufl-monitor standalone daemon" ON)
@@ -56,7 +63,7 @@ target_compile_features(gpufl INTERFACE cxx_std_17)
 # inline the literal at call sites and the mismatch becomes visible
 # as comparison failures (e.g. test asserting on User-Agent).
 target_compile_definitions(gpufl PUBLIC
-    GPUFL_CLIENT_VERSION="${PROJECT_VERSION}"
+    GPUFL_CLIENT_VERSION="${PROJECT_VERSION}${GPUFL_VERSION_SUFFIX}"
 )
 # Enable PIC for static library (required when linking into shared libraries like Python modules)
@@ -68,7 +75,7 @@ target_sources(gpufl PRIVATE
     include/gpufl/core/logger/logger.cpp
     include/gpufl/core/logger/log_rotator.cpp
     include/gpufl/core/logger/file_log_sink.cpp
-    include/gpufl/core/logger/http_log_sink.cpp
+    include/gpufl/upload/upload_logs.cpp
     include/gpufl/core/host_info.cpp
     include/gpufl/core/remote_config.cpp
     include/gpufl/core/model/batch_models.cpp
@@ -147,7 +154,9 @@ target_sources(gpufl PRIVATE include/gpufl/core/logger/file_compressor.cpp)
 # -----------------------
-# cpp-httplib — HTTP client for direct-to-backend log upload (HttpLogSink).
+# cpp-httplib — HTTP client used by gpufl::uploadLogs (deferred upload
+# of session NDJSON files to the backend, called after gpufl::shutdown).
+# Also used by remote_config.cpp for the post-init version probe.
 #
 # Single-header library. Fetched once via FetchContent so every build gets
 # the same version regardless of the host system. HTTPS support is gated
@@ -167,30 +176,37 @@ FetchContent_MakeAvailable(httplib)
 find_package(OpenSSL QUIET)
 if(OpenSSL_FOUND)
-    message(STATUS "Found OpenSSL: ${OPENSSL_VERSION} — HttpLogSink HTTPS enabled")
+    message(STATUS "Found OpenSSL: ${OPENSSL_VERSION} — HTTPS upload enabled")
     target_compile_definitions(gpufl PRIVATE CPPHTTPLIB_OPENSSL_SUPPORT=1)
     target_link_libraries(gpufl PRIVATE OpenSSL::SSL OpenSSL::Crypto)
     set(GPUFL_HTTPLIB_TLS 1)
 else()
     message(WARNING
-        "OpenSSL not found — HttpLogSink will support HTTP only. "
-        "Users pointing remote_config at an https:// endpoint will see "
-        "upload failures. Install OpenSSL (apt: libssl-dev, vcpkg: openssl, "
-        "brew: openssl) to enable TLS.")
+        "OpenSSL not found — gpufl::uploadLogs will support HTTP only. "
+        "Pointing backend_url at an https:// endpoint will fail to verify. "
+        "Install OpenSSL (apt: libssl-dev, vcpkg: openssl, brew: openssl) "
+        "to enable TLS.")
     set(GPUFL_HTTPLIB_TLS 0)
 endif()
 target_compile_definitions(gpufl PRIVATE GPUFL_HTTPLIB_TLS=${GPUFL_HTTPLIB_TLS})
-# NOTE: cpp-httplib's gzip request-body path (CPPHTTPLIB_ZLIB_SUPPORT)
-# is intentionally NOT defined. HttpLogSink sends uncompressed JSON
-# per event; bandwidth-conscious deployments use gpufl-agent against
-# the rotated NDJSON files (where deflate amortizes its dictionary
-# across the whole file for 10-15× compression vs the ~5× we'd get
-# per-POST). See the architectural note above HttpLogSink::Options
-# in http_log_sink.hpp for the full rationale.
+# Enable cpp-httplib's gzip path. PUBLIC so any consumer that includes
+# httplib.h via gpufl's headers (notably the test target's embedded
+# httplib::Server) gets the same wire-format support — without this,
+# server::Post handlers return 415 on incoming Content-Encoding: gzip
+# requests, which is exactly what uploadLogs sends in v1.2+.
+#
+# In production, the Spring Boot backend handles gzip natively, so
+# this define affects the test target more than the client. We still
+# define it on gpufl PUBLIC because:
+#   - It enables future use of httplib::Client::set_compress(true)
+#     to compress the body in-place instead of our manual gzipString
+#     (no current need, but a free optimization if we ever want it).
+#   - It propagates to tests, which is the immediate motivation.
 #
-# ZLIB is still linked above (around line 96) — file_compressor.cpp
-# uses it to gzip rotated NDJSON files written by FileLogSink, which
-# is a separate, working feature.
+# ZLIB is already linked above — file_compressor.cpp uses it to gzip
+# rotated NDJSON files, and upload_logs.cpp uses it both to read those
+# files back and to gzip outgoing stream-chunks.
+target_compile_definitions(gpufl PUBLIC CPPHTTPLIB_ZLIB_SUPPORT=1)
 target_link_libraries(gpufl PRIVATE httplib::httplib)
@@ -225,6 +241,7 @@ if(GPUFL_ENABLE_NVIDIA)
                 include/gpufl/backends/nvidia/sampler/cupti_sass.hpp
                 include/gpufl/backends/nvidia/cuda_collector.cpp
                 include/gpufl/backends/nvidia/cupti_utils.cpp
+                include/gpufl/backends/nvidia/cuda_cleanup_handler.cpp
                 include/gpufl/backends/nvidia/resource_handler.cpp
                 include/gpufl/backends/nvidia/kernel_launch_handler.cpp
                 include/gpufl/backends/nvidia/mem_transfer_handler.cpp
@@ -233,6 +250,7 @@ if(GPUFL_ENABLE_NVIDIA)
                 include/gpufl/backends/nvidia/cupti_backend.cpp
                 include/gpufl/backends/nvidia/engine/pc_sampling_engine.cpp
                 include/gpufl/backends/nvidia/engine/pc_sampling_with_sass_engine.cpp
+                include/gpufl/backends/nvidia/engine/pm_sampling_engine.cpp
                 include/gpufl/backends/nvidia/engine/sass_metrics_engine.cpp
                 include/gpufl/backends/nvidia/engine/range_profiler_engine.cpp)
             target_link_libraries(gpufl PRIVATE CUDA::cudart CUDA::cuda_driver)
@@ -481,18 +499,22 @@ if(GPUFL_ENABLE_AMD)
         endif()
     endif()
-    find_package(rocprofiler-sdk QUIET CONFIG HINTS /opt/rocm)
-    if(TARGET rocprofiler-sdk::rocprofiler-sdk)
-        set(GPUFL_HAS_ROCPROFILER_SDK 1)
-        target_link_libraries(gpufl PRIVATE rocprofiler-sdk::rocprofiler-sdk)
-        target_sources(gpufl PRIVATE
-            include/gpufl/backends/amd/monitor_adapter_amd.cpp
-            include/gpufl/backends/amd/rocprofiler_backend.cpp
-            include/gpufl/backends/amd/engine/dispatch_counter_engine.cpp
-        )
-        message(STATUS "Found ROCprofiler-SDK support")
+    if(GPUFL_ENABLE_AMD_ROCPROFILER)
+        find_package(rocprofiler-sdk QUIET CONFIG HINTS /opt/rocm)
+        if(TARGET rocprofiler-sdk::rocprofiler-sdk)
+            set(GPUFL_HAS_ROCPROFILER_SDK 1)
+            target_link_libraries(gpufl PRIVATE rocprofiler-sdk::rocprofiler-sdk)
+            target_sources(gpufl PRIVATE
+                include/gpufl/backends/amd/monitor_adapter_amd.cpp
+                include/gpufl/backends/amd/rocprofiler_backend.cpp
+                include/gpufl/backends/amd/engine/dispatch_counter_engine.cpp
+            )
+            message(STATUS "Found ROCprofiler-SDK support")
+        else()
+            message(STATUS "ROCprofiler-SDK not found; AMD kernel tracing disabled")
+        endif()
     else()
-        message(STATUS "ROCprofiler-SDK not found; AMD kernel tracing disabled")
+        message(STATUS "ROCprofiler-SDK support disabled by GPUFL_ENABLE_AMD_ROCPROFILER=OFF")
     endif()
     if(GPUFL_HAS_ROCM_SMI OR GPUFL_HAS_HIP)
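
Reviewer note on the `GPUFL_VERSION_SUFFIX` change in the CMakeLists.txt diff above: the new compile definition is a plain concatenation of the numeric project version and an optional pre-release token. The Python sketch below models that composition under stated assumptions. `client_version` and `CMAKE_VERSION_RE` are hypothetical names used only for illustration, and the regex encodes CMake's rule that `project(VERSION)` accepts up to four dot-separated non-negative integers.

```python
import re

# CMake's project(VERSION ...) form: <major>[.<minor>[.<patch>[.<tweak>]]],
# each component a non-negative integer.
CMAKE_VERSION_RE = re.compile(r"^\d+(\.\d+){0,3}$")


def client_version(project_version: str, suffix: str = "") -> str:
    """Model "${PROJECT_VERSION}${GPUFL_VERSION_SUFFIX}" as concatenation."""
    if not CMAKE_VERSION_RE.match(project_version):
        raise ValueError(
            f"not a valid CMake project version: {project_version!r}"
        )
    return project_version + suffix


print(client_version("1.1.0"))         # 1.1.0
print(client_version("1.1.0", "rc1"))  # 1.1.0rc1
```

The composed string `1.1.0rc1` would itself fail the CMake pattern, which matches the diff comment's rationale for layering the suffix on top instead of writing it into `project(... VERSION ...)`.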

{gpufl-1.0.2 → gpufl-1.1.0}/Dockerfile.monitor.amd RENAMED

@@ -44,6 +44,7 @@ RUN cmake -S . -B /build \
         -DBUILD_GPUFL_MONITOR=ON \
         -DGPUFL_ENABLE_NVIDIA=OFF \
         -DGPUFL_ENABLE_AMD=ON \
+        -DGPUFL_ENABLE_AMD_ROCPROFILER=OFF \
     && cmake --build /build --target gpufl-monitor --parallel
 # ── Stage 2: pull the pre-built Java agent ────────────────────────────────────
@@ -53,22 +54,13 @@ FROM ghcr.io/gpu-flight/gpufl-agent:latest AS agent-jar
 FROM eclipse-temurin:25-jre AS jre
 # ── Stage 4: runtime image ─────────────────────────────────────────────────────
-FROM ubuntu:24.04
+# Use ROCm base runtime to avoid fragile manual ROCm apt repo/package wiring.
+FROM rocm/dev-ubuntu-24.04:6.4-complete
 ENV DEBIAN_FRONTEND=noninteractive
-# Add ROCm apt repository
 RUN apt-get update && apt-get install -y --no-install-recommends \
-        ca-certificates \
-        wget \
-        gnupg \
-    && wget -q -O - https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor -o /etc/apt/keyrings/rocm.gpg \
-    && echo 'deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/6.4 noble main' \
-        > /etc/apt/sources.list.d/rocm.list \
-    && apt-get update && apt-get install -y --no-install-recommends \
         supervisor \
-        rocm-smi-lib \
-        rocm-hip-runtime \
     && rm -rf /var/lib/apt/lists/*
 # Copy Java 25 JRE from Temurin (Ubuntu 24.04 repos only ship up to openjdk-21)
