PyPI - gpufl - Versions diffs - 0.1.4__tar.gz → 1.0.0__tar.gz - Mend

gpufl 0.1.4__tar.gz → 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (232)

{gpufl-0.1.4 → gpufl-1.0.0}/.github/workflows/release.yml RENAMED

@@ -136,16 +136,65 @@ jobs:
         uses: pypa/cibuildwheel@v2.22.0
         env:
           CIBW_VIRTUALENV_VERSION: "20.27.1"
-          CIBW_ENVIRONMENT_LINUX: "CUDA_HOME=/usr/local/cuda PATH=/usr/local/cuda/bin:$PATH CMAKE_ARGS='-DGPUFL_ENABLE_NVIDIA=ON -DGPUFL_ENABLE_AMD=OFF -DBUILD_TESTING=OFF'"
+          # OpenSSL hints: manylinux_2_28 (AlmaLinux 8) ships OpenSSL 1.1.1
+          # as the system default, but cpp-httplib v0.18.5 requires >= 3.0.0
+          # (SSL_get1_peer_certificate). We install EL8's `openssl3-devel`
+          # compat package (see CIBW_BEFORE_ALL_LINUX) which lays OpenSSL 3.x
+          # down under NON-standard prefixes — headers in /usr/include/openssl3,
+          # dev symlinks in /usr/lib64/openssl3 — so find_package(OpenSSL)
+          # won't see it without these explicit cache vars. The runtime
+          # SONAME is still libssl.so.3 in /usr/lib64, so auditwheel bundles
+          # it into gpufl.libs/ as before. Verified against the actual
+          # quay.io/pypa/manylinux_2_28_x86_64 image (OpenSSL 3.5.5).
+          CIBW_ENVIRONMENT_LINUX: "CUDA_HOME=/usr/local/cuda PATH=/usr/local/cuda/bin:$PATH CMAKE_ARGS='-DGPUFL_ENABLE_NVIDIA=ON -DGPUFL_ENABLE_AMD=OFF -DBUILD_TESTING=OFF -DOPENSSL_INCLUDE_DIR=/usr/include/openssl3 -DOPENSSL_SSL_LIBRARY=/usr/lib64/openssl3/libssl.so -DOPENSSL_CRYPTO_LIBRARY=/usr/lib64/openssl3/libcrypto.so'"
+          # Windows build needs the OpenSSL install path so find_package(OpenSSL)
+          # in CMakeLists.txt succeeds, otherwise HTTPS upload (HttpLogSink)
+          # silently falls back to HTTP-only — see openssl-windows.html in the
+          # manual repo for the user-facing story. CIBW_BEFORE_ALL_WINDOWS
+          # installs choco's openssl package into this path.
+          CIBW_ENVIRONMENT_WINDOWS: >-
+            OPENSSL_ROOT_DIR="C:/Program Files/OpenSSL-Win64"
+            CMAKE_ARGS="-DGPUFL_ENABLE_NVIDIA=ON -DGPUFL_ENABLE_AMD=OFF -DBUILD_TESTING=OFF"
           # cuda-nvml-devel-13-1 ships the libnvidia-ml.so stub under
           # targets/x86_64-linux/lib/stubs/ — without it CMake's NVML probe
           # finds nothing and (since v0.1.1) fails the build loudly. Every
           # release before v0.1.1 silently shipped wheels without NVML
           # because this package was missing here.
+          #
+          # openssl3-devel (NOT openssl-devel — that's 1.1.1 on EL8) provides
+          # OpenSSL 3.x headers + .so symlinks so cpp-httplib's
+          # CPPHTTPLIB_OPENSSL_SUPPORT path compiles (it #errors on < 3.0.0).
+          # It installs under /usr/include/openssl3 + /usr/lib64/openssl3 —
+          # see the OPENSSL_* hints in CIBW_ENVIRONMENT_LINUX. auditwheel
+          # bundles the resulting libssl.so.3 / libcrypto.so.3 (from
+          # /usr/lib64) into the wheel under gpufl.libs/ automatically
+          # (they're not on the manylinux_2_28 whitelist or our --exclude list).
           CIBW_BEFORE_ALL_LINUX: >-
             curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo > /etc/yum.repos.d/cuda.repo &&
-            dnf install -y --nogpgcheck cuda-nvcc-13-1 cuda-cudart-devel-13-1 cuda-cupti-13-1 cuda-driver-devel-13-1 cuda-nvml-devel-13-1
-          CIBW_MANYLINUX_X86_64_IMAGE: manylinux_2_28
+            dnf install -y --nogpgcheck cuda-nvcc-13-1 cuda-cudart-devel-13-1 cuda-cupti-13-1 cuda-driver-devel-13-1 cuda-nvml-devel-13-1 openssl3-devel
+          # Install OpenSSL on the Windows runner so find_package(OpenSSL)
+          # in CMakeLists.txt succeeds and cpp-httplib gets compiled with
+          # CPPHTTPLIB_OPENSSL_SUPPORT=1. Chocolatey is pre-installed on
+          # GitHub's windows-latest runners; the openssl package installs
+          # to C:\Program Files\OpenSSL-Win64\ (matches OPENSSL_ROOT_DIR
+          # in CIBW_ENVIRONMENT_WINDOWS above).
+          CIBW_BEFORE_ALL_WINDOWS: choco install -y openssl --no-progress
+          # cibuildwheel ships NO default Windows repair command and only
+          # auto-installs delvewheel for that (nonexistent) default. Because
+          # we override CIBW_REPAIR_WHEEL_COMMAND_WINDOWS below (to pass
+          # --add-path for the OpenSSL DLLs), we must install delvewheel
+          # ourselves. The repair step runs in this same build env, so the
+          # tool lands on PATH. Without this the repair dies with
+          # "'delvewheel' is not recognized as an internal or external command".
+          CIBW_BEFORE_BUILD_WINDOWS: pip install delvewheel
+          # Pin a recent manylinux_2_28 image (AlmaLinux 8.10). cibuildwheel
+          # 2.22.0's default pin (2024.11.16-1) is an AlmaLinux 8.6 snapshot
+          # whose repos only carry OpenSSL 1.1.1 — there is NO openssl3
+          # package — so the build died with "No match for argument:
+          # openssl3-devel". 8.10 ships openssl3 (3.5.5). Same glibc-2.28 /
+          # manylinux_2_28 ABI; verified to have cp312+cp313, openssl3-devel,
+          # and libssl.so.3 in /usr/lib64 (so auditwheel still bundles it).
+          CIBW_MANYLINUX_X86_64_IMAGE: quay.io/pypa/manylinux_2_28_x86_64:2026.05.17-1
           CIBW_BUILD: "cp312-manylinux_x86_64 cp313-manylinux_x86_64 cp312-win_amd64 cp313-win_amd64"
           # libnvidia-ml.so.1 is excluded for the same reason as libcuda.so.1:
           # it ships with the NVIDIA driver, not the CUDA toolkit, and is
@@ -156,7 +205,39 @@ jobs:
           # could not be located"). The toolkit's `libnvidia-ml.so` stub is
           # only the unversioned link-time placeholder — the versioned
           # `.so.1` the SONAME chains to lives on the user's machine.
+          #
+          # libssl/libcrypto are NOT excluded — auditwheel bundles them
+          # under gpufl.libs/ so the wheel ships its own OpenSSL and
+          # HTTPS works on user machines without any system install.
           CIBW_REPAIR_WHEEL_COMMAND_LINUX: "auditwheel repair --plat manylinux_2_28_x86_64 --exclude libcuda.so.1 --exclude libnvidia-ml.so.1 -w {dest_dir} {wheel}"
+          # On Windows, cibuildwheel 2.x has no default repair command, so we supply `delvewheel repair` ourselves.
+          # We need delvewheel to find the OpenSSL DLLs (libssl-3-x64.dll,
+          # libcrypto-3-x64.dll) so it copies them into the wheel. The
+          # choco install puts them under C:\Program Files\OpenSSL-Win64\bin\
+          # — give that to delvewheel via --add-path. Without this, the
+          # rebuilt wheel imports cleanly on a system that already has
+          # OpenSSL on PATH but fails on a clean machine.
+          # delvewheel vendors the wheel's DLL deps. We must:
+          #  * --add-path the dirs holding the DLLs to bundle. OpenSSL is in
+          #    its choco bin; cudart64_*.dll is in CUDA\vX.Y\bin (on PATH, but
+          #    listed for safety); cupti64_*.dll lives in CUDA's
+          #    extras\CUPTI\lib64, which is NOT on PATH — without it delvewheel
+          #    fails with "Unable to find library: cupti64_2025.4.0.dll". We
+          #    bundle cudart+cupti so the wheel is self-contained, matching the
+          #    Linux wheel (auditwheel bundles libcudart/libcupti). The CUPTI
+          #    version-suffixed DLL name also makes excluding it fragile across
+          #    CUDA point releases, so bundling is the robust choice.
+          #  * --exclude the driver DLLs that ship with the user's driver, not
+          #    the toolkit, and are absent on the GPU-less runner: nvcuda.dll
+          #    (== libcuda.so.1) and nvml.dll (== libnvidia-ml.so.1). Mirrors
+          #    the Linux auditwheel --exclude flags.
+          # Paths use the pinned CUDA 13.1 location (see the Jimver cuda-toolkit
+          # step). Both --add-path and --exclude are ';'-delimited.
+          CIBW_REPAIR_WHEEL_COMMAND_WINDOWS: >-
+            delvewheel repair
+            --add-path "C:\\Program Files\\OpenSSL-Win64\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v13.1\\extras\\CUPTI\\lib64"
+            --exclude "nvcuda.dll;nvml.dll"
+            -w {dest_dir} {wheel}
       - uses: actions/upload-artifact@v4
         with:
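The two repair commands above are meant to leave OpenSSL (plus cudart and CUPTI on Windows) vendored inside the wheel while keeping the driver libraries (`libcuda.so.1`, `libnvidia-ml.so.1`, `nvcuda.dll`, `nvml.dll`) out. The following is a minimal sketch of a post-build sanity check; it is not part of the gpufl workflow. `check_wheel`, `check_names`, and the stem lists are hypothetical, and the sketch assumes both tools place vendored libraries in a `<dist>.libs/` directory and keep the original library name at the start of each file name (they may append a hash).

```python
import zipfile
from pathlib import PurePosixPath

# Library "stems" the repair commands should (or should not) vendor.
# auditwheel and delvewheel may append a hash to the stem, so we match prefixes.
REQUIRED_STEMS = {
    "linux": ["libssl", "libcrypto"],
    "windows": ["libssl-3-x64", "libcrypto-3-x64", "cudart64", "cupti64"],
}
# Driver libraries that ship with the NVIDIA driver and are passed to --exclude.
FORBIDDEN_STEMS = {
    "linux": ["libcuda", "libnvidia-ml"],
    "windows": ["nvcuda", "nvml"],
}


def _has_stem(filename: str, stem: str) -> bool:
    """True if filename is `stem` or `stem` followed by a separator.

    Requiring a separator keeps "libcuda" from matching "libcudart-<hash>.so.13".
    """
    return filename == stem or any(filename.startswith(stem + sep) for sep in "-._")


def vendored_files(names: list[str]) -> list[str]:
    """Lower-cased base names of files stored directly under a '<dist>.libs/' dir."""
    found = []
    for name in names:
        parts = PurePosixPath(name).parts  # wheel archives always use '/'
        if len(parts) >= 2 and parts[-2].endswith(".libs"):
            found.append(parts[-1].lower())
    return found


def check_names(names: list[str], platform: str) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks right."""
    libs = vendored_files(names)
    problems = []
    for stem in REQUIRED_STEMS[platform]:
        if not any(_has_stem(f, stem) for f in libs):
            problems.append(f"missing bundled library: {stem}*")
    for stem in FORBIDDEN_STEMS[platform]:
        if any(_has_stem(f, stem) for f in libs):
            problems.append(f"driver library should not be bundled: {stem}*")
    return problems


def check_wheel(wheel_path: str, platform: str) -> list[str]:
    """Apply check_names to a built .whl file (a wheel is a zip archive)."""
    with zipfile.ZipFile(wheel_path) as zf:
        return check_names(zf.namelist(), platform)
```

For example, `check_wheel("wheelhouse/<wheel>.whl", "windows")` could run after cibuildwheel finishes; a non-empty result would point at the kind of clean-machine failure the workflow comments describe.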

{gpufl-0.1.4 → gpufl-1.0.0}/CMakeLists.txt RENAMED

@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.31)
 project(gpufl_client
-    VERSION 0.1.4
+    VERSION 1.0.0
     LANGUAGES CXX
     DESCRIPTION "Header-only GPU monitoring client library"
 )

{gpufl-0.1.4 → gpufl-1.0.0}/Dockerfile.monitor RENAMED

@@ -5,10 +5,14 @@
 #
 # Run:
 #   docker run --gpus all \
-#     -e GPUFL_HTTP_URL=http://my-backend:8080/api/v1/events/ \
+#     -e GPUFL_HTTP_HOST=https://api.gpuflight.com \
 #     -e GPUFL_HTTP_TOKEN=gfl_... \
 #     gpufl/monitor:latest
 #
+# GPUFL_HTTP_HOST is just the scheme+host. The agent appends the
+# /api/{version}/events/<type> path automatically; override the
+# version with GPUFL_HTTP_API_VERSION when the backend cuts v2 etc.
+#
 # The Java agent JAR is pulled from the pre-built ghcr.io/gpu-flight/gpufl-agent image.
 # No local gpufl-agent checkout is required.
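The new comment moves path construction into the agent: users give only the scheme+host, and the agent derives `/api/{version}/events/<type>`. The gpufl agent is a Java program whose code is not part of this diff, so the following is only a Python model of that rule. The helper names are hypothetical, trailing-slash stripping and the scheme check are assumptions, and `device` / `scope` / `system` are used as example event types because they are the default `GPUFL_LOG_TYPES` channels. `host_from_legacy_url` shows how an old `GPUFL_HTTP_URL` value maps onto the new variable.

```python
from urllib.parse import urlsplit


def event_url(host: str, event_type: str, api_version: str = "v1") -> str:
    """Build <scheme+host>/api/<version>/events/<type> from GPUFL_HTTP_HOST-style input.

    Stripping a trailing '/' from the host is an assumption about normalization.
    """
    if not host.startswith(("http://", "https://")):
        raise ValueError(f"expected scheme+host such as https://api.gpuflight.com, got {host!r}")
    return f"{host.rstrip('/')}/api/{api_version}/events/{event_type}"


def host_from_legacy_url(legacy_url: str) -> str:
    """Reduce an old GPUFL_HTTP_URL value to the scheme+host GPUFL_HTTP_HOST expects."""
    parts = urlsplit(legacy_url)
    return f"{parts.scheme}://{parts.netloc}"
```

Bumping `GPUFL_HTTP_API_VERSION` then corresponds to changing only the `api_version` argument, which is the point of splitting host and path.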

{gpufl-0.1.4 → gpufl-1.0.0}/Dockerfile.monitor.amd RENAMED

@@ -7,10 +7,14 @@
 #   docker run -d \
 #     --device /dev/kfd --device /dev/dri \
 #     --group-add video --group-add render \
-#     -e GPUFL_HTTP_URL=http://my-backend:8080/api/v1/events/ \
+#     -e GPUFL_HTTP_HOST=https://api.gpuflight.com \
 #     -e GPUFL_HTTP_TOKEN=gfl_... \
 #     gpufl/monitor-amd:latest
 #
+# GPUFL_HTTP_HOST is just the scheme+host. The agent appends the
+# /api/{version}/events/<type> path automatically; override the
+# version with GPUFL_HTTP_API_VERSION when the backend cuts v2 etc.
+#
 # The Java agent JAR is pulled from the pre-built ghcr.io/gpu-flight/gpufl-agent image.
 # No local gpufl-agent checkout is required.

{gpufl-0.1.4 → gpufl-1.0.0}/Dockerfile.monitor.supervisord.conf RENAMED

@@ -21,7 +21,9 @@ stderr_logfile_maxbytes=0
 ;   GPUFL_SOURCE_FOLDER  — must match the log dir used by gpufl-monitor
 ;   GPUFL_SOURCE_PREFIX  — must match GPUFL_MONITOR_LOG_DIR base name (default: session)
 ;   GPUFL_PUBLISHER_TYPE — http or kafka
-;   GPUFL_HTTP_URL       — e.g. http://backend:8080/api/v1/events/
+;   GPUFL_HTTP_HOST      — scheme+host, e.g. https://api.gpuflight.com
+;                          (agent appends /api/{version}/events/<type> automatically)
+;   GPUFL_HTTP_API_VERSION — optional; defaults to v1
 ;   GPUFL_HTTP_TOKEN     — Bearer token
 ;   GPUFL_LOG_TYPES      — default: device,scope,system (override to restrict channels)
 ;   GPUFL_CURSOR_FILE    — default: ./cursor.json (override for persistence across restarts)

{gpufl-0.1.4 → gpufl-1.0.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: gpufl
-Version: 0.1.4
+Version: 1.0.0
 Summary: GPU Monitoring Client
 Author-Email: Myoungho Shin <myounghoshin84@gmail.com>
 Classifier: Development Status :: 3 - Alpha
@@ -22,7 +22,7 @@ Requires-Dist: jax>=0.4; extra == "jax"
 Provides-Extra: triton
 Requires-Dist: triton>=2.1; extra == "triton"
 Provides-Extra: numba
-Requires-Dist: numba; extra == "numba"
+Requires-Dist: numba-cuda; extra == "numba"
 Requires-Dist: numpy; extra == "numba"
 Provides-Extra: viz
 Requires-Dist: pandas>=1.5; extra == "viz"
@@ -36,7 +36,7 @@ Requires-Dist: requests>=2.28; extra == "all"
 Requires-Dist: cupy-cuda12x>=12; extra == "all"
 Requires-Dist: jax>=0.4; extra == "all"
 Requires-Dist: triton>=2.1; extra == "all"
-Requires-Dist: numba; extra == "all"
+Requires-Dist: numba-cuda; extra == "all"
 Requires-Dist: numpy; extra == "all"
 Requires-Dist: pandas>=1.5; extra == "all"
 Requires-Dist: matplotlib>=3.7; extra == "all"
@@ -63,7 +63,7 @@ To keep the initial design coherent, **we are not currently accepting major feat
 Try the portal with real session data — no sign-up required:
-**[https://gpufl-front.vercel.app/demo/gdemo_4R98GA5MzYdosqvNsUqdp_MaUgEcDHABS2C5PHbCDQE](https://gpufl-front.vercel.app/demo/gdemo_4R98GA5MzYdosqvNsUqdp_MaUgEcDHABS2C5PHbCDQE)**
+**[Demo Link](https://demo.gpuflight.com)**
 ## Key Features
@@ -128,7 +128,6 @@ import gpufl
 gpufl.init("my-app",
            log_path="./my_logs",
            sampling_auto_start=True,
-           enable_kernel_details=True,
            enable_stack_trace=True)
 a = torch.randn(1024, 1024, device="cuda")
@@ -150,8 +149,7 @@ from gpufl import ProfilingEngine
 gpufl.init("my-app",
            log_path="./logs",
-           profiling_engine=ProfilingEngine.PcSampling,
-           enable_kernel_details=True)
+           profiling_engine=ProfilingEngine.PcSampling)
 ```
 | Engine | What it collects | Analyzer method | Best for |
@@ -167,7 +165,6 @@ gpufl.init("my-app",
 gpufl::InitOptions opts;
 opts.app_name = "my_app";
 opts.log_path = "my_logs";
-opts.enable_kernel_details = true;
 opts.enable_stack_trace = true;
 opts.sampling_auto_start = true;
 opts.profiling_engine = gpufl::ProfilingEngine::SassMetrics;
@@ -305,6 +302,60 @@ viz.show()
 ---
+## Report Generation
+For a quick, shareable text summary of a session — session metadata, kernel
+hotspots, duration percentiles, and system metrics — generate a **text report**.
+It's the fastest way to see "what happened" without opening the dashboard, and
+it drops cleanly into CI logs, PR comments, or a plain terminal.
+![Text report example](images/Screenshot2.png)
+The report includes:
+- **Session Summary** — app name, session ID, duration, GPU device + SM count.
+- **Kernel Execution Summary** — total / unique kernels, GPU-busy %, and
+  duration statistics (avg / median / P90 / P99 / min / max). When a SASS
+  profiling engine was active, kernel durations include instrumentation
+  overhead and the report labels them accordingly.
+- **Top kernels by total GPU time** — with per-kernel call counts.
+- **Per-kernel details** — grid/block dimensions, occupancy, registers,
+  shared memory (static + dynamic), register spills, and Waves/SM.
+### From C++
+Call `generateReport()` after `shutdown()` — it reads the NDJSON logs written
+during the session:
+```cpp
+gpufl::init(opts);
+// ... your CUDA / HIP work ...
+gpufl::shutdown();
+gpufl::generateReport();               // print to stdout
+gpufl::generateReport("report.txt");   // or save to a file
+```
+### From Python
+```python
+from gpufl.report import generate_report
+# Print the report — wrap in print() so newlines render. In a Jupyter
+# notebook this also keeps the table columns aligned (stdout renders in
+# a monospace font). A bare `generate_report(...)` as a cell's last
+# expression shows an escaped one-line string, so always print() it.
+text = generate_report("./logs", log_prefix="my_app", top_n=10)
+print(text)
+# Or save it straight to a file
+generate_report("./logs", log_prefix="my_app", top_n=10, output_path="report.txt")
+```
+The Python version reads the same NDJSON logs the analyzer uses — no GPU
+required, so you can generate reports from logs copied off another machine.
+---
 ## Testing
 ### C++ Tests
@@ -345,5 +396,13 @@ To allow non-root users to profile GPU kernels (using CUPTI/PC Sampling) on Linu
 ---
-*GPU Flight is open source: [github.com/gpu-flight](https://github.com/gpu-flight)*
-*Python package: [pypi.org/project/gpufl](https://pypi.org/project/gpufl/)*
+## Where your logs go
+By default the client writes NDJSON to disk. To stream them to a hosted
+dashboard, set `backend_url` + `api_key` (or the `GPUFL_BACKEND_URL` /
+`GPUFL_API_KEY` env vars) and they're delivered live to
+[app.gpuflight.com](https://app.gpuflight.com). Create a workspace at
+[gpuflight.com](https://gpuflight.com).
+This client (gpufl-client) is open source. The ingestion service and
+the dashboard UI are proprietary and managed-only today.
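The "Where your logs go" text says uploads are configured either with `backend_url` + `api_key` or with the `GPUFL_BACKEND_URL` / `GPUFL_API_KEY` environment variables. The following is a minimal sketch of resolving those settings, modeled on the updated `02_numba_cuda.py` example, which enables upload only when an API key is present. The helper name `resolve_upload_settings`, the precedence (explicit argument over environment), and the extra requirement that a URL also be set are assumptions, not gpufl API.

```python
import os
from typing import Mapping, Optional


def resolve_upload_settings(
    backend_url: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Pick upload settings from explicit arguments, else from environment variables.

    ASSUMPTION: explicit arguments win over GPUFL_BACKEND_URL / GPUFL_API_KEY, and
    upload is enabled only when both a URL and a key are available.
    """
    url = backend_url or env.get("GPUFL_BACKEND_URL", "")
    key = api_key or env.get("GPUFL_API_KEY", "")
    return {"backend_url": url, "api_key": key, "remote_upload": bool(url and key)}
```

The returned keys match the keyword arguments the numba example passes to `gfl.init`, so a call such as `gpufl.init("my-app", log_path="./logs", **resolve_upload_settings())` would fall back to local files whenever no key is configured.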

{gpufl-0.1.4 → gpufl-1.0.0}/README.md RENAMED

@@ -18,7 +18,7 @@ To keep the initial design coherent, **we are not currently accepting major feat
 Try the portal with real session data — no sign-up required:
-**[https://gpufl-front.vercel.app/demo/gdemo_4R98GA5MzYdosqvNsUqdp_MaUgEcDHABS2C5PHbCDQE](https://gpufl-front.vercel.app/demo/gdemo_4R98GA5MzYdosqvNsUqdp_MaUgEcDHABS2C5PHbCDQE)**
+**[Demo Link](https://demo.gpuflight.com)**
 ## Key Features
@@ -83,7 +83,6 @@ import gpufl
 gpufl.init("my-app",
            log_path="./my_logs",
            sampling_auto_start=True,
-           enable_kernel_details=True,
            enable_stack_trace=True)
 a = torch.randn(1024, 1024, device="cuda")
@@ -105,8 +104,7 @@ from gpufl import ProfilingEngine
 gpufl.init("my-app",
            log_path="./logs",
-           profiling_engine=ProfilingEngine.PcSampling,
-           enable_kernel_details=True)
+           profiling_engine=ProfilingEngine.PcSampling)
 ```
 | Engine | What it collects | Analyzer method | Best for |
@@ -122,7 +120,6 @@ gpufl.init("my-app",
 gpufl::InitOptions opts;
 opts.app_name = "my_app";
 opts.log_path = "my_logs";
-opts.enable_kernel_details = true;
 opts.enable_stack_trace = true;
 opts.sampling_auto_start = true;
 opts.profiling_engine = gpufl::ProfilingEngine::SassMetrics;
@@ -260,6 +257,60 @@ viz.show()
 ---
+## Report Generation
+For a quick, shareable text summary of a session — session metadata, kernel
+hotspots, duration percentiles, and system metrics — generate a **text report**.
+It's the fastest way to see "what happened" without opening the dashboard, and
+it drops cleanly into CI logs, PR comments, or a plain terminal.
+![Text report example](images/Screenshot2.png)
+The report includes:
+- **Session Summary** — app name, session ID, duration, GPU device + SM count.
+- **Kernel Execution Summary** — total / unique kernels, GPU-busy %, and
+  duration statistics (avg / median / P90 / P99 / min / max). When a SASS
+  profiling engine was active, kernel durations include instrumentation
+  overhead and the report labels them accordingly.
+- **Top kernels by total GPU time** — with per-kernel call counts.
+- **Per-kernel details** — grid/block dimensions, occupancy, registers,
+  shared memory (static + dynamic), register spills, and Waves/SM.
+### From C++
+Call `generateReport()` after `shutdown()` — it reads the NDJSON logs written
+during the session:
+```cpp
+gpufl::init(opts);
+// ... your CUDA / HIP work ...
+gpufl::shutdown();
+gpufl::generateReport();               // print to stdout
+gpufl::generateReport("report.txt");   // or save to a file
+```
+### From Python
+```python
+from gpufl.report import generate_report
+# Print the report — wrap in print() so newlines render. In a Jupyter
+# notebook this also keeps the table columns aligned (stdout renders in
+# a monospace font). A bare `generate_report(...)` as a cell's last
+# expression shows an escaped one-line string, so always print() it.
+text = generate_report("./logs", log_prefix="my_app", top_n=10)
+print(text)
+# Or save it straight to a file
+generate_report("./logs", log_prefix="my_app", top_n=10, output_path="report.txt")
+```
+The Python version reads the same NDJSON logs the analyzer uses — no GPU
+required, so you can generate reports from logs copied off another machine.
+---
 ## Testing
 ### C++ Tests
@@ -300,5 +351,13 @@ To allow non-root users to profile GPU kernels (using CUPTI/PC Sampling) on Linu
 ---
-*GPU Flight is open source: [github.com/gpu-flight](https://github.com/gpu-flight)*
-*Python package: [pypi.org/project/gpufl](https://pypi.org/project/gpufl/)*
+## Where your logs go
+By default the client writes NDJSON to disk. To stream them to a hosted
+dashboard, set `backend_url` + `api_key` (or the `GPUFL_BACKEND_URL` /
+`GPUFL_API_KEY` env vars) and they're delivered live to
+[app.gpuflight.com](https://app.gpuflight.com). Create a workspace at
+[gpuflight.com](https://gpuflight.com).
+This client (gpufl-client) is open source. The ingestion service and
+the dashboard UI are proprietary and managed-only today.
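The new Report Generation section in README.md (mirrored in PKG-INFO) lists avg / median / P90 / P99 / min / max kernel durations. The sketch below computes those statistics using the nearest-rank percentile definition; the diff does not say which percentile method gpufl's reporter uses, so treat that choice, and the function names, as assumptions.

```python
import math
from statistics import mean, median


def nearest_rank(sorted_vals: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    rank = max(1, math.ceil(p * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def duration_stats(durations: list[float]) -> dict[str, float]:
    """avg / median / P90 / P99 / min / max over kernel durations (any time unit)."""
    if not durations:
        raise ValueError("no kernel durations")
    vals = sorted(durations)
    return {
        "avg": mean(vals),
        "median": median(vals),
        "p90": nearest_rank(vals, 90),
        "p99": nearest_rank(vals, 99),
        "min": vals[0],
        "max": vals[-1],
    }
```

With nearest-rank, P99 equals the maximum whenever fewer than 100 launches were recorded, so short sessions show identical P99 and max columns; interpolating methods would report slightly different values.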

{gpufl-0.1.4 → gpufl-1.0.0}/daemon/README.md RENAMED

@@ -79,7 +79,7 @@ docker build \
 Copy `.env.example` to `.env` and set the required variables, then:
 ```bash
-GPUFL_HTTP_URL=https://your-backend/api/v1/events/ \
+GPUFL_HTTP_HOST=https://your-backend \
 GPUFL_HTTP_TOKEN=gfl_your_token_here \
 docker compose -f docker-compose.monitor.yml up -d
 ```
@@ -99,7 +99,7 @@ docker compose -f docker-compose.monitor.yml down
 ### AMD
 ```bash
-GPUFL_HTTP_URL=https://your-backend/api/v1/events/ \
+GPUFL_HTTP_HOST=https://your-backend \
 GPUFL_HTTP_TOKEN=gfl_your_token_here \
 docker compose -f docker-compose.monitor.amd.yml up -d
 ```
@@ -127,7 +127,7 @@ docker run -d \
   --name gpufl-monitor \
   --gpus all \
   --restart unless-stopped \
-  -e GPUFL_HTTP_URL=https://your-backend/api/v1/events/ \
+  -e GPUFL_HTTP_HOST=https://your-backend \
   -e GPUFL_HTTP_TOKEN=gfl_your_token_here \
   -v gpufl-cursor:/var/gpufl/monitor \
   gpufl/monitor:latest
@@ -141,7 +141,7 @@ docker run -d \
   --device /dev/kfd --device /dev/dri \
   --group-add video --group-add render \
   --restart unless-stopped \
-  -e GPUFL_HTTP_URL=https://your-backend/api/v1/events/ \
+  -e GPUFL_HTTP_HOST=https://your-backend \
   -e GPUFL_HTTP_TOKEN=gfl_your_token_here \
   -v gpufl-cursor-amd:/var/gpufl/monitor \
   gpufl/monitor-amd:latest
@@ -175,7 +175,8 @@ The named volume persists the agent's read cursor so it resumes from where it le
 | Variable | Default | Description |
 |---|---|---|
 | `GPUFL_PUBLISHER_TYPE` | `http` | Publisher backend: `http` or `kafka` |
-| `GPUFL_HTTP_URL` | *(required)* | Backend ingest URL, e.g. `https://app.gpuflight.io/api/v1/events/` |
+| `GPUFL_HTTP_HOST` | *(required)* | Backend scheme+host, e.g. `https://api.gpuflight.com`. The agent appends `/api/{version}/events/<type>` automatically. |
+| `GPUFL_HTTP_API_VERSION` | `v1` | Backend API version. Bump when the backend cuts v2 etc. |
 | `GPUFL_HTTP_TOKEN` | *(empty)* | Bearer token for the backend API |
 | `GPUFL_HTTP_TIMEOUT_SEC` | `10` | HTTP request timeout in seconds |
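The table pairs each publisher variable with its default, and the compose files apply those defaults with `${VAR:-default}`, which also substitutes when a variable is set but empty. The following is a minimal Python sketch of loading the same settings with identical fallback semantics. `PublisherConfig`, `load_publisher_config`, and `auth_headers` are hypothetical names, and requiring `GPUFL_HTTP_HOST` only for the `http` publisher is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


def _env(env: Mapping[str, str], key: str, default: str) -> str:
    """Like compose's ${KEY:-default}: fall back when the variable is unset OR empty."""
    return env.get(key) or default


@dataclass(frozen=True)
class PublisherConfig:
    publisher_type: str
    http_host: str
    api_version: str
    token: str
    timeout_sec: int


def load_publisher_config(env: Mapping[str, str]) -> PublisherConfig:
    """Read the publisher variables from the daemon README table, applying its defaults."""
    publisher = _env(env, "GPUFL_PUBLISHER_TYPE", "http")
    if publisher not in ("http", "kafka"):
        raise ValueError(f"GPUFL_PUBLISHER_TYPE must be http or kafka, got {publisher!r}")
    host = _env(env, "GPUFL_HTTP_HOST", "").strip()
    # ASSUMPTION: the host is only mandatory when the HTTP publisher is selected.
    if publisher == "http" and not host:
        raise ValueError("GPUFL_HTTP_HOST is required (scheme+host, e.g. https://api.gpuflight.com)")
    return PublisherConfig(
        publisher_type=publisher,
        http_host=host,
        api_version=_env(env, "GPUFL_HTTP_API_VERSION", "v1"),
        token=_env(env, "GPUFL_HTTP_TOKEN", ""),
        timeout_sec=int(_env(env, "GPUFL_HTTP_TIMEOUT_SEC", "10")),
    )


def auth_headers(cfg: PublisherConfig) -> dict[str, str]:
    """Bearer-token header, omitted when GPUFL_HTTP_TOKEN is empty (its default)."""
    return {"Authorization": f"Bearer {cfg.token}"} if cfg.token else {}
```

Using `env.get(key) or default` rather than `env.get(key, default)` matters here: compose passes `GPUFL_HTTP_TOKEN: ${GPUFL_HTTP_TOKEN:-}` as an empty string, and other variables can arrive empty in the same way.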

{gpufl-0.1.4 → gpufl-1.0.0}/docker-compose.monitor.amd.yml RENAMED

@@ -22,9 +22,13 @@ services:
       GPUFL_LOG_TYPES: ${GPUFL_LOG_TYPES:-device,scope,system}
       GPUFL_CURSOR_FILE: ${GPUFL_CURSOR_FILE:-/var/gpufl/monitor/cursor.json}
-      # Java agent — publisher (HTTP)
+      # Java agent — publisher (HTTP). Set GPUFL_HTTP_HOST to just the
+      # scheme+host (e.g. https://api.gpuflight.com); the agent builds
+      # the /api/{version}/events/<type> path itself. Override the
+      # version via GPUFL_HTTP_API_VERSION when the backend bumps to v2.
       GPUFL_PUBLISHER_TYPE: ${GPUFL_PUBLISHER_TYPE:-http}
-      GPUFL_HTTP_URL: ${GPUFL_HTTP_URL}
+      GPUFL_HTTP_HOST: ${GPUFL_HTTP_HOST}
+      GPUFL_HTTP_API_VERSION: ${GPUFL_HTTP_API_VERSION:-v1}
       GPUFL_HTTP_TOKEN: ${GPUFL_HTTP_TOKEN:-}
       GPUFL_HTTP_TIMEOUT_SEC: ${GPUFL_HTTP_TIMEOUT_SEC:-10}

{gpufl-0.1.4 → gpufl-1.0.0}/docker-compose.monitor.yml RENAMED

@@ -39,9 +39,13 @@ services:
       GPUFL_SOURCE_FOLDERS: ${GPUFL_SOURCE_FOLDERS:-/var/gpufl/monitor,/var/gpufl/demo}
       GPUFL_CURSOR_FILE: ${GPUFL_CURSOR_FILE:-/var/gpufl/monitor/cursor.json}
-      # Java agent — publisher (HTTP)
+      # Java agent — publisher (HTTP). Set GPUFL_HTTP_HOST to just the
+      # scheme+host (e.g. https://api.gpuflight.com); the agent builds
+      # the /api/{version}/events/<type> path itself. Override the
+      # version via GPUFL_HTTP_API_VERSION when the backend bumps to v2.
       GPUFL_PUBLISHER_TYPE: ${GPUFL_PUBLISHER_TYPE:-http}
-      GPUFL_HTTP_URL: ${GPUFL_HTTP_URL}
+      GPUFL_HTTP_HOST: ${GPUFL_HTTP_HOST}
+      GPUFL_HTTP_API_VERSION: ${GPUFL_HTTP_API_VERSION:-v1}
       GPUFL_HTTP_TOKEN: ${GPUFL_HTTP_TOKEN:-}
       GPUFL_HTTP_TIMEOUT_SEC: ${GPUFL_HTTP_TIMEOUT_SEC:-10}

{gpufl-0.1.4 → gpufl-1.0.0}/example/amd/gpufl_scope_demo.cpp RENAMED

@@ -130,7 +130,6 @@ int main() {
     opts.system_sample_rate_ms = 50;
     opts.kernel_sample_rate_ms = 0;
     opts.sampling_auto_start = true;
-    opts.enable_kernel_details = true;
     opts.enable_debug_output = true;
     opts.enable_stack_trace = false;
     opts.profiling_engine = gpufl::ProfilingEngine::SassMetrics;

{gpufl-0.1.4 → gpufl-1.0.0}/example/cuda/block_style_example.cu RENAMED

@@ -38,7 +38,6 @@ int main() {
     opts.log_path = "gfl_block";
     opts.system_sample_rate_ms = 50;
     opts.kernel_sample_rate_ms = 50;
-    opts.enable_kernel_details = true;
     opts.sampling_auto_start = true;
     opts.enable_debug_output = true;
     opts.enable_source_collection = true;

{gpufl-0.1.4 → gpufl-1.0.0}/example/cuda/memory_coalescing_demo.cu RENAMED

@@ -61,7 +61,6 @@ int main() {
     opts.log_path = "memory_coalescing_demo";
     opts.system_sample_rate_ms = 10;
     opts.kernel_sample_rate_ms = 10;
-    opts.enable_kernel_details = true;
     opts.sampling_auto_start = true;
     opts.enable_debug_output = true;
     opts.profiling_engine = gpufl::ProfilingEngine::PcSamplingWithSass;

{gpufl-0.1.4 → gpufl-1.0.0}/example/cuda/occupancy_demo.cu RENAMED

@@ -79,7 +79,6 @@ int main()
     gpufl::InitOptions opts;
     opts.app_name              = "occupancy_demo";
     opts.log_path              = "occupancy_demo.log";
-    opts.enable_kernel_details = true;  // required for occupancy breakdown fields
     opts.sampling_auto_start   = true;
     opts.enable_debug_output   = false;
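This hunk deletes `enable_kernel_details`, whose comment tied it to the occupancy breakdown fields, and the README's new report lists occupancy and Waves/SM per kernel. The diff does not show how gpufl computes either figure, so the following sketch uses the common definitions (similar to what Nsight Compute reports as theoretical occupancy and waves per SM). It takes `max_blocks_per_sm` as an input, for instance the value returned by CUDA's `cudaOccupancyMaxActiveBlocksPerMultiprocessor`, which already accounts for register and shared-memory limits. The function names are hypothetical.

```python
import math


def theoretical_occupancy(block_threads: int, max_blocks_per_sm: int,
                          max_warps_per_sm: int, warp_size: int = 32) -> float:
    """Active warps per SM divided by the SM's warp capacity, capped at 1.0."""
    warps_per_block = math.ceil(block_threads / warp_size)  # a partial warp still takes a slot
    return min(1.0, max_blocks_per_sm * warps_per_block / max_warps_per_sm)


def waves_per_sm(grid_blocks: int, sm_count: int, max_blocks_per_sm: int) -> float:
    """Blocks launched divided by the number of blocks the whole GPU can hold at once."""
    if sm_count <= 0 or max_blocks_per_sm <= 0:
        raise ValueError("sm_count and max_blocks_per_sm must be positive")
    return grid_blocks / (sm_count * max_blocks_per_sm)
```

`max_warps_per_sm` depends on the GPU: 64 on compute capability 8.0 and 9.0 parts, 48 on 8.6 and 8.9. A result such as `waves_per_sm(200, 80, 2) == 1.25` means the final partial wave (40 of 160 block slots) runs while most SMs sit idle, which is why fractional wave counts are worth flagging in a per-kernel report.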

{gpufl-0.1.4 → gpufl-1.0.0}/example/cuda/sass_divergence_demo.cu RENAMED

@@ -144,15 +144,12 @@ int main() {
     gpufl::InitOptions opts;
     opts.app_name = "sass_divergence_demo";
     opts.log_path = "sass_divergence";
-    if (const char* k = std::getenv("GPUFL_API_KEY")) opts.api_key = k;
-    opts.backend_url = "http://localhost:8080";
-    opts.remote_upload = true;
+    opts.remote_upload = false;
     opts.system_sample_rate_ms = 10;
-    opts.enable_kernel_details = true;
     opts.enable_debug_output = true;
     opts.sampling_auto_start = true;
     opts.enable_stack_trace = true;
-    opts.profiling_engine = gpufl::ProfilingEngine::PcSampling;
+    opts.profiling_engine = gpufl::ProfilingEngine::PcSamplingWithSass;
     if (!gpufl::init(opts)) {
         std::cerr << "Failed to initialize gpufl" << std::endl;

{gpufl-0.1.4 → gpufl-1.0.0}/example/cuda/vector_add_benchmark.cu RENAMED

@@ -27,7 +27,6 @@ int main() {
     opts.log_path = "vector_add_benchmark";
     opts.system_sample_rate_ms = 50;
     opts.kernel_sample_rate_ms = 50;
-    opts.enable_kernel_details = true;
     opts.sampling_auto_start = true;
     opts.enable_debug_output = true;
     opts.enable_source_collection = true;

{gpufl-0.1.4 → gpufl-1.0.0}/example/python/02_numba_cuda.py RENAMED

@@ -1,7 +1,9 @@
 import gpufl as gfl
+from gpufl.report import generate_report
 import numpy as np
 from numba import cuda
 import math
+import os
 import time
 # --- 1. Define a Real CUDA Kernel (Matrix Mul) ---
@@ -20,9 +22,32 @@ def matmul_kernel(A, B, C):
 def run_benchmark():
     # --- 2. Initialize GPUFL ---
-    # We enable the background sampler (16ms) to catch VRAM/Power usage during the heavy compute
+    # LOG_PATH is the file prefix the FileLogSink writes to — it produces
+    # <LOG_PATH>.device.log / .scope.log / .system.log. We reuse it below
+    # to point generate_report() at the same files.
+    LOG_PATH = "./gfl_logs"
+    BACKEND_URL = os.environ.get("GPUFL_BACKEND_URL", "api.gpuflight.com")
+    API_KEY = os.environ.get("GPUFL_API_KEY", "")
+    REMOTE_UPLOAD = bool(API_KEY)
     print("[GPUFL] Initializing...")
-    gfl.init("Numba_App", "./gfl_logs", 100)
+    if REMOTE_UPLOAD:
+        print(f"[GPUFL] Live upload ON -> {BACKEND_URL}")
+    else:
+        print("[GPUFL] Live upload OFF (set GPUFL_API_KEY to enable). Local files only.")
+    gfl.init(
+        app_name="Numba_App",
+        log_path=LOG_PATH,
+        sampling_auto_start=True,
+        system_sample_rate_ms=100,
+        enable_debug_output=True,
+        profiling_engine=gfl.ProfilingEngine.PcSamplingWithSass,
+        backend_url=BACKEND_URL,
+        api_key=API_KEY,
+        remote_upload=REMOTE_UPLOAD,
+    )
     try:
         # --- 3. Setup Data (Heavy Load) ---
@@ -70,6 +95,21 @@ def run_benchmark():
         print("[GPUFL] Shutting down...")
         gfl.shutdown()
+        # --- 6. Generate a text report from the logs we just wrote ---
+        # shutdown() above flushes and closes the NDJSON channels, so the
+        # report reflects the full session. generate_report reads the same
+        # logs the analyzer uses — no GPU required for this step. We split
+        # LOG_PATH into (dir, prefix) the way GpuFlightSession expects:
+        #   "./gfl_logs" -> dir=".", prefix="gfl_logs"
+        #                -> reads ./gfl_logs.{device,scope,system}.log
+        # Wrap in print() so the report renders with real newlines (and,
+        # in a Jupyter notebook, in the monospace stdout area so the
+        # kernel tables stay aligned).
+        log_dir = os.path.dirname(LOG_PATH) or "."
+        log_prefix = os.path.basename(LOG_PATH)
+        print("\n[GPUFL] Session report:\n")
+        print(generate_report(log_dir, log_prefix=log_prefix, top_n=10))
 if __name__ == "__main__":
     if cuda.is_available():
         run_benchmark()
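The comments in this hunk describe two conventions: `FileLogSink` writes `<LOG_PATH>.device.log`, `.scope.log`, and `.system.log`, and `generate_report` takes that path split into a directory and a prefix. A small sketch of both steps follows, with hypothetical helper names; the extra `os.path.normpath` call (not in the example) makes a trailing separator harmless.

```python
import os


def split_log_path(log_path: str) -> tuple[str, str]:
    """Split e.g. "./gfl_logs" into (".", "gfl_logs"), the (dir, prefix) pair the example uses."""
    normalized = os.path.normpath(log_path)  # also drops a trailing separator
    return os.path.dirname(normalized) or ".", os.path.basename(normalized)


def channel_files(log_path: str, channels=("device", "scope", "system")) -> list[str]:
    """Paths of the per-channel NDJSON files described in the example's comment."""
    log_dir, prefix = split_log_path(log_path)
    return [os.path.join(log_dir, f"{prefix}.{ch}.log") for ch in channels]
```

Listing `[p for p in channel_files(LOG_PATH) if not os.path.exists(p)]` before calling `generate_report` gives a quick hint when the prefix does not match what the sink actually wrote.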

{gpufl-0.1.4 → gpufl-1.0.0}/example/python/03_pytorch_benchmark.py RENAMED

@@ -31,9 +31,7 @@ def run_stress_test():
                sampling_auto_start=True,
                system_sample_rate_ms=50,
                kernel_sample_rate_ms=50,
-               enable_kernel_details=True,
-               enable_debug_output=True,
-               enable_profiling=True,
+               enable_debug_output=False,
                enable_stack_trace=True,
                # opt-in to memory tracking. Default-off in v1
                # because TF eager and similar workloads can produce
@@ -49,7 +47,7 @@ def run_stress_test():
                remote_upload=remote_upload,
                api_key=api_key,
                backend_url=backend_url,
-               profiling_engine=gpufl.ProfilingEngine.PcSamplingWithSass)
+               profiling_engine=gpufl.ProfilingEngine.PcSampling)
     try:
         # 2. Allocate (Uses approx 3GB VRAM)

gpufl-1.0.0/images/Screenshot2.png ADDED

Binary file
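`Screenshot2.png` is the text-report example embedded by the new Report Generation section, whose "Top kernels by total GPU time" table is derived from the NDJSON logs. The following is a minimal sketch of that aggregation, not gpufl's reporter: the record fields (`type`, `name`, `duration_ns`) are hypothetical placeholders for whatever schema the logs actually use.

```python
import json
from collections import defaultdict
from typing import Iterable


def top_kernels(lines: Iterable[str], top_n: int = 10) -> list[tuple[str, float, int]]:
    """(kernel name, total duration, call count), largest total first.

    HYPOTHETICAL schema: kernel records are JSON objects with "type" == "kernel",
    a "name", and a numeric "duration_ns". gpufl's real field names may differ.
    """
    totals: dict[str, float] = defaultdict(float)
    calls: dict[str, int] = defaultdict(int)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a last line truncated by a crash mid-write
        if not isinstance(rec, dict) or rec.get("type") != "kernel":
            continue
        name = rec.get("name", "<unknown>")
        totals[name] += float(rec.get("duration_ns", 0))
        calls[name] += 1
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [(name, total, calls[name]) for name, total in ranked[:top_n]]
```

Because it accepts any iterable of lines, an open log file can be passed directly; skipping undecodable lines keeps a report usable even if the last record was cut off mid-write.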
