PyPI - kernelmeter - Versions diffs - 0.3.1__tar.gz → 0.4.1__tar.gz - Mend

kernelmeter 0.3.1__tar.gz → 0.4.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

{kernelmeter-0.3.1/src/kernelmeter.egg-info → kernelmeter-0.4.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: kernelmeter
-Version: 0.3.1
+Version: 0.4.1
 Summary: Query every CUDA device attribute without profiling a kernel, and benchmark your kernels against the hardware's speed of light.
 Author: nuemaan
 License: MIT
@@ -82,6 +82,12 @@ Device 0: Tesla T4 (14.6 GiB)
   compute capability        : 7.5
   theoretical mem bandwidth : 320.1 GB/s
   theoretical FP32 peak     : 8.14 TFLOP/s
+  theoretical fp16 tensor   : 65.13 TFLOP/s (dense)
+  architecture (nvml)       : Turing, 2560 CUDA cores
+  pcie link (nvml)          : gen1/3 x8/16
+  memory in use (nvml)      : 450 / 15360 MiB
+  ecc (nvml)                : on
+  vbios (nvml)              : 90.04.96.00.02
   attribute                                        value
   ------------------------------------------------ ------------
@@ -90,16 +96,22 @@ Device 0: Tesla T4 (14.6 GiB)
   max_shared_memory_per_block                      49152
   warp_size                                        32
   clock_rate_khz                                   1590000
-  ...                                              (147 attributes total)
+  ...                                              (148 attributes total)
 ```
-These are the same values Nsight Compute shows as `device__attribute_*`,
-except you don't need to profile a kernel to see them. Add `--json` for
-machine-readable output.
-Every attribute id is probed against the live driver, so the output always
-matches the machine you run it on. Ids newer than the bundled name table
-still show up, just under a generic `attribute_<id>` name.
+The `attribute` table is read straight from the driver via
+`cuDeviceGetAttribute`, the same values Nsight Compute shows as
+`device__attribute_*`, but you don't need to profile a kernel to see them.
+Every id is probed live, so the output matches the machine you run it on;
+ids newer than the bundled name table show up as `attribute_<id>`.
+The `(nvml)` lines come from a second source: NVML, the library behind
+`nvidia-smi`, also shipped with the driver. They surface facts the driver
+attribute enum doesn't have (architecture name, real CUDA core count,
+PCIe link, live memory use, ECC, VBIOS) and are skipped silently if NVML
+isn't present. (The `gen1/3 x8/16` above is the live link: an idle T4
+drops to a lower PCIe state and ramps up under load.) Add `--json` for
+machine-readable output; the NVML block lands under `devices[].nvml`.
 ## Benchmarking a kernel

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/README.md RENAMED

@@ -58,6 +58,12 @@ Device 0: Tesla T4 (14.6 GiB)
   compute capability        : 7.5
   theoretical mem bandwidth : 320.1 GB/s
   theoretical FP32 peak     : 8.14 TFLOP/s
+  theoretical fp16 tensor   : 65.13 TFLOP/s (dense)
+  architecture (nvml)       : Turing, 2560 CUDA cores
+  pcie link (nvml)          : gen1/3 x8/16
+  memory in use (nvml)      : 450 / 15360 MiB
+  ecc (nvml)                : on
+  vbios (nvml)              : 90.04.96.00.02
   attribute                                        value
   ------------------------------------------------ ------------
@@ -66,16 +72,22 @@ Device 0: Tesla T4 (14.6 GiB)
   max_shared_memory_per_block                      49152
   warp_size                                        32
   clock_rate_khz                                   1590000
-  ...                                              (147 attributes total)
+  ...                                              (148 attributes total)
 ```
-These are the same values Nsight Compute shows as `device__attribute_*`,
-except you don't need to profile a kernel to see them. Add `--json` for
-machine-readable output.
-Every attribute id is probed against the live driver, so the output always
-matches the machine you run it on. Ids newer than the bundled name table
-still show up, just under a generic `attribute_<id>` name.
+The `attribute` table is read straight from the driver via
+`cuDeviceGetAttribute`, the same values Nsight Compute shows as
+`device__attribute_*`, but you don't need to profile a kernel to see them.
+Every id is probed live, so the output matches the machine you run it on;
+ids newer than the bundled name table show up as `attribute_<id>`.
+The `(nvml)` lines come from a second source: NVML, the library behind
+`nvidia-smi`, also shipped with the driver. They surface facts the driver
+attribute enum doesn't have (architecture name, real CUDA core count,
+PCIe link, live memory use, ECC, VBIOS) and are skipped silently if NVML
+isn't present. (The `gen1/3 x8/16` above is the live link: an idle T4
+drops to a lower PCIe state and ramps up under load.) Add `--json` for
+machine-readable output; the NVML block lands under `devices[].nvml`.
 ## Benchmarking a kernel
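
The README text in this hunk says the NVML facts land under `devices[].nvml` in `--json` output, and the cli.py change in this release sets that key to null when NVML cannot be loaded. A minimal sketch of consuming that output, assuming the JSON has the shape built by `gather_info` in this diff. The helper name `pcie_summary` and the sample payload (including its `driver_version` value) are illustrative and not part of kernelmeter:

```python
import json

# Sample payload shaped like `kernelmeter info --json` in 0.4.1.
# All values are illustrative; only the key names come from the diff.
SAMPLE = """
{"driver_version": "12.2",
 "devices": [
   {"ordinal": 0, "name": "Tesla T4",
    "nvml": {"architecture": "Turing",
             "pcie_gen_current": 1, "pcie_gen_max": 3,
             "pcie_width_current": 8, "pcie_width_max": 16}},
   {"ordinal": 1, "name": "Tesla T4", "nvml": null}
 ]}
"""


def pcie_summary(info: dict) -> dict:
    """Map device ordinal -> 'genA/B xC/D', or None when the NVML block is absent."""
    out = {}
    for dev in info["devices"]:
        nvml = dev.get("nvml")  # None when NVML was not available
        if not nvml:
            out[dev["ordinal"]] = None
            continue
        out[dev["ordinal"]] = (
            f"gen{nvml['pcie_gen_current']}/{nvml['pcie_gen_max']} "
            f"x{nvml['pcie_width_current']}/{nvml['pcie_width_max']}"
        )
    return out


summary = pcie_summary(json.loads(SAMPLE))
print(summary)
```

In real use the string would come from the CLI's standard output rather than a constant. Consumers should treat a null `nvml` value as "no NVML on this machine", not as an error.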

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "kernelmeter"
-version = "0.3.1"
+version = "0.4.1"
 description = "Query every CUDA device attribute without profiling a kernel, and benchmark your kernels against the hardware's speed of light."
 readme = "README.md"
 license = { text = "MIT" }

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/__init__.py RENAMED

@@ -1,25 +1,28 @@
 """kernelmeter: CUDA device attributes without profiling, and kernel
 benchmarks measured against the hardware's speed of light."""
+from . import extras, occupancy, roofline
 from .bench import REGISTRY, BenchResult, BenchSpec, benchmark, device_peaks, run, run_registry
 from .cudadrv import CudaDriverError, CudaNotAvailableError, Driver
-from . import occupancy, roofline
+from .extras import DeviceExtras
 from .occupancy import Occupancy
 from .peaks import Peaks
-__version__ = "0.3.1"
+__version__ = "0.4.1"
 __all__ = [
     "BenchResult",
     "BenchSpec",
     "CudaDriverError",
     "CudaNotAvailableError",
+    "DeviceExtras",
     "Driver",
     "Occupancy",
     "Peaks",
     "REGISTRY",
     "benchmark",
     "device_peaks",
+    "extras",
     "occupancy",
     "roofline",
     "run",

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/attrs.py RENAMED

@@ -159,9 +159,16 @@ KNOWN_ATTRS: dict[int, str] = {
     146: "host_alloc_dma_buf_supported",
     147: "only_partial_host_native_atomic_supported",
     148: "atomic_reduction_supported",
-    # 149 is CU_DEVICE_ATTRIBUTE_MAX, a sentinel rather than a real
-    # attribute, so it stops here. Anything the driver adds beyond this is
-    # still reported generically as attribute_<id> by the probe below.
+    149: "d3d12_cig_streams_supported",
+    150: "dma_buf_mmap_supported",
+    151: "logical_endpoint_unicast_supported",
+    152: "logical_endpoint_multicast_supported",
+    153: "logical_endpoint_counted_ops_supported",
+    154: "logical_endpoint_unicast_access_on_owner_device_supported",
+    # CU_DEVICE_ATTRIBUTE_MAX (155 as of CUDA 13.x) is a sentinel that
+    # moves up with each toolkit release, not a real attribute. Names stop
+    # at the last defined value; anything newer the driver reports is still
+    # surfaced generically as attribute_<id> by the probe below.
 }
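
The comment in this hunk describes the naming rule: ids present in the table get their name, and anything newer that the driver still answers is surfaced as `attribute_<id>`. The real `attrs.query_all` is not part of this diff, so the following is only a sketch of that rule under stated assumptions: the probe is modelled as a callable returning `None` for unsupported ids, and `name_attributes`, `fake_probe`, and the probing range are hypothetical.

```python
from typing import Callable, Optional

# A few entries copied from the 0.4.1 name table in this hunk.
KNOWN_ATTRS = {
    148: "atomic_reduction_supported",
    149: "d3d12_cig_streams_supported",
    150: "dma_buf_mmap_supported",
}


def name_attributes(probe: Callable[[int], Optional[int]], max_id: int) -> dict:
    """Probe ids 1..max_id, skip unsupported ids, and name the rest."""
    result = {}
    for attr_id in range(1, max_id + 1):
        value = probe(attr_id)
        if value is None:  # assumed convention: the driver rejected this id
            continue
        # Fall back to a generic name for ids the table does not know yet.
        name = KNOWN_ATTRS.get(attr_id, f"attribute_{attr_id}")
        result[name] = value
    return result


def fake_probe(attr_id: int) -> Optional[int]:
    """Hypothetical driver stand-in: supports 150 (named) and 160 (unnamed)."""
    return {150: 1, 160: 7}.get(attr_id)


named = name_attributes(fake_probe, max_id=200)
print(named)
```

This mirrors what the updated tests in this release assert: `dma_buf_mmap_supported` resolves by name, while id 160 appears as `attribute_160`.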

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/cli.py RENAMED

@@ -57,24 +57,55 @@ def _print_live_telemetry(ordinal: int) -> None:
 # ---------------------------------------------------------------------------
 def gather_info(driver: Driver) -> dict:
+    from . import extras as _extras
     major, minor = driver.driver_version()
     devices = []
     for ordinal in range(driver.device_count()):
         dev = driver.device(ordinal)
         attributes = _attrs.query_all(driver, dev)
         peaks = _peaks.derive(attributes)
+        nvml_extras = _extras.gather(ordinal)
         devices.append(
             {
                 "ordinal": ordinal,
                 "name": dev.name,
                 "total_memory_bytes": dev.total_mem_bytes,
                 "derived": peaks.as_dict(),
+                "nvml": nvml_extras.as_dict() if nvml_extras else None,
                 "attributes": attributes,
             }
         )
     return {"driver_version": f"{major}.{minor}", "devices": devices}
+def _print_nvml_extras(nvml: dict) -> None:
+    """Print the NVML-sourced facts, skipping fields the card didn't report."""
+    arch = nvml.get("architecture")
+    cores = nvml.get("num_gpu_cores")
+    if arch or cores:
+        bits = []
+        if arch:
+            bits.append(arch)
+        if cores:
+            bits.append(f"{cores} CUDA cores")
+        print("  architecture (nvml)       : " + ", ".join(bits))
+    gen, gen_max = nvml.get("pcie_gen_current"), nvml.get("pcie_gen_max")
+    w, w_max = nvml.get("pcie_width_current"), nvml.get("pcie_width_max")
+    if gen and w:
+        print(f"  pcie link (nvml)          : gen{gen}/{gen_max} x{w}/{w_max}")
+    total, used = nvml.get("memory_total_bytes"), nvml.get("memory_used_bytes")
+    if total:
+        print(
+            f"  memory in use (nvml)      : {used / 2**20:.0f} / "
+            f"{total / 2**20:.0f} MiB"
+        )
+    if nvml.get("ecc_enabled") is not None:
+        print(f"  ecc (nvml)                : {'on' if nvml['ecc_enabled'] else 'off'}")
+    if nvml.get("vbios_version"):
+        print(f"  vbios (nvml)              : {nvml['vbios_version']}")
 def cmd_info(args: argparse.Namespace) -> int:
     try:
         driver = Driver()
@@ -111,6 +142,8 @@ def cmd_info(args: argparse.Namespace) -> int:
                 "  theoretical tf32 tensor   : "
                 + _fmt(derived["theoretical_tf32_tensor_tflops"], " TFLOP/s (dense)", nd=2)
             )
+        if dev.get("nvml"):
+            _print_nvml_extras(dev["nvml"])
         _print_live_telemetry(dev["ordinal"])
         print(f"\n  {'attribute':<48} value")
         print(f"  {'-' * 48} {'-' * 12}")
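
In `_print_nvml_extras` the memory line is guarded on `total` alone. With dicts produced by `from_nvml` that is enough, because both figures come from the same `memory_info` call and are either both set or both `None`. A hand-built dict with a total but no `memory_used_bytes` would raise a `TypeError` on the division. A minimal sketch of a stricter guard, assuming the same key names; `format_memory` is a hypothetical helper, not code from the package:

```python
from typing import Optional


def format_memory(nvml: dict) -> Optional[str]:
    """Return the 'memory in use' line, or None if either figure is missing."""
    total = nvml.get("memory_total_bytes")
    used = nvml.get("memory_used_bytes")
    # Require both values; 0 bytes used is valid, so test `used` against None.
    if not total or used is None:
        return None
    return (
        f"  memory in use (nvml)      : {used / 2**20:.0f} / "
        f"{total / 2**20:.0f} MiB"
    )


line = format_memory(
    {"memory_total_bytes": 15360 * 2**20, "memory_used_bytes": 450 * 2**20}
)
print(line)
```

Returning the string instead of printing it also makes the formatting testable without capturing standard output.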

kernelmeter-0.4.1/src/kernelmeter/extras.py ADDED

@@ -0,0 +1,93 @@
+"""Device facts from NVML, the second data source the driver attribute
+enum can't give you.
+``kernelmeter info`` reports ``cuDeviceGetAttribute`` values. Tools like
+Nsight Compute show more (architecture name, real core count, PCIe link,
+memory breakdown) because they pull from their own device database and
+from NVML. NVML ships with the driver, so this module adds those facts
+without a toolkit -- the same ctypes approach as the rest of kernelmeter.
+It does not invent ncu-internal metrics (sass_level, ram_type, ...): those
+aren't exposed by either the driver or NVML, so they would have to be
+hardcoded per board and would go stale. Everything here is read live.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from . import nvml as _nvml
+@dataclass
+class DeviceExtras:
+    architecture: str | None
+    brand: str | None
+    num_gpu_cores: int | None
+    memory_total_bytes: int | None
+    memory_used_bytes: int | None
+    memory_free_bytes: int | None
+    pcie_gen_current: int | None
+    pcie_gen_max: int | None
+    pcie_width_current: int | None
+    pcie_width_max: int | None
+    ecc_enabled: bool | None
+    vbios_version: str | None
+    driver_version: str | None
+    def as_dict(self) -> dict:
+        return dict(self.__dict__)
+def from_nvml(n: "_nvml.Nvml", index: int = 0) -> DeviceExtras:
+    """Build the extras for one device from an open NVML handle. Each
+    query is individually tolerant: an unsupported field becomes None
+    rather than failing the whole gather."""
+    h = n.device(index)
+    def safe(fn, *args):
+        try:
+            return fn(*args)
+        except Exception:
+            return None
+    arch_id = safe(n.architecture, h)
+    brand_id = safe(n.brand, h)
+    mem = safe(n.memory_info, h) or (None, None, None)
+    pcie = safe(n.pcie_link, h) or (None, None, None, None)
+    return DeviceExtras(
+        architecture=_nvml.ARCH_NAMES.get(arch_id) if arch_id is not None else None,
+        brand=_nvml.BRAND_NAMES.get(brand_id) if brand_id is not None else None,
+        num_gpu_cores=safe(n.num_gpu_cores, h),
+        memory_total_bytes=mem[0],
+        memory_free_bytes=mem[1],
+        memory_used_bytes=mem[2],
+        pcie_gen_current=pcie[0],
+        pcie_gen_max=pcie[1],
+        pcie_width_current=pcie[2],
+        pcie_width_max=pcie[3],
+        ecc_enabled=safe(n.ecc_enabled, h),
+        vbios_version=safe(n.vbios_version, h),
+        driver_version=safe(n.driver_version),
+    )
+def gather(index: int = 0, nvml_obj: "_nvml.Nvml | None" = None) -> DeviceExtras | None:
+    """Open NVML (if not given one), read the extras, clean up. Returns
+    None when NVML isn't available so callers can skip the section."""
+    owns = nvml_obj is None
+    try:
+        n = nvml_obj if nvml_obj is not None else _nvml.Nvml()
+    except Exception:
+        return None
+    try:
+        return from_nvml(n, index)
+    except Exception:
+        return None
+    finally:
+        if owns:
+            try:
+                n.close()
+            except Exception:
+                pass
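
The `safe` helper in `from_nvml` is what makes each query "individually tolerant": one failing field becomes `None` while the others still come through. The pattern can be sketched in isolation as follows. `Facts` and `FlakySource` are hypothetical stand-ins, much smaller than `DeviceExtras` and the real NVML wrapper:

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Facts:
    """Hypothetical two-field record standing in for DeviceExtras."""
    cores: Optional[int]
    vbios: Optional[str]


class FlakySource:
    """Stand-in for a wrapper where one query is unsupported."""

    def cores(self) -> int:
        raise RuntimeError("not supported on this board")

    def vbios(self) -> str:
        return "90.04.38.00.03"


def safe(fn, *args):
    """Call fn(*args); turn any failure into None instead of propagating it."""
    try:
        return fn(*args)
    except Exception:
        return None


def collect(src) -> Facts:
    # Each field is wrapped separately, so one failure cannot sink the rest.
    return Facts(cores=safe(src.cores), vbios=safe(src.vbios))


facts = collect(FlakySource())
print(asdict(facts))
```

The trade-off of catching `Exception` per field is that genuine bugs in a query are hidden along with unsupported-feature errors, which is presumably acceptable for a best-effort informational block.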

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/nvml.py RENAMED

@@ -16,9 +16,40 @@ import threading
 from dataclasses import dataclass
 NVML_SUCCESS = 0
+NVML_ERROR_NOT_SUPPORTED = 3
 NVML_CLOCK_SM = 1
 NVML_CLOCK_MEM = 2
 NVML_TEMPERATURE_GPU = 0
+NVML_FEATURE_ENABLED = 1
+# nvmlDeviceArchitecture_t
+ARCH_NAMES = {
+    2: "Kepler",
+    3: "Maxwell",
+    4: "Pascal",
+    5: "Volta",
+    6: "Turing",
+    7: "Ampere",
+    8: "Ada Lovelace",
+    9: "Hopper",
+    10: "Blackwell",
+}
+# nvmlBrandType_t (common entries)
+BRAND_NAMES = {
+    0: "Unknown",
+    1: "Quadro",
+    2: "Tesla",
+    3: "NVS",
+    4: "GRID",
+    5: "GeForce",
+    6: "Titan",
+    7: "NVIDIA vApps",
+    8: "NVIDIA vPC",
+    9: "NVIDIA vCS",
+    10: "NVIDIA vWS",
+    11: "NVIDIA Cloud Gaming",
+}
 class NvmlError(RuntimeError):
@@ -30,6 +61,14 @@ class NvmlNotAvailableError(RuntimeError):
     pass
+class _Memory(ctypes.Structure):
+    _fields_ = [
+        ("total", ctypes.c_ulonglong),
+        ("free", ctypes.c_ulonglong),
+        ("used", ctypes.c_ulonglong),
+    ]
 def load_library() -> ctypes.CDLL:
     if sys.platform == "darwin":
         raise NvmlNotAvailableError("NVML is not available on macOS")
@@ -73,6 +112,23 @@ class Nvml:
         self._check(func_name, fn(handle, *args, ctypes.byref(out)))
         return out.value
+    def _uint_query_opt(self, func_name: str, handle, *args) -> int | None:
+        """Like _uint_query but returns None when the card doesn't support
+        the query (e.g. consumer cards have no ECC), instead of raising."""
+        out = ctypes.c_uint(0)
+        fn = getattr(self._lib, func_name)
+        code = fn(handle, *args, ctypes.byref(out))
+        if code == NVML_ERROR_NOT_SUPPORTED:
+            return None
+        self._check(func_name, code)
+        return out.value
+    def _str_query(self, func_name: str, *args, length: int = 96) -> str:
+        buf = ctypes.create_string_buffer(length)
+        fn = getattr(self._lib, func_name)
+        self._check(func_name, fn(*args, buf, ctypes.c_uint(length)))
+        return buf.value.decode("utf-8", errors="replace")
     def sm_clock_mhz(self, handle) -> int:
         return self._uint_query("nvmlDeviceGetClockInfo", handle, NVML_CLOCK_SM)
@@ -94,6 +150,52 @@ class Nvml:
     def power_limit_w(self, handle) -> float:
         return self._uint_query("nvmlDeviceGetEnforcedPowerLimit", handle) / 1000.0
+    # ---- static device facts the driver attribute enum does not expose ----
+    def architecture(self, handle) -> int | None:
+        return self._uint_query_opt("nvmlDeviceGetArchitecture", handle)
+    def brand(self, handle) -> int | None:
+        return self._uint_query_opt("nvmlDeviceGetBrand", handle)
+    def num_gpu_cores(self, handle) -> int | None:
+        # NVML 11.8+. Older drivers don't have it -> AttributeError on the symbol.
+        if not hasattr(self._lib, "nvmlDeviceGetNumGpuCores"):
+            return None
+        return self._uint_query_opt("nvmlDeviceGetNumGpuCores", handle)
+    def memory_info(self, handle) -> tuple[int, int, int]:
+        mem = _Memory()
+        self._check(
+            "nvmlDeviceGetMemoryInfo",
+            self._lib.nvmlDeviceGetMemoryInfo(handle, ctypes.byref(mem)),
+        )
+        return mem.total, mem.free, mem.used
+    def pcie_link(self, handle) -> tuple[int | None, int | None, int | None, int | None]:
+        """(current gen, max gen, current width, max width)."""
+        return (
+            self._uint_query_opt("nvmlDeviceGetCurrPcieLinkGeneration", handle),
+            self._uint_query_opt("nvmlDeviceGetMaxPcieLinkGeneration", handle),
+            self._uint_query_opt("nvmlDeviceGetCurrPcieLinkWidth", handle),
+            self._uint_query_opt("nvmlDeviceGetMaxPcieLinkWidth", handle),
+        )
+    def ecc_enabled(self, handle) -> bool | None:
+        cur = ctypes.c_uint(0)
+        pend = ctypes.c_uint(0)
+        code = self._lib.nvmlDeviceGetEccMode(handle, ctypes.byref(cur), ctypes.byref(pend))
+        if code == NVML_ERROR_NOT_SUPPORTED:
+            return None
+        self._check("nvmlDeviceGetEccMode", code)
+        return cur.value == NVML_FEATURE_ENABLED
+    def vbios_version(self, handle) -> str:
+        return self._str_query("nvmlDeviceGetVbiosVersion", handle)
+    def driver_version(self) -> str:
+        return self._str_query("nvmlSystemGetDriverVersion")
 @dataclass
 class Telemetry:
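
`_uint_query_opt` in this hunk combines two things: a ctypes out-parameter passed with `byref`, and a return code where "not supported" maps to `None` instead of an exception. A self-contained sketch of that pattern against a fake library, in the style of the `FakeNvmlLib` used by the tests. `FakeLib` and the free function `uint_query_opt` are illustrative; the real method also routes other error codes through `_check`, which is not shown in this diff:

```python
import ctypes

NVML_SUCCESS = 0
NVML_ERROR_NOT_SUPPORTED = 3


class FakeLib:
    """Hypothetical stand-in for the loaded NVML shared library."""

    def nvmlDeviceGetArchitecture(self, handle, ptr):
        ptr._obj.value = 6  # write through the byref() out-parameter
        return NVML_SUCCESS

    def nvmlDeviceGetBrand(self, handle, ptr):
        return NVML_ERROR_NOT_SUPPORTED  # pretend this card has no brand query


def uint_query_opt(lib, func_name: str, handle):
    """Return the queried unsigned int, or None if the query is unsupported."""
    out = ctypes.c_uint(0)
    code = getattr(lib, func_name)(handle, ctypes.byref(out))
    if code == NVML_ERROR_NOT_SUPPORTED:
        return None
    if code != NVML_SUCCESS:
        raise RuntimeError(f"{func_name} failed with code {code}")
    return out.value


lib = FakeLib()
arch = uint_query_opt(lib, "nvmlDeviceGetArchitecture", None)
brand = uint_query_opt(lib, "nvmlDeviceGetBrand", None)
print(arch, brand)
```

Keeping "unsupported" separate from other failures lets callers distinguish a card that lacks a feature from a query that actually went wrong.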

{kernelmeter-0.3.1 → kernelmeter-0.4.1/src/kernelmeter.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: kernelmeter
-Version: 0.3.1
+Version: 0.4.1
 Summary: Query every CUDA device attribute without profiling a kernel, and benchmark your kernels against the hardware's speed of light.
 Author: nuemaan
 License: MIT
@@ -82,6 +82,12 @@ Device 0: Tesla T4 (14.6 GiB)
   compute capability        : 7.5
   theoretical mem bandwidth : 320.1 GB/s
   theoretical FP32 peak     : 8.14 TFLOP/s
+  theoretical fp16 tensor   : 65.13 TFLOP/s (dense)
+  architecture (nvml)       : Turing, 2560 CUDA cores
+  pcie link (nvml)          : gen1/3 x8/16
+  memory in use (nvml)      : 450 / 15360 MiB
+  ecc (nvml)                : on
+  vbios (nvml)              : 90.04.96.00.02
   attribute                                        value
   ------------------------------------------------ ------------
@@ -90,16 +96,22 @@ Device 0: Tesla T4 (14.6 GiB)
   max_shared_memory_per_block                      49152
   warp_size                                        32
   clock_rate_khz                                   1590000
-  ...                                              (147 attributes total)
+  ...                                              (148 attributes total)
 ```
-These are the same values Nsight Compute shows as `device__attribute_*`,
-except you don't need to profile a kernel to see them. Add `--json` for
-machine-readable output.
-Every attribute id is probed against the live driver, so the output always
-matches the machine you run it on. Ids newer than the bundled name table
-still show up, just under a generic `attribute_<id>` name.
+The `attribute` table is read straight from the driver via
+`cuDeviceGetAttribute`, the same values Nsight Compute shows as
+`device__attribute_*`, but you don't need to profile a kernel to see them.
+Every id is probed live, so the output matches the machine you run it on;
+ids newer than the bundled name table show up as `attribute_<id>`.
+The `(nvml)` lines come from a second source: NVML, the library behind
+`nvidia-smi`, also shipped with the driver. They surface facts the driver
+attribute enum doesn't have (architecture name, real CUDA core count,
+PCIe link, live memory use, ECC, VBIOS) and are skipped silently if NVML
+isn't present. (The `gen1/3 x8/16` above is the live link: an idle T4
+drops to a lower PCIe state and ramps up under load.) Add `--json` for
+machine-readable output; the NVML block lands under `devices[].nvml`.
 ## Benchmarking a kernel

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter.egg-info/SOURCES.txt RENAMED

@@ -7,6 +7,7 @@ src/kernelmeter/bench.py
 src/kernelmeter/ceiling.py
 src/kernelmeter/cli.py
 src/kernelmeter/cudadrv.py
+src/kernelmeter/extras.py
 src/kernelmeter/nvml.py
 src/kernelmeter/occupancy.py
 src/kernelmeter/peaks.py
@@ -22,6 +23,7 @@ tests/test_bench_math.py
 tests/test_bench_roofline.py
 tests/test_cli.py
 tests/test_cli_new_commands.py
+tests/test_extras.py
 tests/test_nvml.py
 tests/test_occupancy.py
 tests/test_peaks.py

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_attrs.py RENAMED

@@ -20,8 +20,8 @@ def test_unsupported_ids_are_skipped(fake_driver):
 def test_unknown_but_supported_ids_get_generic_names(fake_driver):
     dev = fake_driver.device(0)
     result = attrs.query_all(fake_driver, dev)
-    # id 155 succeeds in the fake but has no name in our table
-    assert result["attribute_155"] == 7
+    # id 160 succeeds in the fake but has no name in our table
+    assert result["attribute_160"] == 7
 def test_cuda12_range_names(fake_driver):
@@ -29,13 +29,16 @@ def test_cuda12_range_names(fake_driver):
     result = attrs.query_all(fake_driver, dev)
     assert result["numa_id"] == -1
     assert result["gpu_pci_device_id"] == 0x1EB810DE
-    # last named driver attribute before the CU_DEVICE_ATTRIBUTE_MAX sentinel
     assert result["atomic_reduction_supported"] == 1
+    # a CUDA 13.x attribute past the old 0.3.1 table
+    assert result["dma_buf_mmap_supported"] == 1
 def test_max_sentinel_is_not_named():
-    # 149 is CU_DEVICE_ATTRIBUTE_MAX, not a real attribute
-    assert 149 not in attrs.KNOWN_ATTRS
+    # 155 is CU_DEVICE_ATTRIBUTE_MAX as of CUDA 13.x, not a real attribute
+    assert 155 not in attrs.KNOWN_ATTRS
+    # the last real attribute we name
+    assert attrs.KNOWN_ATTRS[154] == "logical_endpoint_unicast_access_on_owner_device_supported"
 def test_device_metadata(fake_driver):

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_cli.py RENAMED

@@ -31,6 +31,46 @@ def test_info_human_readable(patched_driver, capsys):
     assert "GB/s" in out
+@pytest.fixture
+def patched_nvml(monkeypatch):
+    from kernelmeter import extras, nvml
+    from test_nvml import FakeNvmlLib
+    real_nvml = nvml.Nvml  # capture before patching to avoid self-recursion
+    monkeypatch.setattr(
+        extras._nvml, "Nvml", lambda *a, **k: real_nvml(lib=FakeNvmlLib())
+    )
+def test_info_json_includes_nvml(patched_driver, patched_nvml, capsys):
+    assert cli.main(["info", "--json"]) == 0
+    dev = json.loads(capsys.readouterr().out)["devices"][0]
+    assert dev["nvml"]["architecture"] == "Turing"
+    assert dev["nvml"]["num_gpu_cores"] == 2560
+    assert dev["nvml"]["pcie_gen_max"] == 3
+def test_info_human_shows_nvml(patched_driver, patched_nvml, capsys):
+    assert cli.main(["info"]) == 0
+    out = capsys.readouterr().out
+    assert "Turing" in out
+    assert "2560 CUDA cores" in out
+    assert "pcie link" in out
+def test_info_json_nvml_null_without_nvml(patched_driver, monkeypatch, capsys):
+    from kernelmeter import extras, nvml
+    def boom(*_a, **_k):
+        raise nvml.NvmlNotAvailableError("no driver")
+    monkeypatch.setattr(extras._nvml, "Nvml", boom)
+    assert cli.main(["info", "--json"]) == 0
+    dev = json.loads(capsys.readouterr().out)["devices"][0]
+    assert dev["nvml"] is None
 def test_info_without_driver(monkeypatch, capsys):
     from kernelmeter.cudadrv import CudaNotAvailableError

kernelmeter-0.4.1/tests/test_extras.py ADDED

@@ -0,0 +1,54 @@
+from kernelmeter import extras, nvml
+from test_nvml import FakeNvmlLib
+def _fake_nvml():
+    return nvml.Nvml(lib=FakeNvmlLib())
+def test_from_nvml_builds_extras():
+    ex = extras.from_nvml(_fake_nvml(), 0)
+    assert ex.architecture == "Turing"
+    assert ex.brand == "Tesla"
+    assert ex.num_gpu_cores == 2560
+    assert ex.memory_total_bytes == 15843721216
+    assert ex.pcie_gen_current == 3
+    assert ex.pcie_width_max == 16
+    assert ex.ecc_enabled is True
+    assert ex.vbios_version == "90.04.38.00.03"
+    assert ex.driver_version == "535.104.05"
+def test_gather_uses_injected_nvml():
+    ex = extras.gather(0, nvml_obj=_fake_nvml())
+    assert ex is not None
+    assert ex.architecture == "Turing"
+def test_gather_returns_none_when_nvml_missing(monkeypatch):
+    # simulate a machine with no driver: Nvml() construction raises
+    def boom(*_a, **_k):
+        raise nvml.NvmlNotAvailableError("no driver")
+    monkeypatch.setattr(extras._nvml, "Nvml", boom)
+    assert extras.gather(0) is None
+def test_individual_field_failure_is_tolerated():
+    # a card that doesn't report cores shouldn't sink the whole gather
+    class NoCores(FakeNvmlLib):
+        def nvmlDeviceGetNumGpuCores(self, handle, ptr):
+            return nvml.NVML_ERROR_NOT_SUPPORTED
+    ex = extras.from_nvml(nvml.Nvml(lib=NoCores()), 0)
+    assert ex.num_gpu_cores is None
+    assert ex.architecture == "Turing"  # the rest still came through
+def test_as_dict_roundtrips():
+    ex = extras.from_nvml(_fake_nvml(), 0)
+    d = ex.as_dict()
+    assert d["architecture"] == "Turing"
+    assert d["num_gpu_cores"] == 2560
+    assert set(d) == set(ex.__dict__)

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_nvml.py RENAMED

@@ -45,6 +45,54 @@ class FakeNvmlLib:
         ptr._obj.value = 70000
         return NVML_SUCCESS
+    # static device facts (modelled on a Tesla T4)
+    def nvmlDeviceGetArchitecture(self, handle, ptr):
+        ptr._obj.value = 6  # Turing
+        return NVML_SUCCESS
+    def nvmlDeviceGetBrand(self, handle, ptr):
+        ptr._obj.value = 2  # Tesla
+        return NVML_SUCCESS
+    def nvmlDeviceGetNumGpuCores(self, handle, ptr):
+        ptr._obj.value = 2560
+        return NVML_SUCCESS
+    def nvmlDeviceGetMemoryInfo(self, handle, ptr):
+        ptr._obj.total = 15843721216
+        ptr._obj.free = 15500000000
+        ptr._obj.used = 343721216
+        return NVML_SUCCESS
+    def nvmlDeviceGetCurrPcieLinkGeneration(self, handle, ptr):
+        ptr._obj.value = 3
+        return NVML_SUCCESS
+    def nvmlDeviceGetMaxPcieLinkGeneration(self, handle, ptr):
+        ptr._obj.value = 3
+        return NVML_SUCCESS
+    def nvmlDeviceGetCurrPcieLinkWidth(self, handle, ptr):
+        ptr._obj.value = 16
+        return NVML_SUCCESS
+    def nvmlDeviceGetMaxPcieLinkWidth(self, handle, ptr):
+        ptr._obj.value = 16
+        return NVML_SUCCESS
+    def nvmlDeviceGetEccMode(self, handle, cur, pend):
+        cur._obj.value = 1  # enabled
+        pend._obj.value = 1
+        return NVML_SUCCESS
+    def nvmlDeviceGetVbiosVersion(self, handle, buf, length):
+        buf.value = b"90.04.38.00.03"
+        return NVML_SUCCESS
+    def nvmlSystemGetDriverVersion(self, buf, length):
+        buf.value = b"535.104.05"
+        return NVML_SUCCESS
 def test_wrapper_reads_values():
     n = nvml.Nvml(lib=FakeNvmlLib())
@@ -67,6 +115,31 @@ def test_error_code_raises():
         n.temperature_c(n.device(0))
+def test_static_device_facts():
+    n = nvml.Nvml(lib=FakeNvmlLib())
+    h = n.device(0)
+    assert nvml.ARCH_NAMES[n.architecture(h)] == "Turing"
+    assert nvml.BRAND_NAMES[n.brand(h)] == "Tesla"
+    assert n.num_gpu_cores(h) == 2560
+    total, free, used = n.memory_info(h)
+    assert total == 15843721216
+    assert used == 343721216
+    assert n.pcie_link(h) == (3, 3, 16, 16)
+    assert n.ecc_enabled(h) is True
+    assert n.vbios_version(h) == "90.04.38.00.03"
+    assert n.driver_version() == "535.104.05"
+def test_unsupported_field_returns_none():
+    # consumer cards return NOT_SUPPORTED (3) for ECC
+    class NoEcc(FakeNvmlLib):
+        def nvmlDeviceGetEccMode(self, handle, cur, pend):
+            return nvml.NVML_ERROR_NOT_SUPPORTED
+    n = nvml.Nvml(lib=NoEcc())
+    assert n.ecc_enabled(n.device(0)) is None
 def test_summarize_samples():
     t = nvml.summarize_samples(
         sm=[1500, 1560], mem=[4985, 4985], temp=[60, 63], power=[44.0, 46.0],

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/LICENSE RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/setup.cfg RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/bench.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/ceiling.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/cudadrv.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/occupancy.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/peaks.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter/roofline.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter.egg-info/dependency_links.txt RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter.egg-info/entry_points.txt RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter.egg-info/requires.txt RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/src/kernelmeter.egg-info/top_level.txt RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_bench_math.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_bench_roofline.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_cli_new_commands.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_occupancy.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_peaks.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_roofline.py RENAMED

File without changes

{kernelmeter-0.3.1 → kernelmeter-0.4.1}/tests/test_tensor_peaks.py RENAMED
