PyPI - gpu-usage-audit - Versions diffs - 1.0.0__tar.gz → 1.0.2__tar.gz - Mend

gpu-usage-audit 1.0.0__tar.gz → 1.0.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (43)

gpu_usage_audit-1.0.2/CHANGELOG.md ADDED

@@ -0,0 +1,38 @@
+# Changelog
+## 1.0.2 - 2026-05-15
+- Hardened `gua status` and `gua stop` so stale PID files do not act on
+  unrelated live processes.
+- Clarified report output by explaining sample units, classification rules,
+  interval-dependent GPU-hours, and heatmap density.
+- Split §2 from generic "Waste" into idle-held capacity and truly-idle
+  capacity. The equivalent-GPU figures now use GPUs present in the report
+  window instead of the entire database.
+- Made §4 Top identities aggregate by identity/GPU/tick before converting to
+  GPU-hours, so reports may show lower per-user GPU-hours when one user has
+  multiple processes on the same GPU at the same tick.
+- Warn when NVML process-list visibility is unavailable for a GPU.
+## 1.0.1 - 2026-05-15
+- Made `gua` the documented command surface for daemon, report, demo, and doctor output.
+- Made `gua daemon` start the collector in the background by default, with
+  `gua daemon --foreground` available for systemd and debugging.
+- Added `gua start`, `gua status`, and `gua stop` for background collector management.
+## 1.0.0 - 2026-05-15
+Bare-metal 1.0 narrows `gpu-usage-audit` to one clear workflow: inspect the
+current NVIDIA Linux host, collect NVML telemetry into SQLite, and render a
+retrospective active / idle-held / truly-idle report.
+- Reset the product surface to a single local bare-metal host.
+- Added `gua doctor` for read-only local NVIDIA/NVML/database readiness checks.
+- Made `nvidia-ml-py` a default dependency while keeping the `nvml` extra as a
+  compatibility alias.
+- Defaulted `daemon` and `report` to `/tmp/gua.db`.
+- Made `daemon` refuse an existing database and `report` refuse a missing one.
+- Kept the schema at v1: `host`, `gpu_sample`, `proc_sample`.
+- Removed post-1.0 auto-runtime planning artifacts and runtime-detection code.
+- Preserved `demo` for GPU-less output checks with fake telemetry.
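The 1.0.1 switch to a background `gua daemon` can be pictured with a minimal Python sketch. It assumes the collector runs as a detached child process whose PID is written to the PID file and whose stdout/stderr go to the log file (the README gives `/tmp/gua.pid` and `/tmp/gua.log` as the defaults). The helper name `start_background` is hypothetical; this is not the package's own implementation.

```python
import subprocess
from pathlib import Path


def start_background(cmd: list[str], pid_file: Path, log_file: Path) -> int:
    """Start cmd detached from this terminal and record its PID (hypothetical helper)."""
    # Append, so output from earlier collector runs stays in the same log file.
    with open(log_file, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # stderr goes to the same log file
            start_new_session=True,  # own session: Ctrl+C or closing this shell won't reach it
        )
    # The child keeps its own copy of the log descriptor after we close ours.
    pid_file.write_text(f"{proc.pid}\n")
    return proc.pid
```

Starting the child in a new session detaches it from the launching terminal, so pressing Ctrl+C there or closing that shell does not stop collection. `--foreground` covers the opposite case, where systemd should own the process directly.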

{gpu_usage_audit-1.0.0 → gpu_usage_audit-1.0.2}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: gpu-usage-audit
-Version: 1.0.0
+Version: 1.0.2
 Summary: Single-host daemon that surfaces 'idle-held' NVIDIA GPU memory — the embarrassing category conventional dashboards miss.
 Project-URL: Homepage, https://github.com/AI-Ocean/gpu-usage-audit
 Project-URL: Issues, https://github.com/AI-Ocean/gpu-usage-audit/issues
@@ -233,7 +233,7 @@ Jupyter notebook open with an 8 GB tensor on the GPU and went to
 lunch — `nvidia-smi` will show 1% utilization, but the card is
 *unusable* by anyone else. This tool measures that.
-> **Status:** bare-metal 1.0 release candidate.
+> **Status:** bare-metal 1.0.
 > `gua doctor` checks only the current machine. `daemon` records NVML
 > telemetry from the current NVIDIA host, `report` reads the resulting
 > SQLite database, and `demo` runs anywhere with fake telemetry. The Go
@@ -253,8 +253,10 @@ runtime. If Python downloads are disabled by local policy, install Python
 uv tool install gpu-usage-audit
 gua doctor
-gpu-usage-audit daemon --interval 30s
-gpu-usage-audit report --since 1h --interval 30s
+gua daemon --interval 30s
+gua status
+gua report --since 1h --interval 30s
+gua stop
 ```
 `gua doctor` is intentionally read-only. It checks only the current
@@ -269,7 +271,8 @@ with GPU UUIDs, so review it before sharing it outside your team.
 `gua doctor` does not need `sudo`; run it as the same user that will run
 the daemon.
-Available `gua` subcommands: `doctor`.
+Available `gua` subcommands: `doctor`, `daemon`, `start`, `status`,
+`stop`, `report`, `demo`, `version`, `help`.
 Update or remove the installed tool with uv:
@@ -284,8 +287,8 @@ its `gua` / `gpu-usage-audit` commands.
 GitHub Release assets are also available for manual download:
 ```sh
-BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.0"
-WHEEL="gpu_usage_audit-1.0.0-py3-none-any.whl"
+BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.2"
+WHEEL="gpu_usage_audit-1.0.2-py3-none-any.whl"
 curl -fsSLO "$BASE/$WHEEL"
 curl -fsSLO "$BASE/SHA256SUMS"
@@ -297,30 +300,37 @@ uvx --from "./$WHEEL" gua doctor
 ## What you get
 ```
-$ gpu-usage-audit report --since 1h --interval 30s
-gpu-usage-audit — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
+$ gua report --since 1h --interval 30s
+gua — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
 §1 Headline
+  basis: one sample = one GPU card at one daemon tick
+  rules: active >=10% util; idle-held <10% util with >100 MB process memory
   █████████▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░
   active       █   15.7%
   idle-held    ▒   45.1%       ← this is the number conventional tools miss
   truly-idle   ░   39.2%
   (51 samples)
-§2 Waste
-  ~0.43 GPU-hours idle, ~2.53 GPUs equivalently unused
+§2 Idle capacity
+  converted from card-ticks to GPU-hours using the report --interval
+  idle-held: ~0.31 GPU-hours, ~1.53 GPUs equivalently unavailable
+  truly-idle: ~0.12 GPU-hours, ~1.00 GPUs equivalently free
 §3 Per-GPU
+  per-card share of samples in the same three states
   GPU-0     active  47.1%  idle-held  35.3%  truly-idle  17.6%
   GPU-1     active   0.0%  idle-held 100.0%  truly-idle   0.0%
   GPU-2     active   0.0%  idle-held   0.0%  truly-idle 100.0%
 §4 Top identities
-  identity              gpu-hours   idle-held
-  alice                      0.42       42.9%
-  bob                        0.28      100.0%
+  one identity counts once per GPU/tick after its processes are summed
+  identity              gpu-hours   idle-held   samples
+  alice                      0.42       42.9%        51
+  bob                        0.28      100.0%        34
 §5 Time-of-day heatmap (UTC)
+  darker means higher active share; blank means no samples
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   Mon               .
 ```
@@ -328,7 +338,10 @@ gpu-usage-audit — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
 The 3-bar collapses every card × every tick over the window into the
 active / idle-held / truly-idle split. **`idle-held` rows are the
 embarrassing category**: a process is holding GPU memory but the SM
-utilization is below 10%.
+utilization is below 10%. §2 converts those card-ticks into GPU-hours
+with `--interval`; §4 groups process rows by identity, GPU, and tick
+before ranking users, so multiple same-user processes on one GPU/tick
+count once.
 ## Demo (no GPU required)
@@ -336,7 +349,7 @@ The `demo` subcommand records 30 ticks of fake telemetry and prints the
 report — all in one process, no second shell needed.
 ```sh
-gpu-usage-audit demo
+gua demo
 ```
 The bundled `FakeTier` produces a deterministic 5-tick workload —
@@ -369,21 +382,28 @@ can collect real telemetry.
 Then run the collector:
 ```sh
-gpu-usage-audit daemon --interval 30s
+gua daemon --interval 30s
+gua status
 ```
-Run the report from another shell:
+Run the report:
 ```sh
-gpu-usage-audit report --since 1h --interval 30s
+gua report --since 1h --interval 30s
+```
+Stop the background collector when the collection window is done:
+```sh
+gua stop
 ```
 If `--db` is omitted, both `daemon` and `report` use `/tmp/gua.db`.
 `daemon` refuses to start when that database file already exists, so a
 new collection run does not silently append to an old test database. If
 `gua doctor` reports that the database already exists, either run
-`gpu-usage-audit report` against the existing data or choose a fresh
-`--db PATH` for the next daemon run.
+`gua report` against the existing data or choose a fresh `--db PATH` for
+the next daemon run.
 > The daemon requires the NVIDIA driver and `libnvidia-ml.so.1`. On a
 > driverless host it exits with a friendly NVML initialization error. For
@@ -391,18 +411,24 @@ new collection run does not silently append to an old test database. If
 ## Usage
-`gpu-usage-audit` has three commands sharing one SQLite file:
+`gua` has several commands that share one SQLite file. The `gpu-usage-audit` entry
+point remains installed for compatibility, but new examples use `gua`.
 | Command  | What it does                                                |
 | -------- | ----------------------------------------------------------- |
-| `daemon` | Long-running background process. Samples real NVML telemetry on every tick and writes to a new database. Stop with Ctrl+C (SIGINT) or `systemctl stop`. NVIDIA host required. |
+| `daemon` | Starts the collector in the background. Samples real NVML telemetry on every tick and writes to a new database. NVIDIA host required. |
+| `start`  | Alias for `gua daemon`. |
+| `status` | Shows whether the background collector PID is still running. Also clears a stale PID file when it points to a missing or unrelated process. |
+| `stop`   | Stops the background collector with SIGTERM. |
 | `report` | One-shot read against the accumulated database. Safe to run **while the daemon is still writing** — SQLite WAL mode handles the concurrency. |
 | `demo`   | Self-contained showcase. Records N fake ticks and immediately prints the report. No GPU, no second shell, no operational meaning — just to see the output shape. |
-### `daemon`
+### `daemon` / `start`
 ```
-gpu-usage-audit daemon [--db PATH] [--interval D]
+gua daemon [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
+gua start  [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
+gua daemon --foreground [--db PATH] [--interval D]
 ```
 - `--db PATH` (default `/tmp/gua.db`) — SQLite file to create and write
@@ -410,14 +436,23 @@ gpu-usage-audit daemon [--db PATH] [--interval D]
   is enabled automatically.
 - `--interval D` (default `30s`) — how often to sample. Accepts `30s`,
   `1m`, `200ms`, etc.
-Each tick prints a one-line summary to stdout; on shutdown the cumulative
-row count is printed.
+- `--pid-file PATH` (default `/tmp/gua.pid`) — background PID file.
+- `--log-file PATH` (default `/tmp/gua.log`) — stdout/stderr from the
+  background collector.
+- `--foreground` — keep the collector attached to the current process.
+  Use this for systemd or debugging.
+By default, `gua daemon` returns after the collector starts. Each tick is
+written to the log file; on shutdown the cumulative row count is written
+there too. `gua daemon --foreground` prints the tick summaries directly
+to the terminal and exits on Ctrl+C, SIGTERM, or `systemctl stop`.
+`gua status` and `gua stop` verify that the PID file points to the
+managed collector before acting on it; stale PID files are cleared.
 ### `report`
 ```
-gpu-usage-audit report [--db PATH] [--since D] [--interval D] [--width N]
+gua report [--db PATH] [--since D] [--interval D] [--width N]
 ```
 - `--db PATH` (default `/tmp/gua.db`) — same SQLite file the daemon writes
@@ -427,14 +462,14 @@ gpu-usage-audit report [--db PATH] [--since D] [--interval D] [--width N]
   of oldest sample), so passing a huge `--since` is the same as "all
   data". Units: `ms`, `s`, `m`, `h`, `d` (no `w`; use `7d`).
 - `--interval D` (default `30s`) — **must match what the daemon used**.
-  This is how §2 (Waste) and §4 (Top identities) convert tick counts
+  This is how §2 (Idle capacity) and §4 (Top identities) convert tick counts
   to GPU-hours. Mismatched intervals → wrong GPU-hours.
 - `--width N` (default `60`) — width of the §1 three-bar in characters.
 ### `demo`
 ```
-gpu-usage-audit demo [--db PATH] [--ticks N] [--interval D]
+gua demo [--db PATH] [--ticks N] [--interval D]
 ```
 - `--db PATH` (optional) — if omitted, a fresh temporary database is
@@ -446,7 +481,7 @@ gpu-usage-audit demo [--db PATH] [--ticks N] [--interval D]
 ### Operational notes
 - **Same `--interval` on both sides.** If you ran the daemon with
-  `--interval 30s`, run `report --interval 30s` too.
+  `--interval 30s`, run `gua report --interval 30s` too.
 - **Let it run for a while.** §1/§3 are meaningful after one tick;
   §4 (Top identities) needs hours; §5 (Heatmap) needs days.
 - **WAL leaves sidecar files** (`gua.db-wal`, `gua.db-shm`). They are
@@ -461,12 +496,12 @@ For a long-running deployment, drop a unit file in
 ```ini
 [Unit]
-Description=gpu-usage-audit daemon
+Description=gua daemon
 After=network.target
 [Service]
 Type=simple
-ExecStart=/usr/local/bin/gpu-usage-audit daemon --db /var/lib/gua/gua.db --interval 30s
+ExecStart=/usr/local/bin/gua daemon --foreground --db /var/lib/gua/gua.db --interval 30s
 Restart=on-failure
 User=gua
@@ -506,7 +541,7 @@ uv sync                          # create .venv, install dev deps
 uv run pytest                    # run the test suite
 uv run ruff check                # lint
 uv run mypy                      # type-check (strict)
-uv run gpu-usage-audit demo      # see the report shape locally
+uv run gua demo                  # see the report shape locally
 ```
 CI runs ruff + format check + mypy + pytest, then builds and smoke-tests

{gpu_usage_audit-1.0.0 → gpu_usage_audit-1.0.2}/README.md RENAMED

@@ -10,7 +10,7 @@ Jupyter notebook open with an 8 GB tensor on the GPU and went to
 lunch — `nvidia-smi` will show 1% utilization, but the card is
 *unusable* by anyone else. This tool measures that.
-> **Status:** bare-metal 1.0 release candidate.
+> **Status:** bare-metal 1.0.
 > `gua doctor` checks only the current machine. `daemon` records NVML
 > telemetry from the current NVIDIA host, `report` reads the resulting
 > SQLite database, and `demo` runs anywhere with fake telemetry. The Go
@@ -30,8 +30,10 @@ runtime. If Python downloads are disabled by local policy, install Python
 uv tool install gpu-usage-audit
 gua doctor
-gpu-usage-audit daemon --interval 30s
-gpu-usage-audit report --since 1h --interval 30s
+gua daemon --interval 30s
+gua status
+gua report --since 1h --interval 30s
+gua stop
 ```
 `gua doctor` is intentionally read-only. It checks only the current
@@ -46,7 +48,8 @@ with GPU UUIDs, so review it before sharing it outside your team.
 `gua doctor` does not need `sudo`; run it as the same user that will run
 the daemon.
-Available `gua` subcommands: `doctor`.
+Available `gua` subcommands: `doctor`, `daemon`, `start`, `status`,
+`stop`, `report`, `demo`, `version`, `help`.
 Update or remove the installed tool with uv:
@@ -61,8 +64,8 @@ its `gua` / `gpu-usage-audit` commands.
 GitHub Release assets are also available for manual download:
 ```sh
-BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.0"
-WHEEL="gpu_usage_audit-1.0.0-py3-none-any.whl"
+BASE="https://github.com/AI-Ocean/gpu-usage-audit/releases/download/v1.0.2"
+WHEEL="gpu_usage_audit-1.0.2-py3-none-any.whl"
 curl -fsSLO "$BASE/$WHEEL"
 curl -fsSLO "$BASE/SHA256SUMS"
@@ -74,30 +77,37 @@ uvx --from "./$WHEEL" gua doctor
 ## What you get
 ```
-$ gpu-usage-audit report --since 1h --interval 30s
-gpu-usage-audit — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
+$ gua report --since 1h --interval 30s
+gua — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
 §1 Headline
+  basis: one sample = one GPU card at one daemon tick
+  rules: active >=10% util; idle-held <10% util with >100 MB process memory
   █████████▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░
   active       █   15.7%
   idle-held    ▒   45.1%       ← this is the number conventional tools miss
   truly-idle   ░   39.2%
   (51 samples)
-§2 Waste
-  ~0.43 GPU-hours idle, ~2.53 GPUs equivalently unused
+§2 Idle capacity
+  converted from card-ticks to GPU-hours using the report --interval
+  idle-held: ~0.31 GPU-hours, ~1.53 GPUs equivalently unavailable
+  truly-idle: ~0.12 GPU-hours, ~1.00 GPUs equivalently free
 §3 Per-GPU
+  per-card share of samples in the same three states
   GPU-0     active  47.1%  idle-held  35.3%  truly-idle  17.6%
   GPU-1     active   0.0%  idle-held 100.0%  truly-idle   0.0%
   GPU-2     active   0.0%  idle-held   0.0%  truly-idle 100.0%
 §4 Top identities
-  identity              gpu-hours   idle-held
-  alice                      0.42       42.9%
-  bob                        0.28      100.0%
+  one identity counts once per GPU/tick after its processes are summed
+  identity              gpu-hours   idle-held   samples
+  alice                      0.42       42.9%        51
+  bob                        0.28      100.0%        34
 §5 Time-of-day heatmap (UTC)
+  darker means higher active share; blank means no samples
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   Mon               .
 ```
@@ -105,7 +115,10 @@ gpu-usage-audit — lab-a100 (bare, driver 560.35.05)  Window: 1:00:00
 The 3-bar collapses every card × every tick over the window into the
 active / idle-held / truly-idle split. **`idle-held` rows are the
 embarrassing category**: a process is holding GPU memory but the SM
-utilization is below 10%.
+utilization is below 10%. §2 converts those card-ticks into GPU-hours
+with `--interval`; §4 groups process rows by identity, GPU, and tick
+before ranking users, so multiple same-user processes on one GPU/tick
+count once.
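A minimal sketch of the three-way classification, assuming each sample is reduced to one card's utilization percentage plus the summed memory of the processes on that card at that tick. The thresholds come from the report's `rules:` line; reading "100 MB" as 100 MiB, and the names `CardTick`, `classify`, and `headline`, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

ACTIVE_UTIL_PCT = 10.0  # rule: active at >= 10% utilization
HELD_MEM_BYTES = 100 * 1024 * 1024  # rule: > 100 MB held; MiB here is an assumption


class State(Enum):
    ACTIVE = "active"
    IDLE_HELD = "idle-held"
    TRULY_IDLE = "truly-idle"


@dataclass(frozen=True)
class CardTick:
    """One sample: one GPU card at one daemon tick (hypothetical shape)."""

    util_pct: float
    proc_mem_bytes: int  # memory of all processes on this card at this tick, summed


def classify(sample: CardTick) -> State:
    if sample.util_pct >= ACTIVE_UTIL_PCT:
        return State.ACTIVE
    if sample.proc_mem_bytes > HELD_MEM_BYTES:
        return State.IDLE_HELD  # low utilization, but memory is held
    return State.TRULY_IDLE


def headline(samples: list[CardTick]) -> dict[State, float]:
    """Percent of card-ticks in each state, as in the §1 three-bar."""
    counts = dict.fromkeys(State, 0)
    for sample in samples:
        counts[classify(sample)] += 1
    total = len(samples) or 1  # avoid dividing by zero on an empty window
    return {state: 100.0 * n / total for state, n in counts.items()}
```

The notebook scenario from the introduction (1% utilization with an 8 GB tensor held) lands in `idle-held`, which a utilization-only view would show as idle.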
 ## Demo (no GPU required)
@@ -113,7 +126,7 @@ The `demo` subcommand records 30 ticks of fake telemetry and prints the
 report — all in one process, no second shell needed.
 ```sh
-gpu-usage-audit demo
+gua demo
 ```
 The bundled `FakeTier` produces a deterministic 5-tick workload —
@@ -146,21 +159,28 @@ can collect real telemetry.
 Then run the collector:
 ```sh
-gpu-usage-audit daemon --interval 30s
+gua daemon --interval 30s
+gua status
 ```
-Run the report from another shell:
+Run the report:
 ```sh
-gpu-usage-audit report --since 1h --interval 30s
+gua report --since 1h --interval 30s
+```
+Stop the background collector when the collection window is done:
+```sh
+gua stop
 ```
 If `--db` is omitted, both `daemon` and `report` use `/tmp/gua.db`.
 `daemon` refuses to start when that database file already exists, so a
 new collection run does not silently append to an old test database. If
 `gua doctor` reports that the database already exists, either run
-`gpu-usage-audit report` against the existing data or choose a fresh
-`--db PATH` for the next daemon run.
+`gua report` against the existing data or choose a fresh `--db PATH` for
+the next daemon run.
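The existing-file rule for `daemon` and the missing-file rule for `report` can be sketched with the standard `sqlite3` module. This is a hedged illustration rather than the package's code: `open_for_daemon`, `open_for_report`, and `DatabaseStateError` are hypothetical names, and creating the file with `O_CREAT | O_EXCL` is one assumed way to make the "refuse an existing database" check atomic.

```python
import os
import sqlite3
from pathlib import Path


class DatabaseStateError(RuntimeError):
    """The --db path is in the wrong state for this command (hypothetical)."""


def open_for_daemon(db: Path) -> sqlite3.Connection:
    # O_EXCL makes "must not exist yet" atomic: no check-then-create race.
    try:
        fd = os.open(db, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError as exc:
        raise DatabaseStateError(
            f"{db} already exists; run `gua report` on it or pass a fresh --db PATH"
        ) from exc
    os.close(fd)
    conn = sqlite3.connect(db)  # SQLite treats the empty file as a new database
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def open_for_report(db: Path) -> sqlite3.Connection:
    # sqlite3.connect() would silently create an empty file, so check first.
    if not db.exists():
        raise DatabaseStateError(f"{db} does not exist; start `gua daemon` first")
    return sqlite3.connect(db)
```

Exclusive creation closes the window between checking for the file and creating it. On the report side the explicit check matters because `sqlite3.connect` creates a missing file instead of failing.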
 > The daemon requires the NVIDIA driver and `libnvidia-ml.so.1`. On a
 > driverless host it exits with a friendly NVML initialization error. For
@@ -168,18 +188,24 @@ new collection run does not silently append to an old test database. If
 ## Usage
-`gpu-usage-audit` has three commands sharing one SQLite file:
+`gua` has several commands that share one SQLite file. The `gpu-usage-audit` entry
+point remains installed for compatibility, but new examples use `gua`.
 | Command  | What it does                                                |
 | -------- | ----------------------------------------------------------- |
-| `daemon` | Long-running background process. Samples real NVML telemetry on every tick and writes to a new database. Stop with Ctrl+C (SIGINT) or `systemctl stop`. NVIDIA host required. |
+| `daemon` | Starts the collector in the background. Samples real NVML telemetry on every tick and writes to a new database. NVIDIA host required. |
+| `start`  | Alias for `gua daemon`. |
+| `status` | Shows whether the background collector PID is still running. Also clears a stale PID file when it points to a missing or unrelated process. |
+| `stop`   | Stops the background collector with SIGTERM. |
 | `report` | One-shot read against the accumulated database. Safe to run **while the daemon is still writing** — SQLite WAL mode handles the concurrency. |
 | `demo`   | Self-contained showcase. Records N fake ticks and immediately prints the report. No GPU, no second shell, no operational meaning — just to see the output shape. |
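Why `report` can run while the daemon writes: in SQLite's WAL mode a reader keeps a consistent snapshot while a writer still commits. The sketch below uses only the standard `sqlite3` module; it opens a read transaction, lets a second connection commit a new row, and shows the reader's view changing only after its own transaction ends. The table `demo_sample` and its columns are illustrative, not the package's v1 schema.

```python
import sqlite3
from pathlib import Path


def wal_snapshot_demo(db: Path) -> tuple[int, int, int]:
    """Row counts a reader sees before, during, and after a concurrent commit."""
    # isolation_level=None: no implicit transactions; BEGIN/COMMIT are explicit.
    writer = sqlite3.connect(db, isolation_level=None, timeout=0.5)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE demo_sample (ts REAL, gpu INTEGER, util REAL)")
    writer.execute("INSERT INTO demo_sample VALUES (0, 0, 47.0)")  # first tick

    reader = sqlite3.connect(db, isolation_level=None, timeout=0.5)
    reader.execute("BEGIN")  # a long report query holding a read transaction
    before = reader.execute("SELECT COUNT(*) FROM demo_sample").fetchone()[0]

    # The "daemon" commits the next tick while the read transaction is open.
    writer.execute("INSERT INTO demo_sample VALUES (30, 0, 1.0)")

    during = reader.execute("SELECT COUNT(*) FROM demo_sample").fetchone()[0]
    reader.execute("COMMIT")  # end the snapshot
    after = reader.execute("SELECT COUNT(*) FROM demo_sample").fetchone()[0]

    reader.close()
    writer.close()
    return before, during, after
```

In the default rollback-journal mode the writer's commit would instead typically wait on the reader's shared lock and fail with `database is locked` once the 0.5 s timeout expires.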
-### `daemon`
+### `daemon` / `start`
 ```
-gpu-usage-audit daemon [--db PATH] [--interval D]
+gua daemon [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
+gua start  [--db PATH] [--interval D] [--pid-file PATH] [--log-file PATH]
+gua daemon --foreground [--db PATH] [--interval D]
 ```
 - `--db PATH` (default `/tmp/gua.db`) — SQLite file to create and write
@@ -187,14 +213,23 @@ gpu-usage-audit daemon [--db PATH] [--interval D]
   is enabled automatically.
 - `--interval D` (default `30s`) — how often to sample. Accepts `30s`,
   `1m`, `200ms`, etc.
-Each tick prints a one-line summary to stdout; on shutdown the cumulative
-row count is printed.
+- `--pid-file PATH` (default `/tmp/gua.pid`) — background PID file.
+- `--log-file PATH` (default `/tmp/gua.log`) — stdout/stderr from the
+  background collector.
+- `--foreground` — keep the collector attached to the current process.
+  Use this for systemd or debugging.
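Both `--interval` here and `report --since` take compact durations. A minimal parser, assuming the syntax is a whole number followed by one unit from the documented set (`ms`, `s`, `m`, `h`, `d`, with no `w`), could look like this. `parse_duration` is a hypothetical name, and whether the real CLI also accepts decimals or compound values such as `1h30m` is not stated in the README.

```python
import re
from datetime import timedelta

_UNITS = {
    "ms": timedelta(milliseconds=1),
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
}
# "ms" is listed before "m" so "200ms" is read as milliseconds, not minutes.
_DURATION = re.compile(r"(\d+)(ms|s|m|h|d)")


def parse_duration(text: str) -> timedelta:
    """Parse '30s', '1m', '200ms', or '7d' into a timedelta (hypothetical helper)."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"bad duration {text!r}; use a number plus ms, s, m, h, or d (no w; use 7d)")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]
```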
+By default, `gua daemon` returns after the collector starts. Each tick is
+written to the log file; on shutdown the cumulative row count is written
+there too. `gua daemon --foreground` prints the tick summaries directly
+to the terminal and exits on Ctrl+C, SIGTERM, or `systemctl stop`.
+`gua status` and `gua stop` verify that the PID file points to the
+managed collector before acting on it; stale PID files are cleared.
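The stale-PID guard can be sketched as follows, assuming Linux `/proc/<pid>/cmdline` identifies the process and that the collector's command line contains `gpu_usage_audit` and `daemon`. How the package actually recognizes its own collector is not documented here, so `read_cmdline`, `looks_managed`, `managed_pid`, and `stop` are hypothetical.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path


def read_cmdline(pid: int) -> list[str] | None:
    """argv of a running process from Linux /proc, or None if it has exited."""
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
    except (FileNotFoundError, ProcessLookupError):
        return None
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def looks_managed(argv: list[str]) -> bool:
    # Assumption: the collector's argv names the package and the daemon subcommand.
    return any("gpu_usage_audit" in arg for arg in argv) and "daemon" in argv


def managed_pid(pid_file: Path) -> int | None:
    """PID from the file if it is our collector; otherwise clear the stale file."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None
    argv = read_cmdline(pid)
    if argv is None or not looks_managed(argv):
        pid_file.unlink(missing_ok=True)  # process gone, or PID reused by something else
        return None
    return pid


def stop(pid_file: Path) -> bool:
    """Send SIGTERM only to a verified collector; False if nothing was signalled."""
    pid = managed_pid(pid_file)
    if pid is None:
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

Checking the command line, rather than only probing with `os.kill(pid, 0)`, matters here: after PID reuse the number belongs to a live but unrelated process, so signal 0 would succeed and a naive `stop` would send SIGTERM to the wrong program.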
 ### `report`
 ```
-gpu-usage-audit report [--db PATH] [--since D] [--interval D] [--width N]
+gua report [--db PATH] [--since D] [--interval D] [--width N]
 ```
 - `--db PATH` (default `/tmp/gua.db`) — same SQLite file the daemon writes
@@ -204,14 +239,14 @@ gpu-usage-audit report [--db PATH] [--since D] [--interval D] [--width N]
   of oldest sample), so passing a huge `--since` is the same as "all
   data". Units: `ms`, `s`, `m`, `h`, `d` (no `w`; use `7d`).
 - `--interval D` (default `30s`) — **must match what the daemon used**.
-  This is how §2 (Waste) and §4 (Top identities) convert tick counts
+  This is how §2 (Idle capacity) and §4 (Top identities) convert tick counts
   to GPU-hours. Mismatched intervals → wrong GPU-hours.
 - `--width N` (default `60`) — width of the §1 three-bar in characters.
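The `--interval` dependency and the §4 collapsing rule fit in one short sketch. It assumes process rows carry an identity, a GPU index, a tick number, and a memory figure (`ProcRow` and the function names are hypothetical). It converts card-ticks to GPU-hours as ticks × interval ÷ 3600 s and counts each identity once per GPU and tick before ranking.

```python
from collections import defaultdict
from datetime import timedelta
from typing import NamedTuple


class ProcRow(NamedTuple):
    """One process seen on one GPU at one daemon tick (hypothetical shape)."""

    identity: str
    gpu: int
    tick: int
    mem_bytes: int


def gpu_hours(card_ticks: int, interval: timedelta) -> float:
    """Card-ticks to GPU-hours; only correct if interval equals the daemon's."""
    return card_ticks * interval.total_seconds() / 3600


def top_identities(rows: list[ProcRow], interval: timedelta) -> list[tuple[str, float, int]]:
    """(identity, GPU-hours, samples), largest first."""
    # Collapse to one entry per (identity, gpu, tick), summing that identity's
    # processes. The summed memory would feed an idle-held column, omitted here.
    per_tick: dict[tuple[str, int, int], int] = defaultdict(int)
    for row in rows:
        per_tick[(row.identity, row.gpu, row.tick)] += row.mem_bytes

    samples: dict[str, int] = defaultdict(int)
    for identity, _gpu, _tick in per_tick:
        samples[identity] += 1

    ranked = sorted(samples.items(), key=lambda item: item[1], reverse=True)
    return [(who, gpu_hours(n, interval), n) for who, n in ranked]
```

With `--interval 30s`, 51 samples come to 51 × 30 s = 1,530 s, about 0.42 GPU-hours, and 34 samples to about 0.28, consistent with the `alice` and `bob` rows in the sample §4 output. Running the report with `--interval 1m` against the same 30 s data would double both figures.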
 ### `demo`
 ```
-gpu-usage-audit demo [--db PATH] [--ticks N] [--interval D]
+gua demo [--db PATH] [--ticks N] [--interval D]
 ```
 - `--db PATH` (optional) — if omitted, a fresh temporary database is
@@ -223,7 +258,7 @@ gpu-usage-audit demo [--db PATH] [--ticks N] [--interval D]
 ### Operational notes
 - **Same `--interval` on both sides.** If you ran the daemon with
-  `--interval 30s`, run `report --interval 30s` too.
+  `--interval 30s`, run `gua report --interval 30s` too.
 - **Let it run for a while.** §1/§3 are meaningful after one tick;
   §4 (Top identities) needs hours; §5 (Heatmap) needs days.
 - **WAL leaves sidecar files** (`gua.db-wal`, `gua.db-shm`). They are
@@ -238,12 +273,12 @@ For a long-running deployment, drop a unit file in
 ```ini
 [Unit]
-Description=gpu-usage-audit daemon
+Description=gua daemon
 After=network.target
 [Service]
 Type=simple
-ExecStart=/usr/local/bin/gpu-usage-audit daemon --db /var/lib/gua/gua.db --interval 30s
+ExecStart=/usr/local/bin/gua daemon --foreground --db /var/lib/gua/gua.db --interval 30s
 Restart=on-failure
 User=gua
@@ -283,7 +318,7 @@ uv sync                          # create .venv, install dev deps
 uv run pytest                    # run the test suite
 uv run ruff check                # lint
 uv run mypy                      # type-check (strict)
-uv run gpu-usage-audit demo      # see the report shape locally
+uv run gua demo                  # see the report shape locally
 ```
 CI runs ruff + format check + mypy + pytest, then builds and smoke-tests

gpu_usage_audit-1.0.2/projects/bare-metal-1.0/handoff.ko.md ADDED

@@ -0,0 +1,83 @@
+# Bare Metal 1.0 Handoff
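+Updated: 2026-05-15
+## Read these first when picking this up
+- `projects/bare-metal-1.0/status.ko.md`: current completion status, 1.0.1 verification results, and 1.0.2 release-prep status.
+- `README.md`: the actual user documentation and the release/install/runbook/report surface.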
+- `src/gpu_usage_audit/__main__.py`: `gua` CLI, background daemon lifecycle, PID handling.
+- `src/gpu_usage_audit/report.py`: report SQL aggregation.
+- `src/gpu_usage_audit/render.py`: human-readable report output.
+- `.github/workflows/release.yml`: tag release, GitHub Release, PyPI publish 경로.
+## Settled decisions
+- 1.0 covers only a single local bare-metal NVIDIA host.
+- Kubernetes, Slurm, Docker/Podman fallback, remote nodes, and cluster-wide reports are out of scope for 1.0.
+- `nvidia-ml-py` is a default dependency.
+- The `gpu-usage-audit[nvml]` extra stays as an empty alias for compatibility.
+- The DB schema stays at v1: `host`, `gpu_sample`, `proc_sample`.
+- The default DB is `/tmp/gua.db`.
+- `gua daemon` runs in the background by default.
+- `gua daemon --foreground` is for systemd and debugging.
+- `gua start` is an alias for `gua daemon`.
+- `gua status` and `gua stop` manage the background collector through the PID file.
+- `daemon` fails if the DB file already exists.
+- `report` fails if the DB file does not exist.
+- `daemon` and `demo` always record the host row's `env_kind` as `"bare"`.
+- The auto-runtime proposal/project documents have been deleted. To restart Kubernetes/Slurm/Docker/Podman
+  expansion work, start from a new proposal.
+## Current status
+- PR A: implemented in PR #9.
+- PR B: implemented in PR #10.
+- Post-1.0 cleanup: completed in PR #11.
+- Bare-metal 1.0 release: completed in PR #12 and tag `v1.0.0`.
+- 1.0.1 command surface/background daemon release: completed in PR #13 and tag `v1.0.1`.
+- GitHub Release `v1.0.1`: published.
+- PyPI `gpu-usage-audit 1.0.1`: published.
+- NVIDIA host acceptance: a user confirmed that collection works correctly on a real host.
+- 1.0.2 release prep: in progress. The #14 lifecycle/report cleanup ships as a patch release.
+  The package version has been bumped to `1.0.2`, and the local build/wheel smoke test passed.
+## Last local verification
+```sh
+uv run ruff check
+uv run ruff format --check
+uv run mypy
+uv run pytest
+uv build --out-dir /tmp/gua-dist-1.0.2-prep
+bash scripts/smoke-dist-wheel.sh /tmp/gua-dist-1.0.2-prep/gpu_usage_audit-1.0.2-py3-none-any.whl
+env GITHUB_REF_NAME=v1.0.2 uv run python scripts/check-tag-version.py
+```
+Results: `pytest` 124 passed, `mypy` 25 source files, `ruff format` 26 files.
+## Current cleanup PR direction
+- Because PID reuse can make `/tmp/gua.pid` point to an unrelated process, confirm before `status`/`stop`
+  that the PID really belongs to the managed `gpu_usage_audit daemon` process.
+- Report §2 separates `idle-held` from `truly-idle` instead of lumping all low-utilization samples into "waste".
+- Report §4 first collapses rows by identity/GPU/tick, not by process row, and only then computes per-user GPU-hours.
+- The report output itself briefly explains what a sample is, the classification rules, the `--interval` dependency, and what the heatmap means.
+- A failed NVML process-list query can undercount idle-held, so it is reported as a warning.
+- 1.0.2 release prep aligns the package version, the README release-asset example, and the CHANGELOG on `1.0.2`.
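A hedged sketch of the NVML warning. The key point is keeping "process list unavailable" distinct from "no processes", because an empty list would push a card toward `truly-idle`. The process-listing call is injected so the sketch runs without a GPU; with `nvidia-ml-py` it would plausibly wrap `pynvml.nvmlDeviceGetComputeRunningProcesses` and catch `pynvml.NVMLError`, though which calls the package actually uses is an assumption. `processes_or_warn` is a hypothetical name.

```python
from __future__ import annotations

import logging
from collections.abc import Callable, Sequence

log = logging.getLogger("gua")


def processes_or_warn(
    gpu_index: int,
    list_processes: Callable[[int], Sequence[object]],
    nvml_error: type[Exception],
) -> Sequence[object] | None:
    """The GPU's process list, or None (plus a warning) when NVML cannot provide it.

    None is deliberately different from []: an empty list means "no processes",
    which could wrongly push the card toward truly-idle.
    """
    try:
        return list_processes(gpu_index)
    except nvml_error as exc:
        log.warning(
            "GPU %d: NVML process list unavailable (%s); idle-held may be undercounted",
            gpu_index,
            exc,
        )
        return None


# With nvidia-ml-py this might be wired up as (assumed, not taken from the package):
#   import pynvml
#   processes_or_warn(
#       i,
#       lambda i: pynvml.nvmlDeviceGetComputeRunningProcesses(pynvml.nvmlDeviceGetHandleByIndex(i)),
#       pynvml.NVMLError,
#   )
```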
+## Caveats
+- The current local development machine is not an NVIDIA host, so `gua doctor` reporting unsupported is expected.
+- `/tmp/gua.db` already exists, so a daemon run with the default path being refused is expected behavior.
+- GPU-hours are correct only when `report --interval` matches the daemon's collection interval.
+- SQLite WAL sidecars (`*.db-wal`, `*.db-shm`) are cleaned up when the last connection closes.
+- When cutting 1.0.2, `env GITHUB_REF_NAME=v1.0.2 uv run python scripts/check-tag-version.py`
+  must pass.
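The release gate above can be approximated as a tag-versus-version comparison. This sketch assumes the tag is the package version prefixed with `v` and is read from `GITHUB_REF_NAME`; what `scripts/check-tag-version.py` actually compares, and where it reads the version from, is not shown in this diff, so `tag_matches` and `main` are hypothetical.

```python
import os
import sys


def tag_matches(ref_name: str, version: str) -> bool:
    """True when a tag such as 'v1.0.2' names exactly the package version '1.0.2'."""
    return ref_name == f"v{version}"


def main(version: str) -> int:
    ref = os.environ.get("GITHUB_REF_NAME", "")
    if not tag_matches(ref, version):
        print(f"tag {ref!r} does not match package version {version!r}", file=sys.stderr)
        return 1
    print(f"tag {ref} matches version {version}")
    return 0
```

With `GITHUB_REF_NAME=v1.0.2` and version `1.0.2`, `main` returns 0; a leftover `v1.0.1` tag returns 1 and prints the mismatch to stderr.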
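+## Recommended order for the next session
+1. Check for user changes first with `git status --short`.
+2. Check the cleanup PR's CI results and review comments.
+3. If needed, polish the report wording once more so that real operators can read it easily.
+4. After merging, if a patch release is needed, handle the version bump and changelog in a separate PR.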
