alloc 0.3.1.tar.gz → 0.5.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (45)
  1. {alloc-0.3.1 → alloc-0.5.0}/PKG-INFO +70 -41
  2. {alloc-0.3.1 → alloc-0.5.0}/README.md +67 -38
  3. {alloc-0.3.1 → alloc-0.5.0}/pyproject.toml +3 -3
  4. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/__init__.py +1 -1
  5. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/callbacks.py +69 -0
  6. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/catalog/__init__.py +29 -0
  7. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/cli.py +576 -89
  8. alloc-0.5.0/src/alloc/config.py +124 -0
  9. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/display.py +33 -7
  10. alloc-0.5.0/src/alloc/extractor_runner.py +141 -0
  11. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/ghost.py +9 -2
  12. alloc-0.5.0/src/alloc/model_extractor.py +170 -0
  13. alloc-0.5.0/src/alloc/model_registry.py +138 -0
  14. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/probe.py +53 -41
  15. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/upload.py +8 -0
  16. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/yaml_config.py +51 -0
  17. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/PKG-INFO +70 -41
  18. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/SOURCES.txt +4 -0
  19. alloc-0.5.0/tests/test_auth.py +155 -0
  20. {alloc-0.3.1 → alloc-0.5.0}/tests/test_callbacks.py +103 -6
  21. {alloc-0.3.1 → alloc-0.5.0}/tests/test_cli.py +3 -5
  22. alloc-0.5.0/tests/test_init_from_org.py +98 -0
  23. {alloc-0.3.1 → alloc-0.5.0}/tests/test_verdict.py +1 -1
  24. {alloc-0.3.1 → alloc-0.5.0}/tests/test_yaml_config.py +2 -0
  25. alloc-0.3.1/src/alloc/config.py +0 -65
  26. alloc-0.3.1/src/alloc/model_extractor.py +0 -332
  27. {alloc-0.3.1 → alloc-0.5.0}/setup.cfg +0 -0
  28. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/artifact_writer.py +0 -0
  29. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/catalog/default_rate_card.json +0 -0
  30. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/catalog/gpus.v1.json +0 -0
  31. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/context.py +0 -0
  32. {alloc-0.3.1 → alloc-0.5.0}/src/alloc/stability.py +0 -0
  33. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/dependency_links.txt +0 -0
  34. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/entry_points.txt +0 -0
  35. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/requires.txt +0 -0
  36. {alloc-0.3.1 → alloc-0.5.0}/src/alloc.egg-info/top_level.txt +0 -0
  37. {alloc-0.3.1 → alloc-0.5.0}/tests/test_artifact.py +0 -0
  38. {alloc-0.3.1 → alloc-0.5.0}/tests/test_catalog.py +0 -0
  39. {alloc-0.3.1 → alloc-0.5.0}/tests/test_context.py +0 -0
  40. {alloc-0.3.1 → alloc-0.5.0}/tests/test_ghost.py +0 -0
  41. {alloc-0.3.1 → alloc-0.5.0}/tests/test_model_extractor.py +0 -0
  42. {alloc-0.3.1 → alloc-0.5.0}/tests/test_probe_hw.py +0 -0
  43. {alloc-0.3.1 → alloc-0.5.0}/tests/test_probe_multi.py +0 -0
  44. {alloc-0.3.1 → alloc-0.5.0}/tests/test_stability.py +0 -0
  45. {alloc-0.3.1 → alloc-0.5.0}/tests/test_upload.py +0 -0
{alloc-0.3.1 → alloc-0.5.0}/PKG-INFO
@@ -1,9 +1,9 @@
  Metadata-Version: 2.4
  Name: alloc
- Version: 0.3.1
- Summary: Training cost intelligence for ML workloads: estimate fit, profile runs, and ship budget-aware GPU decisions.
+ Version: 0.5.0
+ Summary: Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
  Author-email: Alloc Labs <hello@alloclabs.com>
- License: Apache-2.0
+ License-Expression: Apache-2.0
  Project-URL: Homepage, https://alloclabs.com
  Project-URL: Repository, https://github.com/alloc-labs/alloc
  Classifier: Development Status :: 3 - Alpha
@@ -28,28 +28,25 @@ Provides-Extra: dev
  Requires-Dist: pytest>=7.0.0; extra == "dev"
  Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
 
- # Alloc by [Alloc Labs](https://www.alloclabs.com)
+ # alloc (by [Alloc Labs](https://www.alloclabs.com))
 
- **Training cost intelligence for ML workloads.** Alloc helps you decide what to run, where to run it, and whether the run should continue, before expensive mistakes hit your cloud bill.
+ Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
 
  [![Website](https://img.shields.io/badge/alloclabs.com-website-22c55e)](https://www.alloclabs.com)
  [![PyPI](https://img.shields.io/pypi/v/alloc)](https://pypi.org/project/alloc/)
  [![License](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)
 
- > Built by [Alloc Labs](https://www.alloclabs.com) GPU cost optimization for ML training.
+ > Built by [Alloc Labs](https://www.alloclabs.com): reduce ML training costs with better pre-flight decisions and faster feedback loops.
 
  ## What Alloc Does
 
- Most ML teams overpay because resource decisions are guesswork and feedback arrives too late. A single mis-sized run can burn thousands before anyone notices.
+ Most ML teams waste spend because resource decisions are guesswork and feedback arrives too late. Alloc gives you a progressive workflow:
 
- Alloc gives you a progressive workflow:
+ - **Pre-flight**: estimate VRAM fit and rank feasible configs by objective (`alloc scan`, `alloc ghost`)
+ - **Calibration run**: measure peak VRAM + utilization (and optionally step timing) from a short run (`alloc run`)
+ - **Run history**: upload artifacts for team visibility and budget-aware proposals (`alloc upload`)
 
- - **Pre-flight feasibility** estimate fit and strategy risk before launch (`alloc ghost`, `alloc scan`)
- - **Live calibration** — run briefly, collect real utilization/timing signals, then stop (`alloc run`)
- - **Run intelligence** — upload artifacts for cost-aware analysis and proposals (`alloc upload`)
- - **Policy path** — move from single-user optimization to team budget/governance over time
-
- Works with PyTorch, HuggingFace, Lightning, and launcher flows such as `python`, `torchrun`, and `accelerate`. Local profiling works without outbound internet.
+ Alloc is launcher-first. It works with `python`, `torchrun`, `accelerate`, and cluster entrypoints (Slurm, Ray, Kubernetes) because it does not require framework-specific wrappers for baseline value.
 
  ## Who This Is For
 
@@ -69,21 +66,31 @@ Works with PyTorch, HuggingFace, Lightning, and launcher flows such as `python`,
  ```bash
  pip install alloc
 
- # With GPU monitoring support
+ # With GPU monitoring support (NVML via pynvml)
  pip install alloc[gpu]
  ```
 
+ Notes:
+ - `alloc` does not depend on torch. If you want `alloc ghost train.py` to infer param counts from a script, torch must be installed in that environment, otherwise use `--param-count-b`.
+ - `alloc run` will still execute your command without `alloc[gpu]`, but it cannot collect GPU metrics.
+
  ## Commands
 
- ### `alloc scan` Remote Ghost Scan (no GPU needed)
+ ### `alloc scan`: Remote Ghost Scan (no GPU needed)
 
  ```bash
  alloc scan --model llama-3-70b --gpu A100-80GB
  alloc scan --model mistral-7b --gpu A10G --strategy fsdp --num-gpus 4
  alloc scan --param-count-b 13.0 --gpu H100-80GB --dtype bf16
+
+ # Objective + budget constraints
+ alloc scan --model llama-3-70b --gpu H100-80GB --objective fastest_within_budget --max-budget-hourly 12
+
+ # Topology hints (optional, improves planner quality)
+ alloc scan --param-count-b 70 --gpu H100-80GB --num-gpus 64 --num-nodes 8 --gpus-per-node 8 --interconnect infiniband
  ```
 
- ### `alloc ghost` Local VRAM estimation
+ ### `alloc ghost`: Local VRAM estimation
 
  ```bash
  alloc ghost train.py --dtype bf16 --batch-size 32
@@ -92,7 +99,7 @@ alloc ghost train.py --param-count-b 7.0 # manual override
 
  Analyzes your training script to discover model parameters and computes a VRAM breakdown. Uses a three-method fallback: (1) `--param-count-b` manual override, (2) subprocess execution to find `nn.Module` classes and count parameters, (3) AST parsing for `from_pretrained()` calls.
 
- ### `alloc run` Training with GPU monitoring
+ ### `alloc run`: Training with GPU monitoring
 
  ```bash
  alloc run python train.py # calibrate and exit (default)
@@ -103,20 +110,40 @@ alloc run -- python train.py --epochs 10
 
  Wraps your command, monitors GPU memory/utilization/power via `pynvml`, and writes an artifact.
 
- **Default: calibrate-and-exit.** Auto-stops when GPU metrics stabilize (~30-60s), prints a verdict with bottleneck classification and recommendation, then exits. Use `--full` to monitor the entire run. Use `--timeout N` to adjust max calibration time (default 120s).
+ **Default: calibrate-and-exit.** Auto-stops when GPU metrics stabilize, prints a verdict with bottleneck classification and a top recommendation, then exits. Use `--timeout N` to adjust max calibration time (default 120s). Use `--full` to monitor the entire run.
 
  **Multi-GPU:** Automatically discovers all GPUs used by the process tree (works with `torchrun`, `accelerate launch`, etc.).
 
  **Hardware context:** Captures driver version, CUDA version, and SM compute capability from NVML.
 
- ### `alloc login` Authenticate with dashboard
+ ### `alloc login`: Authenticate with dashboard
 
  ```bash
  alloc login
- # Prompts for email + password, stores token in ~/.alloc/config.json
+ # Prompts for email + password, stores token + refresh_token in ~/.alloc/config.json
+
+ alloc login --token <ACCESS_TOKEN>
+ # Paste an access token from the dashboard (no password prompt)
  ```
 
- ### `alloc upload` Upload artifact to dashboard
+ ### `alloc whoami`: Show current auth + org context
+
+ ```bash
+ alloc whoami
+ alloc whoami --json
+ ```
+
+ Prints the current identity (when logged in), plus objective, effective budget cap, and fleet counts.
+
+ ### `alloc logout`: Clear local session
+
+ ```bash
+ alloc logout
+ ```
+
+ Clears saved `token`/`refresh_token` from `~/.alloc/config.json`.
+
+ ### `alloc upload`: Upload artifact to dashboard
 
  ```bash
  alloc upload alloc_artifact.json.gz
@@ -124,7 +151,9 @@ alloc upload alloc_artifact.json.gz
 
  Uploads a previously saved `.json.gz` artifact to the dashboard via `POST /runs/ingest`. Requires authentication (`alloc login` first).
 
- ### `alloc catalog` Browse GPU hardware catalog
+ If your session token has expired and a `refresh_token` is available (password login flow), `alloc upload` refreshes once and retries automatically.
+
+ ### `alloc catalog`: Browse GPU hardware catalog
 
  ```bash
  alloc catalog list # list all 13 GPUs (sorted by VRAM)
@@ -136,11 +165,12 @@ alloc catalog show nvidia-a100-sxm-80gb # lookup by stable ID
 
  Offline reference for GPU specs, interconnect details, and cloud pricing. Supports aliases (H100, A100, T4) and stable IDs.
 
- ### `alloc init` Configure GPU fleet & budget
+ ### `alloc init`: Configure GPU fleet and budget
 
  ```bash
  alloc init # interactive wizard
  alloc init --yes # non-interactive defaults (full catalog, 50/50 priority)
+ alloc init --from-org --yes # pull fleet/budget/objective from your org (requires alloc login)
  ```
 
  Creates a `.alloc.yaml` file in the current directory with your GPU fleet, explore list, budget, and priority weights. When present, `ghost`, `run`, and `scan` automatically use fleet context for recommendations. Use `--no-config` on any command to skip it.
@@ -182,13 +212,15 @@ Callbacks write a `.alloc_callback.json` sidecar with step time (p50/p90), sampl
 
  ## Configuration
 
- All config via environment variables. Zero config files required.
+ Alloc works with zero config. You can optionally configure it with environment variables and/or a `.alloc.yaml` in your repo.
 
  | Variable | Default | Description |
  |----------|---------|-------------|
  | `ALLOC_API_URL` | `https://alloc-production-ffc2.up.railway.app` | API endpoint for remote scans |
  | `ALLOC_TOKEN` | (empty) | Auth token for API calls |
- | `ALLOC_UPLOAD` | `false` | Upload results to dashboard |
+ | `ALLOC_UPLOAD` | `false` | Upload results to dashboard (`alloc run --upload` also works) |
+ | `ALLOC_OUT` | `alloc_artifact.json.gz` | Artifact output path |
+ | `ALLOC_GPU_COUNT_CANDIDATES` | (empty) | Override GPU-count candidates for ranking (comma-separated ints) |
 
  ## Architecture
 
@@ -210,18 +242,15 @@ All config via environment variables. Zero config files required.
 
  ## Design Principles
 
- 1. **Zero config** `alloc run python train.py` works out of the box
- 2. **No monkey-patching** External monitoring only, explicit opt-in API
- 3. **Never crash user's training** All Alloc failures are caught and silenced
- 4. **Progressive disclosure** Individual use first, team governance later
-
- ## Deep GPU Metrics (via Probe)
-
- | Metric | Why It Matters |
- |--------|---------------|
- | Memory bandwidth utilization | Identifies memory-bandwidth-bound workloads |
- | Tensor core vs CUDA core utilization | Reveals if workload uses tensor cores (FP16/BF16) |
- | SM occupancy | Low occupancy = kernel launch overhead or small batches |
- | PCIe/NVLink transfer rates | Communication bottlenecks in multi-GPU setups |
- | Compute throughput (TFLOPS) | Actual vs theoretical — feeds cost-efficiency analysis |
- | Power draw | Thermal throttling detection |
+ 1. **Zero config**: `alloc run python train.py` works out of the box
+ 2. **No monkey-patching**: External monitoring only; deeper signals are opt-in
+ 3. **Never crash user's training**: All Alloc failures are caught and training continues
+ 4. **Progressive disclosure**: Individual use first, team governance later
+
+ ## Telemetry Levels
+
+ Alloc intentionally starts non-invasive and adds richer signals only when you opt in.
+
+ - **NVML (today)**: peak VRAM, GPU utilization, power draw, basic hardware context (driver/CUDA/SM), multi-GPU discovery from the process tree.
+ - **Framework timing (today, opt-in)**: step time p50/p90, samples/sec, estimated dataloader wait percentage via HF/Lightning callbacks.
+ - **Distributed timing (planned, opt-in)**: per-rank timing skew, communication overhead, stronger interconnect-aware recommendations.
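
The `alloc run` behavior described above is, at its core, NVML polling. A minimal sketch of that sampling pattern, assuming the `alloc[gpu]` extra (pynvml) is installed; alloc's real monitor also discovers every GPU in the wrapped process tree and applies a stability heuristic, so treat this as an illustration rather than the package's implementation:

```python
# Minimal NVML polling sketch (illustrative; not alloc's actual monitor).
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # single GPU for simplicity

peak_used_mib = 0.0
for _ in range(30):  # ~30 s at 1 Hz, in the spirit of calibrate-and-exit
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent
    power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
    peak_used_mib = max(peak_used_mib, mem.used / 1024**2)
    print(f"util={util.gpu}% mem={mem.used / 1024**2:.0f}MiB power={power_w:.0f}W")
    time.sleep(1.0)

print(f"peak VRAM: {peak_used_mib:.0f} MiB")
pynvml.nvmlShutdown()
```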

{alloc-0.3.1 → alloc-0.5.0}/README.md
@@ -1,25 +1,22 @@
- # Alloc by [Alloc Labs](https://www.alloclabs.com)
+ # alloc (by [Alloc Labs](https://www.alloclabs.com))
 
- **Training cost intelligence for ML workloads.** Alloc helps you decide what to run, where to run it, and whether the run should continue, before expensive mistakes hit your cloud bill.
+ Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints.
 
  [![Website](https://img.shields.io/badge/alloclabs.com-website-22c55e)](https://www.alloclabs.com)
  [![PyPI](https://img.shields.io/pypi/v/alloc)](https://pypi.org/project/alloc/)
  [![License](https://img.shields.io/badge/license-Apache--2.0-blue)](LICENSE)
 
- > Built by [Alloc Labs](https://www.alloclabs.com) GPU cost optimization for ML training.
+ > Built by [Alloc Labs](https://www.alloclabs.com): reduce ML training costs with better pre-flight decisions and faster feedback loops.
 
  ## What Alloc Does
 
- Most ML teams overpay because resource decisions are guesswork and feedback arrives too late. A single mis-sized run can burn thousands before anyone notices.
+ Most ML teams waste spend because resource decisions are guesswork and feedback arrives too late. Alloc gives you a progressive workflow:
 
- Alloc gives you a progressive workflow:
+ - **Pre-flight**: estimate VRAM fit and rank feasible configs by objective (`alloc scan`, `alloc ghost`)
+ - **Calibration run**: measure peak VRAM + utilization (and optionally step timing) from a short run (`alloc run`)
+ - **Run history**: upload artifacts for team visibility and budget-aware proposals (`alloc upload`)
 
- - **Pre-flight feasibility** estimate fit and strategy risk before launch (`alloc ghost`, `alloc scan`)
- - **Live calibration** — run briefly, collect real utilization/timing signals, then stop (`alloc run`)
- - **Run intelligence** — upload artifacts for cost-aware analysis and proposals (`alloc upload`)
- - **Policy path** — move from single-user optimization to team budget/governance over time
-
- Works with PyTorch, HuggingFace, Lightning, and launcher flows such as `python`, `torchrun`, and `accelerate`. Local profiling works without outbound internet.
+ Alloc is launcher-first. It works with `python`, `torchrun`, `accelerate`, and cluster entrypoints (Slurm, Ray, Kubernetes) because it does not require framework-specific wrappers for baseline value.
 
  ## Who This Is For
 
@@ -39,21 +36,31 @@ Works with PyTorch, HuggingFace, Lightning, and launcher flows such as `python`,
  ```bash
  pip install alloc
 
- # With GPU monitoring support
+ # With GPU monitoring support (NVML via pynvml)
  pip install alloc[gpu]
  ```
 
+ Notes:
+ - `alloc` does not depend on torch. If you want `alloc ghost train.py` to infer param counts from a script, torch must be installed in that environment, otherwise use `--param-count-b`.
+ - `alloc run` will still execute your command without `alloc[gpu]`, but it cannot collect GPU metrics.
+
  ## Commands
 
- ### `alloc scan` Remote Ghost Scan (no GPU needed)
+ ### `alloc scan`: Remote Ghost Scan (no GPU needed)
 
  ```bash
  alloc scan --model llama-3-70b --gpu A100-80GB
  alloc scan --model mistral-7b --gpu A10G --strategy fsdp --num-gpus 4
  alloc scan --param-count-b 13.0 --gpu H100-80GB --dtype bf16
+
+ # Objective + budget constraints
+ alloc scan --model llama-3-70b --gpu H100-80GB --objective fastest_within_budget --max-budget-hourly 12
+
+ # Topology hints (optional, improves planner quality)
+ alloc scan --param-count-b 70 --gpu H100-80GB --num-gpus 64 --num-nodes 8 --gpus-per-node 8 --interconnect infiniband
  ```
 
- ### `alloc ghost` Local VRAM estimation
+ ### `alloc ghost`: Local VRAM estimation
 
  ```bash
  alloc ghost train.py --dtype bf16 --batch-size 32
@@ -62,7 +69,7 @@ alloc ghost train.py --param-count-b 7.0 # manual override
 
  Analyzes your training script to discover model parameters and computes a VRAM breakdown. Uses a three-method fallback: (1) `--param-count-b` manual override, (2) subprocess execution to find `nn.Module` classes and count parameters, (3) AST parsing for `from_pretrained()` calls.
 
- ### `alloc run` Training with GPU monitoring
+ ### `alloc run`: Training with GPU monitoring
 
  ```bash
  alloc run python train.py # calibrate and exit (default)
@@ -73,20 +80,40 @@ alloc run -- python train.py --epochs 10
 
  Wraps your command, monitors GPU memory/utilization/power via `pynvml`, and writes an artifact.
 
- **Default: calibrate-and-exit.** Auto-stops when GPU metrics stabilize (~30-60s), prints a verdict with bottleneck classification and recommendation, then exits. Use `--full` to monitor the entire run. Use `--timeout N` to adjust max calibration time (default 120s).
+ **Default: calibrate-and-exit.** Auto-stops when GPU metrics stabilize, prints a verdict with bottleneck classification and a top recommendation, then exits. Use `--timeout N` to adjust max calibration time (default 120s). Use `--full` to monitor the entire run.
 
  **Multi-GPU:** Automatically discovers all GPUs used by the process tree (works with `torchrun`, `accelerate launch`, etc.).
 
  **Hardware context:** Captures driver version, CUDA version, and SM compute capability from NVML.
 
- ### `alloc login` Authenticate with dashboard
+ ### `alloc login`: Authenticate with dashboard
 
  ```bash
  alloc login
- # Prompts for email + password, stores token in ~/.alloc/config.json
+ # Prompts for email + password, stores token + refresh_token in ~/.alloc/config.json
+
+ alloc login --token <ACCESS_TOKEN>
+ # Paste an access token from the dashboard (no password prompt)
  ```
 
- ### `alloc upload` Upload artifact to dashboard
+ ### `alloc whoami`: Show current auth + org context
+
+ ```bash
+ alloc whoami
+ alloc whoami --json
+ ```
+
+ Prints the current identity (when logged in), plus objective, effective budget cap, and fleet counts.
+
+ ### `alloc logout`: Clear local session
+
+ ```bash
+ alloc logout
+ ```
+
+ Clears saved `token`/`refresh_token` from `~/.alloc/config.json`.
+
+ ### `alloc upload`: Upload artifact to dashboard
 
  ```bash
  alloc upload alloc_artifact.json.gz
@@ -94,7 +121,9 @@ alloc upload alloc_artifact.json.gz
 
  Uploads a previously saved `.json.gz` artifact to the dashboard via `POST /runs/ingest`. Requires authentication (`alloc login` first).
 
- ### `alloc catalog` Browse GPU hardware catalog
+ If your session token has expired and a `refresh_token` is available (password login flow), `alloc upload` refreshes once and retries automatically.
+
+ ### `alloc catalog`: Browse GPU hardware catalog
 
  ```bash
  alloc catalog list # list all 13 GPUs (sorted by VRAM)
@@ -106,11 +135,12 @@ alloc catalog show nvidia-a100-sxm-80gb # lookup by stable ID
 
  Offline reference for GPU specs, interconnect details, and cloud pricing. Supports aliases (H100, A100, T4) and stable IDs.
 
- ### `alloc init` Configure GPU fleet & budget
+ ### `alloc init`: Configure GPU fleet and budget
 
  ```bash
  alloc init # interactive wizard
  alloc init --yes # non-interactive defaults (full catalog, 50/50 priority)
+ alloc init --from-org --yes # pull fleet/budget/objective from your org (requires alloc login)
  ```
 
  Creates a `.alloc.yaml` file in the current directory with your GPU fleet, explore list, budget, and priority weights. When present, `ghost`, `run`, and `scan` automatically use fleet context for recommendations. Use `--no-config` on any command to skip it.
@@ -152,13 +182,15 @@ Callbacks write a `.alloc_callback.json` sidecar with step time (p50/p90), sampl
 
  ## Configuration
 
- All config via environment variables. Zero config files required.
+ Alloc works with zero config. You can optionally configure it with environment variables and/or a `.alloc.yaml` in your repo.
 
  | Variable | Default | Description |
  |----------|---------|-------------|
  | `ALLOC_API_URL` | `https://alloc-production-ffc2.up.railway.app` | API endpoint for remote scans |
  | `ALLOC_TOKEN` | (empty) | Auth token for API calls |
- | `ALLOC_UPLOAD` | `false` | Upload results to dashboard |
+ | `ALLOC_UPLOAD` | `false` | Upload results to dashboard (`alloc run --upload` also works) |
+ | `ALLOC_OUT` | `alloc_artifact.json.gz` | Artifact output path |
+ | `ALLOC_GPU_COUNT_CANDIDATES` | (empty) | Override GPU-count candidates for ranking (comma-separated ints) |
 
  ## Architecture
 
@@ -180,18 +212,15 @@ All config via environment variables. Zero config files required.
 
  ## Design Principles
 
- 1. **Zero config** `alloc run python train.py` works out of the box
- 2. **No monkey-patching** External monitoring only, explicit opt-in API
- 3. **Never crash user's training** All Alloc failures are caught and silenced
- 4. **Progressive disclosure** Individual use first, team governance later
-
- ## Deep GPU Metrics (via Probe)
-
- | Metric | Why It Matters |
- |--------|---------------|
- | Memory bandwidth utilization | Identifies memory-bandwidth-bound workloads |
- | Tensor core vs CUDA core utilization | Reveals if workload uses tensor cores (FP16/BF16) |
- | SM occupancy | Low occupancy = kernel launch overhead or small batches |
- | PCIe/NVLink transfer rates | Communication bottlenecks in multi-GPU setups |
- | Compute throughput (TFLOPS) | Actual vs theoretical — feeds cost-efficiency analysis |
- | Power draw | Thermal throttling detection |
+ 1. **Zero config**: `alloc run python train.py` works out of the box
+ 2. **No monkey-patching**: External monitoring only; deeper signals are opt-in
+ 3. **Never crash user's training**: All Alloc failures are caught and training continues
+ 4. **Progressive disclosure**: Individual use first, team governance later
+
+ ## Telemetry Levels
+
+ Alloc intentionally starts non-invasive and adds richer signals only when you opt in.
+
+ - **NVML (today)**: peak VRAM, GPU utilization, power draw, basic hardware context (driver/CUDA/SM), multi-GPU discovery from the process tree.
+ - **Framework timing (today, opt-in)**: step time p50/p90, samples/sec, estimated dataloader wait percentage via HF/Lightning callbacks.
+ - **Distributed timing (planned, opt-in)**: per-rank timing skew, communication overhead, stronger interconnect-aware recommendations.
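
The `alloc init` sections above reference the generated `.alloc.yaml` (fleet, explore list, budget, priority weights), but the file itself never appears in this diff. A hypothetical sketch of what such a file could contain; every key name below is an assumption, not the package's documented schema, so generate the real file with `alloc init`:

```yaml
# Hypothetical .alloc.yaml sketch; key names are assumptions, not the
# package schema. Run `alloc init` to generate the authoritative file.
fleet:
  - gpu: H100-80GB
    count: 8
explore:
  - A100-80GB
  - A10G
budget:
  max_hourly_usd: 12
priority:
  cost: 0.5   # 50/50 split, mirroring `alloc init --yes` defaults
  speed: 0.5
```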

{alloc-0.3.1 → alloc-0.5.0}/pyproject.toml
@@ -4,10 +4,10 @@ build-backend = "setuptools.build_meta"
 
  [project]
  name = "alloc"
- version = "0.3.1"
- description = "Training cost intelligence for ML workloads: estimate fit, profile runs, and ship budget-aware GPU decisions."
+ version = "0.5.0"
+ description = "Engineer-first training calibration: estimate VRAM fit, profile short runs, and pick GPU configs under real budget constraints."
  readme = "README.md"
- license = {text = "Apache-2.0"}
+ license = "Apache-2.0"
  requires-python = ">=3.8"
  authors = [{name = "Alloc Labs", email = "hello@alloclabs.com"}]
  classifiers = [

{alloc-0.3.1 → alloc-0.5.0}/src/alloc/__init__.py
@@ -2,7 +2,7 @@
 
  from __future__ import annotations
 
- __version__ = "0.3.1"
+ __version__ = "0.5.0"
 
  from alloc.ghost import ghost, GhostReport
  from alloc.callbacks import AllocCallback as HuggingFaceCallback
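
Since `__init__.py` exports `AllocCallback` under the `HuggingFaceCallback` alias, wiring it into a `transformers` Trainer is one line. A usage sketch with model and dataset construction elided (those placeholders are not working values); the callback itself comes from this diff's `callbacks.py`:

```python
# Attach alloc's HF callback so runs emit the .alloc_callback.json sidecar
# (step time p50/p90, samples/sec, and, as of this release, rank/world_size
# plus comm_overhead_pct when torch.distributed is initialized).
from transformers import Trainer, TrainingArguments
from alloc import HuggingFaceCallback

trainer = Trainer(
    model=model,                        # elided: any PreTrainedModel
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,        # elided: any torch Dataset
    callbacks=[HuggingFaceCallback()],  # timing only; no monkey-patching
)
trainer.train()
```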

{alloc-0.3.1 → alloc-0.5.0}/src/alloc/callbacks.py
@@ -81,6 +81,42 @@ def _estimate_dataloader_wait(cv):
      return round((cv - 0.1) / 0.4 * 30.0, 1)
 
 
+ def _detect_distributed():
+     # type: () -> tuple
+     """Detect if running inside a torch.distributed process group.
+
+     Returns (is_distributed, rank, world_size). Fail-safe: returns
+     (False, 0, 1) if torch.distributed is unavailable or not initialized.
+     """
+     try:
+         import torch.distributed as dist
+         if dist.is_initialized():
+             return True, dist.get_rank(), dist.get_world_size()
+     except Exception:
+         pass
+     return False, 0, 1
+
+
+ def _estimate_comm_overhead(step_times_ms, dataloader_wait_pct=0.0):
+     # type: (List[float], float) -> Optional[float]
+     """Estimate communication overhead % for distributed training.
+
+     Uses the p90/p50 spread as a proxy for sync barrier delays.
+     Subtracts estimated dataloader contribution to avoid double-counting.
+     Returns None if insufficient data.
+     """
+     if len(step_times_ms) < 10:
+         return None
+     sorted_vals = sorted(step_times_ms)
+     p50 = _compute_percentile(sorted_vals, 50)
+     p90 = _compute_percentile(sorted_vals, 90)
+     if p50 <= 0:
+         return None
+     raw_pct = ((p90 - p50) / p50) * 100
+     comm_pct = max(0.0, raw_pct - dataloader_wait_pct)
+     return round(min(40.0, comm_pct), 1)
+
+
  def _write_callback_data(data):
      # type: (Dict[str, Any]) -> None
      """Write callback data to the alloc sidecar file.
@@ -101,6 +137,9 @@ def _build_sidecar(
      step_count, # type: int
      step_times_ms, # type: List[float]
      batch_size, # type: Optional[int]
+     is_distributed=False, # type: bool
+     rank=0, # type: int
+     world_size=1, # type: int
  ):
      # type: (...) -> Dict[str, Any]
      """Build the sidecar dict from collected timing data."""
@@ -124,6 +163,15 @@ def _build_sidecar(
          "batch_size": batch_size,
          "dataloader_wait_pct": dataloader_wait_pct,
      }
+
+     if is_distributed:
+         data["is_distributed"] = True
+         data["rank"] = rank
+         data["world_size"] = world_size
+         comm = _estimate_comm_overhead(step_times_ms, dataloader_wait_pct)
+         if comm is not None:
+             data["comm_overhead_pct"] = comm
+
      return data
 
 
@@ -142,9 +190,17 @@ try:
              self._step_start = None # type: Optional[float]
              self._batch_size = None # type: Optional[int]
              self._last_write_step = 0 # type: int
+             self._dist_checked = False # type: bool
+             self._is_distributed = False # type: bool
+             self._rank = 0 # type: int
+             self._world_size = 1 # type: int
 
          def on_step_begin(self, args, state, control, **kwargs):
              self._step_start = time.monotonic()
+             # Detect distributed once after process group is initialized
+             if not self._dist_checked:
+                 self._is_distributed, self._rank, self._world_size = _detect_distributed()
+                 self._dist_checked = True
 
          def on_step_end(self, args, state, control, **kwargs):
              self.step_count = state.global_step
@@ -183,6 +239,9 @@ try:
                  step_count=self.step_count,
                  step_times_ms=self._step_times_ms,
                  batch_size=self._batch_size,
+                 is_distributed=self._is_distributed,
+                 rank=self._rank,
+                 world_size=self._world_size,
              )
              _write_callback_data(data)
 
@@ -214,9 +273,16 @@ try:
              self._step_start = None # type: Optional[float]
              self._batch_size = None # type: Optional[int]
              self._last_write_step = 0 # type: int
+             self._dist_checked = False # type: bool
+             self._is_distributed = False # type: bool
+             self._rank = 0 # type: int
+             self._world_size = 1 # type: int
 
          def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
              self._step_start = time.monotonic()
+             if not self._dist_checked:
+                 self._is_distributed, self._rank, self._world_size = _detect_distributed()
+                 self._dist_checked = True
 
          def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
              self.step_count = trainer.global_step
@@ -259,6 +325,9 @@ try:
                  step_count=self.step_count,
                  step_times_ms=self._step_times_ms,
                  batch_size=self._batch_size,
+                 is_distributed=self._is_distributed,
+                 rank=self._rank,
+                 world_size=self._world_size,
              )
              _write_callback_data(data)
 
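
The new `comm_overhead_pct` heuristic above is easy to sanity-check by hand. A self-contained rehearsal of the same arithmetic; `_percentile` is a stand-in for the package's `_compute_percentile`, whose implementation is not shown in this diff, so the exact percentile method is an assumption:

```python
# Rehearses the p90/p50 spread heuristic from callbacks.py with a local
# nearest-rank percentile (assumed; _compute_percentile is not in this diff).
def _percentile(sorted_vals, pct):
    idx = int(round(pct / 100.0 * (len(sorted_vals) - 1)))
    return sorted_vals[min(idx, len(sorted_vals) - 1)]

def estimate_comm_overhead(step_times_ms, dataloader_wait_pct=0.0):
    if len(step_times_ms) < 10:
        return None  # insufficient data, as in the real function
    s = sorted(step_times_ms)
    p50, p90 = _percentile(s, 50), _percentile(s, 90)
    if p50 <= 0:
        return None
    raw_pct = ((p90 - p50) / p50) * 100
    return round(min(40.0, max(0.0, raw_pct - dataloader_wait_pct)), 1)

# Ten steady 100 ms steps plus two 130 ms stragglers: p50 = 100, p90 = 130,
# a 30% spread; subtracting a 10% dataloader estimate leaves 20.0.
times = [100.0] * 10 + [130.0, 130.0]
print(estimate_comm_overhead(times, dataloader_wait_pct=10.0))  # -> 20.0
```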

{alloc-0.3.1 → alloc-0.5.0}/src/alloc/catalog/__init__.py
@@ -76,6 +76,35 @@ def list_gpus() -> List[dict]:
      return sorted(result, key=lambda x: x["vram_gb"], reverse=True)
 
 
+ def get_default_rate(gpu_name: str) -> Optional[float]:
+     """Look up the average default $/hr for a GPU by name or alias.
+
+     Tries to match the probe-reported GPU name against catalog display names.
+     Returns the average across clouds, or None if not found.
+     """
+     rate_card = _load_rate_card()
+     rates = rate_card.get("rates", {})
+
+     # Direct match by display name
+     for display_name, cloud_rates in rates.items():
+         if display_name.lower() in gpu_name.lower() or gpu_name.lower() in display_name.lower():
+             vals = [v for v in cloud_rates.values() if isinstance(v, (int, float))]
+             return sum(vals) / len(vals) if vals else None
+
+     # Try aliases → display name
+     for alias, stable_id in _ALIASES.items():
+         if alias.lower() in gpu_name.lower():
+             catalog = _load_catalog()
+             spec = catalog.get("gpus", {}).get(stable_id)
+             if spec:
+                 dn = spec.get("display_name", "")
+                 cloud_rates = rates.get(dn, {})
+                 vals = [v for v in cloud_rates.values() if isinstance(v, (int, float))]
+                 return sum(vals) / len(vals) if vals else None
+
+     return None
+
+
  def get_gpu(gpu_id: str) -> Optional[dict]:
      """Look up a GPU by stable ID or alias.
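
A sketch of how the new rate helper might be called, assuming alloc >= 0.5.0 is installed; the example GPU string mimics an NVML-reported name, and whether it matches depends on the bundled rate card:

```python
from alloc.catalog import get_default_rate

# Substring matching lets a probe-reported NVML name line up with a
# catalog display name or alias (e.g. the "A100" alias).
rate = get_default_rate("NVIDIA A100-SXM4-80GB")
if rate is None:
    print("no rate-card entry matched")
else:
    print(f"average on-demand rate: ${rate:.2f}/hr")
```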