PyPI - wavedl - Versions diffs - 1.3.1__tar.gz → 1.4.1__tar.gz - Mend

wavedl 1.3.1__tar.gz → 1.4.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

{wavedl-1.3.1/src/wavedl.egg-info → wavedl-1.4.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: wavedl
-Version: 1.3.1
+Version: 1.4.1
 Summary: A Scalable Deep Learning Framework for Wave-Based Inverse Problems
 Author: Ductho Le
 License: MIT
@@ -30,31 +30,18 @@ Requires-Dist: scikit-learn>=1.2.0
 Requires-Dist: pandas>=2.0.0
 Requires-Dist: matplotlib>=3.7.0
 Requires-Dist: tqdm>=4.65.0
-Requires-Dist: wandb>=0.15.0
 Requires-Dist: pyyaml>=6.0.0
 Requires-Dist: h5py>=3.8.0
 Requires-Dist: safetensors>=0.3.0
-Provides-Extra: dev
-Requires-Dist: pytest>=7.0.0; extra == "dev"
-Requires-Dist: pytest-xdist>=3.5.0; extra == "dev"
-Requires-Dist: ruff>=0.8.0; extra == "dev"
-Requires-Dist: pre-commit>=3.5.0; extra == "dev"
-Provides-Extra: onnx
-Requires-Dist: onnx>=1.14.0; extra == "onnx"
-Requires-Dist: onnxruntime>=1.15.0; extra == "onnx"
-Provides-Extra: compile
-Requires-Dist: triton; sys_platform == "linux" and extra == "compile"
-Provides-Extra: hpo
-Requires-Dist: optuna>=3.0.0; extra == "hpo"
-Provides-Extra: all
-Requires-Dist: pytest>=7.0.0; extra == "all"
-Requires-Dist: pytest-xdist>=3.5.0; extra == "all"
-Requires-Dist: ruff>=0.8.0; extra == "all"
-Requires-Dist: pre-commit>=3.5.0; extra == "all"
-Requires-Dist: onnx>=1.14.0; extra == "all"
-Requires-Dist: onnxruntime>=1.15.0; extra == "all"
-Requires-Dist: triton; sys_platform == "linux" and extra == "all"
-Requires-Dist: optuna>=3.0.0; extra == "all"
+Requires-Dist: wandb>=0.15.0
+Requires-Dist: optuna>=3.0.0
+Requires-Dist: onnx>=1.14.0
+Requires-Dist: onnxruntime>=1.15.0
+Requires-Dist: pytest>=7.0.0
+Requires-Dist: pytest-xdist>=3.5.0
+Requires-Dist: ruff>=0.8.0
+Requires-Dist: pre-commit>=3.5.0
+Requires-Dist: triton>=2.0.0; sys_platform == "linux"
 <div align="center">
@@ -210,20 +197,20 @@ Deploy models anywhere:
 ### Installation
+#### From PyPI (recommended for all users)
 ```bash
-# Install from PyPI (recommended)
 pip install wavedl
-# Or install with all extras (ONNX export, HPO, dev tools)
-pip install wavedl[all]
 ```
+This installs everything you need: training, inference, HPO, ONNX export, and dev tools.
 #### From Source (for development)
 ```bash
 git clone https://github.com/ductho-le/WaveDL.git
 cd WaveDL
-pip install -e ".[dev]"
+pip install -e .
 ```
 > [!NOTE]
@@ -359,41 +346,47 @@ WaveDL handles everything else: training loop, logging, checkpoints, multi-GPU,
 ```
 WaveDL/
 ├── src/
-│   └── wavedl/                # Main package (namespaced)
-│       ├── __init__.py        # Package init with __version__
-│       ├── train.py           # Training entry point
-│       ├── test.py            # Testing & inference script
-│       ├── hpo.py             # Hyperparameter optimization
-│       ├── hpc.py             # HPC distributed training launcher
+│   └── wavedl/                   # Main package (namespaced)
+│       ├── __init__.py           # Package init with __version__
+│       ├── train.py              # Training entry point
+│       ├── test.py               # Testing & inference script
+│       ├── hpo.py                # Hyperparameter optimization
+│       ├── hpc.py                # HPC distributed training launcher
 │       │
-│       ├── models/            # Model architectures
-│       │   ├── registry.py    # Model factory (@register_model)
-│       │   ├── base.py        # Abstract base class
-│       │   ├── cnn.py         # Baseline CNN
-│       │   ├── resnet.py      # ResNet-18/34/50 (1D/2D/3D)
-│       │   ├── efficientnet.py# EfficientNet-B0/B1/B2
-│       │   ├── vit.py         # Vision Transformer (1D/2D)
-│       │   ├── convnext.py    # ConvNeXt (1D/2D/3D)
-│       │   ├── densenet.py    # DenseNet-121/169 (1D/2D/3D)
-│       │   └── unet.py        # U-Net / U-Net Regression
+│       ├── models/               # Model architectures (38 variants)
+│       │   ├── registry.py       # Model factory (@register_model)
+│       │   ├── base.py           # Abstract base class
+│       │   ├── cnn.py            # Baseline CNN (1D/2D/3D)
+│       │   ├── resnet.py         # ResNet-18/34/50 (1D/2D/3D)
+│       │   ├── resnet3d.py       # ResNet3D-18, MC3-18 (3D only)
+│       │   ├── tcn.py            # TCN (1D only)
+│       │   ├── efficientnet.py   # EfficientNet-B0/B1/B2 (2D)
+│       │   ├── efficientnetv2.py # EfficientNetV2-S/M/L (2D)
+│       │   ├── mobilenetv3.py    # MobileNetV3-Small/Large (2D)
+│       │   ├── regnet.py         # RegNetY variants (2D)
+│       │   ├── swin.py           # Swin Transformer (2D)
+│       │   ├── vit.py            # Vision Transformer (1D/2D)
+│       │   ├── convnext.py       # ConvNeXt (1D/2D/3D)
+│       │   ├── densenet.py       # DenseNet-121/169 (1D/2D/3D)
+│       │   └── unet.py           # U-Net Regression
 │       │
-│       └── utils/             # Utilities
-│           ├── data.py        # Memory-mapped data pipeline
-│           ├── metrics.py     # R², Pearson, visualization
-│           ├── distributed.py # DDP synchronization
-│           ├── losses.py      # Loss function factory
-│           ├── optimizers.py  # Optimizer factory
-│           ├── schedulers.py  # LR scheduler factory
-│           └── config.py      # YAML configuration support
+│       └── utils/                # Utilities
+│           ├── data.py           # Memory-mapped data pipeline
+│           ├── metrics.py        # R², Pearson, visualization
+│           ├── distributed.py    # DDP synchronization
+│           ├── losses.py         # Loss function factory
+│           ├── optimizers.py     # Optimizer factory
+│           ├── schedulers.py     # LR scheduler factory
+│           └── config.py         # YAML configuration support
 │
-├── configs/                   # YAML config templates
-├── examples/                  # Ready-to-run examples
-├── notebooks/                 # Jupyter notebooks
-├── unit_tests/                # Pytest test suite (422 tests)
+├── configs/                      # YAML config templates
+├── examples/                     # Ready-to-run examples
+├── notebooks/                    # Jupyter notebooks
+├── unit_tests/                   # Pytest test suite (704 tests)
 │
-├── pyproject.toml             # Package config, dependencies
-├── CHANGELOG.md               # Version history
-└── CITATION.cff               # Citation metadata
+├── pyproject.toml                # Package config, dependencies
+├── CHANGELOG.md                  # Version history
+└── CITATION.cff                  # Citation metadata
 ```
 ---
@@ -412,33 +405,63 @@ WaveDL/
 > ```
 <details>
-<summary><b>Available Models</b> — 21 pre-built architectures</summary>
-| Model | Best For | Params (2D) | Dimensionality |
-|-------|----------|-------------|----------------|
-| `cnn` | Baseline, lightweight | 1.7M | 1D/2D/3D |
-| `resnet18` | Fast training, smaller datasets | 11.4M | 1D/2D/3D |
-| `resnet34` | Balanced performance | 21.5M | 1D/2D/3D |
-| `resnet50` | High capacity, complex patterns | 24.6M | 1D/2D/3D |
-| `resnet18_pretrained` | **Transfer learning** ⭐ | 11.4M | 2D only |
-| `resnet50_pretrained` | **Transfer learning** ⭐ | 24.6M | 2D only |
-| `efficientnet_b0` | Efficient, **pretrained** ⭐ | 4.7M | 2D only |
-| `efficientnet_b1` | Efficient, **pretrained** ⭐ | 7.2M | 2D only |
-| `efficientnet_b2` | Efficient, **pretrained** ⭐ | 8.4M | 2D only |
-| `vit_tiny` | Transformer, small datasets | 5.4M | 1D/2D |
-| `vit_small` | Transformer, balanced | 21.5M | 1D/2D |
-| `vit_base` | Transformer, high capacity | 85.5M | 1D/2D |
-| `convnext_tiny` | Modern CNN, transformer-inspired | 28.2M | 1D/2D/3D |
-| `convnext_tiny_pretrained` | **Transfer learning** ⭐ | 28.2M | 2D only |
-| `convnext_small` | Modern CNN, balanced | 49.8M | 1D/2D/3D |
-| `convnext_base` | Modern CNN, high capacity | 88.1M | 1D/2D/3D |
-| `densenet121` | Feature reuse, small data | 7.5M | 1D/2D/3D |
-| `densenet121_pretrained` | **Transfer learning** ⭐ | 7.5M | 2D only |
-| `densenet169` | Deeper DenseNet | 13.3M | 1D/2D/3D |
-| `unet` | Spatial output (velocity fields) | 31.0M | 1D/2D/3D |
-| `unet_regression` | Multi-scale features for regression | 31.1M | 1D/2D/3D |
-> ⭐ **Pretrained models** use ImageNet weights for transfer learning.
+<summary><b>Available Models</b> — 38 architectures</summary>
+| Model | Params | Dim |
+|-------|--------|-----|
+| **CNN** — Convolutional Neural Network |||
+| `cnn` | 1.7M | 1D/2D/3D |
+| **ResNet** — Residual Network |||
+| `resnet18` | 11.4M | 1D/2D/3D |
+| `resnet34` | 21.5M | 1D/2D/3D |
+| `resnet50` | 24.6M | 1D/2D/3D |
+| `resnet18_pretrained` ⭐ | 11.4M | 2D |
+| `resnet50_pretrained` ⭐ | 24.6M | 2D |
+| **ResNet3D** — 3D Residual Network |||
+| `resnet3d_18` | 33.6M | 3D |
+| `mc3_18` — Mixed Convolution 3D | 11.9M | 3D |
+| **TCN** — Temporal Convolutional Network |||
+| `tcn_small` | 1.0M | 1D |
+| `tcn` | 7.0M | 1D |
+| `tcn_large` | 10.2M | 1D |
+| **EfficientNet** — Efficient Neural Network |||
+| `efficientnet_b0` ⭐ | 4.7M | 2D |
+| `efficientnet_b1` ⭐ | 7.2M | 2D |
+| `efficientnet_b2` ⭐ | 8.4M | 2D |
+| **EfficientNetV2** — Efficient Neural Network V2 |||
+| `efficientnet_v2_s` ⭐ | 21.0M | 2D |
+| `efficientnet_v2_m` ⭐ | 53.6M | 2D |
+| `efficientnet_v2_l` ⭐ | 118.0M | 2D |
+| **MobileNetV3** — Mobile Neural Network V3 |||
+| `mobilenet_v3_small` ⭐ | 1.1M | 2D |
+| `mobilenet_v3_large` ⭐ | 3.2M | 2D |
+| **RegNet** — Regularized Network |||
+| `regnet_y_400mf` ⭐ | 4.0M | 2D |
+| `regnet_y_800mf` ⭐ | 5.8M | 2D |
+| `regnet_y_1_6gf` ⭐ | 10.5M | 2D |
+| `regnet_y_3_2gf` ⭐ | 18.3M | 2D |
+| `regnet_y_8gf` ⭐ | 37.9M | 2D |
+| **Swin** — Shifted Window Transformer |||
+| `swin_t` ⭐ | 28.0M | 2D |
+| `swin_s` ⭐ | 49.4M | 2D |
+| `swin_b` ⭐ | 87.4M | 2D |
+| **ConvNeXt** — Convolutional Next |||
+| `convnext_tiny` | 28.2M | 1D/2D/3D |
+| `convnext_small` | 49.8M | 1D/2D/3D |
+| `convnext_base` | 88.1M | 1D/2D/3D |
+| `convnext_tiny_pretrained` ⭐ | 28.2M | 2D |
+| **DenseNet** — Densely Connected Network |||
+| `densenet121` | 7.5M | 1D/2D/3D |
+| `densenet169` | 13.3M | 1D/2D/3D |
+| `densenet121_pretrained` ⭐ | 7.5M | 2D |
+| **ViT** — Vision Transformer |||
+| `vit_tiny` | 5.5M | 1D/2D |
+| `vit_small` | 21.6M | 1D/2D |
+| `vit_base` | 85.6M | 1D/2D |
+| **U-Net** — U-shaped Network |||
+| `unet_regression` | 31.1M | 1D/2D/3D |
+> ⭐ = Pretrained on ImageNet. Recommended for smaller datasets.
 </details>
@@ -479,12 +502,32 @@ WaveDL/
 | Argument | Default | Description |
 |----------|---------|-------------|
-| `--compile` | `False` | Enable `torch.compile` |
+| `--compile` | `False` | Enable `torch.compile` (recommended for long runs) |
 | `--precision` | `bf16` | Mixed precision mode (`bf16`, `fp16`, `no`) |
+| `--workers` | `-1` | DataLoader workers per GPU (-1=auto, up to 16) |
 | `--wandb` | `False` | Enable W&B logging |
+| `--wandb_watch` | `False` | Enable W&B gradient watching (adds overhead) |
 | `--project_name` | `DL-Training` | W&B project name |
 | `--run_name` | `None` | W&B run name (auto-generated if not set) |
+**Automatic GPU Optimizations:**
+WaveDL automatically enables performance optimizations for modern GPUs:
+| Optimization | Effect | GPU Support |
+|--------------|--------|-------------|
+| **TF32 precision** | ~2x speedup for float32 matmul | A100, H100 (Ampere+) |
+| **cuDNN benchmark** | Auto-tuned convolutions | All NVIDIA GPUs |
+| **Worker scaling** | Up to 16 workers per GPU | All systems |
+> [!NOTE]
+> These optimizations are **backward compatible** — they have no effect on older GPUs (V100, T4, GTX) or CPU-only systems. No configuration needed.
+**HPC Best Practices:**
+- Stage data to `$SLURM_TMPDIR` (local NVMe) for maximum I/O throughput
+- Use `--compile` for training runs > 50 epochs
+- Increase `--workers` manually if auto-detection is suboptimal
 </details>
 <details>
@@ -653,12 +696,7 @@ seed: 2025
 Automatically find the best training configuration using [Optuna](https://optuna.org/).
-**Step 1: Install**
-```bash
-pip install -e ".[hpo]"
-```
-**Step 2: Run HPO**
+**Run HPO:**
 You specify which models to search and how many trials to run:
 ```bash
@@ -672,7 +710,7 @@ python -m wavedl.hpo --data_path train.npz --models cnn --n_trials 50
 python -m wavedl.hpo --data_path train.npz --models cnn resnet18 resnet50 vit_small densenet121 --n_trials 200
 ```
-**Step 3: Train with best parameters**
+**Train with best parameters**
 After HPO completes, it prints the optimal command:
 ```bash
@@ -715,7 +753,7 @@ accelerate launch -m wavedl.train --data_path train.npz --model cnn --lr 3.2e-4
 | `--output` | `hpo_results.json` | Output file |
 > [!TIP]
-> See [Available Models](#available-models) for all 21 architectures you can search.
+> See [Available Models](#available-models) for all 38 architectures you can search.
 </details>
@@ -819,7 +857,7 @@ import numpy as np
 X = np.random.randn(1000, 256, 256).astype(np.float32)
 y = np.random.randn(1000, 5).astype(np.float32)
-np.savez('test_data.npz', input_train=X, output_train=y)
+np.savez('test_data.npz', input_test=X, output_test=y)
 ```
 </details>
@@ -831,7 +869,7 @@ np.savez('test_data.npz', input_train=X, output_train=y)
 import numpy as np
 data = np.load('train_data.npz')
-assert data['input_train'].ndim == 3, "Input must be 3D: (N, H, W)"
+assert data['input_train'].ndim >= 2, "Input must be at least 2D: (N, ...) "
 assert data['output_train'].ndim == 2, "Output must be 2D: (N, T)"
 assert len(data['input_train']) == len(data['output_train']), "Sample mismatch"
@@ -986,7 +1024,7 @@ Beyond the material characterization example above, the WaveDL pipeline can be a
 | Resource | Description |
 |----------|-------------|
 | Technical Paper | In-depth framework description *(coming soon)* |
-| [`_template.py`](models/_template.py) | Template for new architectures |
+| [`_template.py`](src/wavedl/models/_template.py) | Template for custom architectures |
 ---
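In 1.4.1 the `dev`, `onnx`, `compile`, `hpo` and `all` extras no longer appear in PKG-INFO. Their packages are now listed as plain `Requires-Dist` entries, so a bare `pip install wavedl` pulls them all in, with `triton` still limited to Linux. The sketch below is not part of WaveDL. It groups `Requires-Dist` strings by the extra that gates them, which makes it easy to compare the two versions' metadata. `group_by_extra` is a hypothetical helper. It only inspects the `extra == ...` clause and ignores other markers such as `sys_platform`.

```python
import re
from collections import defaultdict

# Finds the `extra == "name"` clause of a PEP 508 environment marker.
EXTRA_RE = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements):
    """Group Requires-Dist strings by the extra that gates them (hypothetical helper).

    Entries without an `extra == ...` clause are installed by a plain
    `pip install` and are grouped under "core". Other markers (for example
    sys_platform) are dropped from the output, not evaluated.
    """
    groups = defaultdict(list)
    for req in requirements:
        spec, _, marker = req.partition(";")  # simplification: first ';' starts the marker
        match = EXTRA_RE.search(marker)
        groups[match.group(1) if match else "core"].append(spec.strip())
    return dict(groups)


if __name__ == "__main__":
    # A few Requires-Dist lines copied from the PKG-INFO hunk above.
    v131 = [
        "wandb>=0.15.0",
        'optuna>=3.0.0; extra == "hpo"',
        'triton; sys_platform == "linux" and extra == "compile"',
    ]
    v141 = ["optuna>=3.0.0", 'triton>=2.0.0; sys_platform == "linux"']
    print(group_by_extra(v131))  # {'core': ['wandb>=0.15.0'], 'hpo': ['optuna>=3.0.0'], 'compile': ['triton']}
    print(group_by_extra(v141))  # {'core': ['optuna>=3.0.0', 'triton>=2.0.0']}
```

On an installed copy, you can get the input list from `importlib.metadata.requires("wavedl")`, which returns the distribution's `Requires-Dist` strings (or `None` if there are none). Splitting on the first `;` works for plain name/version specifiers. For full PEP 508 parsing, `packaging.requirements.Requirement` is the more robust choice.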

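In the project tree above, `registry.py` is described as a model factory driven by an `@register_model` decorator. A factory like this lets a name passed to `--model` (such as `cnn` or `resnet18`) map to a builder function. The sketch below shows the general decorator-registry pattern only. It is not WaveDL's actual `registry.py`: the single-argument `register_model`, `build_model`, `_MODELS` and the `toy_mlp` builder are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical module-level registry; WaveDL's registry.py may be organised differently.
_MODELS: Dict[str, Callable] = {}


def register_model(name):
    """Decorator that records a model builder under a CLI-style name."""
    def decorator(builder):
        if name in _MODELS:
            raise ValueError(f"model {name!r} is already registered")
        _MODELS[name] = builder
        return builder  # the decorated function itself is left unchanged
    return decorator


def build_model(name, **kwargs):
    """Look up a registered builder by name and call it with keyword arguments."""
    try:
        builder = _MODELS[name]
    except KeyError:
        raise ValueError(f"unknown model {name!r}; available: {sorted(_MODELS)}") from None
    return builder(**kwargs)


@register_model("toy_mlp")
def toy_mlp(in_dim=8, out_dim=5):
    # Placeholder "model": a description dict, so the sketch needs no deep-learning library.
    return {"arch": "mlp", "in_dim": in_dim, "out_dim": out_dim}


if __name__ == "__main__":
    print(build_model("toy_mlp", out_dim=3))
```

Because registration happens at import time, adding an architecture in this pattern only requires defining and decorating a new builder. No central lookup table needs editing.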
{wavedl-1.3.1 → wavedl-1.4.1}/README.md RENAMED

@@ -152,20 +152,20 @@ Deploy models anywhere:
 ### Installation
+#### From PyPI (recommended for all users)
 ```bash
-# Install from PyPI (recommended)
 pip install wavedl
-# Or install with all extras (ONNX export, HPO, dev tools)
-pip install wavedl[all]
 ```
+This installs everything you need: training, inference, HPO, ONNX export, and dev tools.
 #### From Source (for development)
 ```bash
 git clone https://github.com/ductho-le/WaveDL.git
 cd WaveDL
-pip install -e ".[dev]"
+pip install -e .
 ```
 > [!NOTE]
@@ -301,41 +301,47 @@ WaveDL handles everything else: training loop, logging, checkpoints, multi-GPU,
 ```
 WaveDL/
 ├── src/
-│   └── wavedl/                # Main package (namespaced)
-│       ├── __init__.py        # Package init with __version__
-│       ├── train.py           # Training entry point
-│       ├── test.py            # Testing & inference script
-│       ├── hpo.py             # Hyperparameter optimization
-│       ├── hpc.py             # HPC distributed training launcher
+│   └── wavedl/                   # Main package (namespaced)
+│       ├── __init__.py           # Package init with __version__
+│       ├── train.py              # Training entry point
+│       ├── test.py               # Testing & inference script
+│       ├── hpo.py                # Hyperparameter optimization
+│       ├── hpc.py                # HPC distributed training launcher
 │       │
-│       ├── models/            # Model architectures
-│       │   ├── registry.py    # Model factory (@register_model)
-│       │   ├── base.py        # Abstract base class
-│       │   ├── cnn.py         # Baseline CNN
-│       │   ├── resnet.py      # ResNet-18/34/50 (1D/2D/3D)
-│       │   ├── efficientnet.py# EfficientNet-B0/B1/B2
-│       │   ├── vit.py         # Vision Transformer (1D/2D)
-│       │   ├── convnext.py    # ConvNeXt (1D/2D/3D)
-│       │   ├── densenet.py    # DenseNet-121/169 (1D/2D/3D)
-│       │   └── unet.py        # U-Net / U-Net Regression
+│       ├── models/               # Model architectures (38 variants)
+│       │   ├── registry.py       # Model factory (@register_model)
+│       │   ├── base.py           # Abstract base class
+│       │   ├── cnn.py            # Baseline CNN (1D/2D/3D)
+│       │   ├── resnet.py         # ResNet-18/34/50 (1D/2D/3D)
+│       │   ├── resnet3d.py       # ResNet3D-18, MC3-18 (3D only)
+│       │   ├── tcn.py            # TCN (1D only)
+│       │   ├── efficientnet.py   # EfficientNet-B0/B1/B2 (2D)
+│       │   ├── efficientnetv2.py # EfficientNetV2-S/M/L (2D)
+│       │   ├── mobilenetv3.py    # MobileNetV3-Small/Large (2D)
+│       │   ├── regnet.py         # RegNetY variants (2D)
+│       │   ├── swin.py           # Swin Transformer (2D)
+│       │   ├── vit.py            # Vision Transformer (1D/2D)
+│       │   ├── convnext.py       # ConvNeXt (1D/2D/3D)
+│       │   ├── densenet.py       # DenseNet-121/169 (1D/2D/3D)
+│       │   └── unet.py           # U-Net Regression
 │       │
-│       └── utils/             # Utilities
-│           ├── data.py        # Memory-mapped data pipeline
-│           ├── metrics.py     # R², Pearson, visualization
-│           ├── distributed.py # DDP synchronization
-│           ├── losses.py      # Loss function factory
-│           ├── optimizers.py  # Optimizer factory
-│           ├── schedulers.py  # LR scheduler factory
-│           └── config.py      # YAML configuration support
+│       └── utils/                # Utilities
+│           ├── data.py           # Memory-mapped data pipeline
+│           ├── metrics.py        # R², Pearson, visualization
+│           ├── distributed.py    # DDP synchronization
+│           ├── losses.py         # Loss function factory
+│           ├── optimizers.py     # Optimizer factory
+│           ├── schedulers.py     # LR scheduler factory
+│           └── config.py         # YAML configuration support
 │
-├── configs/                   # YAML config templates
-├── examples/                  # Ready-to-run examples
-├── notebooks/                 # Jupyter notebooks
-├── unit_tests/                # Pytest test suite (422 tests)
+├── configs/                      # YAML config templates
+├── examples/                     # Ready-to-run examples
+├── notebooks/                    # Jupyter notebooks
+├── unit_tests/                   # Pytest test suite (704 tests)
 │
-├── pyproject.toml             # Package config, dependencies
-├── CHANGELOG.md               # Version history
-└── CITATION.cff               # Citation metadata
+├── pyproject.toml                # Package config, dependencies
+├── CHANGELOG.md                  # Version history
+└── CITATION.cff                  # Citation metadata
 ```
 ---
@@ -354,33 +360,63 @@ WaveDL/
 > ```
 <details>
-<summary><b>Available Models</b> — 21 pre-built architectures</summary>
-| Model | Best For | Params (2D) | Dimensionality |
-|-------|----------|-------------|----------------|
-| `cnn` | Baseline, lightweight | 1.7M | 1D/2D/3D |
-| `resnet18` | Fast training, smaller datasets | 11.4M | 1D/2D/3D |
-| `resnet34` | Balanced performance | 21.5M | 1D/2D/3D |
-| `resnet50` | High capacity, complex patterns | 24.6M | 1D/2D/3D |
-| `resnet18_pretrained` | **Transfer learning** ⭐ | 11.4M | 2D only |
-| `resnet50_pretrained` | **Transfer learning** ⭐ | 24.6M | 2D only |
-| `efficientnet_b0` | Efficient, **pretrained** ⭐ | 4.7M | 2D only |
-| `efficientnet_b1` | Efficient, **pretrained** ⭐ | 7.2M | 2D only |
-| `efficientnet_b2` | Efficient, **pretrained** ⭐ | 8.4M | 2D only |
-| `vit_tiny` | Transformer, small datasets | 5.4M | 1D/2D |
-| `vit_small` | Transformer, balanced | 21.5M | 1D/2D |
-| `vit_base` | Transformer, high capacity | 85.5M | 1D/2D |
-| `convnext_tiny` | Modern CNN, transformer-inspired | 28.2M | 1D/2D/3D |
-| `convnext_tiny_pretrained` | **Transfer learning** ⭐ | 28.2M | 2D only |
-| `convnext_small` | Modern CNN, balanced | 49.8M | 1D/2D/3D |
-| `convnext_base` | Modern CNN, high capacity | 88.1M | 1D/2D/3D |
-| `densenet121` | Feature reuse, small data | 7.5M | 1D/2D/3D |
-| `densenet121_pretrained` | **Transfer learning** ⭐ | 7.5M | 2D only |
-| `densenet169` | Deeper DenseNet | 13.3M | 1D/2D/3D |
-| `unet` | Spatial output (velocity fields) | 31.0M | 1D/2D/3D |
-| `unet_regression` | Multi-scale features for regression | 31.1M | 1D/2D/3D |
-> ⭐ **Pretrained models** use ImageNet weights for transfer learning.
+<summary><b>Available Models</b> — 38 architectures</summary>
+| Model | Params | Dim |
+|-------|--------|-----|
+| **CNN** — Convolutional Neural Network |||
+| `cnn` | 1.7M | 1D/2D/3D |
+| **ResNet** — Residual Network |||
+| `resnet18` | 11.4M | 1D/2D/3D |
+| `resnet34` | 21.5M | 1D/2D/3D |
+| `resnet50` | 24.6M | 1D/2D/3D |
+| `resnet18_pretrained` ⭐ | 11.4M | 2D |
+| `resnet50_pretrained` ⭐ | 24.6M | 2D |
+| **ResNet3D** — 3D Residual Network |||
+| `resnet3d_18` | 33.6M | 3D |
+| `mc3_18` — Mixed Convolution 3D | 11.9M | 3D |
+| **TCN** — Temporal Convolutional Network |||
+| `tcn_small` | 1.0M | 1D |
+| `tcn` | 7.0M | 1D |
+| `tcn_large` | 10.2M | 1D |
+| **EfficientNet** — Efficient Neural Network |||
+| `efficientnet_b0` ⭐ | 4.7M | 2D |
+| `efficientnet_b1` ⭐ | 7.2M | 2D |
+| `efficientnet_b2` ⭐ | 8.4M | 2D |
+| **EfficientNetV2** — Efficient Neural Network V2 |||
+| `efficientnet_v2_s` ⭐ | 21.0M | 2D |
+| `efficientnet_v2_m` ⭐ | 53.6M | 2D |
+| `efficientnet_v2_l` ⭐ | 118.0M | 2D |
+| **MobileNetV3** — Mobile Neural Network V3 |||
+| `mobilenet_v3_small` ⭐ | 1.1M | 2D |
+| `mobilenet_v3_large` ⭐ | 3.2M | 2D |
+| **RegNet** — Regularized Network |||
+| `regnet_y_400mf` ⭐ | 4.0M | 2D |
+| `regnet_y_800mf` ⭐ | 5.8M | 2D |
+| `regnet_y_1_6gf` ⭐ | 10.5M | 2D |
+| `regnet_y_3_2gf` ⭐ | 18.3M | 2D |
+| `regnet_y_8gf` ⭐ | 37.9M | 2D |
+| **Swin** — Shifted Window Transformer |||
+| `swin_t` ⭐ | 28.0M | 2D |
+| `swin_s` ⭐ | 49.4M | 2D |
+| `swin_b` ⭐ | 87.4M | 2D |
+| **ConvNeXt** — Convolutional Next |||
+| `convnext_tiny` | 28.2M | 1D/2D/3D |
+| `convnext_small` | 49.8M | 1D/2D/3D |
+| `convnext_base` | 88.1M | 1D/2D/3D |
+| `convnext_tiny_pretrained` ⭐ | 28.2M | 2D |
+| **DenseNet** — Densely Connected Network |||
+| `densenet121` | 7.5M | 1D/2D/3D |
+| `densenet169` | 13.3M | 1D/2D/3D |
+| `densenet121_pretrained` ⭐ | 7.5M | 2D |
+| **ViT** — Vision Transformer |||
+| `vit_tiny` | 5.5M | 1D/2D |
+| `vit_small` | 21.6M | 1D/2D |
+| `vit_base` | 85.6M | 1D/2D |
+| **U-Net** — U-shaped Network |||
+| `unet_regression` | 31.1M | 1D/2D/3D |
+> ⭐ = Pretrained on ImageNet. Recommended for smaller datasets.
 </details>
@@ -421,12 +457,32 @@ WaveDL/
 | Argument | Default | Description |
 |----------|---------|-------------|
-| `--compile` | `False` | Enable `torch.compile` |
+| `--compile` | `False` | Enable `torch.compile` (recommended for long runs) |
 | `--precision` | `bf16` | Mixed precision mode (`bf16`, `fp16`, `no`) |
+| `--workers` | `-1` | DataLoader workers per GPU (-1=auto, up to 16) |
 | `--wandb` | `False` | Enable W&B logging |
+| `--wandb_watch` | `False` | Enable W&B gradient watching (adds overhead) |
 | `--project_name` | `DL-Training` | W&B project name |
 | `--run_name` | `None` | W&B run name (auto-generated if not set) |
+**Automatic GPU Optimizations:**
+WaveDL automatically enables performance optimizations for modern GPUs:
+| Optimization | Effect | GPU Support |
+|--------------|--------|-------------|
+| **TF32 precision** | ~2x speedup for float32 matmul | A100, H100 (Ampere+) |
+| **cuDNN benchmark** | Auto-tuned convolutions | All NVIDIA GPUs |
+| **Worker scaling** | Up to 16 workers per GPU | All systems |
+> [!NOTE]
+> These optimizations are **backward compatible** — they have no effect on older GPUs (V100, T4, GTX) or CPU-only systems. No configuration needed.
+**HPC Best Practices:**
+- Stage data to `$SLURM_TMPDIR` (local NVMe) for maximum I/O throughput
+- Use `--compile` for training runs > 50 epochs
+- Increase `--workers` manually if auto-detection is suboptimal
 </details>
 <details>
@@ -595,12 +651,7 @@ seed: 2025
 Automatically find the best training configuration using [Optuna](https://optuna.org/).
-**Step 1: Install**
-```bash
-pip install -e ".[hpo]"
-```
-**Step 2: Run HPO**
+**Run HPO:**
 You specify which models to search and how many trials to run:
 ```bash
@@ -614,7 +665,7 @@ python -m wavedl.hpo --data_path train.npz --models cnn --n_trials 50
 python -m wavedl.hpo --data_path train.npz --models cnn resnet18 resnet50 vit_small densenet121 --n_trials 200
 ```
-**Step 3: Train with best parameters**
+**Train with best parameters**
 After HPO completes, it prints the optimal command:
 ```bash
@@ -657,7 +708,7 @@ accelerate launch -m wavedl.train --data_path train.npz --model cnn --lr 3.2e-4
 | `--output` | `hpo_results.json` | Output file |
 > [!TIP]
-> See [Available Models](#available-models) for all 21 architectures you can search.
+> See [Available Models](#available-models) for all 38 architectures you can search.
 </details>
@@ -761,7 +812,7 @@ import numpy as np
 X = np.random.randn(1000, 256, 256).astype(np.float32)
 y = np.random.randn(1000, 5).astype(np.float32)
-np.savez('test_data.npz', input_train=X, output_train=y)
+np.savez('test_data.npz', input_test=X, output_test=y)
 ```
 </details>
@@ -773,7 +824,7 @@ np.savez('test_data.npz', input_train=X, output_train=y)
 import numpy as np
 data = np.load('train_data.npz')
-assert data['input_train'].ndim == 3, "Input must be 3D: (N, H, W)"
+assert data['input_train'].ndim >= 2, "Input must be at least 2D: (N, ...) "
 assert data['output_train'].ndim == 2, "Output must be 2D: (N, T)"
 assert len(data['input_train']) == len(data['output_train']), "Sample mismatch"
@@ -928,7 +979,7 @@ Beyond the material characterization example above, the WaveDL pipeline can be a
 | Resource | Description |
 |----------|-------------|
 | Technical Paper | In-depth framework description *(coming soon)* |
-| [`_template.py`](models/_template.py) | Template for new architectures |
+| [`_template.py`](src/wavedl/models/_template.py) | Template for custom architectures |
 ---
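The README changes two data-format details.

- The synthetic test file is now saved under the `input_test`/`output_test` keys instead of `input_train`/`output_train`.
- The input check is relaxed from exactly 3D `(N, H, W)` to at least 2D `(N, ...)`. As a result, 1D signals shaped `(N, L)` and volumes shaped `(N, D, H, W)` also pass.

The sketch below applies the same checks to either split, assuming the `input_<split>`/`output_<split>` key naming shown in the README. `check_npz` and its `split` argument are hypothetical. The helper raises `ValueError` instead of using `assert`, so the checks still run under `python -O`.

```python
import os
import tempfile

import numpy as np


def check_npz(path, split="train"):
    """Apply the README's shape checks to input_<split> / output_<split> (hypothetical helper).

    Returns (input_shape, output_shape) when every check passes.
    """
    with np.load(path) as data:  # NpzFile closes the archive on exit
        x = data[f"input_{split}"]
        y = data[f"output_{split}"]
    if x.ndim < 2:
        raise ValueError("Input must be at least 2D: (N, ...)")
    if y.ndim != 2:
        raise ValueError("Output must be 2D: (N, T)")
    if len(x) != len(y):
        raise ValueError("Sample mismatch")
    return x.shape, y.shape


if __name__ == "__main__":
    # Small synthetic test set saved with the 1.4.1 key names.
    path = os.path.join(tempfile.mkdtemp(), "test_data.npz")
    X = np.random.randn(8, 32, 32).astype(np.float32)
    y = np.random.randn(8, 5).astype(np.float32)
    np.savez(path, input_test=X, output_test=y)
    print(check_npz(path, split="test"))  # ((8, 32, 32), (8, 5))
```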

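Two HPC-related additions in 1.4.1 can be approximated outside WaveDL: the `--workers` default of `-1` (auto, capped at 16 per GPU) and the advice to stage data to `$SLURM_TMPDIR`. This is a sketch under stated assumptions, not WaveDL's actual logic.

- `auto_workers` assumes the auto value is the number of usable CPU cores divided evenly among GPUs.
- `stage_to_local_disk` assumes the cluster sets `SLURM_TMPDIR` to node-local storage. Some SLURM sites set this variable; others do not.

Both helpers are hypothetical.

```python
import os
import shutil
from pathlib import Path

MAX_WORKERS_PER_GPU = 16  # upper bound quoted in the --workers description


def auto_workers(num_gpus, requested=-1):
    """Hypothetical stand-in for --workers -1: split usable CPU cores across GPUs."""
    if requested >= 0:
        return requested  # an explicit value (including 0) is used as-is
    try:
        cpus = len(os.sched_getaffinity(0))  # cores this process may run on (Linux)
    except AttributeError:
        cpus = os.cpu_count() or 1  # platforms without sched_getaffinity
    per_gpu = cpus // max(num_gpus, 1)
    return max(1, min(MAX_WORKERS_PER_GPU, per_gpu))


def stage_to_local_disk(data_path):
    """Copy a dataset to $SLURM_TMPDIR when that variable is set; else return the original path.

    Assumption: SLURM_TMPDIR points at fast node-local storage on this cluster.
    """
    src = Path(data_path)
    tmpdir = os.environ.get("SLURM_TMPDIR")
    if not tmpdir:
        return src
    dst = Path(tmpdir) / src.name
    if not dst.exists():  # reuse a copy left by an earlier step of the same job
        shutil.copy2(src, dst)
    return dst


if __name__ == "__main__":
    print("workers per GPU:", auto_workers(num_gpus=4))
```

In a job script, the path returned by `stage_to_local_disk` would then be passed as `--data_path`. Reusing an existing copy avoids paying the copy cost twice when HPO and training run in the same allocation.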