aws-bootstrap-g4dn 0.5.0__tar.gz → 0.7.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50) hide show
  1. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.pre-commit-config.yaml +2 -2
  2. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/CLAUDE.md +54 -5
  3. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/PKG-INFO +79 -7
  4. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/README.md +76 -6
  5. aws_bootstrap_g4dn-0.7.0/aws_bootstrap/cli.py +963 -0
  6. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/config.py +2 -0
  7. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/ec2.py +137 -8
  8. aws_bootstrap_g4dn-0.7.0/aws_bootstrap/output.py +106 -0
  9. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/remote_setup.sh +2 -2
  10. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/ssh.py +142 -20
  11. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_cli.py +652 -4
  12. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_config.py +18 -0
  13. aws_bootstrap_g4dn-0.7.0/aws_bootstrap/tests/test_ebs.py +245 -0
  14. aws_bootstrap_g4dn-0.7.0/aws_bootstrap/tests/test_output.py +192 -0
  15. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_ssh_config.py +76 -0
  16. aws_bootstrap_g4dn-0.7.0/aws_bootstrap/tests/test_ssh_ebs.py +76 -0
  17. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap_g4dn.egg-info/PKG-INFO +79 -7
  18. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap_g4dn.egg-info/SOURCES.txt +4 -0
  19. aws_bootstrap_g4dn-0.7.0/aws_bootstrap_g4dn.egg-info/requires.txt +4 -0
  20. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/pyproject.toml +5 -1
  21. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/uv.lock +101 -21
  22. aws_bootstrap_g4dn-0.5.0/aws_bootstrap/cli.py +0 -547
  23. aws_bootstrap_g4dn-0.5.0/aws_bootstrap_g4dn.egg-info/requires.txt +0 -2
  24. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  25. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  26. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.github/workflows/ci.yml +0 -0
  27. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.github/workflows/publish-to-pypi.yml +0 -0
  28. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/.gitignore +0 -0
  29. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/CODE_OF_CONDUCT.md +0 -0
  30. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/CONTRIBUTING.md +0 -0
  31. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/LICENSE +0 -0
  32. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/SECURITY.md +0 -0
  33. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/__init__.py +0 -0
  34. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/gpu.py +0 -0
  35. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/__init__.py +0 -0
  36. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/gpu_benchmark.py +0 -0
  37. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/gpu_smoke_test.ipynb +0 -0
  38. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/launch.json +0 -0
  39. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/requirements.txt +0 -0
  40. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/saxpy.cu +0 -0
  41. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/resources/tasks.json +0 -0
  42. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/__init__.py +0 -0
  43. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_ec2.py +0 -0
  44. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_gpu.py +0 -0
  45. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap/tests/test_ssh_gpu.py +0 -0
  46. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap_g4dn.egg-info/dependency_links.txt +0 -0
  47. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap_g4dn.egg-info/entry_points.txt +0 -0
  48. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/aws_bootstrap_g4dn.egg-info/top_level.txt +0 -0
  49. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/docs/nsight-remote-profiling.md +0 -0
  50. {aws_bootstrap_g4dn-0.5.0 → aws_bootstrap_g4dn-0.7.0}/setup.cfg +0 -0
@@ -17,7 +17,7 @@ repos:
17
17
  - id: end-of-file-fixer
18
18
  - id: trailing-whitespace
19
19
  - repo: https://github.com/astral-sh/ruff-pre-commit
20
- rev: v0.14.7
20
+ rev: v0.15.0
21
21
  hooks:
22
22
  # Run the linter.
23
23
  - id: ruff-check
@@ -28,7 +28,7 @@ repos:
28
28
  rev: v1.19.0
29
29
  hooks:
30
30
  - id: mypy
31
- additional_dependencies: [types-pyyaml>=6.0.12]
31
+ additional_dependencies: [types-pyyaml>=6.0.12, types-tabulate>=0.9]
32
32
  - repo: local
33
33
  hooks:
34
34
  - id: pytest
@@ -13,6 +13,8 @@ Target workflows: Jupyter server-client, VSCode Remote SSH, and NVIDIA Nsight re
13
13
  - **Python 3.12+** with **uv** package manager (astral-sh/uv) — used for venv creation, dependency management, and running the project
14
14
  - **boto3** — AWS SDK for EC2 provisioning (AMI lookup, security groups, instance launch, waiters)
15
15
  - **click** — CLI framework with built-in color support (`click.secho`, `click.style`)
16
+ - **pyyaml** — YAML serialization for `--output yaml`
17
+ - **tabulate** — Table formatting for `--output table`
16
18
  - **setuptools + setuptools-scm** — build backend with git-tag-based versioning (configured in pyproject.toml)
17
19
  - **AWS CLI v2** with a configured AWS profile (`AWS_PROFILE` env var or `--profile` flag)
18
20
  - **direnv** for automatic venv activation (`.envrc` sources `.venv/bin/activate`)
@@ -32,9 +34,10 @@ aws_bootstrap/
32
34
  __init__.py # Package init
33
35
  cli.py # Click CLI entry point (launch, status, terminate commands)
34
36
  config.py # LaunchConfig dataclass with defaults
35
- ec2.py # AMI lookup, security group, instance launch/find/terminate, polling, spot pricing
37
+ ec2.py # AMI lookup, security group, instance launch/find/terminate, polling, spot pricing, EBS volume ops
36
38
  gpu.py # GPU architecture mapping and GpuInfo dataclass
37
- ssh.py # SSH key pair import, SSH readiness check, remote setup, ~/.ssh/config management, GPU queries
39
+ output.py # Output formatting: OutputFormat enum, emit(), echo/secho wrappers for structured output
40
+ ssh.py # SSH key pair import, SSH readiness check, remote setup, ~/.ssh/config management, GPU queries, EBS mount
38
41
  resources/ # Non-Python artifacts SCP'd to remote instances
39
42
  __init__.py
40
43
  gpu_benchmark.py # GPU throughput benchmark (CNN + Transformer), copied to ~/gpu_benchmark.py on instance
@@ -48,9 +51,12 @@ aws_bootstrap/
48
51
  test_config.py
49
52
  test_cli.py
50
53
  test_ec2.py
54
+ test_output.py
51
55
  test_gpu.py
52
56
  test_ssh_config.py
53
57
  test_ssh_gpu.py
58
+ test_ebs.py
59
+ test_ssh_ebs.py
54
60
  docs/
55
61
  nsight-remote-profiling.md # Nsight Compute, Nsight Systems, and Nsight VSCE remote profiling guide
56
62
  spot-request-lifecycle.md # Research notes on spot request cleanup
@@ -60,9 +66,12 @@ Entry point: `aws-bootstrap = "aws_bootstrap.cli:main"` (installed via `uv sync`
60
66
 
61
67
  ## CLI Commands
62
68
 
63
- - **`launch`** provisions an EC2 instance (spot by default, falls back to on-demand on capacity errors); adds SSH config alias (e.g. `aws-gpu1`) to `~/.ssh/config`; `--python-version` controls which Python `uv` installs in the remote venv; `--ssh-port` overrides the default SSH port (22) for security group ingress, connection checks, and SSH config
64
- - **`status`** — lists all non-terminated instances (including `shutting-down`) with type, IP, SSH alias, pricing (spot price/hr or on-demand), uptime, and estimated cost for running spot instances; `--gpu` flag queries GPU info via SSH, reporting both CUDA toolkit version (from `nvcc`) and driver-supported max (from `nvidia-smi`); `--instructions` (default: on) prints connection commands (SSH, Jupyter tunnel, VSCode Remote SSH, GPU benchmark) for each running instance; suppress with `--no-instructions`
65
- - **`terminate`** — terminates instances by ID or SSH alias (e.g. `aws-gpu1`, resolved via `~/.ssh/config`), or all aws-bootstrap instances in the region if no arguments given; removes SSH config aliases
69
+ **Global option:** `--output` / `-o` controls output format: `text` (default, human-readable with color), `json`, `yaml`, `table`. Structured formats (json/yaml/table) suppress all progress messages and emit machine-readable output. Commands requiring confirmation (`terminate`, `cleanup`) require `--yes` in structured modes.
70
+
71
+ - **`launch`** — provisions an EC2 instance (spot by default, falls back to on-demand on capacity errors); adds SSH config alias (e.g. `aws-gpu1`) to `~/.ssh/config`; `--python-version` controls which Python `uv` installs in the remote venv; `--ssh-port` overrides the default SSH port (22) for security group ingress, connection checks, and SSH config; `--ebs-storage SIZE` creates and attaches a new gp3 EBS data volume (mounted at `/data`); `--ebs-volume-id ID` attaches an existing EBS volume (mutually exclusive with `--ebs-storage`)
72
+ - **`status`** — lists all non-terminated instances (including `shutting-down`) with type, IP, SSH alias, EBS data volumes, pricing (spot price/hr or on-demand), uptime, and estimated cost for running spot instances; `--gpu` flag queries GPU info via SSH, reporting both CUDA toolkit version (from `nvcc`) and driver-supported max (from `nvidia-smi`); `--instructions` (default: on) prints connection commands (SSH, Jupyter tunnel, VSCode Remote SSH, GPU benchmark) for each running instance; suppress with `--no-instructions`
73
+ - **`terminate`** — terminates instances by ID or SSH alias (e.g. `aws-gpu1`, resolved via `~/.ssh/config`), or all aws-bootstrap instances in the region if no arguments given; removes SSH config aliases; deletes associated EBS data volumes by default; `--keep-ebs` preserves volumes and prints reattach commands
74
+ - **`cleanup`** — removes stale `~/.ssh/config` entries for terminated/non-existent instances; compares managed SSH config blocks against live EC2 instances; `--dry-run` previews removals without modifying config; `--yes` skips the confirmation prompt
66
75
  - **`list instance-types`** — lists EC2 instance types matching a family prefix (default: `g4dn`), showing vCPUs, memory, and GPU info
67
76
  - **`list amis`** — lists available AMIs matching a name pattern (default: Deep Learning Base OSS Nvidia Driver GPU AMIs), sorted newest-first
68
77
 
@@ -91,6 +100,18 @@ uv run pytest
91
100
 
92
101
  Use `uv add <package>` to add dependencies and `uv add --group dev <package>` for dev dependencies.
93
102
 
103
+ ## Structured Output Architecture
104
+
105
+ The `--output` option uses a context-aware suppression pattern via `aws_bootstrap/output.py`:
106
+
107
+ - **`output.echo()` / `output.secho()`** — wrap `click.echo`/`click.secho`; silent in non-text modes. Used in `ec2.py` and `ssh.py` for progress messages.
108
+ - **`is_text(ctx)`** — checks if the current output format is text. Used in `cli.py` to guard text-only blocks.
109
+ - **`emit(data, headers=..., ctx=...)`** — dispatches structured data to JSON/YAML/table renderers. No-op in text mode.
110
+ - **CLI helper guards** — `step()`, `info()`, `val()`, `success()`, `warn()` in `cli.py` check `is_text()` and return early in structured modes.
111
+ - Each CLI command builds a result dict alongside existing logic, emits it via `emit()` for non-text formats, and falls through to text output for text mode.
112
+ - **Confirmation prompts** (`terminate`, `cleanup`) require `--yes` in structured modes to avoid corrupting output.
113
+ - The spot-fallback `click.confirm()` in `ec2.py` auto-confirms in structured modes.
114
+
94
115
  ## CUDA-Aware PyTorch Installation
95
116
 
96
117
  `remote_setup.sh` detects the CUDA toolkit version on the instance (via `nvcc`, falling back to `nvidia-smi`) and installs PyTorch from the matching CUDA wheel index (`https://download.pytorch.org/whl/cu{TAG}`). This ensures `torch.version.cuda` matches the system's CUDA toolkit, which is required for compiling custom CUDA extensions with `nvcc`.
@@ -112,6 +133,34 @@ The `KNOWN_CUDA_TAGS` array in `remote_setup.sh` lists the CUDA wheel tags publi
112
133
 
113
134
  `resources/gpu_benchmark.py` is uploaded to `~/gpu_benchmark.py` on the remote instance during setup. It benchmarks GPU throughput with two modes: CNN on MNIST and a GPT-style Transformer on synthetic data. It reports samples/sec, batch times, and peak GPU memory. Supports `--precision` (fp32/fp16/bf16/tf32), `--diagnose` for CUDA smoke tests, and separate `--transformer-batch-size` (default 32, T4-safe). Dependencies (`torch`, `torchvision`, `tqdm`) are already installed by the setup script.
114
135
 
136
+ ## EBS Data Volumes
137
+
138
+ The `--ebs-storage` and `--ebs-volume-id` options on `launch` create or attach persistent gp3 EBS volumes mounted at `/data`. The implementation spans three modules:
139
+
140
+ - **`ec2.py`** — Volume lifecycle: `create_ebs_volume`, `validate_ebs_volume`, `attach_ebs_volume`, `detach_ebs_volume`, `delete_ebs_volume`, `find_ebs_volumes_for_instance`. Constants `EBS_DEVICE_NAME` (`/dev/sdf`) and `EBS_MOUNT_POINT` (`/data`).
141
+ - **`ssh.py`** — `mount_ebs_volume()` SSHs to the instance and runs a shell script that detects the device, optionally formats it, mounts it, and adds an fstab entry.
142
+ - **`cli.py`** — Orchestrates the flow: create/validate → attach → wait for SSH → mount. Mount failures are non-fatal (warn and continue).
143
+
144
+ ### Tagging strategy
145
+
146
+ Volumes are tagged for discovery by `status` and `terminate`:
147
+
148
+ | Tag | Value | Purpose |
149
+ |-----|-------|---------|
150
+ | `created-by` | `aws-bootstrap-g4dn` | Standard tool-managed resource tag |
151
+ | `Name` | `aws-bootstrap-data-{instance_id}` | Human-readable in AWS console |
152
+ | `aws-bootstrap-instance` | `i-xxxxxxxxx` | Links volume to instance for `find_ebs_volumes_for_instance` |
153
+
154
+ ### NVMe device detection
155
+
156
+ On Nitro instances (g4dn), `/dev/sdf` is remapped to `/dev/nvmeXn1`. The mount script detects the correct device by matching the volume ID serial number via `lsblk -o NAME,SERIAL -dpn`, with fallbacks to `/dev/nvme1n1`, `/dev/xvdf`, `/dev/sdf`.
157
+
158
+ ### Spot interruption and terminate cleanup
159
+
160
+ Non-root EBS volumes attached via API have `DeleteOnTermination=False` by default. This means data volumes **survive spot interruptions** — when AWS reclaims the instance, the volume detaches and becomes `available`, preserving all data. The user can reattach it to a new instance with `--ebs-volume-id`.
161
+
162
+ The `terminate` command discovers volumes via `find_ebs_volumes_for_instance`, waits for them to detach (become `available`), then deletes them. `--keep-ebs` skips deletion and prints the volume ID with a reattach command.
163
+
115
164
  ## Versioning & Publishing
116
165
 
117
166
  Version is derived automatically from git tags via **setuptools-scm** — no hardcoded version string in the codebase.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: aws-bootstrap-g4dn
3
- Version: 0.5.0
3
+ Version: 0.7.0
4
4
  Summary: Bootstrap AWS EC2 GPU instances for hybrid local-remote development
5
5
  Author: Adam Ever-Hadani
6
6
  License-Expression: MIT
@@ -15,6 +15,8 @@ Description-Content-Type: text/markdown
15
15
  License-File: LICENSE
16
16
  Requires-Dist: boto3>=1.35
17
17
  Requires-Dist: click>=8.1
18
+ Requires-Dist: pyyaml>=6.0.3
19
+ Requires-Dist: tabulate>=0.9.0
18
20
  Dynamic: license-file
19
21
 
20
22
  # aws-bootstrap-g4dn
@@ -44,7 +46,8 @@ ssh aws-gpu1 # You're in, venv activated, PyTorch works
44
46
  | 📊 | **GPU benchmark included** | CNN (MNIST) + Transformer benchmarks with FP16/FP32/BF16 precision and tqdm progress |
45
47
  | 📓 | **Jupyter ready** | Lab server auto-starts as a systemd service on port 8888 — just SSH tunnel and open |
46
48
  | 🖥️ | **`status --gpu`** | Shows CUDA toolkit version, driver max, GPU architecture, spot pricing, uptime, and estimated cost |
47
- | 🗑️ | **Clean terminate** | Stops instances, removes SSH aliases, shows shutting-down state until fully gone |
49
+ | 💾 | **EBS data volumes** | Attach persistent storage at `/data` — survives spot interruptions and termination; reattach to new instances |
50
+ | 🗑️ | **Clean terminate** | Stops instances, removes SSH aliases, cleans up EBS volumes (or preserves with `--keep-ebs`) |
48
51
 
49
52
  ### 🎯 Target Workflows
50
53
 
@@ -132,16 +135,24 @@ aws-bootstrap launch --python-version 3.13
132
135
  # Use a non-default SSH port
133
136
  aws-bootstrap launch --ssh-port 2222
134
137
 
138
+ # Attach a persistent EBS data volume (96 GB gp3, mounted at /data)
139
+ aws-bootstrap launch --ebs-storage 96
140
+
141
+ # Reattach an existing EBS volume from a previous instance
142
+ aws-bootstrap launch --ebs-volume-id vol-0abc123def456
143
+
135
144
  # Use a specific AWS profile
136
145
  aws-bootstrap launch --profile my-aws-profile
137
146
  ```
138
147
 
139
148
  After launch, the CLI:
140
149
 
141
- 1. **Adds an SSH alias** (e.g. `aws-gpu1`) to `~/.ssh/config`
142
- 2. **Runs remote setup** installs utilities, creates a Python venv, installs CUDA-matched PyTorch, sets up Jupyter
143
- 3. **Runs a CUDA smoke test** — verifies `torch.cuda.is_available()` and runs a quick GPU matmul
144
- 4. **Prints connection commands** — SSH, Jupyter tunnel, GPU benchmark, and terminate
150
+ 1. **Creates/attaches EBS volume** (if `--ebs-storage` or `--ebs-volume-id` was specified)
151
+ 2. **Adds an SSH alias** (e.g. `aws-gpu1`) to `~/.ssh/config`
152
+ 3. **Runs remote setup** — installs utilities, creates a Python venv, installs CUDA-matched PyTorch, sets up Jupyter
153
+ 4. **Mounts EBS volume** at `/data` (if applicable — formats new volumes, mounts existing ones as-is)
154
+ 5. **Runs a CUDA smoke test** — verifies `torch.cuda.is_available()` and runs a quick GPU matmul
155
+ 6. **Prints connection commands** — SSH, Jupyter tunnel, GPU benchmark, and terminate
145
156
 
146
157
  ```bash
147
158
  ssh aws-gpu1 # venv auto-activates on login
@@ -154,7 +165,7 @@ The setup script runs automatically on the instance after SSH becomes available:
154
165
  | Step | What |
155
166
  |------|------|
156
167
  | **GPU verify** | Confirms `nvidia-smi` and `nvcc` are working |
157
- | **Utilities** | Installs `htop`, `tmux`, `tree`, `jq` |
168
+ | **Utilities** | Installs `htop`, `tmux`, `tree`, `jq`, `ffmpeg` |
158
169
  | **Python venv** | Creates `~/venv` with `uv`, auto-activates in `~/.bashrc`. Use `--python-version` to pin a specific Python (e.g. `3.13`) |
159
170
  | **CUDA-aware PyTorch** | Detects CUDA toolkit version → installs PyTorch from the matching `cu{TAG}` wheel index |
160
171
  | **CUDA smoke test** | Runs `torch.cuda.is_available()` + GPU matmul to verify the stack |
@@ -223,6 +234,30 @@ Then install the [Nsight VSCE extension](https://marketplace.visualstudio.com/it
223
234
 
224
235
  See [Nsight remote profiling guide](docs/nsight-remote-profiling.md) for more details on CUDA debugging and profiling workflows.
225
236
 
237
+ ### 📤 Structured Output
238
+
239
+ All commands support `--output` / `-o` for machine-readable output — useful for scripting, piping to `jq`, or LLM tool-use:
240
+
241
+ ```bash
242
+ # JSON output (pipe to jq)
243
+ aws-bootstrap -o json status
244
+ aws-bootstrap -o json status | jq '.instances[0].instance_id'
245
+
246
+ # YAML output
247
+ aws-bootstrap -o yaml status
248
+
249
+ # Table output
250
+ aws-bootstrap -o table status
251
+
252
+ # Works with all commands
253
+ aws-bootstrap -o json list instance-types | jq '.[].instance_type'
254
+ aws-bootstrap -o json launch --dry-run
255
+ aws-bootstrap -o json terminate --yes
256
+ aws-bootstrap -o json cleanup --dry-run
257
+ ```
258
+
259
+ Supported formats: `text` (default, human-readable with color), `json`, `yaml`, `table`. Commands that require confirmation (`terminate`, `cleanup`) require `--yes` in structured output modes.
260
+
226
261
  ### 📋 Listing Resources
227
262
 
228
263
  ```bash
@@ -261,6 +296,9 @@ aws-bootstrap status --region us-east-1
261
296
  # Terminate all aws-bootstrap instances (with confirmation prompt)
262
297
  aws-bootstrap terminate
263
298
 
299
+ # Terminate but preserve EBS data volumes for reuse
300
+ aws-bootstrap terminate --keep-ebs
301
+
264
302
  # Terminate by SSH alias (resolved via ~/.ssh/config)
265
303
  aws-bootstrap terminate aws-gpu1
266
304
 
@@ -272,6 +310,15 @@ aws-bootstrap terminate aws-gpu1 i-def456
272
310
 
273
311
  # Skip confirmation prompt
274
312
  aws-bootstrap terminate --yes
313
+
314
+ # Remove stale SSH config entries for terminated instances
315
+ aws-bootstrap cleanup
316
+
317
+ # Preview what would be removed without modifying config
318
+ aws-bootstrap cleanup --dry-run
319
+
320
+ # Skip confirmation prompt
321
+ aws-bootstrap cleanup --yes
275
322
  ```
276
323
 
277
324
  `status --gpu` reports both the **installed CUDA toolkit** version (from `nvcc`) and the **maximum CUDA version supported by the driver** (from `nvidia-smi`), so you can see at a glance whether they match:
@@ -282,6 +329,31 @@ CUDA: 12.8 (driver supports up to 13.0)
282
329
 
283
330
  SSH aliases are managed automatically — they're created on `launch`, shown in `status`, and cleaned up on `terminate`. Aliases use sequential numbering (`aws-gpu1`, `aws-gpu2`, etc.) and never reuse numbers from previous instances. You can use aliases anywhere you'd use an instance ID, e.g. `aws-bootstrap terminate aws-gpu1`.
284
331
 
332
+ ## EBS Data Volumes
333
+
334
+ Attach persistent EBS storage to keep datasets and model checkpoints across instance lifecycles. Volumes are mounted at `/data` and persist independently of the instance.
335
+
336
+ ```bash
337
+ # Create a new 96 GB gp3 volume, formatted and mounted at /data
338
+ aws-bootstrap launch --ebs-storage 96
339
+
340
+ # After terminating with --keep-ebs, reattach the same volume to a new instance
341
+ aws-bootstrap terminate --keep-ebs
342
+ # Output: Preserving EBS volume: vol-0abc123...
343
+ # Reattach with: aws-bootstrap launch --ebs-volume-id vol-0abc123...
344
+
345
+ aws-bootstrap launch --ebs-volume-id vol-0abc123def456
346
+ ```
347
+
348
+ Key behaviors:
349
+ - `--ebs-storage` and `--ebs-volume-id` are mutually exclusive
350
+ - New volumes are formatted as ext4; existing volumes are mounted as-is
351
+ - Volumes are tagged for automatic discovery by `status` and `terminate`
352
+ - `terminate` deletes data volumes by default; use `--keep-ebs` to preserve them
353
+ - **Spot-safe** — data volumes survive spot interruptions. If AWS reclaims your instance, the volume detaches automatically and can be reattached to a new instance with `--ebs-volume-id`
354
+ - EBS volumes must be in the same availability zone as the instance
355
+ - Mount failures are non-fatal — the instance remains usable
356
+
285
357
  ## EC2 vCPU Quotas
286
358
 
287
359
  AWS accounts have [service quotas](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html) that limit how many vCPUs you can run per instance family. New or lightly-used accounts often have a **default quota of 0 vCPUs** for GPU instance families (G and VT), which will cause errors on launch:
@@ -25,7 +25,8 @@ ssh aws-gpu1 # You're in, venv activated, PyTorch works
25
25
  | 📊 | **GPU benchmark included** | CNN (MNIST) + Transformer benchmarks with FP16/FP32/BF16 precision and tqdm progress |
26
26
  | 📓 | **Jupyter ready** | Lab server auto-starts as a systemd service on port 8888 — just SSH tunnel and open |
27
27
  | 🖥️ | **`status --gpu`** | Shows CUDA toolkit version, driver max, GPU architecture, spot pricing, uptime, and estimated cost |
28
- | 🗑️ | **Clean terminate** | Stops instances, removes SSH aliases, shows shutting-down state until fully gone |
28
+ | 💾 | **EBS data volumes** | Attach persistent storage at `/data` — survives spot interruptions and termination; reattach to new instances |
29
+ | 🗑️ | **Clean terminate** | Stops instances, removes SSH aliases, cleans up EBS volumes (or preserves with `--keep-ebs`) |
29
30
 
30
31
  ### 🎯 Target Workflows
31
32
 
@@ -113,16 +114,24 @@ aws-bootstrap launch --python-version 3.13
113
114
  # Use a non-default SSH port
114
115
  aws-bootstrap launch --ssh-port 2222
115
116
 
117
+ # Attach a persistent EBS data volume (96 GB gp3, mounted at /data)
118
+ aws-bootstrap launch --ebs-storage 96
119
+
120
+ # Reattach an existing EBS volume from a previous instance
121
+ aws-bootstrap launch --ebs-volume-id vol-0abc123def456
122
+
116
123
  # Use a specific AWS profile
117
124
  aws-bootstrap launch --profile my-aws-profile
118
125
  ```
119
126
 
120
127
  After launch, the CLI:
121
128
 
122
- 1. **Adds an SSH alias** (e.g. `aws-gpu1`) to `~/.ssh/config`
123
- 2. **Runs remote setup** installs utilities, creates a Python venv, installs CUDA-matched PyTorch, sets up Jupyter
124
- 3. **Runs a CUDA smoke test** — verifies `torch.cuda.is_available()` and runs a quick GPU matmul
125
- 4. **Prints connection commands** — SSH, Jupyter tunnel, GPU benchmark, and terminate
129
+ 1. **Creates/attaches EBS volume** (if `--ebs-storage` or `--ebs-volume-id` was specified)
130
+ 2. **Adds an SSH alias** (e.g. `aws-gpu1`) to `~/.ssh/config`
131
+ 3. **Runs remote setup** — installs utilities, creates a Python venv, installs CUDA-matched PyTorch, sets up Jupyter
132
+ 4. **Mounts EBS volume** at `/data` (if applicable — formats new volumes, mounts existing ones as-is)
133
+ 5. **Runs a CUDA smoke test** — verifies `torch.cuda.is_available()` and runs a quick GPU matmul
134
+ 6. **Prints connection commands** — SSH, Jupyter tunnel, GPU benchmark, and terminate
126
135
 
127
136
  ```bash
128
137
  ssh aws-gpu1 # venv auto-activates on login
@@ -135,7 +144,7 @@ The setup script runs automatically on the instance after SSH becomes available:
135
144
  | Step | What |
136
145
  |------|------|
137
146
  | **GPU verify** | Confirms `nvidia-smi` and `nvcc` are working |
138
- | **Utilities** | Installs `htop`, `tmux`, `tree`, `jq` |
147
+ | **Utilities** | Installs `htop`, `tmux`, `tree`, `jq`, `ffmpeg` |
139
148
  | **Python venv** | Creates `~/venv` with `uv`, auto-activates in `~/.bashrc`. Use `--python-version` to pin a specific Python (e.g. `3.13`) |
140
149
  | **CUDA-aware PyTorch** | Detects CUDA toolkit version → installs PyTorch from the matching `cu{TAG}` wheel index |
141
150
  | **CUDA smoke test** | Runs `torch.cuda.is_available()` + GPU matmul to verify the stack |
@@ -204,6 +213,30 @@ Then install the [Nsight VSCE extension](https://marketplace.visualstudio.com/it
204
213
 
205
214
  See [Nsight remote profiling guide](docs/nsight-remote-profiling.md) for more details on CUDA debugging and profiling workflows.
206
215
 
216
+ ### 📤 Structured Output
217
+
218
+ All commands support `--output` / `-o` for machine-readable output — useful for scripting, piping to `jq`, or LLM tool-use:
219
+
220
+ ```bash
221
+ # JSON output (pipe to jq)
222
+ aws-bootstrap -o json status
223
+ aws-bootstrap -o json status | jq '.instances[0].instance_id'
224
+
225
+ # YAML output
226
+ aws-bootstrap -o yaml status
227
+
228
+ # Table output
229
+ aws-bootstrap -o table status
230
+
231
+ # Works with all commands
232
+ aws-bootstrap -o json list instance-types | jq '.[].instance_type'
233
+ aws-bootstrap -o json launch --dry-run
234
+ aws-bootstrap -o json terminate --yes
235
+ aws-bootstrap -o json cleanup --dry-run
236
+ ```
237
+
238
+ Supported formats: `text` (default, human-readable with color), `json`, `yaml`, `table`. Commands that require confirmation (`terminate`, `cleanup`) require `--yes` in structured output modes.
239
+
207
240
  ### 📋 Listing Resources
208
241
 
209
242
  ```bash
@@ -242,6 +275,9 @@ aws-bootstrap status --region us-east-1
242
275
  # Terminate all aws-bootstrap instances (with confirmation prompt)
243
276
  aws-bootstrap terminate
244
277
 
278
+ # Terminate but preserve EBS data volumes for reuse
279
+ aws-bootstrap terminate --keep-ebs
280
+
245
281
  # Terminate by SSH alias (resolved via ~/.ssh/config)
246
282
  aws-bootstrap terminate aws-gpu1
247
283
 
@@ -253,6 +289,15 @@ aws-bootstrap terminate aws-gpu1 i-def456
253
289
 
254
290
  # Skip confirmation prompt
255
291
  aws-bootstrap terminate --yes
292
+
293
+ # Remove stale SSH config entries for terminated instances
294
+ aws-bootstrap cleanup
295
+
296
+ # Preview what would be removed without modifying config
297
+ aws-bootstrap cleanup --dry-run
298
+
299
+ # Skip confirmation prompt
300
+ aws-bootstrap cleanup --yes
256
301
  ```
257
302
 
258
303
  `status --gpu` reports both the **installed CUDA toolkit** version (from `nvcc`) and the **maximum CUDA version supported by the driver** (from `nvidia-smi`), so you can see at a glance whether they match:
@@ -263,6 +308,31 @@ CUDA: 12.8 (driver supports up to 13.0)
263
308
 
264
309
  SSH aliases are managed automatically — they're created on `launch`, shown in `status`, and cleaned up on `terminate`. Aliases use sequential numbering (`aws-gpu1`, `aws-gpu2`, etc.) and never reuse numbers from previous instances. You can use aliases anywhere you'd use an instance ID, e.g. `aws-bootstrap terminate aws-gpu1`.
265
310
 
311
+ ## EBS Data Volumes
312
+
313
+ Attach persistent EBS storage to keep datasets and model checkpoints across instance lifecycles. Volumes are mounted at `/data` and persist independently of the instance.
314
+
315
+ ```bash
316
+ # Create a new 96 GB gp3 volume, formatted and mounted at /data
317
+ aws-bootstrap launch --ebs-storage 96
318
+
319
+ # After terminating with --keep-ebs, reattach the same volume to a new instance
320
+ aws-bootstrap terminate --keep-ebs
321
+ # Output: Preserving EBS volume: vol-0abc123...
322
+ # Reattach with: aws-bootstrap launch --ebs-volume-id vol-0abc123...
323
+
324
+ aws-bootstrap launch --ebs-volume-id vol-0abc123def456
325
+ ```
326
+
327
+ Key behaviors:
328
+ - `--ebs-storage` and `--ebs-volume-id` are mutually exclusive
329
+ - New volumes are formatted as ext4; existing volumes are mounted as-is
330
+ - Volumes are tagged for automatic discovery by `status` and `terminate`
331
+ - `terminate` deletes data volumes by default; use `--keep-ebs` to preserve them
332
+ - **Spot-safe** — data volumes survive spot interruptions. If AWS reclaims your instance, the volume detaches automatically and can be reattached to a new instance with `--ebs-volume-id`
333
+ - EBS volumes must be in the same availability zone as the instance
334
+ - Mount failures are non-fatal — the instance remains usable
335
+
266
336
  ## EC2 vCPU Quotas
267
337
 
268
338
  AWS accounts have [service quotas](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html) that limit how many vCPUs you can run per instance family. New or lightly-used accounts often have a **default quota of 0 vCPUs** for GPU instance families (G and VT), which will cause errors on launch: