PyPI - aws-bootstrap-g4dn - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

aws-bootstrap-g4dn 0.3.0tar.gz → 0.4.0tar.gz

Files changed (44)

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/.pre-commit-config.yaml RENAMED

@@ -8,6 +8,7 @@ repos:
     - id: fix-byte-order-marker
     - id: check-case-conflict
     - id: check-json
+      exclude: ^aws_bootstrap/resources/(launch|tasks)\.json$
     - id: check-yaml
       args: [ --unsafe ]
     - id: detect-aws-credentials
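
The `check-json` exclusion is presumably needed because the two VSCode templates carry `//` comments, which strict JSON parsers reject. A minimal sketch of that behaviour, assuming Python's standard `json` module as a stand-in for the hook's parser (the hook's internals are not part of this diff); `template` is an abbreviated, hypothetical excerpt and `is_strict_json` is an illustrative helper:

```python
import json

# Abbreviated stand-in for a VSCode template that carries // comments (JSONC).
template = """{
    // CUDA debug configurations for VSCode
    "version": "0.2.0"
}"""


def is_strict_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_strict_json(template))                # the comment line is not valid JSON
print(is_strict_json('{"version": "0.2.0"}'))  # same content without the comment
```

VSCode itself tolerates comments in `launch.json` and `tasks.json`, so excluding the templates from the strict check keeps the inline documentation without failing the hook.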

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/CLAUDE.md RENAMED

@@ -39,6 +39,9 @@ aws_bootstrap/
         __init__.py
         gpu_benchmark.py       # GPU throughput benchmark (CNN + Transformer), copied to ~/gpu_benchmark.py on instance
         gpu_smoke_test.ipynb   # Interactive Jupyter notebook for GPU verification, copied to ~/gpu_smoke_test.ipynb
+        launch.json            # VSCode CUDA debug config template (deployed to ~/workspace/.vscode/launch.json)
+        saxpy.cu               # Example CUDA SAXPY source (deployed to ~/workspace/saxpy.cu)
+        tasks.json             # VSCode CUDA build tasks template (deployed to ~/workspace/.vscode/tasks.json)
         remote_setup.sh        # Uploaded & run on instance post-boot (GPU verify, Jupyter, etc.)
         requirements.txt       # Python dependencies installed on the remote instance
     tests/               # Unit tests (pytest)
@@ -49,6 +52,7 @@ aws_bootstrap/
         test_ssh_config.py
         test_ssh_gpu.py
 docs/
+    nsight-remote-profiling.md # Nsight Compute, Nsight Systems, and Nsight VSCE remote profiling guide
     spot-request-lifecycle.md  # Research notes on spot request cleanup
 ```
@@ -99,8 +103,10 @@ The `KNOWN_CUDA_TAGS` array in `remote_setup.sh` lists the CUDA wheel tags publi
 `remote_setup.sh` also:
 - Creates `~/venv` and appends `source ~/venv/bin/activate` to `~/.bashrc` so the venv is auto-activated on SSH login. When `--python-version` is passed to `launch`, the CLI sets `PYTHON_VERSION` as an inline env var on the SSH command; `remote_setup.sh` reads it to run `uv python install` and `uv venv --python` with the requested version
+- Adds NVIDIA Nsight Systems (`nsys`) to PATH if installed under `/opt/nvidia/nsight-systems/` (pre-installed on Deep Learning AMIs but not on PATH by default). Fixes directory permissions, finds the latest version, and prepends its `bin/` to PATH in `~/.bashrc`
 - Runs a quick CUDA smoke test (`torch.cuda.is_available()` + GPU matmul) after PyTorch installation to verify the GPU stack; prints a WARNING on failure but does not abort
 - Copies `gpu_benchmark.py` to `~/gpu_benchmark.py` and `gpu_smoke_test.ipynb` to `~/gpu_smoke_test.ipynb`
+- Sets up `~/workspace/.vscode/` with `launch.json` and `tasks.json` for CUDA debugging. Detects `cuda-gdb` path and GPU SM architecture (via `nvidia-smi --query-gpu=compute_cap`) at deploy time, replacing `__CUDA_GDB_PATH__` and `__GPU_ARCH__` placeholders in the template files via `sed`
 ## GPU Benchmark

{aws_bootstrap_g4dn-0.3.0/aws_bootstrap_g4dn.egg-info → aws_bootstrap_g4dn-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aws-bootstrap-g4dn
-Version: 0.3.0
+Version: 0.4.0
 Summary: Bootstrap AWS EC2 GPU instances for hybrid local-remote development
 Author: Adam Ever-Hadani
 License-Expression: MIT
@@ -49,7 +49,7 @@ ssh aws-gpu1                  # You're in, venv activated, PyTorch works
 ### 🎯 Target Workflows
 1. **Jupyter server-client** — Jupyter runs on the instance, connect from your local browser
-2. **VSCode Remote SSH** — `ssh aws-gpu1` just works with the Remote SSH extension
+2. **VSCode Remote SSH** — opens `~/workspace` with pre-configured CUDA debug/build tasks and an example `.cu` file
 3. **NVIDIA Nsight remote debugging** — GPU debugging over SSH
 ---
@@ -162,6 +162,7 @@ The setup script runs automatically on the instance after SSH becomes available:
 | **GPU smoke test notebook** | Copies `gpu_smoke_test.ipynb` to `~/gpu_smoke_test.ipynb` (open in JupyterLab) |
 | **Jupyter** | Configures and starts JupyterLab as a systemd service on port 8888 |
 | **SSH keepalive** | Configures server-side keepalive to prevent idle disconnects |
+| **VSCode workspace** | Creates `~/workspace/.vscode/` with `launch.json` and `tasks.json` (auto-detected `cuda-gdb` path and GPU arch), plus an example `saxpy.cu` |
 ### 📊 GPU Benchmark
@@ -200,6 +201,28 @@ ssh -i ~/.ssh/id_ed25519 -NL 8888:localhost:8888 ubuntu@<public-ip>
 A **GPU smoke test notebook** (`~/gpu_smoke_test.ipynb`) is pre-installed on every instance. Open it in JupyterLab to interactively verify the CUDA stack, run FP32/FP16 matmuls, train a small CNN on MNIST, and visualise training loss and GPU memory usage.
+### 🖥️ VSCode Remote SSH
+The remote setup creates a `~/workspace` folder with pre-configured CUDA debug and build tasks:
+```
+~/workspace/
+├── .vscode/
+│   ├── launch.json   # CUDA debug configs (cuda-gdb path auto-detected)
+│   └── tasks.json    # nvcc build tasks (GPU arch auto-detected, e.g. sm_75)
+└── saxpy.cu          # Example CUDA source — open and press F5 to debug
+```
+Connect directly from your terminal:
+```bash
+code --folder-uri vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace
+```
+Then install the [Nsight VSCE extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) on the remote when prompted. Open `saxpy.cu`, set a breakpoint, and press F5.
+See [Nsight remote profiling guide](docs/nsight-remote-profiling.md) for more details on CUDA debugging and profiling workflows.
 ### 📋 Listing Resources
 ```bash
@@ -322,7 +345,7 @@ aws-bootstrap launch --instance-type t3.medium --ami-filter "ubuntu/images/hvm-s
 | GPU instance pricing | [instances.vantage.sh](https://instances.vantage.sh/aws/ec2/g4dn.xlarge) |
 | Spot instance quotas | [AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html) |
 | Deep Learning AMIs | [AWS docs](https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html) |
-| Nvidia Nsight remote debugging | [Nvidia docs](https://docs.nvidia.com/nsight-visual-studio-edition/3.2/Content/Setup_Remote_Debugging.htm) |
+| Nsight remote GPU profiling | [Guide](docs/nsight-remote-profiling.md) — Nsight Compute, Nsight Systems, and Nsight VSCE on EC2 |
 Tutorials on setting up a CUDA environment on EC2 GPU instances:

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/README.md RENAMED

@@ -30,7 +30,7 @@ ssh aws-gpu1                  # You're in, venv activated, PyTorch works
 ### 🎯 Target Workflows
 1. **Jupyter server-client** — Jupyter runs on the instance, connect from your local browser
-2. **VSCode Remote SSH** — `ssh aws-gpu1` just works with the Remote SSH extension
+2. **VSCode Remote SSH** — opens `~/workspace` with pre-configured CUDA debug/build tasks and an example `.cu` file
 3. **NVIDIA Nsight remote debugging** — GPU debugging over SSH
 ---
@@ -143,6 +143,7 @@ The setup script runs automatically on the instance after SSH becomes available:
 | **GPU smoke test notebook** | Copies `gpu_smoke_test.ipynb` to `~/gpu_smoke_test.ipynb` (open in JupyterLab) |
 | **Jupyter** | Configures and starts JupyterLab as a systemd service on port 8888 |
 | **SSH keepalive** | Configures server-side keepalive to prevent idle disconnects |
+| **VSCode workspace** | Creates `~/workspace/.vscode/` with `launch.json` and `tasks.json` (auto-detected `cuda-gdb` path and GPU arch), plus an example `saxpy.cu` |
 ### 📊 GPU Benchmark
@@ -181,6 +182,28 @@ ssh -i ~/.ssh/id_ed25519 -NL 8888:localhost:8888 ubuntu@<public-ip>
 A **GPU smoke test notebook** (`~/gpu_smoke_test.ipynb`) is pre-installed on every instance. Open it in JupyterLab to interactively verify the CUDA stack, run FP32/FP16 matmuls, train a small CNN on MNIST, and visualise training loss and GPU memory usage.
+### 🖥️ VSCode Remote SSH
+The remote setup creates a `~/workspace` folder with pre-configured CUDA debug and build tasks:
+```
+~/workspace/
+├── .vscode/
+│   ├── launch.json   # CUDA debug configs (cuda-gdb path auto-detected)
+│   └── tasks.json    # nvcc build tasks (GPU arch auto-detected, e.g. sm_75)
+└── saxpy.cu          # Example CUDA source — open and press F5 to debug
+```
+Connect directly from your terminal:
+```bash
+code --folder-uri vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace
+```
+Then install the [Nsight VSCE extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) on the remote when prompted. Open `saxpy.cu`, set a breakpoint, and press F5.
+See [Nsight remote profiling guide](docs/nsight-remote-profiling.md) for more details on CUDA debugging and profiling workflows.
 ### 📋 Listing Resources
 ```bash
@@ -303,7 +326,7 @@ aws-bootstrap launch --instance-type t3.medium --ami-filter "ubuntu/images/hvm-s
 | GPU instance pricing | [instances.vantage.sh](https://instances.vantage.sh/aws/ec2/g4dn.xlarge) |
 | Spot instance quotas | [AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html) |
 | Deep Learning AMIs | [AWS docs](https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html) |
-| Nvidia Nsight remote debugging | [Nvidia docs](https://docs.nvidia.com/nsight-visual-studio-edition/3.2/Content/Setup_Remote_Debugging.htm) |
+| Nsight remote GPU profiling | [Guide](docs/nsight-remote-profiling.md) — Nsight Compute, Nsight Systems, and Nsight VSCE on EC2 |
 Tutorials on setting up a CUDA environment on EC2 GPU instances:

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap/cli.py RENAMED

@@ -277,7 +277,7 @@ def launch(
     click.echo()
     click.secho("  VSCode Remote SSH:", fg="cyan")
     click.secho(
-        f"    code --folder-uri vscode-remote://ssh-remote+{alias}/home/{config.ssh_user}",
+        f"    code --folder-uri vscode-remote://ssh-remote+{alias}/home/{config.ssh_user}/workspace",
         bold=True,
     )
@@ -410,7 +410,7 @@ def status(region, profile, gpu, instructions):
             click.secho("    VSCode Remote SSH:", fg="cyan")
             click.secho(
-                f"      code --folder-uri vscode-remote://ssh-remote+{alias}/home/{user}",
+                f"      code --folder-uri vscode-remote://ssh-remote+{alias}/home/{user}/workspace",
                 bold=True,
             )

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap/resources/gpu_benchmark.py RENAMED

@@ -628,7 +628,9 @@ def configure_precision(device: torch.device, requested: PrecisionMode) -> Preci
     return PrecisionMode.FP32
-def print_system_info(requested_precision: PrecisionMode) -> tuple[torch.device, PrecisionMode]:
+def print_system_info(
+    requested_precision: PrecisionMode, force_cpu: bool = False
+) -> tuple[torch.device, PrecisionMode]:
     """Print system and CUDA information, return device and actual precision mode."""
     print("\n" + "=" * 60)
     print("System Information")
@@ -636,7 +638,7 @@ def print_system_info(requested_precision: PrecisionMode) -> tuple[torch.device,
     print(f"PyTorch version: {torch.__version__}")
     print(f"Python version: {sys.version.split()[0]}")
-    if torch.cuda.is_available():
+    if torch.cuda.is_available() and not force_cpu:
         device = torch.device("cuda")
         print("CUDA available: Yes")
         print(f"CUDA version: {torch.version.cuda}")
@@ -666,8 +668,11 @@ def print_system_info(requested_precision: PrecisionMode) -> tuple[torch.device,
     else:
         device = torch.device("cpu")
         actual_precision = PrecisionMode.FP32
-        print("CUDA available: No (running on CPU)")
-        print("WARNING: GPU benchmark results will not be representative!")
+        if force_cpu:
+            print("CPU-only mode requested (--cpu flag)")
+        else:
+            print("CUDA available: No (running on CPU)")
+        print("Running on CPU for benchmarking")
     print("=" * 60)
     return device, actual_precision
@@ -724,10 +729,15 @@ def main() -> None:
         action="store_true",
         help="Run CUDA/cuBLAS diagnostic tests before benchmarking",
     )
+    parser.add_argument(
+        "--cpu",
+        action="store_true",
+        help="Force CPU-only execution (for CPU vs GPU comparison)",
+    )
     args = parser.parse_args()
     requested_precision = PrecisionMode(args.precision)
-    device, actual_precision = print_system_info(requested_precision)
+    device, actual_precision = print_system_info(requested_precision, force_cpu=args.cpu)
     # Run diagnostics if requested
     if args.diagnose:
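
The device selection that the new `--cpu` flag introduces can be sketched without PyTorch. This is an illustration only: `select_device` is a hypothetical helper, the `cuda_available` argument stands in for `torch.cuda.is_available()`, and the returned strings mirror the messages printed in the hunks above.

```python
def select_device(cuda_available: bool, force_cpu: bool = False) -> tuple[str, str]:
    """Return (device name, status message) following the 0.4.0 branch logic."""
    if cuda_available and not force_cpu:
        return "cuda", "CUDA available: Yes"
    if force_cpu:
        # The flag wins even when a GPU is present.
        return "cpu", "CPU-only mode requested (--cpu flag)"
    return "cpu", "CUDA available: No (running on CPU)"


for cuda, cpu_flag in [(True, False), (True, True), (False, False), (False, True)]:
    print(cuda, cpu_flag, select_device(cuda, cpu_flag))
```

Passing `--cpu` on a GPU instance therefore takes the same branch as a machine without CUDA, which is what makes a CPU versus GPU comparison possible on one host.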

aws_bootstrap_g4dn-0.4.0/aws_bootstrap/resources/launch.json ADDED

@@ -0,0 +1,42 @@
+{
+    // CUDA debug configurations for VSCode
+    // Deployed to: ~/workspace/.vscode/launch.json
+    //
+    // Usage: Open any .cu file, press F5 to build and debug
+    "version": "0.2.0",
+    "configurations": [
+        {
+            "name": "CUDA: Build and Debug Active File",
+            "type": "cuda-gdb",
+            "request": "launch",
+            "program": "${fileDirname}/${fileBasenameNoExtension}",
+            "args": [],
+            "cwd": "${fileDirname}",
+            "miDebuggerPath": "__CUDA_GDB_PATH__",
+            "stopAtEntry": false,
+            "preLaunchTask": "nvcc: build active file (debug)"
+        },
+        {
+            "name": "CUDA: Build and Debug (stop at main)",
+            "type": "cuda-gdb",
+            "request": "launch",
+            "program": "${fileDirname}/${fileBasenameNoExtension}",
+            "args": [],
+            "cwd": "${fileDirname}",
+            "miDebuggerPath": "__CUDA_GDB_PATH__",
+            "stopAtEntry": true,
+            "preLaunchTask": "nvcc: build active file (debug)"
+        },
+        {
+            "name": "CUDA: Run Active File (no debug)",
+            "type": "cuda-gdb",
+            "request": "launch",
+            "program": "${fileDirname}/${fileBasenameNoExtension}",
+            "args": [],
+            "cwd": "${fileDirname}",
+            "miDebuggerPath": "__CUDA_GDB_PATH__",
+            "stopAtEntry": false,
+            "preLaunchTask": "nvcc: build active file (release)"
+        }
+    ]
+}

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap/resources/remote_setup.sh RENAMED

@@ -7,7 +7,7 @@ echo "=== aws-bootstrap-g4dn remote setup ==="
 # 1. Verify GPU
 echo ""
-echo "[1/5] Verifying GPU and CUDA..."
+echo "[1/6] Verifying GPU and CUDA..."
 if command -v nvidia-smi &>/dev/null; then
     nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
 else
@@ -20,15 +20,40 @@ else
     echo "WARNING: nvcc not found (CUDA toolkit may not be installed)"
 fi
+# Make Nsight Systems (nsys) available on PATH if installed under /opt/nvidia
+if ! command -v nsys &>/dev/null; then
+    NSIGHT_DIR="/opt/nvidia/nsight-systems"
+    if [ -d "$NSIGHT_DIR" ]; then
+        # Fix permissions — the parent dir is often root-only (drwx------)
+        sudo chmod o+rx "$NSIGHT_DIR"
+        # Find the latest version directory (version sort via `sort -V`)
+        NSYS_VERSION=$(ls -1 "$NSIGHT_DIR" | sort -V | tail -1)
+        if [ -n "$NSYS_VERSION" ] && [ -x "$NSIGHT_DIR/$NSYS_VERSION/bin/nsys" ]; then
+            NSYS_BIN="$NSIGHT_DIR/$NSYS_VERSION/bin"
+            if ! grep -q "nsight-systems" ~/.bashrc 2>/dev/null; then
+                echo "export PATH=\"$NSYS_BIN:\$PATH\"" >> ~/.bashrc
+            fi
+            export PATH="$NSYS_BIN:$PATH"
+            echo "  Nsight Systems $NSYS_VERSION added to PATH ($NSYS_BIN)"
+        else
+            echo "  WARNING: Nsight Systems directory found but no nsys binary"
+        fi
+    else
+        echo "  Nsight Systems not found at $NSIGHT_DIR"
+    fi
+else
+    echo "  nsys already on PATH: $(command -v nsys)"
+fi
 # 2. Install utilities
 echo ""
-echo "[2/5] Installing utilities..."
+echo "[2/6] Installing utilities..."
 sudo apt-get update -qq
 sudo apt-get install -y -qq htop tmux tree jq
 # 3. Set up Python environment with uv
 echo ""
-echo "[3/5] Setting up Python environment with uv..."
+echo "[3/6] Setting up Python environment with uv..."
 if ! command -v uv &>/dev/null; then
     curl -LsSf https://astral.sh/uv/install.sh | sh
 fi
@@ -153,7 +178,7 @@ echo "  Jupyter config written to $JUPYTER_CONFIG_DIR/jupyter_lab_config.py"
 # 4. Jupyter systemd service
 echo ""
-echo "[4/5] Setting up Jupyter systemd service..."
+echo "[4/6] Setting up Jupyter systemd service..."
 LOGIN_USER=$(whoami)
 sudo tee /etc/systemd/system/jupyter.service > /dev/null << SVCEOF
@@ -180,7 +205,7 @@ echo "  Jupyter service started (port 8888)"
 # 5. SSH keepalive
 echo ""
-echo "[5/5] Configuring SSH keepalive..."
+echo "[5/6] Configuring SSH keepalive..."
 if ! grep -q "ClientAliveInterval" /etc/ssh/sshd_config; then
     echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config > /dev/null
     echo "ClientAliveCountMax 10" | sudo tee -a /etc/ssh/sshd_config > /dev/null
@@ -190,5 +215,58 @@ else
     echo "  SSH keepalive already configured"
 fi
+# 6. VSCode workspace setup
+echo ""
+echo "[6/6] Setting up VSCode workspace..."
+mkdir -p ~/workspace/.vscode
+# Detect cuda-gdb path
+CUDA_GDB_PATH=""
+if command -v cuda-gdb &>/dev/null; then
+    CUDA_GDB_PATH=$(command -v cuda-gdb)
+elif [ -x /usr/local/cuda/bin/cuda-gdb ]; then
+    CUDA_GDB_PATH="/usr/local/cuda/bin/cuda-gdb"
+else
+    # Try glob for versioned CUDA installs
+    for p in /usr/local/cuda-*/bin/cuda-gdb; do
+        if [ -x "$p" ]; then
+            CUDA_GDB_PATH="$p"
+        fi
+    done
+fi
+if [ -z "$CUDA_GDB_PATH" ]; then
+    echo "  WARNING: cuda-gdb not found — using placeholder in launch.json"
+    CUDA_GDB_PATH="cuda-gdb"
+else
+    echo "  cuda-gdb: $CUDA_GDB_PATH"
+fi
+# Detect GPU SM architecture
+GPU_ARCH=""
+if command -v nvidia-smi &>/dev/null; then
+    COMPUTE_CAP=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -1 | tr -d '[:space:]')
+    if [ -n "$COMPUTE_CAP" ]; then
+        GPU_ARCH="sm_$(echo "$COMPUTE_CAP" | tr -d '.')"
+    fi
+fi
+if [ -z "$GPU_ARCH" ]; then
+    echo "  WARNING: Could not detect GPU arch — defaulting to sm_75"
+    GPU_ARCH="sm_75"
+else
+    echo "  GPU arch: $GPU_ARCH"
+fi
+# Copy example CUDA source into workspace
+cp /tmp/saxpy.cu ~/workspace/saxpy.cu
+echo "  Deployed saxpy.cu"
+# Deploy launch.json with cuda-gdb path
+sed "s|__CUDA_GDB_PATH__|${CUDA_GDB_PATH}|g" /tmp/launch.json > ~/workspace/.vscode/launch.json
+echo "  Deployed launch.json"
+# Deploy tasks.json with GPU architecture
+sed "s|__GPU_ARCH__|${GPU_ARCH}|g" /tmp/tasks.json > ~/workspace/.vscode/tasks.json
+echo "  Deployed tasks.json"
 echo ""
 echo "=== Remote setup complete ==="
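
Step 6 of the script derives an `nvcc` architecture from the `nvidia-smi` compute capability and substitutes placeholders with `sed`. The same logic can be sketched in Python for readers who want to test it off-instance. The function names below are hypothetical; only the `sm_75` fallback and the `__GPU_ARCH__` and `__CUDA_GDB_PATH__` placeholders come from the diff.

```python
DEFAULT_ARCH = "sm_75"  # fallback used by remote_setup.sh when detection fails


def gpu_arch_from_compute_cap(compute_cap: str) -> str:
    """Turn nvidia-smi's compute_cap output (e.g. '7.5') into an nvcc arch."""
    lines = compute_cap.splitlines()
    # Keep only the first GPU and strip all whitespace,
    # like `head -1 | tr -d '[:space:]'` in the script.
    first = "".join(lines[0].split()) if lines else ""
    if not first:
        return DEFAULT_ARCH
    return "sm_" + first.replace(".", "")


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace every __NAME__ placeholder, like the script's global `sed`."""
    for name, value in values.items():
        text = text.replace(f"__{name}__", value)
    return text


arch = gpu_arch_from_compute_cap("7.5\n7.5\n")  # two identical GPUs reported
print(render_template('"-arch=__GPU_ARCH__"', {"GPU_ARCH": arch}))
```

Because the substitution is plain text replacement, a detected `cuda-gdb` path containing the `sed` delimiter (`|`) would break the shell version; the Python sketch does not share that limitation.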

aws_bootstrap_g4dn-0.4.0/aws_bootstrap/resources/saxpy.cu ADDED

@@ -0,0 +1,49 @@
+/**
+ * SAXPY Example, CUDA Style
+ * Source: https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/
+ *
+ * This is included as an example CUDA C++ source file to try out the VS Code launch configuration we include on the host machine.
+ *
+ */
+#include <stdio.h>
+__global__
+void saxpy(int n, float a, float *x, float *y)
+{
+  int i = blockIdx.x*blockDim.x + threadIdx.x;
+  if (i < n) y[i] = a*x[i] + y[i];
+}
+int main(void)
+{
+  int N = 1<<20;
+  float *x, *y, *d_x, *d_y;
+  x = (float*)malloc(N*sizeof(float));
+  y = (float*)malloc(N*sizeof(float));
+  cudaMalloc(&d_x, N*sizeof(float));
+  cudaMalloc(&d_y, N*sizeof(float));
+  for (int i = 0; i < N; i++) {
+    x[i] = 1.0f;
+    y[i] = 2.0f;
+  }
+  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
+  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);
+  // Perform SAXPY on 1M elements
+  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);
+  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);
+  float maxError = 0.0f;
+  for (int i = 0; i < N; i++)
+    maxError = max(maxError, abs(y[i]-4.0f));
+  printf("Max error: %f\n", maxError);
+  cudaFree(d_x);
+  cudaFree(d_y);
+  free(x);
+  free(y);
+}
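
As a cross-check of what the kernel should produce, here is a CPU reference in plain Python. It is a sketch rather than part of the package: `saxpy` and `blocks_needed` are illustrative names, and the inputs mirror the constants in `saxpy.cu` (x = 1.0, y = 2.0, a = 2.0, 256 threads per block).

```python
def saxpy(a: float, x: list[float], y: list[float]) -> list[float]:
    """CPU reference: return a*x[i] + y[i] for every element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def blocks_needed(n: int, threads_per_block: int = 256) -> int:
    """Ceiling division, matching the (N+255)/256 launch configuration."""
    return (n + threads_per_block - 1) // threads_per_block


n = 1 << 20
result = saxpy(2.0, [1.0] * n, [2.0] * n)
max_error = max(abs(v - 4.0) for v in result)
print(blocks_needed(n), max_error)
```

Every element should equal 4.0, so a non-zero "Max error" from the CUDA binary points at the GPU stack rather than at the arithmetic. The ceiling division is also why the kernel needs its `if (i < n)` guard whenever `n` is not a multiple of the block size.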

aws_bootstrap_g4dn-0.4.0/aws_bootstrap/resources/tasks.json ADDED

@@ -0,0 +1,48 @@
+{
+    // CUDA build tasks for VSCode
+    // Deployed to: ~/workspace/.vscode/tasks.json
+    "version": "2.0.0",
+    "tasks": [
+        {
+            "label": "nvcc: build active file (debug)",
+            "type": "shell",
+            "command": "nvcc",
+            "args": [
+                "-g",                           // Host debug symbols
+                "-G",                           // Device (GPU) debug symbols
+                "-O0",                          // No optimization
+                "-arch=__GPU_ARCH__",            // GPU arch (auto-detected)
+                "-o",
+                "${fileDirname}/${fileBasenameNoExtension}",
+                "${file}"
+            ],
+            "options": {
+                "cwd": "${fileDirname}"
+            },
+            "problemMatcher": ["$nvcc"],
+            "group": {
+                "kind": "build",
+                "isDefault": true
+            },
+            "detail": "Compile active .cu file with debug symbols (-g -G)"
+        },
+        {
+            "label": "nvcc: build active file (release)",
+            "type": "shell",
+            "command": "nvcc",
+            "args": [
+                "-O3",
+                "-arch=__GPU_ARCH__",
+                "-o",
+                "${fileDirname}/${fileBasenameNoExtension}",
+                "${file}"
+            ],
+            "options": {
+                "cwd": "${fileDirname}"
+            },
+            "problemMatcher": ["$nvcc"],
+            "group": "build",
+            "detail": "Compile active .cu file optimized (no debug)"
+        }
+    ]
+}
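
The two tasks differ only in their optimisation and debug flags. A small sketch of the command lines they expand to, assuming the active file is `saxpy.cu` and the detected architecture is `sm_75`; `nvcc_command` is a hypothetical helper, not something shipped in the package:

```python
def nvcc_command(source: str, output: str, arch: str, debug: bool) -> list[str]:
    """Assemble the nvcc argv used by the debug or release task."""
    # Debug: host symbols (-g), device symbols (-G), no optimisation (-O0).
    flags = ["-g", "-G", "-O0"] if debug else ["-O3"]
    return ["nvcc", *flags, f"-arch={arch}", "-o", output, source]


print(" ".join(nvcc_command("saxpy.cu", "saxpy", "sm_75", debug=True)))
print(" ".join(nvcc_command("saxpy.cu", "saxpy", "sm_75", debug=False)))
```

The debug variant is the one `launch.json` references as `preLaunchTask` for the two debugging configurations; the "no debug" configuration uses the release build.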

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap/ssh.py RENAMED

@@ -159,6 +159,42 @@ def run_remote_setup(
         click.secho(f"  SCP failed: {nb_result.stderr}", fg="red", err=True)
         return False
+    # SCP the CUDA example source
+    saxpy_path = script_path.parent / "saxpy.cu"
+    click.echo("  Uploading saxpy.cu...")
+    saxpy_result = subprocess.run(
+        ["scp", *ssh_opts, *scp_port_opts, str(saxpy_path), f"{user}@{host}:/tmp/saxpy.cu"],
+        capture_output=True,
+        text=True,
+    )
+    if saxpy_result.returncode != 0:
+        click.secho(f"  SCP failed: {saxpy_result.stderr}", fg="red", err=True)
+        return False
+    # SCP the VSCode launch.json
+    launch_json_path = script_path.parent / "launch.json"
+    click.echo("  Uploading launch.json...")
+    launch_result = subprocess.run(
+        ["scp", *ssh_opts, *scp_port_opts, str(launch_json_path), f"{user}@{host}:/tmp/launch.json"],
+        capture_output=True,
+        text=True,
+    )
+    if launch_result.returncode != 0:
+        click.secho(f"  SCP failed: {launch_result.stderr}", fg="red", err=True)
+        return False
+    # SCP the VSCode tasks.json
+    tasks_json_path = script_path.parent / "tasks.json"
+    click.echo("  Uploading tasks.json...")
+    tasks_result = subprocess.run(
+        ["scp", *ssh_opts, *scp_port_opts, str(tasks_json_path), f"{user}@{host}:/tmp/tasks.json"],
+        capture_output=True,
+        text=True,
+    )
+    if tasks_result.returncode != 0:
+        click.secho(f"  SCP failed: {tasks_result.stderr}", fg="red", err=True)
+        return False
     # SCP the script
     click.echo("  Uploading remote_setup.sh...")
     scp_result = subprocess.run(
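
The three upload blocks added here follow one pattern: build an `scp` argv, run it, and abort on a non-zero return code. A sketch of the command-building half, written as a hypothetical helper (`build_scp_commands` is not in the package). The `-P 22` port option and the `203.0.113.10` address are assumptions for illustration; the real contents of `ssh_opts` and `scp_port_opts` are not visible in this hunk.

```python
from pathlib import Path


def build_scp_commands(resource_dir, names, user, host, ssh_opts=(), scp_port_opts=()):
    """Return one scp argv list per resource, each targeting /tmp/<name> on the host."""
    return [
        [
            "scp",
            *ssh_opts,
            *scp_port_opts,
            str(Path(resource_dir) / name),
            f"{user}@{host}:/tmp/{name}",
        ]
        for name in names
    ]


commands = build_scp_commands(
    "aws_bootstrap/resources",
    ["saxpy.cu", "launch.json", "tasks.json"],
    user="ubuntu",
    host="203.0.113.10",         # documentation-only example address
    scp_port_opts=["-P", "22"],  # assumed shape of the port options
)
for cmd in commands:
    print(" ".join(cmd))
```

In the actual code each argv is passed to `subprocess.run(..., capture_output=True, text=True)` and the function returns `False` on the first failure; the sketch stops short of running anything so it can be tried without a remote host.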

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap/tests/test_cli.py RENAMED

@@ -565,7 +565,7 @@ def test_status_instructions_shown_by_default(mock_find, mock_spot, mock_session
     assert result.exit_code == 0
     assert "ssh aws-gpu1" in result.output
     assert "ssh -NL 8888:localhost:8888 aws-gpu1" in result.output
-    assert "vscode-remote://ssh-remote+aws-gpu1/home/ubuntu" in result.output
+    assert "vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace" in result.output
     assert "python ~/gpu_benchmark.py" in result.output

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0/aws_bootstrap_g4dn.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: aws-bootstrap-g4dn
-Version: 0.3.0
+Version: 0.4.0
 Summary: Bootstrap AWS EC2 GPU instances for hybrid local-remote development
 Author: Adam Ever-Hadani
 License-Expression: MIT
@@ -49,7 +49,7 @@ ssh aws-gpu1                  # You're in, venv activated, PyTorch works
 ### 🎯 Target Workflows
 1. **Jupyter server-client** — Jupyter runs on the instance, connect from your local browser
-2. **VSCode Remote SSH** — `ssh aws-gpu1` just works with the Remote SSH extension
+2. **VSCode Remote SSH** — opens `~/workspace` with pre-configured CUDA debug/build tasks and an example `.cu` file
 3. **NVIDIA Nsight remote debugging** — GPU debugging over SSH
 ---
@@ -162,6 +162,7 @@ The setup script runs automatically on the instance after SSH becomes available:
 | **GPU smoke test notebook** | Copies `gpu_smoke_test.ipynb` to `~/gpu_smoke_test.ipynb` (open in JupyterLab) |
 | **Jupyter** | Configures and starts JupyterLab as a systemd service on port 8888 |
 | **SSH keepalive** | Configures server-side keepalive to prevent idle disconnects |
+| **VSCode workspace** | Creates `~/workspace/.vscode/` with `launch.json` and `tasks.json` (auto-detected `cuda-gdb` path and GPU arch), plus an example `saxpy.cu` |
 ### 📊 GPU Benchmark
@@ -200,6 +201,28 @@ ssh -i ~/.ssh/id_ed25519 -NL 8888:localhost:8888 ubuntu@<public-ip>
 A **GPU smoke test notebook** (`~/gpu_smoke_test.ipynb`) is pre-installed on every instance. Open it in JupyterLab to interactively verify the CUDA stack, run FP32/FP16 matmuls, train a small CNN on MNIST, and visualise training loss and GPU memory usage.
+### 🖥️ VSCode Remote SSH
+The remote setup creates a `~/workspace` folder with pre-configured CUDA debug and build tasks:
+```
+~/workspace/
+├── .vscode/
+│   ├── launch.json   # CUDA debug configs (cuda-gdb path auto-detected)
+│   └── tasks.json    # nvcc build tasks (GPU arch auto-detected, e.g. sm_75)
+└── saxpy.cu          # Example CUDA source — open and press F5 to debug
+```
+Connect directly from your terminal:
+```bash
+code --folder-uri vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace
+```
+Then install the [Nsight VSCE extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) on the remote when prompted. Open `saxpy.cu`, set a breakpoint, and press F5.
+See [Nsight remote profiling guide](docs/nsight-remote-profiling.md) for more details on CUDA debugging and profiling workflows.
 ### 📋 Listing Resources
 ```bash
@@ -322,7 +345,7 @@ aws-bootstrap launch --instance-type t3.medium --ami-filter "ubuntu/images/hvm-s
 | GPU instance pricing | [instances.vantage.sh](https://instances.vantage.sh/aws/ec2/g4dn.xlarge) |
 | Spot instance quotas | [AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-limits.html) |
 | Deep Learning AMIs | [AWS docs](https://docs.aws.amazon.com/dlami/latest/devguide/what-is-dlami.html) |
-| Nvidia Nsight remote debugging | [Nvidia docs](https://docs.nvidia.com/nsight-visual-studio-edition/3.2/Content/Setup_Remote_Debugging.htm) |
+| Nsight remote GPU profiling | [Guide](docs/nsight-remote-profiling.md) — Nsight Compute, Nsight Systems, and Nsight VSCE on EC2 |
 Tutorials on setting up a CUDA environment on EC2 GPU instances:

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/aws_bootstrap_g4dn.egg-info/SOURCES.txt RENAMED

@@ -21,8 +21,11 @@ aws_bootstrap/ssh.py
 aws_bootstrap/resources/__init__.py
 aws_bootstrap/resources/gpu_benchmark.py
 aws_bootstrap/resources/gpu_smoke_test.ipynb
+aws_bootstrap/resources/launch.json
 aws_bootstrap/resources/remote_setup.sh
 aws_bootstrap/resources/requirements.txt
+aws_bootstrap/resources/saxpy.cu
+aws_bootstrap/resources/tasks.json
 aws_bootstrap/tests/__init__.py
 aws_bootstrap/tests/test_cli.py
 aws_bootstrap/tests/test_config.py
@@ -35,4 +38,5 @@ aws_bootstrap_g4dn.egg-info/SOURCES.txt
 aws_bootstrap_g4dn.egg-info/dependency_links.txt
 aws_bootstrap_g4dn.egg-info/entry_points.txt
 aws_bootstrap_g4dn.egg-info/requires.txt
-aws_bootstrap_g4dn.egg-info/top_level.txt
+aws_bootstrap_g4dn.egg-info/top_level.txt
+docs/nsight-remote-profiling.md

aws_bootstrap_g4dn-0.4.0/docs/nsight-remote-profiling.md ADDED

@@ -0,0 +1,245 @@
+# NVIDIA Nsight Remote GPU Profiling on EC2
+Guide to using NVIDIA's Nsight profiling and debugging tools with remote EC2 GPU instances provisioned by `aws-bootstrap`.
+## Overview
+NVIDIA provides several Nsight tools for GPU profiling and debugging. The most relevant ones for remote EC2 work are:
+| Tool | Purpose | macOS Host | Ports Required | Best Approach |
+|------|---------|-----------|----------------|---------------|
+| **Nsight Compute** | CUDA kernel profiling | Native GUI | SSH only (22) | GUI remote or CLI + local viewer |
+| **Nsight Systems** | System-wide tracing | Native GUI | SSH (22) + 45555 | CLI + local viewer |
+| **Nsight VSCE** | Interactive CUDA debugging | Via VSCode | SSH only (22) | VSCode Remote SSH |
+| **Nsight Graphics** | Graphics/shader profiling | No | SSH only (22) | CLI captures (graphics workloads only) |
+---
+## Nsight Compute (Kernel-Level Profiler)
+Nsight Compute is the most straightforward tool for remote profiling over SSH. It provides per-kernel performance metrics, roofline analysis, occupancy analysis, and memory throughput data.
+### How It Works
+The GUI (`ncu-ui`) runs on your local machine and connects to the EC2 instance over SSH. Nsight Compute automatically deploys its CLI tools to a deployment directory on the remote target on first connection. All profiling traffic is tunneled through SSH — no extra ports needed.
+Two profiling modes are available:
+- **Interactive:** A SOCKS proxy tunnels through SSH, letting you step through kernels and control execution in real time.
+- **Non-Interactive:** The profiler runs to completion on the remote and copies the report back automatically via SSH remote forwarding.
+### Setup
+**Local machine (macOS/Linux/Windows):**
+1. Download Nsight Compute from [NVIDIA Developer](https://developer.nvidia.com/tools-overview/nsight-compute/get-started) (free, requires NVIDIA developer account)
+2. Install `ncu-ui` (the GUI application). As of 2025, macOS ARM64 (Apple Silicon) is natively supported.
+**Remote EC2 instance:**
+Nothing extra is needed — the GUI auto-deploys the CLI on first connection. The Deep Learning AMI already includes the CUDA toolkit.
+### GPU Performance Counter Permissions
+By default, non-admin users cannot access GPU performance counters, which results in `ERR_NVGPUCTRPERM` errors. To fix this:
+```bash
+ssh aws-gpu1
+sudo bash -c 'echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" > /etc/modprobe.d/nvidia.conf'
+sudo update-initramfs -u -k all
+sudo reboot
+```
+> **Important:** An OS-level reboot keeps the instance's public IP; only a stop/start cycle assigns a new one when no Elastic IP is attached. If the address does change, run `aws-bootstrap status` to see the new IP, then either update `~/.ssh/config` manually or `aws-bootstrap terminate` and re-launch. The permission change itself is a one-time setup per instance.
+### Workflow A: GUI Remote Profiling
+1. Open `ncu-ui` locally.
+2. Click **Connect** and add a new SSH connection:
+   - **Host:** your EC2 public IP (from `aws-bootstrap status`)
+   - **Username:** `ubuntu`
+   - **Port:** 22 (or your custom `--ssh-port`)
+   - **Authentication:** Private key (`~/.ssh/id_ed25519`)
+3. Select the CUDA binary to profile on the remote machine.
+4. Choose an output file location on your local machine.
+5. Click **Launch** to start profiling.
+Nsight Compute supports `ProxyJump` and `ProxyCommand` SSH options if you need to reach the instance through a bastion host.
+### Workflow B: CLI on Remote, View Locally (Recommended)
+This is the most reliable approach because it avoids real-time connection issues:
+```bash
+# Profile on the remote instance
+ssh aws-gpu1 'ncu -o /tmp/profile --set full ./my_cuda_app'
+# Download the report
+scp aws-gpu1:/tmp/profile.ncu-rep .
+# Open locally in the GUI
+ncu-ui profile.ncu-rep
+```
+For source-level correlation, compile with `nvcc --lineinfo`.
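If the three steps above are repeated often, they could be scripted. The sketch below only builds the command lines shown in this section; the function name `ncu_workflow` and the default report name are hypothetical, and the host alias `aws-gpu1` is taken from the examples above.

```python
import shlex
from typing import List


def ncu_workflow(host: str, binary: str, report: str = "profile") -> List[List[str]]:
    """Build the ssh / scp / ncu-ui commands for the CLI-then-view workflow."""
    remote_base = f"/tmp/{report}"
    # The remote command is passed to ssh as a single argument.
    remote_cmd = shlex.join(["ncu", "-o", remote_base, "--set", "full", binary])
    return [
        ["ssh", host, remote_cmd],
        ["scp", f"{host}:{remote_base}.ncu-rep", "."],
        ["ncu-ui", f"{report}.ncu-rep"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in ncu_workflow("aws-gpu1", "./my_cuda_app"):
        print(shlex.join(argv))
```

Each inner list can be handed to `subprocess.run(argv, check=True)` once the printed commands look right.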
+### References
+- [Nsight Compute Documentation](https://docs.nvidia.com/nsight-compute/NsightCompute/index.html)
+- [How to Set Up Nsight Compute on EC2](https://tspeterkim.github.io/posts/nsight-setup-on-ec2) — step-by-step walkthrough with screenshots
+---
+## Nsight Systems (System-Wide Profiler)
+Nsight Systems traces CPU activity, GPU workloads (CUDA, Vulkan), OS runtime, threading, memory transfers, and NVTX annotations on a unified timeline. Useful for understanding end-to-end application performance.
+### Security Caveat
+Unlike Nsight Compute, Nsight Systems uses SSH only for the initial connection. **Actual profiling data transfers over a raw, unencrypted TCP socket on port 45555.** NVIDIA explicitly warns against using this on untrusted networks.
+For EC2, you can mitigate this by tunneling port 45555 through SSH:
+```bash
+ssh -L 45555:localhost:45555 aws-gpu1
+```
+Then configure the Nsight Systems GUI to connect to `localhost` instead of the remote IP.
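Before pointing the GUI at `localhost`, a quick check that something is listening on the forwarded port can save a confusing connection error. This is a minimal sketch using only the Python standard library; `port_open` is a hypothetical helper. It only confirms that the local end of the tunnel accepts connections, not that the remote profiling daemon is running.

```python
import socket


def port_open(host: str = "localhost", port: int = 45555, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    print("tunnel up" if port_open() else "nothing listening on localhost:45555")
```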
+### Setup
+**Local machine:**
+Download Nsight Systems from [NVIDIA Developer](https://developer.nvidia.com/nsight-systems/get-started). The GUI (`nsys-ui`) is available for macOS, Linux, and Windows.
+**Remote EC2 instance:**
+The `nsys` CLI is typically included with the CUDA toolkit on Deep Learning AMIs. Verify with:
+```bash
+ssh aws-gpu1 'nsys status -e'
+```
+Additionally, Netcat must be installed (required by the remote profiling daemon):
+```bash
+ssh aws-gpu1 'sudo apt-get install -y netcat-openbsd'
+```
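Both prerequisites can be checked in one pass on the instance. The sketch below assumes the Netcat binary is installed as `nc`, which is the usual name but not something this guide states; `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str] = ("nsys", "nc"),  # "nc" is an assumed binary name for Netcat
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tool names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Run this on the EC2 instance.
    print("missing:", missing_tools() or "none")
```

The `which` parameter exists only so the lookup can be swapped out when testing.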
+### Port Requirements
+If using GUI remote profiling (not the CLI workflow), you need **port 45555** open in the EC2 security group in addition to SSH. The current `aws-bootstrap` security group only opens SSH — you would need to manually add the rule via the AWS console or CLI, or use the SSH tunnel approach described above.
+### Workflow: CLI on Remote, View Locally (Recommended)
+This avoids the port 45555 requirement entirely:
+```bash
+# Profile on the remote instance
+ssh aws-gpu1 'nsys profile --trace=cuda,nvtx --output=/tmp/report ./my_app'
+# Download the report
+scp aws-gpu1:/tmp/report.nsys-rep .
+# Open locally in the GUI
+nsys-ui report.nsys-rep
+```
+### References
+- [Nsight Systems User Guide](https://docs.nvidia.com/nsight-systems/UserGuide/index.html)
+- [Nsight Systems Installation Guide](https://docs.nvidia.com/nsight-systems/InstallationGuide/index.html)
+---
+## Nsight Visual Studio Code Edition (CUDA Debugger)
+Nsight VSCE is a VS Code extension for building and debugging CUDA applications. This is the most natural fit for the `aws-bootstrap` workflow since it works directly with VSCode Remote SSH.
+### How It Works
+The extension provides CUDA debugging via `cuda-gdb` (or `cuda-gdbserver` for explicit remote setups). When used with VSCode Remote SSH, everything runs on the remote instance — the extension, the compiler, the debugger.
+Features include:
+- Breakpoints in GPU device code (including conditional breakpoints)
+- GPU register, variable, and call-stack inspection
+- Warp and lane focus controls (switch between streaming multiprocessors, warps, lanes)
+- Full CPU thread inspection while stopped in GPU code, and vice versa
+- CUDA-aware syntax highlighting and IntelliSense
+### Setup
+**Local machine:**
+1. Install [VSCode](https://code.visualstudio.com/)
+2. Install the [Remote - SSH](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh) extension
+**Remote EC2 instance (via VSCode Remote SSH):**
+1. Connect to the instance: `code --folder-uri vscode-remote://ssh-remote+aws-gpu1/home/ubuntu/workspace`
+2. Install the [Nsight VSCE extension](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition) on the remote (VS Code will prompt)
+3. `cuda-gdb` is included with the CUDA toolkit on Deep Learning AMIs
+### Debugging Workflow
+1. Connect to `aws-gpu1` via VSCode Remote SSH (opens `~/workspace`).
+2. `launch.json` and `tasks.json` are pre-configured in `~/workspace/.vscode/` with the detected `cuda-gdb` path and GPU architecture.
+3. Open or create `.cu` files in `~/workspace`.
+4. Set breakpoints in your `.cu` files.
+5. Press F5 to start debugging.
+### Known Issues
+- `cuda-gdb` may require root privileges for GPU access. The same `NVreg_RestrictProfilingToAdminUsers=0` modprobe fix (described in the Nsight Compute section) resolves this. Alternatively, create a sudoers entry for `cuda-gdb`.
+- Some users report the debugger failing to start on certain Remote SSH configurations. Check the Debug Console output for error details.
+### References
+- [Nsight VSCE Documentation](https://docs.nvidia.com/nsight-visual-studio-code-edition/latest/)
+- [Nsight VSCE on VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=NVIDIA.nsight-vscode-edition)
+- [Nsight VSCE on GitHub](https://github.com/NVIDIA/nsight-vscode-edition)
+---
+## Quick Reference
+### Common Setup: GPU Performance Counter Access
+Required for Nsight Compute profiling and `cuda-gdb` debugging. This is a one-time setup per instance but **requires a reboot**:
+```bash
+ssh aws-gpu1
+sudo bash -c 'echo "options nvidia NVreg_RestrictProfilingToAdminUsers=0" > /etc/modprobe.d/nvidia.conf'
+sudo update-initramfs -u -k all
+sudo reboot
+```
+A reboot normally keeps the same public IP; a stop/start cycle without an Elastic IP assigns a new one. Run `aws-bootstrap status` if you need to confirm the current address.
+### Recommended Approach: CLI Profiling + Local Viewer
+The most practical and secure workflow for `aws-bootstrap` instances:
+```bash
+# Kernel profiling with Nsight Compute
+ssh aws-gpu1 'ncu -o /tmp/profile --set full ./my_cuda_app'
+scp aws-gpu1:/tmp/profile.ncu-rep .
+ncu-ui profile.ncu-rep
+# System profiling with Nsight Systems
+ssh aws-gpu1 'nsys profile --trace=cuda,nvtx --output=/tmp/report ./my_app'
+scp aws-gpu1:/tmp/report.nsys-rep .
+nsys-ui report.nsys-rep
+```
+This requires no additional ports, no security group changes, and works with the existing SSH configuration that `aws-bootstrap` sets up.
+### Port Summary
+| Tool | Method | Ports |
+|------|--------|-------|
+| Nsight Compute (GUI remote) | SSH tunnel | 22 only |
+| Nsight Compute (CLI + scp) | SSH | 22 only |
+| Nsight Systems (GUI remote) | SSH + raw socket | 22 + 45555 |
+| Nsight Systems (CLI + scp) | SSH | 22 only |
+| Nsight VSCE (VSCode) | Remote SSH | 22 only |
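The table can also be expressed as data, for example to decide whether a planned set of workflows needs a security group change. This is an illustrative sketch only: the workflow labels and the `extra_ports` helper are hypothetical, and it assumes SSH stays on the default port 22 rather than a custom `--ssh-port`.

```python
from typing import Dict, Iterable, Set, Tuple

SSH_PORT = 22  # assumption: default SSH port

# (tool, method) -> ports, transcribed from the table above.
PORTS: Dict[Tuple[str, str], Set[int]] = {
    ("nsight-compute", "gui"): {22},
    ("nsight-compute", "cli"): {22},
    ("nsight-systems", "gui"): {22, 45555},
    ("nsight-systems", "cli"): {22},
    ("nsight-vsce", "remote-ssh"): {22},
}


def extra_ports(workflows: Iterable[Tuple[str, str]]) -> Set[int]:
    """Ports beyond SSH that the chosen workflows would need opened."""
    needed: Set[int] = set()
    for workflow in workflows:
        needed |= PORTS[workflow]
    return needed - {SSH_PORT}


if __name__ == "__main__":
    print(extra_ports([("nsight-compute", "cli"), ("nsight-systems", "gui")]))
```

An empty result means the existing SSH-only security group is sufficient.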

{aws_bootstrap_g4dn-0.3.0 → aws_bootstrap_g4dn-0.4.0}/pyproject.toml RENAMED Viewed

@@ -33,7 +33,7 @@ aws-bootstrap = "aws_bootstrap.cli:main"
 include = ["aws_bootstrap*"]
 [tool.setuptools.package-data]
-"aws_bootstrap.resources" = ["*.sh", "*.txt", "*.ipynb"]
+"aws_bootstrap.resources" = ["*.sh", "*.txt", "*.ipynb", "*.json", "*.cu"]
 [tool.setuptools_scm]