PyPI - wafer-cli - Versions diffs - 0.2.52__tar.gz → 0.2.54__tar.gz - Mend



Files changed (146)

{wafer_cli-0.2.52 → wafer_cli-0.2.54}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: wafer-cli
-Version: 0.2.52
+Version: 0.2.54
 Summary: CLI for running GPU workloads, managing remote workspaces, and evaluating/optimizing kernels
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
@@ -67,11 +67,11 @@ Create and manage persistent GPU environments.
 - `B200` - NVIDIA Blackwell B200 (180GB HBM3e, CUDA) - default
 ```bash
-wafer workspaces list
-wafer workspaces create my-workspace --gpu B200 --wait   # NVIDIA B200
-wafer workspaces create amd-dev --gpu MI300X             # AMD MI300X
-wafer workspaces ssh <workspace-id>
-wafer workspaces delete <workspace-id>
+wafer target workspace list
+wafer target workspace create my-workspace --gpu B200 --wait   # NVIDIA B200
+wafer target workspace create amd-dev --gpu MI300X             # AMD MI300X
+wafer target workspace ssh <workspace-id>
+wafer target workspace delete <workspace-id>
 ```
 ### `wafer agent`
@@ -83,17 +83,17 @@ wafer agent "What is TMEM in CuTeDSL?"
 wafer agent -s "optimize this kernel" < kernel.py
 ```
-### `wafer evaluate`
+### `wafer tool eval`
 Evaluate kernel correctness and performance against a reference implementation.
 **Functional format** (default):
 ```bash
 # Generate template files
-wafer evaluate make-template ./my-kernel
+wafer tool eval make-template ./my-kernel
 # Run evaluation
-wafer evaluate --impl kernel.py --reference ref.py --test-cases tests.json --benchmark
+wafer tool eval gpumode --impl kernel.py --reference ref.py --test-cases tests.json --benchmark
 ```
 The implementation must define `custom_kernel(inputs)`, the reference must define `ref_kernel(inputs)` and `generate_input(**params)`.
@@ -101,64 +101,40 @@ The implementation must define `custom_kernel(inputs)`, the reference must defin
 **KernelBench format** (ModelNew class):
 ```bash
 # Extract a KernelBench problem as template
-wafer evaluate kernelbench make-template level1/1
+wafer tool eval kernelbench make-template level1/1
 # Run evaluation
-wafer evaluate kernelbench --impl my_kernel.py --reference problem.py --benchmark
+wafer tool eval kernelbench --impl my_kernel.py --reference problem.py --benchmark
 ```
 The implementation must define `class ModelNew(nn.Module)`, the reference must define `class Model`, `get_inputs()`, and `get_init_inputs()`.
-### `wafer wevin -t ask-docs`
+### `wafer agent -t ask-docs`
-Query GPU documentation using the docs template.
+Query GPU documentation using the docs template. Uses the `ask_docs` tool to search wafer's documentation corpus via the API.
 ```bash
-wafer wevin -t ask-docs --json -s "What causes bank conflicts in shared memory?"
-```
-### `wafer corpus`
-Download documentation to local filesystem for agents to search.
-```bash
-wafer corpus list
-wafer corpus download cuda-programming-guide
+wafer agent -t ask-docs -s "What causes bank conflicts in shared memory?"
 ```
 ---
 ## Customization
-### `wafer remote-run` options
+### `wafer tool eval` options
 ```bash
-wafer remote-run --image pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel -- python3 script.py
-wafer remote-run --require-hwc -- ncu --set full python3 bench.py   # Hardware counters for NCU
-```
-### `wafer evaluate` options
-```bash
-wafer evaluate --impl k.py --reference r.py --test-cases t.json \
+wafer tool eval gpumode --impl k.py --reference r.py --test-cases t.json \
     --target vultr-b200 \    # Specific GPU target
     --benchmark \            # Measure performance
     --profile                # Enable torch.profiler + NCU
 ```
-### `wafer push` for multi-command workflows
-```bash
-WORKSPACE=$(wafer push ./project)
-wafer remote-run --workspace-id $WORKSPACE -- python3 test1.py
-wafer remote-run --workspace-id $WORKSPACE -- python3 test2.py
-```
 ### Profile analysis
 ```bash
-wafer nvidia ncu analyze profile.ncu-rep
-wafer nvidia nsys analyze profile.nsys-rep
+wafer tool ncu analyze profile.ncu-rep
+wafer tool nsys analyze profile.nsys-rep
 ```
 ---
@@ -170,9 +146,9 @@ wafer nvidia nsys analyze profile.nsys-rep
 Bypass the API and SSH directly to your own GPUs:
 ```bash
-wafer targets list
-wafer targets add ./my-gpu.toml
-wafer targets default my-gpu
+wafer target config list
+wafer target config add ./my-gpu.toml
+wafer target config default my-gpu
 ```
 ### Defensive evaluation
@@ -180,23 +156,23 @@ wafer targets default my-gpu
 Detect evaluation hacking (stream injection, lazy evaluation, etc.):
 ```bash
-wafer evaluate --impl k.py --reference r.py --test-cases t.json --benchmark --defensive
+wafer tool eval gpumode --impl k.py --reference r.py --test-cases t.json --benchmark --defensive
 ```
 ### Other tools
 ```bash
-wafer perfetto <trace.json> --query "SELECT * FROM slice"   # Perfetto SQL queries
-wafer capture ./script.py                                    # Capture execution snapshot
-wafer compiler-analyze kernel.ptx                            # Analyze PTX/SASS
+wafer tool perfetto <trace.json> --query "SELECT * FROM slice"   # Perfetto SQL queries
+wafer tool capture ./script.py                                    # Capture execution snapshot
+wafer compiler-analyze kernel.ptx                                 # Analyze PTX/SASS
 ```
 ### ROCm profiling (AMD GPUs)
 ```bash
-wafer rocprof-sdk ...
-wafer rocprof-systems ...
-wafer rocprof-compute ...
+wafer tool rocprof-sdk ...
+wafer tool rocprof-systems ...
+wafer tool rocprof-compute ...
 ```
 ---
@@ -214,10 +190,10 @@ source ~/.zshrc  # or ~/.bashrc
 ```
 Now you can tab-complete:
-- Commands: `wafer eva<TAB>` → `wafer evaluate`
-- Options: `wafer evaluate --<TAB>`
-- Target names: `wafer evaluate --target v<TAB>` → `wafer evaluate --target vultr-b200`
-- File paths: `wafer evaluate --impl ./<TAB>`
+- Commands: `wafer tool ev<TAB>` → `wafer tool eval`
+- Options: `wafer tool eval --<TAB>`
+- Target names: `wafer tool eval --target v<TAB>` → `wafer tool eval --target vultr-b200`
+- File paths: `wafer tool eval gpumode --impl ./<TAB>`
 ---
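
Read together, the hunks above amount to a prefix-level command rename. The sketch below captures that mapping in Python. The `RENAMES` table and the `migrate` helper are hypothetical names introduced here for illustration and are not part of wafer-cli. Only prefixes that appear in the diff are covered. The README sections for `wafer corpus`, `wafer remote-run`, and `wafer push` were removed without a replacement being shown, so those commands are passed through unchanged. Mapping a bare `wafer evaluate` to `wafer tool eval gpumode` follows the run examples; the tab-completion lines show `wafer tool eval --<TAB>` without `gpumode`, so treat that entry as an assumption.

```python
import shlex

# Old command prefix -> new command prefix, copied from the README hunks.
# Assumption: bare "evaluate" maps to "tool eval gpumode", as in the run examples.
RENAMES = {
    ("workspaces",): ("target", "workspace"),
    ("targets",): ("target", "config"),
    ("evaluate", "kernelbench"): ("tool", "eval", "kernelbench"),
    ("evaluate", "make-template"): ("tool", "eval", "make-template"),
    ("evaluate",): ("tool", "eval", "gpumode"),
    ("wevin",): ("agent",),
    ("nvidia", "ncu"): ("tool", "ncu"),
    ("nvidia", "nsys"): ("tool", "nsys"),
    ("perfetto",): ("tool", "perfetto"),
    ("capture",): ("tool", "capture"),
    ("rocprof-sdk",): ("tool", "rocprof-sdk"),
    ("rocprof-systems",): ("tool", "rocprof-systems"),
    ("rocprof-compute",): ("tool", "rocprof-compute"),
}


def migrate(command: str) -> str:
    """Rewrite a 0.2.52-style `wafer ...` command line to the 0.2.54 spelling.

    Commands without a known rename are returned unchanged. Flags are never
    added or dropped; only the leading subcommand tokens are replaced.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "wafer":
        return command
    rest = tuple(tokens[1:])
    # Try longer prefixes first so "evaluate kernelbench" wins over "evaluate".
    for old in sorted(RENAMES, key=len, reverse=True):
        if rest[: len(old)] == old:
            return shlex.join(("wafer", *RENAMES[old], *rest[len(old):]))
    return command
```

For instance, `migrate("wafer targets add ./my-gpu.toml")` yields `wafer target config add ./my-gpu.toml`, while `wafer compiler-analyze kernel.ptx` is returned as-is because the diff keeps that command at the top level.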

{wafer_cli-0.2.52 → wafer_cli-0.2.54}/README.md RENAMED

@@ -49,11 +49,11 @@ Create and manage persistent GPU environments.
 - `B200` - NVIDIA Blackwell B200 (180GB HBM3e, CUDA) - default
 ```bash
-wafer workspaces list
-wafer workspaces create my-workspace --gpu B200 --wait   # NVIDIA B200
-wafer workspaces create amd-dev --gpu MI300X             # AMD MI300X
-wafer workspaces ssh <workspace-id>
-wafer workspaces delete <workspace-id>
+wafer target workspace list
+wafer target workspace create my-workspace --gpu B200 --wait   # NVIDIA B200
+wafer target workspace create amd-dev --gpu MI300X             # AMD MI300X
+wafer target workspace ssh <workspace-id>
+wafer target workspace delete <workspace-id>
 ```
 ### `wafer agent`
@@ -65,17 +65,17 @@ wafer agent "What is TMEM in CuTeDSL?"
 wafer agent -s "optimize this kernel" < kernel.py
 ```
-### `wafer evaluate`
+### `wafer tool eval`
 Evaluate kernel correctness and performance against a reference implementation.
 **Functional format** (default):
 ```bash
 # Generate template files
-wafer evaluate make-template ./my-kernel
+wafer tool eval make-template ./my-kernel
 # Run evaluation
-wafer evaluate --impl kernel.py --reference ref.py --test-cases tests.json --benchmark
+wafer tool eval gpumode --impl kernel.py --reference ref.py --test-cases tests.json --benchmark
 ```
 The implementation must define `custom_kernel(inputs)`, the reference must define `ref_kernel(inputs)` and `generate_input(**params)`.
@@ -83,64 +83,40 @@ The implementation must define `custom_kernel(inputs)`, the reference must defin
 **KernelBench format** (ModelNew class):
 ```bash
 # Extract a KernelBench problem as template
-wafer evaluate kernelbench make-template level1/1
+wafer tool eval kernelbench make-template level1/1
 # Run evaluation
-wafer evaluate kernelbench --impl my_kernel.py --reference problem.py --benchmark
+wafer tool eval kernelbench --impl my_kernel.py --reference problem.py --benchmark
 ```
 The implementation must define `class ModelNew(nn.Module)`, the reference must define `class Model`, `get_inputs()`, and `get_init_inputs()`.
-### `wafer wevin -t ask-docs`
+### `wafer agent -t ask-docs`
-Query GPU documentation using the docs template.
+Query GPU documentation using the docs template. Uses the `ask_docs` tool to search wafer's documentation corpus via the API.
 ```bash
-wafer wevin -t ask-docs --json -s "What causes bank conflicts in shared memory?"
-```
-### `wafer corpus`
-Download documentation to local filesystem for agents to search.
-```bash
-wafer corpus list
-wafer corpus download cuda-programming-guide
+wafer agent -t ask-docs -s "What causes bank conflicts in shared memory?"
 ```
 ---
 ## Customization
-### `wafer remote-run` options
+### `wafer tool eval` options
 ```bash
-wafer remote-run --image pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel -- python3 script.py
-wafer remote-run --require-hwc -- ncu --set full python3 bench.py   # Hardware counters for NCU
-```
-### `wafer evaluate` options
-```bash
-wafer evaluate --impl k.py --reference r.py --test-cases t.json \
+wafer tool eval gpumode --impl k.py --reference r.py --test-cases t.json \
     --target vultr-b200 \    # Specific GPU target
     --benchmark \            # Measure performance
     --profile                # Enable torch.profiler + NCU
 ```
-### `wafer push` for multi-command workflows
-```bash
-WORKSPACE=$(wafer push ./project)
-wafer remote-run --workspace-id $WORKSPACE -- python3 test1.py
-wafer remote-run --workspace-id $WORKSPACE -- python3 test2.py
-```
 ### Profile analysis
 ```bash
-wafer nvidia ncu analyze profile.ncu-rep
-wafer nvidia nsys analyze profile.nsys-rep
+wafer tool ncu analyze profile.ncu-rep
+wafer tool nsys analyze profile.nsys-rep
 ```
 ---
@@ -152,9 +128,9 @@ wafer nvidia nsys analyze profile.nsys-rep
 Bypass the API and SSH directly to your own GPUs:
 ```bash
-wafer targets list
-wafer targets add ./my-gpu.toml
-wafer targets default my-gpu
+wafer target config list
+wafer target config add ./my-gpu.toml
+wafer target config default my-gpu
 ```
 ### Defensive evaluation
@@ -162,23 +138,23 @@ wafer targets default my-gpu
 Detect evaluation hacking (stream injection, lazy evaluation, etc.):
 ```bash
-wafer evaluate --impl k.py --reference r.py --test-cases t.json --benchmark --defensive
+wafer tool eval gpumode --impl k.py --reference r.py --test-cases t.json --benchmark --defensive
 ```
 ### Other tools
 ```bash
-wafer perfetto <trace.json> --query "SELECT * FROM slice"   # Perfetto SQL queries
-wafer capture ./script.py                                    # Capture execution snapshot
-wafer compiler-analyze kernel.ptx                            # Analyze PTX/SASS
+wafer tool perfetto <trace.json> --query "SELECT * FROM slice"   # Perfetto SQL queries
+wafer tool capture ./script.py                                    # Capture execution snapshot
+wafer compiler-analyze kernel.ptx                                 # Analyze PTX/SASS
 ```
 ### ROCm profiling (AMD GPUs)
 ```bash
-wafer rocprof-sdk ...
-wafer rocprof-systems ...
-wafer rocprof-compute ...
+wafer tool rocprof-sdk ...
+wafer tool rocprof-systems ...
+wafer tool rocprof-compute ...
 ```
 ---
@@ -196,10 +172,10 @@ source ~/.zshrc  # or ~/.bashrc
 ```
 Now you can tab-complete:
-- Commands: `wafer eva<TAB>` → `wafer evaluate`
-- Options: `wafer evaluate --<TAB>`
-- Target names: `wafer evaluate --target v<TAB>` → `wafer evaluate --target vultr-b200`
-- File paths: `wafer evaluate --impl ./<TAB>`
+- Commands: `wafer tool ev<TAB>` → `wafer tool eval`
+- Options: `wafer tool eval --<TAB>`
+- Target names: `wafer tool eval --target v<TAB>` → `wafer tool eval --target vultr-b200`
+- File paths: `wafer tool eval gpumode --impl ./<TAB>`
 ---
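
The README context lines in this diff state that, for the functional format, the implementation must define `custom_kernel(inputs)` and the reference must define `ref_kernel(inputs)` and `generate_input(**params)`. The sketch below shows how those three functions could fit together. It uses plain Python lists to stay self-contained; real kernels would presumably operate on GPU tensors. The `size` parameter and the `matches_reference` helper are hypothetical. The diff does not show the `--test-cases` JSON schema or how wafer invokes these functions.

```python
# --- ref.py (reference side) ---

def generate_input(**params):
    """Build the inputs for one test case.

    Assumption: each test case supplies a single integer `size`.
    """
    size = params["size"]
    return [float(i) for i in range(size)]


def ref_kernel(inputs):
    """Reference implementation: double every element."""
    return [2.0 * x for x in inputs]


# --- kernel.py (implementation side) ---

def custom_kernel(inputs):
    """Candidate implementation: same result, computed as x + x."""
    return [x + x for x in inputs]


# --- local sanity check (hypothetical helper, not part of wafer-cli) ---

def matches_reference(**params) -> bool:
    """Compare the candidate against the reference on generated inputs."""
    inputs = generate_input(**params)
    return custom_kernel(inputs) == ref_kernel(inputs)
```

A quick local check such as `matches_reference(size=8)` can catch signature mistakes before the files are passed to `wafer tool eval gpumode --impl kernel.py --reference ref.py --test-cases tests.json`.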

{wafer_cli-0.2.52 → wafer_cli-0.2.54}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "wafer-cli"
-version = "0.2.52"
+version = "0.2.54"
 description = "CLI for running GPU workloads, managing remote workspaces, and evaluating/optimizing kernels"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -37,7 +37,7 @@ where = ["."]
 include = ["wafer*"]
 [tool.setuptools.package-data]
-wafer = ["GUIDE.md", "skills/*/SKILL.md", "corpora/**/*.md"]
+wafer = ["GUIDE.md", "skills/*/*.md"]
 [tool.ruff]
 line-length = 100
@@ -78,6 +78,7 @@ ignore = [
 [tool.ruff.lint.per-file-ignores]
 "tests/**/*.py" = ["ANN001", "ANN201", "ANN202", "ANN204"]  # Don't require type annotations in tests
+"tests/test_ncu_run_local_e2e.py" = ["PLR0915"]  # E2E test has a long sequential flow by design
 "wafer/evaluate.py" = ["PLR0915", "PLR1702", "E402", "PLW2901", "ASYNC221"]  # complex deployment flows - TODO: refactor
 "wafer/output.py" = ["ANN401"]  # Output collector uses **kwargs for flexible event data
 "wafer/autotuner.py" = ["PLR0915", "PLR1702", "B007", "B904"]  # complex sweep logic - TODO: refactor
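
The `package-data` hunk above widens the `skills` pattern from `skills/*/SKILL.md` to `skills/*/*.md` and drops `corpora/**/*.md`. The sketch below illustrates the effect on a made-up file tree using Python's standard `glob` module. The file names are hypothetical, and setuptools applies its own glob handling, which may differ from `glob.glob` in edge cases, so this is an approximation and not a reproduction of the build.

```python
import glob
import os
import tempfile

# Patterns copied from the pyproject.toml hunk (before and after).
OLD_PATTERNS = ["GUIDE.md", "skills/*/SKILL.md", "corpora/**/*.md"]
NEW_PATTERNS = ["GUIDE.md", "skills/*/*.md"]

# Hypothetical package layout, used only to exercise the patterns.
FILES = [
    "GUIDE.md",
    "skills/example/SKILL.md",
    "skills/example/notes.md",
    "corpora/cuda/guide/intro.md",
]


def matched(patterns, root):
    """Return the sorted relative paths under `root` matched by any pattern."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(root, pattern), recursive=True):
            found.add(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder files for the hypothetical layout.
    for rel in FILES:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    old = matched(OLD_PATTERNS, root)
    new = matched(NEW_PATTERNS, root)

print("0.2.52 patterns:", old)
print("0.2.54 patterns:", new)
```

Under these assumptions the new patterns pick up additional Markdown files next to each `SKILL.md` and no longer match anything under `corpora/`, which is consistent with the removal of the `wafer corpus` section from the README.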
