PyPI - modelstudio - Versions diffs - 0.5.0__tar.gz → 0.6.0__tar.gz - Mend

modelstudio 0.5.0__tar.gz → 0.6.0__tar.gz


Files changed (235)

{modelstudio-0.5.0 → modelstudio-0.6.0}/CMakeLists.txt RENAMED

@@ -5,6 +5,16 @@ option(MODELSTUDIO_ENABLE_CUDA "Build CUDA backend" OFF)
 option(MODELSTUDIO_ENABLE_ROCM "Build ROCm backend" OFF)
 option(MODELSTUDIO_ENABLE_ONEAPI "Build oneAPI backend" OFF)
+if(MODELSTUDIO_ENABLE_CUDA)
+  include(CheckLanguage)
+  check_language(CUDA)
+  if(NOT CMAKE_CUDA_COMPILER)
+    message(FATAL_ERROR "MODELSTUDIO_ENABLE_CUDA=ON requires an NVIDIA CUDA compiler/toolkit, but none was found.")
+  endif()
+  enable_language(CUDA)
+  find_package(CUDAToolkit REQUIRED)
+endif()
 set(CMAKE_CXX_STANDARD 20)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
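
The hunk above makes configuration fail early when `MODELSTUDIO_ENABLE_CUDA=ON` is set but no CUDA compiler is found. A build wrapper could apply a similar check before invoking CMake at all. The following is a minimal sketch under stated assumptions: `cmake_cuda_args` is a hypothetical helper that is not part of the package, and it only looks for `nvcc` on `PATH`, which is a rougher test than CMake's own `check_language(CUDA)`.

```python
from __future__ import annotations

import shutil
from typing import Callable, Optional


def cmake_cuda_args(
    want_cuda: bool,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return CMake cache arguments for the CUDA option (hypothetical helper).

    `which` is injectable so the lookup can be replaced in tests.
    """
    if not want_cuda:
        return ["-DMODELSTUDIO_ENABLE_CUDA=OFF"]
    nvcc = which("nvcc")
    if nvcc is None:
        # Mirror the FATAL_ERROR in the hunk above: fail instead of
        # silently configuring a CPU-only build.
        raise RuntimeError("CUDA build requested, but no nvcc was found on PATH.")
    return ["-DMODELSTUDIO_ENABLE_CUDA=ON", f"-DCMAKE_CUDA_COMPILER={nvcc}"]
```

Passing `CMAKE_CUDA_COMPILER` explicitly is optional; it simply pins CMake to the compiler the wrapper already located.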

{modelstudio-0.5.0/python/modelstudio.egg-info → modelstudio-0.6.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: modelstudio
-Version: 0.5.0
+Version: 0.6.0
 Summary: An early-stage AI tensor framework with CPU tensors, autograd, and backend extension scaffolding.
 Author: ModelStudio Contributors
 License-Expression: MIT
@@ -31,14 +31,14 @@ Dynamic: license-file
 # ModelStudio
-ModelStudio is an early-stage AI tensor framework. Version `0.5.0` provides a
+ModelStudio is an early-stage AI tensor framework. Version `0.6.0` provides a
 CPU tensor/autograd MVP with neural-network modules, optimizers, serialization,
-data loading, graph tracing metadata, backend status inspection, and small
-LLM-oriented building blocks.
+data loading, graph tracing metadata, backend status inspection, a public CUDA
+availability namespace, and small LLM-oriented building blocks.
-It is not a PyTorch or TensorFlow replacement. CPU is the only working backend.
-CUDA, ROCm, and oneAPI remain explicit scaffolds until real kernels are built
-and tested.
+It is not a PyTorch or TensorFlow replacement. The default PyPI package is
+CPU-only. CUDA, ROCm, and oneAPI remain explicit scaffolds until real kernels
+are built and tested in hardware-backed environments.
 ## Installation
@@ -74,6 +74,24 @@ python -m pip install -e ".[dev]"
 | Interop | `asarray`, `from_numpy`, `to_numpy`, and `ms.numpy` |
 | Metrics | accuracy and top-k accuracy |
 | Compiler | Metadata-only tracing plus placeholder IR and passes |
+| CUDA API | Availability, device-count, sync, and memory-status facade; tensor execution is not implemented in the CPU wheel |
+## Architecture
+```text
+Python frontend
+  -> Tensor, nn, optim, autograd, ops
+  -> runtime dispatcher
+  -> backend interface
+  -> NumPy CPU backend today
+  -> optional native CPU / CUDA / ROCm / oneAPI extensions later
+Native scaffold
+  -> core metadata
+  -> dispatcher interfaces
+  -> CPU kernel prototypes
+  -> CUDA, ROCm, oneAPI backend directories
+```
 ## Backend Status
@@ -89,7 +107,7 @@ Expected shape:
 ```python
 {
     "cpu": {"available": True, "native": False},
-    "cuda": {"available": False, "reason": "..."},
+    "cuda": {"available": False, "built": False, "device_count": 0, "reason": "..."},
     "rocm": {"available": False, "reason": "..."},
     "oneapi": {"available": False, "reason": "..."},
 }
@@ -100,6 +118,17 @@ raises `ModelStudioBackendUnavailable` unless a future optional native extension
 is actually installed. Unsupported accelerator devices fail with
 `ModelStudioBackendUnavailable`.
+CUDA availability can also be checked through the public namespace:
+```python
+print(ms.cuda.is_available())
+print(ms.cuda.device_count())
+print(ms.cuda.memory_summary())
+```
+In the CPU-only wheel, explicit CUDA tensor requests raise a clear runtime error
+instead of falling back to CPU.
 ## Tensor Example
 ```python
@@ -195,6 +224,8 @@ python examples/backend_status.py
 python examples/tracing_demo.py
 python examples/functional_training.py
 python examples/random_linalg_demo.py
+python examples/cuda_tensor_demo.py
+python examples/cuda_mlp_demo.py
 python benchmarks/bench_matmul.py
 python benchmarks/bench_mlp.py
 python benchmarks/bench_attention.py
@@ -205,11 +236,14 @@ python benchmarks/bench_creation.py
 python benchmarks/bench_manipulation.py
 python benchmarks/bench_elementwise.py
 python benchmarks/bench_trace.py
+python benchmarks/bench_cuda_elementwise.py
+python benchmarks/bench_cuda_matmul.py
 ```
 ## Documentation
 - [Backend status](docs/backend-status.md)
+- [CUDA status](docs/cuda.md)
 - [Tracing](docs/tracing.md)
 - [Functional API](docs/functional-api.md)
 - [Random namespace](docs/random.md)
@@ -237,5 +271,8 @@ python benchmarks/bench_trace.py
 - Expand tensor and autograd coverage.
 - Wire optional native CPU kernels only after a safe Python extension exists.
-- Add tested CUDA, ROCm, and oneAPI packages when hardware-backed CI exists.
+- Build a real optional CUDA package after tensor storage, kernels, bindings,
+  and hardware-backed CI are in place.
+- Add tested ROCm and oneAPI packages after CUDA establishes the accelerator
+  backend contract.
 - Improve compiler graph capture, analysis passes, and lowering.

{modelstudio-0.5.0 → modelstudio-0.6.0}/README.md RENAMED

@@ -1,13 +1,13 @@
 # ModelStudio
-ModelStudio is an early-stage AI tensor framework. Version `0.5.0` provides a
+ModelStudio is an early-stage AI tensor framework. Version `0.6.0` provides a
 CPU tensor/autograd MVP with neural-network modules, optimizers, serialization,
-data loading, graph tracing metadata, backend status inspection, and small
-LLM-oriented building blocks.
+data loading, graph tracing metadata, backend status inspection, a public CUDA
+availability namespace, and small LLM-oriented building blocks.
-It is not a PyTorch or TensorFlow replacement. CPU is the only working backend.
-CUDA, ROCm, and oneAPI remain explicit scaffolds until real kernels are built
-and tested.
+It is not a PyTorch or TensorFlow replacement. The default PyPI package is
+CPU-only. CUDA, ROCm, and oneAPI remain explicit scaffolds until real kernels
+are built and tested in hardware-backed environments.
 ## Installation
@@ -43,6 +43,24 @@ python -m pip install -e ".[dev]"
 | Interop | `asarray`, `from_numpy`, `to_numpy`, and `ms.numpy` |
 | Metrics | accuracy and top-k accuracy |
 | Compiler | Metadata-only tracing plus placeholder IR and passes |
+| CUDA API | Availability, device-count, sync, and memory-status facade; tensor execution is not implemented in the CPU wheel |
+## Architecture
+```text
+Python frontend
+  -> Tensor, nn, optim, autograd, ops
+  -> runtime dispatcher
+  -> backend interface
+  -> NumPy CPU backend today
+  -> optional native CPU / CUDA / ROCm / oneAPI extensions later
+Native scaffold
+  -> core metadata
+  -> dispatcher interfaces
+  -> CPU kernel prototypes
+  -> CUDA, ROCm, oneAPI backend directories
+```
 ## Backend Status
@@ -58,7 +76,7 @@ Expected shape:
 ```python
 {
     "cpu": {"available": True, "native": False},
-    "cuda": {"available": False, "reason": "..."},
+    "cuda": {"available": False, "built": False, "device_count": 0, "reason": "..."},
     "rocm": {"available": False, "reason": "..."},
     "oneapi": {"available": False, "reason": "..."},
 }
@@ -69,6 +87,17 @@ raises `ModelStudioBackendUnavailable` unless a future optional native extension
 is actually installed. Unsupported accelerator devices fail with
 `ModelStudioBackendUnavailable`.
+CUDA availability can also be checked through the public namespace:
+```python
+print(ms.cuda.is_available())
+print(ms.cuda.device_count())
+print(ms.cuda.memory_summary())
+```
+In the CPU-only wheel, explicit CUDA tensor requests raise a clear runtime error
+instead of falling back to CPU.
 ## Tensor Example
 ```python
@@ -164,6 +193,8 @@ python examples/backend_status.py
 python examples/tracing_demo.py
 python examples/functional_training.py
 python examples/random_linalg_demo.py
+python examples/cuda_tensor_demo.py
+python examples/cuda_mlp_demo.py
 python benchmarks/bench_matmul.py
 python benchmarks/bench_mlp.py
 python benchmarks/bench_attention.py
@@ -174,11 +205,14 @@ python benchmarks/bench_creation.py
 python benchmarks/bench_manipulation.py
 python benchmarks/bench_elementwise.py
 python benchmarks/bench_trace.py
+python benchmarks/bench_cuda_elementwise.py
+python benchmarks/bench_cuda_matmul.py
 ```
 ## Documentation
 - [Backend status](docs/backend-status.md)
+- [CUDA status](docs/cuda.md)
 - [Tracing](docs/tracing.md)
 - [Functional API](docs/functional-api.md)
 - [Random namespace](docs/random.md)
@@ -206,5 +240,8 @@ python benchmarks/bench_trace.py
 - Expand tensor and autograd coverage.
 - Wire optional native CPU kernels only after a safe Python extension exists.
-- Add tested CUDA, ROCm, and oneAPI packages when hardware-backed CI exists.
+- Build a real optional CUDA package after tensor storage, kernels, bindings,
+  and hardware-backed CI are in place.
+- Add tested ROCm and oneAPI packages after CUDA establishes the accelerator
+  backend contract.
 - Improve compiler graph capture, analysis passes, and lowering.
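
The README changes above state that explicit CUDA tensor requests raise in the CPU-only wheel instead of falling back to CPU, so a caller who wants a fallback has to choose the device up front. The sketch below shows one way to do that from a status dictionary with the documented shape. It is an illustration only: `pick_device` is a hypothetical helper, and the dictionary is passed in by hand because the function that returns it is not visible in this diff.

```python
from __future__ import annotations

from typing import Any, Mapping


def pick_device(status: Mapping[str, Mapping[str, Any]]) -> str:
    """Choose "cuda" only when the status reports a usable device (hypothetical helper)."""
    cuda = status.get("cuda", {})
    if cuda.get("available") and cuda.get("device_count", 0) > 0:
        return "cuda"
    return "cpu"


# Status shape copied from the README hunk for the CPU-only wheel.
cpu_only_status = {
    "cpu": {"available": True, "native": False},
    "cuda": {"available": False, "built": False, "device_count": 0, "reason": "..."},
    "rocm": {"available": False, "reason": "..."},
    "oneapi": {"available": False, "reason": "..."},
}

print(pick_device(cpu_only_status))  # prints "cpu"
```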

modelstudio-0.6.0/benchmarks/bench_cuda_elementwise.py ADDED

@@ -0,0 +1,54 @@
+from __future__ import annotations
+import platform
+import time
+from collections.abc import Callable
+import modelstudio as ms
+def _time_ms(fn: Callable[[], object], warmup: int, iterations: int, *, synchronize: bool) -> float:
+    for _ in range(warmup):
+        fn()
+    if synchronize:
+        ms.cuda.synchronize()
+    start = time.perf_counter()
+    for _ in range(iterations):
+        fn()
+    if synchronize:
+        ms.cuda.synchronize()
+    return (time.perf_counter() - start) * 1000.0 / iterations
+def main() -> None:
+    shape = (1024, 1024)
+    warmup = 5
+    iterations = 50
+    print(f"Python:      {platform.python_version()}")
+    print(f"NumPy:       {ms.numpy.__version__}")
+    print(f"ModelStudio: {ms.__version__}")
+    print(f"CUDA:        available={ms.cuda.is_available()} device_count={ms.cuda.device_count()}")
+    print(f"Shape:       {shape}")
+    print(f"Warmup:      {warmup}")
+    print(f"Iterations:  {iterations}")
+    if not ms.cuda.is_available():
+        print(ms.cuda.memory_summary())
+        print("Skipping CUDA elementwise benchmark because CUDA tensor execution is not available.")
+        return
+    ms.manual_seed(123)
+    x = ms.randn(shape, device="cuda")
+    y = ms.randn(shape, device="cuda")
+    add_ms = _time_ms(lambda: x + y, warmup, iterations, synchronize=True)
+    relu_ms = _time_ms(lambda: ms.relu(x), warmup, iterations, synchronize=True)
+    print(f"CUDA add avg:  {add_ms:.3f} ms")
+    print(f"CUDA relu avg: {relu_ms:.3f} ms")
+    print(ms.cuda.memory_summary())
+if __name__ == "__main__":
+    main()
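
The `_time_ms` helper above synchronizes once after warmup and once after the timed loop. That placement matters on backends where work is queued asynchronously: without the first call, leftover warmup work is billed to the timed region, and without the second, the clock stops before queued work finishes. The sketch below reproduces the pattern without depending on `modelstudio`; the `synchronize` callback is a stand-in for `ms.cuda.synchronize`, and whether a future ModelStudio CUDA backend will execute asynchronously is an assumption not confirmed by this diff.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def time_ms(
    fn: Callable[[], object],
    warmup: int,
    iterations: int,
    synchronize: Optional[Callable[[], None]] = None,
) -> float:
    """Average wall-clock milliseconds per call of `fn`."""
    for _ in range(warmup):
        fn()
    if synchronize is not None:
        synchronize()  # drain warmup work before starting the clock
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    if synchronize is not None:
        synchronize()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iterations


# Usage with a CPU-side workload and a no-op synchronize callback.
avg = time_ms(lambda: sum(range(1000)), warmup=2, iterations=10, synchronize=lambda: None)
print(f"avg: {avg:.3f} ms")
```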

modelstudio-0.6.0/benchmarks/bench_cuda_matmul.py ADDED

@@ -0,0 +1,52 @@
+from __future__ import annotations
+import platform
+import time
+from collections.abc import Callable
+import modelstudio as ms
+def _time_ms(fn: Callable[[], object], warmup: int, iterations: int, *, synchronize: bool) -> float:
+    for _ in range(warmup):
+        fn()
+    if synchronize:
+        ms.cuda.synchronize()
+    start = time.perf_counter()
+    for _ in range(iterations):
+        fn()
+    if synchronize:
+        ms.cuda.synchronize()
+    return (time.perf_counter() - start) * 1000.0 / iterations
+def main() -> None:
+    shape = (512, 512)
+    warmup = 3
+    iterations = 20
+    print(f"Python:      {platform.python_version()}")
+    print(f"NumPy:       {ms.numpy.__version__}")
+    print(f"ModelStudio: {ms.__version__}")
+    print(f"CUDA:        available={ms.cuda.is_available()} device_count={ms.cuda.device_count()}")
+    print(f"Shape:       {shape} x {shape}")
+    print(f"Warmup:      {warmup}")
+    print(f"Iterations:  {iterations}")
+    if not ms.cuda.is_available():
+        print(ms.cuda.memory_summary())
+        print("Skipping CUDA matmul benchmark because CUDA tensor execution is not available.")
+        return
+    ms.manual_seed(123)
+    a = ms.randn(shape, device="cuda")
+    b = ms.randn(shape, device="cuda")
+    matmul_ms = _time_ms(lambda: a @ b, warmup, iterations, synchronize=True)
+    print(f"CUDA matmul avg: {matmul_ms:.3f} ms")
+    print(ms.cuda.memory_summary())
+if __name__ == "__main__":
+    main()
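
The matmul benchmark above reports only an average time in milliseconds. For comparisons across shapes it can help to convert that figure to throughput. The sketch below uses the conventional count of 2·n³ floating-point operations for a dense n×n by n×n product (one multiply and one add per inner-product term). `matmul_gflops` is a hypothetical helper, and the 1.0 ms input is an invented value for the arithmetic, not a measurement from this package.

```python
def matmul_gflops(n: int, avg_ms: float) -> float:
    """Convert an average matmul time to GFLOP/s for square n x n operands."""
    if avg_ms <= 0:
        raise ValueError("avg_ms must be positive")
    flops = 2 * n ** 3  # n*n outputs, each needing n multiplies and n adds
    return flops / (avg_ms / 1000.0) / 1e9


# 2 * 512**3 = 268,435,456 operations; at a made-up 1.0 ms per call:
print(f"{matmul_gflops(512, 1.0):.1f} GFLOP/s")  # prints "268.4 GFLOP/s"
```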

{modelstudio-0.5.0 → modelstudio-0.6.0}/benchmarks/bench_elementwise.py RENAMED

@@ -41,4 +41,3 @@ def main() -> None:
 if __name__ == "__main__":
     main()

{modelstudio-0.5.0 → modelstudio-0.6.0}/benchmarks/bench_trace.py RENAMED

@@ -44,4 +44,3 @@ def main() -> None:
 if __name__ == "__main__":
     main()

{modelstudio-0.5.0 → modelstudio-0.6.0}/csrc/CMakeLists.txt RENAMED

@@ -9,9 +9,17 @@ add_library(modelstudio_native STATIC
 target_include_directories(modelstudio_native PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
 if(MODELSTUDIO_ENABLE_CUDA)
-  enable_language(CUDA)
-  target_sources(modelstudio_native PRIVATE backends/cuda/cuda_backend.cu)
+  target_sources(modelstudio_native PRIVATE
+    backends/cuda/cuda_backend.cu
+    backends/cuda/cuda_context.cu
+    backends/cuda/cuda_memory.cu
+    backends/cuda/cuda_stream.cu
+    backends/cuda/kernels/elementwise.cu
+    backends/cuda/kernels/reductions.cu
+    backends/cuda/kernels/matmul.cu
+  )
   target_compile_definitions(modelstudio_native PUBLIC MODELSTUDIO_ENABLE_CUDA=1)
+  target_link_libraries(modelstudio_native PUBLIC CUDA::cudart CUDA::cublas)
 endif()
 if(MODELSTUDIO_ENABLE_ROCM)

modelstudio-0.6.0/csrc/backends/cuda/README.md ADDED

@@ -0,0 +1,19 @@
+# CUDA Backend
+This directory is scaffolding for a future NVIDIA CUDA backend.
+Current status:
+- Not built by default.
+- Enabled only with `MODELSTUDIO_ENABLE_CUDA=ON`.
+- Python CPU users do not import or depend on CUDA artifacts.
+- Context, allocator, stream, and kernel entry-point files are present as
+  scaffolding only.
+Implementation path:
+1. Track allocation sizes and ownership in the CUDA allocator.
+2. Add device tensor storage and shape/stride views.
+3. Replace placeholder kernel entry points with tested CUDA kernels.
+4. Bind CUDA runtime functions and tensors into Python.
+5. Register the native backend with the dispatcher only when all required ops
+   are implemented.
+6. Ship as an optional package such as `modelstudio-cuda`.
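
Step 5 of the implementation path above says the native backend should be registered with the dispatcher only when all required ops are implemented. One way to enforce that rule is to check the backend object at registration time. The sketch below is an illustration, not ModelStudio's dispatcher: `BackendRegistry` and `REQUIRED_OPS` are hypothetical, and the op list is borrowed from the methods that `CUDABackend` defines elsewhere in this diff.

```python
from __future__ import annotations

# Hypothetical required-op list, taken from the CUDABackend methods in this diff.
REQUIRED_OPS = ("empty", "add", "mul", "matmul", "relu")


class BackendRegistry:
    """Registers a backend only if it provides every required op."""

    def __init__(self, required_ops: tuple[str, ...] = REQUIRED_OPS) -> None:
        self._required = tuple(required_ops)
        self._backends: dict[str, object] = {}

    def register(self, name: str, backend: object) -> None:
        missing = [op for op in self._required if not callable(getattr(backend, op, None))]
        if missing:
            raise ValueError(f"backend {name!r} is missing ops: {', '.join(missing)}")
        self._backends[name] = backend

    def names(self) -> list[str]:
        return sorted(self._backends)


class PartialBackend:
    """Example backend that implements only two of the required ops."""

    def add(self, lhs, rhs):
        return lhs + rhs

    def mul(self, lhs, rhs):
        return lhs * rhs


registry = BackendRegistry()
try:
    registry.register("cuda", PartialBackend())
except ValueError as exc:
    print(exc)
```

Checking at registration keeps a half-built backend out of the dispatch table entirely, so an unimplemented op cannot surface later as a failure in the middle of a computation.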

modelstudio-0.6.0/csrc/backends/cuda/cuda_backend.cu ADDED

@@ -0,0 +1,28 @@
+#include "backends/cuda/cuda_backend.hpp"
+#include "backends/cuda/cuda_kernels.hpp"
+#include "core/error.hpp"
+namespace modelstudio::cuda {
+Tensor CUDABackend::empty(const Shape&, DType) {
+  throw Error("CUDA tensor allocation is scaffolded but not wired into Python yet");
+}
+Tensor CUDABackend::add(const Tensor& lhs, const Tensor& rhs) {
+  return add_kernel(lhs, rhs);
+}
+Tensor CUDABackend::mul(const Tensor& lhs, const Tensor& rhs) {
+  return mul_kernel(lhs, rhs);
+}
+Tensor CUDABackend::matmul(const Tensor& lhs, const Tensor& rhs) {
+  return matmul_cublas(lhs, rhs);
+}
+Tensor CUDABackend::relu(const Tensor& input) {
+  return relu_kernel(input);
+}
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/cuda_context.cu ADDED

@@ -0,0 +1,37 @@
+#include "backends/cuda/cuda_context.hpp"
+#include <cuda_runtime_api.h>
+#include <string>
+#include "core/error.hpp"
+namespace modelstudio::cuda {
+void check_cuda(int status, const char* operation) {
+  if (status != cudaSuccess) {
+    throw Error(std::string(operation) + " failed: " + cudaGetErrorString(static_cast<cudaError_t>(status)));
+  }
+}
+int device_count() {
+  int count = 0;
+  auto status = cudaGetDeviceCount(&count);
+  if (status == cudaErrorNoDevice || status == cudaErrorInsufficientDriver) {
+    return 0;
+  }
+  check_cuda(status, "cudaGetDeviceCount");
+  return count;
+}
+int current_device() {
+  int device = 0;
+  check_cuda(cudaGetDevice(&device), "cudaGetDevice");
+  return device;
+}
+void set_device(int index) {
+  check_cuda(cudaSetDevice(index), "cudaSetDevice");
+}
+}  // namespace modelstudio::cuda
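
In `device_count()` above, two statuses (`cudaErrorNoDevice` and `cudaErrorInsufficientDriver`) are treated as "zero devices", while any other failure is raised through `check_cuda`. This keeps an availability probe safe to call on machines without a GPU while still surfacing real errors. The Python sketch below mirrors that control flow; the `Status` enum and the `query` callback are illustrative stand-ins and do not correspond to CUDA's numeric error codes or to any ModelStudio API.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Tuple


class Status(Enum):
    """Illustrative stand-ins for CUDA runtime statuses (not the real values)."""

    SUCCESS = "success"
    NO_DEVICE = "no_device"
    INSUFFICIENT_DRIVER = "insufficient_driver"
    INVALID_VALUE = "invalid_value"


def device_count(query: Callable[[], Tuple[Status, int]]) -> int:
    """Return the device count, mapping "no usable GPU" statuses to 0."""
    status, count = query()
    if status in (Status.NO_DEVICE, Status.INSUFFICIENT_DRIVER):
        return 0  # expected on CPU-only machines; not an error
    if status is not Status.SUCCESS:
        raise RuntimeError(f"device query failed: {status.name}")
    return count


print(device_count(lambda: (Status.NO_DEVICE, 0)))  # prints "0"
```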

modelstudio-0.6.0/csrc/backends/cuda/cuda_context.hpp ADDED

@@ -0,0 +1,10 @@
+#pragma once
+namespace modelstudio::cuda {
+int device_count();
+int current_device();
+void set_device(int index);
+void check_cuda(int status, const char* operation);
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/cuda_kernels.hpp ADDED

@@ -0,0 +1,16 @@
+#pragma once
+#include "core/tensor.hpp"
+namespace modelstudio::cuda {
+Tensor add_kernel(const Tensor& lhs, const Tensor& rhs);
+Tensor sub_kernel(const Tensor& lhs, const Tensor& rhs);
+Tensor mul_kernel(const Tensor& lhs, const Tensor& rhs);
+Tensor div_kernel(const Tensor& lhs, const Tensor& rhs);
+Tensor relu_kernel(const Tensor& input);
+Tensor sum_kernel(const Tensor& input);
+Tensor mean_kernel(const Tensor& input);
+Tensor matmul_cublas(const Tensor& lhs, const Tensor& rhs);
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/cuda_memory.cu ADDED

@@ -0,0 +1,34 @@
+#include "backends/cuda/cuda_memory.hpp"
+#include <cuda_runtime_api.h>
+#include <atomic>
+#include <cstddef>
+#include "backends/cuda/cuda_context.hpp"
+namespace modelstudio::cuda {
+namespace {
+std::atomic<unsigned long long> g_allocated_bytes{0};
+}
+void* CUDAMemoryAllocator::allocate(unsigned long long bytes) {
+  void* ptr = nullptr;
+  check_cuda(cudaMalloc(&ptr, static_cast<std::size_t>(bytes)), "cudaMalloc");
+  g_allocated_bytes.fetch_add(bytes, std::memory_order_relaxed);
+  return ptr;
+}
+void CUDAMemoryAllocator::deallocate(void* ptr) {
+  if (ptr == nullptr) {
+    return;
+  }
+  check_cuda(cudaFree(ptr), "cudaFree");
+}
+unsigned long long allocated_bytes() {
+  return g_allocated_bytes.load(std::memory_order_relaxed);
+}
+}  // namespace modelstudio::cuda
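
In the allocator above, `g_allocated_bytes` is increased in `allocate` but never decreased in `deallocate`, because `cudaFree` receives only a pointer and the size is not recorded. As written, `allocated_bytes()` therefore reports cumulative bytes requested, not bytes currently live. The CUDA README in this diff lists "track allocation sizes and ownership" as the first implementation step. The sketch below shows the bookkeeping that step implies, in Python for brevity: `TrackingAllocator` is hypothetical, and integer handles stand in for device pointers.

```python
from __future__ import annotations

from typing import Optional


class TrackingAllocator:
    """Remembers the size of each live allocation so frees can be accounted for."""

    def __init__(self) -> None:
        self._sizes: dict[int, int] = {}  # handle -> size in bytes
        self._next_handle = 1
        self.allocated_bytes = 0  # bytes currently live

    def allocate(self, nbytes: int) -> int:
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        handle = self._next_handle
        self._next_handle += 1
        self._sizes[handle] = nbytes
        self.allocated_bytes += nbytes
        return handle

    def deallocate(self, handle: Optional[int]) -> None:
        if handle is None:  # mirrors the nullptr early return in the C++ code
            return
        # pop() raises KeyError on an unknown handle or a double free.
        self.allocated_bytes -= self._sizes.pop(handle)


alloc = TrackingAllocator()
a = alloc.allocate(1024)
b = alloc.allocate(4096)
alloc.deallocate(a)
print(alloc.allocated_bytes)  # prints "4096"
```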

{modelstudio-0.5.0 → modelstudio-0.6.0}/csrc/backends/cuda/cuda_memory.hpp RENAMED

@@ -9,4 +9,6 @@ class CUDAMemoryAllocator {
   void deallocate(void* ptr);
 };
+unsigned long long allocated_bytes();
 }  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/cuda_stream.cu ADDED

@@ -0,0 +1,13 @@
+#include "backends/cuda/cuda_stream.hpp"
+#include <cuda_runtime_api.h>
+#include "backends/cuda/cuda_context.hpp"
+namespace modelstudio::cuda {
+void synchronize_device() {
+  check_cuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
+}
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/cuda_stream.hpp ADDED

@@ -0,0 +1,7 @@
+#pragma once
+namespace modelstudio::cuda {
+void synchronize_device();
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/kernels/elementwise.cu ADDED

@@ -0,0 +1,27 @@
+#include "backends/cuda/cuda_kernels.hpp"
+#include "core/error.hpp"
+namespace modelstudio::cuda {
+Tensor add_kernel(const Tensor&, const Tensor&) {
+  throw Error("CUDA add kernel is scaffolded but not wired into Python yet");
+}
+Tensor sub_kernel(const Tensor&, const Tensor&) {
+  throw Error("CUDA sub kernel is scaffolded but not wired into Python yet");
+}
+Tensor mul_kernel(const Tensor&, const Tensor&) {
+  throw Error("CUDA mul kernel is scaffolded but not wired into Python yet");
+}
+Tensor div_kernel(const Tensor&, const Tensor&) {
+  throw Error("CUDA div kernel is scaffolded but not wired into Python yet");
+}
+Tensor relu_kernel(const Tensor&) {
+  throw Error("CUDA relu kernel is scaffolded but not wired into Python yet");
+}
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/kernels/matmul.cu ADDED

@@ -0,0 +1,13 @@
+#include "backends/cuda/cuda_kernels.hpp"
+#include <cublas_v2.h>
+#include "core/error.hpp"
+namespace modelstudio::cuda {
+Tensor matmul_cublas(const Tensor&, const Tensor&) {
+  throw Error("CUDA cuBLAS matmul is scaffolded but not wired into Python yet");
+}
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/backends/cuda/kernels/reductions.cu ADDED

@@ -0,0 +1,15 @@
+#include "backends/cuda/cuda_kernels.hpp"
+#include "core/error.hpp"
+namespace modelstudio::cuda {
+Tensor sum_kernel(const Tensor&) {
+  throw Error("CUDA sum reduction is scaffolded but not wired into Python yet");
+}
+Tensor mean_kernel(const Tensor&) {
+  throw Error("CUDA mean reduction is scaffolded but not wired into Python yet");
+}
+}  // namespace modelstudio::cuda

modelstudio-0.6.0/csrc/bindings/cuda_bindings.cpp ADDED

@@ -0,0 +1,12 @@
+#include "backends/cuda/cuda_context.hpp"
+#include "backends/cuda/cuda_memory.hpp"
+#include "backends/cuda/cuda_stream.hpp"
+namespace modelstudio::bindings {
+// Future Python extension registration point. The CPU-only wheel does not build
+// this file; CUDA bindings will be enabled only when MODELSTUDIO_ENABLE_CUDA=ON
+// and a binding layer is added.
+void register_cuda_bindings_placeholder() {}
+}  // namespace modelstudio::bindings

{modelstudio-0.5.0 → modelstudio-0.6.0}/docs/backend-architecture.md RENAMED

@@ -16,7 +16,7 @@ Tensor API
 | Backend | Status |
 | --- | --- |
 | CPU | Working MVP backed by NumPy |
-| CUDA | Scaffold only |
+| CUDA | Public status namespace plus native scaffold; no tensor execution in the CPU wheel |
 | ROCm | Scaffold only |
 | oneAPI | Scaffold only |
@@ -46,7 +46,7 @@ The native scaffolding under `csrc/` mirrors the Python runtime:
 - `csrc/core`: dtype, device, shape, tensor metadata, storage
 - `csrc/dispatcher`: backend interface and operator registry
 - `csrc/backends/cpu`: native CPU backend and kernels
-- `csrc/backends/cuda`: CUDA placeholders
+- `csrc/backends/cuda`: CUDA context, memory, stream, and kernel scaffolds
 - `csrc/backends/rocm`: ROCm/HIP placeholders
 - `csrc/backends/oneapi`: oneAPI/SYCL placeholders
@@ -58,7 +58,7 @@ Python API stays stable.
 The CPU MVP stores arrays in NumPy. Future native backends should introduce:
 - CPU allocator abstractions for native storage
-- CUDA device allocator and stream support
+- Complete CUDA device allocator and stream support
 - HIP allocator and stream support
 - SYCL allocator and queue support
