blksprs 2.0rc6__tar.gz → 2.0rc7__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (28)
  1. {blksprs-2.0rc6 → blksprs-2.0rc7}/PKG-INFO +7 -3
  2. {blksprs-2.0rc6 → blksprs-2.0rc7}/README.md +6 -2
  3. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/distribution.py +1 -1
  4. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/misc/broadcast_ops.py +1 -0
  5. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/misc/row_wise.py +7 -3
  6. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/tools.py +0 -9
  7. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/PKG-INFO +7 -3
  8. {blksprs-2.0rc6 → blksprs-2.0rc7}/pyproject.toml +1 -1
  9. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/__init__.py +0 -0
  10. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/layouting/distribution_layout.py +0 -0
  11. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/layouting/sparsity_layout.py +0 -0
  12. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/conversion.py +0 -0
  13. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/flow.py +0 -0
  14. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/matmul.py +0 -0
  15. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/partitioning.py +0 -0
  16. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/repeat.py +0 -0
  17. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/softmax.py +0 -0
  18. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/transpose.py +0 -0
  19. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/autotuning.py +0 -0
  20. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/benchmarking.py +0 -0
  21. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/blksprs_tensor.py +0 -0
  22. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/processing.py +0 -0
  23. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/validation.py +0 -0
  24. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/SOURCES.txt +0 -0
  25. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/dependency_links.txt +0 -0
  26. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/requires.txt +0 -0
  27. {blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/top_level.txt +0 -0
  28. {blksprs-2.0rc6 → blksprs-2.0rc7}/setup.cfg +0 -0
{blksprs-2.0rc6 → blksprs-2.0rc7}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: blksprs
-Version: 2.0rc6
+Version: 2.0rc7
 Summary: A lightweight library for operations on blocksparse matrices in PyTorch.
 Author-email: Felix Schön <schoen@kr.tuwien.ac.at>
 Project-URL: Homepage, https://github.com/FelixSchoen/blksprs
@@ -108,12 +108,16 @@ library.
 
 ## Known Limitations and Issues
 
+- Triton has a bug with `tl.atomic_max()`, which is used for the row-wise max operation.
+  In order to work around this bug, a manual conversion of some values is needed, (slightly) negatively impacting
+  performance.
+  Watch the [issue](https://github.com/triton-lang/triton/issues/6376) on Triton's issue tracker for more information.
 - PyTorch's `wrap_triton()` currently does not support config pruning. It thus cannot be used for some of the kernels,
   which could impact graph compilation.
 - There seem to be some issues with autocasting, forcing some operations to manually cast.
 - There will be some slight numerical differences between vanilla and blksprs operations.
-  These instabilities are due to Triton and thus cannot be fixed by this library alone.
-  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
+  These instabilities are due to Triton and thus cannot be fixed by this library alone.
+  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
 
 ## Usage
 
{blksprs-2.0rc6 → blksprs-2.0rc7}/README.md

@@ -89,12 +89,16 @@ library.
 
 ## Known Limitations and Issues
 
+- Triton has a bug with `tl.atomic_max()`, which is used for the row-wise max operation.
+  In order to work around this bug, a manual conversion of some values is needed, (slightly) negatively impacting
+  performance.
+  Watch the [issue](https://github.com/triton-lang/triton/issues/6376) on Triton's issue tracker for more information.
 - PyTorch's `wrap_triton()` currently does not support config pruning. It thus cannot be used for some of the kernels,
   which could impact graph compilation.
 - There seem to be some issues with autocasting, forcing some operations to manually cast.
 - There will be some slight numerical differences between vanilla and blksprs operations.
-  These instabilities are due to Triton and thus cannot be fixed by this library alone.
-  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
+  These instabilities are due to Triton and thus cannot be fixed by this library alone.
+  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
 
 ## Usage
 
{blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/distribution.py

@@ -240,7 +240,7 @@ def scatter(src: BlksprsTensor, sparsity_layout_src: Tensor,
                 reduce_op="none", lut=lut)
 
 
-@torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float16)
+@torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float32)
 def scatter_reduce(src: BlksprsTensor, sparsity_layout_src: Tensor,
                    dim: int,
                    idx: BlksprsTensor,
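For context, `torch.amp.custom_fwd(cast_inputs=...)` casts floating-point CUDA tensor arguments to the given dtype and disables autocast inside the decorated forward whenever an autocast region is active; the change above switches `scatter_reduce` from a float16 cast to a float32 cast. Below is a minimal sketch of that behaviour using a toy `torch.autograd.Function`, not the blksprs implementation:

```python
import torch

class ScaledAdd(torch.autograd.Function):
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float32)
    def forward(ctx, x, y):
        # Under torch.autocast("cuda"), x and y arrive here already cast to
        # float32, and autocast is disabled for the body of forward.
        return x + 2.0 * y

    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad_output):
        # Gradients of x + 2*y with respect to x and y.
        return grad_output, 2.0 * grad_output

a = torch.randn(4, device="cuda", dtype=torch.float16, requires_grad=True)
b = torch.randn(4, device="cuda", dtype=torch.float16, requires_grad=True)
with torch.autocast("cuda", dtype=torch.float16):
    out = ScaledAdd.apply(a, b)
print(out.dtype)  # torch.float32, despite float16 inputs inside an autocast region
```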
{blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/misc/broadcast_ops.py

@@ -12,6 +12,7 @@ from blksprs.utils.validation import validate_contiguous, validate_device, \
     validate_sparsity_block_size
 
 
+@torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float16)
 def broadcast_add(x: Tensor, y: Tensor, sparsity_layout_output: Tensor,
                   sparsity_block_size: int) -> BlksprsTensor:
     """Performs a broadcast and subsequent addition of two dense tensors x and y. Returns a block-sparse tensor in
{blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/ops/misc/row_wise.py

@@ -4,9 +4,9 @@ from torch import Tensor
 from torch._library.triton import wrap_triton, triton_op
 from triton import language as tl
 
-from blksprs.utils.blksprs_tensor import BlksprsTensor
-from blksprs.utils.tools import stride, get_autocast_min_val
 from blksprs.utils.autotuning import get_autotune_configs, prune_autotune_configs
+from blksprs.utils.blksprs_tensor import BlksprsTensor
+from blksprs.utils.tools import stride
 from blksprs.utils.validation import validate_dimensions, validate_contiguous, validate_device, validate_sparsity, \
     validate_sparsity_block_size
 
@@ -95,6 +95,7 @@ def row_wise_sum_forward(x: Tensor, sparsity_lut: Tensor,
     return output
 
 
+# noinspection PyUnusedLocal
 @triton.autotune(
     configs=get_autotune_configs(),
     key=["sparsity_block_size"],
@@ -175,6 +176,8 @@ def row_wise_max(x: BlksprsTensor, sparsity_layout: Tensor, sparsity_block_size:
         of the input and the sparsity layout of the output tensor.
 
     """
+    # TODO Fix for triton bug, see https://github.com/triton-lang/triton/issues/6376
+    x = torch.where(x == -0.0, torch.tensor(0.0), x)
     x = x.contiguous()
 
     validate_dimensions(x)
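The two added lines are the workaround referenced by the new "Known Limitations" bullet: before the row-wise max is computed, negative zeros in the input are rewritten as positive zeros, presumably to sidestep the signed-zero handling of `tl.atomic_max()` tracked in the linked Triton issue. A standalone sketch (not blksprs code) of what the `torch.where` call does:

```python
import torch

x = torch.tensor([[-0.0, 1.5, -2.0],
                  [ 0.0, -0.0, 3.0]])

# x == -0.0 is True for both +0.0 and -0.0 (IEEE 754 treats them as equal),
# so every zero is replaced by +0.0 and all non-zero values pass through.
x_fixed = torch.where(x == -0.0, torch.tensor(0.0), x)

print(torch.signbit(x))        # sign bit set for -0.0 and -2.0
print(torch.signbit(x_fixed))  # sign bit now set only for -2.0
```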
@@ -209,7 +212,7 @@ def row_wise_max_forward(x: Tensor, sparsity_lut: Tensor,
     output = torch.full(size=(n_sparse_blocks_output,
                               sparsity_block_size,
                               1 if flag_slice_only else sparsity_block_size),
-                        fill_value=get_autocast_min_val(),
+                        fill_value=torch.finfo(x.dtype).min,
                         device=x.device)
 
     x_b, x_r, x_c = x.size()
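With the removed `get_autocast_min_val()` helper (see the `tools.py` hunk below), the pre-fill value depended on the active autocast dtype; it now comes from the input's own dtype. A small illustrative sketch, not taken from blksprs, of why `torch.finfo(dtype).min` is a safe sentinel for a max reduction:

```python
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    sentinel = torch.finfo(dtype).min            # most negative finite value
    x = torch.randn(4, 8).to(dtype)
    acc = torch.full((4, 1), sentinel, dtype=dtype)
    # Every finite entry of x is >= the sentinel, so the pre-fill never
    # survives the row-wise max.
    out = torch.maximum(acc, x.max(dim=1, keepdim=True).values)
    assert torch.equal(out, x.max(dim=1, keepdim=True).values)
```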
@@ -238,6 +241,7 @@ def row_wise_max_forward(x: Tensor, sparsity_lut: Tensor,
     return output
 
 
+# noinspection PyUnusedLocal
 @triton.autotune(
     configs=get_autotune_configs(),
     key=["sparsity_block_size"],
{blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs/utils/tools.py

@@ -26,12 +26,3 @@ def stride(x: Tensor):
         return x.size(1) * x.size(2), x.size(2), 1
     else:
         raise NotImplementedError
-
-
-def get_autocast_min_val():
-    if torch.is_autocast_enabled():
-        dtype = torch.get_autocast_dtype("cuda")
-    else:
-        dtype = torch.float
-
-    return torch.finfo(dtype).min
{blksprs-2.0rc6 → blksprs-2.0rc7}/blksprs.egg-info/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: blksprs
-Version: 2.0rc6
+Version: 2.0rc7
 Summary: A lightweight library for operations on blocksparse matrices in PyTorch.
 Author-email: Felix Schön <schoen@kr.tuwien.ac.at>
 Project-URL: Homepage, https://github.com/FelixSchoen/blksprs
@@ -108,12 +108,16 @@ library.
 
 ## Known Limitations and Issues
 
+- Triton has a bug with `tl.atomic_max()`, which is used for the row-wise max operation.
+  In order to work around this bug, a manual conversion of some values is needed, (slightly) negatively impacting
+  performance.
+  Watch the [issue](https://github.com/triton-lang/triton/issues/6376) on Triton's issue tracker for more information.
 - PyTorch's `wrap_triton()` currently does not support config pruning. It thus cannot be used for some of the kernels,
   which could impact graph compilation.
 - There seem to be some issues with autocasting, forcing some operations to manually cast.
 - There will be some slight numerical differences between vanilla and blksprs operations.
-  These instabilities are due to Triton and thus cannot be fixed by this library alone.
-  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
+  These instabilities are due to Triton and thus cannot be fixed by this library alone.
+  However, for all intents and purposes, these very minor differences should not matter and can safely be ignored.
 
 ## Usage
 
{blksprs-2.0rc6 → blksprs-2.0rc7}/pyproject.toml

@@ -1,6 +1,6 @@
 [project]
 name = "blksprs"
-version = "2.0-rc.6"
+version = "2.0-rc.7"
 authors = [{ name = "Felix Schön", email = "schoen@kr.tuwien.ac.at" }]
 description = "A lightweight library for operations on blocksparse matrices in PyTorch."
 readme = "README.md"
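Note that `pyproject.toml` spells the version `2.0-rc.7` while the generated metadata above reports `2.0rc7`; both denote the same release after PEP 440 normalization, which can be checked with the `packaging` library:

```python
from packaging.version import Version

# PEP 440 normalizes the "-rc." separator away, so the two spellings match.
assert Version("2.0-rc.7") == Version("2.0rc7")
print(Version("2.0-rc.7"))  # -> 2.0rc7, the normalized form used in PKG-INFO
```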